Trial access to the University of Cambridge WWW cache server
NOTE: Only systems within the University of Cambridge (hostnames ending
cam.ac.uk) have access to the cache server. Systems outside the
cam.ac.uk domain are not authorised to use the cache.
Please read the whole of this document, not just the WWW client configuration
notes!
If you decide to try using the cache server, please send any problem reports or
comments to
webmaster@ucs.cam.ac.uk.
Announcements and other information relating to the WWW cache trial will
be made available through the local
ucam.comp.www.misc
newsgroup and the
cache status and news summary.
An
article in the
Computing Service newsletter no. 186 (April 1996)
described the purpose of a WWW proxy/cache server, and mentioned that the
Computing Service intended to set up a WWW cache server on a pilot trial basis.
See that article for more detailed background information. In summary,
though, if a WWW client ("browser") is configured to use a WWW proxy/cache
server ("cache server"), it will pass some (or all) requests for documents to
the proxy server instead of connecting direct to the server which holds
the master copy of the document. The proxy/cache server may then act as
intermediary ("proxy") and pass on the request, or it may instead send to the
client a copy of the document which it has saved ("cached") following an
earlier request for the same document.
The server takes care to ensure that documents supplied from the cache are
"recent" (see below for further details), though changes to the master copies
may not be noticed immediately. Documents which cannot be cached usefully or
safely (e.g. search results, output from forms, etc.) will not be retained in
the cache.
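The behaviour summarised above can be sketched in a few lines of Python. This is a toy model only, not the server's actual logic: the class name, the fetch callback, and the rule of treating any URL containing "?" as a search or form result are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple


class ToyProxyCache:
    """Illustrative only: serve a saved copy if one exists, else fetch and save."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # stands in for contacting the master server
        self._store: Dict[str, bytes] = {}   # URL -> saved ("cached") copy

    def get(self, url: str, method: str = "GET") -> Tuple[bytes, str]:
        # Assumed rule: form submissions and query URLs are passed straight
        # through and never retained in the cache.
        if method != "GET" or "?" in url:
            return self._fetch(url), "proxied"
        if url in self._store:
            return self._store[url], "cached"
        body = self._fetch(url)
        self._store[url] = body
        return body, "fetched"
```

The second element of the returned pair records whether the document came from the cache, was fetched and saved, or was simply passed through.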
Overall, this should result in documents being received faster on average,
and reduce the volume of WWW traffic over busy international network links.
Since international network capacity to and from JANET (the UK academic
network) is often overloaded, there is strong pressure for all JANET sites to
use WWW cache servers.
The aims of the pilot trial are
- Identify any problems which need to be resolved before making the cache
available as a general service.
- Resolve those problems, or document them as limitations if they are
(at least currently) unavoidable.
- Collect performance and usage data to establish the scale of system
which would be needed to run a cache server adequate for use by the entire
University.
- As far as possible, provide reliable WWW access to sites for users of
the pilot WWW cache server.
We would like as many people as possible to use the cache, not least because
its effectiveness as a cache will be poor if usage is low. However, users
during the pilot trial should be willing to use the cache whenever possible,
and should know how to disable use of the cache temporarily (connecting direct
to the target servers instead) if there are problems with the cache server or
with the national HENSA cache, to which the local server may pass requests for
documents that are not available from the local cache.
In consequence, the pilot trial is aimed at individuals. We would
not recommend configuring software on departmental or college systems to
use the cache automatically for all users. If there are problems with the
cache, such users could be left frustrated, not realising that they are using
it, that they could bypass it, or (for example) that announcements of cache
server downtime affect them. Encouraging the more experienced or adventurous
users of departmental or college systems to read this document and then try
using the cache would be a much safer approach.
Anyone who decides to use the cache server is advised to watch for
trial-related announcements on the
ucam.comp.www.misc
newsgroup. General news and information about the pilot trial, including
details of cache server configuration changes, can be found in the
cache status and news summary.
Comments to
webmaster@ucs.cam.ac.uk
are welcome on whether changes reported in either place seem to be
beneficial, neutral, or detrimental in their effect.
The cache server is intended to be available at all times, subject to the usual
exclusions such as hardware or software failure, disruption of network access
(whether scheduled or otherwise), and necessary downtime for software or
hardware maintenance. Problems arising outside working hours may not be noticed
or rectified before the next working day.
Scheduled downtime (for the WWW cache server in particular, or other
services which may affect it or access to it) will be announced in the same
ways as for other Computing Service systems, as will details of any extensive
unscheduled downtime.
One exception during the pilot trial is that, in order to deal
quickly with any problems that come to light, system restarts may not be
announced in advance as they would be for other Computing Service systems,
since waiting to give notice could delay them. Where urgent, such restarts may be done at any
time; otherwise, they will be restricted to the advertised "vulnerable periods"
during which maintenance activities for Computing Service systems are normally
scheduled. System restarts should normally cause only a 2-3 minute
interruption to service. See
netnews
for details of the standard "vulnerable periods" and announcements of scheduled
downtime.
To give some idea of what to expect if you use the cache server during the
trial:
- The cache server's configuration is likely to change quite frequently
during the trial, and mistakes are possible...
- When asked for a document that is not in its cache, the server may pass
requests (in particular, those for sites outside JANET) to the national
HENSA WWW cache. In general, that should be beneficial, but any problems
affecting the HENSA cache may in consequence affect access through the
local cache. The HENSA cache comprises a number of systems, so it is
possible (e.g. if one HENSA system is misbehaving but the others are
working) for some requests to work while other requests fail - though that's
only one of many possible explanations for such failures. Reloading the
document, or just its images, may help. Alternatively, disabling use of
the cache server temporarily may be necessary to get around the problem -
or to discover that the problem is not cache-related.
- If retrieval of a document fails or is interrupted by the user, the WWW
client (or possibly the cache server) may be left with an incomplete copy.
The client's "reload" facility (command, menu item, etc.) can be used to
force the document to be re-fetched. That does not always have the desired
effect, however. In particular, the Reload button in Netscape Navigator
seems to fetch a new copy of the document only if its current copy is
out-of-date (not simply because it is incomplete!). Holding down the shift
key while selecting Reload should load a current, fresh copy of the
document.
- Using a proxy/cache server has the effect that requests to a remote WWW
server appear to come from the cache server, not the system on which the
WWW client program is running. This may result in refusal of access to a
resource for which access is controlled according to the system from which
the request is received. This can be circumvented by temporarily disabling
use of the cache, or by configuring the client ("no proxy" setting) to
connect direct to the system concerned, rather than using the cache server.
- URLs containing an abbreviated hostname, e.g. http://www/ rather than
http://www.somedept.cam.ac.uk/, are likely to fail or not work as expected.
The hostname will be interpreted in the context of the cache server's naming
domain, e.g. cam.ac.uk or hensa.ac.uk, rather than the client system's
domain (e.g. somedept.cam.ac.uk). Note that even if the client's "no proxy"
setting should bypass the cache for access to JANET (ac.uk) or University
(cam.ac.uk) systems, abbreviated hostnames may not be recognised as matching
the no proxy setting, and hence may be passed on to the cache.
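To illustrate the last point, the Python sketch below shows one plausible way a client might compare a hostname with its "no proxy" list. The function name and the suffix-matching rule are assumptions for illustration; real browsers differ in how they interpret such entries.

```python
from typing import Sequence


def bypasses_proxy(hostname: str, no_proxy: Sequence[str]) -> bool:
    """Return True if hostname falls under one of the "no proxy" domains."""
    host = hostname.lower().rstrip(".")
    for entry in no_proxy:
        # Accept "*.ac.uk", ".ac.uk" and "ac.uk" as the same domain.
        domain = entry.lower().lstrip("*").lstrip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False


# A full hostname matches the ac.uk entry and connects direct...
print(bypasses_proxy("www.somedept.cam.ac.uk", ["ac.uk"]))
# ...but the abbreviated form does not, so the request goes to the cache.
print(bypasses_proxy("www", ["ac.uk"]))
```

Under this matching rule the abbreviated name "www" carries no domain part to compare, which is why it would be handed to the cache server and resolved in the cache server's own domain.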
If you are familiar with your WWW client's configuration facilities, this
summary may be sufficient to get started.
- If you are using Netscape Navigator 2.0 or later (should be at least 2.02
to avoid various security problems!), just configure
http://www.cam.ac.uk/pilot-trial.pac as its proxy auto-configuration
file. You cannot view that directly as a normal document - Netscape will
complain at being sent an unexpected configuration file - but you can
view the auto-configuration file as text
if you want to see its contents (which are likely to vary during the
trial).
- For other clients (and for Netscape 2 or later if you prefer to use manual
configuration), you should set their proxy/cache configuration to use port 8080
on host wwwcache.cam.ac.uk, for http, ftp, and gopher connections. In addition,
if possible you should configure them to connect direct (not use the cache) for
systems within JANET (which may need to be specified as *.ac.uk, .ac.uk,
or just ac.uk, depending on the client software).
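For readers who think in code, the manual settings above can be expressed as a short Python sketch using the standard urllib.request module. The constant and function names are assumptions for illustration; only http and ftp are shown, and the sketch does not implement the "connect direct for ac.uk" bypass. Building the opener makes no network connection.

```python
import urllib.request

# Address details from the quick start notes above.
CACHE_HOST = "wwwcache.cam.ac.uk"
CACHE_PORT = 8080
PROXY_URL = f"http://{CACHE_HOST}:{CACHE_PORT}/"


def make_opener(use_cache: bool = True) -> urllib.request.OpenerDirector:
    """Build a URL opener that sends http and ftp requests via the cache.

    Passing use_cache=False gives an opener that connects direct, which is
    the equivalent of temporarily disabling the proxy setting in a browser.
    """
    proxies = {"http": PROXY_URL, "ftp": PROXY_URL} if use_cache else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```

Keeping the "use the cache or not" choice as a single argument mirrors the advice in this document: anyone using the cache should be able to switch it off quickly when a problem is suspected.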
The following documents provide information about configuring various
widely-used WWW clients (browsers). If the one you are using is not covered,
you will need to work out how to configure it using the information in the
"quick start" section above in conjunction with the client program's own
documentation or "help" information.
- lynx
(in Computing Service leaflet G62: Using the Lynx WWW Browser)
Documentation for other browsers is in preparation, but has been delayed.
For now, the documentation for the Manchester Computing (University of
Manchester & UMIST) cache server may provide sufficient guidance BUT
you MUST use the address details for the University of Cambridge cache
server instead of the addresses shown in Manchester's examples (i.e.
"wwwcache.cam.ac.uk" and "8080" should be specified, in a format appropriate to
the browser). Read the following notes before following the link to
Manchester's documentation!
The differences between Manchester and Cambridge configurations are:
- Where the Manchester examples show the URL-style proxy server address
"http://wwwcache.mcc.ac.uk:3128/", Cambridge users should configure their
browser to use "http://wwwcache.cam.ac.uk:8080/".
- Where the browser requires the hostname and port number specified
separately, Cambridge browsers should use hostname wwwcache.cam.ac.uk
and port number 8080, rather than wwwcache.mcc.ac.uk and 3128.
- For Netscape V2 or later, auto-configuration script URL
http://www.cam.ac.uk/pilot-trial.pac should be specified in the Netscape
proxy configuration for Cambridge browsers, rather than the script
named in Manchester's examples.
- Where Manchester specifies e.g. "mcc.ac.uk,umist.ac.uk,man.ac.uk" as
a "no proxy" or "bypass proxy for" entry, browsers in the University of
Cambridge should use just ac.uk.
Manchester Computing's documentation about their cache server and how
to configure web browsers to use the cache can be found at
http://www.mcc.ac.uk/Cache/.
- The server is configured to act as proxy for http:, ftp:, and gopher:
URLs. While it could also act as proxy for SSL-based (secure socket layer)
URLs (https: and snews:), it could not cache the resulting documents and
it would therefore be of no benefit. (That facility is only relevant when
the server is part of a network "firewall", controlling access between an
internal network and the Internet.)
- The server is configured to cache any cacheable documents which it
retrieves. Non-cacheable documents include those retrieved via HTTP (i.e.
WWW documents as opposed to FTP or gopher) for which no last modification
time or expiry time is supplied with the document, and also the results of
HTTP "POST" requests (e.g. submitting WWW forms).
- The current version of the HTTP protocol does not provide reliable
mechanisms for determining when a cached document is outdated, beyond a
rarely used expiry time facility. In most cases, the server applies an
upper age limit (configurable) after which it will check if the document is
still current, in combination with a "last modified factor", a proportion of
the document's age since it was last modified.
This has the effect that frequently or recently modified documents will be
checked sooner than documents that have not changed for many days, but in
all cases a check will be made sooner or later, and with a definite upper
limit on "later". Finding the right balance between user expectations of
currency and avoiding the overhead and delay of a check for every retrieval
is one of the aims of this trial. Note, however, that when a document is
retrieved through a sequence of caches (e.g. ours and the HENSA cache, maybe
others farther afield) the configuration of each cache contributes to when
the server named in the URL will actually be consulted to check that a cached
document is current. The only way to be sure of receiving a current copy of
a cacheable document is to use the WWW client's Reload facility, provided that
it uses the HTTP facility for forcing caches to fetch a current copy
(which requires shift-Reload in Netscape Navigator).
- The FTP and gopher protocols don't provide any usable information about
when a file or document was last modified, or an expiry time. In consequence,
the cache will simply return copies from the cache until they reach the
configured upper limit on age, and then fetch new copies when next needed.
- Requests proxied by the cache server will be seen as originating there,
which may have implications for access to resources for which access is
controlled by the hostname or address of the originating system. That affects
both clients and servers. Clients authorised to access a resource may instead
be denied access if the request is routed via the local cache server
(or any other cache server that is not itself authorised to access the
resource), as the request appears to come from the cache. Servers which
allow a cache server (any cache server) to access resources that are
access-controlled by client hostname or address will in fact be giving
access to all authorised clients of the cache.
- Requests directed to servers outside JANET (*.ac.uk) will normally be
passed on to the national HENSA cache, if the document cannot be returned
from the local cache. At present, the cache server simply reports the
error if it cannot retrieve a document via the HENSA cache. Such failure
may be obvious e.g. if details of the error are displayed instead of the
expected document, or hidden (e.g. "broken image" icon shown where an image
file could not be retrieved). The HENSA cache actually comprises a group of
systems, so it is possible for a document and some images to be retrieved
successfully, while others may fail because the requests went to a HENSA
cache system that was not working. The local cache may be reconfigured to
bypass the HENSA cache at times when it is not working reliably.
Note that "partial failure" (e.g. some images or other subsidiary
documents not fetched) may also occur for other reasons, e.g. incorrect
links, an inaccessible or overloaded server, or timeouts due to network
congestion.
- The Netscape 2 proxy auto-configuration file is liable to change from
time to time in order to experiment with alternative configurations.
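The interaction between the upper age limit and the "last modified factor" described in the notes above can be sketched as follows. This is one reading of that description, not the server's actual algorithm: the function name, the factor of 0.1, and the one-day limit are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def needs_check(
    fetched: datetime,
    last_modified: Optional[datetime],
    now: datetime,
    lm_factor: float = 0.1,                  # hypothetical "last modified factor"
    max_age: timedelta = timedelta(days=1),  # hypothetical upper age limit
) -> bool:
    """Decide whether a cached copy should be re-checked before being served."""
    if last_modified is None:
        # FTP and gopher supply no modification time: only the age limit applies.
        fresh_for = max_age
    else:
        # Recently modified documents get a short lifetime, long-unchanged
        # documents a longer one, but never more than the upper limit.
        fresh_for = min((fetched - last_modified) * lm_factor, max_age)
    return now - fetched >= fresh_for
```

With these assumed values, a document last modified ten hours before it was fetched would be re-checked after one hour in the cache, while one unchanged for a hundred days would be re-checked after a day.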
The pilot trial is using a Sun SPARCstation 20/151 system running Solaris 2.5,
with 128MB memory and 8GB of disc, and using the Netscape Proxy Server V1.12.
webmaster@ucs.cam.ac.uk; last
updated 9 May 1997