Caching Matters

"Okay, okay, look. They're just standing there and talking, okay? That's all they're doing. That's all they ever do; is just stand there and talk. That's what they were doing last week, that's what they were doing when you asked me 5 minutes ago. So 5 minutes from now when you ask me 'What are they doing?' my answer's going to be, 'they're still just talking and they're still just standing there!'". - Red vs. Blue, Episode 1 

Caching of various sorts is used in all sorts of computing tasks, whenever obtaining or calculating a particular object costs more than retrieving it from a cache.

Caching is an inherent part of REST and an inherent part of the web.

Caching is not an Optimisation

Caching is often thought of as an optimisation, as in this scenario:

"Our system works great, but it's too slow."

"Let's put a reverse proxy cache in front of it."

"Hurray, now it's like 10 times faster!"

When one is looking at an existing system with performance and/or scalability issues, caching is indeed one of the first things to consider and, if applicable, likely to give you an immense bang for buck.

However, caching is not merely an optimisation, worth spending developer time on only when an existing system isn't performant or scalable enough.

Firstly, it might be too late. Web caching systems have to deal with a variety of situations (see below) and as such need to apply some intelligence to what they know about the resources and representations in question. If you can't tell them what they need to know, they can't cache well.

Secondly, optimisations are about the small efficiencies. Tony Hoare's famous dictum, popularised by Donald Knuth, that "premature optimisation is the root of all evil" is entirely correct and frequently misapplied. In full it is "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." Caching isn't about the small efficiencies, it's about big efficiencies that increase performance and scalability (especially the latter) by orders of magnitude and can give you enough power that you can do things that aren't practical otherwise.

Beyond that, caching on the web comes from being explicit in descriptions of resources and their representations. While certain headers and status codes are particularly important to caching, all of them are saying something about a resource, representation, request or response. Apart from the Cache-Control header, all of them have a use beyond caching. Entity-tags are not "identifiers of entities, used in caching"; they are identifiers of entities, and caching is just one of their uses. Partial downloads are another use with explicit support in HTTP, and beyond that entity-tags give us a basis to build any feature that requires the ability to know if we are dealing with the same or different entities in a response from a particular URI.

Why cache?

There are four main reasons that the web uses caching:

  1. Latency is not zero. What's more, latency will never be zero unless we invent completely new communication technologies. We currently have no communication method that operates at faster than light speed. At light speed it takes 30ms to get from one side of the Atlantic to the other (not the furthest your web data may have to travel). Those 30ms slices add up, and that's assuming you've got a direct and perfect connection to your users. You don't, so chances are the best latency you can hope for is much longer than 30ms.
  2. Bandwidth is not cheap. Or rather it is, but we keep spending it. High bandwidth applications will be pushing capacity limits for at least a large number of users for a long time to come.
  3. No computation is always more performant and scalable than any computation. You can tweak the heaviest part of your system forever and you're never going to make it as performant as if you can just stop people from actually using it, while still giving them results.
  4. The network is not reliable. In some cases cached results can work even if a client can't connect to the server.

Those familiar with the Fallacies of Distributed Computing will recognise three of the above as counter to those fallacies (and as such, if the fallacies really are fallacies, their opposites are wisdoms). REST's inherent support for caching is an example of REST embracing the network rather than trying to hide it. Hiding the network at an inappropriate layer can make one more likely to succumb to distributed computing's fallacies.

Caching Formerly Considered Harmful

Some developers still look askance at caching as something potentially harmful to their applications.

To some extent this is for historical reasons; the history of the way caching was added to the web had some rather negative aspects.

This no longer applies as it did, but a cache treating stale data as fresh is still harmful. However, the web uses caches whether you like it or not, and the same techniques and knowledge needed to make your site cache-friendly, when that's appropriate, are needed to keep your content out of caches when it isn't.

Pressures on Caching

Let's consider the task of a cache and the pressures upon it, and as such the pressures upon how caching works in REST and HTTP.

  1. Contacting a server uses resources and should be avoided if at all possible.
  2. Retrieving a full representation from a server uses resources and should be avoided if at all possible.
  3. Using stale data as if it's fresh leads to inaccuracy and should be avoided if at all possible.
  4. Resources can have more than one representation, of which the cache may not have seen all. It is very bad to return an inappropriate representation.

These requirements are in conflict with each other – the fewer times a server is checked for freshness (to deal with requirement 1) the greater the risk of returning stale data (failing requirement 3). Further, only a server has knowledge of all possible representations for any given resource.

REST therefore requires that control metadata is used for clients and servers to balance their requirements regarding caching. HTTP does this through some of its headers.

Types of cache

Client cache

Clients can maintain their own cached copies of different representations. They may well apply their own requirements in addition to the requirements of a given resource. For example, a client may know that it can safely deal with a stale representation even if that couldn't be said in general of the resource in question; a browser set to offline mode can assume its user is aware of the staleness. Say, as a completely hypothetical example, someone writing about web caching wants to check the HTTP headers he mentions against the standard: this is obviously not a time for a browser to refuse to show the user data until it is sure the data is fresh.

Proxy Caches

A proxy cache sits somewhere between the client and the server. A common place on the network is near the gateway between a LAN and the Internet, allowing cached copies to be served amongst all users of the LAN.

Reverse Proxy Caches

A reverse proxy cache (AKA, "Gateway Cache", "Surrogate Cache" or "Web Accelerator") is a proxy cache deployed by a web administrator. They are generally configured so that they appear to the rest of the web to be the actual webserver, while the webserver is then contacted by the reverse proxy when it needs a fresh representation.

They may work with some sort of load balancing so that they actually contact one of several webservers. This works particularly well with a "shared nothing" approach on the part of the webserver, but some techniques contrary to REST increase the complexity of this task (in particular session state).

Reverse Proxy Caches may be able to benefit from some locally-configured knowledge of a server setup beyond that provided by HTTP, but that is largely out of scope here.

Public vs Private Caches

The above types of caches are divided into two types, public and private caches.

A public, or "shared" cache is used by more than one client. As such it gives a greater performance gain and a much greater scalability gain, as a user may receive cached copies of representations without ever having obtained a copy directly from the server. Reverse Proxies and the vast majority of proxies are public caches.

A private cache is only used by one client. Generally this applies only to a cache maintained by that client itself, though if you had a proxy that was only being used by one client (say you had a small LAN for your own use only and put a web proxy near the gateway) it would be possible to configure it to act as a private cache. Private caches do not offer quite as much scalability as public caches, but they do have certain important advantages over public caches:

  1. They offer immense benefits to the user of that one client.
  2. Almost every client on the visible web (that is, browsers, RSS readers, and other interactive user agents) and a very large number of those on the machine-only parts of the web (that is, spiders and clients of RESTful webservices) use a private cache of some sort. As such you can rely upon the presence of at least one cache being involved in almost everything you do on the web.
  3. With some representations it may not be appropriate for public caches to cache them or to return those representations without re-validating them but perfectly acceptable for private caches to do so (e.g. anything which can only be seen by authorised users or which changes its representation depending upon who is looking at it). It can even be appropriate for encrypted documents to be so cached (though this requires one to be very sure as to the security of the cache itself, and so is generally not done).

Because of the important differences between public and private caches, some of the methods we have for controlling caching treat the two differently.

Other types of Caching on the Web

The following are not part of the same caching mechanisms described in this section. They are mentioned here for completeness, and to underline that they differ from those described here.

History List

The history list is a record of the results of previously performed actions, allowing a user to step forwards and backward throughout operations. In a browser this is used in Back/Forward and History →Go operations. In a webservice client there could be an analogous store of objects created based on previous requests still available to the application.

The history list is not meant to follow the general HTTP caching mechanisms, but rather to act as a direct reflection of what the application has recently encountered. However, if a client no longer has a version of a representation in its history list it has no choice but to replay the request (whether against its own cache or the webserver) to recreate that state. Because clients should not replay unsafe requests against the web, but may have no choice but to do so, many browsers will prompt a user before replaying unsafe requests such as POST.

Concurrent Download Optimisation

If a webpage includes the same image or other related resource many times then most browsers will only dereference the URI and download it once. This is more a matter of efficient management of downloading than of caching, and is mentioned here just to point out that it is not affected by cache control: if, for example, you send a random image each time, you will still only get one such random image on each page.

Caching in implementations

Both client and server applications may, for various reasons, keep cached copies of various objects used in their implementation. For instance, if a server application had to load an entire graph of a few thousand objects from a database for almost every request, it might well make sense to load the entire graph once into a cache of objects used throughout the application. This sort of caching does not necessarily involve the same sort of objects as web caching and may have different capabilities (if our application is the only code that will alter that graph of objects it can then use a write-through caching mechanism; this is a very efficient and reliable caching mechanism when applicable, but web caching is not a case where it can be used). As such this section does not apply to such caching, though some knowledge obtained here may be applicable in such cases also.

Controlling Caching

There are two parties involved in a client-server interaction, and as such two parties that may have particular requirements with regard to caching policies.

For the most part the server is considered the authority on the caching policy, as it is considered to be the party in a best position to describe how well a particular representation is going to reflect the state of a particular resource in the future.

However the client remains in a position to insist upon a certain degree of freshness (including insisting upon a completely transparent response with no caching allowed) or alternatively that it will accept a staler response than the server indicated would be appropriate.

These exceptions aside, the server is the king of content caching. Intermediary caches will generally follow the server's lead unless over-ridden by a client, and have little say in the matter themselves (there are rare exceptions for special cases).


The most efficient form of caching is one whereby the server is not queried at all. In order to do that a cache holds onto a particular representation until it has reached a particular age or a particular expiry date has passed (effectively two ways of looking at the same thing).

A cache calculates the allowed age in the following manner:

  1. If the client set limits upon the age (whether loosening or tightening the requirements set by the server) then these are used.
  2. Otherwise, if any indication was set by the server, these are honoured.
  3. Otherwise it guesses.

The mechanism for guessing is often quite good but obviously fallible. We won't explore it too much here because we're going to look at how we remove the guesswork from caching. A cache MAY make a heuristic estimate of a response's maximum age for the status codes 200, 203, 206, 300, 301 and 410. Therefore as well as it being important to use headers to ensure responses are cached as long as safely possible, it's also important to use headers to ensure responses are not cached any longer than is safe should it have one of those statuses.

One thing that's worth noting is the effect that the presence of a query string in an HTTP URI has upon caching. You may have heard that a URI with a query part following a '?' isn't cached. This isn't quite true.

Now, in REST – and hence in HTTP – an identifier is opaque, and a cache should assume nothing about the resource or its representation on the basis of there being a ? present. However, historical use of GET and HEAD in ways counter to REST (specifically, ways which had side-effects) means that when HTTP/1.1 was developed there was legacy code that had to be accounted for. Therefore there was a pragmatic break from REST in ruling that caches should treat responses to such URIs as uncacheable *if there is no explicit caching information*. Again, we're going to be looking at using explicit caching information, so this point won't apply to us here.


If a cache has a copy of a resource that it considers stale, it could be inaccurate in that assessment. In such a case it is wasteful to download the entire representation when it is a copy of what it already has.

Rather than do such an entire download a cache can request that a representation be sent only if it has changed. Such changes can be identified as follows:

  1. A last-modification date. If the cache knows its copy was last modified on the 21st January 2007 at 14:34:27 then it can ask to be sent a new copy only if there has been a change since that date.
  2. An Entity-Tag. This is an identifier for an entity within the namespace of the resource. In other words there will never be, and never has been, a different entity of that resource with the same entity tag. Hence if an entity with that tag is still the current one, it has not been changed.

In either case, if the cache is safe in using its own copy then the server will send a 304 Not Modified response (in the 3xx redirect space because it's essentially being redirected to itself). The headers for this response can include updated expiry and caching information. Therefore bytes rather than kilobytes or megabytes can be sent.
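A minimal sketch of how an origin server might choose between a full response and a 304, assuming the request headers arrive as a plain dictionary. The names `conditional_get_status` and `strip_weak_prefix` are hypothetical, and splitting the If-None-Match value on commas is a simplification that ignores commas inside quoted tags.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def strip_weak_prefix(tag: str) -> str:
    """Drop a leading W/ so that tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get_status(request_headers: dict, current_etag: str,
                           last_modified: datetime) -> int:
    """Return 304 if the requester's copy is still good, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When an entity tag is supplied it is used in preference to the date.
        offered = {strip_weak_prefix(t) for t in if_none_match.split(",")}
        if "*" in offered or strip_weak_prefix(current_etag) in offered:
            return 304
        return 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        return 304 if last_modified <= since else 200

    return 200  # unconditional request: send the full representation


modified = datetime(2007, 1, 21, 14, 34, 27, tzinfo=timezone.utc)
print(conditional_get_status({"If-None-Match": '"abcd"'}, '"abcd"', modified))
print(conditional_get_status(
    {"If-Modified-Since": "Sun, 21 Jan 2007 14:34:27 GMT"}, '"abcd"', modified))
```

Both calls describe a cache whose copy is still current, so in each case the server can answer with headers only.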

Age Calculation

A cache calculates the age of the response when it receives it (the algorithm is rather pessimistic; for instance assuming worse latency on response than request, so it will tend to over-estimate staleness rather than freshness).

The cache will add an Age header, replacing any that it received from a cache further upstream. An Age header therefore means that you are receiving something from a cache and it was that old when the cache got it.

At any point after then the current age of an entity is:

(Now – Response time) + Initial Age.
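The formula above can be sketched directly, assuming the cache records the time at which it received the response and the initial age it calculated at that point. The function and variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def current_age(initial_age: timedelta, response_time: datetime,
                now: datetime) -> timedelta:
    """(Now - Response time) + Initial Age."""
    resident_time = now - response_time
    return resident_time + initial_age


received = datetime(2007, 1, 21, 14, 34, 27, tzinfo=timezone.utc)
five_minutes_later = received + timedelta(minutes=5)
age = current_age(timedelta(seconds=12), received, five_minutes_later)
print(int(age.total_seconds()))  # value a cache would send in its Age header
```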

Expiration Calculation

If there is a max-age cache directive, then that is used as the maximum age for the entity.

Otherwise, and only if there is no max-age, if there is an Expires header, that sets a time limit on the age. The Expires header is compared with the Date header to produce a maximum age, so even if the server's clock is out of synch with the cache's this will work well.

If a client requests the entity, and there is no reason to avoid the cache (see below) then the cached copy will be used as long as Maximum-Age > Age.
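As a rough sketch of the rules above, assuming the stored response headers are available as a dictionary: max-age wins when present, otherwise Expires is compared with Date. The helper names are hypothetical, and client-supplied limits and heuristic guessing are deliberately left out.

```python
import re
from datetime import timedelta
from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_lifetime(headers: dict) -> Optional[timedelta]:
    """Maximum age allowed by the server's explicit caching information."""
    match = re.search(r"\bmax-age=(\d+)", headers.get("Cache-Control", ""),
                      re.IGNORECASE)
    if match:
        return timedelta(seconds=int(match.group(1)))
    if "Expires" in headers and "Date" in headers:
        # Both values come from the server's clock, so skew between the
        # server's clock and the cache's does not matter.
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return expires - date
    return None  # nothing explicit: a real cache would fall back to guessing


def is_fresh(headers: dict, age: timedelta) -> bool:
    """The cached copy is usable while Maximum-Age > Age."""
    lifetime = freshness_lifetime(headers)
    return lifetime is not None and lifetime > age


stored = {"Date": "Thu, 01 Dec 1994 15:00:00 GMT",
          "Expires": "Thu, 01 Dec 1994 16:00:00 GMT"}
print(freshness_lifetime(stored))
print(is_fresh(stored, timedelta(minutes=30)))
```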

Validation Mechanism

Validation makes a request conditional. A verb can be obeyed normally only if an item has changed (e.g. to make GET conditional on there having been a change), or only if an item hasn't changed (e.g. so that a PUT doesn't over-write changes).

Last modification dates can be compared simply. Either the server date is later than the date quoted, or not.

Entity-Tags can be slightly more cumbersome.

E-Tags come in two forms, weak and strong. A strong E-Tag header looks like:

ETag: "abcd"

A weak E-Tag header looks like:

ETag: W/"abcd"

The difference relates to the sort of changes that require them to change.

An entity can change in a way that is not semantically significant, e.g. <someXml><aTag></aTag></someXml> could be replaced with <someXml><aTag/></someXml> and any compliant XML parser would treat them the same.

Similarly there can be many changes made to different entity types that are not that significant and where it would be perfectly okay for a cached version to continue being used. A weak entity tag may stay the same across such changes, whereas a strong entity tag must change whenever the entity changes at all.

However, strong ETags are also used for partial entities. The most common form of these is if you resume a download of a large file that was stopped part-way through. Now, consider a client that had downloaded the above XML as far as <someXml><aTag></aT

and then stopped. It knows it has the first 19 bytes already so it can ask for the rest rather than the whole entity (in this case it takes more than 19 bytes to ask for that, but in the case of a file of several megabytes in size this is a significant gain). However, we've changed to the shorter form, so the portion from the 20th byte on looks like:

omeXml>

and combining them produces:

<someXml><aTag></aTomeXml>

This is not well-formed XML and so the process fails.
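The failure described above can be reproduced in a few lines. This is only an illustration: it slices byte strings in memory rather than issuing real range requests, and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

old_entity = b"<someXml><aTag></aTag></someXml>"
new_entity = b"<someXml><aTag/></someXml>"

# The client already holds the first 19 bytes of the old entity...
already_downloaded = old_entity[:19]
# ...and asks for everything after them, but the entity has since changed.
remainder = new_entity[19:]
combined = already_downloaded + remainder


def is_well_formed(data: bytes) -> bool:
    """True if the bytes parse as XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(combined.decode("ascii"))
print(is_well_formed(combined))
```

Each entity is well-formed on its own; only the stitched-together result is broken, which is why a resumed download needs a validator that changes whenever the bytes do.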

E-Tags are also used to differentiate between different representations: the French version is different from the German, the GIF version different from the PNG, the gzip-compressed from the plain, and so forth.

This generally takes more arduous work on the part of the server than last-modification dates, but it allows a greater guarantee to be made, so is superior to last-modification dates when applicable.
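One way to picture the two kinds of comparison, sketched under the assumption that tags are handled as the raw header strings shown earlier. The function names are illustrative: a strong match requires both tags to be strong and identical, while a weak match ignores the W/ prefix.

```python
from typing import Tuple


def parse_etag(value: str) -> Tuple[bool, str]:
    """Split an entity tag into (is_weak, quoted_tag)."""
    value = value.strip()
    if value.startswith("W/"):
        return True, value[2:]
    return False, value


def strong_match(a: str, b: str) -> bool:
    """Suitable for partial downloads: neither tag may be weak."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a: str, b: str) -> bool:
    """Suitable for deciding whether a cached copy is good enough to reuse."""
    return parse_etag(a)[1] == parse_etag(b)[1]


print(strong_match('"abcd"', '"abcd"'))
print(strong_match('W/"abcd"', '"abcd"'))
print(weak_match('W/"abcd"', '"abcd"'))
```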

Invalidation Mechanism

Many events could conceivably lead to a cached record becoming invalid during the period that it is considered fresh for. Some of these could be due to operations on the web itself, and some of those could be routed through the cache in question.

Caches cannot know about the interactions between different resources (a PUT, POST or DELETE to one resource could have a knock-on effect to innumerable related resources) and they cannot know what exactly has been done to a particular resource (a PUT or DELETE may be handled in a relatively simple manner, but a server may apply all manner of complex transformations and calculations to a resource so a cache cannot assume it will GET back what it has put or even that a DELETE will result in a 404 or 410). A cache can though reasonably assume that if a resource has been the object of a PUT, POST or DELETE request that it has probably been changed as a result of it. Therefore any such request should result in the cache considering all cached representations of that resource to be invalidated, and at the very least requiring revalidation before being used again.
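A toy illustration of that rule, assuming a cache that keeps representations in a dictionary keyed by URI. The class `SimpleCache` is hypothetical, and simply dropping the entries is the bluntest option; a real cache could instead keep them and mark them as needing revalidation.

```python
class SimpleCache:
    """Keeps representations per URI and forgets them after unsafe requests."""

    UNSAFE_METHODS = {"PUT", "POST", "DELETE"}

    def __init__(self):
        self._entries = {}  # uri -> list of stored representations

    def store(self, uri, representation):
        self._entries.setdefault(uri, []).append(representation)

    def lookup(self, uri):
        return list(self._entries.get(uri, []))

    def observe_request(self, method, uri):
        """Call for every request passing through the cache."""
        if method.upper() in self.UNSAFE_METHODS:
            # The resource has probably changed: invalidate all that we hold.
            self._entries.pop(uri, None)


cache = SimpleCache()
cache.store("/thread/42", "<html>English</html>")
cache.store("/thread/42", "<html>Deutsch</html>")
cache.store("/thread/43", "<html>other thread</html>")
cache.observe_request("GET", "/thread/42")
print(len(cache.lookup("/thread/42")))
cache.observe_request("POST", "/thread/42")
print(len(cache.lookup("/thread/42")))
```

Note that the entry for /thread/43 survives the POST even if the server changed it as a side effect, which is exactly the limitation described above.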

Response Headers

These are the headers that a server may include in a response to a client that relate to caching. They include both those that may be set by an origin server (that is to say, the webserver itself) and those that may be set by a proxy as the message passes through it.


Age

The sender's estimate of how much time has passed since the response was generated or revalidated by the origin server. This allows proxies to pass on information so that clients and other proxies down-stream have an accurate view of how old the response was when they receive it.


Cache-Control

Can contain one or more of the following (extensions are also possible):

public: Can be cached in any cache, even if it normally would be considered uncacheable or only cacheable in a private cache.
private: Intended for one user; can be stored in a private cache, but not in a public cache, even if it would normally be considered more widely cacheable.
private="fieldnames": The field names given cannot be stored in a shared cache.
no-cache: The cache can store the response, but may not use it without revalidating, even if it would normally be prepared to use stale responses or consider the cached response fresh.
no-cache="fieldnames": The field names given cannot be returned from the cache without revalidating.
no-store: The cache must not store the entity at all, and must not store the request it is responding to (intended to help prevent accidental leaks of sensitive information – though such guarantees are best produced through SSL).
no-transform: Most proxies behave transparently – that is, they strive to make it seem as if the client contacted the server, except for speed. Some, for various reasons, will perform transforming operations upon entities. This directive forbids that.
must-revalidate: Forbids stale entities being used, even if the server cannot be contacted.
proxy-revalidate: Like must-revalidate, but only applying to proxy caches. In particular this is used with resources which require authentication but which the server is allowing the proxy to store – the proxy will still have to pass back authentication headers before being allowed to use its cached copy.
max-age=num: Age in seconds that the entity may reach before it is considered stale.
s-maxage=num: Over-rides max-age for proxy caches, but not private caches. Implies proxy-revalidate also.
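To make the shape of the header concrete, here is a small sketch that splits a Cache-Control value into its directives. The function name is hypothetical, and splitting on commas is a simplification: it would mis-handle a quoted field-name list that itself contains commas.

```python
def parse_cache_control(value: str) -> dict:
    """Map each directive name to its argument, or to None if it has none."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, separator, argument = part.partition("=")
        directives[name.strip().lower()] = (
            argument.strip().strip('"') if separator else None
        )
    return directives


parsed = parse_cache_control("public, max-age=3600, s-maxage=600")
print(parsed)
print("no-store" in parse_cache_control("no-store, no-transform"))
```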


Expires

Sets an expiry date, e.g.:

Expires: Thu, 01 Dec 1994 16:00:00 GMT


Last-Modified

Sets the last modified date, e.g.:

Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT


ETag

Sets an entity tag, described above, on a response. E.g.:

ETag: "abcde" – labels the entity with the strong entity-tag "abcde".

ETag: W/"abcde" – labels the entity with the weak entity-tag "abcde".


Pragma

The response header Pragma: no-cache is often seen, but is not defined as a response header (see below about it as a request header) and is therefore not reliable. It's not clear either whether it is intended to have the same meaning as Cache-Control: no-cache or Cache-Control: no-store. It's probably harmless to send it as well as the relevant Cache-Control header, but probably pointless also, though it may affect some older caches.


Vary

A resource may have more than one representation. Obviously it is harmful for a cache to return the incorrect representation in response to a given request. Entity tags solve one part of this problem, in identifying specific entities, but not all of it. It is also necessary for a cache to understand which headers may result in a different representation being sent.

For instance, consider a resource that has four versions: an HTML and a plain text version, each available in English and German. The server will therefore be paying attention to both the Accept header (to decide between text/html and text/plain) and the Accept-Language header (to decide between "en" or a related tag and "de" or a related tag). The response would therefore come with the header:

Vary: Accept, Accept-Language

Any cache would then recognise that as well as matching on the URI it must also match on the contents of the Accept and Accept-Language headers before sending a response. If a second request does not come with the same values in these headers it must re-validate with the server (though since the same version may still be returned it can use If-None-Match to avoid a full response).

Caches that are not capable of storing such information along with responses are of course forced to not cache such responses, since they are not capable of correctly checking they are sending the correct one (at the time of writing the browser cache in Internet Explorer is an example of this).

The header Vary: * indicates that some factor other than request headers is used for determining which representation to send.
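One way a cache might honour this, sketched with a hypothetical helper: the lookup key is the URI plus the value of every request header that Vary names. Matching the header values as exact strings is the cautious choice assumed here; a cache could normalise them more cleverly.

```python
from typing import Optional, Tuple


def vary_cache_key(uri: str, vary: str, request_headers: dict) -> Optional[Tuple]:
    """Build a lookup key from the URI and the request headers named in Vary.

    Returns None for `Vary: *`, which cannot be matched on request headers.
    """
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    if "*" in names:
        return None
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    return (uri,) + tuple((name, lowered.get(name, "")) for name in names)


vary = "Accept, Accept-Language"
english = vary_cache_key("/doc", vary, {"Accept": "text/html", "Accept-Language": "en"})
german = vary_cache_key("/doc", vary, {"Accept": "text/html", "Accept-Language": "de"})
print(english == german)  # different keys, so the German request is not served English
print(vary_cache_key("/doc", "*", {}))
```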


Via

This identifies any proxy (whether caching or not) that a message passed through and where it in turn received the response from. It doesn't have a strong effect upon caching in itself, but expect to see (and make use of) it if you are debugging or exploring caching issues.


Warning

Warning headers inform a client about some aspects of the transmission of a response that may indicate a lack of transparency (e.g. if a stale cached response was used because the server couldn't be reached).

Each warning header can contain more than one warning, and more than one warning header can be present in a response. If a proxy is adding a warning to a response that already has one or more warnings it will add its warning in a separate header beneath the others.

The following warnings are defined:

110 Response is stale: Included if a proxy returns a response it would normally consider stale.
111 Revalidation failed: Included if a proxy tried to revalidate a stale response but was unable to do so.
112 Disconnected operation: Included if a proxy is deliberately not connected to the network.
113 Heuristic expiration: Included if a proxy decided on an expiry date heuristically rather than by explicit headers, and the response is more than 24 hours old.
199 Miscellaneous warning: Included for various other warnings. The text in the warning may give further clues.
214 Transformation applied: Included if a proxy has performed any sort of transformation to the response (can be left out if a proxy upstream has applied a transformation).
299 Miscellaneous persistent warning: Like 199, but is not deleted from the cached entry when that entry is revalidated.

Request Headers

These are headers that a client may include in making a request to a server or that a proxy may add in passing on a request.

Accept, Accept-Charset, Accept-Encoding, Accept-Language

Each of these headers indicates the content-types, character sets, encodings or languages that a client will accept, and the relative preferences amongst them. Though not normally considered caching-related, they affect caching in that they are the headers most likely to be used in selecting between different representations of a given resource (see the notes on E-Tags and Vary above). Other headers can be used to affect choice of representation, and be mentioned in a Vary header, and indeed other factors entirely may have an effect (requiring Vary: *), but these are probably the four most commonly used in this case. See the section on content-negotiation for more.


Cache-Control

In a request, can contain one or more of the following:

no-cache: The cache can store the response, but may not use it without revalidating, even if it would normally be prepared to use stale responses or consider the cached response fresh.
no-store: The cache must not store the entity at all, and must not store the request it is responding to (intended to help prevent accidental leaks of sensitive information – though such guarantees are best produced through SSL).
max-age=num: The client is willing to accept a response whose age is no greater than num seconds. Stale responses are not allowed unless max-stale is also given.
max-stale: The client is willing to accept stale responses.
max-stale=num: The client is willing to accept stale responses, but only those that are no more than num seconds past the point at which they would have been considered stale.
min-fresh=num: The client is willing to accept a response that will still be considered fresh after num seconds.
no-transform: Proxies must not apply any transformation to the response.
only-if-cached: The client only wants the response if it can be satisfied from the cache (or a group of caches with internal linkage). If this cannot be done then a 504 Gateway Timeout response should be returned.
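The age-related request directives can be combined into a single check. The following is a sketch under stated assumptions: all values are in seconds, `lifetime` is the maximum age the server allowed, a bare max-stale is modelled as `max_stale=True`, and the function name is made up for the example.

```python
from typing import Optional, Union


def cache_may_answer(age: int, lifetime: int,
                     max_age: Optional[int] = None,
                     max_stale: Union[None, bool, int] = None,
                     min_fresh: Optional[int] = None) -> bool:
    """Decide whether a cached response satisfies the client's directives."""
    if max_age is not None and age > max_age:
        return False  # older than the client will accept
    if min_fresh is not None and lifetime - age < min_fresh:
        return False  # would not stay fresh for long enough
    staleness = age - lifetime
    if staleness < 0:
        return True  # still fresh
    if max_stale is None:
        return False  # stale, and the client did not allow that
    if max_stale is True:
        return True  # bare max-stale: any staleness is accepted
    return staleness <= max_stale


print(cache_may_answer(age=100, lifetime=300))
print(cache_may_answer(age=100, lifetime=300, max_age=60))
print(cache_may_answer(age=400, lifetime=300, max_stale=100))
```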


If-Match

The server must only perform an operation if a current representation has an entity tag given in this header, or in the case of If-Match: *, if there is at least one current representation. This is used to make sure that a resource hasn't been changed since the representation in a client's cache was obtained, to avoid a PUT or DELETE from one client over-writing a PUT or DELETE from another.
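A sketch of the check a server might run before carrying out a PUT or DELETE, assuming it knows the entity tag of the current representation (or `None` if there is none). The function name is hypothetical; a failed precondition is reported with HTTP's 412 Precondition Failed status.

```python
from typing import Optional


def check_if_match(if_match: Optional[str],
                   current_etag: Optional[str]) -> Optional[int]:
    """Return 412 if the precondition fails, or None if the request may proceed."""
    if if_match is None:
        return None  # no precondition was supplied
    if if_match.strip() == "*":
        return None if current_etag is not None else 412
    # Simplification: split on commas, ignoring commas inside quoted tags.
    candidates = [tag.strip() for tag in if_match.split(",")]
    # Only strong tags are allowed to match here.
    if (current_etag is not None
            and not current_etag.startswith("W/")
            and current_etag in candidates):
        return None
    return 412


# The client fetched version "v1", but another client has since stored "v2".
print(check_if_match('"v1"', '"v2"'))
# Nothing has changed since the client fetched "v1".
print(check_if_match('"v1"', '"v1"'))
```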


If-Modified-Since

A full response should only be returned if the representation has changed since the time given. Note that this could mean either a semantically-significant change (i.e. the resource has changed) or not (i.e. some implementation detail concerning the representation has changed).

If there has been no such change then the response should be a 304 Not Modified rather than a full response. Date, ETag and Content-Location headers must be set if they would have been set in the case of a 200 OK response with the full body. Expires, Cache-Control and Vary headers must be set if they might differ from those sent in any previous response.


If-None-Match

A full response is only returned if there is no entity matching an entity tag given in the header, otherwise a 304 Not Modified is returned. This is also used by a PUT or DELETE request made in the belief that there is no current representation there (in which case If-None-Match: * may be used to match any existing representation).


If-Unmodified-Since

Like If-Match this is used to ensure that a PUT or DELETE doesn't over-write another PUT or DELETE, but using the Last-Modified date.


Pragma

A request of Pragma: no-cache is used to ensure that a request is validated with a server. It is equivalent to Cache-Control: no-cache, but used for backwards-compatibility with HTTP/1.0 caches. If a request may be treated by a proxy not known to be HTTP/1.1 compliant then both headers should be sent.

Recipes and Techniques

Some techniques for dealing with common cases.

Calculating the last-modification date of a response.

The last modification date of a dynamically generated response can be difficult to calculate.

Let us consider the following page.

It is produced by a script. It contains a thread of a discussion board which contains multiple posts. Posts can be added, deleted or edited. The page uses images, CSS and Javascript.

The images, CSS and Javascript are not a problem. They are separate resources that the page links to, and have their own caching information.

The last-modification date of the page is therefore the most recent of:

  1. The last time the script was changed.
  2. The last time any include the script uses was changed.
  3. The last time the thread was changed (e.g. renamed)
  4. The last time any post was changed (including creation)
  5. The last time a post was deleted.

Assuming the thread and posts are built from information in a database, it is relatively easy to store a timestamp/datetime field initialised to getutcdate() (or whatever function returns the current time in UTC/GMT in the database in question) and to update it upon UPDATE, either through a trigger or by insisting upon a single update procedure which guarantees such an update. (Note that all HTTP times are in UTC/GMT. This can be particularly irksome for developers in Ireland, the UK, Portugal or another country that uses UTC in the winter, as they may not find out that they have accidentally used local time until daylight saving time kicks in in the summer.)

Deletions are more difficult, since they also have to be noted. One method is to cause the thread's database record to be updated upon deletion, another is to "soft-delete" posts – setting a "deleted" bit on rather than actually deleting the post – so that the date of deletion is available that way.

Combining script change time with the time obtained from the database can also be tricky, as these times are coming from very different sources. However it generally isn't difficult for a script to find its own last-modification date and then find the most recent of that and the last-modification date of the database objects. Finding the last modification dates of includes, libraries or the compiled code used by some frameworks (such as ASP.NET) can be trickier. One imperfect solution is to record an application last-modification or last-restart time. This will lead to many unnecessary requests, but not to stale data being used when it shouldn't.
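
As a minimal sketch of combining the two sources, assuming the database timestamps have already been read as timezone-aware UTC datetimes (the function names and parameters here are illustrative, not part of any particular framework):

```python
import os
from datetime import datetime, timezone
from email.utils import format_datetime


def file_mtime_utc(path):
    """Last-modification time of a script or include file, as an aware UTC datetime."""
    return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)


def page_last_modified(script_paths, db_timestamps):
    """Most recent of the script/include mtimes and the database timestamps.

    db_timestamps is assumed to hold aware UTC datetimes for the thread,
    its posts and any recorded deletions.
    """
    candidates = [file_mtime_utc(p) for p in script_paths]
    candidates.extend(db_timestamps)
    # HTTP dates only have one-second resolution, so drop microseconds.
    return max(candidates).replace(microsecond=0)


def http_date(dt):
    """Format an aware UTC datetime as an HTTP date, always in GMT."""
    return format_datetime(dt, usegmt=True)
```

Working in aware UTC datetimes throughout is one way to avoid the local-time mistake described above.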

Calculating an E-Tag for a Response.

Let's consider calculating an entity tag for the above page.

We must change it any time the code producing the page changes (possibly not for a weak e-tag).

We must change it any time a feature of the thread changes.

We must change it any time a post changes, including being added or deleted.

We must also send different entity tags for different representations of the thread.

A simple method is as follows.

  1. Define a string for each of the different representations that may be sent. E.g. if we are sending both HTML and RSS versions we might use "h" and "r" respectively. If there is more than one factor we are varying upon (say we have both different content types and different languages supported) it can be easiest to deal with each factor separately.
  2. Define a string for the script and other code that is used to generate the page. This could be hard-coded into the script and changed whenever the script is changed, though it is safer to base it on dynamically obtaining a version number or last-modified time (you must be able to guarantee that if the last-modified time is, for example, given to the nearest second, that there won't be two or more changes in a second – for scripts you normally can, for other objects this may not be possible).
  3. Store a change count in the database record for the thread. Update it every time it changes or a post is changed, inserted or deleted.
  4. Concatenate these strings together. Say the script was last changed on the 23rd of May 2006 and the thread has a change-count of 43. The e-tag for the HTML version could then be: h2006052343.
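
The steps above can be sketched as follows. The function name and parameters are hypothetical. The sketch assumes the script string has a fixed width (here a date), so that the concatenation stays unambiguous:

```python
def make_etag(representation_key, script_version, thread_change_count, weak=False):
    """Build an entity tag by concatenating the three strings described above.

    representation_key: e.g. "h" for HTML or "r" for RSS.
    script_version: fixed-width string identifying the generating code.
    thread_change_count: change count stored with the thread record.
    """
    opaque = f"{representation_key}{script_version}{thread_change_count}"
    tag = f'"{opaque}"'  # entity tags are sent as quoted strings
    return f"W/{tag}" if weak else tag


html_tag = make_etag("h", "20060523", 43)  # the example from step 4
rss_tag = make_etag("r", "20060523", 43)   # a different representation gets a different tag
```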

If we want to make use of a change count for each post (to reduce the number of writes made to the thread table in the database) this is trickier. If we had two posts in a thread with change counts of 4 and 1, and we delete the second and then add a third, the method of simply concatenating change counts would still be combining 4 and 1, though the thread has changed greatly. We need to in some way represent the effect that deleted posts have had upon the thread. This is possible, but not easy.

Another technique is to use a reliable hash of the entity. If we use the MD5 hashing algorithm we can also use the same value for the Content-MD5 header (used to guard against accidental damage to a message in transit). Hashing an entire message body before transmission is an expensive operation, and not always practical.
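
A minimal sketch of the hashing approach, assuming the whole body is available in memory as bytes (the function name is illustrative). Content-MD5 carries the base64 encoding of the digest, so the same string is reused inside the quoted entity tag:

```python
import base64
import hashlib


def hash_validators(body):
    """Return (etag, content_md5) for a complete response body given as bytes."""
    digest = hashlib.md5(body).digest()
    content_md5 = base64.b64encode(digest).decode("ascii")
    etag = f'"{content_md5}"'  # strong tag: it changes whenever any byte changes
    return etag, content_md5
```

Because the tag depends on every byte of the body, this is only suitable where the body has to be built in full before the headers are sent.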

Handling If-Modified-Since and If-None-Match headers.

Once you have calculated a last-modification date and/or entity-tag, you can easily handle If-Modified-Since and If-None-Match headers.


  1. Calculate the last modified date as described above.
  2. If If-Modified-Since is specified compare the date given in that header with the modification date calculated.
  3. If the last-modification date calculated is less than or equal to the date given in the header (a simple string comparison for equality could have false negatives, but generally won't, and it will have no false positives) then:
    1. For GET and HEAD requests you can return a 304 Not Modified response. You must also include the Date header (automatically added by most server-side technologies), the ETag and Content-Location headers if you would have sent them with a full response, and any Expires, Cache-Control or Vary headers that could conceivably have changed since the version in the cache was obtained (in particular, if you are calculating the Expires header by adding a certain time-span onto the current time, then it will have changed).
    2. While If-Modified-Since is not normally used with POST, PUT or DELETE, it is best to respond to such a method with 412 Precondition Failed if the last-modified date hasn't changed. (At the same time, because this response isn't explicitly mentioned in the RFC like it is for the other conditional requests, it's perhaps best to avoid relying on it when writing client-side code.)
  4. Otherwise continue handling the request as normal.
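
The steps above can be sketched as a small helper. This is an illustration only: the function name is hypothetical, the last-modification date is assumed to be a timezone-aware UTC datetime, and an unparseable header is simply ignored:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def check_if_modified_since(method, header_value, last_modified):
    """Return a status code to short-circuit with, or None to handle the request normally."""
    if not header_value:
        return None
    try:
        since = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return None  # not a valid date: behave as if the header were absent
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are always UTC/GMT
    # HTTP dates have one-second resolution, so compare at that resolution.
    if last_modified.replace(microsecond=0) <= since:
        return 304 if method in ("GET", "HEAD") else 412
    return None
```

The caller still has to add the Date, ETag and other headers listed in step 3 when it sends the 304.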


Handling If-None-Match is analogous to handling If-Modified-Since, except that you are comparing e-tags, rather than last-modification dates.

The main point to note is that If-None-Match requests could include the wildcard * or could include multiple e-tags. It is important that you handle these correctly, though it is pretty easy to turn a list of e-tags into a convenient datatype to cycle through (trivial in many languages). If you have no reason to expect that more than one e-tag will be sent for comparison then a direct comparison between the generated e-tag and the header value can be a suitable optimisation and/or short-cut.

In responding to an If-None-Match request with 304 Not Modified it is important to include the e-tag in the response, since otherwise the cache does not know which of a hypothetically boundless number of representations you are saying should be re-used (even in the case of there only actually being one representation – the cache doesn't know that).
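
A minimal sketch of handling the wildcard and multiple e-tags follows. The names are hypothetical, and the sketch assumes weak comparison (ignoring any W/ prefix), which is the comparison later HTTP specifications such as RFC 7232 define for If-None-Match:

```python
import re

# Matches one entity tag, optionally weak: "abc" or W/"abc"
_ETAG = re.compile(r'(?:W/)?"[^"]*"')


def _opaque(tag):
    """Strip the weakness indicator so that tags can be compared weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_status(method, header_value, current_etag):
    """Return 304 or 412 if the precondition fails, else None.

    current_etag is None when there is no current representation.
    """
    if not header_value or current_etag is None:
        return None
    value = header_value.strip()
    if value == "*":
        matched = True  # the wildcard matches any existing representation
    else:
        matched = any(_opaque(tag) == _opaque(current_etag)
                      for tag in _ETAG.findall(value))
    if not matched:
        return None
    return 304 if method in ("GET", "HEAD") else 412
```

When the result is 304, the response should carry the matching ETag header, for the reason given above.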

Long-Lived Cached Libraries

Applications often make use of libraries of files, particularly banks of related images, javascript files, CSS stylesheets, XSLT transform stylesheets and similar, that are used throughout the application.

In many such cases these files will only be altered as part of a well-planned update of a major component of the application (since a small change in just one file has too many possible side-effects to be made in an ad-hoc manner).

Let's imagine we use four javascript files; util.js contains functions used by the other three. xml.js contains various functions for dealing with XML. dhtml.js contains functions for dynamically altering a webpage, and ajax.js uses all three of these to carry out various AJAX-style operations.

These four, potentially large, files will be used in many different places – sometimes all of them, sometimes just some. As such there is a clear advantage in having all of these files heavily cached, and not retrieved from the server unless absolutely necessary.

Let's first define headers that we will send with a response to make it as cacheable as possible.

The most obvious choice for an Expires header is something like

Expires: Thu, 31 Dec 2099 23:59:59 GMT

However, the HTTP spec says that servers shouldn't send an expiry date more than one year in the future. Still, this is not hard to script, so if we are sending our response on Sunday the 3rd of June, 2007 our expiry header will be something like:

Expires: Tue, 03 Jun 2008 22:29:45 GMT

Our cache-control header will look like:

Cache-Control: public, max-age=31536000 

Our entity tag can be anything at all. If it's possible that some implementation matter could mean that responses would not be byte-for-byte identical each time then it would have to be a weak tag. Most likely, this will not be necessary.

ETag: "immutable" 

Our Last-Modified header should be hard-coded to some value. The obvious best choice is when the file really was last modified, though any fixed date in the past would work.

Altogether we have:

Expires: Tue, 03 Jun 2008 22:29:45 GMT 
Cache-Control: public, max-age=31536000
ETag: "immutable"
Last-Modified: Sun, 03 Jun 2007 14:11:34 GMT

The Expires header above is set dynamically to a year beyond the current date.
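
One way to script these headers is sketched below. The function name is hypothetical, and "a year" is taken to be the 31536000 seconds (365 days) used for max-age, so that Expires and max-age agree; in a leap year this falls a day short of the same calendar date:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_YEAR_SECONDS = 31536000  # 365 days, matching the max-age above


def long_lived_headers(last_modified, now=None):
    """Headers for a heavily cached library file.

    last_modified: the fixed, aware UTC datetime chosen for the file.
    now: the current UTC time; injectable so the result can be tested.
    """
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=ONE_YEAR_SECONDS)
    return {
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}",
        "ETag": '"immutable"',
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
```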

While re-validation is going to be rare, we can still deal with this and send a 304 Not Modified whenever applicable.

A general-purpose script for handling If-Modified-Since and If-None-Match headers can be used here, or we can cheat with a "dumb" version that just assumes the presence of an If-Modified-Since header means we can send a 304. It is probably also safe in this case to handle an If-None-Match header "blindly". Be careful applying this shortcut though. Generally it saves no more than a very small amount of resources, so it is probably not really worth the risk of an inaccurate response being sent in an unforeseen circumstance unless you are *very* sure this can't happen.

Causing these headers to be sent with each of these responses means that you will massively increase the response speed perceived by users and greatly reduce the effort needed by your server to handle them (to practically nil, should you be using a reverse proxy). However, what happens if you do need to change one or all of these files?

First, it will take up to a year for some clients to receive the new versions of the file (there are a few techniques that can be used to force clients to act in a way that forces the caches involved to be cleared, but they are imperfect at best).

Worse, it is possible for some files to be returned from the server and others from a cache, so even if you take great pains to ensure that changes in how the files are used are backwards-compatible, this effort becomes more complicated – not only must uses of the files be backwards-compatible and assume they may receive older versions of the scripts, but each script must also make the same assumption about every other script.

In the case of webservices this can be even worse. The sort of cases where we might apply the same technique for webservices make it even more likely to result in disaster – whole applications could be rendered useless.

This is exactly the sort of hypothetical scenario that sometimes leads to developers deliberately avoiding responses being cached.

There is one case where an updated version of a script would be guaranteed to be used – when it is considered a different resource, i.e. when it has a different URI.

Let's say we have our four files identified as follows:

At some point in the future we've changed our util script twice and our ajax script once. We now have:

By linking to these versioned scripts we can be sure that we are getting the up to date version, and in particular, that any assumptions ajax/1.1 makes about util/1.2 are safe.

This has complicated using the scripts though. If we update a script we then have to update any file that makes use of it. This is potentially a maintenance issue (though various techniques can make this negligible) and, moreover, it reduces the cacheability of those pages, meaning that our attempts to have some parts of our application be highly cacheable have resulted in perhaps the entire application being needlessly served directly from the server.

However, let's define four resources that are the current version of each script. E.g. the current version of the util script is identified by

Now in dealing with a GET or HEAD request for that URI we can return a 302 or 307 redirect to the URI of the current version (1.2 at this point), and change where we redirect to when we are releasing version 1.3 or 2.0 of that script.

Redirects take considerably less overhead than full responses in most cases. We can also cause the redirect to be cached itself, though for shorter periods:

HTTP/1.1 302 Found 
Expires: Mon, 04 Jun 2007 12:38:25 GMT
Cache-Control: public, max-age=3600
ETag: "1.2"
Last-Modified: Sun, 03 Jun 2007 14:11:34 GMT

So, here we're happy for the current version of the script to be cached for up to an hour. When it's revalidated after that hour it will still redirect to the file for the 1.2 version, which is still cached. This mechanism gives us a good balance between reducing network utilisation and server processing on the one hand, and ensuring an up-to-date response on the other.
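
A minimal sketch of such a redirect follows. The URIs in the table are purely hypothetical placeholders chosen for illustration (they are not the identifiers used in this article), as are the function and variable names:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Hypothetical URIs: "current" resource -> (version, URI of the versioned resource).
CURRENT_VERSIONS = {
    "/js/util/current": ("1.2", "/js/util/1.2"),
    "/js/ajax/current": ("1.1", "/js/ajax/1.1"),
}
REDIRECT_MAX_AGE = 3600  # cache the redirect itself for an hour


def redirect_to_current(path, now=None):
    """Return (status, headers) for a "current version" URI, or None if unknown."""
    entry = CURRENT_VERSIONS.get(path)
    if entry is None:
        return None
    version, location = entry
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=REDIRECT_MAX_AGE)
    return 302, {
        "Location": location,
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"public, max-age={REDIRECT_MAX_AGE}",
        "ETag": f'"{version}"',
    }
```

Releasing a new version then only means changing one entry in the table; the versioned resources themselves keep their year-long caching headers.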

It also gives us a method for dealing with cases where a particular use case requires the older version of the script to be used, or when you want to bypass the caching mechanism during development and testing. However, in the vast majority of cases we will simply delete all records of the old script from the server. (Indeed, the server implementation quite likely does not involve the script for version 1.2 being stored differently than that of 1.1 or 1.0. Remember that while URIs often map to particular places in a file-system, this is only a matter of convenience – when it stops being convenient to you, stop doing it.) Note that it does us no harm that there may be copies of version 1.2 on many caches for up to a year after we move to 1.3 or 2.0. Those cached copies will be used for at most another hour, since once the cached redirect expires version 1.2 will never be requested again.

This technique is not applicable universally. It is likely to be useful when:

  1. Changes are very rare.
  2. The sort of versioning concept being used here makes some sort of logical sense. If it seems strange to be talking about "version 1.2" or whatever of a given resource, it's probably a bad idea.
  3. The files are referenced in a large number of cases and hence would be downloaded very often if they were not cached – such as javascript files and images used on many HTML pages, or "hub" resources that are used in many paths taken by a webservice client. Alternatively if a file is not particularly heavily used, but is very large, then it may also be useful (it won't be cached much in most use cases, since it isn't requested much, but this will deal with any surprise surges in interest).

Immutable Objects

The above technique for dealing with rarely-changed resources does so by defining another set of resources – for the state of the rarely changed resource in between each of those rare changes, and then considering each of those states to be static and immutable.

There are other cases where an object may be considered immutable either by its very nature (if you were to model some mathematical fact in a resource or set of resources, they aren't going to change) or a matter of policy (RFCs never change once published. W3C specs change from version to version but any stated version will not change).

These are clearly the most cacheable objects on the web, and sometimes we can even go beyond caching (a validating XML and/or HTML parser that understands "-//W3C//DTD XHTML 1.0 Strict//EN" does not have to use "" to get the DTD, but can have a hard-coded or otherwise obtained copy).

By extension, there can be a lot of information in a web application that you may wish to mark as truly immutable. Why then does RFC 2616 fight against you by saying you SHOULD not mark something as cacheable for more than a year, and that other parties should not consider you cacheable for more than a year if you do claim to be? (When an RFC says "SHOULD" in capital letters, that means you should have a really good reason before you do otherwise, and work out all the possible consequences and prepare for them, not just that they'd prefer that you didn't.)

Well, in practice a lot of things tend not to be as immutable as people planned (for examples, examine the fall of just about every empire in history). Things tend to change, and the more people involved in something, the more likely something deemed immutable at one point will be forced to change at another. In all, it's best not to assume your own plans for immutability are as valid as you believe, and to be even more dubious about the plans of others. Use of the versioning approach above gives you close to the gains you can get from absolute immutability (maybe increasing the expiry on the redirects to as much as a couple of days) with nowhere near the risk of painting yourself into a corner and causing major problems with rolling out upgrades.

A related question is how much you should put into the information published by a webservice, and how much you should specify in that webservice's documentation for clients to "just know". Let's look back at the fact that clients may recognise "-//W3C//DTD XHTML 1.0 Strict//EN" and know its meaning without dereferencing "". This seems to be an analogous situation, and indeed it is – the choice is given to clients to download the DTD from the web, or to use the public identifier as a key into their own store of information about XHTML strict.

The purely RESTful solution is that one always downloads the DTD from the web, or at least checks that it has a cached copy that is considered fresh. The most common applications for handling XHTML are web browsers. Block at your firewall and your browser is still going to be able to process HTML. Indeed, it would not be considered a robust and scalable solution if all browsers (not to mention editors [though some editors do check the DTD during validation checks] and other applications that deal with HTML) in the world had to continually hit one point on the web.

Does this make browsers unRESTful? No. Checking the web is not an appropriate solution for this problem. It is best solved outside of REST.

Why is it not an appropriate solution, especially since we're talking here about how great REST is?

Well, let's think about what browsers need to do with the information in an HTML document. They have to make sure that a document is well-formed and valid, where a DTD can help, but then they have to make best attempts at correcting invalid elements and attributes, where a DTD cannot help.

At this point they still have to perform rendering, some of which could be built on top of a general-purpose CSS + XML rendering engine (though since web browsers and HTML predate CSS and XML by years this is only a recently applicable point) and some of which cannot. And that's assuming that the best method to deal with invalid HTML (which may not have any DTD at all) is to first fix and then render, rather than some more fluid solution. In all, the information in a DTD is of minimal interest to a browser for most operations, and what information it will make use of is a subset of the information it is required to have on the parsing of HTML. Mostly, browsers care about which DTD you are claiming to be valid against a lot more than whether you actually are valid against it (and only the browsers of the last few years care that much). Indeed, if you do try to make a browser carry out an operation where the DTD genuinely is of use, it is very likely to download it (in some browsers this will happen if you load an XHTML page into a validating DOM document).

Therefore this is not really a case of REST being broken at all. In general there will always be a certain amount of information about what is going on in any RESTful webservice that cannot be expressed well through the hypermedia of that webservice itself, but anything that can be expressed in hypermedia should be.

Further, one should generally not go too far in insisting that something be considered immutable; to do so is to introduce brittleness into a system. Follow the specification's suggestion that you never consider anything to have a maximum age of over a year, and use the versioning technique above or another approach that gives you the ability to cope if you decide that your immutable resource changes, or at the very least that a representation does.

Uncacheable and Ultra-Short-Lived Pages

There are two cases where it is vital that pages are not cached; when they change so frequently in semantically-significant ways that malfunction will result from cached copies being used, and for security reasons.

While the no-store cache directive is intended to cover security-sensitive reasons for avoiding a cache, there is ultimately no way to prevent someone who has seen a request or response from storing it. no-store is intended only to prevent accidental leakage, not deliberate wrong-doing. Since security considerations very obviously must include considering deliberate wrong-doing, this is not adequate for any use-case that really needs it. If you need to prevent a request or response being seen by unauthorised agents along the route it takes, use SSL or another form of encrypted connection.

To prevent something being cached because freshness is of ultimate importance we can make use of a few different techniques; an expiry date in the past, a constantly changing last-modified date, a max-age of zero and a no-cache and/or no-store directive. Often a belt and braces approach is taken (and the "Pragma" header is sometimes added in for good measure, since some older caches not compliant with HTTP/1.1 may have respected it as a response header):

Date: Sun, 03 Jun 2007 14:11:34 GMT
Last-Modified: Sun, 03 Jun 2007 14:11:34 GMT
Cache-Control: no-cache, no-store, must-revalidate, max-age=0
Pragma: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT

It is important that we can force responses to not be cached in this manner, but do take care not to over-use it.

In particular there are many cases where it is only necessary for a client to obtain an absolutely fresh representation in certain cases. If this is the case in a webservice, and if the client is capable of recognising when this would be the case itself, then we can put the burden upon the client to ensure that a fresh response is requested, rather than never allowing the response to be cached.

Forcing Fresh Responses As A Client

From the client side, to force a fresh copy to be obtained we can set a Cache-Control header to be sent in the request with a no-cache directive. For backwards-compatibility we should combine this with a Pragma header:

Cache-Control: no-cache
Pragma: no-cache

Often though we don't need to insist upon a full end-to-end response. What we want is to ensure that the cached copy is revalidated (max-age is zero) and that we don't get a cached copy even if the server is unreachable (must-revalidate) in which case we can send:

Cache-Control: max-age=0, must-revalidate
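
As a sketch of a client setting these headers, using Python's standard urllib (the function name and the example URL are hypothetical; the request is only built here, not sent):

```python
import urllib.request


def fresh_request(url, end_to_end=False):
    """Build a request that asks caches along the way for a fresh response.

    end_to_end=True insists on a full response from the origin server;
    otherwise the request only asks for any cached copy to be revalidated.
    """
    if end_to_end:
        headers = {"Cache-Control": "no-cache", "Pragma": "no-cache"}
    else:
        headers = {"Cache-Control": "max-age=0, must-revalidate"}
    return urllib.request.Request(url, headers=headers)


# Hypothetical URL, for illustration only.
request = fresh_request("http://example.com/thread/1", end_to_end=True)
```

The request could then be sent with urllib.request.urlopen(request).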

Forcing Fresh Requests from the Client in Hypermedia

The above technique is easy to apply when we are writing a client for a webservice. It is also easy when documenting a webservice to indicate times when it should be used (and to advise against its over-use).

A related requirement is for the server to insist that a fresh response is obtained when a client dereferences a URI obtained from a hypermedia format that the server sends.

This is less easy. While the server is in control of the representations, including hypermedia links, the client is in charge of how it deals with those.

One common technique in browser-targeted code is to set a field of the query portion of a URI to the current timestamp, hence ensuring that a different request is made each time and completely bypassing all caching mechanisms. This is incompatible with cases where we might be happy for the resource to be cached, and it must either depend upon Code-On-Demand (the link is written into the document through the use of javascript) or else require that the hypermedia document with the link is itself rendered uncacheable so that it will always have a different representation. The javascript approach has some uses when we need to by-pass the caching mechanism for one small piece of traffic (most often logging and tracking code), but in general the harm caused by such uniquely-generated links is so great that we should consider this an anti-pattern.

It is possible to obtain entities through AJAX techniques. This means we are essentially writing a webservice client in a Code-On-Demand context. If we are already doing such AJAX work this may be highly appropriate; if not, this is a considerable overhead on server, browser and development time.

In all, if a server has requirements as to how something should be cached or not it should deal with this by sending headers to deal with the most conservative case that will arise for a given resource and techniques to by-pass caching should be avoided unless absolutely necessary.

The javascript technique of adding timestamps to URIs should really only be used with a belt-and-braces approach; set appropriate headers to prevent caching in the response and use the javascript technique only to ensure that even buggy caches cannot resend the response. In practice this is less necessary these days than it has been historically when caching behaviour was less well defined. If you do make use of this approach at least the use of correct cache-control headers from the server will prevent well-behaved caches being filled with useless records they will never make use of again.

Note: Some browsers give the user the ability to force a fresh request, e.g. by holding down Shift or another modifier-key when refreshing the page. Some browsers do not. As such, if you are writing code to be used by a browser, bear in mind both that some users can force fresh responses to be sent and that some can't.

Synching Implementation Caches with Web Caches

As stated above, often we use various forms of caching in the implementation of a web application beyond web caching itself.

Let's say you are writing a webservice application that obtains account information about 5 suppliers from a remote database. The action of obtaining this information is very expensive, but once you have it the information is small and rarely changes - an obvious case for building some sort of caching.

Now let's say a request comes from a client with the header:

Cache-Control: max-age=25

Because of this header any information in caches that was more than 25 seconds old will not have been used. This will all take place at the level of the various caches that may be between the client and the server. The server though is now going to use information from a cached store of what is in the remote database.

In such a case, if there is any chance that the data in the database could have changed in the last 25 seconds, it is clearly more in keeping with the meaning of the request header that that cache is also refreshed.

There is nothing in the spec to insist upon such behaviour, and it may be inappropriate (if the cache update takes so long that it would not be possible to do so before a request would time out) or unnecessary (if the web application can be sure that the database has not changed), but otherwise listening to cache-control headers from clients may result in behaviour closer to client expectations.
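
A minimal sketch of an implementation cache that listens to the request's max-age is shown below. The class and function names are hypothetical, and fetch stands in for the expensive call to the remote database:

```python
import time


def request_max_age(cache_control):
    """Extract max-age (in seconds) from a request Cache-Control value, or None."""
    if not cache_control:
        return None
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


class SupplierCache:
    """Caches the result of an expensive fetch, refreshing when the client asks for fresher data."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self, max_age=None):
        age = None if self._fetched_at is None else self._clock() - self._fetched_at
        # Refresh on first use, or when the stored copy is older than the client accepts.
        if age is None or (max_age is not None and age > max_age):
            self._value = self._fetch()
            self._fetched_at = self._clock()
        return self._value
```

A real implementation would also want its own upper limit on age, and some way to skip the refresh when it cannot complete before the request would time out.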


Caching offers immense benefits to scalability. In providing explicit support for caching the REST style allows us to make the most of caching, while avoiding potential harm caching could cause if mis-applied. All web caching is built upon the REST principle that requests and responses are self-describing.
