
The curious case of the CDN Cache-HISS
August 6, 2023

A few years ago, I was asked to take a more active role maintaining the CDN configuration of the company I work for. My knowledge of CDNs at the time was pretty basic and could be summarised by the following diagram:

Cache Hit or Cache MISS

“Sure, I got this”. And for a short time, things were good. But soon after, I was asked the following question:

“We are reviewing the performance of a client’s requests - several of their requests are quite slow, despite being Cache HITs. What’s going on?”.

And I soon realised, who would have guessed - things are never that simple. Cache HITs? Cache MISSes? Cache-HIT ratios? Things have gotten complicated, and those terms don’t make all that much sense anymore…

Let me introduce you to the cache HISS: a little bit cache HIT, a little bit cache MISS… and it comes in different flavours!

Definitions

Before going further, let’s define some important terms. Believing there was no standard definition, I was going to define these myself, until it was pointed out to me that an RFC from 2022, RFC 9211, did just that. We will therefore use its definitions: a request is a cache HIT when it is satisfied by the cache without being forwarded, and it is a cache MISS whenever the cache has to forward the request towards the origin, even if only to revalidate a copy it already holds.

HTTP 304 - Not Modified

Imagine you are managing a CDN in front of a file repository (the origin). All the files in the repository are quite large, maybe several GBs. Those files do change, but really quite rarely. It is acceptable if new versions of a file are served within one hour of being uploaded, therefore you configure a Time-To-Live (TTL) in your CDN: the CDN should cache the file for not more than one hour after retrieval.

Assuming each file is requested several times per hour, this would mean that the CDN would need to download each file again every hour, to ensure it has the latest version in cache. In practice, a CDN has multiple caches around the globe (Points-of-Presence or POPs), so it might be worse: each POP might need to refresh every file every hour.

This could quickly become very expensive and impractical: your origin would be constantly sending many large files that have not changed to the CDN.

To avoid overloading the origin, modern CDNs support ETags. The ETag is an optional response header returned by the origin, whose value identifies a specific version of the file being served. How the ETag is calculated has never been standardised and varies by implementation.

If the CDN receives a request for a cached file whose TTL has expired, and the file was previously returned with an ETag response header, it will send a request to the origin with an If-None-Match header containing the value of that ETag. This is akin to telling the origin: “Can you send me file X? I already have this version in cache, so if it has not changed, do not send me the whole file again”.

If your origin supports ETags, it will compute the ETag of the requested file. If the file has not changed, the computed ETag will match the one sent by the CDN in the If-None-Match header, and instead of sending the whole file back, your origin will return an HTTP 304 Not Modified response. The CDN then renews the TTL of the file in cache, using the information returned by the origin in the CDN-Cache-Control and Cache-Control headers, and serves the file it already has in cache.
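To make this a little more concrete, here is a minimal sketch of the revalidation logic in Python, using the requests library. The CachedFile structure, the fixed one-hour TTL and the single URL are simplifications for illustration; a real CDN does this across many machines and honours the origin’s caching headers rather than a hard-coded TTL.

```python
import time
from dataclasses import dataclass

import requests


@dataclass
class CachedFile:
    body: bytes
    etag: str | None
    expires_at: float  # unix timestamp at which the TTL runs out


def fetch_with_revalidation(url: str, cached: CachedFile | None, ttl: int = 3600) -> CachedFile:
    """Return a fresh-enough copy of `url`, revalidating with If-None-Match when possible."""
    # Still within its TTL: serve straight from cache, no origin traffic at all.
    if cached is not None and time.time() < cached.expires_at:
        return cached

    headers = {}
    if cached is not None and cached.etag:
        # TTL expired, but we still hold a copy: ask the origin whether it changed.
        headers["If-None-Match"] = cached.etag

    response = requests.get(url, headers=headers)

    if response.status_code == 304 and cached is not None:
        # "Not Modified": renew the TTL and keep serving the cached body.
        cached.expires_at = time.time() + ttl
        return cached

    # First retrieval, or the file changed: store the new body and its ETag.
    return CachedFile(
        body=response.content,
        etag=response.headers.get("ETag"),
        expires_at=time.time() + ttl,
    )
```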

What happened?

According to our definition - this is a cache MISS. Somewhat unintuitively though, the request was actually served using data from the CDN cache.

Is this a Cache-HISS?

Request coalescing/collapsing

To improve on the previous architecture, you decide that setting a TTL of 1h on your files is not that great after all. You need to wait up to an hour for the new versions of your files to be served, and the revalidation requests from the CDN are not very efficient, since most of your files never change!

You decide that setting a much higher TTL (1 year) and purging files from the CDN when they change would mean fewer revalidation requests to your origin, and would ensure that new versions of the files are available much sooner.

Now - some of the files are requested quite a lot. Especially after a file changes, a lot of people want to download the new version! So let’s see what would actually happen.

Without any optimisation, if 100 users request a file right after it has been purged, the CDN has nothing in cache and would forward all 100 requests to the origin, each downloading the same multi-GB file. This is quite impractical. Therefore, to reduce the impact on the origin, modern CDNs use request collapsing (or coalescing). Request collapsing is an optimisation where the CDN realises, upon receiving one of these 100 requests, that it is already retrieving that file from the backend for an earlier request. It will therefore “park” the request, waiting for the earlier request to complete. Once the earlier request has completed, the file is served from cache for all “parked” requests. We say that those “parked” requests are “collapsed” into the earlier request.
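As a toy illustration of the mechanism, here is a Python sketch that collapses concurrent fetches of the same key onto a single origin request, using a shared map of in-flight downloads. The fetch_from_origin callable is a placeholder; a real CDN coordinates this across processes and POPs, not with a single in-process lock.

```python
import threading
from concurrent.futures import Future
from typing import Callable

# One Future per file currently being fetched from the origin.
_in_flight: dict[str, Future] = {}
_lock = threading.Lock()


def fetch_collapsed(key: str, fetch_from_origin: Callable[[str], bytes]) -> bytes:
    """Fetch `key`, collapsing concurrent requests into a single origin fetch."""
    with _lock:
        future = _in_flight.get(key)
        if future is None:
            # First request for this key: it becomes the "leader" and goes to the origin.
            future = Future()
            _in_flight[key] = future
            leader = True
        else:
            leader = False

    if not leader:
        # "Parked" request: wait for the leader's download, then reuse its result.
        return future.result()

    try:
        body = fetch_from_origin(key)
        future.set_result(body)  # wakes up every parked request
        return body
    except Exception as exc:
        future.set_exception(exc)
        raise
    finally:
        with _lock:
            _in_flight.pop(key, None)
```

The 100 simultaneous requests become one origin download plus 99 waits, which is exactly why the parked requests can be fast or slow depending on how long the leader takes.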

What happened?

Unintuitively enough - while the requests that returned cache HITs did not go to the origin, they were waiting, some for up to 10 seconds, for the first request to complete. According to RFC 9211, those are even cache MISSes…

Another case of cache-HISS?

Multi-Tiered architectures

In an earlier blog post about the execution model of AWS Lambda@Edge in CloudFront, I briefly touched on the concept of multi-tiered CDN architectures. Indeed, most CDNs nowadays [Cloudflare’s Orpheus, CloudFront Origin Shield, Fastly shielding] let you configure multiple layers of caching: two, sometimes even three.

CloudFront’s tiered architecture

In this example of a three-tier setup with CloudFront, the “edge location” will be geographically quite close to the user - but the “origin shield” will be geographically much closer to the origin, and potentially quite far from the user. What this architecture implies is that a request might be a HIT or a MISS at each of the caching layers.

Let’s take the following use-case: your origin is located in Virginia (US-East), behind a CDN with a three-tiered architecture such as the one presented above. A user from India requests a 1KB file. No one in India has requested that file before, so neither the edge location nor the Regional Edge Cache has the file in cache. The request therefore goes all the way around the globe to the origin shield cache in US-East. Someone in the US requested that file earlier, so the file is in the origin shield’s cache, and the request is a cache HIT in the shield.
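A heavily simplified sketch of this lookup path might look like the following; the tier names and the plain dictionaries are made up for illustration, and each real tier is of course a fleet of caching servers rather than a dict.

```python
from typing import Callable

# Tiers ordered from closest-to-user to closest-to-origin.
edge_cache: dict[str, bytes] = {}
regional_cache: dict[str, bytes] = {}
origin_shield_cache: dict[str, bytes] = {}

TIERS = [
    ("EdgeCache", edge_cache),
    ("RegionalCache", regional_cache),
    ("OriginShield", origin_shield_cache),
]


def tiered_lookup(key: str, fetch_from_origin: Callable[[str], bytes]) -> tuple[bytes, list[str]]:
    """Walk the tiers towards the origin, recording a HIT or MISS at each layer."""
    trace: list[str] = []
    missed: list[dict[str, bytes]] = []

    for name, cache in TIERS:
        if key in cache:
            trace.append(f"{name}: HIT")
            body = cache[key]
            break
        trace.append(f"{name}: MISS")
        missed.append(cache)
    else:
        # Missed everywhere: only now does the request reach the origin itself.
        trace.append("Origin: fetched")
        body = fetch_from_origin(key)

    # On the way back, every tier that missed stores a copy for next time.
    for cache in missed:
        cache[key] = body
    return body, trace
```

For the Indian user above, the trace would read EdgeCache: MISS, RegionalCache: MISS, OriginShield: HIT - a single “HIT” flag hiding two misses and a trip across the globe.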

What happened?

Despite being reported as a HIT, the request still travelled halfway around the planet and back before the user received the first byte. Also a cache-HISS?

Serving stale content

When a file is cached by a CDN and its TTL expires, or if you purge that file from the CDN, the CDN will often keep the file in cache for a little while and mark it as “stale”, instead of directly deleting it. This is because in some cases, you might decide that serving content that is slightly out of date is better than the alternative.

The most common case is when the TTL of a cached file in the CDN expires, and the CDN tries to fetch a new version from your backend server. What if there is an error fetching the new version, because your fileserver is currently overloaded or down? Would it be better to serve a 5xx HTTP error - or a slightly outdated file?

This behaviour is known as stale-on-error and is available with most CDN providers. Depending on your business requirements, this might be a good option to turn on.
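Here is a minimal sketch of stale-on-error in the same toy Python style; the CacheEntry structure and the TTL values are made up, and in practice this behaviour is enabled through CDN configuration or the stale-if-error Cache-Control extension rather than custom code.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CacheEntry:
    body: bytes
    expires_at: float   # end of the normal TTL
    stale_until: float  # how long past expiry we are still willing to serve it


def serve(key: str, cache: dict[str, CacheEntry],
          fetch_from_origin: Callable[[str], bytes],
          ttl: int = 3600, stale_ttl: int = 86400) -> bytes:
    """Serve from cache, refresh on expiry, and fall back to stale content when the origin fails."""
    entry = cache.get(key)
    now = time.time()

    if entry is not None and now < entry.expires_at:
        return entry.body  # plain cache HIT, nothing interesting happens

    try:
        body = fetch_from_origin(key)
    except Exception:
        # Origin is down or overloaded. If we still hold a copy that is stale
        # but not *too* stale, serve it rather than returning a 5xx error.
        if entry is not None and now < entry.stale_until:
            return entry.body
        raise

    cache[key] = CacheEntry(body=body, expires_at=now + ttl, stale_until=now + ttl + stale_ttl)
    return body
```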

Let’s take the following example: the TTL of a file in the CDN cache has expired, the CDN tries to fetch a new version, but your origin is down and keeps returning errors. With stale-on-error enabled, the CDN serves the stale copy it still holds instead of passing the error on to the user.

What happened?

Cache-HISS!

Final thoughts

Looking at a single qualifier for a request, “HIT” or “MISS”, tells us very little about what really happened. In some cases, the request might hit a caching layer half way around the planet, wait for another request to complete, hit a timeout or an error, or even hit the origin and require computation there. By extension, a cache-hit ratio will fail to convey any information about user experience, or how many requests reached the origin.

The Cache-Status HTTP response header introduced by RFC 9211 is a really interesting attempt at improving the situation and providing a standard way of exposing and tracking CDNs’ caching behaviour. It recognises that, to get meaningful information about a request, it is important to take into account the cache behaviour at each caching layer. It also provides a clear definition of what cache HITs and cache MISSes are.
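As an illustration, a request that missed at the edge and the regional layer but hit the shield - like the Indian user’s request above - might come back with a header along these lines (cache names invented, values made up, cache closest to the origin listed first, as the RFC specifies):

```
Cache-Status: OriginShield; hit; ttl=3600, RegionalCache; fwd=miss; stored, EdgeCache; fwd=miss; stored
```

One glance tells you where the file was found, which layers stored a copy on the way back, and how long it will stay fresh.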

It is however pretty recent, and implementations of that RFC are sparse. Most CDN providers present some aggregated cache-hit ratio. Forget about it, and measure what matters: