request collapsing (or coalescing). Request collapsing is an optimisation where the CDN, upon receiving one of these 100 requests, realises that it is already retrieving that file from the backend for an earlier request. It therefore “parks” the request until the earlier request completes. Once it does, the file is served from cache to all “parked” requests. We say that the “parked” requests were “collapsed” into the earlier one.
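To make the mechanism concrete, here is a minimal single-process sketch in Python, assuming a `fetch_from_origin` callable (the names and structure are illustrative, not any particular CDN’s implementation; a real CDN does this across distributed proxies):

```python
import threading

class CollapsingCache:
    """Minimal sketch of request collapsing: concurrent requests for the
    same key wait on the first in-flight origin fetch instead of each
    going to the origin themselves."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: key -> content
        self._cache = {}                  # key -> content
        self._inflight = {}               # key -> threading.Event
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._cache:        # plain cache HIT
                return self._cache[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:                    # first request: we do the fetch
                event = threading.Event()
                self._inflight[key] = event
        if leader:
            try:
                content = self._fetch(key)     # single trip to the origin
                with self._lock:
                    self._cache[key] = content
            finally:
                event.set()                    # wake all collapsed requests
                with self._lock:
                    del self._inflight[key]
            return content
        event.wait()                      # “parked”: wait for the leader
        with self._lock:
            # Served from cache, but only after waiting for the leader.
            # (Error handling omitted: a failed fetch would return None here.)
            return self._cache.get(key)
```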
What happened?
Unintuitively enough, while the requests that returned cache HITs never went to the origin, they spent time, some up to 10 seconds, waiting for the first request to complete. According to RFC 9211, they even count as cache MISSes…
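RFC 9211 even defines a dedicated `collapsed` parameter for exactly this situation. One of the “parked” requests above could, for instance, surface a value like `Cache-Status: ExampleCache; fwd=miss; collapsed` (the cache name is illustrative): the request counts as forwarded because of a miss, but its forward request was collapsed with an earlier one.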
Another case of cache-HISS?
In an earlier blog post about the execution model of AWS Lambda@Edge in CloudFront, I briefly touched on the concept of multi-tiered CDN architectures. Indeed, most CDNs nowadays (Cloudflare’s Orpheus, CloudFront Origin Shield, Fastly’s shielding) let you configure multiple layers of caching: two, sometimes even three.
In this example of a three-tier setup with CloudFront, the “edge location” is geographically quite close to the user, but the “origin shield” is much closer to the origin, and potentially quite far from the user. What this architecture implies is that a request can be a HIT or a MISS at each of the caching layers independently.
Let’s take the following use case: your origin is located in Virginia (US-East), behind a CDN with a three-tiered architecture such as the one presented above. A user from India requests a 1KB file. No one in India has requested that file before, so neither the edge location nor the Regional Edge Cache has it in cache. The request therefore travels all the way around the globe to the origin shield in US-East. Someone in the US requested that file earlier, so it is in the origin shield’s cache, and the request is a cache HIT in the shield.
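If every layer exposed RFC 9211’s Cache-Status header, the trace for that request could look something like `Cache-Status: OriginShield; hit, RegionalEdgeCache; fwd=miss; stored, EdgeLocation; fwd=miss; stored` (cache names are illustrative; per the RFC, members are listed from the cache closest to the origin to the cache closest to the user). One layer reports a HIT, yet the user still paid for a round trip halfway around the planet.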
What happened?
Also a cache-HISS?
When a file cached by a CDN reaches the end of its TTL, or when you purge it from the CDN, the CDN will often keep the file around for a little while and mark it as “stale”, instead of deleting it outright. This is because, in some cases, you might decide that serving content that is slightly out of date is better than the alternative.
The most common case is when the TTL of a cached file expires and the CDN tries to fetch a new version from your backend server. What if fetching the new version fails because your file server is currently overloaded or down? Would it be better to serve a 5xx HTTP error, or a slightly outdated file?
This behaviour is known as stale-on-error, and most CDN providers offer it. Depending on your business requirements, it might be a good option to turn on.
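How you opt in varies: some CDNs expose it as a configuration flag, while RFC 5861 defines a standard Cache-Control extension for it, e.g. `Cache-Control: max-age=600, stale-if-error=86400`. Conceptually, the serving logic looks something like this Python sketch (the function shape and the TTLs are illustrative, not any particular CDN’s implementation):

```python
import time

def serve(key, cache, fetch_from_origin, fresh_ttl=600, stale_on_error_ttl=86400):
    """Sketch of stale-on-error: on a failed origin fetch, fall back to a
    stale cached copy instead of returning a 5xx."""
    entry = cache.get(key)              # entry: (content, expires_at) or None
    now = time.time()
    if entry and now < entry[1]:        # fresh: plain cache HIT
        return entry[0]
    try:
        content = fetch_from_origin(key)  # TTL expired (or no copy): refetch
    except Exception:
        # Origin is overloaded or down: serve the stale copy if we still
        # have one within the stale-on-error window.
        if entry and now < entry[1] + stale_on_error_ttl:
            return entry[0]
        raise                           # nothing usable: surface the error
    cache[key] = (content, now + fresh_ttl)
    return content
```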
Let’s take the following example:
What happened?
Cache-HISS!
Looking at a single qualifier for a request, “HIT” or “MISS”, tells us very little about what really happened. In some cases, the request might hit a caching layer halfway around the planet, wait for another request to complete, run into a timeout or an error, or even reach the origin and require computation there. By extension, a cache-hit ratio conveys almost nothing about user experience, or about how many requests actually reached the origin.
The Cache-Status HTTP response header introduced by RFC 9211 is a really interesting attempt at improving the situation and providing a standard way of exposing and tracking CDNs’ caching behaviour. It recognises that, to get meaningful information about a request, the cache behaviour at each caching layer must be taken into account. It also provides a clear definition of what a cache HIT and a cache MISS are.
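As a rough illustration of how you might consume it, here is a small Python sketch that splits a Cache-Status value into per-layer entries (it assumes simple, well-formed input; a production implementation should use a proper RFC 8941 structured-fields parser, and the cache names are made up):

```python
def parse_cache_status(value):
    """Split an RFC 9211 Cache-Status value into (cache_name, params) pairs."""
    entries = []
    for member in value.split(","):            # one member per caching layer
        parts = [p.strip() for p in member.split(";")]
        params = {}
        for p in parts[1:]:
            k, _, v = p.partition("=")         # bare params (e.g. "hit") -> True
            params[k] = v if v else True
        entries.append((parts[0], params))
    return entries

# Members run from the cache closest to the origin to the one closest to the user.
header = "OriginShield; hit; ttl=3600, EdgeLocation; fwd=miss; fwd-status=200; stored"
for name, params in parse_cache_status(header):
    outcome = "HIT" if "hit" in params else f"MISS (fwd={params.get('fwd')})"
    print(f"{name}: {outcome}")
```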
It is, however, pretty recent, and implementations of the RFC are sparse. Most CDN providers present some aggregated cache-hit ratio. Forget about it, and measure what matters: