The Thundering Herd Problem

Imagine you are sitting on a farm when suddenly a bull starts running towards you.
What would you do? “Easy, I would just get out of the way,” you would say. Well, how about now?
In case it’s not clear, it’s a herd of raging bulls. Unless you are extremely lucky, you would likely be trampled. That is exactly the situation a server faces when a hot key expires in the cache: suddenly thousands of requests miss the cache and hammer the database directly. The database becomes overwhelmed and can eventually crash, bringing down the entire system. This is commonly known as the “thundering herd” problem.
To prevent this issue, we use several caching strategies:
Jitter on TTL
Mutex
Stale while revalidating
Cache Warming
Jitter on TTL
In the context of caching, jitter means adding a small random variation to each cache entry's expiration time. Instead of all cached items expiring at exactly the same moment, they expire at slightly different times, so the misses that follow are spread out rather than arriving all at once.
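As a rough illustration, the idea can be sketched in a few lines of Python. The names BASE_TTL_SECONDS, JITTER_FRACTION, and ttl_with_jitter are hypothetical, and the 300-second base with a 10% spread is only an assumed example, not a recommended setting.

```python
import random

BASE_TTL_SECONDS = 300   # assumed base TTL, for illustration only
JITTER_FRACTION = 0.1    # assumed spread: up to +/-10% of the base TTL


def ttl_with_jitter(base_ttl=BASE_TTL_SECONDS, jitter_fraction=JITTER_FRACTION):
    """Return base_ttl shifted by a random amount within +/- jitter_fraction."""
    spread = base_ttl * jitter_fraction
    return base_ttl + random.uniform(-spread, spread)


# Each key written at the same moment gets a slightly different TTL,
# so the keys do not all expire together.
ttls_for_batch = {key: ttl_with_jitter() for key in ("user:1", "user:2", "user:3")}
```

The value returned by ttl_with_jitter would be passed as the expiry whenever an entry is written to the cache.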
Mutex
When a cached item expires and multiple requests try to fetch it at the same time, a mutex lock ensures that only one of those requests goes to the database to recompute its value and refill the cache, while all the other requests either wait or are served stale data.
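Below is a minimal in-process sketch of this pattern, using one lock per key and the "wait" variant, where the other requests block until the value is refilled. MutexCache and its loader argument are hypothetical names. A cache shared across several servers would need a lock that all of them can see, which this sketch does not cover.

```python
import threading
import time


class MutexCache:
    """In-memory cache where only one caller per key recomputes an expired value."""

    def __init__(self, loader, ttl_seconds=60.0):
        self._loader = loader            # hypothetical function that queries the database
        self._ttl = ttl_seconds
        self._data = {}                  # key -> (value, expires_at)
        self._key_locks = {}             # key -> threading.Lock
        self._guard = threading.Lock()   # protects the _key_locks dictionary

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def _fresh_entry(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key):
        entry = self._fresh_entry(key)
        if entry is not None:
            return entry[0]              # cache hit, no locking needed
        with self._lock_for(key):
            # Re-check: another request may have refilled the cache
            # while this one was waiting for the lock.
            entry = self._fresh_entry(key)
            if entry is not None:
                return entry[0]
            value = self._loader(key)    # only one request per key reaches this line
            self._data[key] = (value, time.monotonic() + self._ttl)
            return value
```

The second check inside the lock is what keeps the waiting requests away from the database: by the time they acquire the lock, the first request has already stored a fresh value.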
Stale while revalidating
It serves cached content to users immediately (even if it has just expired) while simultaneously updating the cache with new data in the background.
It introduces three different stages for cached content:
Fresh: content that has been recently cached and is good to serve to users.
Stale: content that is expired but still acceptable temporarily.
Rotten: content too old to be trusted.
Two variables define the boundaries between these states:
max-age: this sets the limit between fresh and stale. Requests for content older than this age trigger a cache update.
stale-while-revalidate: this sets how long past max-age content may still be served stale, which fixes the limit between stale and rotten. Only requests for content older than max-age plus this window are fulfilled directly from the database or origin.
Cache-Control: max-age=60, stale-while-revalidate=300
This means that:
If the cache contents are less than 60 seconds old, just serve the cached content with no further action.
If the contents are older than 60 seconds, then for the next 300 seconds we can serve the cached content if requested, but quietly update the cache for the next visit.
If the contents are older than 360 seconds, ignore the cache contents, serve content directly from the origin, and update the cache as well.
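The three cases above can be sketched as a small in-memory cache. This is only an illustration under stated assumptions: classify, SwrCache, loader, and the injectable clock are hypothetical names, the thresholds mirror the header example (60 and 300 seconds), and a missing entry is treated the same as a rotten one.

```python
import threading
import time

MAX_AGE = 60                   # seconds, from max-age=60
STALE_WHILE_REVALIDATE = 300   # seconds, from stale-while-revalidate=300


def classify(age_seconds, max_age=MAX_AGE, swr=STALE_WHILE_REVALIDATE):
    """Map the age of a cached entry to fresh, stale, or rotten."""
    if age_seconds < max_age:
        return "fresh"
    if age_seconds < max_age + swr:
        return "stale"
    return "rotten"


class SwrCache:
    def __init__(self, loader, max_age=MAX_AGE, swr=STALE_WHILE_REVALIDATE,
                 clock=time.monotonic):
        self._loader = loader        # hypothetical function that hits the origin
        self._max_age = max_age
        self._swr = swr
        self._clock = clock
        self._data = {}              # key -> (value, stored_at)
        self._refreshing = set()     # keys with a background refresh in flight
        self._lock = threading.Lock()

    def _store(self, key):
        value = self._loader(key)
        with self._lock:
            self._data[key] = (value, self._clock())
        return value

    def _refresh_in_background(self, key):
        try:
            self._store(key)
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def get(self, key):
        with self._lock:
            entry = self._data.get(key)
            if entry is None:
                state = "rotten"     # assumption: a miss behaves like rotten content
            else:
                state = classify(self._clock() - entry[1], self._max_age, self._swr)
            start_refresh = state == "stale" and key not in self._refreshing
            if start_refresh:
                self._refreshing.add(key)
        if state == "rotten":
            return self._store(key)  # blocking fetch from the origin
        if start_refresh:
            threading.Thread(target=self._refresh_in_background,
                             args=(key,), daemon=True).start()
        return entry[0]              # fresh or stale: answer from the cache right away
```

Tracking the keys that are already being refreshed means a burst of requests for the same stale key starts only one background update.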
Cache Warming
Cache warming means preloading data into the cache before real users request it, in order to reduce sudden database spikes when many users request the same data at the same time.
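A minimal sketch of warming, assuming the cache behaves like a dictionary and that the list of hot keys is known in advance. warm_cache, hot_keys, and loader are hypothetical names, and how the hot keys are chosen is outside the scope of this example.

```python
def warm_cache(cache, hot_keys, loader):
    """Preload every hot key that is not already cached; return the keys loaded."""
    loaded = []
    for key in hot_keys:
        if key not in cache:
            cache[key] = loader(key)   # one controlled database read per key
            loaded.append(key)
    return loaded
```

A routine like this would typically run at deploy time or on a schedule, before traffic arrives, so the first real requests find the data already cached.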



