Intro
While load balancing helps us scale by distributing traffic across multiple replicas of an application, caching helps us make better use of those resources.
Caching is simply the act of storing data in a temporary location so that it can be accessed quickly. Caches take advantage of locality of reference, the principle that programs which request some data tend to request that same data again.
tl;dr: a cache is a high-speed data storage layer. Data retrieved from a cache is typically served faster because the prerequisite computations are skipped.
Caches are used at every layer of computing (browser, server, operating system, hardware, etc.) and are fundamental to the speed and efficiency of the modern internet. In this article, we will explore the types of caches that exist and how they work.
Types of Caches
Application Server Cache
Caches are commonly found at the boundary between the web server and the browser. Placing a cache here lets us store response data: if a request arrives with the same arguments as a previous call, the node can serve the stored response instead of rebuilding it. An application server cache can keep this data in memory (fastest) or on local disk (still much faster than a network call to another service).
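To make this concrete, here is a minimal sketch of an in-process application server cache using Python's built-in functools.lru_cache. The render_profile function and its simulated half-second of work are hypothetical stand-ins for a real request handler:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)  # keep up to 1024 responses in process memory
def render_profile(user_id: int) -> str:
    time.sleep(0.5)  # stand-in for the database query and rendering work
    return f"<h1>Profile for user {user_id}</h1>"

render_profile(42)  # slow: does the work, then caches the response
render_profile(42)  # fast: served straight from the in-process cache
```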
In a world of horizontally scaled systems, a problem becomes apparent: if we cache something on one node of a highly available system, what happens when the next request is load balanced to a different node? That node's cache lookup will miss, and it will redo the same computations to serve the same data. One solution to this problem is a distributed cache, which makes cached data appear the same to every node in the system. The trade-off is latency, since a network call to the cache is slower than reading from local memory or disk. A distributed cache still saves us time, though, because a single round trip to a nearby cache is far cheaper than repeating the computations, database queries, and API requests behind the original response.
Implementations of distributed caches differ. Some pool together the disk and memory of the nodes already in the cluster, while others run as a dedicated cluster of their own with their own replication strategies.
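As an illustrative sketch (not from any particular system), here is the lookup-then-compute pattern against a shared Redis cache using the redis-py client. The host name, key scheme, and compute_report function are all assumptions made for the example:

```python
import json

import redis  # third-party client: pip install redis

# Every application node talks to the same shared cache, so a value
# cached by one node is visible to all of the others.
r = redis.Redis(host="cache.internal", port=6379)  # hypothetical address

def compute_report(report_id: str) -> dict:
    # Stand-in for the expensive work: database queries, API calls, etc.
    return {"id": report_id, "rows": []}

def get_report(report_id: str) -> dict:
    key = f"report:{report_id}"
    cached = r.get(key)            # one network hop to the cache
    if cached is not None:
        return json.loads(cached)  # hit: skip the expensive work entirely
    report = compute_report(report_id)     # miss: do the real work...
    r.setex(key, 300, json.dumps(report))  # ...and cache it for 5 minutes
    return report
```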
Content Delivery / Distribution Network
CDNs are one of the backbones of the modern internet. The main idea of a CDN is simple: if you request static media from Australia, you don't want to wait ages while that content is served from some server in America.
A CDN is a network of geographically distributed servers, strategically placed so that static files can be cached close to the network edge where the request originates. This greatly speeds up request times for content-heavy traffic, e.g. Netflix, Google Images, TikTok.
Cache Invalidation
Caching requires some maintenance to ensure that the data being served stays consistent with its source of truth. For example, if we modify some data in the database, the cached value should be invalidated and ideally updated. Let's explore some write strategies for keeping the two in sync.
Write-through cache: Data is written to the cache and the backing store (usually a database) at the same time. This minimises the risk of data loss, at the cost of higher latency for write operations.
Write-around cache: Data is written directly to the backing store, bypassing the cache. This avoids filling the cache with data that may never be read, but a read that follows soon after will miss the cache and pay the full cost of fetching from the store.
Write-back cache: Data is written to the cache only, and the write to the backing store is deferred (for example, performed asynchronously or when the entry is evicted). This gives low-latency, high-throughput writes, with the risk of losing data if the cache fails before the write is persisted. Write-through and write-back are sketched in code below.
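To make the difference concrete, here is a toy sketch of write-through versus write-back, using plain dictionaries as stand-ins for the cache and the backing store. The class and method names are illustrative, not any particular library's API:

```python
class WriteThroughCache:
    """Writes go to the cache and the backing store at the same time."""

    def __init__(self, store: dict):
        self.store = store  # stand-in for the database
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.store[key] = value  # synchronous: slower writes, no data loss


class WriteBackCache:
    """Writes go to the cache only; persistence is deferred."""

    def __init__(self, store: dict):
        self.store = store
        self.cache = {}
        self.dirty = set()  # keys written but not yet persisted

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)  # fast, but lost if this node dies right now

    def flush(self):
        # Deferred write, e.g. run periodically or when entries are evicted.
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```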
Cache Eviction Policies
Caches are not infinite in size, so entries must be evicted to free up space. Here are the most common techniques:
- First In First Out (FIFO): The cache evicts the first block accessed first without any regard to how often or how many times it was accessed before.
- Last In First Out (LIFO): The cache evicts the block added most recently, without any regard to how often or how many times it was accessed before.
- Least Recently Used (LRU): Discards the least recently used items first (see the sketch after this list).
- Most Recently Used (MRU): Discards, in contrast to LRU, the most recently used items first.
- Least Frequently Used (LFU): Counts how often an item is needed. Those that are used least often are discarded first.
- Random Replacement (RR): Randomly selects a candidate item and discards it to make space when necessary.
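As an illustration of the most popular policy, here is a minimal LRU cache sketch built on Python's collections.OrderedDict; it is a toy for clarity, not a production implementation:

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self.entries:
            return None  # cache miss
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
```

OrderedDict keeps the bookkeeping cheap: move_to_end promotes an entry on every access, and popitem(last=False) pops the stalest one, both in constant time.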
Sources
- https://www.imperva.com/learn/performance/what-is-cdn-how-it-works/
- https://aws.amazon.com/caching/
- https://www.cloudflare.com/learning/cdn/what-is-caching/
- https://www.educative.io/courses/grokking-the-system-design-interview/