Caching is one of the biggest performance levers in almost every system, but its implementation is usually a messy, uncoordinated affair.

Let’s see how it all fits together. Imagine a simple web request:

graph LR
    A[User Browser] --> B(DNS Cache);
    B --> C(CDN Edge Cache);
    C --> D(Load Balancer Cache);
    D --> E(Web Server Cache);
    E --> F(Application Cache);
    F --> G(Database Cache);
    G --> H(OS/Disk Cache);

1. Browser Cache: Your browser caches static assets (CSS, JS, images) according to the Cache-Control and Expires response headers. On a normal navigation it checks this cache first, and if the stored copy is still fresh it serves it directly, skipping the network entirely. (An explicit reload typically makes the browser revalidate with the server instead.) This is your first line of defense against latency.

2. DNS Cache: Before even hitting a web server, your system needs to resolve the domain name to an IP address. Your browser, your OS, and your router can each maintain a DNS cache, and entries live until the record’s TTL expires. If you’ve visited example.com recently, your system might already know its IP address without needing to ask a DNS server. This speeds up the initial connection.

3. CDN (Content Delivery Network) Edge Cache: CDNs distribute your static assets across servers geographically closer to your users. When a user requests an asset, they hit the nearest CDN edge server. If that server has a cached copy, it serves it; otherwise it fetches the asset from your origin and stores it for the next request. This drastically reduces latency for global users. Think of it as a shared, geographically distributed cache sitting in front of your whole site.

4. Load Balancer Cache: Some load balancers can cache responses for identical requests. If many users request the same content and it changes infrequently, the load balancer can serve it directly without bothering the backend web servers. This rarely suits per-user dynamic content, but it is effective for things like API endpoints that return the same data to every caller.

5. Web Server Cache: Web servers like Nginx or Apache can cache entire page responses or specific components. This is often configured at the reverse proxy level. For example, Nginx can use proxy_cache to store full HTTP responses from backend application servers, serving them directly on subsequent identical requests.

6. Application Cache: This is where developers have the most control. It’s memory your application manages, either in-process (in-memory maps) or in an external in-memory store such as Redis or Memcached, used to hold frequently accessed data. This could be user session data, results of expensive computations, or pre-rendered HTML fragments. On a cache hit, it bypasses the database lookup entirely.

7. Database Cache: Databases themselves have multiple layers of caching. The most significant is the buffer pool (e.g., MySQL’s InnoDB buffer pool, sized by innodb_buffer_pool_size). This keeps frequently accessed data blocks and index pages in RAM. When a query runs, the database first checks if the required data is already in the buffer pool. If so, it avoids a slow disk read.

8. OS/Disk Cache: At the lowest level, the operating system also caches disk blocks in RAM (e.g., Linux’s page cache). If the database or application needs data from disk, the OS checks its cache first. This is a general-purpose cache that benefits any process reading files, not just databases.
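Several of the HTTP-facing layers above (browser, CDN, reverse proxy) make the same basic decision: is the stored response still fresh enough to reuse? The following is a minimal sketch of that check, assuming only the `max-age`, `no-cache`, and `no-store` directives matter. The function names `max_age_seconds` and `is_fresh` are made up for this illustration, and real caches also handle `Expires`, `s-maxage`, revalidation, and heuristic freshness.

```python
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract max-age from a Cache-Control header value, if present."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip('"')
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """True if a stored response may be reused without contacting the server."""
    names = {d.strip().partition("=")[0].lower() for d in cache_control.split(",")}
    # no-store forbids storing; no-cache requires revalidation before reuse.
    if "no-store" in names or "no-cache" in names:
        return False
    max_age = max_age_seconds(cache_control)
    # Simplification: without max-age, this sketch treats the response as stale.
    return max_age is not None and age_seconds < max_age


print(is_fresh("public, max-age=3600", 120))    # True: 2 minutes old, 1 hour lifetime
print(is_fresh("public, max-age=3600", 7200))   # False: older than its lifetime
print(is_fresh("no-cache, max-age=3600", 0))    # False: must revalidate first
```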

The surprising truth is that most of these caches operate with minimal coordination. A browser might request an asset that’s also cached by the CDN, which is also cached by the web server, and then by the application. No layer "knows" about the others; each one just checks its own cache first, and if it finds a valid entry, the request stops there. This redundancy, while seemingly wasteful, is what makes the system resilient and fast.
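The "check your own cache, stop on a hit" behavior can be sketched as a simple chain. This is an illustration only: `CacheLayer`, `lookup`, and the layer names are hypothetical, each layer is reduced to a plain dictionary, and expiry is ignored.

```python
from typing import Callable, Dict, List, Tuple


class CacheLayer:
    """One independent cache; it knows nothing about the other layers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, str] = {}


def lookup(key: str, layers: List[CacheLayer],
           origin: Callable[[str], str]) -> Tuple[str, str]:
    """Walk the layers in order; the first hit ends the search."""
    missed: List[CacheLayer] = []
    for layer in layers:
        if key in layer.store:
            value, served_by = layer.store[key], layer.name
            break
        missed.append(layer)
    else:
        # Every layer missed, so the request reaches the origin.
        value, served_by = origin(key), "origin"
    # Each layer that missed keeps a copy on the way back.
    for layer in missed:
        layer.store[key] = value
    return value, served_by


layers = [CacheLayer("browser"), CacheLayer("cdn"), CacheLayer("app")]


def origin(key: str) -> str:
    return f"content for {key}"


print(lookup("/logo.png", layers, origin))  # ('content for /logo.png', 'origin')
print(lookup("/logo.png", layers, origin))  # ('content for /logo.png', 'browser')
```

After the first request every layer holds its own copy, which is the redundancy described above: if the browser copy disappears, the CDN layer answers without the origin being touched.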

What most people don’t realize is how profoundly cache invalidation strategies affect system behavior. Poorly managed invalidation leads to stale data being served, often in subtle ways that are hard to debug, because every layer is working correctly according to its cached state, not the actual current state.
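A small sketch shows how a cache can behave "correctly" and still return stale data. It assumes a cache-aside application cache in front of a database, both modeled as dictionaries; the key and the function names are invented for the example.

```python
from typing import Dict

database: Dict[str, str] = {"user:1:email": "old@example.com"}
cache: Dict[str, str] = {}


def read(key: str) -> str:
    """Cache-aside read: use the cached value if present, else load and store it."""
    if key not in cache:
        cache[key] = database[key]
    return cache[key]


def write_without_invalidation(key: str, value: str) -> None:
    """Updates the database only; any cached copy is now stale."""
    database[key] = value


def write_with_invalidation(key: str, value: str) -> None:
    """Updates the database, then drops the cached copy so the next read reloads."""
    database[key] = value
    cache.pop(key, None)


print(read("user:1:email"))   # old@example.com (loaded into the cache)
write_without_invalidation("user:1:email", "new@example.com")
print(read("user:1:email"))   # old@example.com (stale, yet a valid cache hit)
write_with_invalidation("user:1:email", "new@example.com")
print(read("user:1:email"))   # new@example.com
```

Deleting the entry on write is only one option; expiring entries with a TTL or updating the cache in place are common alternatives, each with its own trade-offs.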

The next logical step is understanding how to effectively manage cache invalidation across these layers.
