The write-behind cache pattern is less about hiding latency and more about decoupling the write operation from the eventual persistence.

Imagine you’re writing a critical log entry. With a synchronous write, your application thread blocks until the disk confirms the write. This is slow. With a write-behind cache, you write to memory (blazing fast!) and then asynchronously hand the disk write to a background thread. Your application is free to do other things immediately.

Here’s a simplified Java example using an in-memory ConcurrentHashMap as the cache and a separate ExecutorService for background writes:

import java.util.concurrent.*;

public class WriteBehindCache<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final ExecutorService persistenceExecutor = Executors.newSingleThreadExecutor();
    private final PersistenceService<K, V> persistenceService;

    public WriteBehindCache(PersistenceService<K, V> persistenceService) {
        this.persistenceService = persistenceService;
    }

    public void put(K key, V value) {
        cache.put(key, value);
        // Schedule the persistence task
        persistenceExecutor.submit(() -> {
            try {
                persistenceService.write(key, value);
                // Optionally evict after successful persistence, or implement a
                // TTL/LRU eviction strategy. Use the conditional remove so a newer
                // value written in the meantime is not dropped:
                // cache.remove(key, value);
            } catch (Exception e) {
                System.err.println("Failed to persist " + key + ": " + e.getMessage());
                // Implement retry logic or dead-letter queue here
            }
        });
    }

    public V get(K key) {
        return cache.get(key);
    }

    public void shutdown() {
        persistenceExecutor.shutdown();
        try {
            if (!persistenceExecutor.awaitTermination(60, TimeUnit.SECONDS)) {
                persistenceExecutor.shutdownNow();
            }
        } catch (InterruptedException e) {
            persistenceExecutor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    // Interface for the actual persistence mechanism
    interface PersistenceService<K, V> {
        void write(K key, V value) throws Exception;
    }

    public static void main(String[] args) throws InterruptedException {
        // Example implementation of PersistenceService (e.g., writing to a database)
        PersistenceService<String, String> dbWriter = (key, value) -> {
            System.out.println("Persisting [" + key + "]: " + value + " to database...");
            // Simulate a slow database write
            Thread.sleep(500);
            System.out.println("Persistence complete for [" + key + "]");
        };

        WriteBehindCache<String, String> cache = new WriteBehindCache<>(dbWriter);

        System.out.println("Starting writes...");
        cache.put("user:1", "Alice"); // This returns immediately
        System.out.println("Write 1 submitted.");
        cache.put("user:2", "Bob");   // This also returns immediately
        System.out.println("Write 2 submitted.");

        // Simulate application doing other work
        Thread.sleep(200);
        System.out.println("Application continuing work...");

        // put() never evicts, so this read is always served from the in-memory cache,
        // whether or not the background write has completed yet.
        System.out.println("Retrieving user:1: " + cache.get("user:1"));

        Thread.sleep(1000); // Wait for async writes to likely complete

        cache.shutdown();
        System.out.println("Cache shut down.");
    }
}

The core problem this solves is the I/O-bound nature of traditional persistence. By offloading the write to a background executor, the main application threads can continue processing requests without waiting for disk or network latency, which can substantially increase throughput for write-heavy workloads. The ExecutorService acts as a buffer: if the persistence layer is temporarily slow, pending writes accumulate in the executor’s work queue instead of blocking the application. Note that Executors.newSingleThreadExecutor() uses an unbounded queue, so nothing in this example limits how large that backlog can grow.

The key levers you control are:

  1. Cache Implementation: ConcurrentHashMap is simple, but you could use distributed caches like Redis or Memcached, or even specialized in-memory data grids. The choice depends on scale, consistency needs, and fault tolerance.
  2. Persistence Executor: The size and configuration of the ExecutorService (or equivalent thread pool/queue) are crucial. Too few threads and the queue can back up, negating the benefit. Too many and you risk overwhelming the persistence layer or consuming excessive resources; with more than one thread, writes to the same key can also reach the store out of order. Executors.newSingleThreadExecutor() is a starting point for simple cases, but a ThreadPoolExecutor with a bounded queue and appropriate rejection policy is often better for production.
  3. Persistence Service: This is the actual write method implementation. It could be writing to a relational database, a NoSQL store, a file system, or even another message queue. Its performance directly impacts how quickly the cache can be cleared or how much data can back up.
  4. Error Handling & Retries: What happens if persistenceService.write fails? The example shows a simple println. Production systems need robust retry mechanisms (e.g., exponential backoff) or a dead-letter queue to handle persistent failures without losing data.
  5. Cache Eviction/Invalidation: The example put keeps the item in the cache indefinitely. In a real system, you’d need strategies to remove items from the cache once they are persisted, or after a certain time-to-live (TTL), or based on least-recently-used (LRU) policies. This is particularly important if the cache is intended to be a temporary staging area.
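
Lever 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a production design: RetryingWriter, maxAttempts, initialBackoffMs and the in-memory deadLetters list are hypothetical names introduced here, and the nested PersistenceService interface simply mirrors the one in the example above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RetryingWriter<K, V> {
    /** Same shape as the PersistenceService interface in the example above. */
    interface PersistenceService<K, V> {
        void write(K key, V value) throws Exception;
    }

    private final PersistenceService<K, V> delegate;
    private final int maxAttempts;        // hypothetical tuning knob
    private final long initialBackoffMs;  // hypothetical tuning knob
    // Stand-in for a real dead-letter queue: keys whose writes exhausted all attempts.
    private final List<K> deadLetters = Collections.synchronizedList(new ArrayList<>());

    RetryingWriter(PersistenceService<K, V> delegate, int maxAttempts, long initialBackoffMs) {
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
        this.initialBackoffMs = initialBackoffMs;
    }

    /** Returns true if the write eventually succeeded, false if it was dead-lettered. */
    boolean writeWithRetry(K key, V value) {
        long backoff = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                delegate.write(key, value);
                return true;
            } catch (InterruptedException e) {
                // Treat interruption as a request to stop retrying.
                Thread.currentThread().interrupt();
                break;
            } catch (Exception e) {
                if (attempt == maxAttempts) {
                    break; // out of attempts, fall through to dead-lettering
                }
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
                backoff *= 2; // exponential backoff: wait twice as long next time
            }
        }
        deadLetters.add(key);
        return false;
    }

    /** Snapshot of the keys that could not be persisted. */
    List<K> deadLetters() {
        return new ArrayList<>(deadLetters);
    }
}
```

Inside put, the background task could call writeWithRetry(key, value) instead of persistenceService.write(key, value). One trade-off to be aware of: sleeping on a single persistence thread delays every write queued behind the failing one, so a real system might reschedule the retry instead of blocking.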

When the persistence layer is slow, the write-behind cache masks this by accumulating writes in its in-memory buffer and background queue. While your application sees fast writes, the actual data might sit in memory or wait in a queue for a significant duration before reaching its final destination. This "write latency illusion" is the pattern’s core strength and also its primary risk: anything not yet persisted is lost if the process crashes, and a persistence layer that becomes a bottleneck shows up as a spike in memory usage or a growing queue rather than as slow writes.
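
One way to keep that backlog visible and bounded is to replace the single-thread executor with a ThreadPoolExecutor over a bounded queue. The sketch below is only one possible configuration: BoundedPersistenceExecutor, create, backlog and the queueCapacity parameter are hypothetical names introduced here, and CallerRunsPolicy is just one of the standard rejection policies.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPersistenceExecutor {
    /** Builds a single-worker executor whose queue holds at most queueCapacity pending writes. */
    static ThreadPoolExecutor create(int queueCapacity) {
        return new ThreadPoolExecutor(
                1, 1,                        // exactly one worker thread
                0L, TimeUnit.MILLISECONDS,   // keep-alive is irrelevant when core == max
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full, the submitting thread runs the task itself.
                // This slows producers down instead of letting memory grow without bound.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Number of writes accepted but not yet picked up by the worker. */
    static int backlog(ThreadPoolExecutor executor) {
        return executor.getQueue().size();
    }
}
```

Exporting backlog(...) as a metric makes a struggling persistence layer visible before memory becomes a problem. Be aware that with CallerRunsPolicy a rejected write runs on the caller immediately, so it can reach the store ahead of older writes still waiting in the queue; if ordering matters, a policy that blocks or rejects the caller may be a better fit.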

The next challenge is handling cache invalidation or ensuring consistency when reads might bypass the cache and go directly to the source of truth, or when multiple writers are involved.
