Varnish Cache Configuration for High-Throughput Web Apps (2026)

Varnish can actually make your web app slower if configured incorrectly, despite its reputation for speed.

Let’s watch Varnish in action with a simple setup. Imagine a basic Nginx server serving static files, and Varnish sitting in front of it.

# /etc/nginx/sites-available/default
server {
    listen 8080;
    server_name example.com;
    root /var/www/html;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }
}

And here’s a minimal Varnish configuration (/etc/varnish/default.vcl):

vcl 4.1;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Pass the client's IP address to the backend (this overwrites any
    # X-Forwarded-For value supplied by the client or an upstream proxy)
    set req.http.X-Forwarded-For = client.ip;
    # Single-site setup: force a fixed Host header so that a client-supplied
    # value can neither reach the backend nor create extra cache entries
    set req.http.Host = "example.com";
}

sub vcl_deliver {
    # Add a header to indicate if the request was a cache hit or miss
    if (obj.hits > 0) {
        set resp.http.X-Varnish-Cache = "HIT";
    } else {
        set resp.http.X-Varnish-Cache = "MISS";
    }
    return (deliver);
}

To start Varnish, assuming Varnish is installed:

sudo systemctl start varnish
sudo systemctl enable varnish

And Nginx:

sudo systemctl start nginx
sudo systemctl enable nginx

Now, if you run curl -i http://localhost/ (assuming Varnish is listening on port 80; many distribution packages start it on port 6081 instead, so check the -a option in your service unit), the first response will carry X-Varnish-Cache: MISS. Varnish fetches the content from Nginx on port 8080, stores it, and returns it to you. Subsequent requests for the same URL, made while the object is still fresh, will be HITs served directly from Varnish’s cache.

Varnish solves two core problems: the "thundering herd" and the inherent latency of fetching data from a backend application or database for every single request. For the first, Varnish coalesces concurrent requests for the same uncached object into a single backend fetch, so a burst of identical requests does not stampede the origin. For the second, it caches responses in memory (or on disk) and serves identical requests without touching the origin server at all. It acts as a reverse proxy and a highly efficient HTTP cache.

Varnish is driven by a configuration language called VCL (Varnish Configuration Language). VCL lets you define precisely how Varnish handles incoming requests (vcl_recv), what happens after a cache lookup finds an object or finds nothing (vcl_hit, vcl_miss), how it prepares the request sent to the backend (vcl_backend_fetch), how it processes the backend’s response before storing it (vcl_backend_response), and how it delivers responses to clients (vcl_deliver). You can manipulate headers, change backend selection, implement complex cache invalidation strategies, and much more.
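
The subroutines named above can be laid out as a skeleton. This is only a sketch of where each hook sits, reusing the backend from the earlier example. An empty subroutine simply falls through to Varnish’s built-in behavior, so the file below should behave like a stock configuration:

vcl 4.1;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# Client side: runs first for every incoming request.
# Decide here whether to look the object up, bypass the cache, or reject.
sub vcl_recv {
}

# Client side: runs when the cache lookup found a usable object.
sub vcl_hit {
}

# Client side: runs when the cache lookup found nothing usable.
sub vcl_miss {
}

# Backend side: runs just before the request is sent to the origin.
sub vcl_backend_fetch {
}

# Backend side: runs once the origin's response headers have arrived.
# This is where TTLs and cacheability are decided.
sub vcl_backend_response {
}

# Client side: runs just before the response goes back to the client.
sub vcl_deliver {
}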

The backend block defines your origin servers. vcl_recv is where you decide what to do with the incoming request. vcl_deliver is where you can add or modify outgoing headers. The obj.hits variable in vcl_deliver counts how many times the object being delivered has been served from the cache: 0 means it was just fetched from the backend (a miss), and anything greater means the response came from the cache (a hit).

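The req.http.Host manipulation in vcl_recv deserves a closer look. Varnish’s built-in cache key is the request URL plus the Host header (falling back to the server IP when no Host header is present), so by default domainA.com and domainB.com get separate cache entries even if both serve the same /index.html. Forcing Host to a fixed value, as in the example, collapses every hostname into one set of cache entries and makes the backend see a single site. That is appropriate only when this Varnish instance fronts exactly one site. If multiple domains point to the same instance, keep the client’s Host header; otherwise Varnish will serve cached content from domainA.com to a user requesting domainB.com.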
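
When more than one hostname reaches the same Varnish instance, an alternative to a fixed Host value is to normalize the header and reject hostnames you do not serve. The fragment below is a sketch meant to be merged into default.vcl in place of the fixed-Host lines. The hostname www.example.com is a placeholder added for illustration, and the import line belongs near the top of the file, after the vcl 4.1; declaration:

import std;

sub vcl_recv {
    if (req.http.Host) {
        # Lowercase the hostname and strip any ":port" suffix so that
        # "Example.com:80" and "example.com" share one cache entry
        set req.http.Host = std.tolower(regsub(req.http.Host, ":[0-9]+$", ""));
    }

    # Refuse hostnames this instance is not meant to serve
    # (www.example.com is a hypothetical second hostname)
    if (req.http.Host != "example.com" && req.http.Host != "www.example.com") {
        return (synth(404, "Unknown host"));
    }
}

Because the Host header is part of the built-in cache key, each allowed hostname keeps its own cache entries, and unknown hostnames never reach the backend.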

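A common point of confusion is how Varnish determines what to cache and for how long. By default, Varnish derives the TTL from the backend’s Cache-Control (s-maxage, then max-age) and Expires headers. If the backend sends neither, Varnish falls back to the default_ttl parameter, which is 120 seconds unless you change it. The built-in VCL also declines to cache some responses, such as those carrying Set-Cookie or Cache-Control: no-store or private. You can override the TTL in VCL. For instance, to cache a response for 5 minutes when the backend sends no caching headers: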

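sub vcl_backend_response {
    # If the backend sent no caching headers, Varnish has already applied
    # default_ttl; replace it with an explicit 5-minute TTL
    if (!beresp.uncacheable && !beresp.http.Cache-Control && !beresp.http.Expires) {
        set beresp.ttl = 5m; # Cache for 5 minutes
    }
    # No explicit return: fall through to the built-in vcl_backend_response,
    # which still declines to cache responses such as those with Set-Cookie
}

Setting beresp.ttl tells Varnish how long it should consider the object fresh. 5m means 5 minutes; durations take a unit suffix: ms for milliseconds, s for seconds, m for minutes, h for hours, d for days, w for weeks, y for years. Note that inside vcl_backend_response the backend’s response is called beresp, not resp; resp exists only on the client side, in vcl_deliver and vcl_synth. The !beresp.uncacheable check ensures you don’t set a TTL on an object that has already been marked uncacheable, for example because the request was passed.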
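
TTLs can also depend on what is being fetched. The fragment below is a sketch that gives static assets a longer lifetime than other content. The extension list and the one-hour value are illustrative assumptions, not recommendations from this article. In vcl_backend_response the backend’s response is addressed as beresp and the request that triggered the fetch as bereq:

sub vcl_backend_response {
    # Static assets: match the extension at the end of the path,
    # optionally followed by a query string
    if (!beresp.uncacheable &&
        bereq.url ~ "\.(css|js|png|jpg|svg|woff2)(\?.*)?$") {
        set beresp.ttl = 1h; # illustrative value
    }
}

If a VCL file defines the same subroutine more than once, Varnish runs the definitions in the order they appear, so this fragment can sit alongside another vcl_backend_response block. Whichever assignment to beresp.ttl runs last wins.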

The most surprising thing about Varnish’s vcl_recv is that it’s not just for receiving requests; it’s also where you decide whether a request should reach the backend at all. You can reject requests outright by returning a synthetic response, based on criteria like IP address, user agent, or the presence of certain headers, all before your origin server is touched.
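
As a sketch of that idea, the fragment below rejects requests by client address and by user agent before any cache lookup or backend fetch happens. The ACL name blocked, the address range (a reserved documentation range), and the "badbot" pattern are all placeholders chosen for illustration:

# Hypothetical block list; 192.0.2.0/24 is a documentation-only range
acl blocked {
    "192.0.2.0"/24;
}

sub vcl_recv {
    # Reject by client address
    if (client.ip ~ blocked) {
        return (synth(403, "Forbidden"));
    }

    # Reject by user agent; (?i) makes the match case-insensitive
    if (req.http.User-Agent ~ "(?i)badbot") {
        return (synth(403, "Forbidden"));
    }
}

return (synth(...)) makes Varnish generate the response itself, so neither the cache nor the backend is involved. Keep in mind that client.ip is the address of whatever connected to Varnish, which will be a load balancer’s address if one sits in front of it.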

The next concept you’ll likely grapple with is cache invalidation. How do you tell Varnish to remove an item from the cache when the content changes on your backend, before its TTL expires?