eBPF lets you observe system behavior in ways that were previously impossible, but its true power lies in measuring latency across system layers, not just within them.
Let’s see how this works. Imagine a simple HTTP request. A user clicks a link, and a browser on a client machine sends a GET request to a web server.
Client Browser -> Network -> Load Balancer -> Web Server -> Database -> Web Server -> Network -> Client Browser
We want to measure the time each of these segments takes. Traditional tools give us pieces: browser dev tools for the client, ping for network, web server logs for its processing, and database slow query logs. But stitching them together and pinpointing where delays occur is a nightmare.
eBPF, however, can attach to kernel events at each step. We can trace:
- Kernel Network Stack Entry/Exit: When the request enters the kernel on the server and when the response leaves.
- Application Socket Operations: When the web server reads the request from its socket and when it writes the response back.
- Application-Level Logic: If the web server calls out to other services (like a database), we can trace those calls.
- Database Query Execution: The time spent actually running the SQL.
Here’s a simplified view of how you might instrument this with eBPF. We’ll use bpftrace, a high-level tracing language for eBPF.
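First, let’s trace the time a request spends in the kernel’s TCP receive path on the server. We’ll measure how long tcp_v4_rcv, the IPv4 TCP receive handler, takes for each incoming segment. This covers TCP-layer processing up to queuing the data on the socket; it does not include driver or IP-layer work that happens before the handler is called.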
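kprobe:tcp_v4_rcv
{
    // Parse the TCP header so we can filter on destination port.
    // Assumes a kernel with BTF so struct sk_buff and struct tcphdr resolve.
    $skb = (struct sk_buff *)arg0;
    $th = (struct tcphdr *)($skb->head + $skb->transport_header);
    // tcphdr.dest is big-endian; swap the bytes to get the host-order port.
    $port = (($th->dest & 0xff) << 8) | ($th->dest >> 8);
    if ($port == 80) {
        // A kretprobe cannot read entry arguments, so key by CPU:
        // tcp_v4_rcv runs in softirq context and stays on one CPU.
        @start_time[cpu] = nsecs;
    }
}

kretprobe:tcp_v4_rcv
/@start_time[cpu]/
{
    $latency = nsecs - @start_time[cpu];
    printf("TCP RX Latency (kernel): %d ns\n", $latency);
    delete(@start_time[cpu]);
}
This bpftrace script attaches to the entry and return of tcp_v4_rcv, which runs for each received IPv4 TCP segment. At entry it reads the destination port from the TCP header and, for port 80 only, stores a timestamp in the @start_time map. A kretprobe cannot see the function’s arguments, so the socket buffer pointer is not available as a key at return; the script keys by CPU instead, which is safe as long as the handler is not migrated mid-call. On return it prints the elapsed time and deletes the entry. The header parsing assumes the kernel exposes BTF type information; without it you would need to include the relevant kernel headers.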
Next, let’s measure the time spent within the web server application itself, from when it reads data from its socket to when it writes data back. This requires tracing user-space functions.
// nginx_read_request and nginx_write_response are hypothetical names;
// substitute functions that exist in your Nginx build.
uprobe:/usr/local/nginx/sbin/nginx:nginx_read_request
{
    $conn_info = arg0; // Assuming arg0 is a connection identifier
    @start_time[$conn_info] = nsecs;
}

uprobe:/usr/local/nginx/sbin/nginx:nginx_write_response
/@start_time[arg0]/
{
    $conn_info = arg0;
    $latency = nsecs - @start_time[$conn_info];
    printf("Nginx Processing Latency: %d ns\n", $latency);
    delete(@start_time[$conn_info]);
}
This example uses uprobe to hook into user-space functions within the Nginx binary. The nginx_read_request and nginx_write_response names are hypothetical; they stand in for whichever functions mark the point where Nginx starts processing an incoming request and the point where it begins sending a response. The key conn_info must be a value that uniquely identifies the request or connection and that is visible as an argument in both probes.
To get the full picture, we’d also trace database calls. If your web server uses a library like libpq for PostgreSQL, you could trace its exported functions, such as PQexec.
uprobe:/usr/lib/x86_64-linux-gnu/libpq.so.5:PQexec
{
    // PQexec blocks the calling thread until the result arrives,
    // so the thread ID pairs each entry with its return.
    @start_time[tid] = nsecs;
}

uretprobe:/usr/lib/x86_64-linux-gnu/libpq.so.5:PQexec
/@start_time[tid]/
{
    // A uretprobe sees only the return value, not arg0 (the PGconn pointer).
    $latency = nsecs - @start_time[tid];
    printf("PostgreSQL Query Latency: %d ns\n", $latency);
    delete(@start_time[tid]);
}
This traces the PQexec function in libpq, measuring the time spent by the PostgreSQL client library waiting for a query to complete.
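The scripts above print one line per measurement, which gets hard to read under load. As a minimal sketch, the Python below groups those lines by layer and reports a count, median, and maximum. The function name summarize and the sample lines are assumptions made for illustration; only the "<layer>: <n> ns" line format comes from the printf calls above.

```python
import re
import statistics
from collections import defaultdict

# Matches lines such as "PostgreSQL Query Latency: 50000000 ns".
LINE_RE = re.compile(r"^(?P<layer>[^:]+): (?P<ns>\d+) ns$")


def summarize(lines):
    """Group latency samples by layer; return count, median and max in ns."""
    samples = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # skips banners such as "Attaching 2 probes..."
            samples[match["layer"]].append(int(match["ns"]))
    return {
        layer: {
            "count": len(values),
            "median_ns": statistics.median(values),
            "max_ns": max(values),
        }
        for layer, values in samples.items()
    }


# Hypothetical captured output; real values depend on your system.
captured = [
    "Attaching 2 probes...",
    "TCP RX Latency (kernel): 48000 ns",
    "TCP RX Latency (kernel): 52000 ns",
    "PostgreSQL Query Latency: 50000000 ns",
]

for layer, stats in summarize(captured).items():
    print(layer, stats)
```

In practice you would pipe the bpftrace output into a script like this, or let bpftrace aggregate in the kernel with its built-in hist() map function instead of printing every sample.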
By combining these, you can construct a full end-to-end latency profile. The total request latency observed by the client is approximately the sum of:
Network TX Latency (client) + Network RX Latency (server) + TCP RX Latency (server) + Nginx Processing Latency (excluding time spent waiting on the database) + Database Query Latency + Nginx TX Latency + Network TX Latency (server) + Network RX Latency (client)
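One caveat when adding these terms: the Nginx probes measure from read to write, so their span already contains any PQexec time nested inside it. A minimal sketch of subtracting nested spans before summing follows; the helper name exclusive_ns and the numbers are illustrative assumptions, not measurements.

```python
def exclusive_ns(span_ns, child_spans_ns):
    """Time spent in a span itself, excluding nested child spans."""
    return span_ns - sum(child_spans_ns)


# Illustrative values only, in nanoseconds.
nginx_span_ns = 52_000_000  # read-to-write span from the Nginx probes
db_query_ns = 50_000_000    # PQexec duration nested inside that span

nginx_own_ns = exclusive_ns(nginx_span_ns, [db_query_ns])
print(f"Nginx own time: {nginx_own_ns / 1e6:.1f} ms")  # prints "Nginx own time: 2.0 ms"
```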
The most surprising thing about measuring latency with eBPF is how many "black boxes" you can illuminate without modifying application code. You’re not instrumenting your Go application with OpenTelemetry or adding print statements to your Python script; you’re observing the system’s kernel and libraries from the outside, using events that already happen. This means you can trace third-party applications, legacy systems, or even the kernel itself.
The real power comes from correlating these measurements. Imagine an eBPF program that collects timestamps for each of these events for a specific request ID (if you can propagate one, e.g., via HTTP headers and then into kernel skb metadata). You can then aggregate these timestamps to show a waterfall of latency across layers for that single request.
For example, you might see:
- Network ingress: 100 µs
- Kernel TCP processing: 50 µs
- Nginx read/write: 2 ms
- PostgreSQL query: 50 ms
- Nginx response: 500 µs
- Network egress: 200 µs
This immediately points to the database query as the bottleneck.
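A waterfall like this can be derived from raw timestamps once each event carries a request ID. The sketch below assumes every event records the moment a segment ends, and that all timestamps come from one host’s monotonic clock (timestamps from different machines are not directly comparable). The Event type, the function names, and the sample data are hypothetical; the durations mirror the example figures above.

```python
from typing import NamedTuple


class Event(NamedTuple):
    request_id: str
    boundary: str  # name of the segment that ends at this timestamp
    ts_ns: int     # monotonic timestamp in nanoseconds


def waterfall(events, request_id, start_ns):
    """Return [(segment, duration_ns)] for one request, in time order."""
    own = sorted(
        (e for e in events if e.request_id == request_id),
        key=lambda e: e.ts_ns,
    )
    segments, previous = [], start_ns
    for event in own:
        segments.append((event.boundary, event.ts_ns - previous))
        previous = event.ts_ns
    return segments


def bottleneck(segments):
    """Return the (segment, duration_ns) pair with the largest duration."""
    return max(segments, key=lambda s: s[1])


# Hypothetical events; "req-2" shows that other requests are filtered out.
EVENTS = [
    Event("req-1", "Network ingress", 100_000),
    Event("req-1", "Kernel TCP processing", 150_000),
    Event("req-2", "Network ingress", 120_000),
    Event("req-1", "Nginx read/write", 2_150_000),
    Event("req-1", "PostgreSQL query", 52_150_000),
    Event("req-1", "Nginx response", 52_650_000),
    Event("req-1", "Network egress", 52_850_000),
]

segments = waterfall(EVENTS, "req-1", start_ns=0)
for name, duration_ns in segments:
    print(f"{name:<24}{duration_ns / 1000:>10.0f} µs")
print("Slowest segment:", bottleneck(segments)[0])
```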
The one thing most people don’t know is that kernel functions often have direct counterparts or related functions that mark the exact boundary of a layer’s responsibility. For instance, tcp_v4_rcv might be too broad. You might instead trace skb_copy_datagram_iter (skb_copy_datagram_iovec on kernels before 3.19) to measure the time spent copying data from the kernel’s socket buffer into the application’s user-space buffer, which is a more precise boundary for the kernel’s role in delivering data to the application. Check which of these symbols your kernel exposes before attaching a probe.
Once you’ve mastered measuring latency, the next challenge is correlating these measurements across distributed systems.