bpftrace dynamically traces kernel and user-space events using a high-level scripting language, which makes it a practical tool for understanding how a running Linux system behaves.

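Let’s see bpftrace in action tracing system calls. Imagine we want to see every time a process calls write, along with its process ID and the number of bytes it asks to write.

```bpftrace
tracepoint:syscalls:sys_enter_write
/pid == 12345/
{
  // args->count is write()'s third argument: the number of bytes requested
  printf("PID %d writing %d bytes\n", pid, args->count);
}
```

Save this as trace_write.bt and run it with sudo bpftrace trace_write.bt. Now, whenever the process with PID 12345 calls write, you’ll see output like: PID 12345 writing 1024 bytes.

This script targets the sys_enter_write tracepoint, which fires on entry to the write system call, before the kernel writes anything. The /pid == 12345/ part is a filter (bpftrace calls it a predicate) that lets only events from that one process reach the action block. Inside the curly braces, printf prints the current process ID (pid) and args->count, the third argument of write(fd, buf, count), which is the number of bytes the process asked to write. Tracepoint arguments are read by name through args. The positional arg0, arg1, and so on are only available in kprobes, uprobes, and USDT probes. Note also that write’s first argument is the file descriptor, not the byte count.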
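Because sys_enter_write fires before any data is written, it can only report the byte count the process requested. To see the bytes actually written, you can read write()’s return value at the matching exit tracepoint, sys_exit_write. The sketch below does this under two assumptions: it reuses the example PID 12345, and it relies on the standard syscalls tracepoint field ret, which holds a negative error value when the call fails.

```bpftrace
// Sketch: report the result of each write() made by PID 12345 (example PID).
tracepoint:syscalls:sys_exit_write
/pid == 12345/
{
  // args->ret is write()'s return value: bytes written, or a negative errno
  if (args->ret >= 0) {
    printf("PID %d wrote %d bytes\n", pid, args->ret);
  } else {
    printf("PID %d write failed: %d\n", pid, args->ret);
  }
}
```

Running both scripts side by side shows cases where a process asked for more bytes than it got, such as a partial write to a full pipe.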

The real strength of bpftrace is that it goes beyond static tracepoints. You can attach probes to kernel functions (kprobes), user-space functions (uprobes), kernel tracepoints, and even USDT (User-level Statically Defined Tracing) markers compiled into applications.
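Here is a small kprobe example: a sketch that counts calls to the kernel function vfs_read() per process name. It assumes vfs_read is available for probing on your kernel. Running bpftrace -l 'kprobe:vfs_read' lists it if so, and kernel function names can change between versions.

```bpftrace
// Sketch: count vfs_read() calls per process name until Ctrl+C.
kprobe:vfs_read
{
  // comm is the current task's name; count() adds one per call for that key
  @reads[comm] = count();
}
```

bpftrace prints non-empty maps automatically when it exits, so this script needs no END block.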

Consider tracing function calls within a specific user-space application, like Nginx. Suppose an Nginx worker process (workers, not the master, handle requests) is running with PID 54321, and we want to see every time the ngx_http_process_request function is called:

```bpftrace
uprobe:/usr/sbin/nginx:ngx_http_process_request
{
  printf("Nginx request processing started\n");
}
```

Save the script as uprobe.bt and run sudo bpftrace -p 54321 uprobe.bt. It prints "Nginx request processing started" each time that function is invoked. The -p 54321 flag attaches the probe only to the process with that PID. The uprobe locates ngx_http_process_request by its symbol name, so the nginx binary typically needs its symbol table; a fully stripped build may not resolve it.
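Request handling is spread across several worker processes, so a single -p PID sees only one of them. Without -p, a uprobe on a binary path fires in every process running that binary. The following sketch uses that behavior to count calls per worker. It assumes the same /usr/sbin/nginx path and an nginx build that still carries the ngx_http_process_request symbol.

```bpftrace
// Sketch: count ngx_http_process_request() calls per nginx process.
// Run without -p so the uprobe fires in every process executing this binary.
uprobe:/usr/sbin/nginx:ngx_http_process_request
{
  @requests[pid] = count();
}
```

An uneven spread of counts across PIDs can hint that load is not balanced across workers.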

bpftrace compiles each script into eBPF programs and loads them into the kernel. Before a program runs, the kernel’s eBPF verifier checks it and rejects code that could loop without bound or access memory unsafely, so a buggy script fails to load rather than crashing the kernel. bpftrace handles the details of satisfying the verifier and attaching to the kernel’s tracing infrastructure.

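You can also aggregate data. For instance, suppose you want to count how many times Nginx finishes requests with each HTTP status code, and you can hook into a function that receives that code. Here we use ngx_http_finalize_request. In Nginx’s source it is declared as ngx_http_finalize_request(ngx_http_request_t *r, ngx_int_t rc), so the code arrives in the second argument (arg1), while arg0 is the request pointer:

```bpftrace
uprobe:/usr/sbin/nginx:ngx_http_finalize_request
/arg1 >= 100 && arg1 < 600/
{
  // arg1 is rc; negative internal codes wrap to huge unsigned values and fail "< 600"
  $status = arg1;
  @status_map[$status] = count();
}

END
{
  print(@status_map);
  clear(@status_map);
}
```

This script stores its counts in a map called @status_map. In bpftrace, map names start with @, while names starting with $, like $status, are scratch variables that exist only while one probe runs. Each time ngx_http_finalize_request is called with a value in the HTTP status range, the script increments the count for that code. The predicate skips Nginx’s internal result codes, such as NGX_OK, which is 0. The END block runs when bpftrace exits (e.g., on Ctrl+C) and prints the map, showing status codes and their counts. clear() then empties the map so that bpftrace’s automatic printing of maps at exit does not show it twice.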
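Aggregations are not limited to uprobes. The next sketch tracks write() traffic per process name from the syscall tracepoint. Instead of waiting for END, it uses an interval probe to print and reset its maps every five seconds.

```bpftrace
// Sketch: per-process write() call counts and requested bytes, every 5 s.
tracepoint:syscalls:sys_enter_write
{
  @calls[comm] = count();
  // args->count is the number of bytes requested, not necessarily written
  @bytes[comm] = sum(args->count);
}

interval:s:5
{
  time("%H:%M:%S\n");
  print(@calls);
  print(@bytes);
  clear(@calls);
  clear(@bytes);
}
```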

Maps persist across probe firings, which lets bpftrace correlate events. You can store context, such as a timestamp or a request pointer, in a map keyed by thread ID in one probe and read it back in another. This lets you follow a request through multiple functions in both the kernel and user space. The result is deep, end-to-end performance analysis and debugging that would otherwise require writing custom kernel modules.
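Here is a minimal sketch of passing context between probes, using write() instead of a full request path. The entry probe stores a timestamp in a map keyed by thread ID, and the exit probe looks it up to compute how long each call took. Keying by tid rather than pid keeps concurrent calls from different threads of the same process apart.

```bpftrace
// Sketch: average write() latency per process name, in microseconds.
tracepoint:syscalls:sys_enter_write
{
  // Remember when this thread entered write()
  @start[tid] = nsecs;
}

tracepoint:syscalls:sys_exit_write
/@start[tid]/
{
  @avg_latency_us[comm] = avg((nsecs - @start[tid]) / 1000);
  // Drop the entry so the map does not grow without bound
  delete(@start[tid]);
}

END
{
  // Discard timestamps of calls still in flight so they are not printed
  clear(@start);
}
```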

What most people miss is that a bpftrace script is a real program, not a list of print statements. bpftrace parses the script into an AST (Abstract Syntax Tree) and compiles it, via LLVM, to eBPF bytecode, so conditionals, variables, and maps all run inside the kernel. This means you can perform complex filtering, aggregate data, and even trigger actions based on observed events, all without recompiling the kernel or loading kernel modules. The hist() aggregation function, for example, builds power-of-two histograms directly in the kernel. Only the bucket counts, not every individual event, need to be copied back to user space.
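To show in-kernel aggregation combined with filtering and control flow, the following sketch builds a histogram of requested write() sizes for processes named nginx. That process name is an assumed example. The script also counts writes larger than 64 KiB separately, and only the histogram buckets and the counter reach user space.

```bpftrace
// Sketch: histogram of write() sizes for processes named "nginx" (example filter).
tracepoint:syscalls:sys_enter_write
/comm == "nginx"/
{
  // hist() buckets values by powers of two inside a kernel map
  @write_bytes = hist(args->count);

  if (args->count > 65536) {
    @large_writes = count();
  }
}
```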

The next step is understanding how to use bpftrace for network traffic analysis, perhaps by tracing socket send/receive operations.