You can observe many kernel events without modifying kernel source code by attaching eBPF programs to tracepoints.

Let’s see this in action. Imagine we want to track every time a specific process, say nginx, opens a file.

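First, we need to identify the kernel tracepoint that corresponds to file opening. The syscalls:sys_enter_openat2 tracepoint, which fires on entry to the openat2 system call, is the one used here. (do_sys_openat2 is the kernel function behind that system call; a function like that is reached with a kprobe, not a tracepoint.) Keep in mind that many programs open files through openat or open rather than openat2, depending on the C library, so check which variant your workload uses. We can list the available tracepoints with sudo perf list 'syscalls:sys_enter_openat*'.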

Now, let’s write a simple eBPF program to attach to this tracepoint. We’ll use bpftrace, a high-level tracing language for eBPF.

tracepoint:syscalls:sys_enter_openat2 /pid == 12345/ {
  printf("Process %d (%s) opened file: %s\n", pid, comm, str(args->filename));
}

In this bpftrace script:

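  • tracepoint:syscalls:sys_enter_openat2: This specifies that we want to hook into the sys_enter_openat2 tracepoint, which fires on entry to the openat2 system call, before the kernel has tried to open the file. The probe therefore reports open attempts, including ones that later fail.
  • /pid == 12345/: This is a filter. We’re only interested in events originating from the process with PID 12345. You’d replace 12345 with the actual PID of your nginx process. Note that nginx usually runs a master process and one or more workers, each with its own PID.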
  • { ... }: This is the action block that executes when the tracepoint is hit and the filter matches.
  • printf(...): This prints a formatted string to the console.
  • pid: This is a built-in variable in bpftrace representing the process ID.
  • comm: Another built-in variable, representing the command name of the process.
  • str(args->filename): args is a special object in bpftrace that holds the fields of the tracepoint, which for syscall tracepoints are the system call’s arguments. args->filename is the filename argument of openat2, a pointer to a NUL-terminated string in user memory, and str() reads that string so it can be printed.

To run this, you’d first find the PID of your nginx process (e.g., pgrep nginx, which may print several PIDs; pick the one you want to watch). Let’s assume it’s 7890. Then you’d run:

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat2 /pid == 7890/ { printf("Process %d (%s) opened file: %s\n", pid, comm, str(args->filename)); }'

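Now, whenever that nginx process opens a file through openat2 (a config file, a static asset, a log), you’ll see output like this in your terminal. If nothing appears, the process is probably opening files through a different system call, such as openat; attach to that tracepoint instead.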

Attaching 1 probe...
Process 7890 (nginx) opened file: /etc/nginx/nginx.conf
Process 7890 (nginx) opened file: /var/log/nginx/access.log
Process 7890 (nginx) opened file: /usr/share/nginx/html/index.html
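Filtering on a single PID can miss activity when nginx runs several worker processes. One variation is to match on the command name instead. The sketch below assumes that is acceptable for your setup. make_open_probe is a hypothetical helper introduced here for illustration, and its optional second parameter is likewise an assumption, added so the same program can target sys_enter_openat.

```shell
#!/bin/sh
# make_open_probe is a hypothetical helper for this sketch, not a bpftrace feature.
# It prints a bpftrace program that filters on the command name instead of a PID.
#   $1: command name as the kernel reports it (comm is truncated to 15 characters)
#   $2: syscall tracepoint to attach to (defaults to sys_enter_openat2)
make_open_probe() {
  cat <<EOF
tracepoint:syscalls:${2:-sys_enter_openat2} /comm == "$1"/ {
  printf("Process %d (%s) opened file: %s\n", pid, comm, str(args->filename));
}
EOF
}
```

You would then run it with sudo bpftrace -e "$(make_open_probe nginx)", or pass sys_enter_openat as the second argument if that is the system call your build uses.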

The power here lies in the fact that tracepoints are static markers embedded within the kernel’s code. They are designed specifically for instrumentation and debugging. Unlike kprobes, which can be attached to arbitrary kernel functions (and thus are more dynamic but potentially more fragile if kernel functions change), tracepoints represent well-defined event boundaries. eBPF programs attach to these tracepoints and run a small piece of code (the action block) each time one fires. The program stays attached until the tracing tool exits, and none of this requires a kernel recompile or reboot. This makes tracepoints a comparatively stable and reliable interface for observing system behavior.

The fields of args come from the tracepoint’s definition; each tracepoint has a fixed set. For sys_enter_openat2, these are dfd, filename, how, and usize, mirroring the arguments of the openat2 system call; the open flags and mode live inside the open_how structure that how points to. bpftrace provides convenient ways to access and interpret these, like str() for null-terminated strings.
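To check which fields a given tracepoint exposes on your kernel, bpftrace can list a probe together with its arguments. A minimal sketch follows; show_tracepoint_fields is a hypothetical wrapper name, while the -l (list) and -v (verbose) flags are bpftrace’s own.

```shell
#!/bin/sh
# show_tracepoint_fields is a hypothetical wrapper around bpftrace's verbose
# listing mode (-l lists matching probes, -v adds their argument fields).
#   $1: tracepoint category, e.g. syscalls
#   $2: tracepoint name, e.g. sys_enter_openat2
show_tracepoint_fields() {
  sudo bpftrace -lv "tracepoint:$1:$2"
}
```

Calling show_tracepoint_fields syscalls sys_enter_openat2 prints the field names and types you can reach through args.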

While bpftrace offers a high-level abstraction, under the hood, it’s generating eBPF bytecode that gets verified by the kernel’s verifier and then loaded into the kernel for execution. This verification step is crucial; it is designed to reject programs that could crash the kernel or access arbitrary memory. A program is loaded only if the verifier can establish that it terminates and touches only memory it is allowed to.

The tracepoints themselves are part of the kernel’s tracing infrastructure and are exposed through the tracefs filesystem (usually mounted at /sys/kernel/tracing, and also reachable at /sys/kernel/debug/tracing on many systems). You can explore available tracepoints and their associated arguments using tools like perf or by directly examining files within tracefs.
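Each tracepoint has a directory under events/ in tracefs, and the format file inside it describes the tracepoint’s fields. The sketch below builds that path. The TRACEFS variable and the tracepoint_format_path helper are assumptions introduced for illustration, and the mount point may differ on your system.

```shell
#!/bin/sh
# TRACEFS is an assumption: newer systems mount tracefs at /sys/kernel/tracing,
# older ones expose it at /sys/kernel/debug/tracing. Override it if needed.
TRACEFS="${TRACEFS:-/sys/kernel/tracing}"

# tracepoint_format_path is a hypothetical helper that prints the path of the
# "format" file describing a tracepoint's fields.
#   $1: tracepoint category, e.g. syscalls
#   $2: tracepoint name, e.g. sys_enter_openat2
tracepoint_format_path() {
  printf '%s/events/%s/%s/format\n' "$TRACEFS" "$1" "$2"
}
```

Reading the file typically requires root, for example sudo cat "$(tracepoint_format_path syscalls sys_enter_openat2)".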

This approach allows you to gain deep visibility into kernel operations for performance analysis, security monitoring, or debugging complex issues with very little risk to your system’s stability, though probes on very frequent events do add measurable overhead.

The next logical step is to correlate these file open events with network activity, perhaps by attaching kprobes to kernel functions such as tcp_connect or tcp_sendmsg and seeing which files are accessed just before or during network operations.
