eBPF can tell you what’s actually happening on your system, not just what the logs say is happening.
Let’s see eBPF in action. Imagine you’ve got a web server, and you suspect something’s poking around it. You want to know immediately if a process tries to open a sensitive file it shouldn’t, or if it’s trying to establish a network connection to a known bad IP.
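Here’s a basic eBPF program, written in C, that attaches to the syscalls:sys_enter_openat tracepoint. It runs every time a process calls openat(2), which is how most file opens reach the kernel on modern Linux, since current glibc routes open() through openat. The legacy open syscall and the newer openat2 have their own tracepoints, so a complete monitor would hook those too.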
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

// Define a structure to pass data from kernel to userspace
struct event {
    u64 timestamp;
    int pid;
    char comm[TASK_COMM_LEN];
    char filename[256]; // Max filename length we'll capture
};

// Define a map to send events to userspace
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024); // 256 KB ring buffer
} rb SEC(".maps");

SEC("tp/syscalls/sys_enter_openat")
int handle_openat(struct trace_event_raw_sys_enter *ctx) {
    struct event *e;

    // Reserve space for one event in the ring buffer
    e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
    if (!e) {
        return 0; // Buffer is full: drop this event and exit
    }

    // Populate the event data
    e->timestamp = bpf_ktime_get_ns();
    // The upper 32 bits hold the TGID, which is what userspace calls the PID
    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&e->comm, sizeof(e->comm)); // Get process name

    // Safely copy the filename from user memory.
    // The filename is the second argument to openat (args[1]).
    const char *filename_ptr = (const char *)ctx->args[1];
    bpf_probe_read_user_str(e->filename, sizeof(e->filename), filename_ptr);

    // Submit the event to the ring buffer
    bpf_ringbuf_submit(e, 0);
    return 0;
}

char _license[] SEC("license") = "GPL";
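This C code compiles into BPF bytecode with clang’s BPF target (for example, clang -O2 -g -target bpf -c prog.bpf.c -o prog.bpf.o, where prog.bpf.c is a placeholder file name). The vmlinux.h header is generated from the running kernel’s BTF with bpftool btf dump file /sys/kernel/btf/vmlinux format c. You’d then use a tool like bpftool or a userspace library (like libbpf) to load the object into the kernel and attach it to the sys_enter_openat tracepoint. The BPF ring buffer map requires Linux 5.8 or newer.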
On the userspace side, you’d have a program that continuously reads from the rb ring buffer, decodes the struct event, and then applies your detection logic. For example, it might check if e->filename matches /etc/shadow and if e->comm is something unexpected like nc or wget.
import ctypes
import time
from bcc import BPF

# This script uses BCC, which compiles the kernel-side program at load time.
# BCC has its own C dialect: there is no vmlinux.h or SEC() annotation, and
# maps and tracepoint handlers are declared with BCC macros instead. The
# program below is the BCC counterpart of the libbpf-style C shown earlier.
bpf_text = r"""
#include <linux/sched.h>  // TASK_COMM_LEN

struct event {
    u64 timestamp;
    int pid;
    char comm[TASK_COMM_LEN];
    char filename[256];
};

// 64 pages * 4 KiB = 256 KB ring buffer (needs kernel 5.8+)
BPF_RINGBUF_OUTPUT(rb, 64);

TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    struct event *e = rb.ringbuf_reserve(sizeof(struct event));
    if (!e) {
        return 0;  // Buffer is full: drop this event
    }
    e->timestamp = bpf_ktime_get_ns();
    e->pid = bpf_get_current_pid_tgid() >> 32;  // upper 32 bits = TGID
    bpf_get_current_comm(&e->comm, sizeof(e->comm));
    // BCC exposes tracepoint fields by name through `args`
    bpf_probe_read_user_str(e->filename, sizeof(e->filename), args->filename);
    rb.ringbuf_submit(e, 0);
    return 0;
}
"""

# Compile, load, and attach the BPF program (requires root)
b = BPF(text=bpf_text)

# Mirror of struct event for decoding in Python
class Event(ctypes.Structure):
    _fields_ = [
        ("timestamp", ctypes.c_ulonglong),
        ("pid", ctypes.c_int),
        ("comm", ctypes.c_char * 16),  # TASK_COMM_LEN is usually 16
        ("filename", ctypes.c_char * 256),
    ]

# Example allow-list of commands expected to open /etc/shadow.
# comm is easy to spoof, so treat this as a heuristic, not a guarantee.
ALLOWED = {"sshd", "login", "vim", "nano"}

# Callback invoked for each ring buffer record
def print_event(ctx, data, size):
    event = ctypes.cast(data, ctypes.POINTER(Event)).contents
    comm = event.comm.decode(errors="replace")
    filename = event.filename.decode(errors="replace")
    print(f"[{time.strftime('%H:%M:%S')}] PID: {event.pid} COMM: {comm} FILENAME: {filename}")
    # Add your intrusion detection logic here
    if "/etc/shadow" in filename and comm not in ALLOWED:  # Example sensitive file check
        print(f"ALERT: Suspicious access to {filename} by {comm} (PID: {event.pid})")

# Attach the callback to the ring buffer
b["rb"].open_ring_buffer(print_event)

print("Listening for file open events. Press Ctrl+C to stop.")
try:
    while True:
        # Block until new events arrive, then run the callback for each
        b.ring_buffer_poll()
except KeyboardInterrupt:
    print("Stopping.")
When you run this Python script with root privileges, you’ll start seeing output like this:
Listening for file open events. Press Ctrl+C to stop.
[14:30:01] PID: 12345 COMM: bash FILENAME: /proc/meminfo
[14:30:01] PID: 12345 COMM: bash FILENAME: /usr/lib/locale/locale-archive
[14:30:02] PID: 67890 COMM: nginx FILENAME: /var/log/nginx/access.log
[14:30:03] PID: 11223 COMM: vi FILENAME: /etc/passwd
[14:30:04] PID: 11224 COMM: nc FILENAME: /etc/shadow
ALERT: Suspicious access to /etc/shadow by nc (PID: 11224)
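To make the decode-then-decide step easier to test without loading anything into the kernel, here is a minimal sketch that separates it into two small functions. The names decode_event, is_suspicious, make_raw, SENSITIVE_PATHS, and ALLOWED_COMMS are hypothetical, and the policy sets are placeholders rather than a recommended rule set. Note that openat can receive a path relative to a directory file descriptor, which this simple path comparison does not resolve.

```python
import ctypes
import os.path
import struct

TASK_COMM_LEN = 16  # assumed; this is the usual kernel value


class Event(ctypes.Structure):
    """Userspace mirror of struct event from the BPF program."""
    _fields_ = [
        ("timestamp", ctypes.c_ulonglong),
        ("pid", ctypes.c_int),
        ("comm", ctypes.c_char * TASK_COMM_LEN),
        ("filename", ctypes.c_char * 256),
    ]


# Hypothetical policy: which files matter and which commands may open them.
SENSITIVE_PATHS = {"/etc/shadow"}
ALLOWED_COMMS = {"sshd", "login", "vim", "nano"}


def decode_event(raw: bytes) -> Event:
    """Copy one raw record into an Event, zero-padding up to the struct size."""
    return Event.from_buffer_copy(raw.ljust(ctypes.sizeof(Event), b"\x00"))


def is_suspicious(filename: str, comm: str) -> bool:
    """True when a sensitive path is opened by a command not on the allow-list."""
    # normpath collapses "/./", "/../" and repeated inner slashes, but it
    # cannot resolve a path that is relative to a directory file descriptor.
    return os.path.normpath(filename) in SENSITIVE_PATHS and comm not in ALLOWED_COMMS


def make_raw(ts: int, pid: int, comm: bytes, filename: bytes) -> bytes:
    """Build a fake record with the same field order, for offline testing."""
    return struct.pack("=Qi16s256s", ts, pid, comm, filename)


if __name__ == "__main__":
    ev = decode_event(make_raw(1, 11224, b"nc", b"/etc/shadow"))
    if is_suspicious(ev.filename.decode(), ev.comm.decode()):
        print(f"ALERT: {ev.comm.decode()} (PID {ev.pid}) opened {ev.filename.decode()}")
```

Keeping the rule in a pure function means you can unit-test the policy against recorded events and swap in a stricter check later (for example, matching on the executable path instead of comm, which a process can change).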
The power here is that you’re not waiting for auditd to log something, or for an application to generate an error. You’re seeing system calls as they happen, directly from the kernel. This allows for near real-time detection of anomalous behavior.
The core problem this solves is visibility. Traditional logging and monitoring tools often operate at a higher level of abstraction. They might tell you "a process failed," but not why or what exact kernel action led to that failure. eBPF gives you direct access to the kernel’s execution context.
Internally, eBPF lets you load small, sandboxed programs into the kernel. These programs attach to hook points such as tracepoints (events emitted by the kernel), kprobes (kernel function entry/exit), or uprobes (userspace function entry/exit). When the hooked event occurs, your eBPF program runs. It can inspect kernel data structures, make decisions, and send data back to userspace via maps (like the ring buffer we used). Before loading a program, the kernel’s verifier statically checks it and rejects unbounded loops and unchecked or out-of-bounds memory accesses, so that a loaded program cannot crash or hang the kernel.
The exact levers you control are the hook points you choose and the logic within your eBPF program. You can hook into thousands of different kernel events: network packets arriving (xdp, kprobe/tcp_v4_rcv), file system operations (sys_enter_read, sys_enter_write), process creation (sys_enter_clone), scheduler events, and much more. Your eBPF program then decides what data to collect and what conditions constitute an "event" or "alert."
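To get a feel for which tracepoint hooks exist on a given machine, you can filter the kernel’s event list. The sketch below assumes the list has already been read as text from /sys/kernel/tracing/available_events (on some systems it lives under /sys/kernel/debug/tracing instead), where each line has the form subsystem:event. The function name syscall_entry_hooks is hypothetical, and the sample text is illustrative, not captured from a real system.

```python
from typing import List


def syscall_entry_hooks(available_events: str, keyword: str = "") -> List[str]:
    """Return libbpf-style section names for syscall-entry tracepoints.

    available_events is the text of the kernel's available_events file.
    keyword optionally narrows the result to events containing that substring.
    """
    hooks = []
    for line in available_events.splitlines():
        subsystem, sep, event = line.strip().partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        if subsystem == "syscalls" and event.startswith("sys_enter_") and keyword in event:
            hooks.append(f"tp/{subsystem}/{event}")
    return sorted(hooks)


if __name__ == "__main__":
    # Illustrative sample; on a real system, read the file as root instead.
    sample = (
        "syscalls:sys_enter_openat\n"
        "syscalls:sys_exit_openat\n"
        "sched:sched_process_exec\n"
        "syscalls:sys_enter_openat2\n"
        "syscalls:sys_enter_clone\n"
    )
    for hook in syscall_entry_hooks(sample, "open"):
        print(hook)
```

The returned strings use the same tp/subsystem/event form as the SEC() annotation in the C program above, so each one is a candidate attach point for a handler.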
The most surprising thing is how much fine-grained control you have over which syscall arguments you read. For openat, the filename is the second argument (args[1]). For execve, the arguments are more involved: filename (args[0]), argv (args[1]), and envp (args[2]). Both argv and envp are pointers to NULL-terminated arrays of string pointers, all living in user memory. To get the actual strings, you read each pointer out of the array with bpf_probe_read_user, then read the string it points to with bpf_probe_read_user_str. Because the verifier requires bounded loops, you also have to cap how many entries you walk. The older bpf_probe_read helper does not distinguish user from kernel addresses, so prefer the _user variants for syscall arguments.
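The two-step pointer walk for argv is easier to see in a model than in verifier-constrained C. The sketch below is a userspace simulation only: it treats a bytes object as the process’s memory, uses offsets as addresses, and assumes 8-byte little-endian pointers. read_user and read_user_str are hypothetical stand-ins for bpf_probe_read_user and bpf_probe_read_user_str, and MAX_ARGS and MAX_ARG_LEN are arbitrary caps chosen for illustration.

```python
import struct
from typing import List, Optional, Tuple

MAX_ARGS = 20      # hypothetical cap; a BPF loop must have a fixed bound
MAX_ARG_LEN = 64   # hypothetical per-string buffer size
PTR_SIZE = 8       # assumes a 64-bit process


def read_user(mem: bytes, addr: int, size: int) -> Optional[bytes]:
    """Stand-in for bpf_probe_read_user: copy size bytes or fail."""
    if addr < 0 or addr + size > len(mem):
        return None
    return mem[addr:addr + size]


def read_user_str(mem: bytes, addr: int, max_len: int) -> Optional[bytes]:
    """Stand-in for bpf_probe_read_user_str: read up to max_len - 1 bytes, stopping at NUL."""
    if not 0 <= addr < len(mem):
        return None
    chunk = mem[addr:addr + max_len]
    end = chunk.find(b"\x00")
    return chunk[:max_len - 1] if end == -1 else chunk[:end]


def read_argv(mem: bytes, argv_addr: int) -> List[str]:
    """Walk a NULL-terminated array of string pointers with a bounded loop."""
    args = []
    for i in range(MAX_ARGS):
        raw = read_user(mem, argv_addr + i * PTR_SIZE, PTR_SIZE)  # step 1: read the pointer
        if raw is None:
            break
        (ptr,) = struct.unpack("<Q", raw)
        if ptr == 0:  # NULL marks the end of the array
            break
        s = read_user_str(mem, ptr, MAX_ARG_LEN)                  # step 2: read the string
        if s is None:
            break
        args.append(s.decode(errors="replace"))
    return args


def build_memory(args: List[str]) -> Tuple[bytes, int]:
    """Lay out strings, then the pointer array, in a fake address space."""
    mem = bytearray(b"\x00" * PTR_SIZE)  # keep address 0 unused so it can mean NULL
    ptrs = []
    for a in args:
        ptrs.append(len(mem))
        mem += a.encode() + b"\x00"
    argv_addr = len(mem)
    for p in ptrs + [0]:
        mem += struct.pack("<Q", p)
    return bytes(mem), argv_addr


if __name__ == "__main__":
    mem, argv_addr = build_memory(["nc", "-l", "4444"])
    print(read_argv(mem, argv_addr))
```

The fixed MAX_ARGS bound mirrors what the verifier forces on the real program: a command line longer than the cap is silently truncated, which is worth remembering when you write detection rules against argv.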
The next concept you’ll run into is correlating events across different syscalls or even different machines.