CO-RE (Compile Once - Run Everywhere) in eBPF is a lie, at least at first glance, because your eBPF programs will fail if you don’t manage kernel version compatibility. The system’s kernel headers change, and your eBPF program, compiled against one set of headers, might try to access memory structures that don’t exist or have moved in another kernel.
Let’s say you have an eBPF program that needs to read `task_struct` in the kernel to get process information. A common way to do this is to call `bpf_get_current_task()` to get a pointer to the current task and then read fields from it. However, the offsets and even the presence of certain fields within `task_struct` can vary between kernel versions and build configurations. For example, the `comm` field (process name) might be at offset 1024 in kernel 5.4 but at 1104 in kernel 5.15. If your program is compiled against 5.4 headers and reads `task_struct->comm` at offset 1024 on a 5.15 kernel, it reads whatever happens to live at that offset. The verifier cannot catch this, because a probe read of the wrong bytes is still a valid read, so the program loads and quietly reports garbage.
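To make the offset problem concrete, here is a minimal userspace sketch in plain C. The two structs are hypothetical stand-ins for how one kernel type might look in two kernel versions; they are not real kernel definitions, and the names `task_v1`, `task_v2`, `COMM_OFFSET_V1`, and `read_comm_at` are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout as seen by the "older kernel" headers. */
struct task_v1 {
    int  pid;
    char comm[16];
};

/* Hypothetical layout in the "newer kernel": an added field shifts comm. */
struct task_v2 {
    int  pid;
    long new_field;
    char comm[16];
};

/* Offset baked in at compile time against the v1 layout. */
#define COMM_OFFSET_V1 offsetof(struct task_v1, comm)

/* Copy 16 bytes from a fixed byte offset, the way a non-CO-RE program
 * would, and force NUL termination so the result is a valid C string. */
void read_comm_at(const void *task, size_t offset, char out[16])
{
    memcpy(out, (const char *)task + offset, 16);
    out[15] = '\0';
}
```

Reading a `task_v2` object with `COMM_OFFSET_V1` returns bytes that belong to other fields, and nothing flags the mistake at run time. That silent failure is what CO-RE is meant to prevent.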
Here’s how CO-RE tackles this, and the common pitfalls:
1. BTF (BPF Type Format) is Your Friend
The core of CO-RE’s magic is BTF. BTF is a compact data format that describes C types (structs, enums, etc.) in a machine-readable way. The kernel embeds BTF information about its own data structures. When you compile your eBPF program with CO-RE, the offsets in the object file are not final. Instead, clang records which type and field each access refers to, and libbpf looks up the matching type in the kernel’s BTF when the program is loaded and patches in the correct offsets.
- Diagnosis: If your program fails to load, and you see errors related to `invalid type` or `relocation target not found`, it’s often because the kernel’s BTF information is missing or incomplete for the structures your program uses.
- Check: Confirm that `/sys/kernel/btf/vmlinux` exists, then run `bpftool btf dump file /sys/kernel/btf/vmlinux format c` and search the output for the type you need (e.g., `task_struct`). Note that `bpftool btf dump id <id>` takes the ID of a loaded BTF object, not the ID of a type inside it.
- Fix: Ensure your kernel has BTF enabled and that the `/sys/kernel/btf/vmlinux` file exists and is populated. This is controlled by the kernel configuration option `CONFIG_DEBUG_INFO_BTF=y`. If it is not enabled, you’ll need to recompile your kernel.
- Why it works: BTF provides a self-describing way for eBPF programs to query kernel data structure layouts without needing to know the exact kernel version beforehand.
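A loader can perform the same existence check programmatically before it tries to load anything. The following is a minimal userspace sketch in C, assuming only POSIX `stat()`; the function name `btf_file_usable` is hypothetical and not part of libbpf.

```c
#include <sys/stat.h>

/* Return 1 if `path` names a non-empty regular file, 0 otherwise.
 * On a kernel built with CONFIG_DEBUG_INFO_BTF=y, the path
 * /sys/kernel/btf/vmlinux is expected to pass this check. */
int btf_file_usable(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;                       /* missing or inaccessible */
    return S_ISREG(st.st_mode) && st.st_size > 0;
}
```

Calling `btf_file_usable("/sys/kernel/btf/vmlinux")` early lets the loader print a clear "kernel has no BTF" message instead of surfacing a relocation error later.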
2. BTF-Generated Type Information
When you compile your eBPF program, you generate BTF information for your own eBPF types. This BTF information is then embedded within your eBPF object file. Libbpf uses this to understand how your eBPF program wants to interact with kernel BTF.
- Diagnosis: Errors like `invalid field access` or `failed to find struct field` during program loading point to mismatches between the BTF information your eBPF program expects and what the kernel provides.
- Check: Run `bpftool prog load <your_bpf_obj> /sys/fs/bpf/your_prog` and examine the error messages. You can also inspect the BTF embedded in your object file with `bpftool btf dump file <your_bpf_obj>`.
- Fix: Ensure your build system is correctly generating and embedding BTF for your eBPF program. For projects using clang and libbpf, this typically involves passing flags like `-g -O2 -target bpf -D__TARGET_ARCH_x86` (or your architecture) and including libbpf’s `bpf_core_read.h`; the `-g` flag is what makes clang emit BTF. `bpftool` usually plays a part in the build as well: `bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h` generates kernel type definitions from BTF, and `bpftool gen skeleton <your_bpf_obj> > your_bpf_skel.h` generates a userspace skeleton header for loading the object.
- Why it works: By embedding BTF for your program’s types, you allow libbpf to match your program’s expectations against the kernel’s BTF descriptions, enabling it to resolve field accesses at load time.
3. Kernel Headers vs. BTF
Historically, eBPF development relied heavily on kernel headers. CO-RE aims to move away from this. You still need kernel headers for compiling your userspace helpers and sometimes for defining your eBPF program’s types (though these can also be generated from BTF). The critical part is that your eBPF runtime shouldn’t depend on the exact kernel headers you compiled against.
- Diagnosis: If your eBPF program compiles fine but fails on a specific kernel version, and you’re not using BTF for structure access (e.g., you’re using direct memory access with hardcoded offsets), this is the problem.
- Check: Look at your eBPF C code. Are you directly dereferencing pointers with fixed offsets (e.g., `*(u32 *)(task + 1024)`)? If so, you’re not using CO-RE properly.
- Fix: Adopt libbpf’s CO-RE read helpers from `bpf_core_read.h`. For example, instead of `*(u32 *)(task + 1024)`, use `bpf_core_read(&pid, sizeof(pid), &task->pid);` or the shorthand `pid = BPF_CORE_READ(task, pid);`. Because the field is named in the source, clang records a relocation for it, and libbpf fills in the offset of `pid` from the `task_struct` definition in the kernel’s BTF.
- Why it works: `bpf_core_read` and similar helpers leverage BTF to abstract away the physical location of fields within kernel data structures, making your eBPF program portable across different kernel versions.
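The idea of resolving a field by name instead of by fixed offset can be illustrated without any BPF tooling. The sketch below is a toy userspace model in C, not libbpf’s implementation: `struct layout` plays the role of the kernel’s BTF, and `read_field` looks a field up by name at run time rather than trusting a compiled-in offset. All type and function names here are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for one BTF member: name, byte offset, and size. */
struct field_desc {
    const char *name;
    size_t      offset;
    size_t      size;
};

/* Toy stand-in for the target kernel's description of one struct. */
struct layout {
    const struct field_desc *fields;
    size_t                   nfields;
};

/* Look a field up by name; return NULL when the target lacks it. */
const struct field_desc *find_field(const struct layout *l, const char *name)
{
    for (size_t i = 0; i < l->nfields; i++)
        if (strcmp(l->fields[i].name, name) == 0)
            return &l->fields[i];
    return NULL;
}

/* Copy a named field out of a raw object using the target's layout.
 * Returns 0 on success, -1 if the field is missing or dst is too small. */
int read_field(const struct layout *l, const void *obj,
               const char *name, void *dst, size_t dst_size)
{
    const struct field_desc *f = find_field(l, name);

    if (!f || f->size > dst_size)
        return -1;
    memcpy(dst, (const char *)obj + f->offset, f->size);
    return 0;
}
```

In real CO-RE the lookup happens once, when the program is loaded, and the resolved offset is patched into the instruction, so there is no per-read string comparison as in this model.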
4. Using bpftool for Type Information
bpftool is indispensable for debugging CO-RE issues. It can dump BTF, help you understand type relationships, and even generate skeleton code.
- Diagnosis: Any ambiguity about how kernel structures are laid out or what types are available is a prime candidate for `bpftool` investigation.
- Check: Use `bpftool btf dump file /sys/kernel/btf/vmlinux` to see all available BTF types. To inspect a specific type, filter that output (e.g., `bpftool btf dump file /sys/kernel/btf/vmlinux | grep "STRUCT 'task_struct'"`), or add `format c` to get C definitions you can search.
- Fix: If a required type is missing or malformed in BTF, it indicates a kernel build issue or a very old kernel that doesn’t support BTF well. You might need to update your kernel or ensure `CONFIG_DEBUG_INFO_BTF=y` is set.
- Why it works: `bpftool` provides a direct window into the kernel’s understanding of its own types, allowing you to verify that the information your eBPF program needs is actually present and correct.
5. Kernel Version Specific Workarounds
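Sometimes, despite BTF, a field is renamed, removed, or changes type between kernel versions, and an offset relocation alone cannot paper over the difference. CO-RE covers these cases with relocations that answer questions about the target kernel rather than just supplying offsets. Libbpf exposes them through macros such as `bpf_core_field_exists()` and `bpf_core_type_exists()`, which are resolved at load time against the target kernel’s BTF.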
- Diagnosis: Your program fails to load or behaves unexpectedly on specific kernel versions, even though BTF is present and seems mostly correct. This might be due to a structural change, such as a renamed or removed field, that a plain offset relocation cannot resolve for your access pattern.
- Check: Confirm that your object file carries CO-RE relocation data: `llvm-readelf -S <your_bpf_obj>` should list `.BTF` and `.BTF.ext` sections. Loading with debug output enabled (e.g., `bpftool -d prog load <your_bpf_obj> /sys/fs/bpf/your_prog`) prints libbpf’s log, which shows how relocations were resolved.
- Fix: Ensure you are compiling with `libbpf` and `clang` versions that support CO-RE relocations. Often, this is automatic when using modern `libbpf` versions and appropriate compiler flags. If a specific kernel version is problematic, guard the access with `bpf_core_field_exists()` and provide an alternative path for the other layout. More commonly, it means ensuring your program uses `bpf_core_read` and relies on BTF lookup rather than hardcoded offsets.
- Why it works: CO-RE relocations allow libbpf to patch the eBPF bytecode at load time, adjusting memory accesses and existence checks based on the kernel structure layouts it discovers in BTF.
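One well-known case of such a difference is the `state` field of `task_struct`, which was renamed to `__state` in Linux 5.14. In a BPF program this kind of change is usually handled with libbpf’s `bpf_core_field_exists()` macro, which is resolved against the target kernel’s BTF at load time. The decision logic can be sketched in plain userspace C as follows; the field lists are hypothetical stand-ins for kernel BTF, and the function names are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears in the target's field list. This models the
 * question that bpf_core_field_exists() answers from kernel BTF. */
int field_exists(const char *const *fields, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(fields[i], name) == 0)
            return 1;
    return 0;
}

/* Pick the field that holds the task state on this target, preferring
 * the newer name. Returns NULL if neither name is present. */
const char *task_state_field(const char *const *fields, size_t n)
{
    if (field_exists(fields, n, "__state"))
        return "__state";               /* Linux 5.14 and later */
    if (field_exists(fields, n, "state"))
        return "state";                 /* earlier kernels */
    return NULL;
}
```

The point of the pattern is that both branches ship in the same object file, and the target kernel decides which one runs.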
6. Nested and Conditional Field Access
For complex scenarios where a single `bpf_core_read` isn’t enough, libbpf’s `BPF_CORE_READ()` macro family provides a more structured way to read fields from kernel structures. Note that `struct_ops`, despite the name, is not a field-access mechanism: it lets BPF programs implement kernel callback tables such as `tcp_congestion_ops`.
- Diagnosis: You’re trying to access fields that are deeply nested or conditional within kernel structures, and chaining `bpf_core_read` calls is becoming unwieldy.
- Check: Review your eBPF code for complex memory access patterns, such as several temporary pointers read one after another.
- Fix: Use `BPF_CORE_READ(task, real_parent, tgid)` to follow a pointer chain in one expression; each step is a separate relocated read. Use `BPF_CORE_READ_INTO()` when you need the error code and `BPF_CORE_READ_BITFIELD()` for bitfields. For fields that exist only on some kernels, combine these with `bpf_core_field_exists()`.
- Why it works: Each hop in the chain carries its own CO-RE relocation, so the access stays portable even when several of the structures involved change layout between kernel versions.
The next error you’ll hit after fixing kernel version compatibility is likely to come from the BPF verifier. Now that relocations resolve on every target kernel, the program may still be rejected for exceeding the verifier’s complexity limit or the 512-byte stack, or for calling a helper that an older kernel you support does not provide.