Tail calls are the secret sauce that lets you chain eBPF programs together to build complex, multi-stage processing pipelines.
Let’s see this in action. Imagine we have two simple eBPF programs: one that marks packets with a specific flag, and another that inspects that flag and potentially drops the packet.
// prog1.c - Marks packets
#include <stddef.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#include "prog_array.bpf.h" // Defines the prog_array map used below

#define TOS_OFF     (ETH_HLEN + offsetof(struct iphdr, tos))
#define IP_CSUM_OFF (ETH_HLEN + offsetof(struct iphdr, check))

SEC("classifier")
int mark_packet(struct __sk_buff *skb) {
    // For this example, we simulate a mark by setting a bit in the IP
    // header's TOS field if it's an IPv4 packet. In a real pipeline,
    // skb->mark or skb->cb[] is usually a better place for such state.
    // Assumes an untagged Ethernet frame, so the IP header starts at ETH_HLEN.
    if (skb->protocol == bpf_htons(ETH_P_IP)) {
        __u8 old_tos, new_tos;
        if (bpf_skb_load_bytes(skb, TOS_OFF, &old_tos, sizeof(old_tos)) == 0) {
            new_tos = old_tos | 0x01; // Set a custom flag
            // Patch the IPv4 header checksum, then write the new TOS byte.
            bpf_l3_csum_replace(skb, IP_CSUM_OFF, bpf_htons(old_tos),
                                bpf_htons(new_tos), 2);
            bpf_skb_store_bytes(skb, TOS_OFF, &new_tos, sizeof(new_tos), 0);
        }
    }
    // This is the crucial part for chaining: the tail call into prog2,
    // which sits at index 1 of prog_array. On success this never returns.
    bpf_tail_call(skb, &prog_array, 1);
    // Reached only if the tail call failed (for example, slot 1 is empty).
    return TC_ACT_OK;
}
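// prog2.c - Inspects and potentially drops packets
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("classifier") // A tail-call target must have the same program type as its caller
int inspect_and_drop(struct __sk_buff *skb) {
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    if (skb->protocol == bpf_htons(ETH_P_IP)) {
        // The IP header follows the Ethernet header (untagged frames assumed).
        struct iphdr *iph = data + sizeof(struct ethhdr);
        // Bounds check required by the verifier before dereferencing iph.
        if ((void *)(iph + 1) <= data_end) {
            if (iph->tos & 0x01) { // Check for our custom flag
                // If the flag is set, drop the packet. TC_ACT_SHOT is the
                // drop verdict for tc classifiers.
                return TC_ACT_SHOT;
            }
        }
    }
    // If the flag isn't set, allow the packet to continue.
    return TC_ACT_OK;
}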
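// prog_array.bpf.h - Common header for the program array map
// Include it after <linux/bpf.h> and <bpf/bpf_helpers.h>.
struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 2);
    __uint(key_size, sizeof(__u32));   // Slot index
    __uint(value_size, sizeof(__u32)); // Program fd, as written from userspace
} prog_array SEC(".maps");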
// Helper to load programs and set up the map
// This part would be in your userspace loader (e.g., using libbpf)
/*
// Load prog1.o and prog2.o
// Get file descriptors for prog1_fd and prog2_fd
// Get the fd of the prog_array map that libbpf created while loading prog1.o
// (prog1_obj is a placeholder name for the loaded object)
// prog_array_fd = bpf_object__find_map_fd_by_name(prog1_obj, "prog_array");
// Add prog1 to the map at index 0 (optional: nothing in this example tail-calls slot 0)
__u32 key0 = 0;
bpf_map_update_elem(prog_array_fd, &key0, &prog1_fd, BPF_ANY);
// Add prog2 to the map at index 1
__u32 key1 = 1;
bpf_map_update_elem(prog_array_fd, &key1, &prog2_fd, BPF_ANY);
// Attach prog1 to a tc hook on the network interface (e.g., ingress).
// Both programs are tc classifiers, so XDP attach flags do not apply here.
// With libbpf: bpf_tc_hook_create(...) followed by bpf_tc_attach(...)
*/
In this setup, prog1 (mark_packet) is attached to an interface. When it receives a packet, it performs its initial check. If the packet is IPv4, it sets the flag bit in the tos field. Then, instead of returning a verdict itself, it calls bpf_tail_call(skb, &prog_array, 1). This tells the kernel: "Stop running this program, keep the current packet context (skb), and jump straight to the eBPF program stored at index 1 of the prog_array map."
The prog_array map is a special eBPF map type (BPF_MAP_TYPE_PROG_ARRAY) that holds references to other eBPF programs. Userspace populates it by writing program file descriptors into its slots, and bpf_tail_call uses the slot index to find the next program to run. The target must have the same program type as the caller. In our example, index 1 points to inspect_and_drop.
So, inspect_and_drop receives the packet after prog1 has potentially modified it. It checks the tos field. If the flag we set (0x01) is present, it returns the drop verdict, and the packet is discarded. Otherwise it returns the pass verdict, and the packet proceeds normally through the network stack. Note that in this toy pipeline prog1 flags every IPv4 packet, so all IPv4 traffic is dropped and only non-IPv4 traffic passes; a real first stage would set the flag selectively.
The beauty of tail calls is that they don’t involve a context switch or a new stack frame for the called program. It’s a direct jump: the kernel replaces the current program with the called one and reuses the same stack frame. This makes them very cheap, so you can chain programs without significant performance overhead. The chain is not unbounded, though. The kernel caps the number of tail calls in a single invocation (MAX_TAIL_CALL_CNT, 33 in recent kernels). The verifier’s complexity limit (1 million instructions on kernels 5.2 and later) applies to each program separately, because every program in the chain is verified on its own. That is what lets each individual program stay small and focused.
What most people don’t realize is that a successful bpf_tail_call never returns. Any instructions that follow it in the same path are skipped, because the program’s execution ends and the called program takes over. This is why bpf_tail_call is sometimes called a "goto" in eBPF: on success it transfers control permanently to the target program. The code after the call runs in only one situation, which is when the tail call fails: the index is out of range, the slot is empty, or the tail call limit has been reached. For that reason, always follow bpf_tail_call with a sensible fallback return value rather than returning the helper’s result.
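To make this control flow concrete, here is a small userspace model in plain C. It is not kernel code and does not use any eBPF API: struct step, run, the slot layout, and the VERDICT_* names are assumptions invented for illustration, and the limit of 33 simply mirrors MAX_TAIL_CALL_CNT in recent kernels. The sketch shows a program being replaced by its tail-call target, and the caller's fallback verdict being used when the jump cannot happen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS      4
#define MAX_TAIL_CALLS 33  /* mirrors MAX_TAIL_CALL_CNT in recent kernels */
#define VERDICT_OK     0   /* hypothetical names; values follow TC_ACT_OK */
#define VERDICT_DROP   2   /* and TC_ACT_SHOT */

struct packet {
    uint8_t  tos;         /* the byte our toy stages flag and inspect */
    unsigned stage_runs;  /* bookkeeping for the chain-limit demo */
};

/* Outcome of one program: a final verdict, or a tail-call request plus
 * the verdict to fall back on if that request cannot be honoured. */
struct step {
    int      wants_tail_call;
    uint32_t index;
    int      verdict;
};

typedef struct step (*prog_fn)(struct packet *pkt);

struct prog_array { prog_fn slots[MAX_SLOTS]; };

/* Stage 1: set the flag, then ask to jump to slot 1. */
static struct step mark_packet(struct packet *pkt) {
    pkt->tos |= 0x01;
    return (struct step){ .wants_tail_call = 1, .index = 1,
                          .verdict = VERDICT_OK };
}

/* Stage 2: drop flagged packets, pass everything else. */
static struct step inspect_and_drop(struct packet *pkt) {
    int verdict = (pkt->tos & 0x01) ? VERDICT_DROP : VERDICT_OK;
    return (struct step){ .wants_tail_call = 0, .index = 0,
                          .verdict = verdict };
}

/* A stage that keeps jumping to slot 2, to exercise the chain limit. */
static struct step loop_forever(struct packet *pkt) {
    pkt->stage_runs++;
    return (struct step){ .wants_tail_call = 1, .index = 2,
                          .verdict = VERDICT_OK };
}

/* Run `entry`, following tail calls. A successful tail call replaces the
 * current program; a failed one yields the caller's fallback verdict. */
static int run(const struct prog_array *pa, prog_fn entry,
               struct packet *pkt) {
    assert(pa != NULL && entry != NULL);
    prog_fn current = entry;
    int tail_calls = 0;

    for (;;) {
        struct step s = current(pkt);
        if (!s.wants_tail_call)
            return s.verdict;
        if (s.index >= MAX_SLOTS || pa->slots[s.index] == NULL ||
            tail_calls >= MAX_TAIL_CALLS)
            return s.verdict;  /* fall through: the jump did not happen */
        tail_calls++;
        current = pa->slots[s.index];  /* the caller is gone from here on */
    }
}
```

In this model, running mark_packet against an array whose slot 1 holds inspect_and_drop yields VERDICT_DROP, while the same call against an empty array yields mark_packet's fallback, VERDICT_OK. Starting from loop_forever with slot 2 pointing back at it, the stage runs 34 times (the entry plus 33 tail calls) before the limit ends the chain.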
The next logical step is to explore conditional tail calls, where the slot index is chosen at runtime, and what to do when you need control to come back to the caller. Tail calls cannot return, so that second case calls for BPF-to-BPF function calls rather than a tail call.
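A conditional tail call mostly comes down to picking the slot index at runtime. The sketch below isolates that decision as a plain C function so it can be read and checked on its own. The slot numbering, the PROTO_* and SLOT_* names, and select_slot itself are hypothetical; the closing comment describes how an eBPF program might use the result, assuming a prog_array map like the one above.

```c
#include <assert.h>
#include <stdint.h>

/* EtherType values in host byte order (hypothetical local names). */
#define PROTO_IPV4 0x0800
#define PROTO_IPV6 0x86DD

/* Hypothetical slot layout for the program array. */
#define SLOT_IPV4  1u
#define SLOT_IPV6  2u
#define SLOT_NONE  UINT32_MAX  /* no next stage for this packet */

/* Choose the next stage from the packet's EtherType. */
static uint32_t select_slot(uint16_t eth_proto) {
    switch (eth_proto) {
    case PROTO_IPV4:
        return SLOT_IPV4;
    case PROTO_IPV6:
        return SLOT_IPV6;
    default:
        return SLOT_NONE;
    }
}

/* Inside an eBPF program, the caller would convert skb->protocol to host
 * order, pass it to select_slot(), issue bpf_tail_call() with the result
 * only when it is not SLOT_NONE, and then return its own fallback verdict
 * in case the tail call did not happen. */
```

Here select_slot(0x0800) returns SLOT_IPV4, and an ARP frame (0x0806) returns SLOT_NONE, so the caller would skip the tail call and return its own verdict.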