Debug Cilium Datapath Connectivity Issues Step by Step (2026)

Cilium’s datapath isn’t just a black box of eBPF; it’s a sophisticated system where network packets are intercepted, inspected, and rewritten on the fly, making traditional debugging tools like tcpdump only a partial solution.

Let’s say you’ve got pods that can’t talk to each other, and you’ve ruled out the obvious like incorrect service definitions or network policies. The problem is likely in how Cilium is programming the kernel’s network stack via eBPF.

Common Causes and Fixes for Datapath Connectivity Issues

Incorrect eBPF Program Attachment:
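- Diagnosis: Cilium attaches its programs to several kernel network hooks. If a program is missing from the hook where it belongs, traffic bypasses Cilium or is dropped. List loaded programs with bpftool prog show, and list tc and XDP attachments per interface with bpftool net show. In recent Cilium releases the datapath entry points have names beginning with cil_ (for example cil_from_container on a pod’s lxc interface, or cil_from_netdev on the host-facing device); bpftool may truncate these names, and they differ between versions. Verify the attach points: tc (or tcx) on the pod and host interfaces, XDP on the physical device when XDP acceleration is enabled, and cgroup hooks for socket-level load balancing.
- Fix: This is usually a symptom of a larger Cilium agent issue, and restarting the agent on the affected node often resolves it. Agent pods carry a generated suffix rather than the node name, so find the right pod first (kubectl get pods -n kube-system -l k8s-app=cilium -o wide), then delete it (kubectl delete pod -n kube-system <cilium-pod-name>). The DaemonSet recreates the pod, and the new agent reloads and re-attaches its eBPF programs.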
- Why it works: The agent is responsible for loading and attaching eBPF programs to the correct kernel hooks. A restart ensures a clean slate for this process.
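To make the attachment check less manual, the program listing can be filtered down to Cilium’s own programs. The following is a minimal sketch, not an official tool. It assumes the JSON output of bpftool -j prog show and assumes that Cilium’s datapath programs carry the cil_ name prefix, which should be confirmed on your version. The helper name cilium_programs and the sample data are illustrative.

```python
import json


def cilium_programs(bpftool_json: str, prefix: str = "cil_") -> list[dict]:
    """Return id/type/name for loaded programs whose name starts with prefix.

    bpftool_json is the text printed by `bpftool -j prog show`.
    The "cil_" default is an assumption about Cilium's program naming.
    """
    programs = json.loads(bpftool_json)
    return [
        {"id": p["id"], "type": p.get("type"), "name": p.get("name", "")}
        for p in programs
        if p.get("name", "").startswith(prefix)
    ]


# Illustrative sample, not captured from a real node.
SAMPLE_PROGS = json.dumps([
    {"id": 101, "type": "sched_cls", "name": "cil_from_container"},
    {"id": 102, "type": "sched_cls", "name": "cil_to_netdev"},
    {"id": 7, "type": "cgroup_skb", "name": "unrelated_prog"},
])

if __name__ == "__main__":
    for prog in cilium_programs(SAMPLE_PROGS):
        print(prog["id"], prog["type"], prog["name"])
```

On a real node the input would come from running bpftool -j prog show on the host or inside the agent pod. Matching on a prefix rather than a full name keeps the filter working when bpftool truncates long program names.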
Datapath Synchronization Delays or Failures:
- Diagnosis: Each agent must keep its desired datapath state (service translations, endpoint IPs, policy rules) in step with what is actually programmed into the kernel. Check the agent logs (kubectl logs -n kube-system <cilium-pod-name>) for warnings and errors about failed or stuck synchronization. Also check the agent’s status: kubectl exec -n kube-system <cilium-pod-name> -- cilium status (the in-pod binary is named cilium-dbg in newer releases). Look at the Kubernetes line and the Controller Status line; cilium status --all-controllers lists each controller so you can see which ones are failing.
- Fix: Ensure the Kubernetes API server is reachable and healthy from the Cilium agent pods. If there are network issues between the agent and the API server, synchronization will fail. Lengthening the agent’s sync interval (a --sync-interval 60s argument is sometimes cited) may help if the network is intermittently flaky, but confirm that the flag exists in your Cilium version with cilium-agent --help, since agent arguments change between releases. Treat it as a workaround, not a fix for the underlying connectivity.
- Why it works: A stable connection to the API server allows the agent to continuously receive updates and program the datapath correctly.
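Scanning agent logs for synchronization problems is easier with a small filter. The sketch below assumes the agent writes logfmt-style lines with a level= field, as Cilium’s text log format does; the exact field set and level spelling vary by release, so the comparison is case-insensitive. The function names and the sample messages are made up for illustration.

```python
import re

# Matches key=value pairs where the value is either quoted or a bare token.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line: str) -> dict[str, str]:
    """Parse one logfmt-style line into a dict, stripping surrounding quotes."""
    fields = {}
    for key, value in _PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def problem_lines(log_text: str,
                  levels=("warning", "warn", "error", "fatal")) -> list[dict[str, str]]:
    """Return parsed entries whose level is in `levels` (case-insensitive)."""
    hits = []
    for line in log_text.splitlines():
        fields = parse_logfmt(line)
        if fields.get("level", "").lower() in levels:
            hits.append(fields)
    return hits


# Illustrative lines in the style of agent logs; the messages are invented.
SAMPLE_LOG = '''level=info msg="Example informational message" subsys=daemon
level=warning msg="Example: unable to reach apiserver" subsys=k8s
level=error msg="Example: sync failed" subsys=datapath-loader
'''

if __name__ == "__main__":
    for entry in problem_lines(SAMPLE_LOG):
        print(entry["level"], entry.get("subsys"), entry.get("msg"))
```

The log text would typically come from kubectl logs -n kube-system <cilium-pod-name>. Grouping the hits by their subsys field is one way to see whether the failures cluster around the Kubernetes client or the datapath loader.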
IP Address Management (IPAM) Conflicts or Exhaustion:
- Diagnosis: Cilium assigns IPs to pods. If the IPAM mode is misconfigured (e.g., a pod CIDR that overlaps the node or service ranges) or the allocated IP pool is exhausted, new pods might not get IPs, or existing ones might have incorrect ones, breaking connectivity. Check the IPAM line in cilium status, which reports how many addresses are allocated out of the node’s range. Also inspect pod IPs using kubectl get pods -o wide, and look at the events of any pod stuck in ContainerCreating for IP allocation errors.
- Fix: Verify the IPAM settings in the cilium-config ConfigMap (kubectl get configmap -n kube-system cilium-config -o yaml) or in your Helm values, especially the ipam mode and the pool CIDRs. If you’re using cluster-pool IPAM, ensure cluster-pool-ipv4-cidr is large enough for your node count given cluster-pool-ipv4-mask-size. If you’re using kubernetes IPAM, each node’s range comes from its spec.podCIDR, so the fix lies in the cluster’s pod CIDR allocation. If a pool is exhausted, you’ll need to expand it or reconfigure.
- Why it works: Proper IPAM ensures each pod gets a unique, routable IP address that the datapath can use for forwarding decisions.
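Whether a pool is "sufficiently large" is simple arithmetic. The sketch below assumes cluster-pool style allocation, where the pool CIDR is carved into one fixed-size block per node. The function name is hypothetical, and the results are upper bounds because some addresses in each node block are reserved and never handed to pods.

```python
import ipaddress


def cluster_pool_capacity(pool_cidr: str, node_mask_size: int) -> tuple[int, int]:
    """Return (number of per-node CIDRs, addresses per node CIDR).

    Both values are upper bounds; reserved addresses are not subtracted.
    """
    pool = ipaddress.ip_network(pool_cidr)
    if not pool.prefixlen <= node_mask_size <= pool.max_prefixlen:
        raise ValueError("node mask size must fit inside the pool CIDR")
    node_cidrs = 2 ** (node_mask_size - pool.prefixlen)
    addresses_per_node = 2 ** (pool.max_prefixlen - node_mask_size)
    return node_cidrs, addresses_per_node


if __name__ == "__main__":
    # A /22 pool split into /24 blocks: 4 nodes, 256 addresses each.
    print(cluster_pool_capacity("10.244.0.0/22", 24))
```

In this example a fifth node would have no block to draw from, so its pods would fail to get addresses even though the first four nodes look healthy.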
BGP Control Plane Issues (if using BGP):
- Diagnosis: If you’re using Cilium’s BGP capabilities to advertise pod CIDRs to your network, BGP peering issues can cause external connectivity problems. Check the BGP status within the Cilium agent: kubectl exec -n kube-system <cilium-pod-name> -- cilium bgp peers (cilium-dbg bgp peers in newer releases). Look for sessions in the established state, then confirm the expected prefixes are being advertised with the cilium bgp routes subcommand.
- Fix: Ensure your BGP router configuration matches the CiliumBGPPeeringPolicy (or, in newer releases, the CiliumBGPClusterConfig and CiliumBGPPeerConfig resources) defined in your cluster, including AS numbers and neighbor IPs. Verify network reachability between the Cilium nodes and your BGP peers, including TCP port 179.
- Why it works: BGP is how Cilium tells your physical network how to route traffic to your pods. If BGP isn’t working, external routers won’t know where to send traffic.
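"Matches" here means that each side’s view of the session mirrors the other’s. A rough sketch of that check follows. The BgpSide type and peering_mismatches helper are hypothetical, the values use private ASNs and documentation addresses, and addresses are compared as plain strings, so strip any /32 suffix from the Cilium side before comparing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpSide:
    """One end of a BGP session as configured on that device."""
    local_asn: int
    local_address: str
    peer_asn: int
    peer_address: str


def peering_mismatches(cilium: BgpSide, router: BgpSide) -> list[str]:
    """List the settings that do not mirror each other across the two ends."""
    problems = []
    if cilium.peer_asn != router.local_asn:
        problems.append("Cilium's peer ASN differs from the router's local ASN")
    if router.peer_asn != cilium.local_asn:
        problems.append("router's neighbor ASN differs from Cilium's local ASN")
    if cilium.peer_address != router.local_address:
        problems.append("Cilium's peer address is not the router's address")
    if router.peer_address != cilium.local_address:
        problems.append("router's neighbor address is not the node's address")
    return problems


if __name__ == "__main__":
    node = BgpSide(64512, "192.0.2.10", 64513, "192.0.2.1")
    tor = BgpSide(64513, "192.0.2.1", 65000, "192.0.2.10")
    # The router expects AS 65000 from the node, but the node uses 64512.
    print(peering_mismatches(node, tor))
```

Any non-empty result points at a setting that would keep the session from reaching the established state.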
eBPF Map Corruption or Incorrect State:
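- Diagnosis: eBPF maps are key-value stores used by eBPF programs to maintain state (e.g., service backend translations, connection tracking). If a map contains stale or incorrect data, traffic can be misrouted. Use bpftool map show to see available maps and bpftool map dump id <map_id> to inspect their contents. Look for maps whose names begin with cilium_, such as cilium_lb4_services_v2 (service frontends) or cilium_ct4_global (connection tracking); exact names and version suffixes vary by release. For a decoded view, the agent CLI offers cilium bpf lb list and cilium bpf ct list global.
- Fix: Restarting the Cilium agent pod (kubectl delete pod -n kube-system <cilium-pod-name>) is the most common way to get these maps re-populated. Cilium pins its maps so that existing connections survive the restart, and the new agent brings their contents back in line with current state. In rare, persistent cases, you might need to manually remove specific entries using bpftool map delete ..., but this is advanced and risky.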
- Why it works: Restarting the agent forces it to re-initialize the maps with the current state from the Kubernetes API.
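Finding the right map IDs to dump can be scripted. This is a sketch under two assumptions: the input is the JSON printed by bpftool -j map show, and Cilium’s load-balancing and connection-tracking maps start with cilium_lb and cilium_ct respectively. Check both against your node. The helper name and sample data are illustrative.

```python
import json


def map_ids_by_prefix(bpftool_json: str,
                      prefixes: tuple[str, ...]) -> dict[str, list[int]]:
    """Group map ids by the first name prefix they match."""
    found = {prefix: [] for prefix in prefixes}
    for m in json.loads(bpftool_json):
        name = m.get("name", "")
        for prefix in prefixes:
            if name.startswith(prefix):
                found[prefix].append(m["id"])
                break
    return found


# Illustrative sample with names truncated the way bpftool may show them.
SAMPLE_MAPS = json.dumps([
    {"id": 11, "type": "hash", "name": "cilium_lb4_serv"},
    {"id": 12, "type": "hash", "name": "cilium_lb4_back"},
    {"id": 20, "type": "lru_hash", "name": "cilium_ct4_glob"},
    {"id": 31, "type": "array", "name": "other_map"},
])

if __name__ == "__main__":
    print(map_ids_by_prefix(SAMPLE_MAPS, ("cilium_lb", "cilium_ct")))
```

Each returned ID can then be passed to bpftool map dump id <map_id> for inspection.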
Network Policy Enforcement Misconfiguration:
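- Diagnosis: While not strictly a "datapath failure," an overly restrictive or incorrectly configured network policy can appear as a connectivity issue. Use cilium endpoint list to find the pod’s endpoint ID and see whether ingress or egress enforcement is enabled for it, and cilium policy get to view the rules loaded on the node. Then use cilium monitor --type drop (add --related-to <endpoint-id> to narrow the output) to see if traffic is being dropped by policy enforcement. If Hubble is enabled, hubble observe --pod <namespace>/<pod-name> --verdict DROPPED gives the same information with pod names.
- Fix: Review your CiliumNetworkPolicy or CiliumClusterwideNetworkPolicy resources. Ensure the selectors match the intended pods and that the egress/ingress rules accurately reflect the required communication. For example, if pod A needs to reach pod B on port 80, ensure there’s an ingress rule on pod B allowing traffic from pod A on port 80. If any policy with egress rules selects pod A, it also needs an egress rule allowing traffic to pod B on port 80; a pod that no egress policy selects can send freely under the default enforcement mode.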
- Why it works: Network policies are enforced by eBPF programs that act as gatekeepers for traffic. If a policy denies traffic, the eBPF program drops the packet.
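The two-sided nature of the pod A to pod B example can be captured in a toy model. This is a deliberately simplified sketch, not Cilium’s policy engine: it assumes the default enforcement mode, in which a direction is open until some policy selects the endpoint for that direction, and it identifies peers by name instead of labels. The Endpoint type and allowed function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    """A pod and the (peer name, port) pairs its policies allow.

    None means no policy selects the pod in that direction, so the
    direction is open. An empty set means a policy applies but allows nothing.
    """
    name: str
    ingress: Optional[set[tuple[str, int]]] = None
    egress: Optional[set[tuple[str, int]]] = None


def allowed(src: Endpoint, dst: Endpoint, port: int) -> bool:
    """True if src may reach dst on port: both directions must permit it."""
    egress_ok = src.egress is None or (dst.name, port) in src.egress
    ingress_ok = dst.ingress is None or (src.name, port) in dst.ingress
    return egress_ok and ingress_ok


if __name__ == "__main__":
    pod_b = Endpoint("b", ingress={("a", 80)})
    print(allowed(Endpoint("a"), pod_b, 80))                # no egress policy on A
    print(allowed(Endpoint("a", egress=set()), pod_b, 80))  # egress policy blocks A
```

The second call shows the case that is easy to miss: pod B’s ingress rule is correct, yet traffic is dropped because an egress policy on pod A does not list pod B.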

Once datapath connectivity is restored, the next failure to rule out is DNS resolution, which breaks if your CNI isn’t correctly configured to handle DNS traffic or if CoreDNS itself is having problems.