Cilium’s control plane is designed to be upgraded with zero impact on your running workloads, but the data plane (the eBPF programs in the kernel) needs a bit more care.

Let’s say you have a Kubernetes cluster running a few nginx pods. We’ll use this to illustrate the upgrade process.

Here’s a basic Cilium deployment using Helm:

helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --version 1.12.0 \
  --namespace kube-system \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=$(kubectl get endpoints kubernetes -n default -o jsonpath='{.subsets[0].addresses[0].ip}') \
  --set k8sServicePort=$(kubectl get endpoints kubernetes -n default -o jsonpath='{.subsets[0].ports[0].port}')

Now, imagine we want to upgrade Cilium from version 1.12.0 to 1.13.0.

First, we need to understand the upgrade process. Cilium’s Helm chart performs a rolling upgrade: the agent runs as a DaemonSet, and its pods are replaced node by node, so only a few nodes are without a running agent at any moment. The key to zero disruption lies in how Cilium manages its eBPF programs, which stay loaded in the kernel while the agent restarts, and in how Kubernetes paces the pod replacements.

Here’s a typical workflow you’d follow:

  1. Prepare for the Upgrade:

    • Review Release Notes: Always, always check the release notes for the target version. Look for any breaking changes, deprecated features, or specific upgrade instructions. For 1.13.0, you might see notes about changes in kubeProxyReplacement behavior or new eBPF features.
    • Backup your Configuration: Before any upgrade, back up your Helm release.
      helm get values cilium -n kube-system -o yaml > cilium-values-backup.yaml
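
As part of the preparation, it can help to confirm that the upgrade crosses at most one minor release, since Cilium’s upgrade guide recommends moving one minor version at a time. The helper below is a minimal sketch; the function name is_single_minor_step is hypothetical, and it assumes plain MAJOR.MINOR.PATCH version strings with an optional leading "v".

```shell
# Hypothetical helper: succeeds only if TARGET has the same major version as
# CURRENT and is at most one minor release ahead (e.g. 1.12.x -> 1.13.x).
is_single_minor_step() {
  current="${1#v}"            # strip an optional leading "v"
  target="${2#v}"
  cur_major="${current%%.*}"  # text before the first dot
  tgt_major="${target%%.*}"
  cur_rest="${current#*.}"    # text after the first dot
  tgt_rest="${target#*.}"
  cur_minor="${cur_rest%%.*}"
  tgt_minor="${tgt_rest%%.*}"
  [ "$cur_major" -eq "$tgt_major" ] || return 1
  step=$((tgt_minor - cur_minor))
  [ "$step" -ge 0 ] && [ "$step" -le 1 ]
}
```

For example, `is_single_minor_step 1.12.0 1.13.0 && echo "single minor step"` prints the message, while a jump from 1.12.0 to 1.14.0 returns a non-zero status. The currently deployed chart version can be read from `helm list -n kube-system`.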
      
  2. Perform the Helm Upgrade: This is the core command. We’ll specify the new version and potentially update any values.

    helm upgrade cilium cilium/cilium --version 1.13.0 \
      --namespace kube-system \
      --set kubeProxyReplacement=strict \
      --set k8sServiceHost=$(kubectl get endpoints kubernetes -n default -o jsonpath='{.subsets[0].addresses[0].ip}') \
      --set k8sServicePort=$(kubectl get endpoints kubernetes -n default -o jsonpath='{.subsets[0].ports[0].port}')
    
    • What’s Happening Here? Helm updates the Cilium agent DaemonSet (and the cilium-operator Deployment) to the 1.13.0 image. The DaemonSet controller then performs a rolling update: on each node it terminates the old agent pod, starts the new one, and waits for it to become ready before moving on, keeping the number of unavailable agents within the DaemonSet’s maxUnavailable limit. This process repeats until every node runs the new agent.
  3. Monitor the Rolling Update: Keep an eye on the Cilium pods in the kube-system namespace.

    kubectl get pods -n kube-system -l k8s-app=cilium -w
    

    You’ll see new pods appearing and old ones disappearing. The -w flag watches the output.

  4. Verify Connectivity: While the pods are rolling, continuously test your application’s connectivity.

    • Check Pod-to-Pod Communication:

      # Assuming you have two pods, pod-a and pod-b, in the same namespace 'my-app'
      POD_B_IP=$(kubectl get pod -n my-app pod-b -o jsonpath='{.status.podIP}')
      kubectl exec -n my-app pod-a -- curl -s "http://${POD_B_IP}"
      

      Run this command repeatedly from one pod to another.

    • Check Service Access:

      # Access a service, e.g., your nginx service
      # (a ClusterIP is only reachable from a node or from inside a pod)
      curl http://<your-nginx-service-ip>
      

      Again, repeat this to ensure no dropped connections.

    • Check Cilium Status:

      cilium status -n kube-system
      

      This command will show the status of all Cilium agents. During the upgrade, you might see some agents reporting a different version temporarily, but they should all converge to the new version.
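
Repeating the connectivity checks by hand gets tedious, so here is a minimal sketch of a probe loop that opens a new connection roughly once per second and counts failures. The function name probe_loop and the PROBE_INTERVAL variable are hypothetical; the loop assumes curl is available wherever you run it (a node or a pod that can reach the target).

```shell
# Hypothetical helper: request URL COUNT times, roughly once per second, and
# report how many requests failed. Returns non-zero if any request failed.
probe_loop() {
  url="$1"
  count="${2:-60}"
  failures=0
  i=0
  while [ "$i" -lt "$count" ]; do
    # --fail turns HTTP errors (>= 400) into a non-zero exit status;
    # --max-time bounds each request so a hang counts as a failure.
    if ! curl --silent --fail --max-time 2 --output /dev/null "$url"; then
      failures=$((failures + 1))
      echo "$(date '+%H:%M:%S') request $i failed"
    fi
    i=$((i + 1))
    sleep "${PROBE_INTERVAL:-1}"
  done
  echo "failures: $failures/$count"
  [ "$failures" -eq 0 ]
}
```

Start it just before the helm upgrade, for example `probe_loop http://<your-nginx-service-ip> 300`, and leave it running until the rollout completes. Each iteration opens a fresh connection, so it exercises service load-balancing for new flows rather than only an already established one.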

The magic behind zero disruption lies in how Cilium manages its eBPF programs, combined with the DaemonSet rolling update. The eBPF programs and maps live in the kernel, not in the agent process, so they keep forwarding traffic while an agent pod is being replaced. When the new agent starts, it reuses the existing maps (connection tracking, service load-balancing, policy) and swaps freshly compiled programs in for the old ones. What pauses during the restart is the agent’s own work: new endpoints, policy changes, and service updates on that node are not applied until the new agent is ready, and features served by the agent itself, such as the L7 and DNS proxies, can be briefly interrupted. Kubernetes ensures that only a fraction of nodes are in that state at any given moment.

One aspect that often trips people up during upgrades is the kubeProxyReplacement setting. If you’re using strict mode, Cilium replaces kube-proxy entirely, so nothing else on the node can fall back to handling Service traffic. During an upgrade, each new agent has to resynchronize the eBPF service maps that stand in for kube-proxy’s rules, and it has to reach the API server directly through k8sServiceHost and k8sServicePort to do so. If the new agent fails to start or cannot load its eBPF programs, Service changes on that node stop being applied and your services might become temporarily unreachable. This is why monitoring cilium status and performing connectivity checks during the rollout is crucial.
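
To confirm that an upgraded agent is still running in the expected mode, you can read the KubeProxyReplacement line from the agent’s own status output. This is a sketch under a few assumptions: the DaemonSet is named cilium, the in-pod CLI is invoked as cilium, and the status output contains a line starting with "KubeProxyReplacement:". The function name is hypothetical.

```shell
# Print the kube-proxy replacement mode reported by one agent pod.
# "ds/cilium" lets kubectl pick one pod of the DaemonSet; to inspect a
# specific node, exec into that node's agent pod by name instead.
kube_proxy_replacement_mode() {
  kubectl -n kube-system exec ds/cilium -- cilium status \
    | sed -n 's/^KubeProxyReplacement:[[:space:]]*//p'
}
```

With strict mode enabled, the printed value should begin with Strict. Running `cilium service list` inside the same pod is a way to check that the service entries you expect are present after the restart.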

The next hurdle you’ll likely face after a successful Cilium upgrade is managing custom eBPF programs or advanced network policies that might have specific version dependencies.
