Argo CD can be upgraded with zero downtime because its core components are designed for high availability and graceful restarts.
Let’s see Argo CD in action during an upgrade. Imagine we have a running Argo CD instance, and we want to upgrade from version v2.5.0 to v2.6.1.
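First, confirm that the current installation is healthy and that the components serving requests, the API server and the repository server, run more than one replica. Check the application controller as well; in the standard manifests it runs as a StatefulSet named argocd-application-controller, not as a Deployment.
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server -o jsonpath='{.items[*].metadata.name}'
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-application-controller -o jsonpath='{.items[*].metadata.name}'
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-repo-server -o jsonpath='{.items[*].metadata.name}'
You should see at least two pods for argocd-server and for argocd-repo-server. If not, scale them up first:
kubectl scale deployment argocd-server -n argocd --replicas=2
kubectl scale deployment argocd-repo-server -n argocd --replicas=2
A single application controller pod is normal. Additional controller replicas are not hot standbys: they shard the managed clusters between them, and the ARGOCD_CONTROLLER_REPLICAS environment variable must match the replica count, so do not scale the controller just for the upgrade.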
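The replica check can also be scripted rather than read off by eye. The following is a minimal sketch, not an official Argo CD tool: the function name has_min_ready is hypothetical, and it assumes the components are Deployments in the argocd namespace.

```shell
# Report whether a Deployment has at least MIN ready replicas.
# Usage: has_min_ready <deployment> <min> [namespace]
# (has_min_ready is a hypothetical helper name used for illustration.)
has_min_ready() {
  local name="$1" min="$2" ns="${3:-argocd}"
  local ready
  # .status.readyReplicas is omitted when no pod is ready, so default to 0.
  ready="$(kubectl get deployment "$name" -n "$ns" -o jsonpath='{.status.readyReplicas}')"
  ready="${ready:-0}"
  if [ "$ready" -ge "$min" ]; then
    echo "ok: $name has $ready ready replicas"
  else
    echo "low: $name has $ready ready replicas, want at least $min"
    return 1
  fi
}
```

For example, `has_min_ready argocd-server 2 && has_min_ready argocd-repo-server 2` returns a non-zero status if either component is short of two ready pods, which makes it usable as a gate in an upgrade script.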
Now, let’s simulate the upgrade. We’ll update the deployment manifests for each component to use the new image tag. For example, to upgrade the argocd-server:
# Before upgrade
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
  namespace: argocd
spec:
  template:
    spec:
      containers:
      - name: argocd-server
        image: quay.io/argoproj/argocd:v2.5.0

# After upgrade
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
  namespace: argocd
spec:
  template:
    spec:
      containers:
      - name: argocd-server
        image: quay.io/argoproj/argocd:v2.6.1
We apply this change using kubectl apply -f deployment-argocd-server.yaml. Kubernetes Deployments use the RollingUpdate strategy by default, so unless you have overridden it, the Argo CD Deployments do too. With this strategy, old pods are terminated gradually as new ones become ready.
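As an alternative to editing the file, the same image change can be made imperatively and then waited on. This is a sketch under stated assumptions: upgrade_deployment is a hypothetical helper name, the container is assumed to share the Deployment's name (as in the manifest above), and the 300-second timeout is an arbitrary choice. If your manifests live in Git, an imperative change like this will drift from them, so treat it as an illustration of the rollout mechanics only.

```shell
# Set a new image on a Deployment and wait for the rolling update to finish.
# Usage: upgrade_deployment <deployment> <image> [namespace]
# (upgrade_deployment is a hypothetical helper name used for illustration.)
upgrade_deployment() {
  local name="$1" image="$2" ns="${3:-argocd}"
  # Assumption: the container is named after the Deployment.
  kubectl set image "deployment/$name" "$name=$image" -n "$ns" || return 1
  # Blocks until all replicas are updated and ready; exits non-zero on timeout.
  kubectl rollout status "deployment/$name" -n "$ns" --timeout=300s
}
```

A call such as `upgrade_deployment argocd-server quay.io/argoproj/argocd:v2.6.1` returns only once the rollout has completed or timed out, so the next component can be upgraded after the previous one has settled.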
The key to zero downtime lies in how Argo CD manages its state and how Kubernetes orchestrates the pods. The application controller is the heart of Argo CD, constantly reconciling the desired state in Git with the actual state in the cluster. When the controller pod restarts, reconciliation pauses briefly, but the workloads it manages keep running, and the new pod picks up where the old one stopped. This works because the controller’s state is derived from the Kubernetes API server and the Git repository, not from local storage that would be lost on restart.
Similarly, the argocd-server pods handle API and UI requests. With multiple replicas, the argocd-server Service (ClusterIP by default, often exposed through an Ingress or a LoadBalancer) directs traffic only to pods that pass their readiness probe. As old argocd-server pods are terminated, new ones spin up and become ready to receive traffic. The terminationGracePeriodSeconds on the pods gives them time to shut down gracefully and finish in-flight requests before exiting.
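One way to check this claim for yourself is to poll the API server while the rollout is in progress and count failed requests. The sketch below makes several assumptions: probe_health is a hypothetical helper name, the URL you pass is wherever your argocd-server is reachable, and the server is assumed to answer on a /healthz path as its probes in the standard manifests do. The -k flag skips TLS verification, which is only appropriate for a test against a self-signed certificate.

```shell
# Poll a URL a fixed number of times and print how many requests failed.
# Usage: probe_health <url> <attempts> [interval_seconds]
# (probe_health is a hypothetical helper name used for illustration.)
probe_health() {
  local url="$1" attempts="$2" interval="${3:-1}"
  local failures=0 i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: non-zero exit on HTTP errors; -s: no progress output;
    # -k: skip TLS verification; --max-time: give up after 2 seconds.
    if ! curl -fsk --max-time 2 -o /dev/null "$url"; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "$failures"
}
```

Running something like `probe_health https://argocd.example.com/healthz 120` (the hostname is a placeholder) in a second terminal during the upgrade should print 0 if the rollout really was free of downtime from the client's point of view.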
The argocd-repo-server fetches application manifests from Git and renders them. If one replica restarts, the other replicas continue serving manifest requests. Generated manifests are also cached in Redis, which helps mitigate brief interruptions during repo server restarts.
During the rolling update, you’ll observe pods being terminated and new ones starting. For example, you might see:
kubectl get pods -n argocd -w
Output might look like:
NAME                                 READY   STATUS              RESTARTS   AGE
argocd-server-7b7f7d5f6f-abcde       1/1     Running             0          5m
argocd-server-7b7f7d5f6f-fghij       1/1     Running             0          5m
argocd-application-controller-0      1/1     Running             0          5m
argocd-repo-server-6c8d7e6f5g-tuvw   1/1     Running             0          5m
argocd-repo-server-6c8d7e6f5g-wxyz   1/1     Running             0          5m

# After applying changes to argocd-server deployment
# (the new pod belongs to a new ReplicaSet, so its name carries a different hash)
NAME                                 READY   STATUS              RESTARTS   AGE
argocd-server-5c9f8d7b6d-uvwxy       0/1     Pending             0          1s
argocd-server-5c9f8d7b6d-uvwxy       0/1     ContainerCreating   0          3s
argocd-server-5c9f8d7b6d-uvwxy       1/1     Running             0          10s
argocd-server-7b7f7d5f6f-abcde       1/1     Terminating         0          6m
argocd-server-7b7f7d5f6f-fghij       1/1     Running             0          6m
argocd-application-controller-0      1/1     Running             0          6m
argocd-repo-server-6c8d7e6f5g-tuvw   1/1     Running             0          6m
argocd-repo-server-6c8d7e6f5g-wxyz   1/1     Running             0          6m
Notice how one old argocd-server pod is replaced by a new one. As soon as a pod starts terminating, it is removed from the Service endpoints, so no new traffic is sent to it. This process repeats until all old pods are replaced by new ones running the upgraded version.
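To confirm that the replacement has finished, you can compare the image of every pod behind a label selector with the target image. This is a minimal sketch: all_pods_on_image is a hypothetical helper name, and it assumes the Argo CD container is the first container in each pod. Pods that are still terminating are listed too, so the check only passes once the old pods are gone.

```shell
# Succeed only if every pod matching the selector runs the wanted image.
# Usage: all_pods_on_image <label-selector> <image> [namespace]
# (all_pods_on_image is a hypothetical helper name used for illustration.)
all_pods_on_image() {
  local selector="$1" want="$2" ns="${3:-argocd}"
  local images img
  # One image per line; assumes the first container is the Argo CD one.
  images="$(kubectl get pods -n "$ns" -l "$selector" \
    -o jsonpath='{range .items[*]}{.spec.containers[0].image}{"\n"}{end}')" || return 1
  # No pods at all is treated as a failure, not as success.
  [ -n "$images" ] || return 1
  for img in $images; do
    [ "$img" = "$want" ] || { echo "stale: $img"; return 1; }
  done
  echo "all pods on $want"
}
```

For instance, `all_pods_on_image app.kubernetes.io/name=argocd-server quay.io/argoproj/argocd:v2.6.1` reports the first pod it finds that is still on the old image, or confirms that none remain.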
The state of your applications managed by Argo CD is stored in the Kubernetes API and your Git repository. This means that even if all Argo CD pods were to disappear momentarily, they can be recreated and resume their reconciliation process without losing track of application states, as long as the underlying Kubernetes cluster and Git repository are available.
The most important setting to verify before an upgrade is the update strategy of your Argo CD Deployments. RollingUpdate is the Kubernetes default; if you’ve explicitly changed it to Recreate for any reason, all old pods are stopped before new ones start and you will experience downtime. The maxUnavailable and maxSurge parameters of the RollingUpdate strategy control how quickly the rollout proceeds and how many pods may be unavailable at any given moment. Both default to 25%, with maxUnavailable rounded down and maxSurge rounded up, so with two replicas Kubernetes starts one extra pod and keeps both existing pods available until the new one is ready.
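The effect of the percentage-based parameters is easier to see with the arithmetic written out. The sketch below reproduces the rounding that Kubernetes applies to percentages (maxSurge rounds up, maxUnavailable rounds down); rollout_bounds is a hypothetical helper name, and it handles percentages only, not absolute values.

```shell
# Compute the absolute rolling update limits for a given replica count.
# Usage: rollout_bounds <replicas> <maxSurge_percent> <maxUnavailable_percent>
# (rollout_bounds is a hypothetical helper name used for illustration.)
rollout_bounds() {
  local replicas="$1" surge_pct="$2" unavail_pct="$3"
  local surge unavail
  # maxSurge is rounded up to the next whole pod.
  surge=$(( (replicas * surge_pct + 99) / 100 ))
  # maxUnavailable is rounded down.
  unavail=$(( replicas * unavail_pct / 100 ))
  echo "maxSurge=$surge maxUnavailable=$unavail min_available=$((replicas - unavail)) max_total=$((replicas + surge))"
}

rollout_bounds 2 25 25   # maxSurge=1 maxUnavailable=0 min_available=2 max_total=3
rollout_bounds 4 25 25   # maxSurge=1 maxUnavailable=1 min_available=3 max_total=5
```

With two replicas and the defaults, the number of available pods never drops below two during the rollout, which is why the upgrade does not interrupt service.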
After the upgrade is complete, some applications may report new health check failures or resource validation errors if the new Argo CD version applies stricter checks or detects issues in your manifests that the previous version ignored.
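A quick way to spot such applications is to list every Application whose sync or health status is not the expected one. This is a sketch under assumptions: list_unhealthy_apps is a hypothetical helper name, and the Application resources are assumed to live in the argocd namespace. It reads the status.sync.status and status.health.status fields of each Application.

```shell
# Print every Application that is not both Synced and Healthy.
# Usage: list_unhealthy_apps [namespace]
# (list_unhealthy_apps is a hypothetical helper name used for illustration.)
list_unhealthy_apps() {
  local ns="${1:-argocd}"
  # Emit one tab-separated line per Application: name, sync status, health status.
  kubectl get applications.argoproj.io -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.sync.status}{"\t"}{.status.health.status}{"\n"}{end}' |
    awk -F'\t' '$2 != "Synced" || $3 != "Healthy" { print $1 ": sync=" $2 ", health=" $3 }'
}
```

Running it once before and once after the upgrade and comparing the two lists helps separate problems introduced by the new version from ones that were already present.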