Drone CI’s default configuration has single points of failure.
Here’s how a typical Drone CI build flows:
- User pushes code: The Git host sends a webhook that triggers Drone.
- Drone scheduler: The drone-server receives the webhook, schedules the build, and stores build metadata in the database.
- Drone agent: A drone-agent (or drone-runner) polls the drone-server for pending builds.
- Build execution: The agent clones the repository and runs each build step defined in the .drone.yml in its own Docker container, pulling the image each step specifies.
- Build logs/status: The agent streams logs and reports step and build status back to the drone-server. Drone has no built-in artifact store; artifacts are published by pipeline steps or plugins to external storage.
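For reference, this is roughly what the agent executes. The sketch below assumes Drone 1.x pipeline syntax with the Docker runner (0.8-era files use a different layout); the step name, image, and commands are hypothetical. Drone adds an implicit clone step before the listed steps, and each step runs in its own container started from the named image.

```yaml
# Minimal .drone.yml (Drone 1.x syntax, Docker runner assumed)
kind: pipeline
type: docker
name: default

steps:
  # Hypothetical step: runs in a container from golang:1.21
  # after Drone's implicit clone step has checked out the repository.
  - name: test
    image: golang:1.21
    commands:
      - go version
      - go test ./...
```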
This setup works fine for a single developer or a small team, but if the drone-server or the only drone-agent goes down, your entire CI/CD pipeline grinds to a halt.
To achieve high availability, we need to eliminate these single points of failure. This means running multiple instances of drone-server and ensuring drone-agents can be dynamically scaled and managed.
Multi-Instance Drone Server
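The drone-server itself can be run as multiple instances behind a load balancer. The key here is that drone-server instances do not execute builds: persistent build state lives in the shared database, and the agents do the actual work. Depending on your Drone version and edition, some runtime state (such as the pending-build queue and live log streams) may still be held in each server’s memory, so check the high-availability documentation for your release before running several active servers.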
Configuration:
- Database: Use a highly available database solution (e.g., AWS RDS Multi-AZ, Google Cloud SQL HA, or a self-hosted PostgreSQL/MySQL cluster with replication).
- Load Balancer: A standard TCP load balancer (like HAProxy, Nginx, or a cloud provider’s LB service) is sufficient. It should distribute traffic across your drone-server instances and stop routing to any instance that fails its health check.
- Drone Server Configuration: Each drone-server instance needs to share the same database credentials and secrets.
Example docker-compose.yml snippet for multiple servers:
```yaml
version: '3.7'
services:
  drone-server-1:
    image: drone/server:latest
    restart: always
    environment:
      - DRONE_DATABASE_DRIVER=postgres
      - DRONE_DATABASE_DATASOURCE=postgres://user:password@host:port/database?sslmode=disable
      - DRONE_SECRET=your_super_secret_signing_key # Must be identical across all servers
      - DRONE_GITHUB=true
      - DRONE_GITHUB_CLIENT_ID=your_github_client_id
      - DRONE_GITHUB_CLIENT_SECRET=your_github_client_secret
      - DRONE_GIT_ALWAYS_CLONE=true # Optional, but good for HA to ensure fresh clone
    networks:
      - drone
    ports:
      - "8080:80" # This port is for the load balancer to access
  drone-server-2:
    image: drone/server:latest
    restart: always
    environment:
      - DRONE_DATABASE_DRIVER=postgres
      - DRONE_DATABASE_DATASOURCE=postgres://user:password@host:port/database?sslmode=disable
      - DRONE_SECRET=your_super_secret_signing_key # Must be identical across all servers
      - DRONE_GITHUB=true
      - DRONE_GITHUB_CLIENT_ID=your_github_client_id
      - DRONE_GITHUB_CLIENT_SECRET=your_github_client_secret
      - DRONE_GIT_ALWAYS_CLONE=true
    networks:
      - drone
    ports:
      - "8081:80" # This port is for the load balancer to access
  # ... more drone-server instances as needed
networks:
  drone:
```
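In this setup, you’d point your load balancer at host ports 8080 and 8081 (or, if the load balancer runs as a container on the drone network, at drone-server-1:80 and drone-server-2:80). All drone-server instances share the same DRONE_SECRET, so an agent authenticating with that secret is accepted by whichever instance the load balancer routes it to.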
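Because every instance must carry identical settings, copying the environment block by hand invites drift. The sketch below expresses the same two-instance setup with a Compose extension field and a YAML merge key, so the shared values are written once. It assumes Compose file format 3.4 or later (where top-level `x-` fields were introduced); the name `x-drone-server` is chosen here for illustration, and the values are the same placeholders as above.

```yaml
version: '3.7'

# Shared definition, written once. Every server below merges it in,
# so DRONE_SECRET and the database settings cannot diverge between instances.
x-drone-server: &drone-server
  image: drone/server:latest
  restart: always
  environment:
    DRONE_DATABASE_DRIVER: postgres
    DRONE_DATABASE_DATASOURCE: postgres://user:password@host:port/database?sslmode=disable
    DRONE_SECRET: your_super_secret_signing_key
    DRONE_GITHUB: "true"            # quoted: Compose expects string values here
    DRONE_GITHUB_CLIENT_ID: your_github_client_id
    DRONE_GITHUB_CLIENT_SECRET: your_github_client_secret
    DRONE_GIT_ALWAYS_CLONE: "true"
  networks:
    - drone

services:
  drone-server-1:
    <<: *drone-server               # merge the shared definition
    ports:
      - "8080:80"                   # only the host port differs per instance
  drone-server-2:
    <<: *drone-server
    ports:
      - "8081:80"

networks:
  drone:
```

Adding a third instance is then a four-line block with its own host port.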
Dynamic Agent Scaling with Kubernetes
The drone-agent is where the actual build work happens. For high availability, we want this to be a dynamic, scalable pool. A convenient way to achieve this is to run Drone with the Kubernetes execution driver.
How it works:
When a drone-server schedules a build, the build waits as pending. The drone-agent (now running as a Kubernetes Deployment) polls the server, picks up the pending build, and uses the Kubernetes API to spin up a new pod specifically for that build, mounting the necessary volumes and running the build steps. When the build is complete, the build pod is deleted.
Configuration:
- Kubernetes Cluster: You need a functioning Kubernetes cluster.
- Drone Server Configuration: The drone-server needs to be configured to use the Kubernetes execution driver.
- Drone Agent Deployment: The drone-agent is deployed as a Kubernetes Deployment, configured to poll the drone-server for pending builds.
Example drone-server Kubernetes deployment (simplified):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: drone-server
  namespace: drone
spec:
  replicas: 2 # Run multiple drone-server instances
  selector:
    matchLabels:
      app: drone-server
  template:
    metadata:
      labels:
        app: drone-server
    spec:
      containers:
        - name: drone-server
          image: drone/server:latest
          ports:
            - containerPort: 80
          env:
            - name: DRONE_DATABASE_DRIVER
              value: "postgres"
            - name: DRONE_DATABASE_DATASOURCE
              valueFrom:
                secretKeyRef:
                  name: drone-db-secret
                  key: datasource
            - name: DRONE_SECRET
              valueFrom:
                secretKeyRef:
                  name: drone-secret
                  key: secret
            - name: DRONE_GITHUB
              value: "true"
            - name: DRONE_GITHUB_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: drone-github-secret
                  key: client_id
            - name: DRONE_GITHUB_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: drone-github-secret
                  key: client_secret
            - name: DRONE_GIT_ALWAYS_CLONE
              value: "true"
            # Enable Kubernetes execution driver
            - name: DRONE_EXEC_DRIVER
              value: "kubernetes"
            # Kubernetes settings used when launching build pods
            - name: DRONE_KUBERNETES_NAMESPACE
              value: "drone" # Namespace where drone-agent runs and build pods are created
            - name: DRONE_KUBERNETES_SERVICE_ACCOUNT
              value: "drone-agent" # Service account for drone-agent
            # Optional: default resource limits for build pods
            - name: DRONE_KUBERNETES_DEFAULT_CPU_LIMIT
              value: "1"
            - name: DRONE_KUBERNETES_DEFAULT_MEMORY_LIMIT
              value: "1Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: drone-server
  namespace: drone
spec:
  selector:
    app: drone-server
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer # Or ClusterIP if using an external ingress controller
```
Example drone-agent Kubernetes deployment (simplified):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: drone-agent
  namespace: drone
spec:
  replicas: 2 # Stateless, so run at least two; a single replica is itself a single point of failure
  selector:
    matchLabels:
      app: drone-agent
  template:
    metadata:
      labels:
        app: drone-agent
    spec:
      serviceAccountName: drone-agent # Ensure this SA exists and has permissions
      containers:
        - name: drone-agent
          image: drone/agent:latest
          env:
            - name: DRONE_SERVER
              value: "http://drone-server.drone.svc.cluster.local" # Internal service name
            - name: DRONE_TOKEN # Shared secret for agent-to-server auth; must match the server's DRONE_SECRET
              valueFrom:
                secretKeyRef:
                  name: drone-agent-secret
                  key: token
            - name: DRONE_KUBERNETES_ENABLED
              value: "true"
            # Optional: default resource limits for build pods
            - name: DRONE_KUBERNETES_DEFAULT_CPU_LIMIT
              value: "1"
            - name: DRONE_KUBERNETES_DEFAULT_MEMORY_LIMIT
              value: "1Gi"
```
With the Kubernetes driver, Drone can dynamically provision build pods. If your drone-server instances are scaled (e.g., replicas: 2 in the drone-server deployment) and sit behind a Kubernetes Service of type LoadBalancer or an Ingress controller, the server tier no longer depends on a single instance. The agents reach the servers through the Service’s cluster-internal DNS name, so their requests go to whichever drone-server replica is available.
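Two server replicas only help if they are not taken down together. As a sketch, assuming Kubernetes 1.21 or later (where the `policy/v1` API is available), a PodDisruptionBudget asks Kubernetes to keep at least one drone-server pod running during voluntary disruptions such as node drains. It does not protect against a node crashing; the budget’s name is arbitrary.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: drone-server
  namespace: drone
spec:
  # Evictions that would leave zero drone-server pods are refused.
  minAvailable: 1
  selector:
    matchLabels:
      app: drone-server   # same label as the drone-server Deployment
```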
The real advantage of the Kubernetes driver is that it offloads build execution to Kubernetes’ native scheduling. During a surge of builds, more build pods are created as needed, up to the capacity of the cluster’s nodes (nodes are added only if you run a cluster autoscaler). During idle periods, finished build pods are deleted, so the number of running build pods drops to zero.
The next thing you’ll likely run into is managing secrets effectively within Kubernetes for both the server and the agent, and configuring RBAC for the drone-agent’s service account to have the necessary permissions to create and delete pods in the drone namespace.
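A minimal sketch of both follows, using the Secret names and keys the deployments above already reference, with placeholder values. It assumes DRONE_TOKEN is the agent’s copy of the server’s shared secret, so `drone-secret` and `drone-agent-secret` hold the same value. The Role grants the create and delete permissions named above, plus get, list, and watch on pods and get on pods/log so the agent can follow build pods; those extra verbs are an assumption, and your Drone version may need more.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: drone
---
apiVersion: v1
kind: Secret
metadata:
  name: drone-secret
  namespace: drone
type: Opaque
stringData:
  secret: your_super_secret_signing_key
---
apiVersion: v1
kind: Secret
metadata:
  name: drone-agent-secret
  namespace: drone
type: Opaque
stringData:
  token: your_super_secret_signing_key   # must equal drone-secret/secret
---
apiVersion: v1
kind: Secret
metadata:
  name: drone-db-secret
  namespace: drone
type: Opaque
stringData:
  datasource: postgres://user:password@host:port/database?sslmode=disable
---
apiVersion: v1
kind: Secret
metadata:
  name: drone-github-secret
  namespace: drone
type: Opaque
stringData:
  client_id: your_github_client_id
  client_secret: your_github_client_secret
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: drone-agent
  namespace: drone
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: drone-agent
  namespace: drone
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: drone-agent
  namespace: drone
subjects:
  - kind: ServiceAccount
    name: drone-agent
    namespace: drone
roleRef:
  kind: Role
  name: drone-agent
  apiGroup: rbac.authorization.k8s.io
```

Because the Role and RoleBinding are namespaced, the agent can manage pods only in the drone namespace. Secret data is only base64-encoded by default, so keep real values out of version control and inject them from your secrets manager instead.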