In its default configuration, Drone CI has single points of failure: one drone-server and, often, one drone-agent.

Here’s how a typical Drone CI build flows:

  1. User pushes code: A Git webhook triggers Drone.
  2. Drone scheduler: The drone-server receives the webhook, schedules the build, and stores build metadata in the database.
  3. Drone agent: A drone-agent (or drone-runner) polls the drone-server for pending builds.
  4. Build execution: The agent pulls the Docker image specified in the .drone.yml, clones the repository, and executes the build steps within a Docker container.
  5. Build logs/artifacts: The agent streams logs and uploads artifacts back to the drone-server.
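
For reference, the .drone.yml read in step 4 might look like the minimal sketch below. It assumes the Drone 1.x pipeline syntax; the step name, image, and commands are illustrative placeholders, not something this setup prescribes.

```yaml
# .drone.yml (illustrative sketch; assumes Drone 1.x pipeline syntax)
kind: pipeline
type: docker
name: default

steps:
  - name: test            # hypothetical step name
    image: golang:1.21    # image the agent pulls before running the step
    commands:             # run inside the container, in the cloned repository
      - go build ./...
      - go test ./...
```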

This setup works fine for a single developer or a small team, but if your drone-server or a critical drone-agent goes down, your entire CI/CD pipeline grinds to a halt.

To achieve high availability, we need to eliminate these single points of failure. This means running multiple instances of drone-server and ensuring drone-agents can be dynamically scaled and managed.

Multi-Instance Drone Server

The drone-server itself can be run as multiple instances behind a load balancer. The key here is that drone-server instances are stateless regarding build execution. They rely on the database for build state and the agents for actual work.

Configuration:

  • Database: Use a highly available database solution (e.g., AWS RDS Multi-AZ, Google Cloud SQL HA, or a self-hosted PostgreSQL/MySQL cluster with replication).
  • Load Balancer: A standard TCP load balancer (like HAProxy, Nginx, or a cloud provider’s LB service) is sufficient. It should distribute traffic across your drone-server instances.
  • Drone Server Configuration: Each drone-server instance needs to share the same database credentials and secrets.

Example docker-compose.yml snippet for multiple servers:

version: '3.7'

services:
  drone-server-1:
    image: drone/server:latest
    restart: always
    environment:
      - DRONE_DATABASE_DRIVER=postgres
      - DRONE_DATABASE_DATASOURCE=postgres://user:password@host:port/database?sslmode=disable
      - DRONE_SECRET=your_super_secret_signing_key # Must be identical across all servers
      - DRONE_GITHUB=true
      - DRONE_GITHUB_CLIENT_ID=your_github_client_id
      - DRONE_GITHUB_CLIENT_SECRET=your_github_client_secret
      - DRONE_GIT_ALWAYS_CLONE=true # Optional, but good for HA to ensure fresh clone
    networks:
      - drone
    ports:
      - "8080:80" # This port is for the load balancer to access

  drone-server-2:
    image: drone/server:latest
    restart: always
    environment:
      - DRONE_DATABASE_DRIVER=postgres
      - DRONE_DATABASE_DATASOURCE=postgres://user:password@host:port/database?sslmode=disable
      - DRONE_SECRET=your_super_secret_signing_key # Must be identical across all servers
      - DRONE_GITHUB=true
      - DRONE_GITHUB_CLIENT_ID=your_github_client_id
      - DRONE_GITHUB_CLIENT_SECRET=your_github_client_secret
      - DRONE_GIT_ALWAYS_CLONE=true
    networks:
      - drone
    ports:
      - "8081:80" # This port is for the load balancer to access

  # ... more drone-server instances as needed

networks:
  drone:

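In this setup, you'd point your load balancer at the published ports 8080 and 8081 on the Docker host. A load balancer attached to the same drone network would instead reach the containers directly at drone-server-1:80 and drone-server-2:80. All drone-server instances share the same DRONE_SECRET for JWT signing, so a token signed by one instance is accepted by the others.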
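
Because every server must carry identical settings, one way to avoid drift is to define the shared configuration once with a YAML anchor and merge it into each service. This is a sketch, assuming your Compose version supports `x-` extension fields (file format 3.4 and later) and YAML merge keys. The `x-drone-server` name is arbitrary, and the values are the same placeholders used in the snippet above.

```yaml
version: '3.7'

# Shared definition; "x-" fields are ignored by Compose except as anchor targets.
x-drone-server: &drone-server
  image: drone/server:latest
  restart: always
  environment:
    - DRONE_DATABASE_DRIVER=postgres
    - DRONE_DATABASE_DATASOURCE=postgres://user:password@host:port/database?sslmode=disable
    - DRONE_SECRET=your_super_secret_signing_key # Defined once, so it cannot diverge
    - DRONE_GITHUB=true
    - DRONE_GITHUB_CLIENT_ID=your_github_client_id
    - DRONE_GITHUB_CLIENT_SECRET=your_github_client_secret
    - DRONE_GIT_ALWAYS_CLONE=true
  networks:
    - drone

services:
  drone-server-1:
    <<: *drone-server   # Merge the shared definition
    ports:
      - "8080:80"       # Only the host port differs per instance

  drone-server-2:
    <<: *drone-server
    ports:
      - "8081:80"

networks:
  drone:
```

Adding a third instance then takes only a few lines, and a changed secret or datasource is edited in one place.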

Dynamic Agent Scaling with Kubernetes

The drone-agent is where the actual build work happens. For high availability, we want this to be a dynamic, scalable pool. The best way to achieve this is by running Drone with the Kubernetes execution driver.

How it works:

When a drone-server schedules a build, the drone-agent (which now runs as a Kubernetes Deployment) picks it up and uses the Kubernetes API to create a new pod dedicated to that build, mounting the necessary volumes and running the build steps. When the build is complete, the build pod is terminated.

Configuration:

  1. Kubernetes Cluster: You need a functioning Kubernetes cluster.
  2. Drone Server Configuration: The drone-server needs to be configured to use the Kubernetes execution driver.
  3. Drone Agent Deployment: The drone-agent is deployed as a Kubernetes Deployment, configured to watch for build requests from the drone-server.
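
The manifests in this section all live in a drone namespace, and the agent runs under a drone-agent service account. A minimal sketch of those two prerequisites, using the names that appear in the manifests that follow, could be:

```yaml
# Namespace that holds the server, the agent, and the build pods
apiVersion: v1
kind: Namespace
metadata:
  name: drone
---
# Service account referenced by the drone-agent Deployment
apiVersion: v1
kind: ServiceAccount
metadata:
  name: drone-agent
  namespace: drone
```

On its own the service account grants nothing; it only gives the agent an identity that RBAC rules can later be bound to.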

Example drone-server Kubernetes deployment (simplified):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: drone-server
  namespace: drone
spec:
  replicas: 2 # Run multiple drone-server instances
  selector:
    matchLabels:
      app: drone-server
  template:
    metadata:
      labels:
        app: drone-server
    spec:
      containers:
      - name: drone-server
        image: drone/server:latest
        ports:
        - containerPort: 80
        env:
        - name: DRONE_DATABASE_DRIVER
          value: "postgres"
        - name: DRONE_DATABASE_DATASOURCE
          valueFrom:
            secretKeyRef:
              name: drone-db-secret
              key: datasource
        - name: DRONE_SECRET
          valueFrom:
            secretKeyRef:
              name: drone-secret
              key: secret
        - name: DRONE_GITHUB
          value: "true"
        - name: DRONE_GITHUB_CLIENT_ID
          valueFrom:
            secretKeyRef:
              name: drone-github-secret
              key: client_id
        - name: DRONE_GITHUB_CLIENT_SECRET
          valueFrom:
            secretKeyRef:
              name: drone-github-secret
              key: client_secret
        - name: DRONE_GIT_ALWAYS_CLONE
          value: "true"
        # Enable Kubernetes execution driver
        - name: DRONE_EXEC_DRIVER
          value: "kubernetes"
        # Namespace and service account that the drone-agent runs under
        - name: DRONE_KUBERNETES_NAMESPACE
          value: "drone" # Namespace where drone-agent is deployed
        - name: DRONE_KUBERNETES_SERVICE_ACCOUNT
          value: "drone-agent" # Service account for drone-agent
        # Optional: Configure resource requests/limits for build pods
        - name: DRONE_KUBERNETES_DEFAULT_CPU_LIMIT
          value: "1"
        - name: DRONE_KUBERNETES_DEFAULT_MEMORY_LIMIT
          value: "1Gi"

---
apiVersion: v1
kind: Service
metadata:
  name: drone-server
  namespace: drone
spec:
  selector:
    app: drone-server
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer # Or ClusterIP if using an external ingress controller

Example drone-agent Kubernetes deployment (simplified):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: drone-agent
  namespace: drone
spec:
  replicas: 1 # One replica is usually enough: it is stateless, just watches the API, and is recreated by Kubernetes if it fails
  selector:
    matchLabels:
      app: drone-agent
  template:
    metadata:
      labels:
        app: drone-agent
    spec:
      serviceAccountName: drone-agent # Ensure this SA exists and has permissions
      containers:
      - name: drone-agent
        image: drone/agent:latest
        env:
        - name: DRONE_SERVER
          value: "http://drone-server.drone.svc.cluster.local" # Internal service name
        - name: DRONE_TOKEN # A shared secret token for auth
          valueFrom:
            secretKeyRef:
              name: drone-agent-secret
              key: token
        - name: DRONE_KUBERNETES_ENABLED
          value: "true"
        # Optional: Configure resource requests/limits for build pods
        - name: DRONE_KUBERNETES_DEFAULT_CPU_LIMIT
          value: "1"
        - name: DRONE_KUBERNETES_DEFAULT_MEMORY_LIMIT
          value: "1Gi"
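
Both Deployments read their sensitive values through secretKeyRef, so the referenced Secret objects must exist before the pods can start. The sketch below creates them with the names and keys used in the manifests above. All values are placeholders, and in practice you would likely create these out of band (for example from a secrets manager) rather than commit them to version control.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: drone-db-secret
  namespace: drone
type: Opaque
stringData:               # stringData accepts plain text; Kubernetes stores it base64-encoded
  datasource: postgres://user:password@host:port/database?sslmode=disable
---
apiVersion: v1
kind: Secret
metadata:
  name: drone-secret
  namespace: drone
type: Opaque
stringData:
  secret: your_super_secret_signing_key
---
apiVersion: v1
kind: Secret
metadata:
  name: drone-github-secret
  namespace: drone
type: Opaque
stringData:
  client_id: your_github_client_id
  client_secret: your_github_client_secret
---
apiVersion: v1
kind: Secret
metadata:
  name: drone-agent-secret
  namespace: drone
type: Opaque
stringData:
  token: your_shared_agent_token   # placeholder for the token the agent authenticates with
```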

With the Kubernetes driver, Drone can dynamically provision build pods. If your drone-server instances are scaled (e.g., replicas: 2 in the drone-server deployment) and sit behind a Kubernetes Service of type LoadBalancer or an Ingress controller, losing a single server pod no longer takes the CI system down, provided the database is highly available as well. The agents (which are now just pods watching the API) connect through the drone-server Service, so they reach whichever drone-server instance is available.
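
Running two replicas protects against a pod crash, but a voluntary disruption such as a node drain can still evict both server pods at once. One possible safeguard, sketched here on the assumption that your cluster serves the policy/v1 API (Kubernetes 1.21 or later), is a PodDisruptionBudget that selects the same app: drone-server label as the Deployment:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: drone-server
  namespace: drone
spec:
  minAvailable: 1          # Evictions are refused if they would leave zero server pods
  selector:
    matchLabels:
      app: drone-server
```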

The real benefit of the Kubernetes driver is that it offloads build execution to Kubernetes' native scheduling. During a surge of builds, each build gets its own pod, which Kubernetes places wherever the cluster has capacity. During idle periods, no build pods run at all.

The next thing you’ll likely run into is managing secrets effectively within Kubernetes for both the server and the agent, and configuring RBAC for the drone-agent’s service account to have the necessary permissions to create and delete pods in the drone namespace.
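
As a starting point for the RBAC part, the sketch below grants the drone-agent service account permission to manage pods in the drone namespace only. The exact resources and verbs Drone needs depend on the version you run; the pods/log rule in particular is an assumption, so treat this as a baseline to adjust rather than a definitive policy.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: drone-agent
  namespace: drone
rules:
  - apiGroups: [""]                 # "" is the core API group
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/log"]         # assumption: needed if the agent reads build pod logs
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: drone-agent
  namespace: drone
subjects:
  - kind: ServiceAccount
    name: drone-agent
    namespace: drone
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: drone-agent
```

Using a namespaced Role rather than a ClusterRole keeps the agent from touching pods outside the drone namespace.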
