A Drone CI pipeline step fails when the drone-runner process, which executes build jobs, cannot communicate with the Drone server. In the runner’s logs this usually shows up as a connection refused or timeout error.

Common Causes and Fixes

  1. Network Connectivity Between Runner and Server

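    • Diagnosis: From the host where the drone-runner is running, try to curl the Drone server’s health endpoint.
      curl -v http://drone.example.com:8000/healthz
      
      (Replace drone.example.com:8000 with your Drone server’s actual address and port. Any HTTP response, even an error status, shows that the network path works; a connection refused error or a timeout shows that it does not.)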
    • Fix: Ensure that the drone-runner host can reach the Drone server’s IP address and port. This might involve:
      • Firewall Rules: Open the port the Drone server listens on (8000 in this example) on the server’s firewall.
      • DNS Resolution: If using a hostname, verify that it resolves to the correct address from the runner host.
      • Network ACLs/Security Groups: If running in a cloud environment, ensure your security groups or network ACLs allow inbound traffic on that port from the drone-runner’s IP.
    • Why it works: The runner needs to establish a TCP connection to the server’s API to receive job instructions and report status. If this connection is blocked or fails, the runner cannot proceed.
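The curl check from this section can be wrapped so that its exit status points at one of the causes listed above. This is a minimal sketch, not part of Drone: the function names are made up for illustration, and the mapping relies only on curl’s documented exit statuses (6 for DNS failure, 7 for a failed connection, 28 for a timeout, 35 and 60 for TLS problems).

```shell
# Translate a curl exit status into the likeliest network cause.
explain_curl_exit() {
  case "$1" in
    0)     echo "reachable: the server answered over HTTP" ;;
    6)     echo "dns: the hostname did not resolve" ;;
    7)     echo "tcp: connection refused or host unreachable" ;;
    28)    echo "timeout: packets are probably being dropped" ;;
    35|60) echo "tls: handshake or certificate verification failed" ;;
    *)     echo "other: curl exited with status $1" ;;
  esac
}

# Probe a URL and print the interpretation. Any HTTP response, even a 404,
# counts as reachable because -f/--fail is deliberately not used.
probe_server() {
  status=0
  curl --silent --output /dev/null --connect-timeout 5 --max-time 10 "$1" || status=$?
  explain_curl_exit "$status"
}
```

For example, `probe_server http://drone.example.com:8000/` run on the runner host prints a single line. A "timeout" result usually points at a firewall or security group that silently drops packets, while "tcp" more often means nothing is listening on that port.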
  2. Incorrect Drone Server Address/Port in Runner Configuration

    • Diagnosis: Check the DRONE_RPC_PROTO and DRONE_RPC_HOST environment variables of the drone-runner. (DRONE_SERVER is read by the drone command-line client, not by the runners.)
      # Example for docker/kubernetes runner: run inside the runner's environment
      echo "$DRONE_RPC_PROTO://$DRONE_RPC_HOST"
      # Or check the runner's configuration file if applicable
      
    • Fix: Set DRONE_RPC_HOST to the hostname (plus the port, if it is non-standard) of your Drone server, without a scheme, and set DRONE_RPC_PROTO to http or https. For example, if your Drone server is at http://192.168.1.100:8000, set:
      export DRONE_RPC_PROTO=http
      export DRONE_RPC_HOST=192.168.1.100:8000
      
      If using HTTPS, set DRONE_RPC_PROTO=https and make sure the port is correct (usually 443, in which case it can be omitted).
    • Why it works: The runner builds the server URL from these two values. If either one is wrong, it tries to connect to the wrong place.
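Drone runners read the server location from DRONE_RPC_PROTO and DRONE_RPC_HOST, so a quick sanity check of those two values can catch typos before the runner starts. The sketch below is illustrative only: `rpc_url` is a hypothetical helper, not a Drone command, and falling back to http when DRONE_RPC_PROTO is unset is an assumption of this example.

```shell
# Compose the server URL from a protocol and a host[:port] pair, rejecting
# values that tend to end up in DRONE_RPC_HOST by mistake.
rpc_url() {
  # $1 = protocol, $2 = host[:port]
  case "$1" in
    http|https) ;;
    *) echo "error: protocol must be http or https, got '$1'" >&2; return 1 ;;
  esac
  case "$2" in
    "")    echo "error: host is empty" >&2; return 1 ;;
    *://*) echo "error: host must not include a scheme" >&2; return 1 ;;
    */*)   echo "error: host must not include a path" >&2; return 1 ;;
  esac
  printf '%s://%s\n' "$1" "$2"
}
```

Calling `rpc_url "${DRONE_RPC_PROTO:-http}" "${DRONE_RPC_HOST:-}"` in the runner’s environment prints the URL the runner is expected to contact, which can then be passed to curl from the same host.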
  3. Drone Server Not Running or Unhealthy

    • Diagnosis: Check the status of the Drone server process or container.
      # If running as a systemd service
      sudo systemctl status drone-server
      
      # If running in Docker (-a also lists stopped or crashed containers)
      docker ps -a | grep drone
      
      # If running in Kubernetes
      kubectl get pods -l app=drone-server -n drone
      
      Also, check the Drone server logs for any errors.
    • Fix: Start or restart the Drone server. If it’s crashing, investigate the server logs for the root cause. This might involve increasing resource limits (CPU/memory) if the server is overloaded.
    • Why it works: The runner is a client; it needs a healthy server to connect to. If the server is down, the connection will inevitably fail.
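After a restart the server can take a few seconds to start answering, so it helps to poll instead of testing once. The following is a generic retry sketch rather than anything Drone-specific: `wait_until` is a hypothetical helper that reruns any command until it succeeds or the attempts run out.

```shell
# Run a command up to N times, sleeping between attempts.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only when another attempt follows.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}
```

A possible use, assuming the server is published at drone.example.com:8000 and serves a /healthz endpoint: `wait_until 10 3 curl -fsS --max-time 5 -o /dev/null http://drone.example.com:8000/healthz && echo "server is up"`.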
  4. TLS/SSL Certificate Issues (for HTTPS)

    • Diagnosis: If your Drone server uses HTTPS, verify that the runner is configured with DRONE_RPC_PROTO=https and that the certificate the server presents (configured through DRONE_TLS_CERT and DRONE_TLS_KEY, or by a reverse proxy in front of the server) is valid for the hostname in DRONE_RPC_HOST.
    • Fix: If the Drone server uses a self-signed certificate, or one signed by a private CA that the runner does not trust, the connection will fail. You can:
      • Use a valid, publicly trusted certificate for the Drone server (preferred).
      • Configure the runner to trust the CA that signed the server’s certificate.
      • For testing only, disable TLS verification on the runner by setting DRONE_RPC_SKIP_VERIFY=true in its environment variables (not recommended for production).
      A mismatched DRONE_RPC_SECRET is an authentication problem rather than a TLS problem; see cause 6.
    • Why it works: TLS verification ensures the runner is talking to the actual Drone server and not an impostor. If verification fails, the runner aborts the connection before any job data is exchanged.
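To see why a certificate is rejected, `openssl s_client` can be pointed at the server and its "Verify return code" line read back. This is a rough sketch under two assumptions: the helper names are invented for illustration, and openssl on the runner host uses the system trust store, which may differ from the one a containerized runner sees. Note also that `s_client` does not check the hostname against the certificate by default.

```shell
# Extract the first "Verify return code" number from s_client output on stdin.
verify_code() {
  sed -n 's/.*Verify return code: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Map the verification codes most relevant here to a plain-language hint.
explain_verify_code() {
  case "$1" in
    0)     echo "ok: certificate chain is trusted" ;;
    10)    echo "expired: renew the server certificate" ;;
    18|19) echo "self-signed: add the certificate or its CA to the trust store" ;;
    20|21) echo "unknown issuer: this host does not trust the signing CA" ;;
    *)     echo "other: look up verification code $1" ;;
  esac
}

# Connect to a server and explain the verification result.
check_tls() {
  # $1 = hostname, $2 = port (443 if omitted)
  code=$(openssl s_client -connect "$1:${2:-443}" -servername "$1" </dev/null 2>/dev/null | verify_code)
  explain_verify_code "${code:-unknown}"
}
```

For example, `check_tls drone.example.com` prints one line; a "self-signed" or "unknown issuer" result corresponds to the trust options listed under Fix above.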
  5. Runner Resource Constraints (CPU/Memory/Disk)

    • Diagnosis: Check the resource utilization on the host where the drone-runner is running. Look for high CPU, low memory, or disk space exhaustion.
      # On Linux
      top
      df -h
      
      Also, check the drone-runner logs for any resource-related errors.
    • Fix: Allocate more resources (CPU, RAM) to the drone-runner process or the host it’s running on. Ensure sufficient disk space is available for build artifacts and Docker images.
    • Why it works: A starved runner might not be able to establish or maintain network connections, or it might crash before it can properly communicate with the server.
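Disk exhaustion is the easiest of these conditions to check from a script. The sketch below parses POSIX-format `df -P` output; the helper names and the 90% default threshold are arbitrary choices for illustration, and the path to check depends on the runner type (for a Docker runner on Linux, /var/lib/docker is the usual data directory).

```shell
# Read `df -P` output on stdin and print the used percentage (without the
# percent sign) of the first filesystem listed.
used_pct_from_df() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when the filesystem holding a path is at or above a usage limit.
check_disk() {
  # $1 = path, $2 = limit in percent (90 if omitted)
  pct=$(df -P "$1" | used_pct_from_df)
  if [ "$pct" -ge "${2:-90}" ]; then
    echo "warning: $1 is ${pct}% full"
    return 1
  fi
  echo "ok: $1 is ${pct}% full"
}
```

Running `check_disk /var/lib/docker 85` from cron or before starting the runner gives an early signal before image pulls and build workspaces fill the disk.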
  6. Incorrect DRONE_RPC_SECRET Configuration

    • Diagnosis: Verify the DRONE_RPC_SECRET value in the drone-runner configuration matches the DRONE_RPC_SECRET value configured for the Drone server.
    • Fix: Ensure the DRONE_RPC_SECRET environment variable or configuration setting is identical for both the Drone server and all registered drone-runners. For example, if set in Docker Compose:
      # drone-server service
      environment:
        - DRONE_RPC_SECRET=your-super-secret-key
      
      # drone-runner service
      environment:
        - DRONE_RPC_SECRET=your-super-secret-key
      
    • Why it works: This secret acts as a shared key for secure communication and authentication between the runner and the server. Mismatched secrets prevent them from trusting each other.
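Comparing the two secrets by eye is error-prone and exposes them on screen, so one option is to compare short hashes instead. This is a sketch under stated assumptions: both services run as Docker containers, the container names `drone-server` and `drone-runner` used in the example call are placeholders, `sha256sum` is available (GNU coreutils), and the helper functions are hypothetical.

```shell
# Print a short SHA-256 fingerprint of a secret so two values can be compared
# without echoing the secret itself into a terminal or log.
secret_fingerprint() {
  printf '%s' "$1" | sha256sum | cut -c1-12
}

# Print the value of one environment variable from a container's configuration.
container_env() {
  # $1 = container name, $2 = variable name
  docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$1" |
    sed -n "s/^$2=//p"
}
```

With those in place, `secret_fingerprint "$(container_env drone-server DRONE_RPC_SECRET)"` and the same call for `drone-runner` should print identical fingerprints. A difference often comes down to stray whitespace or quoting in one of the two definitions.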

Once the runner can reach the server, the next failure you may encounter is a resource-limit error in a pipeline step, if the runner host still lacks the resources that the build jobs themselves need.
