A Docker Swarm manager becomes unreachable when the other managers can no longer communicate with it, typically because the network path between the managers is broken or misrouted, or because the node or its Docker daemon is unhealthy.

Common causes and fixes for a Docker Swarm manager not being reachable:

  1. Manager Node Not Healthy/Joined: A manager node might have lost its connection to the swarm or is in an unhealthy state, preventing it from participating in the cluster’s consensus.

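    • Diagnosis: On any healthy manager node, run docker node ls. Check the affected node’s STATUS and MANAGER STATUS columns. A STATUS of Down (or Disconnected) or a MANAGER STATUS of Unreachable points to this problem.
    • Fix: If the node is Down, first restart the Docker daemon on that node: sudo systemctl restart docker. If it stays Down or Unreachable, remove it from the swarm and rejoin it. A manager has to be demoted before it can be removed, so on a healthy manager run docker node demote <node-id>, then docker node rm <node-id> (add --force if the node cannot be shut down cleanly). On the affected node, clear the stale swarm state with docker swarm leave --force. Print a current manager join command on a healthy manager with docker swarm join-token manager, then run it on the affected node: docker swarm join --token <join-token> <manager-ip:port>.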
    • Why it works: This ensures the node is actively participating in the swarm’s Raft consensus and network membership.
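The diagnosis step above can be scripted. The following is a minimal sketch, assuming it runs on a healthy manager; the function names filter_unhealthy_nodes and list_unhealthy_nodes are hypothetical, and the | separator is just a convenient choice for parsing.

```shell
#!/bin/sh
# Print the hostname of every node whose Status is not "Ready" or whose
# ManagerStatus is "Unreachable".
# Input: one node per line, formatted as hostname|status|manager-status.
filter_unhealthy_nodes() {
    awk -F '|' '$2 != "Ready" || $3 == "Unreachable" { print $1 }'
}

# Hypothetical wrapper: ask the local manager for the node list and filter it.
# Worker nodes have an empty ManagerStatus field, which is fine here.
list_unhealthy_nodes() {
    docker node ls --format '{{.Hostname}}|{{.Status}}|{{.ManagerStatus}}' \
        | filter_unhealthy_nodes
}
```

Calling list_unhealthy_nodes on a manager prints one hostname per problem node and nothing when every node looks healthy.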
  2. Overlay Network State Corruption: The state of Swarm’s overlay networks (the ingress network and the gossip-based network control plane that runs alongside Raft) can become desynchronized or corrupted across nodes.

    • Diagnosis: On a healthy manager, list the swarm-scoped networks: docker network ls --filter scope=swarm. You should see ingress and any overlay networks you have created. Then inspect them: docker network inspect ingress. Look for inconsistencies in endpoint IPs or configurations. For advanced debugging, you might examine iptables rules related to overlay networks, or check the Docker daemon logs on the affected node (for example, journalctl -u docker) for Raft-related errors. Swarm mode runs inside dockerd, so there is no separate swarmd process or log to look for.
    • Fix: This is tricky, as direct manipulation of internal overlay state is not officially supported and can be dangerous. Often, the most reliable fix is to rebuild the swarm or evacuate services. However, a less disruptive attempt is to restart the Docker daemon on the manager nodes one at a time, waiting for each to show as Reachable in docker node ls before moving on so that quorum is never lost. If this fails, you may need to drain services from the affected node(s) with docker node update --availability drain <node-id> and then remove and re-add them. A more drastic, but sometimes effective, measure is to demote the affected manager (docker node demote <node-id>) and promote another node in its place (docker node promote <node-id>), though this requires at least 3 managers.
    • Why it works: Restarting the daemon forces the node to rebuild its overlay network state and re-synchronize its Raft state with the other managers.
  3. Firewall Blocking Inter-Manager Communication: Firewalls on the manager nodes or in the network path are blocking the ports required for Swarm communication.

    • Diagnosis: Ensure ports 2377/tcp (cluster management and Raft), 7946/tcp and 7946/udp (node-to-node gossip for the overlay control plane), and 4789/udp (VXLAN for overlay network data traffic) are open between nodes. For the TCP ports, use telnet <manager-ip> <port> or nc -vz <manager-ip> <port> from one manager to another. For the UDP ports, use nc -vzu <manager-ip> <port>, keeping in mind that a UDP probe cannot reliably confirm that a port is open.
    • Fix: Open the necessary ports on your firewalls. For iptables on Linux: sudo iptables -A INPUT -p tcp --dport 2377 -j ACCEPT, sudo iptables -A INPUT -p tcp --dport 7946 -j ACCEPT, sudo iptables -A INPUT -p udp --dport 7946 -j ACCEPT, sudo iptables -A INPUT -p udp --dport 4789 -j ACCEPT. Because -A appends, make sure no earlier DROP or REJECT rule matches first, and make the rules persistent across reboots.
    • Why it works: Unblocking these ports allows the Docker daemons (dockerd) on each node to discover and communicate with each other, enabling consensus and state synchronization.
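The port checks above can be looped over in one pass. This is a sketch, assuming a netcat variant that supports the -z, -u, and -w flags (not every build does); the function names swarm_ports and check_swarm_ports are hypothetical.

```shell
#!/bin/sh
# Ports Swarm needs between nodes, one "port/protocol" pair per line.
swarm_ports() {
    printf '%s\n' 2377/tcp 7946/tcp 7946/udp 4789/udp
}

# Probe each port on the given host with nc and report the result.
# UDP results are best-effort: nc cannot tell "open" from "silently dropped".
check_swarm_ports() {
    host=$1
    swarm_ports | while IFS=/ read -r port proto; do
        udp_flag=
        [ "$proto" = udp ] && udp_flag=-u
        # $udp_flag is deliberately unquoted so it vanishes when empty.
        if nc -z -w 2 $udp_flag "$host" "$port" >/dev/null 2>&1; then
            echo "$port/$proto reachable"
        else
            echo "$port/$proto blocked or closed"
        fi
    done
}
```

Run check_swarm_ports <manager-ip> from one manager against another. Treat a failed TCP probe as a firm signal and a UDP result only as a hint.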
  4. Underlying Host Network Issues: The network interfaces or routing on the manager nodes themselves are misconfigured, preventing them from reaching each other.

    • Diagnosis: On the affected manager node, run ip addr to check interface status and IPs. Use ping <other-manager-ip> and traceroute <other-manager-ip> to check basic connectivity and identify routing hops. Check /etc/docker/daemon.json for any network-related configurations that might be incorrect.
    • Fix: Correct any IP address conflicts, ensure default gateways are set correctly, and verify that the network interfaces used by Docker are up and configured properly. Restart the Docker daemon after making network changes.
    • Why it works: Docker Swarm relies on stable IP connectivity between managers; host network issues directly impede this.
  5. Resource Exhaustion on Manager Node: The manager node is overloaded with CPU, memory, or disk I/O, causing the Docker daemon (dockerd) to become unresponsive.

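    • Diagnosis: Use top, htop, free -h, and df -h on the affected manager node to check resource utilization. Look for high CPU or memory usage by dockerd, or disk space running critically low, especially on the filesystem that holds /var/lib/docker.
    • Fix: Free up resources by stopping non-essential processes, deleting unused Docker data, or increasing the node’s resources (CPU, RAM, disk). docker system prune -a removes stopped containers, unused networks, and all unused images; add --volumes if you also want to remove unused volumes, after confirming that nothing in them is still needed.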
    • Why it works: Unresponsive processes due to resource starvation cannot participate in cluster operations or respond to health checks.
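The disk-space part of this check is easy to automate. The sketch below assumes POSIX-format df output (df -P) and a threshold you pick yourself; the function name mounts_over_threshold is hypothetical.

```shell
#!/bin/sh
# Read `df -P` output on stdin and print every mount point whose usage is at
# or above the given percentage. The first line is the header and is skipped.
mounts_over_threshold() {
    threshold=$1
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # "95%" -> "95"
        if (use + 0 >= t) print $6
    }'
}
```

For example, df -P /var/lib/docker | mounts_over_threshold 90 prints the mount point only when the filesystem holding Docker’s data is at least 90% full. Mount points containing spaces are not handled by this sketch.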
  6. Raft Consensus Leader Failure/Unavailability: Swarm managers use the Raft consensus algorithm. If the current leader manager becomes unavailable and a new leader cannot be elected, the swarm can become unresponsive.

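    • Diagnosis: On a healthy manager, run docker node ls and look for Leader in the MANAGER STATUS column. If no manager is marked Leader, or the command itself fails with an error about the swarm having no leader, quorum has likely been lost. You can also inspect the Docker daemon logs on managers (for example, journalctl -u docker) for Raft-related errors (e.g., "no leader," "timeout").
    • Fix: Raft needs a majority of managers (a quorum) to be healthy and connected; with 3 managers, that means at least 2. If a manager is permanently lost but quorum still holds, demote it and remove it from a healthy manager: docker node demote <node-id>, then docker node rm <node-id>. If the leader is only temporarily down, the remaining managers elect a new leader as long as quorum holds. If quorum is lost and the missing managers cannot be brought back, run docker swarm init --force-new-cluster --advertise-addr <manager-ip> on a surviving manager to rebuild a single-manager swarm from its local state, then add managers back. Otherwise, a failed election usually points back to network issues or multiple managers being down simultaneously.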
    • Why it works: Raft requires a quorum of managers to operate. If the leader is lost and a new one cannot be elected due to network partitions or too many managers being offline, operations halt.
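The quorum arithmetic behind this is simple enough to check by hand or in a script. A minimal sketch, using hypothetical helper names, where N is the total number of managers in the swarm:

```shell
#!/bin/sh
# Managers that must be reachable for Raft to make progress: floor(N/2) + 1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Managers that can be lost while the swarm keeps quorum: floor((N-1)/2).
fault_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

# Succeeds if $2 reachable managers out of $1 total still form a quorum.
has_quorum() {
    [ "$2" -ge "$(quorum_size "$1")" ]
}
```

With 3 managers the quorum is 2 and one failure is tolerated; with 5 managers the quorum is 3 and two failures are tolerated. An even count adds no tolerance: 4 managers still survive only one failure.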

The next error you’ll likely see after resolving manager reachability is related to service deployments or updates failing because the orchestrator cannot reach the swarm state.