Consul agents can communicate with each other across different network segments (e.g., subnets, VPCs, availability zones) without requiring direct network access between every agent, provided a few key components are properly configured.
Here’s a simplified view of how two Consul agents, Agent A in 10.0.1.10/24 and Agent B in 10.10.1.10/24, can talk when separated by a router or firewall.
Scenario Setup:
Imagine we have two distinct subnets:
- Subnet 1: 10.0.1.0/24, with Consul Agent A at 10.0.1.10.
- Subnet 2: 10.10.1.0/24, with Consul Agent B at 10.10.1.10.
These subnets are not directly routable to each other for all traffic, which simulates network segmentation. We want Agent A to be able to discover and communicate with Agent B.
Consul’s Internal Communication:
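Consul agents use a gossip protocol for membership, peer discovery, and failure detection. Within a datacenter this LAN gossip runs on port 8301, over both TCP and UDP. RPC between clients and servers, and among the servers themselves, uses TCP port 8300. TCP port 8500 is the HTTP API, which local applications and the consul CLI use to talk to an agent.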
When agents are in different network segments, they can’t directly gossip or RPC to each other if there’s no route. This is where Consul’s Serf WAN gossip and Consul Server communication come into play.
The Key Components: Serf WAN and Server-to-Server RPC
- Serf WAN Gossip (TCP and UDP 8302): This is the low-level membership and failure detection protocol used between datacenters. For inter-segment communication, you don’t want every agent gossiping to every other agent. Instead, only the servers take part in the WAN gossip pool.
- Consul Server RPC (TCP 8300): Consul servers also need to communicate with each other to maintain cluster state, replicate data, and serve queries. This communication forms the backbone of the Consul cluster.
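As a quick reference, the ports discussed here can be captured in a small lookup helper. This is only a sketch: the function name `consul_default_port` is made up for illustration, and the values are Consul's documented defaults, which a deployment may override in its `ports` configuration.

```shell
# Hypothetical helper: print the default port and protocol for a Consul port name.
# Values assume Consul's documented defaults have not been overridden.
consul_default_port() {
  case "$1" in
    server)   echo "8300/tcp" ;;      # server RPC
    serf_lan) echo "8301/tcp+udp" ;;  # LAN gossip within a datacenter
    serf_wan) echo "8302/tcp+udp" ;;  # WAN gossip between datacenters
    http)     echo "8500/tcp" ;;      # HTTP API
    dns)      echo "8600/tcp+udp" ;;  # DNS interface
    *)        echo "unknown port name: $1" >&2; return 1 ;;
  esac
}

# Example: consul_default_port serf_lan
```

The names match the keys of the `ports` block in the agent configuration, so the helper doubles as a reminder of which key controls which listener.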
How to Make Them Talk:
The most common and robust pattern for multi-segment Consul deployments is to have a Consul Server cluster in a central, highly routable segment and then have Consul Agents (clients or servers) in other segments join this central cluster.
Step-by-Step Implementation:
- Establish a Central Consul Server Cluster:
  - Deploy at least 3 Consul servers (for high availability) in a network segment that is reachable from each of the segments you want to connect. Let’s say these servers are at 10.0.0.10, 10.0.0.11, 10.0.0.12 in 10.0.0.0/24.
  - These servers will form a Consul datacenter (e.g., dc1).
  - Their configuration would look something like this (server.json), with X replaced per server. JSON does not allow comments, so the notes follow the file: bind_addr is the server’s IP address within the central segment, client_addr of 0.0.0.0 makes the HTTP and DNS interfaces listen on all addresses, and serf_wan is used only for WAN gossip between datacenters.

        {
          "server": true,
          "bootstrap_expect": 3,
          "datacenter": "dc1",
          "data_dir": "/opt/consul/data",
          "log_level": "INFO",
          "node_name": "consul-server-X",
          "bind_addr": "10.0.0.X",
          "client_addr": "0.0.0.0",
          "ports": {
            "dns": 8600,
            "http": 8500,
            "server": 8300,
            "serf_lan": 8301,
            "serf_wan": 8302
          }
        }

  - Start Consul with this config:

        consul agent -config-file server.json
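Because the three server files differ only in the X placeholder, they can be rendered from one template. The sketch below assumes server.json still contains the literal strings `consul-server-X` and `10.0.0.X`; the function name and the convention of reusing the last octet in the node name are illustrative choices, not something Consul requires.

```shell
# Hypothetical helper: fill in the X placeholder of a server config template.
#   $1 = path to the template containing "consul-server-X" and "10.0.0.X"
#   $2 = last octet of the server's address (e.g. 10, 11, 12)
# The rendered config is written to stdout.
render_server_config() {
  sed -e "s/consul-server-X/consul-server-$2/" \
      -e "s/10\.0\.0\.X/10.0.0.$2/" "$1"
}

# Example usage (writes server-10.json, server-11.json, server-12.json):
# for octet in 10 11 12; do
#   render_server_config server.json "$octet" > "server-$octet.json"
# done
```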
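- Configure Firewall/Routing Rules:
  - Crucially, ensure that the network segments where your client agents reside (10.0.1.0/24, 10.10.1.0/24) can reach the server IPs (10.0.0.10, 10.0.0.11, 10.0.0.12) on TCP port 8300 (for RPC) and on port 8301, both TCP and UDP (for Serf LAN gossip). Gossip is bidirectional, so the servers must also be able to reach the clients on port 8301.
  - Client agents in different segments do NOT need direct connectivity to each other for RPC, because every client sends its RPC traffic to the servers. LAN gossip is a different matter: Consul’s documentation expects all agents in a datacenter’s LAN pool to be able to reach each other on port 8301, and failure detection can become unreliable when they cannot. If that port cannot be opened between client segments, the network segments feature of Consul Enterprise is designed for this topology.
  - If you expand to multiple datacenters later, servers in different datacenters only need connectivity on the specific inter-datacenter ports (8302 for Serf WAN gossip and 8300 for server RPC).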
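Before starting any client agents, it can help to probe the server endpoints from a host in each client segment. The following is a minimal sketch: it assumes Consul's default ports (8300 for server RPC, 8301 for Serf LAN), that a netcat supporting `-z` and `-w` is installed, and both function names are hypothetical. UDP is listed but not probed, since a connect-style probe cannot confirm UDP reachability.

```shell
# Print one "host port proto" line per endpoint a client segment must reach.
required_endpoints() {
  for server in "$@"; do
    echo "$server 8300 tcp"   # server RPC
    echo "$server 8301 tcp"   # Serf LAN gossip (TCP)
    echo "$server 8301 udp"   # Serf LAN gossip (UDP)
  done
}

# Probe only the TCP endpoints with netcat and report the result per endpoint.
probe_tcp_endpoints() {
  required_endpoints "$@" | while read -r host port proto; do
    [ "$proto" = "tcp" ] || continue
    if nc -z -w 2 "$host" "$port" 2>/dev/null; then
      echo "ok   $host:$port/$proto"
    else
      echo "FAIL $host:$port/$proto"
    fi
  done
}

# Example, run from 10.0.1.10 or 10.10.1.10:
# probe_tcp_endpoints 10.0.0.10 10.0.0.11 10.0.0.12
```

A FAIL line points at a firewall or routing rule to revisit before looking at the Consul configuration itself.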
- Join Client Agents to the Central Cluster:
  - On Agent A (10.0.1.10) in 10.0.1.0/24, configure it to join the central dc1 datacenter.
  - Its configuration (client-a.json) sets datacenter to dc1, binds to the agent’s own IP within its segment, and lists two of the central server IPs under retry_join:

        {
          "server": false,
          "datacenter": "dc1",
          "data_dir": "/opt/consul/data",
          "log_level": "INFO",
          "node_name": "consul-agent-a",
          "bind_addr": "10.0.1.10",
          "client_addr": "0.0.0.0",
          "ports": {
            "dns": 8600,
            "http": 8500,
            "server": 8300,
            "serf_lan": 8301
          },
          "retry_join": [
            "10.0.0.10",
            "10.0.0.11"
          ]
        }

  - Start Agent A:

        consul agent -config-file client-a.json

  - On Agent B (10.10.1.10) in 10.10.1.0/24, do the same.
  - Its configuration (client-b.json) also belongs to dc1 and differs only in node name, bind address, and the servers it tries first:

        {
          "server": false,
          "datacenter": "dc1",
          "data_dir": "/opt/consul/data",
          "log_level": "INFO",
          "node_name": "consul-agent-b",
          "bind_addr": "10.10.1.10",
          "client_addr": "0.0.0.0",
          "ports": {
            "dns": 8600,
            "http": 8500,
            "server": 8300,
            "serf_lan": 8301
          },
          "retry_join": [
            "10.0.0.10",
            "10.0.0.12"
          ]
        }

  - Start Agent B:

        consul agent -config-file client-b.json
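Since the two client files differ only in node name and bind address, a small generator keeps them in sync. This sketch makes a few assumptions: the function name is hypothetical, the ports block is omitted so Consul's defaults apply, and all three central servers are listed under retry_join rather than two.

```shell
# Hypothetical helper: write a minimal client config for the dc1 datacenter.
#   $1 = node name, $2 = bind address, $3 = output file
write_client_config() {
  cat > "$3" <<EOF
{
  "server": false,
  "datacenter": "dc1",
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "node_name": "$1",
  "bind_addr": "$2",
  "client_addr": "0.0.0.0",
  "retry_join": ["10.0.0.10", "10.0.0.11", "10.0.0.12"]
}
EOF
}

# Example usage:
# write_client_config consul-agent-a 10.0.1.10  client-a.json
# write_client_config consul-agent-b 10.10.1.10 client-b.json
```

Listing every server in retry_join means a client can still join when any single server is down, at the cost of having to update the list when the server set changes.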
Verification:
Once started, Agent A and Agent B will attempt to contact the specified retry_join servers. If network connectivity is established (i.e., the firewall/routing allows TCP 8300 and TCP/UDP 8301 between the client segments and the server segment), they will join the dc1 datacenter.
You can verify this from any of the server nodes. The command lists the LAN members of that node’s own datacenter, which is dc1 here:
consul members
This command should show Agent A (10.0.1.10) and Agent B (10.10.1.10) as members of the dc1 datacenter, even though they are in different subnets.
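For scripted checks, the member list can be filtered for a specific node. The helper below is a sketch that assumes the default tabular output of `consul members`, with the node name in the first column and the status in the third; the function name is hypothetical.

```shell
# Hypothetical helper: read `consul members` output on stdin and succeed
# only if the named node appears with status "alive".
#   $1 = node name to look for
node_is_alive() {
  awk -v node="$1" '
    $1 == node && $3 == "alive" { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Example usage on a server node:
# consul members | node_is_alive consul-agent-a && echo "agent A joined"
# consul members | node_is_alive consul-agent-b && echo "agent B joined"
```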
Why This Works:
- Centralized Control: All agents in a datacenter, regardless of their physical network location, rely on the server cluster for cluster state.
- Serf LAN: Each agent uses Serf LAN gossip on port 8301 (TCP and UDP) to join the datacenter and track membership. If an agent is in a different segment, its Serf LAN traffic must be able to reach the servers in the central segment, and the servers must be able to reach it.
- RPC: Once joined, a client agent forwards requests such as service registrations and queries to a server over RPC. This requires connectivity from the agent’s segment to the server’s segment on TCP 8300.
- No Direct Agent-to-Agent RPC: Client agents never send RPC traffic to each other, only to the servers. Gossip is less forgiving: Consul expects agents in the same LAN pool to be able to reach each other on port 8301, so a topology in which clients can reach only the servers should be treated as degraded rather than fully supported.
Advanced: Multi-Datacenter Communication
If you have agents in different datacenters (e.g., dc1 and dc2), you would configure Serf WAN gossip between servers of dc1 and dc2. The clients in dc2 would still join their local dc2 servers, and the dc2 servers would then communicate with dc1 servers. This pattern scales to many datacenters and segments.
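One way to wire this up is to give the dc2 servers a `retry_join_wan` list pointing at the dc1 servers. The sketch below writes that as a separate config fragment, on the assumption that the dc2 servers load a configuration directory via `-config-dir`, in which case Consul merges the files it finds there. The function name and file name are hypothetical.

```shell
# Hypothetical helper: write a config fragment that WAN-joins the given servers.
#   $1 = output file, remaining arguments = server addresses in the other datacenter
write_wan_join_config() {
  out_file="$1"
  shift
  list=""
  for addr in "$@"; do
    # Build a comma-separated list of quoted addresses.
    list="${list:+$list, }\"$addr\""
  done
  printf '{\n  "retry_join_wan": [%s]\n}\n' "$list" > "$out_file"
}

# Example usage on each dc2 server, pointing at the dc1 servers:
# write_wan_join_config wan-join.json 10.0.0.10 10.0.0.11 10.0.0.12
# After a restart, `consul members -wan` should list servers from both datacenters.
```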
The next logical step is understanding how to configure services to be discoverable across these datacenters.