Consul datacenters can be connected through Mesh Gateways, but they know nothing about each other until you explicitly federate them.

Let’s say you have two datacenters, dc1 and dc2, and you want services in dc1 to be able to communicate with services in dc2 over a secure, encrypted connection.

Here’s a simplified view of what that might look like:

[Service A in dc1] <---> [Mesh Gateway in dc1] <---> [WAN] <---> [Mesh Gateway in dc2] <---> [Service B in dc2]

Consul’s WAN federation allows datacenters to gossip about each other’s health and topology. However, for service-to-service communication between datacenters, especially when you want to enforce security and control traffic flow, you need Mesh Gateways. They act as secure entry and exit points for inter-datacenter traffic.

Setting up Mesh Gateways

The core idea is to deploy a specific type of Consul service called a "Mesh Gateway" in each datacenter you want to connect. Each gateway is a proxy (typically Envoy) registered with kind mesh-gateway that routes traffic into and out of its datacenter.

1. Enable WAN Federation (if not already done)

This is the foundational step. Datacenters need to be able to discover each other.

  • Diagnosis: Run consul members -wan on a server in each DC. You should see the servers of every federated DC listed (named like <node>.<datacenter>) with status alive.
  • Fix: If not, on each Consul server, ensure the retry_join_wan configuration points to servers in the other datacenters (or run consul join -wan manually). retry_join only covers the local LAN cluster. For example, in dc1’s server config:
    server = true
    bootstrap = false # if not the first server in the cluster
    retry_join = ["10.1.1.1", "10.1.1.2"] # IPs of servers in dc1 (LAN)
    retry_join_wan = ["dc2-consul-server-ip"] # IPs of servers in dc2 (WAN)
    
  • Why it works: This allows Consul servers in different datacenters to establish gossip connections, sharing information about cluster membership and health.

2. Deploy Mesh Gateway Services

You register one service per datacenter: a gateway of kind mesh-gateway. Consul has no separate "upstream" service kind for remote datacenters, so nothing like mesh-gateway-upstream-dc2 needs to be registered. Once the datacenters are federated, each gateway discovers the gateways in other datacenters through the catalog.

  • Diagnosis: Look for a service named mesh-gateway in the output of consul catalog services, and check the remote side with consul catalog services -datacenter dc2. You won’t see it if it isn’t registered.
  • Fix (in dc1):
    • Create a service definition for the mesh-gateway itself, saved as dc1-mesh-gateway-service.json. This tells Consul that a gateway exists here. The port is the one the gateway listens on for incoming connections. (JSON does not allow comments, so keep notes like this outside the file.)
      {
        "service": {
          "name": "mesh-gateway",
          "port": 8443,
          "kind": "mesh-gateway"
        }
      }
      
    • Register this service using consul services register dc1-mesh-gateway-service.json.
    • Run the gateway proxy itself, for example with consul connect envoy -gateway=mesh -service mesh-gateway. Adding -register together with -address and -wan-address lets this command register the service for you, replacing the two steps above.
  • Fix (in dc2): Do the equivalent steps, registering a mesh-gateway service in dc2.
  • Why it works: The mesh-gateway service registration makes the gateway discoverable in the catalog, both inside its own datacenter and, through federation, from other datacenters. When traffic in dc1 is destined for dc2, Consul looks up the mesh-gateway instances registered in dc2 and configures dc1’s gateway to forward connections to them.
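
A gateway usually has to be reachable on two networks: from services inside its own datacenter and from gateways in other datacenters. The following is a minimal sketch of a mesh gateway registration that advertises both through tagged_addresses. It is written in HCL, which Consul accepts alongside JSON, and both IP addresses are placeholders chosen for illustration rather than values from this setup.

```hcl
# Sketch of a mesh gateway registration; both addresses are placeholders.
service {
  name = "mesh-gateway"
  kind = "mesh-gateway"
  port = 8443

  tagged_addresses {
    # Address used by services inside the local datacenter.
    lan = {
      address = "10.1.1.10"
      port    = 8443
    }
    # Address used by gateways in other datacenters.
    wan = {
      address = "198.51.100.10"
      port    = 8443
    }
  }
}
```

If the LAN and WAN addresses are the same in your network, the tagged_addresses block can be left out and the service address is used for both.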

3. Configure TLS Encryption

Traffic that crosses Mesh Gateways is protected by mutual TLS between the service sidecar proxies. The gateways forward it without decrypting it, so all datacenters must share a trust root.

  • Diagnosis: If you see errors like "x509: certificate signed by unknown authority" or connection timeouts when trying to connect between DCs, TLS is likely misconfigured.
  • Fix:
    • Each datacenter’s certificates must chain to a root the others trust. In a WAN-federated setup, Consul handles this when Connect is enabled and every datacenter sets the same primary_datacenter: the primary holds the root CA and each secondary datacenter receives an intermediate certificate signed by it.
    • On the Consul servers in each datacenter, configure ca_file, cert_file, and key_file in the Consul agent’s TLS settings so that agent and server communication is encrypted as well.
    • The mesh-gateway service definition needs no TLS settings of its own. The gateway proxy gets its configuration from the local Consul agent.
    • A key detail: ensure the port in the mesh-gateway service definition (e.g., 8443) is the port the gateway proxy actually listens on, and that it is reachable from the other datacenter.
  • Why it works: The sidecar proxies of the calling and the destination service establish mutual TLS end to end, which prevents eavesdropping. The gateways read only the SNI header of each connection to decide where to forward it and never decrypt the payload. Because all datacenters chain to the same root CA, a proxy in dc1 can verify the certificate of a service in dc2.
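
As a rough sketch of the agent-side settings involved, a server in dc2 might carry a configuration like the one below. It assumes that dc1 is the primary datacenter, that the Consul version supports the tls block (1.12 or later; older versions use top-level ca_file, cert_file, and key_file), and that the certificate paths, which are hypothetical placeholders, are replaced with your own.

```hcl
# Sketch of a server agent config in dc2; file paths are hypothetical placeholders.
datacenter         = "dc2"
primary_datacenter = "dc1" # assumption: dc1 is the primary; use the same value in every DC

connect {
  enabled = true
}

tls {
  defaults {
    ca_file         = "/etc/consul.d/tls/consul-agent-ca.pem"
    cert_file       = "/etc/consul.d/tls/dc2-server-consul-0.pem"
    key_file        = "/etc/consul.d/tls/dc2-server-consul-0-key.pem"
    verify_incoming = true
    verify_outgoing = true
  }
}
```

The tls block secures agent and server communication, while primary_datacenter determines which datacenter’s CA acts as the root for service certificates.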

4. Configure the Mesh Gateway Mode and a Service Resolver

To actually route traffic from a service in dc1 to a service in dc2, you need to tell the service mesh to send cross-datacenter traffic through the gateways and which datacenter the destination lives in.

  • Diagnosis: Services in dc1 can’t reach services in dc2 even after gateways are up.
  • Fix:
    • In dc1, set the mesh gateway mode to local, either mesh-wide in a proxy-defaults config entry or per service in a service-defaults config entry (MeshGateway.Mode = "local"). Without a gateway mode, sidecar proxies try to connect to the remote instances directly and bypass the gateways.
    • Define a service-resolver in dc1 that redirects the destination service to dc2, saved as dc1-service-resolver-to-dc2.json. The name service-b is a stand-in for Service B from the diagram above. A service-router is only needed on top of this for L7 matching, such as routing by HTTP path or header.
      {
        "Kind": "service-resolver",
        "Name": "service-b",
        "Redirect": {
          "Datacenter": "dc2"
        }
      }
      
    • Register this using consul config write dc1-service-resolver-to-dc2.json.
  • Why it works: The service-resolver acts as a policy. When a service in dc1 calls service-b, Consul resolves it to the instances in dc2. Because the mesh gateway mode is local, the caller’s sidecar proxy sends the connection to the mesh-gateway in dc1, which forwards it to the mesh-gateway in dc2, which in turn delivers it to Service B.
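
For reference, the setting that makes sidecar proxies use their local gateway for cross-datacenter traffic is the mesh gateway mode. A minimal sketch of a mesh-wide default, written as a proxy-defaults config entry in HCL, is shown below. The file name used afterwards is a hypothetical choice, and a per-service override in a service-defaults entry would work along the same lines.

```hcl
# Sketch: route all cross-datacenter mesh traffic through the local gateway.
# proxy-defaults entries must be named "global".
Kind = "proxy-defaults"
Name = "global"

MeshGateway {
  Mode = "local" # other modes are "remote" and "none"
}
```

Saved as, say, proxy-defaults.hcl, this would be applied with consul config write proxy-defaults.hcl in each datacenter where callers should use their local gateway.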

The next thing you’ll likely encounter is managing fine-grained access control policies between services across these connected datacenters using Consul’s intention system.
