Comparing topology snapshots before and after deployments is crucial for verifying that your infrastructure changes haven’t inadvertently altered critical network paths or resource relationships.
Let’s say we have a simple application deployed across two Kubernetes clusters, cluster-a and cluster-b, with a service my-app in cluster-a that needs to reach a database my-db in cluster-b.
Initial State (Before Deployment)
Imagine we’re using a tool like kubectl to inspect our resources.
In cluster-a:
kubectl get service my-app -o yaml -n default
Output might show:
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: my-app
  type: ClusterIP
And in cluster-b, for the database:
kubectl get service my-db -o yaml -n database
Output might show:
apiVersion: v1
kind: Service
metadata:
  name: my-db
  namespace: database
spec:
  ports:
  - port: 5432
    protocol: TCP
    targetPort: 5432
  selector:
    app: postgresql
  type: ClusterIP
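Real `kubectl get ... -o yaml` output also carries server-managed fields that change on their own, so a raw before/after comparison tends to be noisy. The following is a minimal sketch of normalizing a manifest before storing it as a snapshot. It assumes the manifest has already been parsed into a Python dict (for example from `kubectl get service my-db -n database -o json` via `json.loads`). The list of fields to strip is an assumption about what is irrelevant to topology, and the `resourceVersion` and `uid` values are made up for illustration.

```python
import copy

# Assumption: these server-managed metadata fields do not matter for topology.
VOLATILE_METADATA = (
    "resourceVersion",
    "uid",
    "creationTimestamp",
    "generation",
    "managedFields",
)


def normalize(manifest: dict) -> dict:
    """Return a copy of a manifest with server-managed fields removed."""
    snapshot = copy.deepcopy(manifest)  # leave the caller's dict untouched
    metadata = snapshot.get("metadata", {})
    for field in VOLATILE_METADATA:
        metadata.pop(field, None)
    snapshot.pop("status", None)  # status is observed state, not configuration
    return snapshot


# Hypothetical parsed output for the my-db Service in cluster-b.
raw = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "name": "my-db",
        "namespace": "database",
        "resourceVersion": "12345",  # illustrative value
        "uid": "abc",                # illustrative value
    },
    "spec": {
        "ports": [{"port": 5432, "protocol": "TCP", "targetPort": 5432}],
        "selector": {"app": "postgresql"},
        "type": "ClusterIP",
    },
    "status": {"loadBalancer": {}},
}

before_snapshot = normalize(raw)
```

Storing `before_snapshot` (for instance as JSON on disk) gives a stable baseline to compare against after the deployment.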
For my-app in cluster-a to reach my-db in cluster-b, we’d typically have some form of cross-cluster networking set up. This could be via a service mesh (like Istio or Linkerd) with multi-cluster capabilities, or perhaps a dedicated CNI plugin that handles cross-cluster communication, or even something simpler like direct VPN tunnels.
Let’s assume a multi-cluster service mesh is in place. When my-app in cluster-a calls my-db, the name still resolves to a Kubernetes Service object, but the actual network traffic is intercepted by the service mesh’s sidecar proxy. This proxy then routes the traffic to the correct endpoint in cluster-b, potentially via a gateway or a direct mesh-to-mesh connection. The topology we’re concerned with here isn’t just the Kubernetes API objects, but the effective network path.
Deployment Scenario
Now, let’s imagine a deployment happens in cluster-a. We might be updating the my-app deployment itself, or perhaps changing its associated Service object, or even updating the networking configuration that enables cross-cluster communication.
For instance, a common mistake might be to accidentally change the selector on the my-app service in cluster-a to something that no longer matches the pods running the application. Or, perhaps the cross-cluster networking configuration was modified, breaking the path to cluster-b.
Post-Deployment State (After Deployment)
After the deployment, we’d re-run our checks.
In cluster-a:
kubectl get service my-app -o yaml -n default
Suppose the selector was accidentally changed to app: my-app-new:
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: my-app-new # <-- THIS CHANGED
  type: ClusterIP
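A change like this one is easy to miss by eye but simple to find mechanically. Here is a small sketch of a recursive diff over two parsed manifests, assuming both snapshots are plain Python dicts. The function name `diff_manifests` is hypothetical. Lists are compared as whole values, and a missing key is treated the same as a `None` value, which is a simplification.

```python
def diff_manifests(before, after, path=""):
    """Yield (path, before_value, after_value) for every value that differs."""
    if isinstance(before, dict) and isinstance(after, dict):
        # Walk the union of keys so added and removed fields are both reported.
        for key in sorted(set(before) | set(after)):
            child = f"{path}.{key}" if path else key
            yield from diff_manifests(before.get(key), after.get(key), child)
    elif before != after:
        # Scalars and lists are compared as whole values.
        yield (path, before, after)


before = {
    "spec": {
        "ports": [{"port": 80, "protocol": "TCP", "targetPort": 8080}],
        "selector": {"app": "my-app"},
        "type": "ClusterIP",
    }
}
after = {
    "spec": {
        "ports": [{"port": 80, "protocol": "TCP", "targetPort": 8080}],
        "selector": {"app": "my-app-new"},
        "type": "ClusterIP",
    }
}

changes = list(diff_manifests(before, after))
print(changes)  # [('spec.selector.app', 'my-app', 'my-app-new')]
```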
If the pods still carry the label app: my-app, this service no longer selects any of them, so its Endpoints list becomes empty and traffic sent to my-app fails. This is a problem local to cluster-a.
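The matching rule behind this failure can be sketched in a few lines: a Service selector matches a pod only when every key/value pair in the selector appears in the pod's labels. The pod names and the extra `tier` label below are hypothetical. In practice the labels would come from something like `kubectl get pods -o json`.

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """True when every selector key/value pair is present in the pod's labels."""
    # A Service without a selector gets no automatically managed endpoints,
    # so an empty selector is treated as matching nothing here.
    if not selector:
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


def selected_pods(selector: dict, pods: dict) -> list:
    """Return the names of pods whose labels satisfy the selector."""
    return [name for name, labels in pods.items() if selector_matches(selector, labels)]


# Hypothetical pods in cluster-a, keyed by name, with their labels.
pods = {
    "my-app-0": {"app": "my-app"},
    "my-app-1": {"app": "my-app", "tier": "web"},
}

print(selected_pods({"app": "my-app"}, pods))      # ['my-app-0', 'my-app-1']
print(selected_pods({"app": "my-app-new"}, pods))  # []
```

An empty result for the post-deployment selector is the signal that the Service has been cut off from its pods.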
But let’s focus on the cross-cluster aspect. If the cross-cluster networking configuration was changed, the my-app service might still look correct in kubectl, but its ability to reach my-db in cluster-b is broken.
To compare topologies, we need to go beyond just kubectl get service. We need to understand how the system behaves from an end-to-end perspective.
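One simple end-to-end signal is whether a TCP connection to the dependency can be opened at all from where the client runs. The sketch below uses only the Python standard library. The helper name `can_connect` and the default timeout are assumptions. Note also that inside a service mesh the sidecar proxy may accept the connection itself, so a successful connect is weaker evidence than an application-level query.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run from a my-app pod before and after the deployment, a call such as `can_connect("my-db.database.svc.cluster.local", 5432)` (or the cross-cluster equivalent of that name) gives a before/after pair that can be compared directly.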
Tools and Techniques for Comparison
- Service Mesh Observability: If you’re using a service mesh, this is your primary tool. Tools like Kiali (for Istio) or the Linkerd dashboard provide visualizations of your service graph.
  - Before: You’d capture a screenshot or export the service graph from Kiali/Linkerd showing my-app in cluster-a successfully connecting to my-db in cluster-b.
  - After: You’d capture the same graph. If the connection is broken, you’d see a red line, a missing edge, or a significant increase in error rates between the two services. The visualization directly shows the effective topology.
- Network Policy and Firewall Rules: For cross-cluster communication, especially if it’s not managed by a service mesh, you might have explicit network policies or firewall rules.
  - Before: You’d audit your NetworkPolicy objects in Kubernetes (if applicable) and any cloud provider firewall rules (e.g., AWS Security Groups, Azure Network Security Groups) that permit traffic between the clusters on the required ports (e.g., PostgreSQL’s 5432).
  - After: You’d re-audit. A common deployment error is to create new policies that inadvertently deny traffic that was previously allowed, or to forget to update rules when IP ranges change. For example, a new NetworkPolicy in cluster-b might restrict ingress to port 5432 to only pods with app: database, but if the traffic is coming from a different source IP range (e.g., a new gateway IP for the cross-cluster connection), it will be blocked.
- DNS Resolution and Endpoints: Even if the service mesh or CNI handles routing, the underlying Kubernetes Service and Endpoints objects are still relevant.
  - Before: You’d check the Endpoints for my-db in cluster-b:
    kubectl get endpoints my-db -n database -o yaml
    This would show the actual IP addresses of the pods serving my-db. You’d also verify DNS resolution from cluster-a to my-db.database.svc.cluster.local (or its cross-cluster equivalent).
  - After: You’d check again. If the my-db deployment in cluster-b had issues (e.g., pods crashing, incorrect labels), the Endpoints object might be empty or point to unhealthy IPs. If the cross-cluster DNS mechanism was broken, my-app pods might not be able to resolve the service name at all.
- Ingress/Egress Gateway Configuration: If your cross-cluster communication relies on ingress or egress gateways (common in service meshes or custom network setups), their configurations are critical.
  - Before: Examine the configuration of the gateway in cluster-a (for egress from my-app) and cluster-b (for ingress to my-db). This might involve custom Kubernetes resources like Gateway and VirtualService (Istio) or specific CNI configurations.
  - After: Check these configurations. A deployment might have accidentally altered a VirtualService rule that dictates how traffic leaving cluster-a destined for cluster-b is handled, or a Gateway resource that defines the entry point into cluster-b. For example, a change to an egress gateway’s IP address or port without updating the corresponding ingress gateway would break the connection.
- CNI Plugin State: If your cross-cluster connectivity is handled by your CNI, you might need to inspect the CNI’s specific control plane or configuration. This is highly dependent on the CNI. For example, some CNIs might expose their own CRDs or APIs to check network connectivity status between nodes or pods across clusters.
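Whichever of these sources you draw on, the before/after comparison comes down to comparing two sets of edges. The sketch below assumes each snapshot has been reduced to a dict that maps a (source, destination) pair to an observed error rate between 0 and 1. This format is an assumption for illustration and is not the export format of Kiali or Linkerd. The `cluster-a/ingress` node and the 5% threshold are likewise hypothetical.

```python
def compare_graphs(before: dict, after: dict, error_rate_threshold: float = 0.05) -> dict:
    """Compare two {(source, destination): error_rate} snapshots."""
    before_edges, after_edges = set(before), set(after)
    return {
        # Edges that carried traffic before the deployment but are gone now.
        "missing": sorted(before_edges - after_edges),
        # Edges that did not exist before the deployment.
        "added": sorted(after_edges - before_edges),
        # Surviving edges whose error rate rose by more than the threshold.
        "degraded": sorted(
            edge
            for edge in before_edges & after_edges
            if after[edge] - before[edge] > error_rate_threshold
        ),
    }


# Hypothetical snapshots taken before and after the deployment.
before = {
    ("cluster-a/ingress", "cluster-a/my-app"): 0.01,
    ("cluster-a/my-app", "cluster-b/my-db"): 0.0,
}
after = {
    ("cluster-a/ingress", "cluster-a/my-app"): 0.40,
}

report = compare_graphs(before, after)
print(report["missing"])   # [('cluster-a/my-app', 'cluster-b/my-db')]
print(report["degraded"])  # [('cluster-a/ingress', 'cluster-a/my-app')]
```

Here the missing edge corresponds to the broken cross-cluster path, and the degraded edge shows the knock-on errors seen by callers of my-app.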
The Counterintuitive Truth About Topology
The actual network topology of a distributed system is often a complex, layered abstraction. What kubectl shows you are Kubernetes API objects, which mostly describe the desired state. The actual state of network reachability is determined by a combination of your CNI, service mesh, network policies, cloud provider firewalls, and the underlying network fabric. A "successful" deployment in Kubernetes might create all the right API objects, but if a firewall rule was missed or a service mesh gateway configuration was subtly altered, the effective topology will have broken connections that are invisible from a simple kubectl get perspective.
Next Steps
After ensuring your application can reach its dependencies across clusters, the next common problem is ensuring that external clients can reach your application’s ingress points, which involves a similar process of verifying ingress controller configurations and associated load balancers or firewall rules.