The most surprising thing about monitoring Google Cloud with Dynatrace is that you’re not just getting metrics, you’re getting behavior – Dynatrace automatically maps your GCP services to your applications, showing you how infrastructure performance directly impacts end-user experience.
Let’s see it in action. Imagine you have a GKE cluster running a web application.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-web-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: web-container
              image: gcr.io/my-project/my-web-app:v1.2.0
              ports:
                - containerPort: 8080

When Dynatrace’s GCP integration is enabled, it doesn’t just pull GCP metrics like CPU utilization for your GKE nodes. It also relies on OneAgent, deployed to your GKE nodes (or reused if one is already running there), to instrument the processes inside your pods. The agent observes network traffic, process activity, and application-level requests.
Here’s how the mental model builds:
- GCP Service Discovery: Dynatrace uses a service account with `monitoring.viewer` and `cloudplatform.viewer` roles to connect to your GCP project. It queries GCP APIs to discover your resources: GKE clusters, Compute Engine instances, Cloud SQL databases, Pub/Sub topics, Load Balancers, etc.
- OneAgent Deployment: For compute resources like GKE and Compute Engine, Dynatrace can automatically deploy its OneAgent. In GKE, this is typically done via a DaemonSet, ensuring an agent runs on each node. The agent then automatically discovers processes and applications running within pods.
- Topology Mapping: This is where the magic happens. Dynatrace correlates the GCP resource metadata (like cluster name, instance ID, project ID) with the OneAgent’s process and application discovery. It builds a live, interactive topology map showing your GCP infrastructure and the applications running on it. You see a GKE node, then the pods on it, then the processes inside the pods, and finally the application services those processes provide.
- Metrics and Traces Correlation: Dynatrace pulls standard GCP metrics (e.g., `compute.googleapis.com/instance/cpu/utilization`, `container.googleapis.com/pod/cpu/usage_time`) and overlays them onto the discovered topology. Crucially, it combines these with application-level metrics and distributed traces captured by the OneAgent. So, a spike in GKE node CPU usage can be directly linked to a slowdown in your `my-web-app` service, and even to specific user requests that were affected.
- Problem Detection: Dynatrace’s AI, Davis, analyzes this combined data. If it sees a correlation – for example, high latency on a Cloud SQL instance coinciding with slow responses from your backend service – it flags it as a "problem," pinpointing the root cause.
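To make the topology and correlation steps concrete, here is a minimal sketch of the idea, not Dynatrace's actual implementation. It assumes cloud inventory records and agent-reported processes share an `instance_id` key; every record, field name, and the `0.8` CPU threshold below is hypothetical and exists only for illustration.

```python
from collections import defaultdict

# Hypothetical cloud inventory, shaped like what a resource listing might return.
gcp_nodes = [
    {"instance_id": "1111", "name": "gke-node-a", "cluster": "prod-cluster"},
    {"instance_id": "2222", "name": "gke-node-b", "cluster": "prod-cluster"},
]

# Hypothetical agent reports: one record per process observed on a node.
agent_processes = [
    {"instance_id": "1111", "pod": "my-web-app-abc", "process": "java", "service": "my-web-app"},
    {"instance_id": "1111", "pod": "my-web-app-def", "process": "java", "service": "my-web-app"},
    {"instance_id": "2222", "pod": "my-web-app-ghi", "process": "java", "service": "my-web-app"},
]

# Hypothetical node CPU utilization samples (fraction of 1.0), keyed by instance_id.
node_cpu = {"1111": 0.93, "2222": 0.41}


def build_topology(nodes, processes):
    """Join cloud metadata and agent data on instance_id.

    Returns {node name: {pod name: [(process, service), ...]}}.
    """
    by_instance = {n["instance_id"]: n for n in nodes}
    topology = defaultdict(lambda: defaultdict(list))
    for p in processes:
        node = by_instance.get(p["instance_id"])
        if node is None:
            continue  # process on a host the cloud inventory does not know about
        topology[node["name"]][p["pod"]].append((p["process"], p["service"]))
    return {name: dict(pods) for name, pods in topology.items()}


def services_on_hot_nodes(nodes, processes, cpu, threshold=0.8):
    """Overlay an infrastructure metric on the topology: which services run on busy nodes?"""
    hot = {n["instance_id"] for n in nodes if cpu.get(n["instance_id"], 0.0) > threshold}
    return sorted({p["service"] for p in processes if p["instance_id"] in hot})


topology = build_topology(gcp_nodes, agent_processes)
affected = services_on_hot_nodes(gcp_nodes, agent_processes, node_cpu)
print(topology)
print(affected)
```

The join key is the important design choice: once infrastructure metadata and agent observations agree on an identifier, any node-level metric can be projected onto the services running there.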
Consider a scenario where your my-web-app deployment is experiencing intermittent 5xx errors. Without Dynatrace’s GCP integration, you might look at GKE node metrics and see normal CPU/memory. But Dynatrace would show you:
- The specific GKE node experiencing high network egress.
- That this egress is caused by a particular pod (`my-web-app-xxxxx-yyyyy`).
- That this pod is making an unusually high number of calls to a Cloud SQL instance.
- That the Cloud SQL instance is responding with higher-than-usual query latency.
- Finally, that specific user sessions experienced timeouts due to this chain of events.
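The chain above can be pictured as a walk over an "affected by" graph that only follows entities currently showing an anomaly. The sketch below is an illustrative simplification, not how Davis works internally; the graph, the anomaly labels, and the entity names `gke-node-1` and `cloud-sql/orders-db` are hypothetical, and the graph is assumed to be acyclic.

```python
# Hypothetical "affected by" edges: each entity points at what may be driving its behavior.
affected_by = {
    "user-session": ["my-web-app"],
    "my-web-app": ["pod/my-web-app-xxxxx-yyyyy"],
    "gke-node-1": ["pod/my-web-app-xxxxx-yyyyy"],
    "pod/my-web-app-xxxxx-yyyyy": ["cloud-sql/orders-db"],
    "cloud-sql/orders-db": [],
}

# Hypothetical anomalies currently observed on each entity.
anomalies = {
    "user-session": "timeouts",
    "my-web-app": "intermittent 5xx errors",
    "gke-node-1": "high network egress",
    "pod/my-web-app-xxxxx-yyyyy": "unusually many Cloud SQL calls",
    "cloud-sql/orders-db": "high query latency",
}


def anomalous_paths(start, graph, observed):
    """Depth-first walk that only follows entities with a recorded anomaly.

    Returns every path from `start` to an anomalous entity that has no
    anomalous upstream cause. Assumes the graph has no cycles.
    """
    if start not in observed:
        return []
    next_hops = [e for e in graph.get(start, []) if e in observed]
    if not next_hops:
        return [[start]]  # nothing anomalous further upstream: root-cause candidate
    paths = []
    for hop in next_hops:
        for tail in anomalous_paths(hop, graph, observed):
            paths.append([start] + tail)
    return paths


from_user = anomalous_paths("user-session", affected_by, anomalies)
from_node = anomalous_paths("gke-node-1", affected_by, anomalies)
for path in from_user + from_node:
    print(" -> ".join(f"{e} ({anomalies[e]})" for e in path))
```

Whether the walk starts from the user-facing symptom or from the noisy node, both paths end at the same Cloud SQL instance, which is what makes it the root-cause candidate in this toy example.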
The levers you control are primarily around the GCP service account permissions and OneAgent deployment strategy. You can configure which GCP projects Dynatrace monitors and how OneAgents are deployed (e.g., automatically on GKE or manually on specific Compute Engine instances).
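Since service account permissions are the main lever, one quick sanity check is to confirm that the account actually holds the roles you expect. The sketch below assumes you have exported the project's IAM policy as JSON (the `bindings` list of `role` and `members` is the standard IAM policy shape); the service account address is hypothetical, and the required-role set is only an example that should be replaced with whatever the current Dynatrace documentation lists.

```python
import json

# Example IAM policy JSON; in practice this would come from an exported project policy.
policy_json = """
{
  "bindings": [
    {
      "role": "roles/monitoring.viewer",
      "members": ["serviceAccount:dynatrace-monitoring@my-project.iam.gserviceaccount.com"]
    },
    {
      "role": "roles/compute.viewer",
      "members": ["user:someone@example.com"]
    }
  ]
}
"""

# Hypothetical service account used for the integration.
MEMBER = "serviceAccount:dynatrace-monitoring@my-project.iam.gserviceaccount.com"


def missing_roles(policy, member, required):
    """Return the required roles that `member` is not directly bound to, sorted."""
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return sorted(set(required) - granted)


policy = json.loads(policy_json)
# Example requirement set only; adjust to the roles your setup actually needs.
print(missing_roles(policy, MEMBER, {"roles/monitoring.viewer", "roles/compute.viewer"}))
```

This only inspects direct bindings at the project level; roles inherited from a folder or organization, or granted through a group, would not show up in this check.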
What most people don’t realize is that Dynatrace attaches additional metadata (such as project, labels, and entity relationships) to the GCP metrics it ingests, allowing for much richer filtering and analysis within Dynatrace, even for metrics not directly captured by the OneAgent. This enrichment is key.
The next step is understanding how Dynatrace leverages this rich topology to automate anomaly detection and root cause analysis across hybrid environments.