Kubernetes applications are notoriously hard to monitor because they’re ephemeral and distributed by design.

Here’s a Kubernetes deployment running a simple web service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-container
        image: nginx:latest
        ports:
        - containerPort: 80

When a request comes in, it typically reaches a Service, which load-balances it across the Pods matching its selector. If a Pod fails, the Deployment's ReplicaSet controller replaces it with a new Pod, usually with a different name and IP address. This constant churn is why traditional monitoring tools, which rely on static host IPs and predictable processes, struggle to keep up. Dynatrace, however, is built to handle this dynamic environment.
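The Deployment above does not define the Service that sits in front of the Pods. A minimal sketch of one is shown here, assuming a plain in-cluster ClusterIP Service; the Service name is a hypothetical choice and not part of the original example.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app          # hypothetical name, chosen to match the Deployment
spec:
  type: ClusterIP        # the default type; reachable only inside the cluster
  selector:
    app: web-app         # routes to Pods carrying the Deployment's label
  ports:
  - port: 80             # port the Service exposes
    targetPort: 80       # containerPort on the web-container
```

Because the Service finds its backends by label rather than by IP, replaced Pods are picked up without any change to the Service itself.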

Dynatrace’s approach starts with its OneAgent. For Kubernetes, this is deployed as a DaemonSet, meaning one agent Pod runs on each eligible node in your cluster. This ensures that no matter where your application Pods land, the OneAgent is there to observe them. It automatically detects Pods, containers, and the processes running inside them, creating a live topology of your entire Kubernetes environment.
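To make the DaemonSet mechanism concrete, here is a generic sketch of a per-node agent. It is not Dynatrace's actual manifest: the names and the image are placeholders, and a real OneAgent rollout should follow Dynatrace's own installation instructions.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent                      # placeholder name, not a Dynatrace resource
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      tolerations:
      - operator: Exists                # tolerate all taints so tainted nodes get an agent too
      containers:
      - name: agent
        image: example.com/node-agent:1.0   # placeholder image for illustration only
```

The scheduler places one copy of this Pod on every node that matches, and adds one automatically when a new node joins, which is what gives a node-level agent cluster-wide coverage.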

The real power comes from its instrumentation. The OneAgent automatically injects itself into supported processes as they start, capturing request-level performance data (and method-level data for supported application runtimes) without requiring code changes. For a web application like our nginx example, it would capture:

  • Request Traces: Every incoming HTTP request, its latency, the exact Pod and container it hit, and any downstream calls it made.
  • Service Dependencies: If web-app called another service (e.g., a database Pod), Dynatrace would map this dependency automatically.
  • Resource Utilization: CPU, memory, and network usage at the Pod and container level, correlated with application performance.
  • Error Detection: Automatic identification of exceptions, errors, and performance degradations.

Let’s say our web-app suddenly starts responding slowly. Dynatrace’s "Smartscape" view would show you the web-app Deployment, its Pods, the Nodes they’re running on, and any upstream services or downstream dependencies. From there, you can open the slow requests as a waterfall chart that pinpoints which part of the request path is the bottleneck. If the issue is a specific Pod, you can drill down into its resource metrics and logs. If it’s a dependency, you see that too.

The key is that Dynatrace doesn’t just show you metrics; it builds a causal chain. If a Node’s CPU is maxed out, and that Node is hosting slow web-app Pods, Dynatrace connects these dots. It understands that Kubernetes orchestrates Pods onto Nodes, and that Node performance directly impacts Pod performance. It maps the Deployment -> ReplicaSet -> Pod -> Container -> Process hierarchy automatically.
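The Pod-to-ReplicaSet link in that hierarchy is recorded by Kubernetes itself in each object's metadata. The fragment below sketches what the relevant part of a Pod might look like when read back from the API; the names, hash, and uid are made-up placeholders, and the source does not say how Dynatrace reads this data internally.

```yaml
# Excerpt of a Pod as returned by the API server (read-only view, not a manifest to apply)
apiVersion: v1
kind: Pod
metadata:
  name: web-app-7d4b9c6f5-xyz12           # hypothetical: <replicaset-name>-<random suffix>
  labels:
    app: web-app
    pod-template-hash: 7d4b9c6f5          # hypothetical hash added by the Deployment controller
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: web-app-7d4b9c6f5               # the ReplicaSet that owns this Pod
    uid: 00000000-0000-0000-0000-000000000000   # placeholder uid
    controller: true
```

The ReplicaSet carries a matching ownerReferences entry pointing at the Deployment, so the chain can be walked upward from any Pod.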

Here’s a snippet of what a Dynatrace "PurePath" (a distributed trace) might look like for a slow request to our web-app:

Request: /index.html
  - Received by Pod: web-app-xyz12 (Node: node-1)
    - Nginx Process (PID: 12345)
      - Time spent in Nginx: 50ms
      - Downstream call to: database-service
        - Received by Pod: database-xyz78 (Node: node-5)
          - PostgreSQL Process (PID: 67890)
            - Time spent in DB: 150ms
            - Response: 200 OK
        - DB call duration: 160ms
    - Total request duration: 210ms

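This level of detail, automatically captured and correlated, is what allows you to quickly identify the root cause of performance issues in a dynamic Kubernetes environment. In the trace above, the database call accounts for 160ms of the 210ms total, and 150ms of that is spent inside PostgreSQL itself, so the database is the first place to look. You’re not guessing which Pod is slow; you’re seeing the exact trace and its latency breakdown.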

Many teams struggle to understand the interplay between Kubernetes resource requests/limits and actual application performance. Dynatrace can overlay your Kubernetes resource definitions (e.g., resources.requests.cpu: "100m", resources.limits.memory: "256Mi") directly onto the performance data. This allows you to see whether containers are being throttled because of their CPU limits, or whether memory usage close to the limit is causing garbage collection pauses or even OOMKills, all while seeing the direct impact on your application’s end-user experience.
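For reference, this is roughly where those settings would live in the web-app Deployment. The CPU request and memory limit are the values quoted above; the memory request and CPU limit are assumed values added only to make the sketch complete.

```yaml
# Fragment of spec.template.spec in the web-app Deployment
containers:
- name: web-container
  image: nginx:latest
  ports:
  - containerPort: 80
  resources:
    requests:
      cpu: "100m"        # from the text: used by the scheduler for placement
      memory: "128Mi"    # assumed value, not from the original
    limits:
      cpu: "500m"        # assumed value; exceeding it causes CPU throttling
      memory: "256Mi"    # from the text: exceeding it gets the container OOMKilled
```

CPU and memory limits fail differently: a container over its CPU limit is slowed down, while a container over its memory limit is terminated, which is why the two show up as different symptoms in the performance data.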

The next challenge is often understanding the network policies and service meshes within your Kubernetes cluster.
