Datadog’s tagging system is less about what you’re monitoring and more about how you’re grouping and querying it, turning a firehose of metrics into actionable intelligence.

Let’s say you have a distributed system: a web frontend (React SPA), an API gateway (Kong), and a backend service (Go monolith).

Here’s what that might look like in Datadog, before you’ve thought much about tags:

# Metrics from the frontend
env:prod,host:frontend-1.example.com,service:react-app,version:v2.1.0,deployment:blue
env:prod,host:frontend-2.example.com,service:react-app,version:v2.1.0,deployment:blue

# Metrics from the API gateway
env:prod,host:kong-1.example.com,service:kong,version:2.5.1
env:prod,host:kong-2.example.com,service:kong,version:2.5.1

# Metrics from the backend
env:prod,host:go-app-1.example.com,service:go-app,version:v3.0.0,region:us-east-1
env:prod,host:go-app-2.example.com,service:go-app,version:v3.0.0,region:us-east-1

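This is noisy and inconsistent: deployment appears only on the frontend, region only on the backend, and the version strings use two different formats (v2.1.0 versus 2.5.1). That makes simple questions harder than they should be. You want to see all traffic hitting your prod environment, regardless of the specific service. Or you want to see the performance of v2.1.0 of your react-app across every environment it might be deployed to.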

A robust tagging strategy makes this simple. The core idea is consistency and a clear hierarchy. Let’s define some key tags:

  • env: The deployment environment (e.g., prod, staging, dev, qa). This is almost always your top-level filter.
  • service: The logical name of the application or component (e.g., react-app, kong, go-app, redis-cache). This helps delineate distinct parts of your system.
  • region: The geographic region where the service is deployed (e.g., us-east-1, eu-west-2). Crucial for understanding regional performance and blast radius.
  • availability-zone: A more granular location within a region (e.g., us-east-1a, us-east-1b). Useful for high-availability analysis.
  • deployment-tier: Distinguishes between different layers of your architecture (e.g., frontend, gateway, backend, database, cache).
  • version: The specific version of the service. Essential for tracking performance changes between releases.
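One way to keep this convention honest is to check tag sets against the required keys before they ship. The following is a minimal sketch in plain Python with no Datadog library involved; REQUIRED_KEYS, parse_tags, and missing_keys are hypothetical names introduced here for illustration, and the tag strings are taken from the examples in this article.

```python
# Hypothetical helper: verify that a comma-separated tag string carries
# every key from the tagging convention described above.
REQUIRED_KEYS = (
    "env",
    "service",
    "region",
    "availability-zone",
    "deployment-tier",
    "version",
)


def parse_tags(line):
    """Split 'k1:v1,k2:v2' into a dict; only the first ':' separates key from value."""
    tags = {}
    for item in line.split(","):
        key, _, value = item.strip().partition(":")
        tags[key] = value
    return tags


def missing_keys(line, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the tag string, in convention order."""
    tags = parse_tags(line)
    return [key for key in required if key not in tags]


raw = "env:prod,host:kong-1.example.com,service:kong,version:2.5.1"
print(missing_keys(raw))  # ['region', 'availability-zone', 'deployment-tier']
```

A check like this could run in CI against deployment manifests, so a service missing region or deployment-tier is caught before its metrics reach a dashboard.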

With these, our metrics look much cleaner:

# Metrics from the frontend
env:prod,service:react-app,region:us-east-1,availability-zone:us-east-1a,deployment-tier:frontend,version:v2.1.0
env:prod,service:react-app,region:us-east-1,availability-zone:us-east-1b,deployment-tier:frontend,version:v2.1.0

# Metrics from the API gateway
env:prod,service:kong,region:us-east-1,availability-zone:us-east-1a,deployment-tier:gateway,version:2.5.1
env:prod,service:kong,region:us-east-1,availability-zone:us-east-1b,deployment-tier:gateway,version:2.5.1

# Metrics from the backend
env:prod,service:go-app,region:us-east-1,availability-zone:us-east-1a,deployment-tier:backend,version:v3.0.0
env:prod,service:go-app,region:us-east-1,availability-zone:us-east-1b,deployment-tier:backend,version:v3.0.0

Now, in Datadog, you can ask powerful questions. To see the total request count for your go-app in production, broken down by region and availability zone:

sum:http.requests.count{env:prod,service:go-app,deployment-tier:backend} by {region,availability-zone}

To see the average latency of your react-app for a specific version in a specific region, broken down by host:

avg:http.request.latency{env:prod,service:react-app,region:us-east-1,version:v2.1.0,deployment-tier:frontend} by {host}
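Because both queries share the same shape (aggregator, metric, a tag scope in braces, and an optional "by" clause), they are easy to assemble from structured data. This is a small sketch of that idea; build_query is a hypothetical helper written for this article, not part of any Datadog client, and it only covers the simple key:value filters used above.

```python
def build_query(aggregator, metric, filters, group_by=()):
    """Assemble a metric query string of the form agg:metric{k:v,...} by {tag,...}."""
    # An empty filter dict falls back to '*', which scopes the query to everything.
    scope = ",".join(f"{key}:{value}" for key, value in filters.items()) or "*"
    query = f"{aggregator}:{metric}{{{scope}}}"
    if group_by:
        query += " by {" + ",".join(group_by) + "}"
    return query


print(
    build_query(
        "sum",
        "http.requests.count",
        {"env": "prod", "service": "go-app", "deployment-tier": "backend"},
        ["region", "availability-zone"],
    )
)
```

Generating queries this way keeps dashboards and monitors aligned with the tag convention: if a key is renamed, it changes in one place.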

A large part of the payoff comes from how tags get applied. When you instrument your application with Datadog’s tracing libraries (e.g., dd-trace-go, dd-trace-js), you can usually set these tags in the application’s startup code or through environment variables such as DD_ENV, DD_SERVICE, and DD_VERSION. For infrastructure components like EC2 instances or Kubernetes pods, Datadog’s integrations automatically pull in cloud provider tags and labels. These arrive under their own names, so an EC2 tag Environment: Production typically shows up as environment:production; to end up with env:prod, either name the cloud tag to match your convention or map it to the key and value you want.
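To make the two tag sources concrete, here is a sketch of how they could be normalized into one convention. It is plain Python and does not call any Datadog API. DD_ENV, DD_SERVICE, DD_VERSION, and DD_TAGS are environment variables Datadog documents; everything else (KEY_MAP, VALUE_MAP, map_cloud_tags, tags_from_env, and the merge order) is a hypothetical illustration, and the real tracers and Agent apply their own rules.

```python
import re

# Hypothetical mapping tables from cloud-provider tags to this article's convention.
KEY_MAP = {"environment": "env"}
VALUE_MAP = {"env": {"production": "prod"}}


def map_cloud_tags(cloud_tags):
    """Lowercase cloud tags and rename keys/values according to the tables above."""
    mapped = {}
    for key, value in cloud_tags.items():
        key = KEY_MAP.get(key.lower(), key.lower())
        value = value.lower()
        mapped[key] = VALUE_MAP.get(key, {}).get(value, value)
    return mapped


def tags_from_env(environ):
    """Collect tags from DD_* variables in a mapping such as os.environ."""
    tags = {}
    # Assumption: DD_TAGS holds key:value pairs separated by commas or spaces.
    for item in re.split(r"[,\s]+", environ.get("DD_TAGS", "").strip()):
        if ":" in item:
            key, _, value = item.partition(":")
            tags[key] = value
    # Assumption for this sketch: the dedicated variables win over DD_TAGS.
    for var, key in (("DD_ENV", "env"), ("DD_SERVICE", "service"), ("DD_VERSION", "version")):
        if environ.get(var):
            tags[key] = environ[var]
    return tags


print(map_cloud_tags({"Environment": "Production"}))  # {'env': 'prod'}
```

In a real service you would pass os.environ to tags_from_env; taking the mapping as a parameter keeps the function easy to test.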

A less obvious point about Datadog tagging is that the absence of a tag on a metric can be just as meaningful as its presence, and the query language lets you exploit this. Tag filters accept * as a wildcard (for example, service:web-*) and a leading ! to negate a filter (for example, !env:staging); log search additionally accepts ? to match a single character. Combining negation with a wildcard, along the lines of !version:*, is how you would look for series in production that carry no version tag yet, which can indicate a deployment gap. Wildcard and negation support differs between metrics, logs, and traces, so confirm the exact form in the product you are querying.
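The "missing tag" idea can be illustrated outside Datadog as well. The sketch below filters an in-memory list of tag strings for those that lack a given key; has_tag_key and untagged are hypothetical helpers, and the sample series are made up from service names used earlier in this article.

```python
def has_tag_key(tag_line, key):
    """True if the comma-separated tag string contains the given key."""
    return any(item.strip().partition(":")[0] == key for item in tag_line.split(","))


def untagged(series, key):
    """Return the tag strings that do not carry the given key at all."""
    return [line for line in series if not has_tag_key(line, key)]


series = [
    "env:prod,service:go-app,version:v3.0.0",
    "env:prod,service:redis-cache",
]
print(untagged(series, "version"))  # ['env:prod,service:redis-cache']
```

Here redis-cache surfaces as the one series without a version tag, which is the kind of gap a negated wildcard filter is meant to expose.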

The next step is to explore how to use these tags to build sophisticated monitors that alert you to anomalies based on your defined service boundaries, rather than just host-level issues.
