Tagging your APM services with environments and custom labels is how you make sense of your distributed systems as they grow.
Let’s say you have a service called user-service. Without tags, all traces and metrics for user-service look identical, regardless of whether they came from your development, staging, or production environment. This is a problem because you want to see how production is performing, not get bogged down by noisy dev traffic.
Here’s a quick look at how this plays out in Datadog:
Imagine you have traces for user-service. By default, they all just show up under user-service.
Traces for user-service:
- Trace ID: abcdef123456
- Trace ID: fedcba654321
- Trace ID: 123456abcdef
Now, let’s apply some tags. We can tag this instance of user-service with env:prod.
Traces for user-service (env:prod):
- Trace ID: abcdef123456
- Trace ID: fedcba654321
And another instance with env:staging.
Traces for user-service (env:staging):
- Trace ID: 123456abcdef
Suddenly, you can filter and slice your data. You can create dashboards showing only production user-service performance, or compare latency between staging and prod. This is the core benefit: granularity.
The system works by having your APM agent automatically collect certain metadata about the environment it’s running in, and then letting you layer your own custom metadata on top. The agent typically runs as part of your application process or alongside it. It instruments incoming and outgoing requests and annotates the resulting traces with these tags.
The key is that these tags become searchable dimensions in your APM tool. Think of them like columns in a spreadsheet. You have your trace data (the rows), and the tags are the columns that let you filter, group, and aggregate that data.
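To make the spreadsheet analogy concrete, here is a minimal sketch in plain Python. It is not how any APM backend is implemented; the record layout, the duration_ms values, and the mean_duration_by_tag helper are all made up for illustration. It treats each trace as a row and groups by one tag "column".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trace records: each dict is a "row", each tag key is a "column".
# The durations are invented purely for this example.
TRACES = [
    {"trace_id": "abcdef123456", "duration_ms": 120, "tags": {"env": "prod"}},
    {"trace_id": "fedcba654321", "duration_ms": 80, "tags": {"env": "prod"}},
    {"trace_id": "123456abcdef", "duration_ms": 300, "tags": {"env": "staging"}},
]


def mean_duration_by_tag(traces, tag_key):
    """Group traces by the value of one tag and average their durations."""
    groups = defaultdict(list)
    for trace in traces:
        # Traces that lack the tag go into an explicit "untagged" bucket.
        value = trace["tags"].get(tag_key, "untagged")
        groups[value].append(trace["duration_ms"])
    return {value: mean(durations) for value, durations in groups.items()}


latency_by_env = mean_duration_by_tag(TRACES, "env")
print(latency_by_env)
```

Swapping "env" for any other tag key gives you a different slice of the same rows, which is all a "group by" in a dashboard really does.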
The most common and arguably most important tag is the environment. This is typically set via an environment variable that your APM agent reads. For example, in Datadog, you might set DD_ENV="production" or DD_ENV="staging". The agent then automatically applies this tag to all traces and metrics originating from that application instance.
Another critical tag is the service version, usually exposed as the version tag. This is invaluable for understanding the impact of deployments. If you deploy a new version of user-service and see an increase in errors, you can immediately correlate it to the version tag. In Datadog, it is often set via DD_VERSION="v1.2.3".
Custom labels, or tags as they’re often called in APM tools, are where you get really specific. These can be anything relevant to your business or infrastructure: team:backend, customer:enterprise, region:us-east-1, feature_flag:new_checkout. The exact mechanism for setting these varies by agent, but it’s usually through environment variables or direct configuration in the agent’s setup. For Datadog, it’s DD_TAGS="team:backend,customer:enterprise".
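As a rough illustration of what happens with those three variables, the sketch below reads DD_ENV, DD_VERSION, and DD_TAGS and folds them into one tag dictionary. The tags_from_environment function is hypothetical and is not Datadog's actual parsing code; in particular, letting DD_ENV and DD_VERSION win over a conflicting DD_TAGS entry is an assumption of this sketch.

```python
import os


def tags_from_environment(environ=None):
    """Build a tag dict from DD_ENV, DD_VERSION, and DD_TAGS (illustrative only)."""
    environ = os.environ if environ is None else environ
    tags = {}

    # DD_TAGS is a comma-separated list of key:value pairs.
    for pair in environ.get("DD_TAGS", "").split(","):
        # Split on the first colon only, so values may contain colons.
        key, sep, value = pair.strip().partition(":")
        if key and sep:
            tags[key] = value

    # Assumption of this sketch: the dedicated variables override DD_TAGS.
    if environ.get("DD_ENV"):
        tags["env"] = environ["DD_ENV"]
    if environ.get("DD_VERSION"):
        tags["version"] = environ["DD_VERSION"]
    return tags


# Passing a plain dict keeps the example independent of the real environment.
example_tags = tags_from_environment({
    "DD_ENV": "production",
    "DD_VERSION": "v1.2.3",
    "DD_TAGS": "team:backend,customer:enterprise",
})
print(example_tags)
```

In a real deployment you would not write this yourself; you would export the variables and let the agent or tracing library pick them up.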
When you configure your APM agent (like the Datadog Agent, OpenTelemetry Collector, or New Relic Agent), you specify these tags. The agent then ensures that every piece of telemetry it sends – traces, metrics, logs – is stamped with these identifiers.
For example, if your user-service is running in Kubernetes, the agent might automatically pick up the Kubernetes pod name, namespace, and labels. You can then explicitly add your own tags on top of that. This allows you to filter traces not just by environment, but by the specific Kubernetes deployment or even the individual pod that handled a request.
The power comes from combining these tags. You can ask: "Show me all traces for user-service where env:staging AND team:backend AND version:v1.2.3." This level of detail is what allows you to pinpoint issues in complex, multi-environment systems.
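A combined query like that is just a logical AND over tag values. The sketch below shows the idea with a toy query format; parse_query and matches are hypothetical helpers, the trace IDs are invented, and the syntax is deliberately simpler than any real APM search language.

```python
def parse_query(query):
    """Turn 'env:staging AND team:backend' into {'env': 'staging', 'team': 'backend'}."""
    required = {}
    for term in query.split(" AND "):
        key, _, value = term.strip().partition(":")
        required[key] = value
    return required


def matches(tags, required):
    """A trace matches only if every required tag is present with the same value."""
    return all(tags.get(key) == value for key, value in required.items())


# Hypothetical traces for user-service, invented for this example.
TAGGED_TRACES = [
    {"trace_id": "t1", "tags": {"env": "staging", "team": "backend", "version": "v1.2.3"}},
    {"trace_id": "t2", "tags": {"env": "staging", "team": "backend", "version": "v1.2.2"}},
    {"trace_id": "t3", "tags": {"env": "prod", "team": "backend", "version": "v1.2.3"}},
]

required = parse_query("env:staging AND team:backend AND version:v1.2.3")
hits = [t["trace_id"] for t in TAGGED_TRACES if matches(t["tags"], required)]
print(hits)
```

Each extra tag in the query narrows the result set, which is why consistent tagging across services pays off as the system grows.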
The one thing most people don’t immediately grasp is how deeply these tags permeate all your telemetry. It’s not just about tracing. When env:prod is configured consistently for a service and the agent it runs alongside, the metrics associated with that service (CPU usage, memory, request rates) carry env:prod as well. This means you can build a single dashboard that shows both application performance metrics (like request latency) and infrastructure metrics (like CPU utilization) for user-service in production, all filtered and correlated by the env:prod tag.
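Here is a small sketch of why shared tags matter for that kind of dashboard. The metric names, values, and the select helper are all hypothetical; the point is only that one tag filter can be applied to two different kinds of telemetry.

```python
# Hypothetical infrastructure metrics and traces that share the same tag keys.
METRICS = [
    {"name": "cpu.utilization", "value": 0.72, "tags": {"service": "user-service", "env": "prod"}},
    {"name": "cpu.utilization", "value": 0.15, "tags": {"service": "user-service", "env": "staging"}},
]
SPANS = [
    {"duration_ms": 120, "tags": {"service": "user-service", "env": "prod"}},
    {"duration_ms": 300, "tags": {"service": "user-service", "env": "staging"}},
]


def select(records, **required):
    """Keep only the records whose tags contain every required key/value pair."""
    return [
        record for record in records
        if all(record["tags"].get(key) == value for key, value in required.items())
    ]


# One filter, two telemetry types: this is the "single dashboard" idea.
prod_filter = {"service": "user-service", "env": "prod"}
dashboard = {
    "cpu": [m["value"] for m in select(METRICS, **prod_filter)],
    "latency_ms": [s["duration_ms"] for s in select(SPANS, **prod_filter)],
}
print(dashboard)
```

If the metrics used a different key or value for the environment than the traces did, the same filter would silently return one panel empty, which is the practical argument for tagging everything the same way.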
The next step after mastering environments and custom labels is usually understanding how to use these tags to build effective anomaly detection and alerting policies.