Datadog metrics aren’t just for infrastructure; custom metrics are a powerful tool for understanding your application’s behavior. Send too many unique metric series, though, and they can quickly blow up your budget.

Let’s see some custom metrics in action. Imagine you’re running a web service and want to track how long certain API endpoints take to respond.

import time
from datadog import initialize, statsd

# Configure the DogStatsD client. It only needs to know where the Agent listens;
# API/APP keys are needed only if you also call the HTTP API (datadog.api).
options = {
    'statsd_host': '127.0.0.1',  # Or your Datadog Agent host
    'statsd_port': 8125,         # Default DogStatsD port
}
initialize(**options)

def process_request(request_id):
    start_time = time.perf_counter()  # Monotonic clock, suited to measuring durations
    # Simulate doing some work
    time.sleep(0.1)
    duration = (time.perf_counter() - start_time) * 1000  # Duration in milliseconds

    # Send the custom metric. Caution: request_id is a high-cardinality tag
    # (one new series per request); see the cost discussion below.
    statsd.timing('my_app.api.request_duration', duration, tags=['endpoint:/process', f'request_id:{request_id}'])
    print(f"Processed request {request_id} in {duration:.2f}ms")

# Simulate a few requests
for i in range(5):
    process_request(f"req-{i}")

When this code runs, it sends a timing metric named my_app.api.request_duration for each request to the Datadog Agent, which forwards it to Datadog. Each data point carries the request duration in milliseconds, plus tags identifying the endpoint and the specific request ID. Once the metric appears in Datadog, you can visualize request latency, set up alerts on high durations, and slice and dice by endpoint.

The core problem Datadog custom metrics solve is providing granular, application-specific insights that infrastructure metrics alone can’t capture. You can track user actions, business events, or internal application states. The statsd object from the datadog Python package is the common interface. It speaks the DogStatsD protocol, an extension of StatsD that is sent over UDP by default and designed for high-throughput, low-latency metric submission. Sending is fire-and-forget: it adds very little overhead and never waits on Datadog, but delivery isn’t guaranteed. For metrics, an occasional dropped packet is usually acceptable.
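To make the protocol concrete, here is a minimal standard-library sketch of roughly what a DogStatsD client does under the hood: it formats a plain-text datagram of the form name:value|type|#tag1,tag2 and fires it over UDP without waiting for a reply. The helpers format_datagram and send_metric are hypothetical; in practice you would use the datadog package’s statsd object rather than hand-rolling packets.

```python
import socket


def format_datagram(name, value, metric_type, tags=None, sample_rate=1.0):
    """Build a DogStatsD datagram: name:value|type[|@rate][|#tag1,tag2]."""
    parts = [f"{name}:{value}", metric_type]
    if sample_rate < 1.0:
        parts.append(f"@{sample_rate}")  # Tells the server to scale the value up by 1/rate
    if tags:
        parts.append("#" + ",".join(tags))
    return "|".join(parts)


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send: no acknowledgement is requested or awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Type codes used here: "c" = count, "g" = gauge, "ms" = timing, "h" = histogram
packet = format_datagram("my_app.api.request_duration", 102.5, "ms", ["endpoint:/process"])
print(packet)  # my_app.api.request_duration:102.5|ms|#endpoint:/process
send_metric(packet)  # Typically does not raise even if no Agent is listening
```

Because the socket never waits for a response, a missing or overloaded Agent slows nothing down; the packet is simply lost.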

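Here’s a useful mental model: think of Datadog as a giant time-series database. Custom metrics are like adding your own tables and columns to that database, tailored to your application’s unique needs. Each metric has a type (such as gauge, count, rate, histogram, or timing) and a set of tags, key-value pairs that let you filter and aggregate your metrics. A timing metric, like the one above, is a histogram that measures how long an operation takes. For each flush interval, the Datadog Agent turns histogram and timing samples into summary series; by default these are avg, count, median, 95th percentile, and max (for example, my_app.api.request_duration.95percentile). Other percentiles such as p99 require extra Agent configuration or a distribution metric, whose percentiles Datadog computes server-side across all hosts. Percentiles are crucial for understanding user experience because averages hide the slow requests users actually notice.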
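As a rough illustration of why a single timing metric fans out into several series, the sketch below mimics the per-flush-interval summary the Agent produces for a histogram or timing metric under default settings (avg, count, median, 95th percentile, max). It is a simplified approximation: the function name is hypothetical, and the nearest-rank percentile is an assumption; the Agent’s own estimator can return slightly different values.

```python
import math
import statistics


def summarize_histogram(name, samples, percentile=0.95):
    """Approximate the series emitted for one histogram flush interval.

    Simplified sketch: uses a nearest-rank percentile, which may differ
    from the Agent's internal estimator.
    """
    ordered = sorted(samples)
    rank = math.ceil(percentile * len(ordered)) - 1  # Nearest-rank index
    pct_label = f"{round(percentile * 100)}percentile"
    return {
        f"{name}.avg": statistics.mean(ordered),
        f"{name}.count": len(ordered),
        f"{name}.median": statistics.median(ordered),
        f"{name}.{pct_label}": ordered[rank],
        f"{name}.max": ordered[-1],
    }


durations_ms = [100.0, 110.0, 120.0, 130.0, 140.0]
for series, value in summarize_histogram("my_app.api.request_duration", durations_ms).items():
    print(series, value)
```

Note the cost implication: with the defaults, every unique tag combination on a histogram yields five series rather than one.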

When you call statsd.increment('my_app.login.failed', tags=['reason:bad_password']), you’re not just sending a number; you’re sending a value together with descriptive tags. The Agent collects these packets, aggregates them into time buckets (10 seconds by default for DogStatsD), timestamps the results, and forwards them to Datadog’s backend, where they are stored. You can then query the data through Datadog’s API or UI to plot graphs, create dashboards, and trigger alerts. The statsd_host and statsd_port settings in the initialization tell the client where to send these UDP packets: usually a Datadog Agent running on the same host or elsewhere in your network.
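The collect-and-aggregate step can be pictured with a small sketch: counter increments that arrive during the same flush window are summed per metric name and tag set, and each bucket is labeled with the window’s start time. The 10-second window reflects the Agent’s default DogStatsD flush interval; the function name, the tuple-based event format, and stamping at the window start are simplifying assumptions for illustration.

```python
from collections import defaultdict

FLUSH_INTERVAL_S = 10  # Default DogStatsD flush interval in the Agent


def bucket_counts(events, interval=FLUSH_INTERVAL_S):
    """Sum count events per (window start, metric name, tag set).

    events: iterable of (arrival_time_s, name, tags, value) tuples
    (a hypothetical format chosen for this sketch).
    """
    buckets = defaultdict(float)
    for arrival, name, tags, value in events:
        window_start = int(arrival // interval) * interval
        key = (window_start, name, tuple(sorted(tags)))  # Tag order must not matter
        buckets[key] += value
    return dict(buckets)


events = [
    (1000.2, "my_app.login.failed", ["reason:bad_password"], 1),
    (1004.9, "my_app.login.failed", ["reason:bad_password"], 1),
    (1007.3, "my_app.login.failed", ["reason:locked"], 1),
    (1012.0, "my_app.login.failed", ["reason:bad_password"], 1),
]
for (ts, name, tags), total in sorted(bucket_counts(events).items()):
    print(ts, name, tags, total)
```

Four increments collapse into three stored points: two tag combinations in the first window and one in the next.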

The key levers you control are:

  • Metric Name: Be descriptive and hierarchical (e.g., service.component.operation.status).
  • Metric Type: Choose the right type for the data (e.g., gauge for current values, count for discrete events, timing for durations).
  • Tags: Use them liberally but thoughtfully; they are your primary tool for slicing and dicing data. Avoid high-cardinality tags, such as user IDs or the request IDs in the example above, unless you specifically need to debug individual instances. Every new tag value adds unique metric series for Datadog to track, which drives up costs.
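  • Aggregation Frequency: If you emit metrics at very high rates, consider aggregating them client-side before sending, or sampling them (the statsd methods accept a sample_rate argument). This mainly cuts network and Agent overhead; custom metric costs are driven primarily by the number of unique series, not by how many points you send.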
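For the aggregation-frequency lever, one option is to accumulate counts in-process and emit one datagram per series per flush instead of one per event. The LocalCounter class below is a hypothetical sketch, not part of the Datadog client. It reduces packet volume only; the number of unique series (metric name plus tag combination) stays the same.

```python
import threading
from collections import defaultdict


class LocalCounter:
    """Hypothetical in-process aggregator: sums increments, emits on flush."""

    def __init__(self):
        self._lock = threading.Lock()  # Safe to call from multiple request threads
        self._totals = defaultdict(int)

    def increment(self, name, value=1, tags=()):
        key = (name, tuple(sorted(tags)))
        with self._lock:
            self._totals[key] += value

    def flush(self):
        """Return one DogStatsD count datagram per series and reset the totals."""
        with self._lock:
            totals, self._totals = self._totals, defaultdict(int)
        packets = []
        for (name, tags), total in sorted(totals.items()):
            packet = f"{name}:{total}|c"
            if tags:
                packet += "|#" + ",".join(tags)
            packets.append(packet)
        return packets


counter = LocalCounter()
for _ in range(500):
    counter.increment("my_app.login.failed", tags=["reason:bad_password"])
counter.increment("my_app.login.failed", tags=["reason:locked"])
print(counter.flush())
# ['my_app.login.failed:500|c|#reason:bad_password', 'my_app.login.failed:1|c|#reason:locked']
```

In a real service you would call flush() from a timer and hand each total to your DogStatsD client, for example via statsd.increment(name, total, tags=...).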

A common pitfall is attaching high-cardinality tags to metrics you only ever analyze in aggregate. For instance, if you tag every request duration with a unique request_id, you create a new time series for every request, potentially millions of them, even though the underlying application behavior is consistent. Datadog’s pricing is heavily influenced by the number of unique metric series ingested, and a series is defined by a metric name plus a unique combination of tag values. Instead of tagging individual request IDs, tag broader categories such as user_tier:premium or region:us-east-1. To investigate specific requests, log the ID and use distributed tracing, which is usually more cost-effective for that use case.
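The cost arithmetic is easy to check with a back-of-the-envelope sketch that counts the distinct (metric name, tag set) combinations a workload would produce. The simulated traffic (1,000 requests split across two endpoints) and the count_series helper are made up for illustration.

```python
def count_series(submissions):
    """A series is a metric name plus a unique combination of tag values."""
    return len({(name, frozenset(tags)) for name, tags in submissions})


endpoints = ["/process", "/status"]
requests = [(f"req-{i}", endpoints[i % 2]) for i in range(1000)]

# Every request tagged with its own ID: one series per request
with_request_id = [
    ("my_app.api.request_duration", [f"endpoint:{ep}", f"request_id:{rid}"])
    for rid, ep in requests
]
# Only the endpoint tag: one series per endpoint
endpoint_only = [
    ("my_app.api.request_duration", [f"endpoint:{ep}"]) for _, ep in requests
]

print(count_series(with_request_id))  # 1000
print(count_series(endpoint_only))    # 2
```

The latency data is identical in both cases; only the tagging decision changes the series count by a factor of 500.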

The next concept you’ll grapple with is setting up effective alerting based on these custom metrics.
