Datadog’s Continuous Profiler can reveal surprising inefficiencies that aren’t obvious from traditional metrics, often pointing to code paths consuming disproportionate CPU or memory that would otherwise go unnoticed.

Let’s see it in action. Imagine a web application experiencing intermittent latency spikes. We’ve got Datadog APM tracing requests, but the underlying cause isn’t clear. We enable the Continuous Profiler for the webserver service.

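Here’s a simplified, JSON-style sketch of the kind of data a profile carries. Real profiles are uploaded in binary formats such as pprof or JFR, and the // comments below are annotations rather than valid JSON: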

{
  "sample_count": 10000,
  "total_sampling_interval": "10s",
  "profile": {
    "sample": [
      {
        "location": [
          {
            "mapping_index": 0,
            "function_index": 5,
            "line": 42
          }
        ],
        "value": [
          1500000000, // CPU nanoseconds
          500000000   // Memory bytes
        ]
      },
      // ... more samples
    ],
    "location": [
      {
        "mapping_index": 0,
        "function_index": 5,
        "line": 42
      }
    ],
    "function": [
      {
        "name": "com.example.myapp.DataProcessor.processBatch",
        "system_name": "com.example.myapp.DataProcessor.processBatch",
        "filename": "DataProcessor.java"
      },
      // ... more functions
    ],
    "mapping": [
      {
        "file_name": "/app/java/myapp.jar"
      }
    ]
  }
}

In the Datadog UI, this data is rendered as flame graphs. We’d see a large share of CPU time (represented by the width of the bars) and of memory allocations attributed to com.example.myapp.DataProcessor.processBatch. In other words, this one function accounts for a large percentage of the service’s CPU and allocates memory that is garbage collected soon afterwards.
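To make the link between raw samples and bar widths concrete, here is a minimal Python sketch that sums the values per function and turns them into shares. It assumes a simplified profile shaped like the snippet above and follows the pprof convention that the first location of a sample is the innermost frame. The second function name and all the numbers are made up for illustration.

```python
from collections import defaultdict

# Simplified profile shaped like the snippet above. The second function and
# every value here are invented for illustration.
profile = {
    "sample": [
        {"location": [{"function_index": 0, "line": 42}],
         "value": [1_500_000_000, 500_000_000]},
        {"location": [{"function_index": 1, "line": 17}],
         "value": [300_000_000, 20_000_000]},
        {"location": [{"function_index": 0, "line": 57}],
         "value": [200_000_000, 80_000_000]},
    ],
    "function": [
        {"name": "com.example.myapp.DataProcessor.processBatch"},
        {"name": "com.example.myapp.ReportWriter.flush"},  # hypothetical
    ],
}


def totals_by_function(profile):
    """Sum CPU nanoseconds and allocated bytes per innermost function."""
    totals = defaultdict(lambda: [0, 0])
    for sample in profile["sample"]:
        leaf = sample["location"][0]  # first location = innermost frame
        name = profile["function"][leaf["function_index"]]["name"]
        cpu_ns, alloc_bytes = sample["value"]
        totals[name][0] += cpu_ns
        totals[name][1] += alloc_bytes
    return dict(totals)


def cpu_share(totals):
    """Convert per-function CPU totals into fractions of all sampled CPU."""
    total_cpu = sum(cpu for cpu, _ in totals.values())
    return {name: cpu / total_cpu for name, (cpu, _) in totals.items()}


totals = totals_by_function(profile)
shares = cpu_share(totals)
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{share:6.1%}  {name}")
```

The share computed for each function is what a flame graph draws as the relative width of that function’s bar.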

The problem this solves is the "unknown unknowns" in performance. APM tells you which requests are slow, but not why the code within them is inefficient. Traditional metrics might show high CPU usage for the host, but not which process or function is responsible. The Continuous Profiler bridges this gap by showing CPU and memory consumption at the function level, continuously, without requiring manual sampling or profiling sessions.

Internally, the profiler runs inside your application process as part of the Datadog library for your language; the Datadog Agent on the host receives the resulting profiles and forwards them to Datadog. For CPU profiling, the profiler periodically samples the call stacks of running threads. For memory profiling, it samples allocations through the runtime’s own mechanisms (for example, allocation events in the JVM) rather than recording every single allocation. Sampling rates differ by language and profile type. The samples are aggregated and sent to Datadog, where they’re visualized as flame graphs. The width of a frame in the flame graph represents its contribution to the sampled metric (CPU time or memory allocation).
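The sampling idea can be sketched in a few lines of Python. This is an illustration of stack sampling in general, not Datadog’s implementation: it uses the CPython-specific sys._current_frames() to capture the stack of every live thread, and it samples synchronously from inside a hypothetical process_batch function so the result is deterministic. A real sampler would run on a timer in a background thread.

```python
import sys
from collections import Counter


def capture_stacks():
    """Return one root-to-leaf tuple of function names per live thread."""
    stacks = []
    for frame in sys._current_frames().values():
        names = []
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        stacks.append(tuple(reversed(names)))
    return stacks


def frame_widths(stack_counts):
    """Fraction of samples in which each function appears on the stack."""
    total = sum(stack_counts.values())
    seen = Counter()
    for stack, count in stack_counts.items():
        for name in set(stack):  # count a recursive function once per sample
            seen[name] += count
    return {name: count / total for name, count in seen.items()}


def process_batch():
    """Hypothetical workload that happens to be on the stack when we sample."""
    return capture_stacks()


stack_counts = Counter()
for _ in range(5):  # five synchronous "ticks" instead of a timer
    stack_counts.update(process_batch())

widths = frame_widths(stack_counts)
```

Each entry in widths is the fraction of captured stacks that contain the function, which is the quantity a flame graph shows as frame width.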

The key levers you control are:

  • Service and Environment Tagging: Ensure your application is tagged correctly so you can filter profiles by service, env, region, etc. This is crucial for isolating performance issues.
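  • Profiling Interval: How often the profiler takes samples. A higher frequency provides more detail but increases overhead in the profiled application. The available settings and their names differ by language library, so confirm in the documentation for your language that a variable such as DD_PROFILING_CPU_INTERVAL or DD_PROFILING_MEM_INTERVAL is actually supported before relying on it. The defaults are usually a good balance.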
  • Profiling Duration: Profiles are typically collected and uploaded in 1-minute windows by default. Where this window is configurable, it is set in the profiler library’s configuration. Longer windows can help capture intermittent spikes, but also increase the amount of data processed.
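  • Language Support: Datadog supports profiling for languages including Java, Go, Python, Ruby, .NET, and Node.js. Ensure you’re using a supported language and runtime version, that the Datadog library for that language has profiling enabled, and that the Datadog agent is reachable from the application.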
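As a sketch of the tagging and enablement levers above, the snippet below builds an environment for launching a profiled process. DD_SERVICE, DD_ENV, DD_VERSION, and DD_PROFILING_ENABLED are Datadog’s standard environment variables; the helper name profiler_env, the version string, and the commented launch command (which assumes a Python service with the ddtrace package installed and an app.py entry point) are illustrative assumptions.

```python
import os


def profiler_env(service, env, version, base=None):
    """Return a copy of the environment with Datadog tags and profiling enabled.

    Hypothetical helper; only the DD_* variable names come from Datadog.
    """
    result = dict(os.environ if base is None else base)
    result.update({
        "DD_SERVICE": service,           # lets you filter profiles by service
        "DD_ENV": env,                   # e.g. prod, staging
        "DD_VERSION": version,           # compare profiles across deployments
        "DD_PROFILING_ENABLED": "true",  # turn the profiler on
    })
    return result


env = profiler_env("webserver", "prod", "1.4.2", base={"PATH": "/usr/bin"})

# Example launch, assuming ddtrace is installed and app.py exists:
# import subprocess
# subprocess.run(["ddtrace-run", "python", "app.py"], env=env, check=True)
```

Other languages enable the profiler differently (a Java service, for instance, loads the Datadog Java agent at JVM startup), but the same tags apply.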

When analyzing the flame graph, look for "wide" stacks that represent a significant portion of the application’s execution or memory activity. These are your primary candidates for optimization. It’s not just about finding the deepest stack, but the one that consumes the most resources relative to its necessity. A function that is called infrequently but takes a very long time or allocates a lot of memory can still be a major performance bottleneck if it’s part of a critical path.
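One way to make "wide" precise is to separate self time (samples where a function is the innermost frame) from total time (samples where it appears anywhere on the stack). The sketch below does this for folded stacks, a common text form of flame graph data written as root;...;leaf with a sample count. The stacks and counts are hypothetical.

```python
from collections import Counter

# Folded stacks (root;...;leaf) with sample counts. Hypothetical data.
folded = {
    "main;handleRequest;processBatch;parseRecord": 55,
    "main;handleRequest;processBatch": 20,
    "main;handleRequest;renderResponse": 15,
    "main;reloadConfig": 10,
}


def self_and_total(folded):
    """Count samples per function as the leaf (self) and anywhere (total)."""
    self_counts, total_counts = Counter(), Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        self_counts[frames[-1]] += count
        for name in set(frames):  # count recursion once per stack
            total_counts[name] += count
    return self_counts, total_counts


def hotspots(folded, top=3):
    """Rank functions by self share, reporting total share alongside."""
    self_counts, total_counts = self_and_total(folded)
    all_samples = sum(folded.values())
    ranked = sorted(
        total_counts,
        key=lambda name: (-self_counts[name], -total_counts[name], name),
    )
    return [
        (name, self_counts[name] / all_samples, total_counts[name] / all_samples)
        for name in ranked[:top]
    ]


report = hotspots(folded)
for name, self_share, total_share in report:
    print(f"{name:15s} self={self_share:.0%} total={total_share:.0%}")
```

Ranking by self share keeps entry points such as main, which are on every stack but do little work themselves, from crowding out the functions that are actually burning the resource.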

The most surprising aspect for many users is how often seemingly innocuous code, like string manipulation, inefficient iteration, or poorly designed caching, can dominate resource usage under load. The profiler often points to these "hidden" costs that wouldn’t be flagged by unit tests or basic performance monitoring. For example, repeatedly creating temporary objects within a tight loop, even if each object is small, can lead to significant garbage collection pressure and CPU overhead.
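As an illustration of the kind of code a profile tends to surface, the two hypothetical functions below produce identical output. The first creates a temporary list for every row and rebuilds the accumulated output string as it goes; the second builds the result with a single join. The function names and data are invented, and the actual cost difference depends on the runtime and input size, which is exactly what a profiler would measure.

```python
def render_rows_wasteful(rows):
    """Builds a temporary list per row and re-concatenates the output."""
    output = ""
    for row in rows:
        fields = [str(value) for value in row]  # temporary list per row
        # Concatenation can copy the accumulated output on every iteration.
        output = output + ",".join(fields) + "\n"
    return output


def render_rows_lean(rows):
    """Produces the same text with one join per row and one join overall."""
    return "".join(",".join(map(str, row)) + "\n" for row in rows)


rows = [(1, "a"), (2, "b")]
assert render_rows_wasteful(rows) == render_rows_lean(rows)
```

Both versions pass the same unit test, which is why this class of inefficiency usually shows up in a profile under load rather than in testing.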

Once you’ve identified an inefficient code path, the next step is to dive into the specific lines of code indicated by the profiler and refactor them, perhaps by using more efficient data structures, algorithmic improvements, or reducing redundant computations.
