Datadog APM for Node.js goes beyond collecting metrics: it lets you observe how your application actually behaves as a request moves through a distributed system.
Let’s trace a request through a basic Express app instrumented with Datadog.
// server.js
// Initialize the tracer first, before any module it instruments (express, http) is loaded
const tracer = require('dd-trace').init();
const express = require('express');

const app = express();
const port = 3000;

app.get('/', (req, res) => {
  res.send('Hello World!');
});

app.get('/user/:id', async (req, res) => {
  const userId = req.params.id;
  // Simulate fetching user data from a database
  const userData = await fetchUserData(userId);
  res.json({ id: userId, ...userData });
});

async function fetchUserData(userId) {
  // Simulate a database call that takes 100 ms
  return new Promise(resolve => {
    setTimeout(() => {
      resolve({ name: `User ${userId}`, email: `user${userId}@example.com` });
    }, 100);
  });
}

app.listen(port, () => {
  console.log(`App listening at http://localhost:${port}`);
});
To run this, you’ll need express and dd-trace:
npm install express dd-trace
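Before starting the app, make sure a Datadog Agent is running where the app can reach it. dd-trace does not send data to Datadog directly and does not need the API key itself; it sends traces to the Agent (by default at localhost:8126), and the Agent forwards them using its own key. Set the key in the Agent’s environment:
export DD_API_KEY='YOUR_DATADOG_API_KEY'
Then start the app:
node server.js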
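Now, if you hit http://localhost:3000/user/123 in your browser or with curl, dd-trace will automatically create a trace and send it to the Agent. You’ll see a span representing the incoming HTTP request, with the matched route as its resource. You will not see a span for fetchUserData: it is a plain function around setTimeout, and dd-trace only creates spans for libraries it has integrations for. Swap the simulation for a real database driver and its queries appear as child spans of the request. This is the magic: dd-trace patches common Node.js modules (like http, express, and database drivers such as pg or mysql2) as they are loaded, which is also why the tracer has to be initialized before those modules are required.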
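To make "patching" concrete, here is a toy sketch of the idea. It is not dd-trace’s actual code: wrapWithSpan, finishedSpans, and fakeDb are hypothetical names invented for this illustration. The wrapper replaces a method on a stand-in driver object so that every call records a span-like entry with a name, a duration, and an error flag.

```javascript
// Toy illustration only: all names below are hypothetical, not dd-trace APIs.
const finishedSpans = [];

// Replace target[methodName] with a wrapper that records a span per call.
function wrapWithSpan(target, methodName, spanName) {
  const original = target[methodName];
  target[methodName] = function wrapped(...args) {
    const start = process.hrtime.bigint();
    const record = (error) => {
      finishedSpans.push({
        name: spanName,
        durationMs: Number(process.hrtime.bigint() - start) / 1e6,
        error: Boolean(error),
      });
    };
    try {
      const result = original.apply(this, args);
      if (result && typeof result.then === 'function') {
        // Asynchronous call: close the span when the promise settles.
        return result.then(
          (value) => { record(null); return value; },
          (err) => { record(err); throw err; }
        );
      }
      record(null); // Synchronous call: close the span immediately.
      return result;
    } catch (err) {
      record(err);
      throw err;
    }
  };
}

// Stand-in for a database driver module.
const fakeDb = {
  query(sql) {
    return new Promise((resolve) => setTimeout(() => resolve([{ sql }]), 20));
  },
};

wrapWithSpan(fakeDb, 'query', 'fakedb.query');
```

After the wrap, callers keep calling fakeDb.query(...) exactly as before, and each call appends one entry to finishedSpans. That transparency is the point of the technique: application code does not change, only the library function it calls does.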
The core problem Datadog APM solves is the "unknown unknowns" in distributed systems. When a request slows down or fails across multiple services, pinpointing the bottleneck is like finding a needle in a haystack. APM provides the map.
Here’s how it works internally. Each dd-trace integration wraps the library calls it knows about: when an instrumented operation starts (like an incoming HTTP request or a database query), the integration creates a new "span." When that operation finishes, the span is closed, and its duration is recorded. To decide which span is the parent of a new one, dd-trace tracks the currently active span across asynchronous boundaries using Node.js’s async context tracking (the async_hooks module, which provides AsyncLocalStorage). Spans that share a trace ID are linked together to form a "trace," which represents the entire lifecycle of a single request. The tracer also injects trace context (like trace ID and parent span ID) into outgoing requests (e.g., as HTTP headers) and extracts it from incoming ones, allowing a trace to span multiple services.
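The inject/extract step can be sketched in a few lines. This is a simplified model under stated assumptions, not the tracer’s implementation: startSpan, inject, extract, and newId are hypothetical helpers written for this example. The header names x-datadog-trace-id and x-datadog-parent-id are the ones used by Datadog’s own propagation format; the real tracer sends additional headers and handles all of this for you.

```javascript
const { randomBytes } = require('node:crypto');

// Random unsigned 64-bit ID, rendered as a decimal string.
function newId() {
  return randomBytes(8).readBigUInt64BE().toString(10);
}

// Hypothetical span factory: a child reuses the parent's trace ID.
function startSpan(name, parentContext) {
  return {
    name,
    traceId: parentContext ? parentContext.traceId : newId(),
    spanId: newId(),
    parentId: parentContext ? parentContext.spanId : null,
  };
}

// Write the span's context into an outgoing headers object.
function inject(span, headers) {
  headers['x-datadog-trace-id'] = span.traceId;
  headers['x-datadog-parent-id'] = span.spanId;
  return headers;
}

// Read trace context from incoming headers, or null if there is none.
function extract(headers) {
  const traceId = headers['x-datadog-trace-id'];
  const spanId = headers['x-datadog-parent-id'];
  return traceId && spanId ? { traceId, spanId } : null;
}

// Service A makes an outgoing call...
const clientSpan = startSpan('http.request');
const outgoingHeaders = inject(clientSpan, {});

// ...and service B continues the same trace from the headers it receives.
const serverSpan = startSpan('express.request', extract(outgoingHeaders));
```

Here serverSpan shares clientSpan’s trace ID and names clientSpan as its parent, which is all the backend needs to stitch the two services into one trace.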
The main configuration levers are environment variables. DD_SERVICE names your application in Datadog, DD_ENV specifies the environment (e.g., prod, staging), and DD_VERSION identifies the deployed version. For custom instrumentation, you can create spans manually:
app.get('/user/:id', async (req, res, next) => {
  // startSpan does not pick up the active span on its own, so set the parent explicitly
  const span = tracer.startSpan('custom.user.lookup', {
    childOf: tracer.scope().active(),
  });
  span.setTag('user.id', req.params.id);
  try {
    const userId = req.params.id;
    const userData = await fetchUserData(userId);
    res.json({ id: userId, ...userData });
  } catch (error) {
    span.setTag('error', error); // Mark the span as errored
    next(error); // Express 4 does not catch errors thrown from async handlers
  } finally {
    span.finish(); // Always finish the span, on success or on error
  }
});
This manual span creation is crucial for understanding business logic or internal workflows that aren’t automatically captured by module patching.
Most people understand that APM traces requests. What’s less obvious is how dd-trace attributes asynchronous work that a handler starts but does not await right away. For example, if you kick off several database calls and only await them together with Promise.all at the very end, the active span is still propagated into each call at the moment it starts, so each resulting span nests under the correct parent regardless of the order in which the promises resolve. This is why you see database calls nested under your route handlers without explicit manual tracing in many cases.
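The nesting behaviour can be reproduced with Node’s built-in AsyncLocalStorage. The sketch below is a toy model of context propagation, not dd-trace’s implementation: withSpan, query, handleRequest, and the finished array are hypothetical names. Two simulated queries are started without awaiting them, then awaited together with Promise.all.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

const activeSpan = new AsyncLocalStorage(); // holds the currently active span
const finished = [];                        // spans in the order they finish

// Run fn inside a new span whose parent is whatever span is active right now.
async function withSpan(name, fn) {
  const parent = activeSpan.getStore();
  const span = { name, parent: parent ? parent.name : null };
  try {
    return await activeSpan.run(span, fn);
  } finally {
    finished.push(span);
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for an instrumented database call.
function query(name, ms) {
  return withSpan(name, () => sleep(ms));
}

function handleRequest() {
  return withSpan('request', async () => {
    // Both queries start immediately; neither is awaited on its own.
    const slow = query('db.slow', 30);
    const fast = query('db.fast', 5);
    await Promise.all([slow, fast]);
  });
}
```

Calling handleRequest() and inspecting finished once it resolves shows db.fast finishing before db.slow, with both recording request as their parent. The parent is captured when each query starts, so the order in which the promises settle has no effect on the nesting.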
The next step is exploring distributed tracing and how to propagate trace context across multiple microservices.