Datadog synthetic tests don’t just check if a server is up; they simulate actual user journeys through your application to catch issues before your users do.
Let’s say you’re running an e-commerce site. A basic "ping" test might tell you if your web server is responding, but it won’t tell you if a user can actually add an item to their cart or complete a checkout. That’s where Datadog synthetic tests shine. They let you script these exact user interactions.
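Here’s a simplified example that checks the login flow of a web application. It is written in Python as a custom Datadog Agent check, which approximates in code the steps a multistep synthetic test would perform: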

    import requests
    from datadog_checks.base import AgentCheck


    class MyLoginCheck(AgentCheck):
        def check(self, instance):
            url = "https://your-app.example.com/login"
            username = "testuser"
            password = "testpassword"
            # Share one session so cookies set at login are sent with the dashboard request
            session = requests.Session()

            # Step 1: Navigate to the login page
            try:
                response = session.get(url, timeout=10)
                response.raise_for_status()  # Raise an exception for 4xx/5xx status codes
                self.log.info(f"Successfully reached login page: {url}")
            except requests.exceptions.RequestException as e:
                self.log.error(f"Failed to reach login page {url}: {e}")
                self.gauge("myapp.login.step.reach_login_page.error", 1)
                return

            # Step 2: Submit login credentials
            login_payload = {"username": username, "password": password}
            try:
                # Simplified: the login endpoint and JSON payload shape are placeholders
                login_url = "https://your-app.example.com/api/login"
                response = session.post(login_url, json=login_payload, timeout=10)
                response.raise_for_status()
                self.log.info("Successfully submitted login credentials.")
                # Assuming a successful login returns a 200 OK and perhaps a session token
                if response.status_code == 200:
                    self.gauge("myapp.login.step.submit_credentials.success", 1)
                else:
                    # raise_for_status() lets non-error codes other than 200 through (e.g. 204)
                    self.gauge("myapp.login.step.submit_credentials.error", 1)
                    self.log.error(f"Login submission returned non-200 status: {response.status_code}")
                    return
            except requests.exceptions.RequestException as e:
                self.log.error(f"Failed to submit login credentials: {e}")
                self.gauge("myapp.login.step.submit_credentials.error", 1)
                return

            # Step 3: Verify successful login (e.g., by checking for specific text on the dashboard)
            try:
                dashboard_url = "https://your-app.example.com/dashboard"
                response = session.get(dashboard_url, timeout=10)
                response.raise_for_status()
                if f"Welcome, {username}" in response.text:  # Simple text check
                    self.log.info("Successfully verified login on dashboard.")
                    self.gauge("myapp.login.flow.success", 1)
                else:
                    self.log.error("Login successful, but dashboard content not as expected.")
                    self.gauge("myapp.login.flow.error", 1)
            except requests.exceptions.RequestException as e:
                self.log.error(f"Failed to verify login on dashboard: {e}")
                self.gauge("myapp.login.flow.error", 1)

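The Datadog Agent runs this check at each collection interval and executes these steps in order, stopping at the first failure. Because it is an AgentCheck, it runs on an Agent you operate rather than from Datadog’s managed synthetic locations. It sends metrics like myapp.login.flow.success or myapp.login.step.reach_login_page.error back to Datadog. You can then visualize these metrics on dashboards, set up alerts, and see trends over time.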
The core problem synthetic tests solve is the disconnect between infrastructure health and application functionality. Your servers might be humming along, but a broken database connection or a misconfigured API gateway can render your application unusable. Synthetic tests act as a vigilant user, interacting with your app from the outside, mimicking real-world scenarios.
Datadog runs synthetic tests from managed locations around the world, and from private locations that you deploy inside your own network when the application is not publicly reachable. Workers at these locations execute the tests you define. When a test runs, it can perform actions like:
- HTTP Requests: Fetching pages, sending data to APIs.
- Assertions: Checking HTTP status codes, verifying response content (e.g., looking for specific text, checking JSON structures), validating response headers.
- Browser Tests: For more complex front-end interactions, Datadog runs the test in a real browser (such as Chrome, Firefox, or Edge) to click buttons, fill forms, and navigate through JavaScript-heavy Single Page Applications (SPAs). This is crucial for UIs that rely heavily on client-side rendering.
- API Tests: Directly testing your API endpoints, useful for microservices architectures or for validating backend logic independently of the frontend.
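The assertion types in the list above can be sketched in plain Python. This is a minimal illustration of the idea, not Datadog’s implementation: HttpResult, Assertion, evaluate, and run_assertions are hypothetical names, and the response is a hand-built object rather than the result of a live request.

```python
import json
from dataclasses import dataclass


@dataclass
class HttpResult:
    """Hypothetical container for the parts of a response a test inspects."""
    status_code: int
    headers: dict
    body: str


@dataclass
class Assertion:
    """Hypothetical assertion; kind is 'status', 'body_contains', 'header', or 'json_key'."""
    kind: str
    target: object
    key: str = ""


def evaluate(result: HttpResult, assertion: Assertion) -> bool:
    """Return True when the response satisfies the assertion."""
    if assertion.kind == "status":
        return result.status_code == assertion.target
    if assertion.kind == "body_contains":
        return assertion.target in result.body
    if assertion.kind == "header":
        # HTTP header names are case-insensitive, so compare in lowercase
        lowered = {name.lower(): value for name, value in result.headers.items()}
        return lowered.get(assertion.key.lower()) == assertion.target
    if assertion.kind == "json_key":
        try:
            payload = json.loads(result.body)
        except json.JSONDecodeError:
            return False  # a body that is not JSON cannot satisfy a JSON assertion
        return isinstance(payload, dict) and payload.get(assertion.key) == assertion.target
    raise ValueError(f"Unknown assertion kind: {assertion.kind}")


def run_assertions(result: HttpResult, assertions: list) -> list:
    """Return the assertions that failed; an empty list means the test passed."""
    return [a for a in assertions if not evaluate(result, a)]
```

A test passes only when run_assertions returns an empty list, which mirrors the all-assertions-must-hold behavior you would expect from an API test.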
You control these tests through the Datadog UI, the API, or infrastructure-as-code tools like Terraform. You define the URL, the steps, the assertions, the frequency of checks, and the locations from which the tests should run. You can also configure alerting conditions, so that you are notified once a test has been failing for a set duration or from a set number of locations, rather than on every transient blip.
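To make the alerting rule concrete, here is a small sketch of one way such a condition could be evaluated: alert only when enough locations have been failing continuously for long enough. The SyntheticTestSpec dataclass, its field names, and should_alert are hypothetical and do not mirror Datadog’s API schema or its Terraform resource.

```python
from dataclasses import dataclass


@dataclass
class SyntheticTestSpec:
    """Hypothetical test settings; not Datadog's actual schema."""
    url: str
    locations: list
    frequency_seconds: int
    min_failure_seconds: int
    min_locations_failed: int


def should_alert(spec: SyntheticTestSpec, results: dict, now: int) -> bool:
    """Decide whether the alert condition is met at time `now` (in seconds).

    `results` maps each location name to a list of (timestamp, passed)
    tuples sorted by timestamp, oldest first.
    """
    failing_locations = 0
    for location in spec.locations:
        failing_since = None
        for timestamp, passed in results.get(location, []):
            if passed:
                failing_since = None  # a passing run resets the failure streak
            elif failing_since is None:
                failing_since = timestamp  # first failure of a new streak
        # Count the location only if its current streak has lasted long enough
        if failing_since is not None and now - failing_since >= spec.min_failure_seconds:
            failing_locations += 1
    return failing_locations >= spec.min_locations_failed
```

Requiring both a minimum duration and a minimum number of failing locations is a common way to avoid paging on a single flaky run or a network problem local to one probe location.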
The most powerful aspect, and often overlooked, is how Datadog correlates these synthetic test results with your other telemetry. If a synthetic test for your checkout flow starts failing, Datadog can automatically correlate that event with spikes in error rates from your backend services, increased latency in your database, or even specific deployment events that occurred around the same time. This correlation is key to rapidly diagnosing the root cause, rather than just knowing that something is broken.
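As a rough illustration of the idea (and not Datadog’s actual correlation logic), time-window correlation can be sketched as collecting every event that occurred shortly before a synthetic test failure. The Event type, the events_near_failure function, and the 15-minute default window are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """Hypothetical telemetry event, such as a deployment or an error-rate spike."""
    kind: str
    description: str
    timestamp: datetime


def events_near_failure(failure_time, events, window=timedelta(minutes=15)):
    """Return events that happened within `window` before the failure, most recent first."""
    start = failure_time - window
    candidates = [e for e in events if start <= e.timestamp <= failure_time]
    return sorted(candidates, key=lambda e: e.timestamp, reverse=True)
```

Sorting the most recent events first puts the likeliest suspects, such as a deployment a few minutes before the failure, at the top of the list.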
The next step after successfully monitoring uptime and user flows is to start layering in distributed tracing to understand the performance bottlenecks within those user flows.