The most surprising thing about suppressing Dynatrace alerts during maintenance windows is that you never disable the alert rule itself; you tell Dynatrace to withhold the resulting alerts for a defined period while it keeps collecting data and evaluating the rule’s conditions.
Let’s see this in action. Imagine you’re about to deploy a new version of your critical user-service. You know this will cause a temporary spike in error rates, and you absolutely do not want your on-call engineers getting paged for something you’re actively causing.
Here’s a typical Dynatrace alert rule for error rates:
    {
      "name": "High Error Rate in User Service",
      "conditions": {
        "rule": "ALWAYS_TRUE",
        "conditions": [
          {
            "type": "ALERT_CONDITION",
            "metric": "com.dynatrace.builtin:service.errors.server",
            "operator": "GT",
            "threshold": 10,
            "entityId": "SERVICE-1234567890ABCDEF",
            "timeframe": "5m"
          }
        ]
      },
      "enabled": true,
      "severity": "ERROR",
      "impactsSla": true,
      "notifyAboutDeactivation": true,
      "notifyAboutAbnormalDegradation": true,
      "notifyAboutSloSlaViolations": true,
      "tags": ["critical-service", "production"]
    }
If you just leave this rule enabled during your deployment, you’ll get flooded with alerts.
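To make the flood concrete, here is a minimal sketch of how a threshold condition like the one above would trip during a deployment. It is not Dynatrace's implementation: the function name `condition_met`, the sample values, and the choice to sum the metric over the timeframe are all assumptions made for illustration.

```python
from __future__ import annotations

import operator

# Only the "GT" operator appears in the rule above; the mapping is illustrative.
OPERATORS = {"GT": operator.gt}


def condition_met(condition: dict, samples: list[float]) -> bool:
    """Return True if the samples in the timeframe breach the threshold.

    Assumption: the metric is summed over the timeframe. The real
    aggregation depends on how the metric is defined.
    """
    if not samples:
        return False
    compare = OPERATORS[condition["operator"]]
    return compare(sum(samples), condition["threshold"])


condition = {"operator": "GT", "threshold": 10, "timeframe": "5m"}

# Hypothetical per-minute server error counts over the last five minutes.
steady_state = [1, 0, 2, 1, 0]
during_deploy = [4, 9, 12, 7, 3]

print(condition_met(condition, steady_state))   # False: 4 is not above 10
print(condition_met(condition, during_deploy))  # True: 35 is above 10
```

With numbers like these, the condition stays true on every evaluation for as long as the deployment keeps errors elevated, which is where the repeated alerts come from.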
Instead, you’ll use Dynatrace’s "Maintenance Windows" feature. You can access this through the Dynatrace UI under "Settings" -> "Events & alerts" -> "Maintenance windows".
When you create a maintenance window, you’re not disabling the alert rule itself. You’re defining a period during which Dynatrace should not fire specific alerts, even though it keeps evaluating their conditions.
Let’s create a maintenance window for our user-service deployment. We’ll go to the maintenance windows section and click "Add maintenance window".
Name: User Service Deployment - 2023-10-27
Description: Deploying new version of user-service, expect temporary error rate increase.
Start time: 2023-10-27T14:00:00Z
End time: 2023-10-27T16:00:00Z
Type: Planned
Scope: This is crucial. You need to define what this maintenance window applies to. You can scope it by:
* Hosts: If you’re only updating specific servers.
* Services: This is what we want. We’ll select our user-service.
* Applications: The overall application containing the service.
* Environment: The entire Dynatrace environment (use with caution!).
* Custom properties: If your entities have specific tags or properties.
For our scenario, we’ll select "Services" and then pick our user-service (which Dynatrace identifies by its unique entity ID, e.g., SERVICE-1234567890ABCDEF).
Alert Suppression: This is where the magic happens. You have a few options:
* Suppress all alerts for scoped entities: This would silence everything for the user-service. Usually too broad.
* Suppress alerts matching specific tags: You could give your alert rule a tag such as "maintenance" and then tell the window to suppress alerts with that tag.
* Suppress specific alert rules: This is often the most precise. You’d select the "High Error Rate in User Service" alert rule we defined earlier.
We’ll choose to "Suppress specific alert rules" and select our "High Error Rate in User Service" rule.
Why this works: While the maintenance window is active, Dynatrace still collects data for the user-service and still evaluates the alert conditions. What changes is the final step. When the com.dynatrace.builtin:service.errors.server metric exceeds 10 over 5 minutes on SERVICE-1234567890ABCDEF, the alerting engine first checks whether an active maintenance window applies to that alert rule and entity. If one does, the alert is not fired. The data is still there and the alert rule is still configured; only the alert itself is withheld.
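The decision described above can be sketched roughly as follows. This is a simplified model for illustration, not Dynatrace's actual alerting engine: `MaintenanceWindow`, `covers`, and `should_notify` are hypothetical names, and treating an empty rule list as "all rules" is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MaintenanceWindow:
    start: datetime
    end: datetime
    entity_ids: set[str]                                     # entities in scope
    alert_rule_ids: set[str] = field(default_factory=set)    # empty = all rules (assumption)

    def covers(self, entity_id: str, rule_id: str, at: datetime) -> bool:
        """True if this window is active at `at` and applies to the entity and rule."""
        in_time = self.start <= at < self.end
        in_scope = entity_id in self.entity_ids
        rule_matches = not self.alert_rule_ids or rule_id in self.alert_rule_ids
        return in_time and in_scope and rule_matches


def should_notify(condition_met: bool, entity_id: str, rule_id: str,
                  at: datetime, windows: list[MaintenanceWindow]) -> bool:
    """The condition is always evaluated; only the notification is withheld."""
    if not condition_met:
        return False
    return not any(w.covers(entity_id, rule_id, at) for w in windows)


window = MaintenanceWindow(
    start=datetime(2023, 10, 27, 14, 0, tzinfo=timezone.utc),
    end=datetime(2023, 10, 27, 16, 0, tzinfo=timezone.utc),
    entity_ids={"SERVICE-1234567890ABCDEF"},
    alert_rule_ids={"ALERT-RULE-XYZ789"},
)

during = datetime(2023, 10, 27, 15, 0, tzinfo=timezone.utc)
after = datetime(2023, 10, 27, 16, 5, tzinfo=timezone.utc)

# Suppressed while the window is active.
print(should_notify(True, "SERVICE-1234567890ABCDEF", "ALERT-RULE-XYZ789", during, [window]))
# Fires again once the window has ended and the condition still holds.
print(should_notify(True, "SERVICE-1234567890ABCDEF", "ALERT-RULE-XYZ789", after, [window]))
```

Note that a different rule on the same service is not suppressed by this window, which is the point of selecting specific alert rules rather than silencing the whole entity.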
You can also use the Dynatrace API to programmatically create and manage maintenance windows, which is ideal for automated deployments. For example, using curl to create a planned maintenance window:
    curl -X POST \
      'https://{your-environment-id}.live.dynatrace.com/api/v2/maintenancewindows' \
      -H 'Content-Type: application/json' \
      -H 'Authorization: Api-Token {your-api-token}' \
      -d '{
        "name": "API Managed User Service Deploy",
        "description": "Maintenance window for automated user-service deployment",
        "type": "PLANNED",
        "start": "2023-10-27T14:00:00Z",
        "end": "2023-10-27T16:00:00Z",
        "scope": {
          "type": "SERVICE",
          "id": "SERVICE-1234567890ABCDEF"
        },
        "suppression": {
          "alertRuleIds": ["ALERT-RULE-XYZ789"]
        }
      }'
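(Replace {your-environment-id}, {your-api-token}, and ALERT-RULE-XYZ789 with your actual values. Treat the endpoint path and payload fields as illustrative and confirm them against the API reference for your Dynatrace version before relying on them.)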
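In an automated deployment you would normally compute the start and end times rather than hard-code them. The sketch below builds the same payload shape as the curl example from a start time and a duration. The function name `build_window_payload` is hypothetical, the field names are copied from the example above rather than from an API reference, and the script only prints the JSON instead of sending it.

```python
import json
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_window_payload(service_id: str, alert_rule_id: str,
                         start: datetime, duration: timedelta) -> dict:
    """Build a maintenance window payload mirroring the curl example above."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + duration
    return {
        "name": "API Managed User Service Deploy",
        "description": "Maintenance window for automated user-service deployment",
        "type": "PLANNED",
        "start": start_utc.strftime(TIMESTAMP_FORMAT),
        "end": end_utc.strftime(TIMESTAMP_FORMAT),
        "scope": {"type": "SERVICE", "id": service_id},
        "suppression": {"alertRuleIds": [alert_rule_id]},
    }


payload = build_window_payload(
    service_id="SERVICE-1234567890ABCDEF",
    alert_rule_id="ALERT-RULE-XYZ789",
    start=datetime(2023, 10, 27, 14, 0, tzinfo=timezone.utc),
    duration=timedelta(hours=2),
)

# Print the JSON body; a pipeline step could pass this to curl via -d.
print(json.dumps(payload, indent=2))
```

In a real pipeline you would pass `datetime.now(timezone.utc)` as the start time and size the duration to the expected length of the deployment plus some margin.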
The key takeaway is that Dynatrace doesn’t stop collecting data or evaluating conditions during a maintenance window; it simply abstains from notifying you based on those conditions for the specified scope and rules. The alert rule itself remains active and will resume firing immediately after the maintenance window ends if the conditions are still met.
The next thing you’ll likely run into is making sure your maintenance windows are cleaned up properly, especially planned ones that do not end automatically.