If the Datadog agent isn’t collecting metrics from your containers, a likely cause is a filter in the agent’s configuration that is actively excluding them.
Here’s what’s likely happening and how to fix it:
Cause 1: Wildcard Mismatch in exclude_filters
You’ve probably got a broad wildcard in your exclude_filters that’s catching more than you intended.
Diagnosis:
Check your datadog-agent.yaml file (often located at /etc/datadog-agent/datadog-agent.yaml on Linux or within the agent’s data directory on other OSes). Look for the exclude_filters section.
Example:
# datadog-agent.yaml
logs_enabled: true
filters:
  exclude_filters:
    - 'exclude_container_labels:com.datadog.agent.logs.exclude=true'
    - 'exclude_container_env_vars:MY_APP_ENV=staging'
    - 'exclude_image_name:my-internal-repo/my-app-staging:*'
None of these filters is broad. They exclude only containers labeled com.datadog.agent.logs.exclude=true, containers with the environment variable MY_APP_ENV=staging, and containers whose image matches my-internal-repo/my-app-staging:*. Every other container passes through, so if containers are still missing, the problem is likely a more general wildcard elsewhere in the list.
Fix:
Carefully review your exclude_filters. If you see a pattern like 'exclude_image_name:*', or a label or environment-variable filter that’s too broad, narrow it.
Example Refined Fix: If you intended to exclude only staging containers and accidentally used a too-broad filter, change it to something specific:
# datadog-agent.yaml
logs_enabled: true
filters:
  exclude_filters:
    - 'exclude_image_name:my-internal-repo/my-app-staging:*' # Replaces the too-broad 'exclude_image_name:*'
    - 'exclude_container_labels:com.datadog.agent.logs.exclude=true'
    - 'exclude_container_env_vars:MY_APP_ENV=staging'
Why it works: This makes the exclusion rule more precise, only targeting containers that precisely match the specified image name pattern, thus allowing other containers through.
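To see why one broad pattern hides everything, here is a small Python sketch that models exclude_image_name values as shell-style globs (using the standard fnmatch module) matched against each container’s image reference. The glob semantics, the sample images, and the helper names excluded_by_image and collected are assumptions for illustration; the agent’s real matching rules may differ.

```python
from fnmatch import fnmatchcase


def excluded_by_image(image: str, patterns: list[str]) -> bool:
    """Return True if any exclude_image_name pattern (modelled as a glob) matches."""
    return any(fnmatchcase(image, pattern) for pattern in patterns)


def collected(images: list[str], patterns: list[str]) -> list[str]:
    """Return the images that survive the exclude patterns."""
    return [image for image in images if not excluded_by_image(image, patterns)]


# Hypothetical image references for three running containers.
images = [
    "my-internal-repo/my-app-staging:1.4",
    "my-internal-repo/my-app:1.4",
    "redis:7",
]

too_broad = ["*"]  # like 'exclude_image_name:*': matches every image
specific = ["my-internal-repo/my-app-staging:*"]  # staging images only

print(collected(images, too_broad))  # []
print(collected(images, specific))   # ['my-internal-repo/my-app:1.4', 'redis:7']
```

With the catch-all pattern nothing is collected; with the staging-only pattern, the other two containers come through.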
Cause 2: Incorrectly Formatted exclude_filters
The syntax for exclude_filters is quite specific. A typo can render the filter ineffective or, worse, cause it to exclude everything.
Diagnosis:
Again, scrutinize datadog-agent.yaml for the filters.exclude_filters section. Pay close attention to colons, quotes, and the exact phrasing of the filter types (exclude_image_name, exclude_container_labels, exclude_container_env_vars).
Example of Incorrect Format:
# datadog-agent.yaml
filters:
  exclude_filters:
    - 'exclude_image_name my-app:*' # Missing colon after the filter type
    - 'exclude_container_labels com.datadog.agent.logs.exclude=true' # Missing colon after the filter type
Fix:
Ensure each filter follows the filter_type:value format, enclosed in single quotes.
Example Corrected Format:
# datadog-agent.yaml
filters:
  exclude_filters:
    - 'exclude_image_name:my-app:*'
    - 'exclude_container_labels:com.datadog.agent.logs.exclude=true'
Why it works: The Datadog agent parses these filters strictly. Correcting the syntax ensures the agent can properly interpret and apply the exclusion rules.
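A quick way to catch these typos before restarting the agent is to lint each entry. The Python sketch below assumes the filter_type:value convention described above, splits on the first colon only (so a value such as my-app:* keeps its own colon), and checks the type against the filter names used in this article. KNOWN_TYPES, parse_filter, and lint are hypothetical helpers, not part of the Datadog agent.

```python
# Filter types as used in this article's examples (an assumption, not an official list).
KNOWN_TYPES = {
    "exclude_image_name",
    "exclude_container_labels",
    "exclude_container_env_vars",
    "include_image_name",
}


def parse_filter(entry: str) -> tuple[str, str]:
    """Split 'filter_type:value' on the first colon and validate both parts."""
    filter_type, sep, value = entry.partition(":")
    if not sep:
        raise ValueError(f"missing colon in filter: {entry!r}")
    if filter_type not in KNOWN_TYPES:
        raise ValueError(f"unknown filter type {filter_type!r} in {entry!r}")
    if not value:
        raise ValueError(f"empty value in filter: {entry!r}")
    return filter_type, value


def lint(entries: list[str]) -> list[str]:
    """Return one error message per malformed entry (empty list if all are valid)."""
    errors = []
    for entry in entries:
        try:
            parse_filter(entry)
        except ValueError as exc:
            errors.append(str(exc))
    return errors


broken = [
    "exclude_image_name my-app:*",
    "exclude_container_labels com.datadog.agent.logs.exclude=true",
]
fixed = [
    "exclude_image_name:my-app:*",
    "exclude_container_labels:com.datadog.agent.logs.exclude=true",
]

for message in lint(broken):
    print(message)
print(lint(fixed))  # []
```

Note that 'exclude_image_name my-app:*' still contains a colon inside its value, so it splits without complaint; it is the unknown type 'exclude_image_name my-app' that flags it.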
Cause 3: Overlapping include_filters and exclude_filters
If you have both include_filters and exclude_filters configured, the interaction can be tricky. Exclusions generally take precedence, but complex combinations can lead to unexpected behavior.
Diagnosis:
Examine both filters.include_filters and filters.exclude_filters in datadog-agent.yaml.
Example Scenario:
# datadog-agent.yaml
filters:
  include_filters:
    - 'include_image_name:my-app*'
  exclude_filters:
    - 'exclude_image_name:my-app-staging:*'
In this case, my-app-staging containers are excluded even though their image also matches the include pattern my-app*, because the exclusion takes precedence.
Fix:
Simplify your filter strategy. If you’re trying to include specific items and exclude others, make sure your exclude_filters are precise enough not to catch containers you intend to include. Often it’s better to have one very specific include_filter and minimal or no exclude_filters.
Example Refined Fix:
If you want only production my-app containers:
# datadog-agent.yaml
filters:
  include_filters:
    - 'include_image_name:my-app:production' # Very specific include
  exclude_filters: [] # Remove or empty exclude filters if not strictly needed
Why it works: By making the include_filter highly specific and removing potentially conflicting exclude_filters, you ensure only the desired containers are processed.
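The precedence rule described in this section can be modelled in a few lines of Python. This sketch assumes glob-style patterns (fnmatch), that an exclusion always wins over an inclusion, and that an empty include list means "include everything"; is_collected is a hypothetical function that only mirrors the behaviour described here, not the agent’s code.

```python
from fnmatch import fnmatchcase


def is_collected(image: str, include: list[str], exclude: list[str]) -> bool:
    """Model of the rule above: exclusions win; an empty include list allows everything."""
    if any(fnmatchcase(image, pattern) for pattern in exclude):
        return False  # an exclusion always takes precedence
    if not include:
        return True
    return any(fnmatchcase(image, pattern) for pattern in include)


# Overlapping rules: both patterns match the staging image, and the exclusion wins.
overlap_include = ["my-app*"]
overlap_exclude = ["my-app-staging:*"]
print(is_collected("my-app:1.0", overlap_include, overlap_exclude))          # True
print(is_collected("my-app-staging:1.0", overlap_include, overlap_exclude))  # False

# Refined fix: one specific include and no excludes.
print(is_collected("my-app:production", ["my-app:production"], []))  # True
print(is_collected("my-app:staging", ["my-app:production"], []))     # False
```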
Cause 4: Agent Not Restarted After Configuration Change
You’ve made the change, but the agent is still running with the old configuration.
Diagnosis:
Check whether the agent is running (on Linux, sudo datadog-agent status prints its current state). If you’ve edited datadog-agent.yaml, the changes won’t take effect until the agent is restarted.
Fix: Restart the Datadog agent. The command depends on your OS and how you installed the agent.
Common Restart Commands:
- Systemd (most modern Linux):
    sudo systemctl restart datadog-agent
- SysVinit (older Linux):
    sudo service datadog-agent restart
- Docker (if running the agent as a container):
    docker restart <datadog-agent-container-id>
- Kubernetes (DaemonSet): delete the pod, and the DaemonSet will recreate it.
    kubectl delete pod <datadog-agent-pod-name> -n <datadog-namespace>
Why it works: Restarting the agent forces it to reload its configuration file, applying the corrected filter settings.
Cause 5: Misunderstanding Container Label/Env Var Filtering
You might be trying to filter based on labels or environment variables that don’t actually exist on your containers, or are misspelled.
Diagnosis: Inspect the labels and environment variables of a problematic container.
- Docker:
    docker inspect --format '{{json .Config.Labels}}' <container-id>
    docker inspect --format '{{json .Config.Env}}' <container-id>
- Kubernetes:
    kubectl describe pod <pod-name> -n <namespace>
  Look for the Labels and Environment: sections in the output.
Example Mismatch:
You have exclude_container_env_vars:MY_APP_ENV=staging in your Datadog config, but the container’s environment variable is actually APP_ENV=staging (missing MY_).
Fix:
Correct the filter in datadog-agent.yaml to precisely match the actual label key/value or environment variable name/value present on your containers.
Example Corrected Fix:
# datadog-agent.yaml
filters:
  exclude_filters:
    - 'exclude_container_env_vars:APP_ENV=staging' # Matches actual env var
Why it works: Ensures the filter criteria accurately reflect the metadata attached to the containers, allowing for correct inclusion or exclusion.
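To check a filter against what the container really carries, you can compare it with the output of docker inspect, which reports the environment under Config.Env as a list of KEY=VALUE strings. The Python sketch below uses a hard-coded, trimmed sample of that output for a hypothetical container and the standard-library difflib to suggest near-miss names; env_filter_matches and suggest_names are illustrative helpers, not agent functions, and the NAME=value filter format is taken from this article.

```python
import json
from difflib import get_close_matches

# Trimmed, hypothetical output of `docker inspect <container-id>` (a JSON array).
SAMPLE_INSPECT = """
[{"Config": {"Env": ["APP_ENV=staging", "PATH=/usr/local/bin:/usr/bin"]}}]
"""


def env_to_dict(env_list: list[str]) -> dict[str, str]:
    """Turn Docker's KEY=VALUE strings into a dict, splitting on the first '='."""
    result = {}
    for item in env_list:
        key, _, value = item.partition("=")
        result[key] = value
    return result


def env_filter_matches(filter_value: str, env_list: list[str]) -> bool:
    """Check a 'NAME=value' filter value against the container's environment."""
    name, _, expected = filter_value.partition("=")
    return env_to_dict(env_list).get(name) == expected


def suggest_names(name: str, env_list: list[str]) -> list[str]:
    """Suggest existing variable names that look like a misspelling of `name`."""
    return get_close_matches(name, list(env_to_dict(env_list)))


env = json.loads(SAMPLE_INSPECT)[0]["Config"]["Env"]
print(env_filter_matches("MY_APP_ENV=staging", env))  # False: no such variable
print(suggest_names("MY_APP_ENV", env))               # ['APP_ENV']
print(env_filter_matches("APP_ENV=staging", env))     # True
```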
Cause 6: Using exclude_image_name with Tag vs. Digest
If your containers are deployed by image digest (image@sha256:...) in your docker run commands or Kubernetes manifests, an exclude_image_name pattern written against a tag (latest, v1.0) might not match them.
Diagnosis:
Check how your images are referenced in your deployment. Compare this to the exclude_image_name pattern.
Example Scenario:
Your datadog-agent.yaml has exclude_image_name:my-app:latest.
Your container is deployed using image: my-app@sha256:abcdef123456....
Fix:
Adjust your exclude_image_name pattern to match how the image is referenced. If you need to exclude by digest, the pattern needs to be precise. More commonly, you’d exclude by image name and tag.
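Example Corrected Fix (match the image name whether it’s referenced by tag or by digest):
# datadog-agent.yaml
filters:
  exclude_filters:
    - 'exclude_image_name:my-app:*' # Matches my-app run by any tag, e.g. my-app:latest
    - 'exclude_image_name:my-app@*' # Matches my-app run by digest, e.g. my-app@sha256:...
If you must exclude one specific image version by digest, which is less common, the pattern has to contain the full digest string.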
Why it works: The agent matches the exclude_image_name against the image name and tag (or digest) as reported by the container runtime. Ensuring consistency between your deployment and the agent’s filter prevents misinterpretations.
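The tag-versus-digest mismatch is easy to reproduce. The Python sketch below splits an image reference into name, tag, and digest, then matches the full reference against glob-style patterns with fnmatch. Treating exclude_image_name values as globs over the whole reference string is an assumption made for illustration; split_reference and image_excluded are hypothetical helpers.

```python
from fnmatch import fnmatchcase
from typing import Optional, Tuple


def split_reference(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split an image reference into (name, tag, digest); missing parts are None."""
    name, _, digest = ref.partition("@")
    tag = None
    # A tag follows a colon that comes after the last '/';
    # an earlier colon belongs to a registry port (e.g. registry:5000/app).
    colon = name.rfind(":")
    if colon > name.rfind("/"):
        name, tag = name[:colon], name[colon + 1:]
    return name, tag, digest or None


def image_excluded(ref: str, patterns: list[str]) -> bool:
    """Model exclude_image_name as globs over the full reference string."""
    return any(fnmatchcase(ref, pattern) for pattern in patterns)


deployed = "my-app@sha256:abcdef123456"
print(split_reference(deployed))  # ('my-app', None, 'sha256:abcdef123456')
print(image_excluded(deployed, ["my-app:latest"]))               # False: no tag to match
print(image_excluded(deployed, ["my-app:*", "my-app@*"]))        # True
print(image_excluded("my-app:latest", ["my-app:*", "my-app@*"])) # True
```

A container started from a digest reference has no tag at all, so a pattern like my-app:latest can never match it, while the pair my-app:* and my-app@* covers both forms.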
After applying these fixes and restarting the agent, you should start seeing metrics from your containers. The next issue you might encounter is Error sending metrics to the Datadog Agent: context deadline exceeded if the agent itself is overloaded or experiencing network issues connecting to the Datadog intake.