The Datadog Forwarder failed to send logs to the Datadog API because an upstream network component rejected its connection attempts too many times.

This usually means the Forwarder, the component of the Datadog Agent that ships data from your hosts, can’t reach the Datadog API endpoints. Datadog itself is rarely down in this case; the network path between your hosts and Datadog is blocked or misconfigured.

Cause 1: Network ACLs or Security Groups Blocking Outbound Traffic

Your cloud provider’s network access control lists (ACLs) or security groups are likely preventing the Forwarder from making outbound HTTPS (port 443) connections to Datadog’s IP ranges.

  • Diagnosis: On AWS, check the outbound rules of the Security Group attached to your EC2 instance, and the Network ACL on its subnet. On GCP, check your egress Firewall rules. Look for a rule that allows TCP port 443 to 0.0.0.0/0 (or to Datadog’s published IP ranges).
  • Fix: Add or modify your security group/firewall rules to explicitly allow outbound TCP traffic on port 443 to Datadog’s IP ranges. For example, in AWS, you’d edit the Security Group attached to your EC2 instances and add an outbound rule: Type: Custom TCP, Port Range: 443, Destination: 0.0.0.0/0. If you know Datadog’s IPs, you can restrict it further, but 0.0.0.0/0 is common for general outbound access. Network ACLs are stateless, so if you use them, also allow the return traffic inbound on ephemeral ports (1024-65535).
  • Why it works: This explicitly permits the Forwarder agent to initiate connections to Datadog’s API servers over the internet.
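To tell a blocked port apart from other failures, you can probe the endpoint with curl and interpret its exit code. The following is a minimal sketch, assuming curl is installed on the host; classify_curl_exit and probe_endpoint are hypothetical helper names, and the meanings come from curl's documented exit codes.

```shell
# Map a curl exit code to a likely cause (per curl's documented exit codes).
classify_curl_exit() {
  case "$1" in
    0)     echo "ok" ;;
    5)     echo "proxy-dns" ;;       # could not resolve the proxy host
    6)     echo "dns" ;;             # could not resolve the target host
    7)     echo "connect-failed" ;;  # TCP connection could not be established
    28)    echo "timeout" ;;         # often a firewall silently dropping packets
    35|60) echo "tls" ;;             # TLS handshake or certificate problem
    *)     echo "other" ;;
  esac
}

# Hypothetical helper: probe an HTTPS endpoint and print the classification.
probe_endpoint() {
  curl -sS -o /dev/null --connect-timeout 10 "https://$1" 2>/dev/null
  classify_curl_exit "$?"
}

# Example (requires network access):
#   probe_endpoint api.datadoghq.com
```

A result of timeout or connect-failed on a host whose DNS works is consistent with an outbound rule blocking port 443, though it does not prove it.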

Cause 2: Proxy Configuration Issues

If your network requires outbound traffic to go through a proxy server, the Datadog Forwarder might not be configured to use it, or the proxy itself might be misconfigured or overloaded.

  • Diagnosis: Check the Agent configuration file (usually /etc/datadog-agent/datadog.yaml on Linux or C:\ProgramData\Datadog\datadog.yaml on Windows). Look for the proxy section and its https and http keys. Also, check your proxy server’s logs for connection refusals or errors related to Datadog’s API endpoints (e.g., api.datadoghq.com).
  • Fix: Ensure the https and http keys under proxy in datadog.yaml point to your valid proxy server and port, then restart the agent. Example:
    # In datadog.yaml
    proxy:
      https: http://your-proxy.example.com:8080
      http: http://your-proxy.example.com:8080
    
    If the proxy is the issue, you’ll need to troubleshoot your proxy server itself.
  • Why it works: This directs the Forwarder’s outbound requests through the designated proxy, which then forwards them to Datadog’s API.
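Before changing the agent configuration, it can help to confirm that the proxy itself can reach Datadog. The sketch below assumes curl is available; check_via_proxy is a hypothetical helper, and the proxy URL in the example is the placeholder from above, not a real server.

```shell
# Hypothetical helper: send one HTTPS request through an explicit proxy.
# $1 = proxy URL, $2 = target host.
check_via_proxy() {
  if curl -sS -o /dev/null --max-time 15 --proxy "$1" "https://$2"; then
    echo "reachable via $1"
  else
    # In the else branch, $? still holds curl's exit status.
    echo "failed via $1 (curl exit $?)"
    return 1
  fi
}

# Example (requires network access and a real proxy):
#   check_via_proxy http://your-proxy.example.com:8080 api.datadoghq.com
```

If this succeeds from the host but the agent still fails, the agent’s proxy settings are the more likely culprit; if it fails, look at the proxy server first.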

Cause 3: DNS Resolution Problems

The Forwarder might be unable to resolve Datadog’s API endpoints (e.g., api.datadoghq.com) to their correct IP addresses.

  • Diagnosis: On the host running the Forwarder, try to ping or curl the Datadog API endpoint:
    ping api.datadoghq.com
    # or
    curl -v https://api.datadoghq.com
    
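    If ping reports "unknown host" or "Name or service not known", or curl reports "Could not resolve host", DNS is the problem. A ping that resolves the name but gets no replies does not indicate a DNS fault, since ICMP is often blocked. Check your host’s /etc/resolv.conf (Linux) or network adapter settings (Windows) to ensure it’s pointing to valid DNS servers.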
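  • Fix: Correct your host’s DNS configuration to use reachable, working DNS servers. For example, update /etc/resolv.conf to point to Google’s DNS (8.8.8.8) or your internal DNS server. On hosts where systemd-resolved or NetworkManager manages this file, make the change through that tool instead, or it may be overwritten: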
    nameserver 8.8.8.8
    nameserver 8.8.4.4
    
    Then restart the Datadog agent.
  • Why it works: Correct DNS resolution provides the Forwarder with the actual IP addresses it needs to connect to Datadog’s servers.
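The DNS checks above can be collected into two small helpers. This is a sketch for Linux hosts, assuming awk and getent are available; list_nameservers and resolves are hypothetical names introduced for illustration.

```shell
# Print the nameserver addresses configured in a resolv.conf-style file.
# $1 = file to read (default /etc/resolv.conf).
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Hypothetical helper: report whether a hostname resolves via the system resolver.
resolves() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "resolves"
  else
    echo "does not resolve"
  fi
}

# Example:
#   list_nameservers
#   resolves api.datadoghq.com
```

Unlike ping, getent only performs the lookup, so a firewall that drops ICMP does not affect the result.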

Cause 4: Datadog Agent Not Running or Malfunctioning

While less common for "too many retries" (more likely for "connection refused" or "host unreachable"), a severely degraded agent state could manifest this way.

  • Diagnosis: Check the agent’s status:
    sudo datadog-agent status
    
    Look for any errors reported in the output or in the agent’s logs (/var/log/datadog/agent.log or similar).
  • Fix: Restart the Datadog agent through your service manager (on systemd-based Linux hosts):
    sudo systemctl restart datadog-agent
    
    If issues persist, consider reinstalling the agent.
  • Why it works: A fresh restart can clear internal state corruption or resource exhaustion within the agent process.
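When the status output is long, filtering the agent log for error lines can be quicker. The sketch below is a plain text filter, not an agent feature; recent_errors is a hypothetical helper, and the default path is the log location mentioned above.

```shell
# Hypothetical helper: show the last N lines mentioning "error" in an agent log.
# $1 = log file (default /var/log/datadog/agent.log), $2 = line count (default 20).
recent_errors() {
  grep -i 'error' "${1:-/var/log/datadog/agent.log}" | tail -n "${2:-20}"
}

# Example (reading the log may require elevated permissions):
#   recent_errors /var/log/datadog/agent.log 50
```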

Cause 5: Network Latency or Packet Loss

High network latency or consistent packet loss between your host and Datadog’s API endpoints can cause TCP connection attempts to time out, leading to retries and eventual failure.

  • Diagnosis: Use mtr (My Traceroute) or ping with a larger packet size for an extended period to check for packet loss and latency to api.datadoghq.com.
    mtr --report api.datadoghq.com
    
    Look for any hops showing significant packet loss (above 1-2%) or consistently high latency.
  • Fix: This often requires network infrastructure troubleshooting. You might need to involve your network team to identify and resolve routing issues, congestion, or firewall performance problems along the path to Datadog. There’s no direct agent-level fix.
  • Why it works: Reducing latency and eliminating packet loss allows TCP connections to establish reliably and quickly.
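To pick out the lossy hops from a long mtr report, the output can be filtered with awk. This is a sketch that assumes mtr’s usual report layout, where each hop line starts with a marker such as "1.|--" followed by the host and the Loss% column; lossy_hops is a hypothetical helper name.

```shell
# Hypothetical helper: read `mtr --report` output on stdin and print the hops
# whose loss exceeds a threshold percentage ($1, default 2).
lossy_hops() {
  awk -v thr="${1:-2}" '
    $1 ~ /^[0-9]+\.[|]--$/ {
      loss = $3
      sub(/%$/, "", loss)              # strip the trailing percent sign
      if (loss + 0 > thr + 0) print $1, $2, $3
    }
  '
}

# Example (requires mtr and network access):
#   mtr --report --report-cycles 50 api.datadoghq.com | lossy_hops 2
```

Loss reported only at an intermediate hop is often just that router rate-limiting its replies; loss that persists through to the final hop is the more meaningful signal.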

Cause 6: Datadog API Rate Limiting (Less Common for this Specific Error)

While "Too Many Retries" usually points to network issues before traffic reaches Datadog, in rare cases, if all your agents hit Datadog simultaneously and exceed their allocated rate limits, Datadog might start rejecting requests temporarily. This would normally show up as HTTP 429 Too Many Requests responses in the agent logs, although the forwarding layer may surface it only as exhausted retries.

  • Diagnosis: Check your Datadog account’s usage and limits. Look for any notifications or alerts within Datadog about exceeding intake limits. This is very unlikely to be the primary cause of the "Too Many Retries" error seen at the network layer.
  • Fix: If this is the case, you’ll need to scale up your Datadog plan or investigate why your agents are sending an excessive volume of data. This might involve filtering logs at the source or adjusting collection configurations.
  • Why it works: Ensuring your data volume stays within your Datadog plan limits prevents the API from throttling your requests.

After resolving these, you might next encounter a DNS error such as ERR: No such host or ERR: dial tcp: lookup api.datadoghq.com: no such host. Both mean that name resolution is still failing, so revisit Cause 3.
