The most surprising thing about blocking bots at the CDN edge is that you’re probably already doing it, but not as effectively as you think.

Let’s watch a request from a malicious bot trying to scrape a product catalog.

curl -A "BadBot/1.0" -H "X-Forwarded-For: 192.0.2.1" https://your-awesome-site.com/products

A typical CDN, such as Cloudflare or Akamai, sees this request. First, it checks the User-Agent string. "BadBot/1.0" is a dead giveaway, and most CDNs can flag or block known bad user agents. If that doesn’t catch it, the CDN looks at the client IP address. This is the source address of the connection itself, not the X-Forwarded-For header, which the client controls and can forge; for this walkthrough, assume the request really does arrive from 192.0.2.1. The CDN might check this IP against threat intelligence feeds. Is it a known botnet IP? A Tor exit node? A datacenter IP commonly used for scraping? If so, it gets blocked.
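The two static checks described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CDN implements it: the blocklist contents and the `signature_verdict` function name are assumptions made up for this example, and real CDNs rely on managed, frequently updated feeds.

```python
import ipaddress

# Hypothetical blocklists for illustration only.
BAD_UA_SUBSTRINGS = ("badbot", "scrapingbot")
BAD_NETWORKS = [
    ipaddress.ip_network("192.0.2.1/32"),
    ipaddress.ip_network("198.51.100.5/32"),
]


def signature_verdict(user_agent: str, client_ip: str) -> str:
    """Return "block" or "pass" using static signatures only."""
    # Case-insensitive substring match on the User-Agent header.
    ua = user_agent.lower()
    if any(marker in ua for marker in BAD_UA_SUBSTRINGS):
        return "block"

    # client_ip must be the connection's source address,
    # never a client-supplied header such as X-Forwarded-For.
    addr = ipaddress.ip_address(client_ip)
    if any(addr in network for network in BAD_NETWORKS):
        return "block"

    return "pass"
```

A request with a spoofed browser user agent from an unlisted address returns "pass" here, which is exactly the gap that behavioral checks have to cover.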

If the bot is a bit smarter and spoofs a legitimate user agent like Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36, the CDN can’t rely on simple signature matching. This is where behavioral analysis and rate limiting come in. The CDN observes the pattern of requests from 192.0.2.1. Is it making hundreds of requests per minute to /products? Is it accessing pages in a non-human sequence, like fetching product_1.html, then product_500.html, then product_2.html? Most CDNs can be configured to detect and block such anomalous behavior.
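To make the rate-limiting idea concrete, here is a minimal sliding-window limiter keyed by client IP. It is a sketch under stated assumptions: the class name `SlidingWindowLimiter`, its parameters, and the in-memory storage are all hypothetical, and a real CDN would track counters across many edge nodes rather than in a single process.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within period_seconds."""

    def __init__(self, max_requests: int = 100, period_seconds: float = 60.0):
        self.max_requests = max_requests
        self.period = period_seconds
        # client IP -> timestamps of accepted requests, oldest first
        self._hits = defaultdict(deque)

    def allow(self, client_ip: str, now: float) -> bool:
        window = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.period:
            window.popleft()
        if len(window) >= self.max_requests:
            return False  # over the limit; the rejected request is not recorded
        window.append(now)
        return True
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the limiter easy to test. Detecting non-human page sequences needs more state than this, such as the order of paths seen per session.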

Here’s a simplified snippet, written in the style of Cloudflare’s rule expressions rather than the exact schema of its API, that illustrates some of these controls:

{
  "rules": [
    {
      "description": "Block known bad user agents",
      "expression": "http.user_agent contains \"BadBot\" or http.user_agent contains \"ScrapingBot\"",
      "action": "block"
    },
    {
      "description": "Block requests from known malicious IPs",
      "expression": "ip.src in {192.0.2.1 198.51.100.5}",
      "action": "block"
    },
    {
      "description": "Rate limit aggressive scrapers",
      "expression": "(http.request.uri.path contains \"/products\" or http.request.uri.path contains \"/api\")",
      "rate_limit": {
        "requests_per_period": 100,
        "period_seconds": 60,
        "action": "block"
      }
    },
    {
      "description": "Challenge suspicious browser behavior",
      "expression": "cf.threat_score > 5",
      "action": "challenge"
    }
  ]
}

In this example, the first rule directly blocks requests with specific malicious user agents. The second rule blocks IPs known to be problematic. The third rule implements rate limiting: if any single IP makes more than 100 requests to /products or /api within a minute, it’s blocked. The fourth rule, cf.threat_score > 5, uses Cloudflare’s threat score, which is driven mainly by IP reputation; richer signals such as TLS fingerprints and behavioral patterns feed its separate bot scoring. A score above 5 triggers a challenge. Simple bots that cannot execute JavaScript or interact with a page typically fail it, although more sophisticated bots that drive a real browser may still get through.
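How such a rule list produces a decision can be sketched as a first-match evaluator. This is an assumption-laden simplification: the `Request` and `Rule` types and the `evaluate` function are hypothetical, the expressions are rewritten as Python predicates, first-match-wins ordering is assumed, and the stateful rate-limit rule is left out because it needs counters rather than a single-request check.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    user_agent: str
    ip: str
    path: str
    threat_score: int  # assumed to be computed upstream


@dataclass
class Rule:
    description: str
    matches: Callable[[Request], bool]
    action: str


RULES: List[Rule] = [
    Rule("Block known bad user agents",
         lambda r: "BadBot" in r.user_agent or "ScrapingBot" in r.user_agent,
         "block"),
    Rule("Block requests from known malicious IPs",
         lambda r: r.ip in {"192.0.2.1", "198.51.100.5"},
         "block"),
    Rule("Challenge suspicious browser behavior",
         lambda r: r.threat_score > 5,
         "challenge"),
]


def evaluate(request: Request, rules: List[Rule] = RULES) -> str:
    """Return the action of the first matching rule, or "allow"."""
    for rule in rules:
        if rule.matches(request):
            return rule.action
    return "allow"
```

Under this ordering, a request that matches both a block rule and the challenge rule is blocked outright, so cheap, high-confidence rules are best placed first.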

The real power of edge blocking is that it stops unwanted traffic before it reaches your origin. That saves origin server resources, reduces bandwidth costs, and makes it harder to scrape or exfiltrate data in bulk. It also protects against application-level attacks like credential stuffing or pricing abuse that would otherwise hit your backend.

What most people don’t realize is that a CDN’s "threat score" or similar metric is a dynamic aggregation of many signals, not just a simple IP or user-agent blocklist. The signals include TLS handshake characteristics (which differ between browsers and bot clients), unusual HTTP header combinations, and the timing and sequence of requests within a session. A bot might pass the user-agent check and stay under a simple rate limit, but its TLS fingerprint might be identical to that of thousands of other requests from the same IP range, or it might show none of the interaction patterns typical of a browser. Each of these contributes to a higher threat score.
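One way to picture this aggregation is a weighted sum of normalized signals. The sketch below is purely illustrative: the signal names, the weights, and the 0 to 100 scale are assumptions for this example, and production systems combine far more signals, often with machine-learned models rather than fixed weights.

```python
# Hypothetical signal weights; higher means more influence on the score.
WEIGHTS = {
    "ip_reputation": 4.0,
    "tls_fingerprint_mismatch": 3.0,
    "odd_headers": 2.0,
    "request_timing": 1.0,
}


def threat_score(signals: dict, weights: dict = WEIGHTS) -> float:
    """Combine per-signal values in [0, 1] into a score from 0 to 100.

    Missing signals count as 0; out-of-range values are clamped.
    """
    total_weight = sum(weights.values())
    weighted = sum(
        weight * min(max(signals.get(name, 0.0), 0.0), 1.0)
        for name, weight in weights.items()
    )
    return 100.0 * weighted / total_weight
```

With these weights, a client with a clean IP and a plausible user agent but a clearly non-browser TLS fingerprint would still score 30, enough to cross a low challenge threshold.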

The next step after effectively blocking bots is understanding and mitigating cache-busting techniques they employ.
