Account-level throttling is a surprisingly blunt instrument, and its primary purpose isn’t to protect your backend services, but rather to protect the API Gateway itself from excessive, unbounded load that could lead to instability and cost overruns.
Let’s see it in action. Imagine you have an API Gateway set up and you’re seeing a surge of requests. Without account-level throttling, a runaway client or a distributed denial-of-service attack could hammer the gateway, potentially exhausting its resources or incurring massive bills before you can even detect and block the malicious source.
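Here’s how to inspect the account-level settings with the AWS CLI:
aws apigateway get-account
The response includes a throttleSettings object holding two values, burstLimit and rateLimit. Unlike stage- and method-level throttling, you don’t set these yourself with update-account (its patch operations cover account properties such as /cloudwatchRoleArn); they are per-Region quotas that you raise by requesting an increase from AWS. To keep the numbers round, suppose your account had a burstLimit of 1000 and a rateLimit of 500. Picture a bucket that holds at most burstLimit tokens, here 1,000. Each incoming request takes one token; if the bucket is empty, the request is throttled and API Gateway returns 429 Too Many Requests. Tokens flow back in at rateLimit, here 500 per second. A full bucket therefore absorbs a sudden spike of up to 1,000 requests, but once it is drained, sustained traffic can’t exceed 500 requests per second.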
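To make that arithmetic concrete, here is a coarse sketch that steps the bucket one second at a time. It assumes the bucket starts full and that refill is applied in one lump at the end of each second (real refill is continuous, so this understates what gets through in the first second). The function name `simulate_per_second` is hypothetical.

```python
def simulate_per_second(burst_limit: int, rate_limit: int, offered: list[int]) -> list[int]:
    """Return how many requests are admitted in each one-second step.

    Coarse model (an assumption, not API Gateway's implementation):
    the bucket starts full, each admitted request removes one token,
    and rate_limit tokens are added back at the end of every second,
    capped at burst_limit.
    """
    tokens = burst_limit  # an idle bucket is full
    admitted_per_second = []
    for requests in offered:
        admitted = min(requests, tokens)  # the excess would receive 429s
        tokens -= admitted
        admitted_per_second.append(admitted)
        tokens = min(burst_limit, tokens + rate_limit)  # lump-sum refill
    return admitted_per_second


if __name__ == "__main__":
    # A client offering 2,000 requests every second for three seconds.
    print(simulate_per_second(1000, 500, [2000, 2000, 2000]))
    # Two idle seconds let the bucket refill to its 1,000-token capacity.
    print(simulate_per_second(1000, 500, [2000, 0, 0, 2000]))
```

The first run admits the full burst of 1,000 and then settles at the 500-per-second refill rate; the second shows that a quiet period restores the burst capacity.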
The problem this solves is unbounded request volume. Stage-level and method-level throttling are granular and protect specific backend resources; account-level throttling is a single ceiling shared by every API in the account within a Region. It’s your first line of defense against a meltdown caused by sheer volume, regardless of which API or method is being hit.
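One way to picture the account limit sitting above stage and method limits is a request that must find a token in every bucket that applies to it. The sketch below is a hypothetical model: the bucket names and the check-everything-then-debit-everything order are assumptions, not documented API Gateway behavior, and refill is ignored so it captures a single instant.

```python
def admit(buckets: dict[str, int], applicable: list[str]) -> bool:
    """Admit a request only if every applicable bucket has a token.

    Hypothetical model: all layers are checked first and debited only if
    every one has a token, so a rejected request consumes nothing.
    Refill is ignored.
    """
    if all(buckets[name] >= 1 for name in applicable):
        for name in applicable:
            buckets[name] -= 1
        return True
    return False


if __name__ == "__main__":
    # A made-up snapshot in which the account bucket is the tightest layer.
    buckets = {"account": 2, "stage:prod": 10, "GET /orders": 10, "GET /users": 10}
    results = [
        admit(buckets, ["account", "stage:prod", "GET /orders"]),
        admit(buckets, ["account", "stage:prod", "GET /users"]),
        admit(buckets, ["account", "stage:prod", "GET /users"]),  # account bucket is empty
    ]
    print(results)
```

The third request is rejected even though its stage and method buckets still hold plenty of tokens: the account-wide ceiling applies no matter which API or method is called.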
Internally, API Gateway applies a token bucket algorithm to each account in each Region. rateLimit is the rate at which tokens are added to the bucket, and burstLimit is the bucket’s maximum capacity. When a request arrives, it tries to take a token: if one is available, the request is allowed; if not, it’s throttled. Because an idle bucket fills up to burstLimit, traffic can briefly run above rateLimit until those stored tokens are spent; that is the "burst" capacity.
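The algorithm in that paragraph fits in a few lines. This is a minimal sketch of a generic token bucket with continuous refill, not API Gateway’s actual code; the `TokenBucket` class and its injectable `clock` parameter (added so the behavior can be checked without waiting on real time) are illustrative assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket: capacity = burst_limit, refill = rate_limit tokens/s."""

    def __init__(self, burst_limit: int, rate_limit: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = float(burst_limit)
        self.rate = float(rate_limit)
        self.clock = clock
        self.tokens = self.capacity  # an idle bucket starts full
        self.last = clock()

    def allow(self) -> bool:
        """Try to take one token; return False if the request should be throttled."""
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    bucket = TokenBucket(burst_limit=3, rate_limit=2, clock=lambda: fake_now[0])
    print([bucket.allow() for _ in range(4)])  # burst of 3, then throttled
    fake_now[0] = 0.5  # 0.5 s at 2 tokens/s refills exactly one token
    print(bucket.allow(), bucket.allow())
```

With a tiny bucket the pattern is easy to see: three requests pass immediately, the fourth is throttled, and half a second later exactly one more request fits.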
The levers involved are the burstLimit and rateLimit values. burstLimit is a request count: the bucket’s capacity, and therefore the largest spike that can be admitted at once. It is not a queue; requests beyond it are rejected with a 429, not held for later. rateLimit is the steady-state number of requests per second the gateway will allow. Zero does not mean "off". Under the token-bucket model, a rateLimit of 0 means the bucket never refills, so once the initial burst is spent every request is throttled; a burstLimit of 0 means the bucket can never hold a token, so nothing gets through at all. Setting both to 0 blocks traffic rather than disabling throttling.
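The zero cases follow from a simple bound. For a token bucket that starts full and refills continuously, the most requests it can admit over a window of T seconds is burstLimit + rateLimit × T, rounded down, provided the bucket can hold at least one token. The helper below is a hypothetical illustration of that bound, not an AWS API.

```python
import math


def max_admitted(burst_limit: int, rate_limit: float, window_s: float) -> int:
    """Upper bound on requests a full token bucket admits in window_s seconds.

    Assumes continuous refill and one token per request. A bucket that
    cannot hold a single token (burst_limit < 1) admits nothing.
    """
    if burst_limit < 1:
        return 0
    return math.floor(burst_limit + rate_limit * window_s)


if __name__ == "__main__":
    print(max_admitted(1000, 500, 10))  # 6000
    print(max_admitted(1000, 0, 10))    # 1000: the initial burst, then nothing
    print(max_admitted(0, 500, 10))     # 0: the bucket can never hold a token
```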
The most surprising thing about account-level throttling is that it caps the gateway’s throughput on your behalf, independently of what your backend can handle. If your backend can serve 2,000 requests per second but your account-level rateLimit is 500, sustained traffic is throttled at 500 requests per second even though the backend has capacity to spare. The gateway, not your application, becomes the bottleneck. This often confuses developers who expect the gateway to pass traffic through until the backend shows signs of strain; in reality, the gateway protects itself first.
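The scenario reduces to a couple of min() comparisons. The sketch below is a hypothetical back-of-the-envelope helper: it ignores the initial burst and treats both the account rateLimit and the backend’s capacity as fixed requests-per-second ceilings.

```python
def steady_state(offered_rps: float, account_rate_limit: float,
                 backend_capacity_rps: float) -> dict[str, object]:
    """Sustained-traffic arithmetic, ignoring the burst (a simplifying assumption)."""
    passed = min(offered_rps, account_rate_limit)  # what the gateway lets through
    throttled = offered_rps - passed               # returned to clients as 429s
    if offered_rps <= min(account_rate_limit, backend_capacity_rps):
        bottleneck = "none"
    elif account_rate_limit <= backend_capacity_rps:
        bottleneck = "gateway"
    else:
        bottleneck = "backend"
    return {"passed": passed, "throttled": throttled, "bottleneck": bottleneck}


if __name__ == "__main__":
    # Backend handles 2,000 rps; account rateLimit is 500; clients send 2,000 rps.
    print(steady_state(2000, 500, 2000))
```

With the numbers from the text, 500 requests per second pass, 1,500 per second are throttled, and the gateway is reported as the bottleneck even though the backend could take all 2,000.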
The next concept you’ll likely encounter is how to define more nuanced throttling policies at the API or method level to protect specific backend integrations.