The most surprising thing about API Gateway quotas is that they’re not really about limiting your usage, but about preventing a single customer from bogging down the entire shared infrastructure.
Let’s see this in action. Imagine you’ve got a new service, user-service, and you want to expose it via API Gateway. You’ve set up a GET /users method.
```json
{
  "paths": {
    "/users": {
      "get": {
        "summary": "Get user list",
        "operationId": "listUsers",
        "responses": {
          "200": {
            "description": "A list of users"
          }
        },
        "x-amazon-apigateway-integration": {
          "type": "aws_proxy",
          "integrationHttpMethod": "POST",
          "uri": "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:123456789012:function:user-service-dev/invocations",
          "credentials": "arn:aws:iam::123456789012:role/apigateway-lambda-exec-role"
        }
      }
    }
  }
}
```
By default, API Gateway applies an account-level throttle of 10,000 requests per second (RPS), with a burst of 5,000 requests, shared across all APIs in your account within a Region. That sounds like a lot, but if your user-service suddenly becomes popular and starts hitting this limit, you’ll see 429 Too Many Requests errors, not just for user-service, but potentially for every other API in the same account and Region.
The problem API Gateway quotas solve is this: imagine a thousand different applications all hitting the same API Gateway endpoint. Without limits, one rogue application could consume all available capacity, starving the others. So, API Gateway imposes these limits to ensure fair usage and stability for everyone.
Internally, API Gateway uses a token bucket algorithm for throttling. The bucket holds up to the burst capacity in tokens and refills at the steady-state rate. When a request comes in, it tries to consume a token. If a token is available, the request is allowed. If not, it’s throttled with a 429 response. The burst capacity allows for short spikes of traffic above the steady-state rate, while the steady-state rate caps sustained throughput over time.
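To make the token bucket idea concrete, here is a minimal sketch in Python. It illustrates the general algorithm, not API Gateway’s actual implementation; the class name `TokenBucket` and the small rate and burst values are assumptions chosen for readability.

```python
import time
from typing import Optional


class TokenBucket:
    """Toy token bucket: holds up to `burst` tokens, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, burst: int, now: Optional[float] = None) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True if the request may proceed, False if it should be throttled."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Timestamps are passed explicitly so the example is deterministic.
bucket = TokenBucket(rate=2, burst=5, now=0.0)
spike = [bucket.allow(now=0.0) for _ in range(7)]  # 7 requests in the same instant
later = bucket.allow(now=1.0)                      # one second later, 2 tokens are back
print(spike, later)
```

The first five requests in the spike drain the bucket and the last two are rejected; a second later the refill lets traffic through again at the steady-state rate.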
You control throttling at several levels: the account-level limit applies across all your APIs, and below it you can set per-stage and per-method limits. To raise the 10,000 RPS account-level limit itself, you request a quota increase from AWS. To set stage-level throttling, you’d navigate to your API in the AWS console, open the stage’s settings, and adjust the "Rate" and "Burst" values.
For example, to set the rate to 20,000 RPS and the burst to 30,000 (values this high require an account-level limit that allows them), you’d input these values. The "Rate" here is the steady-state number of requests per second, and "Burst" is the token bucket size, that is, the maximum number of requests that can be accepted at once before throttling kicks in. Increasing the "Rate" means API Gateway will allow up to 20,000 requests per second on average. Increasing the "Burst" allows for temporary spikes of up to 30,000 requests in a very short period, giving your application some breathing room.
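The same rate and burst values can be set programmatically. The sketch below builds the patch operations that boto3’s `apigateway` client accepts in `update_stage`; the helper names `throttling_patch_ops` and `apply_stage_throttling` are hypothetical, and the `/*/*` path is used on the assumption that you want to target every method in the stage.

```python
from typing import Any, Dict, List


def throttling_patch_ops(rate_limit: int, burst_limit: int) -> List[Dict[str, str]]:
    """Build patch operations that set throttling for all methods ("/*/*") in a stage."""
    return [
        {"op": "replace", "path": "/*/*/throttling/rateLimit", "value": str(rate_limit)},
        {"op": "replace", "path": "/*/*/throttling/burstLimit", "value": str(burst_limit)},
    ]


def apply_stage_throttling(client: Any, rest_api_id: str, stage_name: str,
                           rate_limit: int, burst_limit: int) -> Any:
    """Apply the patch to a stage. `client` is assumed to be boto3.client("apigateway")."""
    return client.update_stage(
        restApiId=rest_api_id,
        stageName=stage_name,
        patchOperations=throttling_patch_ops(rate_limit, burst_limit),
    )


ops = throttling_patch_ops(rate_limit=20000, burst_limit=30000)
print(ops)
```

Calling `apply_stage_throttling` needs AWS credentials plus a real REST API ID and stage name, so only the pure helper is exercised here.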
Beyond the API-wide settings, you can also define usage plans. These plans allow you to associate APIs with specific API keys and set custom throttling and quota limits per API key. This is where you can truly differentiate between customers.
Let’s say you have a "Free" tier customer and a "Premium" tier customer. You’d create two usage plans:
- Free Tier Plan:
  - Associated API Keys: key_free_tier_abc
  - Rate: 100 RPS
  - Burst: 200
  - Quota: 1,000,000 requests per month
- Premium Tier Plan:
  - Associated API Keys: key_premium_tier_xyz
  - Rate: 1,000 RPS
  - Burst: 2,000
  - Quota: 30,000,000 requests per month
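The two plans above could be created along these lines with boto3. This is a sketch under assumptions: the API ID `a1b2c3d4e5`, the stage name `dev`, and the helper names are hypothetical, and `create_plan_with_key` expects the API key’s ID as returned by API Gateway, not the key’s display name.

```python
from typing import Any, Dict


def usage_plan_params(name: str, rate_limit: int, burst_limit: int,
                      monthly_quota: int, api_id: str, stage: str) -> Dict[str, Any]:
    """Build keyword arguments for the apigateway client's create_usage_plan call."""
    return {
        "name": name,
        "apiStages": [{"apiId": api_id, "stage": stage}],
        "throttle": {"rateLimit": rate_limit, "burstLimit": burst_limit},
        "quota": {"limit": monthly_quota, "period": "MONTH"},
    }


def create_plan_with_key(client: Any, params: Dict[str, Any], key_id: str) -> str:
    """Create the plan and attach one API key. `client` is assumed to be boto3.client("apigateway")."""
    plan = client.create_usage_plan(**params)
    client.create_usage_plan_key(usagePlanId=plan["id"], keyId=key_id, keyType="API_KEY")
    return plan["id"]


# Hypothetical API ID and stage; substitute your own.
free = usage_plan_params("Free Tier Plan", 100, 200, 1_000_000, api_id="a1b2c3d4e5", stage="dev")
premium = usage_plan_params("Premium Tier Plan", 1_000, 2_000, 30_000_000, api_id="a1b2c3d4e5", stage="dev")
print(free)
print(premium)
```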
When a request comes in with key_premium_tier_xyz, API Gateway applies the limits defined in the "Premium Tier Plan" on top of the stage-level and account-level throttling, so the most restrictive limit wins. The "Quota" setting is a hard limit on the total number of requests allowed within a specified period (daily, weekly, or monthly), which is distinct from the RPS throttling: once the quota is used up, requests with that key are rejected until the period resets.
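A quick calculation shows how differently the quota and the rate limit behave. Assuming a client sends traffic continuously at its full plan rate (an unrealistic worst case, used only for illustration), the monthly quota runs out within hours:

```python
def hours_to_exhaust_quota(quota_requests: int, rate_rps: float) -> float:
    """Hours until the quota is used up at a sustained request rate."""
    return quota_requests / rate_rps / 3600


free_hours = hours_to_exhaust_quota(1_000_000, 100)        # Free tier plan
premium_hours = hours_to_exhaust_quota(30_000_000, 1_000)  # Premium tier plan
print(f"Free: {free_hours:.2f} h, Premium: {premium_hours:.2f} h")
# prints "Free: 2.78 h, Premium: 8.33 h"
```

The rate limit shapes how fast a client can go at any moment, while the quota bounds how much it can consume over the whole period.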
The one thing most people don’t realize is that increasing your API’s overall throttle limit also increases the potential blast radius if something goes wrong. If your API Gateway is configured with a massive throttle limit and your backend service has a bug that causes an infinite loop or excessive resource consumption, it can now impact a much larger volume of requests before hitting the API Gateway limit, potentially overwhelming your backend more severely.
Once you’ve adjusted your throttling and quota settings, the next thing you’ll likely encounter is configuring caching to improve response times for frequently accessed data.