API Gateway access logs are your primary tool for understanding traffic hitting your services, but their default format often buries the signal in the noise.
Let’s see this in action. Imagine the following request comes in, shown here in simplified form:
{
  "resource": "/users",
  "path": "/users",
  "httpMethod": "GET",
  "headers": {
    "Accept": "application/json",
    "Authorization": "Bearer <token>",
    "User-Agent": "curl/7.64.1"
  },
  "requestContext": {
    "identity": {
      "sourceIp": "192.0.2.1"
    },
    "requestId": "a1b2c3d4-e5f6-7890-1234-567890abcdef"
  },
  "stage": "prod",
  "pathParameters": null,
  "queryStringParameters": {
    "status": "active"
  },
  "body": null,
  "apiId": "abcdef1234",
  "protocol": "HTTP/1.1"
}
This is the raw input. API Gateway, by default, logs a subset of this, often in a JSON format that’s verbose but not always tailored to what you need for quick analysis. You might be looking for specific headers, the exact latency of your backend, or a consolidated view of request and response details.
The problem is that the default log format is a one-size-fits-all approach. You can’t easily extract metrics like the number of requests per API key, the distribution of response latencies from your integrated Lambda function, or the specific user agent strings that are hitting your endpoints without significant post-processing. This makes debugging, security analysis, and performance tuning much harder.
The solution lies in customizing the access log format within your API Gateway stage settings. You define a template that specifies exactly which fields from the incoming request and outgoing response you want to capture, and how they should be structured. This allows you to create logs that are immediately useful for your specific needs.
Here’s how you’d configure a custom format. In the AWS console, open API Gateway, select your API, then go to "Stages." Choose the stage you want to configure (e.g., prod). Under the "Logs/Tracing" tab (the exact label depends on the console version), you’ll find "Access logging." Enable it, provide the ARN of the CloudWatch Logs log group that should receive the entries, and then specify a "Log format."
Let’s say you want to capture the request ID, the HTTP method, the invoked backend latency, the caller’s IP address, the requested resource path, the stage, and the response status code. Your custom log format template might look like this:
{
  "requestId": "$context.requestId",
  "httpMethod": "$context.httpMethod",
  "backendLatency": "$context.integration.latency",
  "sourceIp": "$context.identity.sourceIp",
  "resourcePath": "$context.resourcePath",
  "stage": "$context.stage",
  "responseStatusCode": "$context.status"
}
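The console is not the only way to apply a template like this. As a rough sketch, the Python below collapses the template to a single line (API Gateway expects the access log format on one line) and builds the patch operations that the boto3 update_stage call accepts for a REST API stage. The helper names are invented for this example, and the API ID, stage name, and log group ARN you pass in are yours to supply.

```python
import json

# Keys on the left are our own naming choice; values are $context variables.
LOG_FIELDS = {
    "requestId": "$context.requestId",
    "httpMethod": "$context.httpMethod",
    "backendLatency": "$context.integration.latency",
    "sourceIp": "$context.identity.sourceIp",
    "resourcePath": "$context.resourcePath",
    "stage": "$context.stage",
    "responseStatusCode": "$context.status",
}


def build_log_format(fields):
    """Serialize the template as compact, single-line JSON."""
    return json.dumps(fields, separators=(",", ":"))


def build_patch_operations(log_group_arn, log_format):
    """Build the patchOperations list for apigateway.update_stage."""
    return [
        {"op": "replace", "path": "/accessLogSettings/destinationArn", "value": log_group_arn},
        {"op": "replace", "path": "/accessLogSettings/format", "value": log_format},
    ]


def apply_access_log_format(rest_api_id, stage_name, log_group_arn, fields=LOG_FIELDS):
    """Apply the format to a stage. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the pure helpers above work without it

    client = boto3.client("apigateway")
    return client.update_stage(
        restApiId=rest_api_id,
        stageName=stage_name,
        patchOperations=build_patch_operations(log_group_arn, build_log_format(fields)),
    )
```

Keeping the template in a dictionary makes it easy to add or remove fields and regenerate the one-line format string, rather than hand-editing JSON in the console.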
When a request is processed with this format, the log entry for the example request above would look something like this:
{
  "requestId": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "httpMethod": "GET",
  "backendLatency": "250",
  "sourceIp": "192.0.2.1",
  "resourcePath": "/users",
  "stage": "prod",
  "responseStatusCode": "200"
}
This is far more digestible. You can now easily parse this for metrics. For example, to find the average backendLatency for GET /users requests in the prod stage, you can directly query your CloudWatch Logs or any other log aggregation service.
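To make that concrete, here is a minimal sketch of the averaging step in Python, assuming you have already exported the log entries as one JSON object per line. The function name and the sample lines are made up for illustration; the field names match the template above.

```python
import json
from statistics import mean


def average_backend_latency(log_lines, method="GET", path="/users", stage="prod"):
    """Average backendLatency (ms) over entries matching method, path and stage."""
    latencies = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        key = (entry.get("httpMethod"), entry.get("resourcePath"), entry.get("stage"))
        if key != (method, path, stage):
            continue
        try:
            # The template quotes the value, so it arrives as a string.
            latencies.append(float(entry["backendLatency"]))
        except (KeyError, TypeError, ValueError):
            continue  # latency missing or not numeric
    return mean(latencies) if latencies else None


sample = [
    '{"httpMethod":"GET","resourcePath":"/users","stage":"prod","backendLatency":"250"}',
    '{"httpMethod":"GET","resourcePath":"/users","stage":"prod","backendLatency":"150"}',
    '{"httpMethod":"POST","resourcePath":"/users","stage":"prod","backendLatency":"900"}',
]
print(average_backend_latency(sample))  # 200.0
```

Entries whose latency is missing or non-numeric are skipped rather than counted as zero, so they do not drag the average down.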
The $context object is where all the magic happens. It’s a special variable in API Gateway that provides access to a rich set of runtime information. You can access request details like $context.httpMethod, $context.resourcePath, $context.identity.sourceIp, and $context.requestId. You can also access integration details like $context.integration.latency (the time your backend integration, such as Lambda, took to respond) and $context.integration.status. If you’re using request validation, you can log $context.error.validationErrorString. For authorizers, you can access $context.authorizer.principalId.
The power comes from combining these. You can build formats that are hyper-specific to your debugging needs. For instance, if you’re troubleshooting authentication issues, you might include:
{
  "requestId": "$context.requestId",
  "authorizerPrincipal": "$context.authorizer.principalId",
  "authType": "$context.identity.authType",
  "cognitoIdentityId": "$context.identity.cognitoIdentityId",
  "resourcePath": "$context.resourcePath"
}
This helps you trace exactly which authorizer information is being passed and how it relates to the request.
Crucially, when you include $context.integration.latency, be aware of what it measures: the time from when API Gateway relays the request to your backend integration until it receives the backend’s response. That includes the network round trip between API Gateway and your backend, but not the time API Gateway spends processing the request before invoking the integration (e.g., authorization, throttling) or returning the response afterward. For more granular timing inside the backend, you’d need to instrument your backend service itself.
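One way to see how much time is spent outside the backend call is to also log $context.responseLatency (the overall response latency in ms) and subtract the integration latency from it. The sketch below assumes you added a "responseLatency": "$context.responseLatency" field to the template; that field name and the helper name are choices made for this example only.

```python
def gateway_overhead_ms(entry):
    """Return responseLatency minus backendLatency in ms, or None if either is unusable."""
    try:
        total = int(entry["responseLatency"])
        backend = int(entry["backendLatency"])
    except (KeyError, TypeError, ValueError):
        return None  # field missing, empty, or a placeholder such as "-"
    return total - backend


print(gateway_overhead_ms({"responseLatency": "265", "backendLatency": "250"}))  # 15
```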
If you’ve configured custom domain names, you can also log $context.domainName and $context.domainPrefix. For WebSocket APIs, you’ll have access to different context variables like $context.connectionId.
A detail that is easy to miss when customizing log formats is when each piece of information becomes available. For example, $context.integration.latency is only populated after the backend integration has responded. If your integration times out before sending a response, this field might be empty or represent a partial duration, which is vital for distinguishing between a slow backend and a completely unresponsive one.
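As a rough illustration of that distinction, the sketch below labels a log entry by its backendLatency field. It assumes an unavailable value shows up as an empty string, a "-" placeholder, or a missing key, and the 1000 ms threshold is an arbitrary number picked for the example. It only catches the empty case; a partial duration would still be classified by its numeric value.

```python
def classify_backend(entry, slow_threshold_ms=1000):
    """Label an entry as 'no-response', 'slow', or 'ok' based on backendLatency."""
    raw = entry.get("backendLatency")
    try:
        latency = int(raw)
    except (TypeError, ValueError):
        # None, "", "-" and other non-numeric values: no usable latency was logged.
        return "no-response"
    return "slow" if latency >= slow_threshold_ms else "ok"


print(classify_backend({"backendLatency": "250"}))   # ok
print(classify_backend({"backendLatency": "4200"}))  # slow
print(classify_backend({"backendLatency": "-"}))     # no-response
```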
Once you have your logs in a structured format, you can create CloudWatch Metrics filters. For example, to create a metric for 5xx errors:
- Go to CloudWatch -> Log groups.
- Select your API Gateway log group (e.g., /aws/api-gateway/<api-id>/<stage-name>).
- Click "Metric filters."
- Create metric filter.
- Filter pattern: { $.responseStatusCode >= 500 }
- Select "Create new metric."
- Metric name: ApiGateway5xxErrors
- Metric namespace: MyApiMetrics
- Metric value: 1 (this counts each matching log entry)
- Click "Create filter."
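Now you have a near-real-time metric for 5xx errors that you can alarm on. Note that the template above logs the status code as a quoted string; if the numeric comparison does not match your entries, try logging $context.status without the surrounding quotes so it is emitted as a number.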
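The same filter can be created from code instead of the console. The following is a sketch using the boto3 put_metric_filter call on the CloudWatch Logs client; the helper names and the filter name are assumptions made for this example, while the pattern, metric name, namespace, and value mirror the console steps above.

```python
def build_metric_filter_request(log_group_name):
    """Build keyword arguments for the CloudWatch Logs put_metric_filter call."""
    return {
        "logGroupName": log_group_name,
        "filterName": "ApiGateway5xxErrors",  # filter name chosen for this example
        "filterPattern": "{ $.responseStatusCode >= 500 }",
        "metricTransformations": [
            {
                "metricName": "ApiGateway5xxErrors",
                "metricNamespace": "MyApiMetrics",
                "metricValue": "1",  # count each matching log entry once
            }
        ],
    }


def create_5xx_metric_filter(log_group_name):
    """Create the filter. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    logs = boto3.client("logs")
    return logs.put_metric_filter(**build_metric_filter_request(log_group_name))
```

Separating the request builder from the AWS call keeps the configuration easy to review and reuse across stages.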
The next logical step after mastering access log formats is to explore request and response transformation.