Cloudflare logs are a goldmine of information, but keeping them long-term in Cloudflare is expensive and impractical. Pushing them to S3 is the standard way to archive them for compliance, deep dives, or even training ML models.

Let’s see this in action. Imagine you’re debugging a sudden spike in 5xx errors. You’ve got your logs flowing into S3, and you can query them directly with Athena.

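SELECT
    client_request.host AS hostname,
    client_request.uri AS uri,
    client_request.method AS method,
    count(*) AS error_count
FROM
    "your_log_database"."your_log_table"
WHERE
    http_response.status >= 500
    -- Half-open range: covers the whole day, including its final second
    AND date_parse(event_timestamp, '%Y-%m-%dT%H:%i:%s.%fZ') >= TIMESTAMP '2023-10-27 00:00:00'
    AND date_parse(event_timestamp, '%Y-%m-%dT%H:%i:%s.%fZ') < TIMESTAMP '2023-10-28 00:00:00'
GROUP BY
    -- Athena does not resolve SELECT aliases in GROUP BY, so group by position
    1, 2, 3
ORDER BY
    error_count DESC
LIMIT 100;

This query, run in Amazon Athena against the Cloudflare logs in your S3 bucket, surfaces the hostnames, URIs, and methods that produced the most 5xx errors on that specific day. Ray ID is deliberately left out of the grouping: each Ray ID identifies a single request, so grouping by it would make every count 1. Once you know which URIs are failing, you can select ray_id for those rows to trace individual requests. The column names here are placeholders for whatever schema your Athena table defines.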

The core problem Cloudflare Logpush solves is the cost and accessibility barrier of long-term log retention within Cloudflare itself. Cloudflare’s free and Pro plans offer very limited log retention, and even higher tiers can become prohibitively expensive for large volumes of data. By pushing logs to S3, you gain:

  • Cost-Effectiveness: S3 storage is significantly cheaper than Cloudflare’s log retention.
  • Durability & Availability: S3 offers robust durability and availability guarantees for your historical data.
  • Advanced Analytics: You can leverage powerful AWS services like Athena, Redshift, or even EMR to query, analyze, and process your logs at scale.
  • Compliance: Many industries require long-term log retention for auditing and compliance purposes.

Cloudflare Logpush works on a push model. You configure a destination (typically an S3 bucket) and Cloudflare delivers your chosen log types (Access, Firewall, WAF, etc.) to it in near real-time. It is not a one-off export: as long as the job is enabled, Cloudflare keeps writing small batches of log lines to the destination as new files.

Log Types and Destinations

Cloudflare supports pushing several log types:

  • Access Logs: Information about the HTTP requests Cloudflare handles for your site, whether they are served from cache or forwarded to your origin servers. This includes client IP, user agent, request method, URI, status code, Ray ID, and more.
  • Firewall Rules Logs: Details about requests that triggered specific firewall rules.
  • WAF Logs: Information about requests that were matched by Web Application Firewall rules.
  • Page Shield Logs: Data related to client-side threats detected by Page Shield.
  • Workers Logs: Logs generated by your Cloudflare Workers.

You can choose to push all logs or filter them based on specific criteria. The destination is typically an S3 bucket, but Cloudflare also supports other destinations like Splunk, Datadog, and Google Cloud Storage. For this discussion, we’ll focus on S3.

Configuration in Cloudflare

  1. Create an S3 Bucket: In your AWS account, create a new S3 bucket. It’s good practice to have a dedicated bucket for logs. Configure appropriate bucket policies for security and access. For example, to allow Cloudflare to write to your bucket, you might have a policy like this (replace YOUR_BUCKET_NAME):

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::455711641835:root"
                },
                "Action": [
                    "s3:PutObject",
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::YOUR_BUCKET_NAME",
                    "arn:aws:s3:::YOUR_BUCKET_NAME/*"
                ]
            }
        ]
    }
    

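    Note: This policy assumes 455711641835 is the AWS account ID Cloudflare uses for Logpush. Confirm the principal against Cloudflare’s current Logpush documentation before applying the policy, since it grants write access to your bucket.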

  2. Create an IAM Role (Optional but Recommended): For better security, create an IAM role in AWS that your Cloudflare Logpush configuration can assume. This role should have permissions to write to your S3 bucket. You’ll then provide the Role ARN to Cloudflare.

  3. Configure Logpush in Cloudflare:

    • Navigate to Analytics & Logs > Logpush.
    • Click Configure Logpush.
    • Select the log type (e.g., Access Logs).
    • Choose your destination: Amazon S3.
    • Enter your S3 bucket name.
    • Specify a prefix (e.g., cloudflare/access_logs/). This helps organize logs within your bucket.
    • If using an IAM role, provide the Role ARN. Otherwise, Cloudflare might prompt for AWS Access Key ID and Secret Access Key (less secure).
    • Choose a Timestamp Format. This dictates how Cloudflare partitions your logs into subfolders within S3, which is crucial for efficient querying. A common and effective format is YYYY-MM-DD/HH.
    • Select Fields to include. You can choose all fields or a subset.
    • Enable the configuration.
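The same settings can also be expressed as a request body for Cloudflare's Logpush jobs API instead of clicking through the dashboard. The sketch below only builds the JSON payload and sends nothing. The key names (`dataset`, `destination_conf`, `output_options`, `ownership_challenge`) reflect my understanding of that API and should be checked against the current API reference; the job name, bucket, prefix, and field list are hypothetical.

```python
import json
from urllib.parse import urlencode


def build_logpush_job(bucket, prefix, region, fields, ownership_challenge):
    """Build a JSON-serializable body for creating a Logpush job (sketch only)."""
    query = urlencode({"region": region})
    path = prefix.strip("/")
    return {
        "name": "http-requests-to-s3",  # hypothetical job name
        "dataset": "http_requests",  # assumed dataset identifier
        "destination_conf": f"s3://{bucket}/{path}?{query}",
        "output_options": {
            "field_names": list(fields),  # only the fields you want stored
            "timestamp_format": "rfc3339",
        },
        # Token proving you own the bucket; obtained in a separate step.
        "ownership_challenge": ownership_challenge,
        "enabled": True,
    }


if __name__ == "__main__":
    job = build_logpush_job(
        bucket="your-bucket-name",
        prefix="cloudflare/access_logs/",
        region="us-east-1",
        fields=["RayID", "EdgeResponseStatus", "ClientRequestHost"],
        ownership_challenge="<token from the ownership challenge file>",
    )
    print(json.dumps(job, indent=2))
```

Keeping the payload in a function like this makes it easy to version the job definition alongside the bucket policy.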

Cloudflare will then start sending logs. The logs are typically delivered as GZIP-compressed files of newline-delimited JSON, one log record per line. The partitioning strategy you choose (e.g., YYYY-MM-DD/HH) maps directly to S3 folder structures, allowing services like Athena to scan only the relevant partitions for a given time range.
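To spot-check a delivered file without Athena, you can decompress it and parse it line by line. This is a minimal sketch that assumes each file holds one JSON object per line and that the fields `EdgeResponseStatus` and `ClientRequestHost` were selected; substitute whatever field names your job actually sends. The sample data is synthetic.

```python
import gzip
import io
import json
from collections import Counter


def count_5xx_by_host(gz_bytes):
    """Count 5xx responses per host in one gzip-compressed log file."""
    counts = Counter()
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            # Field names are assumptions; match them to your selected fields.
            if record.get("EdgeResponseStatus", 0) >= 500:
                counts[record.get("ClientRequestHost", "unknown")] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join(
        json.dumps(r)
        for r in [
            {"RayID": "a1", "ClientRequestHost": "example.com", "EdgeResponseStatus": 502},
            {"RayID": "a2", "ClientRequestHost": "example.com", "EdgeResponseStatus": 200},
            {"RayID": "a3", "ClientRequestHost": "api.example.com", "EdgeResponseStatus": 500},
        ]
    )
    print(count_5xx_by_host(gzip.compress(sample.encode("utf-8"))))
```

In practice the bytes would come from an object downloaded from your bucket rather than from an in-memory sample.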

The most surprising thing about Cloudflare Logpush is how granular you can get with the fields you send. While you can send everything, selectively choosing only the fields you actually need for analysis can dramatically reduce your S3 storage costs and improve query performance in tools like Athena. For instance, if you’re only ever interested in status codes and Ray IDs for error debugging, you don’t need to store IP addresses or TLS versions, saving space and cost.
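One way to estimate the saving before changing a job is to compare compressed sizes for a sample of records with and without the fields you plan to drop. The sketch below uses synthetic records and assumed field names, so the numbers it prints illustrate the method only and say nothing about real traffic.

```python
import gzip
import json


def gzip_size(records, fields=None):
    """Return the gzip-compressed size in bytes of records written as NDJSON.

    If fields is given, each record is first reduced to those keys.
    """
    lines = []
    for record in records:
        if fields is not None:
            record = {k: record[k] for k in fields if k in record}
        lines.append(json.dumps(record, separators=(",", ":")))
    return len(gzip.compress("\n".join(lines).encode("utf-8")))


if __name__ == "__main__":
    # Synthetic records; real logs carry many more fields.
    records = [
        {
            "RayID": f"{i:016x}",
            "EdgeResponseStatus": 500 if i % 50 == 0 else 200,
            "ClientIP": f"203.0.113.{i % 256}",
            "ClientRequestUserAgent": "Mozilla/5.0 (X11; Linux x86_64)",
            "ClientSSLProtocol": "TLSv1.3",
        }
        for i in range(1000)
    ]
    full = gzip_size(records)
    slim = gzip_size(records, fields=["RayID", "EdgeResponseStatus"])
    print(f"all fields: {full} bytes, selected fields: {slim} bytes")
```

Running this against a real sample file gives a rough ratio you can multiply by your monthly log volume.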

The key to efficiently querying these logs in S3 using Athena lies in the partitioning scheme and the data format. When you set up Logpush, choose a partitioning format that aligns with how you’ll query. YYYY-MM-DD/HH is standard because most analysis is time-bound. Note that Athena’s MSCK REPAIR TABLE can only discover partitions whose folders follow the Hive key=value convention (for example dt=2023-10-27/). With a plain YYYY-MM-DD/HH layout you register each partition explicitly with ALTER TABLE ADD PARTITION, or configure partition projection on the table. For example, if your S3 path is s3://your-bucket-name/cloudflare/access_logs/YYYY-MM-DD/HH/, you would define dt (or similar) as a string partition key when you create the Athena table and point each partition at its folder.
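If partitions have to be registered explicitly, the statements are easy to generate. The sketch below emits one ALTER TABLE ... ADD PARTITION statement per hour for a YYYY-MM-DD/HH layout. The table name and the partition key names `dt` and `hh` are hypothetical, and the statements are only printed, not submitted to Athena.

```python
from datetime import datetime, timedelta


def add_partition_statements(table, base_uri, start, end):
    """Yield one ALTER TABLE statement per hour in the half-open range [start, end)."""
    # Snap the start down to the top of its hour so no partial hour is skipped.
    hour = start.replace(minute=0, second=0, microsecond=0)
    root = base_uri.rstrip("/")
    while hour < end:
        dt = hour.strftime("%Y-%m-%d")
        hh = hour.strftime("%H")
        # dt and hh are hypothetical partition key names.
        yield (
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (dt = '{dt}', hh = '{hh}') "
            f"LOCATION '{root}/{dt}/{hh}/';"
        )
        hour += timedelta(hours=1)


if __name__ == "__main__":
    for statement in add_partition_statements(
        table="your_log_table",
        base_uri="s3://your-bucket-name/cloudflare/access_logs/",
        start=datetime(2023, 10, 27, 0, 0),
        end=datetime(2023, 10, 27, 3, 0),
    ):
        print(statement)
```

Partition projection avoids this bookkeeping entirely, at the cost of declaring the date and hour ranges in the table properties up front.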

The next challenge you’ll face is optimizing your Athena queries for these large, partitioned datasets.
