Graviton3 instances can be up to 40% cheaper and 30% faster than comparable x86 instances for CPU-bound workloads, although the actual gain depends on the workload and is worth measuring for your own.

Let’s see this in action. Imagine we’re running a common web server workload, like Nginx, serving static assets.

Here’s a basic Nginx configuration on a Graviton3 c7g.xlarge instance:

worker_processes auto;
events {
    worker_connections 1024;
}
http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server {
        listen 80;
        server_name localhost;
        location / {
            root /usr/share/nginx/html;
            index index.html index.htm;
        }
    }
}

And here’s the equivalent on an x86 c6i.xlarge instance:

# Configuration is identical for Nginx itself. The difference is in the underlying hardware.
worker_processes auto;
events {
    worker_connections 1024;
}
http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server {
        listen 80;
        server_name localhost;
        location / {
            root /usr/share/nginx/html;
            index index.html index.htm;
        }
    }
}

The Nginx configuration is the same on both instances. The difference lies in the hardware, and in the Nginx binary and operating system being built for aarch64 rather than x86_64.

When we send traffic to these instances using a tool like ab (ApacheBench), we can observe the performance difference.

On the c7g.xlarge (Graviton3):

ab -n 10000 -c 100 http://<your-graviton3-instance-ip>/

We might see results like:

Server Software:        nginx/1.20.1
Server Hostname:        <your-graviton3-instance-ip>
Server Port:            80

Document Path:          /
Document Length:        52 bytes

Concurrency Level:      100
Time taken for tests:   8.500 seconds
Complete requests:      10000
Failed requests:        0
Requests per second:    1176.47 [#/sec] (mean)
Time per request:       85.000 [ms] (mean)
Time per request:       0.850 [ms] (mean, across concurrent requests)
Transfer rate:          60.34 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     1    8   1.5      8      15
Waiting:        0    7   1.4      7      14
Total:          1    8   1.5      8      15

On the c6i.xlarge (x86):

ab -n 10000 -c 100 http://<your-x86-instance-ip>/

We might see results like:

Server Software:        nginx/1.20.1
Server Hostname:        <your-x86-instance-ip>
Server Port:            80

Document Path:          /
Document Length:        52 bytes

Concurrency Level:      100
Time taken for tests:   10.200 seconds
Complete requests:      10000
Failed requests:        0
Requests per second:    980.39 [#/sec] (mean)
Time per request:       102.000 [ms] (mean)
Time per request:       1.020 [ms] (mean, across concurrent requests)
Transfer rate:          50.12 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     1    9   1.8      9      18
Waiting:        0    8   1.7      8      17
Total:          1    9   1.8      9      18
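
To compare runs without reading the reports by eye, the throughput line can be pulled out of ab's text output. The following is a minimal sketch, assuming the report format shown above; the function name and the trimmed report excerpts are illustrative and not part of ab itself.

```python
import re

# Matches the "Requests per second" line that ab prints in its summary.
_RPS_PATTERN = re.compile(r"^Requests per second:\s+([0-9.]+)", re.MULTILINE)


def requests_per_second(ab_output: str) -> float:
    """Return the mean requests-per-second figure from ab's text report."""
    match = _RPS_PATTERN.search(ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line found in ab output")
    return float(match.group(1))


# Trimmed excerpts of the two illustrative reports shown above.
graviton3_report = """\
Complete requests:      10000
Requests per second:    1176.47 [#/sec] (mean)
"""
x86_report = """\
Complete requests:      10000
Requests per second:    980.39 [#/sec] (mean)
"""

graviton3_rps = requests_per_second(graviton3_report)
x86_rps = requests_per_second(x86_report)
print(f"Graviton3: {graviton3_rps} req/s, x86: {x86_rps} req/s")
```

In practice you would feed the function the captured output of each ab run rather than hard-coded strings.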

In this simplified example, Graviton3 serves about 20% more requests per second (roughly 1176 versus 980) and has a correspondingly lower time per request. Graviton3 is built on Arm Neoverse V1 cores, and on Graviton instances each vCPU is a full physical core, whereas on c6i each vCPU is a hyperthread that shares a core with a sibling. Together with DDR5 memory and its higher bandwidth, this helps the chip sustain throughput when handling many concurrent connections. Higher performance per watt does not by itself make a request faster; its benefit shows up mainly in energy use and cost, which is where the price-performance advantage comes from.
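
The comparison can be reduced to two ratios: relative throughput and requests served per dollar. The sketch below uses the illustrative ab figures from this example. The hourly prices are assumptions added here for the arithmetic only, so substitute current on-demand prices for your region before drawing conclusions.

```python
def relative_throughput(candidate_rps: float, baseline_rps: float) -> float:
    """Ratio of candidate to baseline throughput (1.0 means equal)."""
    return candidate_rps / baseline_rps


def requests_per_dollar(rps: float, hourly_price_usd: float) -> float:
    """Requests served per US dollar at a sustained rate of `rps`."""
    return rps * 3600 / hourly_price_usd


# Throughput figures from the illustrative ab runs in this example.
GRAVITON3_RPS = 1176.47
X86_RPS = 980.39

# ASSUMPTION: placeholder hourly prices in USD, not taken from the article.
# Look up current pricing for your region and replace these values.
GRAVITON3_PRICE = 0.145
X86_PRICE = 0.170

speedup = relative_throughput(GRAVITON3_RPS, X86_RPS)
value_ratio = (
    requests_per_dollar(GRAVITON3_RPS, GRAVITON3_PRICE)
    / requests_per_dollar(X86_RPS, X86_PRICE)
)

print(f"throughput ratio: {speedup:.2f}x")
print(f"requests-per-dollar ratio: {value_ratio:.2f}x")
```

Keeping price as an explicit input makes it clear that the price-performance result moves with both the measured throughput and whatever the instances cost at the time.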

The problem Graviton3 addresses is the rising cost and power consumption of running cloud workloads on general-purpose x86 processors. AWS designed the Graviton processors as custom silicon optimized for cloud workloads, with the goal of a better price-performance ratio. They are built on Arm Neoverse cores, a design line aimed at servers and known for its power efficiency.

When you choose a Graviton3 instance, you’re opting for a CPU engineered specifically for cloud computing: many cores, large caches, and a memory subsystem designed to keep those cores fed with data. Applications that can use these strengths, such as web servers, microservices, databases, and high-performance computing (HPC) simulations, stand to benefit most. The compiler toolchains (like GCC and Clang) have mature support for Arm64, so most software compiles and runs without modification. The usual exceptions are code that relies on x86-specific intrinsics or inline assembly, and dependencies distributed only as x86 binaries.

The key levers you control are instance type selection and ensuring your software stack is built for the aarch64 (64-bit Arm) architecture. Code in compiled languages (C, C++, Go, Rust) has to be rebuilt for aarch64, usually with no source changes. Interpreted languages with Arm64-optimized runtimes (Python, Node.js) generally run your existing code as-is, provided any native extensions they load are available for aarch64. For containerized applications, you’ll build your Docker images for the linux/arm64 platform.
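
A deployment script often needs to know which architecture it is running on, for example to pick the right container image. Here is a small sketch, assuming the common machine names reported by Python's platform.machine(); the helper name docker_platform is hypothetical and the name sets are not exhaustive.

```python
import platform

# Common spellings reported by platform.machine() for each architecture.
_ARM64_NAMES = {"aarch64", "arm64"}
_X86_64_NAMES = {"x86_64", "amd64"}


def docker_platform(machine: str) -> str:
    """Map a machine string to the matching Docker --platform value."""
    name = machine.lower()
    if name in _ARM64_NAMES:
        return "linux/arm64"
    if name in _X86_64_NAMES:
        return "linux/amd64"
    raise ValueError(f"unrecognized architecture: {machine!r}")


if __name__ == "__main__":
    machine = platform.machine()
    try:
        print(f"{machine} -> {docker_platform(machine)}")
    except ValueError as error:
        print(error)
```

On a Graviton instance running Linux, platform.machine() reports aarch64, which this helper maps to linux/arm64.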

The performance uplift isn’t just about raw clock speed; it’s about architectural design. Graviton3’s design prioritizes efficient instruction execution and high memory bandwidth. This means that for operations that involve processing a lot of data or handling many parallel threads, Graviton3 can often complete the work faster and with less energy than a comparable x86 chip. AWS has invested heavily in optimizing the underlying infrastructure, including the Nitro System, to ensure Graviton instances provide a seamless and high-performing experience.

What often surprises people is how little code modification is required for many applications. Most modern languages and frameworks have excellent support for Arm64. The real work is often in the build and deployment process, ensuring your artifacts are compiled for the correct architecture. For example, when building a Go application, you set the GOARCH=arm64 environment variable before running go build, together with GOOS=linux if you are cross-compiling from another operating system.
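
In a build script, the Go cross-compilation step might be wrapped as follows. This is a sketch under assumptions: the function names, the default package path, and the output file name are hypothetical, and the Go toolchain is assumed to be on PATH.

```python
import os
import subprocess


def arm64_build_env(base_env=None):
    """Return a copy of the environment with Go's cross-compile variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["GOOS"] = "linux"    # target operating system
    env["GOARCH"] = "arm64"  # target architecture (Graviton)
    return env


def build_for_graviton(package=".", output="app-arm64"):
    """Run `go build` for linux/arm64; assumes the Go toolchain is installed."""
    command = ["go", "build", "-o", output, package]
    subprocess.run(command, env=arm64_build_env(), check=True)
```

Calling build_for_graviton() from the root of a Go module would produce a linux/arm64 binary with the assumed name app-arm64. Copying the environment instead of mutating os.environ keeps the setting from leaking into other build steps.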

The next logical step is to explore Graviton3 instances for your memory-intensive workloads, like large in-memory databases or caching layers, where its enhanced memory bandwidth can provide even greater benefits.
