To maximize EC2 network bandwidth, you must first understand that it’s not a single, fixed value, but a dynamic capability influenced by instance type, operating system, and traffic patterns.

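Let’s see this in action. Imagine we have a c5.xlarge instance, which AWS rates at “up to 10 Gbps”. “Up to” means it can burst to that rate for limited periods using a network I/O credit mechanism, while its sustained baseline is lower. Out of the box, a single TCP connection on it may fall well short of either figure.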

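# On the EC2 instance. sysctl -w takes effect immediately but does not survive a reboot;
# persist the same keys in a file under /etc/sysctl.d/ to keep them.
# Cap on buffers an application requests explicitly (SO_RCVBUF / SO_SNDBUF): 16 MiB
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
# TCP autotuning limits in bytes: min, default, max
sudo sysctl -w net.ipv4.tcp_rmem='4096 87380 16777216'
sudo sysctl -w net.ipv4.tcp_wmem='4096 16384 16777216'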

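These commands tune TCP buffer sizes. net.core.rmem_max and net.core.wmem_max set the largest receive and send buffers an application may request explicitly (through the SO_RCVBUF and SO_SNDBUF socket options), here 16 MiB. The tcp_rmem and tcp_wmem triples define the minimum, default, and maximum sizes, in bytes, that the kernel uses when autotuning each TCP socket’s receive and send buffers. Raising the maximums lets autotuning grow the buffers large enough to cover the bandwidth-delay product (bandwidth × round-trip time). That matters most on high-bandwidth, high-latency paths: a sender can only keep as much unacknowledged data in flight as the buffers and window allow, so undersized buffers stall the sender while it waits for acknowledgments and leave the link underused.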
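To see why a 16 MiB maximum is a sensible ceiling, it helps to compute the bandwidth-delay product directly. The Python sketch below is illustrative only: the 10 Gbps link speed and the RTT values are assumed example inputs, not measurements from any particular instance, and bdp_bytes is a hypothetical helper name.

```python
"""Bandwidth-delay product (BDP): the bytes in flight needed to keep a path full."""

# The tcp_rmem / tcp_wmem maximum used in the sysctl commands above (16 MiB).
MAX_TCP_BUFFER = 16 * 1024 * 1024


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Return bandwidth (bits/s) x round-trip time (s), expressed in bytes."""
    if bandwidth_bps <= 0 or rtt_s <= 0:
        raise ValueError("bandwidth and RTT must be positive")
    return round(bandwidth_bps / 8 * rtt_s)


if __name__ == "__main__":
    # Assumed example inputs: a 10 Gbps path at several RTTs (not measurements).
    for rtt_ms in (0.1, 1.0, 10.0, 50.0):
        needed = bdp_bytes(10e9, rtt_ms / 1000)
        verdict = "fits within" if needed <= MAX_TCP_BUFFER else "exceeds"
        print(f"RTT {rtt_ms:5.1f} ms: BDP {needed:>12,} bytes, {verdict} 16 MiB")
```

At 10 Gbps and a 10 ms RTT the BDP is 12.5 MB, which still fits under 16 MiB; at 50 ms it is 62.5 MB, so a path that fast and that long would need a larger maximum.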

The core problem EC2 network bandwidth tuning addresses is the discrepancy between an instance’s theoretical network capacity and its actual observed performance. This gap arises from a combination of OS-level configurations, network interface settings, and the way applications utilize network resources.

Here’s how it works internally: EC2 instances connect to the AWS network fabric through elastic network interfaces (ENIs). The bandwidth available depends mainly on the instance type and size, and also on where the traffic goes: for example, a single flow (one TCP connection, identified by its 5-tuple) is limited to 5 Gbps unless both instances are in the same cluster placement group. On top of those infrastructure limits, the operating system’s networking stack, with its default buffer sizes and congestion control algorithm, can become the bottleneck. For instance, a small TCP receive window caps how much unacknowledged data the sender may have in flight, so the sender spends part of every round trip idle, waiting for acknowledgments.
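The window argument can be made concrete: with at most one window of data in flight per round trip, a single connection’s throughput cannot exceed window / RTT. This sketch uses a hypothetical helper, window_limited_gbps, and treats the buffer size as the window, which is an approximation because the kernel reserves part of each buffer for bookkeeping overhead. The 1 ms RTT is an assumed example value.

```python
"""Ceiling on single-connection TCP throughput imposed by the window size."""


def window_limited_gbps(window_bytes: int, rtt_s: float) -> float:
    """At most one window of unacknowledged data per RTT: throughput <= window / RTT.

    Treats the socket buffer size as the window, which is an approximation
    (the kernel reserves part of each buffer for overhead).
    """
    if window_bytes <= 0 or rtt_s <= 0:
        raise ValueError("window and RTT must be positive")
    return window_bytes * 8 / rtt_s / 1e9


if __name__ == "__main__":
    # Windows taken from the sysctl values above; the 1 ms RTT is an assumed example.
    for label, window in (("tcp_rmem default", 87380), ("16 MiB max", 16 * 1024 * 1024)):
        print(f"{label:>16}: <= {window_limited_gbps(window, 0.001):8.2f} Gbps at 1 ms RTT")
```

A window stuck at 87,380 bytes (the default tcp_rmem value from earlier) over a 1 ms RTT caps the connection at roughly 0.7 Gbps. Autotuning normally grows the window past that default, which is exactly why its maximum matters.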

The levers you control are primarily within the instance’s operating system and how you configure your application’s network usage. This includes:

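  • Instance Type Selection: Different instance types offer vastly different network performance, and bandwidth generally scales with instance size within a family. Current-generation families such as c5, m5, and r5 offer more bandwidth than older generations, and network-optimized variants such as c5n reach up to 100 Gbps on their largest sizes. Graviton-based instances (m6g, c6g, r6g) also provide strong network performance.
  • Enhanced Networking: Current-generation instance types use enhanced networking, usually through the Elastic Network Adapter (ENA), which is enabled by default on current AWS-provided AMIs such as Amazon Linux. It uses SR-IOV (Single Root I/O Virtualization) to deliver higher packets per second (PPS), lower latency, and lower CPU utilization.
  • Jumbo Frames: Enabling jumbo frames (MTU above 1500) reduces per-packet processing overhead for large transfers, but it is not always beneficial. Current-generation instance types support an MTU of 9001 for traffic inside a VPC, while traffic that leaves the VPC (for example through an internet gateway or over a VPN) is limited to 1500 bytes. You’d typically set the MTU on the network interface: sudo ip link set eth0 mtu 9001 (on many Nitro-based instances the interface is named ens5 rather than eth0).
  • TCP Tuning: As shown above, adjusting sysctl parameters for TCP buffer sizes (net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_rmem, net.ipv4.tcp_wmem).
  • Application-Level Buffering: Applications often keep their own internal send and receive buffers, and sizing these adequately prevents them from becoming the bottleneck. On Linux, an application that sets SO_RCVBUF or SO_SNDBUF explicitly opts out of kernel autotuning for that buffer, and its request is capped by net.core.rmem_max or net.core.wmem_max.
  • Placement Groups: Instances in different Availability Zones communicate over longer paths with higher latency, which raises the buffer sizes needed to keep each connection busy. For maximum throughput and lowest latency between instances, consider a cluster placement group, which packs instances close together within a single Availability Zone; within one, single-flow traffic can reach up to 10 Gbps.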
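To put a rough number on the jumbo-frame benefit, the sketch below compares how many packets per second a host must process to move the same TCP payload at a 1500-byte MTU versus EC2’s 9001-byte jumbo MTU. It assumes 40 bytes of IPv4 and TCP headers per packet with no TCP options and ignores Ethernet framing, so treat the output as an estimate; the function and constant names are hypothetical.

```python
"""Packets per second needed for a given TCP throughput at different MTUs."""

# Assumption: 20-byte IPv4 header + 20-byte TCP header (no options such as
# timestamps); Ethernet framing is ignored, so results are estimates.
IP_TCP_HEADER_BYTES = 40


def packets_per_second(payload_bps: float, mtu: int) -> float:
    """Full-sized packets/s required to carry payload_bps of TCP payload."""
    payload_per_packet = mtu - IP_TCP_HEADER_BYTES
    if payload_per_packet <= 0:
        raise ValueError("MTU must exceed the header size")
    return payload_bps / 8 / payload_per_packet


if __name__ == "__main__":
    target = 10e9  # assumed example: 10 Gbps of payload
    for mtu in (1500, 9001):  # standard Ethernet vs EC2 jumbo frames
        print(f"MTU {mtu}: ~{packets_per_second(target, mtu):,.0f} packets/s")
```

At 10 Gbps, a 1500-byte MTU needs roughly 856,000 packets per second versus about 139,000 at 9001, roughly six times less per-packet work for the same payload.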

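The one thing most people don’t know is that the default TCP congestion control algorithm on most Linux distributions (cubic) might not be optimal for all high-bandwidth scenarios, especially those with high latency. Algorithms like bbr (Bottleneck Bandwidth and Round-trip propagation time) can sometimes yield significantly better throughput because they estimate the path’s bottleneck bandwidth and minimum round-trip time directly instead of treating packet loss as the main congestion signal. BBR requires Linux 4.9 or later: check sysctl net.ipv4.tcp_available_congestion_control, load the module with sudo modprobe tcp_bbr if bbr isn’t listed, and then experiment with sudo sysctl -w net.ipv4.tcp_congestion_control=bbr. BBR is commonly paired with the fq queueing discipline (net.core.default_qdisc=fq), which kernels older than 4.13 require for pacing.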
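Before switching algorithms, it is worth checking what the kernel currently uses and what it has loaded. This Python sketch reads the same values sysctl exposes, from /proc/sys/net/ipv4/tcp_congestion_control and tcp_available_congestion_control; it assumes a Linux host, returns None elsewhere, and its helper names are hypothetical.

```python
"""Inspect the current and available TCP congestion control algorithms (Linux)."""
from pathlib import Path
from typing import List, Optional, Tuple

# Same values that `sysctl net.ipv4.tcp_congestion_control` and friends report.
PROC_IPV4 = Path("/proc/sys/net/ipv4")


def parse_algorithms(text: str) -> List[str]:
    """Split the space-separated list in tcp_available_congestion_control."""
    return text.split()


def congestion_control_info(proc_dir: Path = PROC_IPV4) -> Optional[Tuple[str, List[str]]]:
    """Return (current, available) algorithms, or None if the files are unreadable."""
    try:
        current = (proc_dir / "tcp_congestion_control").read_text().strip()
        available = parse_algorithms(
            (proc_dir / "tcp_available_congestion_control").read_text()
        )
    except OSError:
        return None  # not Linux, or /proc/sys not mounted
    return current, available


if __name__ == "__main__":
    info = congestion_control_info()
    if info is None:
        print("Cannot read /proc/sys/net/ipv4; is this a Linux host?")
    else:
        current, available = info
        print(f"current: {current}; available: {' '.join(available)}")
        if "bbr" not in available:
            # bbr may be built as a kernel module that is not loaded yet.
            print("bbr not listed; try: sudo modprobe tcp_bbr")
```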

After tuning your TCP buffers and potentially the congestion control algorithm, the next step is to benchmark your performance using tools like iperf3 to confirm improvements and identify any remaining bottlenecks.
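A typical iperf3 run uses iperf3 -s on the receiving instance and iperf3 -c with the server’s address on the sender; -P runs several parallel streams, which helps separate per-flow limits from instance-wide limits, and -J emits a JSON report. The Python wrapper below is a sketch under those assumptions: it expects iperf3 on the PATH and a TCP test, reads the receiver-side total from the report’s end.sum_received.bits_per_second field, and its function names are hypothetical.

```python
"""Run iperf3 and extract receiver-side throughput from its JSON report."""
import json
import subprocess


def received_gbps(report_json: str) -> float:
    """Read end.sum_received.bits_per_second from `iperf3 -J` output (TCP tests)."""
    report = json.loads(report_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


def run_iperf3(server: str, streams: int = 1, seconds: int = 10) -> float:
    """Run a TCP client test against `server`, which must already run `iperf3 -s`."""
    result = subprocess.run(
        ["iperf3", "-c", server, "-P", str(streams), "-t", str(seconds), "-J"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if iperf3 exits with an error
    )
    return received_gbps(result.stdout)
```

Comparing run_iperf3 with streams=1 against streams=8 toward the same server (whose private IP you supply) shows whether a single flow or the instance as a whole is the limit.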
