A production-ready EKS cluster isn’t just about spinning up nodes and deploying applications; it’s about building a resilient, secure, and observable system that can handle real-world demands.
Let’s look at a minimal EKS cluster defined in Terraform, focusing on the core components and how they interact.
provider "aws" {
  region = "us-east-1"
}

resource "aws_eks_cluster" "example" {
  name     = "my-prod-cluster"
  role_arn = "arn:aws:iam::123456789012:role/eksClusterRole" # Required argument; placeholder, replace with your actual EKS cluster role ARN
  version  = "1.28" # Always pin to a specific, supported version

  vpc_config {
    subnet_ids         = ["subnet-xxxxxxxxxxxxxxxxx", "subnet-yyyyyyyyyyyyyyyyy"] # Replace with your actual subnet IDs
    security_group_ids = ["sg-zzzzzzzzzzzzzzzzz"] # Replace with your actual security group ID
  }

  tags = {
    Environment = "Production"
    Project     = "MyAwesomeApp"
  }
}

resource "aws_eks_node_group" "example" {
  cluster_name    = aws_eks_cluster.example.name
  node_group_name = "my-prod-nodegroup"
  node_role_arn   = "arn:aws:iam::123456789012:role/eksNodeRole" # Replace with your actual EKS node role ARN
  subnet_ids      = ["subnet-xxxxxxxxxxxxxxxxx", "subnet-yyyyyyyyyyyyyyyyy"] # Must be in the same VPC as the cluster

  scaling_config {
    desired_size = 3
    max_size     = 5
    min_size     = 1
  }

  instance_types = ["t3.medium"] # Choose instance types appropriate for your workload

  tags = {
    Environment = "Production"
    Project     = "MyAwesomeApp"
  }
}

# Example: Add an EKS add-on for the VPC CNI (Container Network Interface) plugin
resource "aws_eks_addon" "vpc_cni" {
  cluster_name  = aws_eks_cluster.example.name
  addon_name    = "vpc-cni"
  addon_version = "v1.15.0-eksbuild.1" # Pin to a specific version for stability
}
This Terraform code defines an EKS cluster named my-prod-cluster in us-east-1. It specifies two subnets and a security group for its VPC configuration, and it pins the Kubernetes version to 1.28. A node group named my-prod-nodegroup is provisioned with t3.medium instances, bounded between 1 and 5 nodes with a desired size of 3. The vpc-cni add-on is also managed explicitly, so a specific version is deployed.
EKS abstracts away much of the complexity of operating Kubernetes control planes and worker nodes, so you can focus on deploying and managing your containerized applications. A cluster has two main parts: the managed Kubernetes control plane (API server, etcd, scheduler, etc.) and the worker nodes (where your pods run). EKS handles the availability, patching, and scaling of the control plane. You are responsible for provisioning and managing the worker nodes, though managed node groups and Fargate let EKS take over much of that work.
The aws_eks_cluster resource defines the control plane. The version parameter is critical; always pin it to a specific, supported Kubernetes version. AWS operates the control plane and carries out version upgrades, but you normally initiate an upgrade yourself by changing this value. The vpc_config block defines the networking for your cluster. For production, you’ll typically place your worker nodes in private subnets to enhance security, with appropriate NACLs and security groups controlling ingress and egress.
The aws_eks_node_group resource manages the EC2 instances that act as your worker nodes. The scaling_config block sets the minimum, maximum, and desired number of nodes; it does not scale on demand by itself, so automatic scaling requires an additional component such as the Kubernetes Cluster Autoscaler or Karpenter. Choose instance_types based on your workload’s CPU, memory, and network requirements.
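Because scaling_config only sets the bounds of the node group, a common pattern when an autoscaler adjusts the node count is to have Terraform ignore drift in desired_size. The following is a minimal sketch of that pattern, assuming the aws_eks_cluster.example resource from the example above; the resource name autoscaled, the node group name, and the role ARN are hypothetical placeholders.

```hcl
resource "aws_eks_node_group" "autoscaled" {
  cluster_name    = aws_eks_cluster.example.name
  node_group_name = "my-prod-nodegroup-autoscaled"               # hypothetical name
  node_role_arn   = "arn:aws:iam::123456789012:role/eksNodeRole" # placeholder ARN
  subnet_ids      = ["subnet-xxxxxxxxxxxxxxxxx", "subnet-yyyyyyyyyyyyyyyyy"]

  scaling_config {
    desired_size = 3 # Initial size only; an autoscaler may change it later
    max_size     = 5
    min_size     = 1
  }

  instance_types = ["t3.medium"]

  lifecycle {
    # Do not revert node-count changes made outside Terraform (for example by an autoscaler)
    ignore_changes = [scaling_config[0].desired_size]
  }
}
```

Without the lifecycle block, each terraform apply would try to reset the group to the desired_size written in the configuration.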
One of the most overlooked aspects of a production EKS cluster is the careful management of IAM roles and permissions. The EKS node group requires an IAM role (node_role_arn) with specific policies attached, allowing nodes to register with the EKS control plane, pull container images, and interact with other AWS services. Similarly, applications running in pods often need to call AWS services, which is typically achieved with IAM Roles for Service Accounts (IRSA). This involves registering the cluster’s OIDC issuer as an IAM OIDC identity provider, creating an IAM role whose trust policy allows a specific Kubernetes Service Account to assume it, and annotating that Service Account with the role’s ARN. This is much more secure than embedding AWS credentials in your applications or relying on the node’s instance profile, which grants the same permissions to every pod on the node.
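Both roles can be sketched in Terraform. The following is a minimal sketch rather than a complete setup: it assumes the aws_eks_cluster.example resource from the example above and the hashicorp/tls provider for the thumbprint lookup, and the role names, the default namespace, and the my-app Service Account are hypothetical.

```hcl
# --- Node role: lets EC2 worker nodes join the cluster and pull images ---
data "aws_iam_policy_document" "node_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "node" {
  name               = "eksNodeRole" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.node_assume.json
}

# AWS managed policies commonly attached to EKS worker node roles
resource "aws_iam_role_policy_attachment" "node" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
  ])
  role       = aws_iam_role.node.name
  policy_arn = each.value
}

# --- IRSA: register the cluster's OIDC issuer with IAM ---
data "tls_certificate" "oidc" {
  url = aws_eks_cluster.example.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  url             = aws_eks_cluster.example.identity[0].oidc[0].issuer
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.oidc.certificates[0].sha1_fingerprint]
}

# Trust policy: only the named Service Account may assume the role
data "aws_iam_policy_document" "app_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.eks.arn]
    }
    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub"
      values   = ["system:serviceaccount:default:my-app"] # hypothetical namespace and Service Account
    }
  }
}

resource "aws_iam_role" "app" {
  name               = "my-app-irsa" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.app_assume.json
}
```

To complete the association, the Kubernetes Service Account carries the annotation eks.amazonaws.com/role-arn set to the ARN of aws_iam_role.app; the permissions policies the application actually needs are attached to that role separately.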
Beyond the core cluster setup, several add-ons are essential for production. The vpc-cni add-on is fundamental for pod networking: it assigns pods IP addresses from your VPC subnets. Other critical components include kube-proxy (which maintains the network rules on each node that implement Kubernetes Services), coredns (for DNS resolution within the cluster), and the AWS Load Balancer Controller (for provisioning AWS load balancers for your Kubernetes Services and Ingresses; unlike the others, it is usually installed separately, for example with Helm). You also need robust logging and monitoring. Integrating with Amazon CloudWatch Container Insights or Prometheus/Grafana is a common pattern. For security, consider enabling EKS control plane audit logs, which are delivered to CloudWatch Logs and can be exported to S3 for analysis.
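The kube-proxy and coredns add-ons can be managed the same way as vpc-cni. The sketch below assumes the aws_eks_cluster.example resource from the example above; the local name core_addons is hypothetical. Instead of guessing version strings, it looks up the default add-on version that EKS reports for the cluster’s Kubernetes version.

```hcl
locals {
  core_addons = toset(["kube-proxy", "coredns"]) # hypothetical local name
}

# Look up the default add-on version for the cluster's Kubernetes version
data "aws_eks_addon_version" "core" {
  for_each           = local.core_addons
  addon_name         = each.value
  kubernetes_version = aws_eks_cluster.example.version
}

resource "aws_eks_addon" "core" {
  for_each      = local.core_addons
  cluster_name  = aws_eks_cluster.example.name
  addon_name    = each.value
  addon_version = data.aws_eks_addon_version.core[each.value].version
}
```

Note that a lookup is not a hard pin: the resolved version can change when EKS updates its defaults. For strict pinning, copy the resolved version string into a literal addon_version, as the vpc-cni example does.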
A less obvious fact about EKS is that its security posture depends heavily on how you configure the underlying AWS networking and IAM, not just Kubernetes RBAC. RBAC controls access within the cluster, while AWS security groups and network ACLs control traffic to and from the nodes, and the cluster’s endpoint access settings determine whether the API server is reachable from the internet. An overly permissive security group or an unrestricted public endpoint can expose your worker nodes or control plane unintentionally, regardless of your Kubernetes network policies.
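On the Terraform side, API server exposure is controlled in the vpc_config block of aws_eks_cluster. The following variant of the cluster resource is a sketch under stated assumptions: the resource name hardened, the role ARN, and the 203.0.113.0/24 range (a documentation-only address block standing in for your own network) are placeholders. It also turns on the audit log type mentioned earlier.

```hcl
resource "aws_eks_cluster" "hardened" {
  name     = "my-prod-cluster"
  role_arn = "arn:aws:iam::123456789012:role/eksClusterRole" # placeholder ARN
  version  = "1.28"

  vpc_config {
    subnet_ids         = ["subnet-xxxxxxxxxxxxxxxxx", "subnet-yyyyyyyyyyyyyyyyy"]
    security_group_ids = ["sg-zzzzzzzzzzzzzzzzz"]

    endpoint_private_access = true               # Nodes reach the API server inside the VPC
    endpoint_public_access  = true               # Keep public access, but restrict who can use it
    public_access_cidrs     = ["203.0.113.0/24"] # placeholder; replace with your trusted ranges
  }

  # Send selected control plane logs, including audit logs, to CloudWatch Logs
  enabled_cluster_log_types = ["api", "audit", "authenticator"]
}
```

Setting endpoint_public_access to false removes the public endpoint entirely, at the cost of requiring network connectivity into the VPC for kubectl and CI/CD access.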
When you upgrade your EKS cluster version, AWS performs the control plane upgrade once you request it, but you are responsible for upgrading your worker nodes. This is typically done either by creating a new node group on the desired Kubernetes version and migrating your workloads to it, or by updating an existing managed node group in place, which replaces its nodes through a rolling update.
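For managed node groups, the in-place path can be expressed in Terraform by tying the node group version to the cluster version and limiting how many nodes are replaced at once. This is a minimal sketch, assuming the aws_eks_cluster.example resource from the example above; the resource name rolling, the node group name, and the role ARN are hypothetical placeholders.

```hcl
resource "aws_eks_node_group" "rolling" {
  cluster_name    = aws_eks_cluster.example.name
  node_group_name = "my-prod-nodegroup-rolling"                  # hypothetical name
  node_role_arn   = "arn:aws:iam::123456789012:role/eksNodeRole" # placeholder ARN
  subnet_ids      = ["subnet-xxxxxxxxxxxxxxxxx", "subnet-yyyyyyyyyyyyyyyyy"]

  # Track the control plane version so nodes are updated after a cluster upgrade
  version = aws_eks_cluster.example.version

  scaling_config {
    desired_size = 3
    max_size     = 5
    min_size     = 1
  }

  update_config {
    max_unavailable = 1 # Replace at most one node at a time during an update
  }

  instance_types = ["t3.medium"]
}
```

With this arrangement, changing the version on the cluster resource upgrades the control plane first, and the node group update follows as part of the same or a later apply.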
The next concept you’ll likely encounter is implementing robust CI/CD pipelines for deploying applications to your EKS cluster.