EKS Setup and Configuration#

Amazon EKS runs the Kubernetes control plane for you – managed etcd, API server, scheduler, and controller manager across multiple AZs. You are responsible for the worker nodes, networking configuration, and add-ons.

Cluster Creation Methods#

eksctl is the fastest path for a working cluster. It creates the VPC, subnets, NAT gateway, IAM roles, node groups, and kubeconfig in one command:

eksctl create cluster \
  --name my-cluster \
  --region us-east-1 \
  --version 1.31 \
  --nodegroup-name workers \
  --node-type m6i.large \
  --nodes 3 \
  --nodes-min 2 \
  --nodes-max 10 \
  --managed

For repeatable setups, use a ClusterConfig file:

# cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
  version: "1.31"
managedNodeGroups:
  - name: workers
    instanceType: m6i.large
    minSize: 2
    maxSize: 10
    desiredCapacity: 3
    volumeSize: 50
    iam:
      withAddonPolicies:
        ebs: true
        albIngress: true

Apply with eksctl create cluster -f cluster.yaml. Add spot node groups with spot: true and instanceTypes as a list for diversification.
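
For example, a spot-backed entry under managedNodeGroups in the same ClusterConfig might look like this – a minimal sketch where the instance type mix is illustrative (mixing sizes and families improves spot availability):

  - name: spot-workers
    spot: true
    instanceTypes: ["m6i.large", "m5.large", "m6a.large"]
    minSize: 0
    maxSize: 10
    desiredCapacity: 2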

Terraform gives full control and fits into existing IaC pipelines. The terraform-aws-modules/eks/aws module handles the boilerplate:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "my-cluster"
  cluster_version = "1.31"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  cluster_endpoint_public_access = true

  eks_managed_node_groups = {
    workers = {
      instance_types = ["m6i.large"]
      min_size       = 2
      max_size       = 10
      desired_size   = 3

      labels = { role = "worker" }
    }
  }

  cluster_addons = {
    coredns            = { most_recent = true }
    kube-proxy         = { most_recent = true }
    vpc-cni            = { most_recent = true }
    aws-ebs-csi-driver = { most_recent = true }
  }
}

The AWS Console is fine for exploration, but do not use it for production. Console-created clusters are hard to reproduce and drift silently.

Node Group Types#

Managed node groups are the default choice. AWS handles AMI updates, node draining, and ASG lifecycle. You pick instance types and sizes.

Self-managed node groups run EC2 instances you manage directly. Use these when you need custom AMIs, GPU instances with specific drivers, or Windows nodes.

Fargate profiles run pods without any nodes. Define a profile with namespace and label selectors, and matching pods run on Fargate. Good for batch jobs or low-traffic services. Limitations: no DaemonSets, no privileged containers, no hostNetwork.
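
In an eksctl ClusterConfig, a Fargate profile is a short block. A minimal sketch, assuming a batch-jobs namespace (the namespace and label are placeholders):

fargateProfiles:
  - name: batch
    selectors:
      # Pods in this namespace carrying this label run on Fargate
      - namespace: batch-jobs
        labels:
          workload: batch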

Instance Type Guidance#

For general workloads, m6i.large or m7g.large (Graviton) are solid starting points. Graviton instances offer ~20% better price-performance but require ARM64 container images.
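
Adopting Graviton usually just means publishing multi-arch images. One common approach is Docker Buildx (the registry and tag here are placeholders):

# Build and push a single tag containing both x86-64 and ARM64 images
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.com/my-app:latest \
  --push .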

Fewer, larger nodes are better than many small ones. A cluster of m6i.xlarge nodes (4 vCPU, 16 GiB) wastes less capacity on per-node system overhead (kubelet, kube-proxy, VPC CNI) than the same resources split across twice as many m6i.large nodes, and larger instances also support more pods per node under the VPC CNI's ENI-based limits.

VPC and Subnet Requirements#

EKS needs at least two subnets in different AZs. The standard pattern is:

  • Public subnets – for load balancers. Tagged with kubernetes.io/role/elb: 1.
  • Private subnets – for worker nodes and pods. Tagged with kubernetes.io/role/internal-elb: 1.
  • NAT gateway – in each public subnet so private nodes can pull images and reach AWS APIs.

All subnets must be tagged with kubernetes.io/cluster/<cluster-name>: shared (or owned if exclusively used by one cluster). Without these tags, the AWS Load Balancer Controller cannot discover subnets.
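
With the terraform-aws-modules/vpc/aws module used alongside the EKS module above, these tags can be applied at creation time. A minimal sketch – the CIDRs and AZs are illustrative:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "my-cluster-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway     = true
  one_nat_gateway_per_az = true

  # Tags the AWS Load Balancer Controller uses for subnet discovery
  public_subnet_tags = {
    "kubernetes.io/role/elb"           = "1"
    "kubernetes.io/cluster/my-cluster" = "shared"
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb"  = "1"
    "kubernetes.io/cluster/my-cluster" = "shared"
  }
}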

EKS Add-Ons#

EKS add-ons are AWS-managed versions of essential cluster components. Install them as add-ons rather than self-managing to get automatic updates and compatibility guarantees.

  • CoreDNS – cluster DNS. Always install.
  • kube-proxy – network rules for Service routing. Always install.
  • VPC CNI (aws-node) – assigns VPC IPs to pods. Always install.
  • EBS CSI driver – required for EBS-backed PersistentVolumes. Install if you use any stateful workloads.
  • EFS CSI driver – for shared EFS file systems. Install if you need ReadWriteMany volumes.

# List available add-on versions
aws eks describe-addon-versions --addon-name vpc-cni --kubernetes-version 1.31

# Install an add-on
aws eks create-addon --cluster-name my-cluster --addon-name aws-ebs-csi-driver \
  --service-account-role-arn arn:aws:iam::123456789012:role/ebs-csi-role

The EBS CSI driver needs an IAM role with the AmazonEBSCSIDriverPolicy managed policy, usually provided via IAM Roles for Service Accounts (IRSA). Without it, PersistentVolumeClaims stay in Pending and volumes never attach.
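
One way to create that role is eksctl's IRSA support. This sketch assumes the role name from the create-addon example above and that no OIDC provider is associated yet:

# Associate the cluster's OIDC provider (once per cluster)
eksctl utils associate-iam-oidc-provider --cluster my-cluster --approve

# Create the IAM role bound to the EBS CSI controller's service account
eksctl create iamserviceaccount \
  --cluster my-cluster \
  --namespace kube-system \
  --name ebs-csi-controller-sa \
  --role-name ebs-csi-role \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --role-only \
  --approve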

kubeconfig Setup#

After the cluster is created, configure kubectl:

aws eks update-kubeconfig --name my-cluster --region us-east-1

# Verify
kubectl get nodes

This writes a context to ~/.kube/config that uses aws eks get-token for authentication. The IAM entity that created the cluster is automatically granted administrative access (historically system:masters). Other users and roles must be granted access through the aws-auth ConfigMap or, on newer clusters, EKS access entries.
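
On clusters using access entries, granting another IAM role access looks roughly like this – the dev-role name is a placeholder:

# Register the IAM role with the cluster
aws eks create-access-entry \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::123456789012:role/dev-role

# Attach a managed access policy cluster-wide
aws eks associate-access-policy \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::123456789012:role/dev-role \
  --policy-arn arn:aws:eks::aws:policy/AmazonEKSClusterAdminPolicy \
  --access-scope type=cluster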

Autoscaling: Cluster Autoscaler vs Karpenter#

Cluster Autoscaler works with ASGs. It watches for unschedulable pods and scales the ASG. Stable but slow – provisioning takes 2-3 minutes.

Karpenter replaces the Cluster Autoscaler. Instead of scaling predefined ASGs, it launches the right instance type directly based on pending pod requirements, mixing instance types, AZs, and purchase options automatically.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m6i.large", "m6i.xlarge", "m5.large", "m7g.large"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "100"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m

Karpenter is the better choice for new clusters. It provisions nodes in under 60 seconds, bin-packs efficiently, and consolidates underutilized nodes automatically.
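
The NodePool's nodeClassRef points at an EC2NodeClass, which carries the AWS-specific settings: AMI selection, node IAM role, and subnet and security group discovery. A minimal sketch, assuming the karpenter.sh/discovery tagging and node role naming conventions from a typical Karpenter installation (both are placeholders):

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest   # latest Amazon Linux 2023 AMI
  role: KarpenterNodeRole-my-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster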