VPC Concepts#
A Virtual Private Cloud is an isolated virtual network inside a cloud provider. Nearly every network-attached resource you launch – EC2 instances, RDS databases, Lambda functions configured for VPC access – lives inside a VPC. The VPC defines an IP address range using CIDR notation, and all resources within it draw addresses from that range.
The most common mistake is giving every VPC a /16 (65,536 addresses). This wastes IP space and causes problems later when you need to peer VPCs – overlapping CIDR blocks cannot be peered. Plan your IP allocation before building anything.
A practical CIDR plan for a multi-environment setup:
```text
10.0.0.0/16  - Production VPC (65,536 IPs)
10.1.0.0/16  - Staging VPC
10.2.0.0/16  - Development VPC
10.3.0.0/16  - Shared services (CI/CD, monitoring, logging)
10.10.0.0/16 - Reserved for future use
```

For smaller environments, a /20 (4,096 addresses) or /18 (16,384 addresses) per VPC is often sufficient. The key constraint is that VPC CIDRs cannot overlap if you ever plan to connect them.
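A plan like this can be recorded directly in Terraform, so the allocation lives next to the code that consumes it. A minimal sketch (the locals map and resource names are illustrative, not part of the plan above):

```hcl
# Illustrative: encode the CIDR plan once and stamp out a VPC per environment.
locals {
  vpc_cidrs = {
    production  = "10.0.0.0/16"
    staging     = "10.1.0.0/16"
    development = "10.2.0.0/16"
    shared      = "10.3.0.0/16"
  }
}

resource "aws_vpc" "this" {
  for_each             = local.vpc_cidrs
  cidr_block           = each.value
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = { Name = each.key }
}
```

Keeping the map in one place makes overlaps obvious at review time, before a peering attempt fails.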
Subnets: Public vs Private#
Subnets divide a VPC into segments. Each subnet lives in exactly one availability zone and gets a subset of the VPC’s CIDR range.
Public subnets have a route to an Internet Gateway, meaning instances with public IPs can receive inbound traffic from the internet. Use these for load balancers, bastion hosts, and NAT Gateways.
Private subnets have no direct internet route. Backend services, databases, and application servers belong here. They can reach the internet through a NAT Gateway in a public subnet, but nothing on the internet can initiate a connection to them.
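In Terraform this public/private split is typically expressed as one subnet per AZ per tier, carved from the VPC CIDR with the `cidrsubnet()` function. A hedged sketch (AZ list, resource names, and netnum offsets are assumptions chosen to match the layout below):

```hcl
locals {
  azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

resource "aws_subnet" "public" {
  count             = length(local.azs)
  vpc_id            = aws_vpc.main.id
  availability_zone = local.azs[count.index]
  # cidrsubnet("10.0.0.0/16", 8, 1) yields 10.0.1.0/24, and so on per AZ.
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 1)
}

resource "aws_subnet" "private_app" {
  count             = length(local.azs)
  vpc_id            = aws_vpc.main.id
  availability_zone = local.azs[count.index]
  # Offset of 10 puts these at 10.0.10.0/24 onward.
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 10)
}
```

Deriving subnets from the VPC CIDR instead of hardcoding them means the layout follows automatically if the base block ever changes.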
A typical three-AZ setup in AWS:
```text
VPC: 10.0.0.0/16

Public subnets:
  10.0.1.0/24  (us-east-1a) - 254 IPs
  10.0.2.0/24  (us-east-1b) - 254 IPs
  10.0.3.0/24  (us-east-1c) - 254 IPs

Private subnets (application):
  10.0.10.0/24 (us-east-1a)
  10.0.11.0/24 (us-east-1b)
  10.0.12.0/24 (us-east-1c)

Private subnets (database):
  10.0.20.0/24 (us-east-1a)
  10.0.21.0/24 (us-east-1b)
  10.0.22.0/24 (us-east-1c)
```

Internet Connectivity#
An Internet Gateway (IGW) allows resources in public subnets to send and receive traffic from the internet. It is free and horizontally scaled by AWS – you do not manage its capacity.
A NAT Gateway allows resources in private subnets to initiate outbound connections to the internet (pulling packages, calling external APIs) without exposing them to inbound traffic. NAT Gateways are expensive. AWS charges $0.045/hour ($32.40/month) plus $0.045 per GB of data processed. A Kubernetes cluster pulling container images and calling external APIs can easily generate hundreds of dollars per month in NAT Gateway charges. For cost control, consider NAT instances (self-managed, cheaper but less reliable) or VPC endpoints for AWS service traffic.
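The NAT path takes two resources plus a route. A minimal Terraform sketch (resource names are illustrative; the gateway must sit in a public subnet and needs its own Elastic IP):

```hcl
resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id # must live in a public subnet
}
```

One NAT Gateway per AZ avoids cross-AZ data charges and AZ-failure blast radius; a single shared gateway is cheaper. Which tradeoff wins depends on your traffic volume.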
Egress-only Internet Gateway is the IPv6 equivalent of a NAT Gateway – allows outbound IPv6 traffic but blocks inbound. No per-GB charge.
Security Groups#
Security groups are stateful firewalls attached to network interfaces. “Stateful” means if you allow inbound traffic on port 443, the response traffic is automatically allowed out without an explicit outbound rule.
The critical best practice: reference other security groups instead of CIDR blocks whenever possible.
```hcl
# Good: reference the ALB security group
resource "aws_security_group_rule" "app_from_alb" {
  type                     = "ingress"
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  security_group_id        = aws_security_group.app.id
  source_security_group_id = aws_security_group.alb.id
}

# Avoid: hardcoded CIDRs that break when subnets change
resource "aws_security_group_rule" "app_from_alb_bad" {
  type              = "ingress"
  from_port         = 8080
  to_port           = 8080
  protocol          = "tcp"
  security_group_id = aws_security_group.app.id
  cidr_blocks       = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}
```

Security group references are dynamic – when a new instance joins the ALB’s security group, it automatically gains access to the app. CIDR-based rules require manual updates when subnet ranges change.
NACLs vs Security Groups#
Network ACLs (NACLs) operate at the subnet level and are stateless. Every packet is evaluated against both inbound and outbound rules independently. They process rules in numbered order, and the first match wins.
In practice, use security groups for almost everything. NACLs are useful for two cases: blocking a specific IP range at the subnet level (defense in depth) or complying with regulations that require stateless network controls. For everything else, security groups are simpler and more maintainable.
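The "block a specific range" case looks like this in Terraform. A sketch (rule number and the blocked CIDR are arbitrary examples; NACLs evaluate rules in ascending numeric order, so the deny must carry a lower number than any allow that would match):

```hcl
resource "aws_network_acl_rule" "block_bad_range" {
  network_acl_id = aws_network_acl.main.id
  rule_number    = 90               # evaluated before higher-numbered allows
  egress         = false
  protocol       = "-1"             # all protocols
  rule_action    = "deny"
  cidr_block     = "203.0.113.0/24" # example range to block
}
```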
Route Tables#
Every subnet is associated with a route table that determines where network traffic is directed.
A public subnet route table:
```text
Destination   Target
10.0.0.0/16   local        (traffic within the VPC stays in the VPC)
0.0.0.0/0     igw-abc123   (everything else goes to the Internet Gateway)
```

A private subnet route table:

```text
Destination   Target
10.0.0.0/16   local
0.0.0.0/0     nat-xyz789   (outbound internet via NAT Gateway)
10.1.0.0/16   pcx-peer01   (peered VPC traffic via peering connection)
```

VPC Peering#
VPC peering creates a direct network connection between two VPCs. Traffic stays on the cloud provider’s backbone – it never traverses the public internet. A critical property: peering is non-transitive. If VPC A peers with VPC B, and VPC B peers with VPC C, A cannot reach C through B. You must create a direct peering connection between A and C.
Peering works across AWS accounts and regions, making it useful for connecting production and shared-services VPCs owned by different teams.
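A peering connection is one resource, but it does nothing until both sides add routes pointing at it. A Terraform sketch (VPC references and the destination CIDR are placeholders):

```hcl
resource "aws_vpc_peering_connection" "prod_to_shared" {
  vpc_id      = aws_vpc.prod.id
  peer_vpc_id = aws_vpc.shared.id
  auto_accept = true # only valid for same-account, same-region peering
}

# Each side needs a route; this is the production side.
resource "aws_route" "prod_to_shared" {
  route_table_id            = aws_route_table.prod_private.id
  destination_cidr_block    = "10.3.0.0/16" # shared-services VPC
  vpc_peering_connection_id = aws_vpc_peering_connection.prod_to_shared.id
}
```

Forgetting the return route on the accepter side is the classic "peering is up but nothing connects" failure.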
Transit Gateway#
When you have more than a few VPCs, peering connections become unmanageable (N VPCs require N*(N-1)/2 peerings). Transit Gateway provides a hub-and-spoke model: every VPC connects to the Transit Gateway, and the TGW routes traffic between them.
```text
VPC-Prod ───┐
VPC-Stage ──┤── Transit Gateway ──── On-Prem VPN
VPC-Dev ────┤
VPC-Shared ─┘
```

Transit Gateway also connects to VPN tunnels and Direct Connect gateways, making it the central point for all network connectivity. It costs $0.05/hour per attachment plus $0.02/GB of data processed.
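Each spoke is one attachment plus a route. A sketch for a single VPC (resource names and the 10.0.0.0/8 summary route are assumptions):

```hcl
resource "aws_ec2_transit_gateway" "hub" {
  description = "central routing hub"
}

resource "aws_ec2_transit_gateway_vpc_attachment" "prod" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  vpc_id             = aws_vpc.prod.id
  subnet_ids         = aws_subnet.prod_private[*].id
}

# Send all internal (RFC 1918 10.x) traffic to the hub; the TGW route
# table decides which attachment it exits from.
resource "aws_route" "prod_to_tgw" {
  route_table_id         = aws_route_table.prod_private.id
  destination_cidr_block = "10.0.0.0/8"
  transit_gateway_id     = aws_ec2_transit_gateway.hub.id
}
```

A summary route like this is what makes the hub-and-spoke model scale: adding a new VPC means one new attachment, not N new peerings and routes.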
VPN and Direct Connect#
Site-to-site VPN connects your cloud VPC to an on-premises network over encrypted tunnels across the public internet. It is quick to set up (minutes) and cheap, but throughput is limited by your internet connection and latency varies.
AWS Direct Connect / Azure ExpressRoute / GCP Cloud Interconnect provides a dedicated physical connection between your data center and the cloud provider. Latency is consistent, throughput is higher (1-100 Gbps), and traffic does not traverse the public internet. The tradeoff is cost ($0.02-0.03/GB plus port charges) and setup time (weeks to months for physical cross-connects).
Private Endpoints and PrivateLink#
By default, traffic to cloud services like S3, DynamoDB, or SQS goes over the public internet (even from within the VPC). VPC endpoints keep this traffic on the provider’s private network.
Gateway endpoints (S3 and DynamoDB on AWS) are free and route traffic through route table entries. Interface endpoints (everything else) create an elastic network interface in your subnet with a private IP. These cost $0.01/hour plus $0.01/GB.
For Kubernetes clusters that pull images from ECR and write logs to CloudWatch, interface endpoints can significantly reduce NAT Gateway data processing charges.
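A sketch of both endpoint types (region, resource names, and the referenced security group are illustrative):

```hcl
# Free gateway endpoint: S3 traffic routes privately via the route table.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.us-east-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

# Interface endpoint: an ENI with a private IP in each subnet.
resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  private_dns_enabled = true
  security_group_ids  = [aws_security_group.endpoints.id]
}
```

Note that pulling from ECR privately in practice also requires the `ecr.api` interface endpoint and the S3 gateway endpoint, since image layers are served from S3.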
Cross-Cloud Terminology#
| Concept | AWS | Azure | GCP |
|---|---|---|---|
| Virtual network | VPC | VNet | VPC |
| Subnet | Subnet | Subnet | Subnet |
| Instance firewall | Security Group | NSG | Firewall Rule |
| Subnet firewall | NACL | NSG (subnet-level) | Firewall Rule |
| Internet gateway | Internet Gateway | (implicit) | (implicit) |
| NAT | NAT Gateway | NAT Gateway | Cloud NAT |
| Peering | VPC Peering | VNet Peering | VPC Peering |
| Hub-and-spoke | Transit Gateway | Virtual WAN / Hub | Network Connectivity Center |
| Private endpoint | PrivateLink | Private Endpoint | Private Service Connect |
| Dedicated link | Direct Connect | ExpressRoute | Cloud Interconnect |
Network Design for Kubernetes#
Kubernetes clusters need IP addresses for nodes, pods, and services. On AWS EKS with the VPC CNI plugin, every pod gets a real VPC IP address. A cluster with 50 nodes running 30 pods each needs 1,500 pod IPs plus the node IPs.
Plan your VPC CIDR to accommodate this:
```text
VPC: 10.0.0.0/16 (65,536 IPs)
Node subnets:  /24 per AZ (254 IPs per AZ for nodes)
Pod subnets:   /18 per AZ (16,384 IPs per AZ for pods)
Service CIDR:  172.20.0.0/16 (cluster-internal, not from VPC CIDR)
```

If you start with a VPC CIDR that is too small, expanding it later is painful: on AWS you can associate secondary CIDR blocks, but existing subnets cannot be resized, and retrofitting workloads onto new ranges is significant rework. It is easier to allocate a large block upfront and leave room for growth than to renumber a production network.
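If the primary block does fill up, one escape hatch on AWS is a secondary CIDR dedicated to pods, which the VPC CNI can be configured to use. A sketch (the 100.64.0.0/16 choice is a common convention, not a requirement):

```hcl
# Attach an additional range to an existing VPC for pod subnets.
# 100.64.0.0/16 (carrier-grade NAT space) avoids colliding with
# the 10.x plan used for nodes and other VPCs.
resource "aws_vpc_ipv4_cidr_block_association" "pods" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "100.64.0.0/16"
}
```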