IAM: Identity and Access Management

IAM controls who can do what in your AWS account. Everything in AWS is an API call, and IAM decides which API calls are allowed. There are three concepts an agent must understand: users, roles, and policies.

Users are long-lived identities for humans or service accounts. Roles are temporary identities that can be assumed by users, services, or other AWS accounts. Policies are JSON documents that define permissions. Roles are always preferred over users for programmatic access because they issue short-lived credentials through STS (Security Token Service).

Create a role for an EC2 instance:

aws iam create-role \
  --role-name web-server-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "ec2.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

Attach a managed policy to the role:

aws iam attach-role-policy \
  --role-name web-server-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

Create an instance profile (required for EC2 to use the role):

aws iam create-instance-profile --instance-profile-name web-server-profile
aws iam add-role-to-instance-profile \
  --instance-profile-name web-server-profile \
  --role-name web-server-role

Assume a role from the CLI to get temporary credentials:

aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/deploy-role \
  --role-session-name agent-session

This returns temporary AccessKeyId, SecretAccessKey, and SessionToken values that expire after one hour by default.
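
To use those credentials with subsequent CLI calls, one option is to export them as environment variables. A minimal sketch (same example role ARN as above; parsing relies on the tab-separated text output):

# Capture the three credential fields in one call and export them
read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<< "$(aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/deploy-role \
  --role-session-name agent-session \
  --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
  --output text)"
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN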

Key IAM principle: always use least privilege. Start with no permissions and add only what is needed. Use aws iam simulate-principal-policy to test whether a role has specific permissions without actually executing the action.
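
For example, to check whether web-server-role would be allowed to read and write an object before relying on it (the resource ARN here is illustrative):

aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/web-server-role \
  --action-names s3:GetObject s3:PutObject \
  --resource-arns arn:aws:s3:::myorg-data-2026/backups/backup.sql.gz \
  --query 'EvaluationResults[*].[EvalActionName,EvalDecision]' \
  --output table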

VPC: Virtual Private Cloud

A VPC is an isolated network within an AWS region. Every resource that needs network connectivity lives in a VPC. The critical components are subnets, route tables, security groups, and NACLs.

Subnets divide a VPC into segments. Public subnets have a route to an internet gateway. Private subnets route through a NAT gateway for outbound internet access, or have no internet access at all.

Create a VPC with subnets:

# Create the VPC
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
  --query 'Vpc.VpcId' --output text)

# Create a public subnet
PUB_SUBNET=$(aws ec2 create-subnet --vpc-id $VPC_ID \
  --cidr-block 10.0.1.0/24 --availability-zone us-east-1a \
  --query 'Subnet.SubnetId' --output text)

# Create a private subnet
PRIV_SUBNET=$(aws ec2 create-subnet --vpc-id $VPC_ID \
  --cidr-block 10.0.2.0/24 --availability-zone us-east-1a \
  --query 'Subnet.SubnetId' --output text)
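
The public subnet is not actually public until it has an internet gateway and a route to it. A sketch of the remaining wiring (the NAT gateway for the private subnet is omitted):

# Create and attach an internet gateway
IGW_ID=$(aws ec2 create-internet-gateway \
  --query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway --internet-gateway-id $IGW_ID --vpc-id $VPC_ID

# Route 0.0.0.0/0 from the public subnet through the gateway
RT_ID=$(aws ec2 create-route-table --vpc-id $VPC_ID \
  --query 'RouteTable.RouteTableId' --output text)
aws ec2 create-route --route-table-id $RT_ID \
  --destination-cidr-block 0.0.0.0/0 --gateway-id $IGW_ID
aws ec2 associate-route-table --route-table-id $RT_ID --subnet-id $PUB_SUBNET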

Security groups are stateful firewalls attached to resources. If you allow inbound traffic on port 443, the response traffic is automatically allowed. They operate at the instance level.

SG_ID=$(aws ec2 create-security-group --group-name web-sg \
  --description "Web server security group" --vpc-id $VPC_ID \
  --query 'GroupId' --output text)

aws ec2 authorize-security-group-ingress --group-id $SG_ID \
  --protocol tcp --port 443 --cidr 0.0.0.0/0

aws ec2 authorize-security-group-ingress --group-id $SG_ID \
  --protocol tcp --port 80 --cidr 0.0.0.0/0

NACLs (Network Access Control Lists) are stateless firewalls at the subnet level. They evaluate rules in order and require explicit allow rules for both inbound and outbound traffic. In practice, security groups handle most filtering. NACLs are a second layer for compliance or broad IP blocking.
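
For example, allowing HTTPS through a NACL takes a pair of rules, one inbound for port 443 and one outbound for the ephemeral return ports (the ACL ID and rule numbers below are illustrative):

aws ec2 create-network-acl-entry --network-acl-id acl-0abc123 \
  --rule-number 100 --protocol tcp --port-range From=443,To=443 \
  --cidr-block 0.0.0.0/0 --rule-action allow --ingress

aws ec2 create-network-acl-entry --network-acl-id acl-0abc123 \
  --rule-number 100 --protocol tcp --port-range From=1024,To=65535 \
  --cidr-block 0.0.0.0/0 --rule-action allow --egress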

EC2: Elastic Compute Cloud

EC2 provides virtual machines. The critical decisions are instance type, AMI, and placement.

Instance types follow a naming convention: m5.xlarge means family m (general purpose), generation 5, size xlarge. Key families: t3/t4g for burstable workloads, m5/m6i for general purpose, c5/c6i for compute-intensive, r5/r6i for memory-intensive, and g4/p4 for GPU workloads. The g suffix (e.g., m6g) indicates ARM64/Graviton processors, which are cheaper and often faster for server workloads.
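
To compare candidates before launching, describe-instance-types reports vCPUs, memory, and CPU architecture (the instance types listed are just examples):

aws ec2 describe-instance-types \
  --instance-types t3.medium m6g.xlarge c6i.2xlarge \
  --query 'InstanceTypes[*].[InstanceType,VCpuInfo.DefaultVCpus,MemoryInfo.SizeInMiB,ProcessorInfo.SupportedArchitectures[0]]' \
  --output table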

Launch an instance:

aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.medium \
  --subnet-id $PRIV_SUBNET \
  --security-group-ids $SG_ID \
  --iam-instance-profile Name=web-server-profile \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-01},{Key=env,Value=prod}]' \
  --count 1

Find the latest Amazon Linux 2023 AMI:

aws ec2 describe-images \
  --owners amazon \
  --filters "Name=name,Values=al2023-ami-2023*-x86_64" \
  --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
  --output text

Check instance status:

aws ec2 describe-instance-status --instance-ids i-0abc123 \
  --query 'InstanceStatuses[0].[InstanceState.Name,SystemStatus.Status,InstanceStatus.Status]'
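
When a script has to block until the instance is actually usable, the built-in waiter polls describe-instance-status until both status checks pass:

aws ec2 wait instance-status-ok --instance-ids i-0abc123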

S3: Simple Storage Service

S3 stores objects (files) in buckets. It is the backbone of AWS – used for backups, static assets, data lakes, Terraform state, and log archival.

# Create a bucket
aws s3 mb s3://myorg-data-2026

# Upload a file
aws s3 cp backup.sql.gz s3://myorg-data-2026/backups/

# Sync a directory
aws s3 sync ./build/ s3://myorg-data-2026/static/ --delete

# List objects with a prefix
aws s3 ls s3://myorg-data-2026/backups/ --recursive

# Generate a presigned URL (temporary access without credentials)
aws s3 presign s3://myorg-data-2026/backups/backup.sql.gz --expires-in 3600

Enable versioning on critical buckets to protect against accidental deletes:

aws s3api put-bucket-versioning --bucket myorg-data-2026 \
  --versioning-configuration Status=Enabled
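
With versioning enabled, a delete only adds a delete marker and older versions remain recoverable. One way to inspect them:

aws s3api list-object-versions --bucket myorg-data-2026 \
  --prefix backups/ \
  --query 'Versions[*].[Key,VersionId,IsLatest,LastModified]' --output table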

Storage classes control cost. STANDARD for frequently accessed data. STANDARD_IA for infrequent access (cheaper storage, per-retrieval fee). GLACIER for archival (minutes to hours retrieval time). DEEP_ARCHIVE for compliance data you hope to never read again. Use lifecycle rules to transition objects automatically:

aws s3api put-bucket-lifecycle-configuration --bucket myorg-data-2026 \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-old-backups",
      "Status": "Enabled",
      "Filter": {"Prefix": "backups/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"}
      ]
    }]
  }'

RDS: Relational Database Service

RDS manages relational databases – PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, or Amazon Aurora. AWS handles patching, backups, and failover.

aws rds create-db-instance \
  --db-instance-identifier prod-postgres \
  --db-instance-class db.r6g.xlarge \
  --engine postgres \
  --engine-version 16.2 \
  --master-username dbadmin \
  --master-user-password "$(aws secretsmanager get-random-password \
    --password-length 32 --exclude-characters '/@"' \
    --query RandomPassword --output text)" \
  --allocated-storage 100 \
  --storage-type gp3 \
  --vpc-security-group-ids $SG_ID \
  --db-subnet-group-name prod-db-subnets \
  --multi-az \
  --backup-retention-period 7 \
  --storage-encrypted

Check the instance status:

aws rds describe-db-instances --db-instance-identifier prod-postgres \
  --query 'DBInstances[0].[DBInstanceStatus,Endpoint.Address,Endpoint.Port]'

Create a manual snapshot before risky operations:

aws rds create-db-snapshot \
  --db-instance-identifier prod-postgres \
  --db-snapshot-identifier pre-migration-snapshot
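
Snapshot creation is asynchronous; if the next step depends on it, the waiter below blocks until the snapshot is available:

aws rds wait db-snapshot-available \
  --db-snapshot-identifier pre-migration-snapshot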

ECS and EKS: Container Orchestration

ECS (Elastic Container Service) is AWS’s native container orchestrator. It is simpler than Kubernetes and tightly integrated with AWS services. Use ECS with Fargate (serverless) to avoid managing EC2 instances entirely.
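
A minimal Fargate sketch, assuming a task definition named web-api has already been registered and reusing the subnet and security group variables from earlier:

aws ecs create-cluster --cluster-name prod-ecs

# Run a one-off task on Fargate (task definition web-api is assumed to exist)
aws ecs run-task \
  --cluster prod-ecs \
  --launch-type FARGATE \
  --task-definition web-api \
  --network-configuration "awsvpcConfiguration={subnets=[$PRIV_SUBNET],securityGroups=[$SG_ID],assignPublicIp=DISABLED}"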

EKS (Elastic Kubernetes Service) is managed Kubernetes. Use it when you need Kubernetes-specific features, have existing Kubernetes expertise, or want portability across cloud providers.

Create an EKS cluster (the control plane requires subnets in at least two Availability Zones):

aws eks create-cluster \
  --name prod-cluster \
  --role-arn arn:aws:iam::123456789012:role/eks-cluster-role \
  --resources-vpc-config subnetIds=$PUB_SUBNET,$PRIV_SUBNET,securityGroupIds=$SG_ID

# Update kubeconfig after cluster is ACTIVE
aws eks update-kubeconfig --name prod-cluster --region us-east-1

Add a managed node group:

aws eks create-nodegroup \
  --cluster-name prod-cluster \
  --nodegroup-name workers \
  --node-role arn:aws:iam::123456789012:role/eks-node-role \
  --subnets $PRIV_SUBNET \
  --instance-types m6g.xlarge \
  --ami-type AL2_ARM_64 \
  --scaling-config minSize=2,maxSize=10,desiredSize=3
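
Node groups take several minutes to provision; the built-in waiter blocks until the group reaches ACTIVE, after which the workers should show up in kubectl:

aws eks wait nodegroup-active --cluster-name prod-cluster --nodegroup-name workers
kubectl get nodes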

Route53: DNS

Route53 manages DNS records. Agents commonly need to create or update records for deployments.

# Create/update an A record
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z2FDTNDATAQYW2",
          "DNSName": "d111111abcdef8.cloudfront.net",
          "EvaluateTargetHealth": true
        }
      }
    }]
  }'
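
change-resource-record-sets returns a ChangeInfo.Id with status PENDING. If a deployment depends on the record being live, the waiter polls until Route53 reports INSYNC:

# The change ID comes from the previous command's output (value shown is illustrative)
aws route53 wait resource-record-sets-changed --id /change/C2682N5HXP0BZ4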

List hosted zones:

aws route53 list-hosted-zones --query 'HostedZones[*].[Id,Name]' --output table

CloudWatch: Monitoring and Logging

CloudWatch collects metrics, logs, and alarms. Every AWS service publishes metrics to CloudWatch automatically.

Query recent logs:

aws logs filter-log-events \
  --log-group-name /ecs/web-api \
  --start-time $(date -d '1 hour ago' +%s000) \
  --filter-pattern "ERROR"

Get a specific metric:

aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Average

Create an alarm:

aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu-web01 \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3 \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts

Quick Reference: Service Selection

Need                     Service       When to use
Compute (VMs)            EC2           Full OS control, custom software
Containers (simple)      ECS Fargate   Dockerized apps, AWS-native
Containers (Kubernetes)  EKS           K8s ecosystem, multi-cloud portability
Relational DB            RDS / Aurora  Managed PostgreSQL, MySQL
Object storage           S3            Files, backups, static assets
DNS                      Route53       Domain management, health checks
Monitoring               CloudWatch    Metrics, logs, alarms
Identity                 IAM + STS     All access control