The Cost Optimization Hierarchy#

Cloud cost optimization follows a hierarchy of impact. Work from the top down – fine-tuning a commitment discount matters far less than shutting down resources nobody uses.

  1. Eliminate waste – turn off unused resources, delete orphaned storage
  2. Right-size – match instance sizes to actual usage
  3. Use commitment discounts – reserved instances, savings plans, CUDs
  4. Shift to spot/preemptible – for fault-tolerant workloads
  5. Optimize storage and network – tiering, transfer patterns, caching
  6. Architect for cost – serverless, auto-scaling, multi-region strategy

Eliminating Waste#

The fastest cost reduction comes from finding resources that serve no purpose. Every cloud provider accumulates these: instances left running after a test, snapshots from decommissioned servers, load balancers with no backends, unattached disks.

AWS:

# Find unattached EBS volumes
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[*].[VolumeId,Size,CreateTime]' --output table

# Find unused Elastic IPs (charged when not attached)
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' --output table

# Find idle load balancers: check each target group for zero healthy targets
aws elbv2 describe-target-health --target-group-arn TARGET_GROUP_ARN
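
# Find snapshots older than a cutoff date (the date literal is illustrative;
# the AWS CLI's JMESPath compares ISO-8601 timestamps as strings)
aws ec2 describe-snapshots --owner-ids self \
  --query 'Snapshots[?StartTime<=`2025-01-01`].[SnapshotId,VolumeId,StartTime]' \
  --output table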

Azure:

# Find unattached managed disks
az disk list --query "[?managedBy==null].[name,diskSizeGb,resourceGroup]" --output table

# Find deallocated VMs (compute is no longer billed, but attached disks still are)
az vm list -d --query "[?powerState=='VM deallocated'].[name,resourceGroup]" --output table
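
# Find unassociated public IPs (billed even when unattached; assumes
# ipConfiguration is null for an unattached IP, as in the default CLI output)
az network public-ip list \
  --query "[?ipConfiguration==null].[name,resourceGroup]" --output table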

GCP:

# Find unattached persistent disks
gcloud compute disks list --filter="NOT users:*" \
  --format="table(name,sizeGb,zone,status)"

# List stopped instances (status TERMINATED); use lastStartTimestamp to spot ones idle 30+ days
gcloud compute instances list --filter="status=TERMINATED" \
  --format="table(name,zone,lastStartTimestamp)"

Right-Sizing#

Right-sizing means matching instance types to actual resource consumption. Most cloud workloads are over-provisioned because developers choose sizes based on peak estimates rather than observed usage.

The process:

  1. Collect at least two weeks of CPU, memory, and network metrics
  2. Identify the P95 utilization – this is your actual ceiling
  3. Pick the smallest instance type that accommodates P95 with 20-30% headroom
  4. Apply changes during the next maintenance window
  5. Monitor for one week after the change

AWS:

# Get CPU utilization over 14 days
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --start-time $(date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 \
  --statistics Maximum Average \
  --output json
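
# CloudWatch also returns percentiles directly; --extended-statistics gives
# the P95 from step 2 (it cannot be combined with --statistics)
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --start-time $(date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 \
  --extended-statistics p95 \
  --output json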

# AWS Compute Optimizer provides right-sizing recommendations
aws compute-optimizer get-ec2-instance-recommendations \
  --instance-arns arn:aws:ec2:us-east-1:123456789012:instance/i-0abc123

Azure:

# Azure Advisor provides right-sizing recommendations
az advisor recommendation list \
  --category Cost --output table

GCP:

# GCP Recommender provides sizing suggestions
gcloud recommender recommendations list \
  --project=my-prod-project \
  --location=us-east1-b \
  --recommender=google.compute.instance.MachineTypeRecommender \
  --format="table(content.operationGroups[0].operations[0].resource,priority,description)"

When to right-size down: average CPU under 40% and P95 under 70% for two weeks. When not to: batch workloads that spike to 100% briefly, or workloads with seasonal patterns you have not yet observed.

Commitment Discounts: Reserved, Savings Plans, and CUDs#

All three providers offer discounts for committing to a usage level for one or three years. The terms differ but the concept is the same: trade flexibility for lower per-hour rates.

AWS: Reserved Instances vs Savings Plans#

Reserved Instances (RIs) are tied to a specific instance type, region, and platform. They offer roughly 30-40% savings for a one-year term and 55-65% for three years. Standard RIs cannot be exchanged once purchased; Convertible RIs can be exchanged for a different type, at a somewhat smaller discount.

Savings Plans are more flexible. Compute Savings Plans apply to any EC2 instance, Fargate, or Lambda usage across all regions. EC2 Instance Savings Plans are locked to a family and region but give slightly deeper discounts.

# View current RI coverage
aws ce get-reservation-coverage \
  --time-period Start=2026-01-01,End=2026-02-22 \
  --group-by Type=DIMENSION,Key=INSTANCE_TYPE

# Get RI purchase recommendations
aws ce get-reservation-purchase-recommendation \
  --service "Amazon Elastic Compute Cloud - Compute" \
  --lookback-period-in-days SIXTY_DAYS \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT

# Get Savings Plans recommendations
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS

Decision: Use Compute Savings Plans for general workloads where instance types may change. Use EC2 Instance Savings Plans when you are committed to a specific family (e.g., you know you will run m6i in us-east-1 for the next year). Use Standard RIs only for databases and other workloads that truly never change.

Azure: Reservations#

Azure Reservations apply to VMs, SQL Database, Cosmos DB, and other services. They offer up to 72% savings for three years.

# View reservation utilization
az consumption reservation summary list \
  --reservation-order-id ORDER_ID \
  --grain monthly

# Query month-to-date spend through Cost Management (requires a scope)
az costmanagement query --type Usage \
  --timeframe MonthToDate \
  --scope "subscriptions/SUB_ID" \
  --dataset-aggregation '{"totalCost": {"name": "Cost", "function": "Sum"}}'

Azure also offers Savings Plans (similar to AWS Compute Savings Plans) that apply across VM families and regions.

GCP: Committed Use Discounts (CUDs)#

GCP CUDs commit to a minimum amount of vCPUs and memory in a region. Resource-based commitments offer up to 57% savings for a three-year term and up to 37% for one year.

# Create a commitment
gcloud compute commitments create prod-commitment \
  --region=us-east1 \
  --resources=vcpu=100,memory=400GB \
  --plan=36-month

# List active commitments
gcloud compute commitments list --format="table(name,region,status,plan,endTimestamp)"

Spend-based CUDs apply to specific services like Cloud SQL or BigQuery and work more like Savings Plans.

Spot and Preemptible Instances#

Spot (AWS), Spot VMs (Azure), and Preemptible/Spot VMs (GCP) offer 60-90% discounts in exchange for the provider being able to reclaim the instances at any time.

Appropriate workloads: batch processing, CI/CD runners, stateless web servers behind auto-scaling groups, data processing pipelines, dev/test environments.

Inappropriate workloads: databases, single-instance services, anything that cannot tolerate sudden termination.

AWS Spot:

# Request spot instances
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type m6i.xlarge \
  --instance-market-options '{"MarketType": "spot", "SpotOptions": {"SpotInstanceType": "one-time"}}' \
  --count 5

# Check current spot pricing
aws ec2 describe-spot-price-history \
  --instance-types m6i.xlarge \
  --product-descriptions "Linux/UNIX" \
  --start-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --query 'SpotPriceHistory[*].[AvailabilityZone,SpotPrice]' --output table

Azure Spot:

az vm create --resource-group dev-rg --name batch-worker \
  --image Ubuntu2204 --size Standard_D4s_v5 \
  --priority Spot --eviction-policy Delete --max-price 0.10

GCP Spot:

gcloud compute instances create batch-worker \
  --zone=us-east1-b \
  --machine-type=e2-standard-4 \
  --provisioning-model=SPOT \
  --instance-termination-action=DELETE

Best practice: Use diverse instance types to reduce the chance of all instances being reclaimed simultaneously. In EKS/AKS/GKE, use node pools with multiple machine types and the cluster autoscaler.
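
On AWS, one way to express that diversification is a mixed instances policy on the Auto Scaling group. A minimal sketch, assuming a pre-existing launch template and subnets (the names, subnets, and instance types here are placeholders):

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name batch-workers \
  --min-size 0 --max-size 20 \
  --vpc-zone-identifier "subnet-abc123,subnet-def456" \
  --mixed-instances-policy '{
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {"LaunchTemplateName": "batch-lt", "Version": "$Latest"},
      "Overrides": [
        {"InstanceType": "m6i.xlarge"},
        {"InstanceType": "m6a.xlarge"},
        {"InstanceType": "m5.xlarge"}
      ]
    },
    "InstancesDistribution": {
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotAllocationStrategy": "price-capacity-optimized"
    }
  }'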

Storage Tiering#

All three providers have storage tiers that trade access speed and retrieval cost for lower storage cost. Incorrect tiering is one of the most common sources of waste.

Access Pattern         AWS              Azure    GCP
Frequent (daily)       S3 Standard      Hot      Standard
Infrequent (monthly)   S3 Standard-IA   Cool     Nearline
Rare (quarterly)       S3 Glacier IR    Cold     Coldline
Archive (yearly)       S3 Glacier DA    Archive  Archive

Automate tiering with lifecycle policies. Do not rely on manual reclassification. Set rules based on object age at the bucket level.
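
For example, an S3 lifecycle rule keyed to object age (the bucket name and day thresholds are illustrative):

aws s3api put-bucket-lifecycle-configuration \
  --bucket myorg-data-2026 \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "age-based-tiering",
      "Status": "Enabled",
      "Filter": {},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER_IR"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
      ]
    }]
  }'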

For S3, enable Intelligent-Tiering when you cannot predict access patterns. It monitors per-object access and moves objects between tiers automatically with no retrieval fees, at the cost of a small per-object monitoring charge:

aws s3api put-bucket-intelligent-tiering-configuration \
  --bucket myorg-data-2026 \
  --id entire-bucket \
  --intelligent-tiering-configuration '{
    "Id": "entire-bucket",
    "Status": "Enabled",
    "Tierings": [
      {"AccessTier": "ARCHIVE_ACCESS", "Days": 90},
      {"AccessTier": "DEEP_ARCHIVE_ACCESS", "Days": 180}
    ]
  }'

Network Cost Reduction#

Data transfer is the hidden cost in cloud bills. Inbound is usually free. Outbound to the internet and cross-region transfer costs add up fast.

Patterns to reduce network costs:

  • Keep traffic in the same AZ when possible. Cross-AZ transfer is cheap but not free ($0.01/GB in each direction on AWS).
  • Use VPC/VNet endpoints for cloud services. Traffic to S3, DynamoDB, or Azure Blob through an endpoint stays on the provider’s backbone and avoids NAT gateway charges.
  • Cache at the edge. CloudFront, Azure CDN, or Cloud CDN reduce origin egress.
  • Compress data in transit. Gzip API responses. Compress log streams before sending to storage.
  • Use private connectivity between clouds. If you run multi-cloud, dedicated interconnects (AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect) cost less per GB than internet transfer at scale.

# AWS: Create a VPC endpoint for S3 (eliminates NAT gateway data charges for S3)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-abc123

Cost Allocation and Tagging#

You cannot optimize what you cannot measure. Tags (labels in GCP) are the foundation of cost allocation. Without consistent tagging, cost reports show one large number with no way to attribute it.

Mandatory tags for every resource:

Tag Key       Purpose              Example
env           Environment          prod, staging, dev
team          Owning team          platform, backend, data
service       Application name     web-api, worker, ingestion
cost-center   Finance allocation   engineering, marketing

Enforce tags with policy:

# AWS: SCP or tag policy through Organizations
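# A sketch of creating the tag policy (attaching it to an OU is a separate step)
aws organizations create-policy \
  --name require-env-tag --type TAG_POLICY \
  --description "Require env tag" \
  --content '{"tags": {"env": {
    "tag_key": {"@@assign": "env"},
    "enforced_for": {"@@assign": ["ec2:instance"]}
  }}}'
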
# Azure: Azure Policy
az policy assignment create \
  --name require-env-tag \
  --policy "/providers/Microsoft.Authorization/policyDefinitions/POLICY_ID" \
  --params '{"tagName": {"value": "env"}}' \
  --scope /subscriptions/SUB_ID

# GCP: Organization Policy constraints
gcloud resource-manager org-policies set-policy policy.yaml --project=my-prod-project

FinOps Practices#

FinOps is the discipline of managing cloud spend as a team sport between engineering, finance, and management. Practical habits that make a difference:

  1. Weekly cost reviews. Pull cost data weekly and review trends. A 10% increase caught in week one is a conversation. Caught in month three, it is a crisis.
  2. Budgets and alerts. Set budgets at the account/subscription/project level and alert at 80% and 100%.
  3. Anomaly detection. All three providers offer anomaly detection on billing data. Enable it.
  4. Showback reports. Allocate costs to teams using tags and publish monthly reports. Teams that see their costs reduce them.
  5. Commitment coverage targets. Aim for 60-70% of steady-state compute covered by commitments. Higher coverage risks paying for unused reservations. Lower coverage leaves savings on the table.

# AWS: Create a budget with alert
aws budgets create-budget --account-id 123456789012 \
  --budget '{
    "BudgetName": "monthly-total",
    "BudgetLimit": {"Amount": "50000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {"NotificationType": "ACTUAL", "ComparisonOperator": "GREATER_THAN", "Threshold": 80},
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "ops@example.com"}]
  }]'
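
Anomaly detection (item 3 above) is a one-time setup per account. A sketch on AWS using Cost Explorer anomaly detection, assuming Cost Explorer is enabled (the monitor name, threshold, and address are placeholders):

# Monitor per-service spend and email a daily digest of anomalies over $100
aws ce create-anomaly-monitor \
  --anomaly-monitor '{"MonitorName": "per-service", "MonitorType": "DIMENSIONAL", "MonitorDimension": "SERVICE"}'

aws ce create-anomaly-subscription \
  --anomaly-subscription '{
    "SubscriptionName": "daily-digest",
    "MonitorArnList": ["MONITOR_ARN"],
    "Subscribers": [{"Type": "EMAIL", "Address": "ops@example.com"}],
    "Threshold": 100,
    "Frequency": "DAILY"
  }'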

Decision Matrix: Which Optimization to Apply#

Situation                       First Action                  Expected Savings
Untagged resources everywhere   Implement tagging policy      Enables all other optimizations
CPU averaging under 20%         Right-size instances          30-50% on those instances
Stable, predictable workloads   Savings plans or CUDs         30-60% on committed usage
Batch or CI/CD workloads        Move to spot/preemptible      60-90% on those workloads
Large S3/Blob/GCS bills         Implement lifecycle tiering   40-70% on storage costs
High data transfer bills        VPC endpoints + CDN           20-50% on network costs
No cost visibility              Weekly reviews + budgets      Prevents runaway spend

Start with tagging and visibility. Then eliminate waste. Then right-size. Then commit. Every step depends on the one before it.