Cloud Behavioral Divergence Guide#

Running the “same” workload on AWS, Azure, and GCP does not produce the same behavior. The Kubernetes API is portable, application containers are portable, and SQL queries are portable. Everything else – identity, networking, storage, load balancing, DNS, and managed service behavior – diverges in ways that matter for production reliability.

This guide documents the specific divergence points with practical examples. Use it when translating infrastructure from one cloud to another, when debugging behavior that differs between environments, or when assessing migration risk.

IAM Model Differences#

Identity and access management is the most significant behavioral divergence between clouds. Each cloud has a fundamentally different model for how workloads authenticate and authorize.

AWS: IAM Roles and IRSA#

AWS IAM is built around roles and policies. A role is an identity that can be assumed by users, services, or other AWS accounts. A policy is a JSON document specifying which API actions are allowed on which resources.

For Kubernetes workloads on EKS, IRSA (IAM Roles for Service Accounts) bridges Kubernetes ServiceAccounts to IAM roles using OIDC federation. The flow:

  1. EKS cluster has an OIDC provider registered with IAM
  2. A Kubernetes ServiceAccount is annotated with an IAM role ARN
  3. When a pod using that ServiceAccount is created, the EKS Pod Identity Webhook injects a projected web identity token and the role ARN into the pod
  4. The AWS SDK presents that token to AWS STS (AssumeRoleWithWebIdentity) and receives temporary IAM credentials
  5. The pod assumes the IAM role with its attached policies

# Create the IAM role with a trust policy for the OIDC provider
aws iam create-role --role-name app-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"},
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:default:app-sa"
        }
      }
    }]
  }'

# Attach a policy
aws iam attach-role-policy --role-name app-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# Annotate the Kubernetes ServiceAccount
kubectl annotate serviceaccount app-sa \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/app-role

Key behavioral detail: IRSA tokens are projected into the pod at /var/run/secrets/eks.amazonaws.com/serviceaccount/token, and AWS SDKs automatically detect and use this token. If the OIDC provider is not configured, or the trust policy condition does not match the ServiceAccount’s namespace and name exactly, the pod silently gets no credentials – nothing fails at pod startup; the failure surfaces only at the first AWS API call.
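
A quick way to confirm IRSA is wired up end to end, assuming a pod named app-pod whose image contains the AWS CLI (both names are placeholders):

# Confirm the webhook injected the role ARN and the projected token path
kubectl exec app-pod -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'

# Confirm STS resolves the pod to the intended role
kubectl exec app-pod -- aws sts get-caller-identity

If the second command errors or returns an unexpected identity, the usual culprits are the trust policy condition (namespace/name mismatch) or a missing ServiceAccount annotation.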

GCP: Service Accounts and Workload Identity#

GCP uses service accounts as the machine identity primitive. A service account is a GCP identity with an email address (name@project.iam.gserviceaccount.com) and IAM role bindings at various scopes (organization, folder, project, resource).

For GKE workloads, Workload Identity maps Kubernetes ServiceAccounts to GCP service accounts without key files:

  1. GKE cluster has a Workload Identity pool
  2. A Kubernetes ServiceAccount is annotated with a GCP service account email
  3. The GCP service account has an IAM binding allowing the Kubernetes ServiceAccount to impersonate it
  4. GKE’s metadata server intercepts credential requests from pods and returns GCP tokens

# Create the GCP service account
gcloud iam service-accounts create app-sa \
  --display-name="Application Service Account"

# Grant the GCP SA permissions
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:app-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# Allow the Kubernetes SA to impersonate the GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  app-sa@my-project.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:my-project.svc.id.goog[default/app-sa]"

# Annotate the Kubernetes ServiceAccount
kubectl annotate serviceaccount app-sa \
  iam.gke.io/gcp-service-account=app-sa@my-project.iam.gserviceaccount.com

Key behavioral detail: Workload Identity depends on the GKE metadata server, a DaemonSet that runs on Workload Identity-enabled node pools. Pods query 169.254.169.254 for tokens; the GKE metadata server intercepts these requests and returns GCP credentials for the mapped service account. If Workload Identity is not enabled on the node pool, pods fall back to the node’s default service account, which typically has broader permissions than intended. This is a security risk that does not exist on EKS (where pods with no IRSA annotation simply have no AWS credentials).
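
One way to detect the fallback from inside the cluster, assuming a pod named app-pod with curl available (placeholder name):

# Ask the metadata server which identity the pod actually holds
kubectl exec app-pod -- curl -s -H "Metadata-Flavor: Google" \
  "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/email"

# Expected: app-sa@my-project.iam.gserviceaccount.com
# If this returns the Compute Engine default service account
# (PROJECT_NUMBER-compute@developer.gserviceaccount.com), the pod is running
# with the node's identity and Workload Identity is not in effect for it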

Azure: Managed Identities and Azure Workload Identity#

Azure uses Managed Identities (system-assigned or user-assigned) for Azure-hosted resources and Azure Workload Identity for AKS workloads. Managed Identities eliminate credential management entirely – Azure rotates the credentials automatically.

For AKS workloads, Azure Workload Identity (the successor to the deprecated AAD Pod Identity) uses OIDC federation similar to AWS IRSA:

  1. AKS cluster has an OIDC issuer URL
  2. A user-assigned managed identity is created with a federated credential pointing to the AKS OIDC issuer, namespace, and ServiceAccount name
  3. The Kubernetes ServiceAccount is annotated with the managed identity’s client ID
  4. The Azure Identity SDK exchanges the projected token for Azure AD tokens

# Create a user-assigned managed identity
az identity create --resource-group prod-rg --name app-identity

# Get the client ID of the managed identity
CLIENT_ID=$(az identity show --resource-group prod-rg --name app-identity \
  --query clientId --output tsv)

# Create a federated credential
az identity federated-credential create \
  --identity-name app-identity \
  --resource-group prod-rg \
  --name app-federated-cred \
  --issuer "$(az aks show --resource-group prod-rg --name prod-cluster \
    --query oidcIssuerProfile.issuerUrl --output tsv)" \
  --subject "system:serviceaccount:default:app-sa" \
  --audiences "api://AzureADTokenExchange"

# Assign a role
az role assignment create --assignee $CLIENT_ID \
  --role "Storage Blob Data Reader" \
  --scope /subscriptions/SUB_ID/resourceGroups/prod-rg

Key behavioral detail: Azure Workload Identity relies on a mutating admission webhook running in the AKS cluster (installed automatically when the cluster’s workload identity feature is enabled). The webhook injects environment variables (AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE) and a projected token volume into pods that use an annotated ServiceAccount and carry the azure.workload.identity/use: "true" label. If the webhook is not running or the AKS OIDC issuer is not enabled, the annotation does nothing.
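
The Kubernetes side mirrors the AWS and GCP examples. A minimal sketch – note that the label must be present on the pod template at creation time, or the webhook ignores the pod:

# Annotate the Kubernetes ServiceAccount with the managed identity's client ID
kubectl annotate serviceaccount app-sa \
  azure.workload.identity/client-id=$CLIENT_ID

# The pod template must also carry this label so the webhook mutates new pods:
#   metadata:
#     labels:
#       azure.workload.identity/use: "true"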

IAM Summary Table#

| Aspect | AWS (IRSA) | GCP (Workload Identity) | Azure (Workload Identity) |
| --- | --- | --- | --- |
| Machine identity | IAM Role | GCP Service Account | Managed Identity |
| Pod-to-cloud mapping | ServiceAccount annotation + OIDC trust | ServiceAccount annotation + IAM binding | ServiceAccount annotation + federated credential |
| Token location | /var/run/secrets/eks.amazonaws.com/... | GKE metadata server (169.254.169.254) | Projected volume (path set by webhook) |
| Failure mode if misconfigured | No credentials at API call time | Falls back to node SA (too-broad access) | No credentials (webhook injects nothing) |
| Setup complexity | Medium (OIDC provider + trust policy) | Medium (WI pool + IAM binding) | High (identity + federated cred + webhook) |

Networking Differences#

VPC/VNet Architecture#

AWS VPCs are regional. A VPC spans all availability zones in a region, but subnets are AZ-specific. Each subnet has a route table. VPC peering connects two VPCs (same or cross-region, same or cross-account). Transit Gateway connects many VPCs in a hub-and-spoke model.

GCP VPCs are global. A single VPC spans all regions. Subnets are regional. This means two subnets in different regions within the same VPC can communicate without peering. VPC Network Peering connects two VPCs and is bidirectional (but requires setup from both sides).

Azure VNets are regional, similar to AWS VPCs. VNet Peering connects two VNets (same or cross-region, same or cross-subscription). Virtual WAN provides hub-and-spoke connectivity.

The practical difference: on GCP, cross-region communication within a VPC “just works” because the VPC is global. On AWS and Azure, cross-region communication requires explicit VPC/VNet peering or a Transit Gateway/Virtual WAN.
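
A small illustration of the global-VPC model, using hypothetical names (shared-vpc and two regional subnets). Instances in the two subnets can reach each other over internal IPs with no peering:

# One global VPC with custom subnets in two regions
gcloud compute networks create shared-vpc --subnet-mode=custom

gcloud compute networks subnets create app-us \
  --network=shared-vpc --region=us-east1 --range=10.10.0.0/20

gcloud compute networks subnets create app-eu \
  --network=shared-vpc --region=europe-west1 --range=10.20.0.0/20

The AWS equivalent is two regional VPCs plus inter-region peering or a Transit Gateway; the Azure equivalent is two VNets plus global VNet peering.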

Firewall Models#

| Concept | AWS | GCP | Azure |
| --- | --- | --- | --- |
| Instance-level firewall | Security Groups (stateful) | Firewall Rules with target tags (stateful) | NSGs on NIC (stateful) |
| Subnet-level firewall | NACLs (stateless) | N/A (firewall rules are VPC-wide) | NSGs on subnet (stateful) |
| Rule evaluation | All rules evaluated (allow wins) | Priority-ordered (first match wins) | Priority-ordered (first match wins) |
| Default behavior | Deny all inbound, allow all outbound | Deny all inbound, allow all outbound | Deny all inbound by default |
| Scope | Per-instance (SG attached to ENI) | Per-VPC (targeted by tags or service accounts) | Per-NIC or per-subnet |

Gotcha: AWS Security Groups evaluate all rules and allow traffic if any rule permits it. GCP and Azure firewall rules are priority-ordered and stop at the first match. A rule at priority 100 (allow) overrides a rule at priority 200 (deny) on GCP and Azure, but on AWS there is no priority – all allow rules are additive.
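
A quick illustration of priority ordering on GCP (hypothetical network name shared-vpc). Because 100 beats 200, the allow rule wins even though a deny rule exists for the same traffic:

# Allow TCP/8080 at priority 100
gcloud compute firewall-rules create allow-app \
  --network=shared-vpc --priority=100 --direction=INGRESS \
  --action=ALLOW --rules=tcp:8080 --source-ranges=10.0.0.0/8

# Deny the same port at priority 200 – this rule never fires
gcloud compute firewall-rules create deny-app \
  --network=shared-vpc --priority=200 --direction=INGRESS \
  --action=DENY --rules=tcp:8080 --source-ranges=10.0.0.0/8

On AWS there is nothing to order: Security Groups have no deny rules at all, so any matching allow rule admits the traffic.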

Load Balancer Behavior#

| Feature | AWS | GCP | Azure |
| --- | --- | --- | --- |
| L7 load balancer | ALB (Application Load Balancer) | HTTP(S) Load Balancer (global) | Application Gateway |
| L4 load balancer | NLB (Network Load Balancer) | Network Load Balancer (regional/global) | Azure Load Balancer |
| K8s integration | AWS Load Balancer Controller | GCE Ingress Controller (built-in) | cloud-provider-azure (built-in); AGIC for Application Gateway |
| Default scope | Regional | Global (HTTP) or Regional (TCP/UDP) | Regional |
| SSL termination | ACM certificates on ALB | Google-managed certificates | Azure Key Vault certificates |
| Health checks | Target group health checks | Backend service health checks | Health probes |

Gotcha: GCP’s HTTP(S) Load Balancer is global by default – it has a single anycast IP that routes to the nearest backend. AWS ALB and Azure Application Gateway are regional. If you design around GCP’s global load balancing and then migrate to AWS, you need CloudFront or AWS Global Accelerator in front of the ALB to get similar global routing.

Storage Driver Differences#

Block Storage for Kubernetes#

| Aspect | AWS (EBS CSI) | GCP (PD CSI) | Azure (Disk CSI) |
| --- | --- | --- | --- |
| CSI driver | ebs.csi.aws.com | pd.csi.storage.gke.io | disk.csi.azure.com |
| Default storage class | gp3 | standard-rwo (pd-balanced) | managed-csi (StandardSSD_LRS) |
| High-performance option | io2 (up to 256K IOPS) | pd-ssd (up to 100K IOPS) | managed-csi-premium (Premium_LRS) |
| Volume attachment | One AZ, one node | One zone, one node | One zone, one node |
| Resize support | Online resize (gp2, gp3, io1, io2) | Online resize | Online resize |
| Snapshot support | EBS Snapshots | Persistent Disk Snapshots | Azure Disk Snapshots |
| Max volume size | 16 TiB (gp3); 64 TiB (io2 Block Express) | 64 TiB (pd-ssd) | 32 TiB (Premium_LRS) |

Gotcha: EBS volumes are AZ-locked. If a pod is rescheduled to a node in a different AZ, the PVC cannot follow. This is the same on all three clouds for block storage, but the failure manifests differently. On EKS, you get AttachVolume.Attach failed for volume: ...node ... is in different AZ from PV. On GKE, you get a similar zone mismatch error. The fix is the same (topology-aware scheduling), but the error messages and the zone label format differ (topology.kubernetes.io/zone values look like us-east-1a on AWS, us-east1-b on GCP, and eastus-1 on Azure).
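
The usual mitigation is a StorageClass with WaitForFirstConsumer, so the volume is provisioned only after the pod is scheduled and lands in the pod's zone. A minimal sketch for the EBS CSI driver (the class name is arbitrary):

# Delay volume binding until the pod is scheduled so PV and pod share a zone
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-topology-aware
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF

The same volumeBindingMode works unchanged with pd.csi.storage.gke.io and disk.csi.azure.com; only the provisioner and parameters change.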

Object Storage from Kubernetes#

| Aspect | AWS (S3) | GCP (Cloud Storage) | Azure (Blob Storage) |
| --- | --- | --- | --- |
| CSI driver (FUSE) | Mountpoint for S3 (s3.csi.aws.com) | Cloud Storage FUSE (gcsfuse.csi.storage.gke.io) | Blob CSI (blob.csi.azure.com) |
| SDK/API | AWS SDK (S3 API) | Google Cloud Client Libraries | Azure SDK |
| S3-compatible API | Native | Via interop XML API | Not natively (use SDK) |
| Auth from pods | IRSA | Workload Identity | Azure Workload Identity |

Gotcha: GCP Cloud Storage offers an S3-compatible XML API for interoperability, but not all S3 features are available (e.g., no object lock, no S3 Select). Azure Blob Storage does not support the S3 API at all – applications using S3 SDKs must be rewritten against the Azure SDK or fronted with an S3-compatible gateway such as MinIO.
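
A sketch of the interop path, assuming HMAC keys have been created for the GCP service account (for example with gsutil hmac create) and exported as standard AWS-style variables:

# HMAC credentials issued for the GCP service account (placeholders)
export AWS_ACCESS_KEY_ID="GOOG1EXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret"

# Point any S3 client at the Cloud Storage XML API endpoint
aws s3 ls s3://my-gcs-bucket --endpoint-url https://storage.googleapis.com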

Managed Database Behavioral Differences#

Connection Methods#

| Aspect | AWS (RDS) | GCP (Cloud SQL) | Azure (Azure Database) |
| --- | --- | --- | --- |
| Standard connection | Endpoint DNS + username/password | IP + username/password | Hostname + username/password |
| IAM auth | IAM database authentication (token-based) | Cloud SQL IAM database authentication | Azure AD authentication |
| Proxy/sidecar | RDS Proxy (connection pooling + IAM auth) | Cloud SQL Auth Proxy (sidecar container) | No equivalent (direct connection) |
| Private connectivity | VPC endpoints (PrivateLink) | Private Services Access or PSC | Private Endpoints |
| From K8s pods | VPC-internal endpoint or RDS Proxy | Cloud SQL Auth Proxy sidecar | Private endpoint or direct |

Gotcha: Cloud SQL Auth Proxy is almost always required for GKE workloads connecting to Cloud SQL. It handles SSL, IAM authentication, and connection management. There is no equivalent automatic sidecar injection – you must add the proxy as a sidecar container in your pod spec. Forgetting the proxy is a common migration failure when moving from AWS (where RDS is reachable directly from the VPC) to GCP.
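
A minimal sidecar sketch, assuming the v2 proxy image and a hypothetical instance connection name my-project:us-central1:prod-db. The application connects to 127.0.0.1:5432; the proxy handles TLS and IAM auth using the pod's Workload Identity credentials:

# Container to append to the pod spec of the workload that needs the database
cat <<'EOF' > cloud-sql-proxy-sidecar.yaml
- name: cloud-sql-proxy
  image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.11.0  # pin to a current 2.x release
  args:
    - "--port=5432"
    - "my-project:us-central1:prod-db"
  securityContext:
    runAsNonRoot: true
EOF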

Failover Behavior#

| Aspect | AWS (RDS Multi-AZ) | GCP (Cloud SQL HA) | Azure (Azure SQL) |
| --- | --- | --- | --- |
| HA mechanism | Synchronous replication to standby | Regional instance with failover replica | Zone-redundant or geo-replication |
| Failover time | 60-120 seconds | ~60 seconds | Typically under 30 seconds |
| DNS behavior | Same endpoint, DNS TTL update | Same IP, transparent failover | Same connection string |
| Connection drop | Yes – applications must reconnect | Yes – applications must reconnect | Yes – applications must reconnect |
| Read replicas | Cross-region read replicas (async) | Cross-region read replicas (async) | Active geo-replication (async) |

All three clouds drop connections during failover. Applications must handle reconnection. The difference is in DNS propagation – AWS RDS updates the DNS CNAME to point to the new primary, which means applications caching DNS may continue connecting to the old (now standby) instance. GCP Cloud SQL keeps the same IP. Azure keeps the same connection string.
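
A quick way to see the DNS side of an RDS failover (hypothetical endpoint name) is to compare what DNS returns now with what a long-running process resolved earlier:

# The RDS endpoint is a CNAME; after failover it points at the new primary
dig +short CNAME mydb.abc123xyz.us-east-1.rds.amazonaws.com

# Clients that cache lookups longer than the record's TTL (some language
# runtimes and connection pools do) keep dialing the old primary until
# their cache expires or they are restarted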

Backup Patterns#

| Aspect | AWS (RDS) | GCP (Cloud SQL) | Azure (Azure Database) |
| --- | --- | --- | --- |
| Automated backups | Daily, retention 1-35 days | Daily, retention 1-365 days | Daily, retention 1-35 days |
| Point-in-time recovery | To any second within retention | To any second within retention | To any second within retention |
| Manual snapshots | Unlimited, persist until deleted | On-demand backups | Long-term retention (LTR) |
| Cross-region backups | Copy snapshot to another region | Cross-region backup (automated) | Geo-redundant backup storage |
| Backup storage cost | Free up to DB size, then per-GB | Included in instance cost (to a limit) | Included (LRS), extra for GRS |

DNS and Service Discovery#

| Aspect | AWS | GCP | Azure |
| --- | --- | --- | --- |
| Managed DNS | Route 53 | Cloud DNS | Azure DNS |
| Private DNS zones | Route 53 Private Hosted Zones (per VPC) | Cloud DNS Private Zones (per VPC network) | Azure Private DNS Zones (per VNet) |
| Service discovery | Cloud Map | Service Directory | N/A (use Private DNS or Traffic Manager) |
| K8s external DNS | ExternalDNS with Route 53 provider | ExternalDNS with Cloud DNS provider | ExternalDNS with Azure DNS provider |
| Split-horizon DNS | Supported (private + public zones same name) | Supported | Supported |

Gotcha: Route 53 Private Hosted Zones must be explicitly associated with each VPC that needs to resolve the records. If you peer two VPCs, the peered VPC does not automatically get access to the other VPC’s private hosted zones – you must create an association. GCP Private DNS Zones work similarly (must be attached to VPC networks). Azure Private DNS Zones must be linked to each VNet.
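
Associating an existing private hosted zone with an additional VPC looks like this (IDs are placeholders). For cross-account associations, the zone-owning account must first run create-vpc-association-authorization:

# Allow the peered VPC to resolve records in the private hosted zone
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id Z0123456789EXAMPLE \
  --vpc VPCRegion=us-west-2,VPCId=vpc-0abc1234def567890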

Cross-Cloud Gotcha Table#

| Behavior | AWS | GCP | Azure | Trap |
| --- | --- | --- | --- | --- |
| VPC scope | Regional | Global | Regional | GCP cross-region traffic within a VPC just works; on AWS/Azure you need peering |
| Default pod networking | VPC CNI (pods get VPC IPs) | Native GKE networking (alias IPs) | Azure CNI (pods get VNet IPs) or kubenet | IP exhaustion risk differs – the AWS VPC CNI assigns every pod a secondary IP from the subnet, consuming subnet IPs fast |
| Pod identity fallback | No credentials | Node SA (too-broad) | No credentials | GCP Workload Identity misconfiguration silently grants broad node-level access |
| Load balancer scope | Regional | Global (HTTP) | Regional | Moving from GCP global LB to AWS requires adding CloudFront or Global Accelerator |
| IAM policy language | JSON (allow/deny, resource ARNs) | IAM roles (predefined or custom) | RBAC (role definitions + scope) | AWS IAM policies are the most granular; GCP and Azure use role-based, not resource-based, defaults |
| Storage class naming | gp3, gp2, io1, io2 | standard-rwo, premium-rwo | managed-csi, managed-csi-premium | Hardcoded StorageClass names in manifests break on cloud migration |
| Metadata endpoint | 169.254.169.254 (IMDSv2) | 169.254.169.254 | 169.254.169.254 | Same IP, different response formats and auth mechanisms |
| NAT gateway cost | ~$32/mo + $0.045/GB | Cloud NAT per-VM charge + $0.045/GB | Azure NAT Gateway ~$32/mo + $0.045/GB | GCP NAT charges per VM using it, not a flat fee; can be cheaper or more expensive |
| Private DB access | VPC endpoint (PrivateLink) | Private Services Access or PSC | Private Endpoint | Three different private connectivity models with different setup requirements |
| Container registry | ECR (per-region) | Artifact Registry (regional or multi-region) | ACR (regional; geo-replication on Premium) | ECR images are regional; pulling cross-region adds latency and egress cost |
| K8s version lag | EKS: often 1-2 months behind upstream | GKE: Rapid channel available day one | AKS: usually 1-3 months behind | GKE’s Rapid channel gets new K8s versions weeks before EKS/AKS |
| Egress pricing | $0.09/GB (first 10 TB) | $0.12/GB (first 1 TB) | $0.087/GB (first 5 TB) | GCP is the most expensive for egress at low volumes; all three get cheaper at scale |
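
The metadata endpoint row deserves a concrete illustration – same IP, three different handshakes. The GCP and Azure calls work as-is from a VM or pod; the AWS call assumes IMDSv2 is enforced:

# AWS – IMDSv2 requires a session token first
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id

# GCP – requires the Metadata-Flavor header
curl -s -H "Metadata-Flavor: Google" \
  http://169.254.169.254/computeMetadata/v1/instance/id

# Azure – requires the Metadata header and an api-version query parameter
curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/instance?api-version=2021-02-01"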

Practical Translation Guide#

When migrating a workload or translating infrastructure between clouds, work through these layers in order:

  1. Application containers – Usually no changes. OCI images are portable; verify that the CPU architecture (AMD64 vs ARM64) matches the target node pools.

  2. Kubernetes manifests – Remove or translate cloud-specific annotations. Update StorageClass references. Update Ingress annotations for the target cloud’s LB controller (see the annotation sketch after this list).

  3. IAM integration – Rewrite entirely. IRSA trust policies do not translate to Workload Identity bindings or Azure federated credentials. The identity model is different on each cloud.

  4. Networking – Redesign VPC/VNet architecture for the target cloud’s model. Translate security groups to NSGs or firewall rules. Update CIDR ranges if there are conflicts.

  5. Managed services – Replace or reconfigure. RDS becomes Cloud SQL or Azure Database. S3 becomes Cloud Storage or Blob Storage. Update connection strings, authentication methods, and backup configurations.

  6. Terraform/IaC – Rewrite provider-specific resources. The Terraform Kubernetes provider resources are portable. The aws, google, and azurerm provider resources are not.

  7. Monitoring and logging – Replace or use a portable layer (Prometheus, Grafana, OpenTelemetry). CloudWatch, Cloud Monitoring, and Azure Monitor are not interchangeable.
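
As an example of item 2, here is the same Ingress expressed against each cloud’s default L7 controller. This is a sketch of the class and one or two representative annotations, not a complete spec; each controller has a much larger annotation surface of its own:

# AWS – AWS Load Balancer Controller (provisions an ALB)
#   spec.ingressClassName: alb
#   alb.ingress.kubernetes.io/scheme: internet-facing
#   alb.ingress.kubernetes.io/target-type: ip

# GCP – built-in GCE Ingress controller (provisions a global HTTP(S) LB)
#   kubernetes.io/ingress.class: "gce"
#   kubernetes.io/ingress.global-static-ip-name: app-ip   # optional static IP

# Azure – Application Gateway Ingress Controller (AGIC)
#   kubernetes.io/ingress.class: azure/application-gateway
#   appgw.ingress.kubernetes.io/ssl-redirect: "true"      # example AGIC annotation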

Items 1 and 2 are usually days of work. Items 3 through 7 are weeks to months, depending on the complexity of the integration.