Cloud Behavioral Divergence Guide#

Running the “same” workload on AWS, Azure, and GCP does not produce the same behavior. The Kubernetes API is portable, application containers are portable, and SQL queries are portable. Everything else – identity, networking, storage, load balancing, DNS, and managed service behavior – diverges in ways that matter for production reliability.

This guide documents the specific divergence points with practical examples. Use it when translating infrastructure from one cloud to another, when debugging behavior that differs between environments, or when assessing migration risk.

IAM Model Differences#

Identity and access management is the most significant behavioral divergence between clouds. Each cloud has a fundamentally different model for how workloads authenticate and authorize.

AWS: IAM Roles and IRSA#

AWS IAM is built around roles and policies. A role is an identity that can be assumed by users, services, or other AWS accounts. A policy is a JSON document specifying which API actions are allowed on which resources.

For Kubernetes workloads on EKS, IRSA (IAM Roles for Service Accounts) bridges Kubernetes ServiceAccounts to IAM roles using OIDC federation. The flow:

  1. EKS cluster has an OIDC provider registered with IAM
  2. A Kubernetes ServiceAccount is annotated with an IAM role ARN
  3. When a pod using that ServiceAccount is created, the EKS Pod Identity Webhook injects a projected web identity token and the role ARN into the pod
  4. The AWS SDK presents that token to AWS STS (AssumeRoleWithWebIdentity) and receives temporary IAM credentials
  5. The pod assumes the IAM role with its attached policies

# Create the IAM role with a trust policy for the OIDC provider
aws iam create-role --role-name app-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"},
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:default:app-sa"
        }
      }
    }]
  }'

# Attach a policy
aws iam attach-role-policy --role-name app-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# Annotate the Kubernetes ServiceAccount
kubectl annotate serviceaccount app-sa \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/app-role

Key behavioral detail: IRSA tokens are projected into the pod at /var/run/secrets/eks.amazonaws.com/serviceaccount/token, and AWS SDKs automatically detect and use this token. If the OIDC provider is not configured, or the trust policy condition does not match the ServiceAccount’s namespace and name exactly, the pod silently gets no credentials – nothing fails at pod startup; the failure surfaces only at the first AWS API call.
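
A quick way to confirm IRSA is wired up end to end, assuming a pod named app-pod whose image contains the AWS CLI (both names are placeholders):

# Confirm the webhook injected the role ARN and the projected token path
kubectl exec app-pod -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'

# Confirm STS resolves the pod to the intended role
kubectl exec app-pod -- aws sts get-caller-identity

If the second command errors or returns an unexpected identity, the usual culprits are the trust policy condition (namespace/name mismatch) or a missing ServiceAccount annotation.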

GCP: Service Accounts and Workload Identity#

GCP uses service accounts as the machine identity primitive. A service account is a GCP identity with an email address (name@project.iam.gserviceaccount.com) and IAM role bindings at various scopes (organization, folder, project, resource).

For GKE workloads, Workload Identity maps Kubernetes ServiceAccounts to GCP service accounts without key files:

  1. GKE cluster has a Workload Identity pool
  2. A Kubernetes ServiceAccount is annotated with a GCP service account email
  3. The GCP service account has an IAM binding allowing the Kubernetes ServiceAccount to impersonate it
  4. GKE’s metadata server intercepts credential requests from pods and returns GCP tokens

# Create the GCP service account
gcloud iam service-accounts create app-sa \
  --display-name="Application Service Account"

# Grant the GCP SA permissions
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:app-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# Allow the Kubernetes SA to impersonate the GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  app-sa@my-project.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:my-project.svc.id.goog[default/app-sa]"

# Annotate the Kubernetes ServiceAccount
kubectl annotate serviceaccount app-sa \
  iam.gke.io/gcp-service-account=app-sa@my-project.iam.gserviceaccount.com

Key behavioral detail: Workload Identity depends on the GKE metadata server, a DaemonSet that runs on Workload Identity-enabled node pools. Pods query 169.254.169.254 for tokens; the GKE metadata server intercepts these requests and returns GCP credentials for the mapped service account. If Workload Identity is not enabled on the node pool, pods fall back to the node’s default service account, which typically has broader permissions than intended. This is a security risk that does not exist on EKS (where pods with no IRSA annotation simply have no AWS credentials).
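
One way to detect the fallback from inside the cluster, assuming a pod named app-pod with curl available (placeholder name):

# Ask the metadata server which identity the pod actually holds
kubectl exec app-pod -- curl -s -H "Metadata-Flavor: Google" \
  "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/email"

# Expected: app-sa@my-project.iam.gserviceaccount.com
# If this returns the Compute Engine default service account
# (PROJECT_NUMBER-compute@developer.gserviceaccount.com), the pod is running
# with the node's identity and Workload Identity is not in effect for it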

Azure: Managed Identities and Azure Workload Identity#

Azure uses Managed Identities (system-assigned or user-assigned) for Azure-hosted resources and Azure Workload Identity for AKS workloads. Managed Identities eliminate credential management entirely – Azure rotates the credentials automatically.

For AKS workloads, Azure Workload Identity (the successor to the deprecated AAD Pod Identity) uses OIDC federation similar to AWS IRSA:

  1. AKS cluster has an OIDC issuer URL
  2. A user-assigned managed identity is created with a federated credential pointing to the AKS OIDC issuer, namespace, and ServiceAccount name
  3. The Kubernetes ServiceAccount is annotated with the managed identity’s client ID
  4. The Azure Identity SDK exchanges the projected token for Azure AD tokens

# Create a user-assigned managed identity
az identity create --resource-group prod-rg --name app-identity

# Get the client ID of the managed identity
CLIENT_ID=$(az identity show --resource-group prod-rg --name app-identity \
  --query clientId --output tsv)

# Create a federated credential
az identity federated-credential create \
  --identity-name app-identity \
  --resource-group prod-rg \
  --name app-federated-cred \
  --issuer "$(az aks show --resource-group prod-rg --name prod-cluster \
    --query oidcIssuerProfile.issuerUrl --output tsv)" \
  --subject "system:serviceaccount:default:app-sa" \
  --audiences "api://AzureADTokenExchange"

# Assign a role
az role assignment create --assignee $CLIENT_ID \
  --role "Storage Blob Data Reader" \
  --scope /subscriptions/SUB_ID/resourceGroups/prod-rg

Key behavioral detail: Azure Workload Identity relies on a mutating admission webhook running in the AKS cluster (installed automatically when the cluster’s workload identity feature is enabled). The webhook injects environment variables (AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE) and a projected token volume into pods that use an annotated ServiceAccount and carry the azure.workload.identity/use: "true" label. If the webhook is not running or the AKS OIDC issuer is not enabled, the annotation does nothing.
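
The Kubernetes side mirrors the AWS and GCP examples. A minimal sketch – note that the label must be present on the pod template at creation time, or the webhook ignores the pod:

# Annotate the Kubernetes ServiceAccount with the managed identity's client ID
kubectl annotate serviceaccount app-sa \
  azure.workload.identity/client-id=$CLIENT_ID

# The pod template must also carry this label so the webhook mutates new pods:
#   metadata:
#     labels:
#       azure.workload.identity/use: "true"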

IAM Summary Table#

| Aspect | AWS (IRSA) | GCP (Workload Identity) | Azure (Workload Identity) |
| --- | --- | --- | --- |
| Machine identity | IAM Role | GCP Service Account | Managed Identity |
| Pod-to-cloud mapping | ServiceAccount annotation + OIDC trust | ServiceAccount annotation + IAM binding | ServiceAccount annotation + federated credential |
| Token location | /var/run/secrets/eks.amazonaws.com/... | GKE metadata server (169.254.169.254) | Projected volume (path set by webhook) |
| Failure mode if misconfigured | No credentials at API call time | Falls back to node SA (too-broad access) | No credentials (webhook injects nothing) |
| Setup complexity | Medium (OIDC provider + trust policy) | Medium (WI pool + IAM binding) | High (identity + federated cred + webhook) |

Networking Differences#

VPC/VNet Architecture#

AWS VPCs are regional. A VPC spans all availability zones in a region, but subnets are AZ-specific. Each subnet has a route table. VPC peering connects two VPCs (same or cross-region, same or cross-account). Transit Gateway connects many VPCs in a hub-and-spoke model.

GCP VPCs are global. A single VPC spans all regions. Subnets are regional. This means two subnets in different regions within the same VPC can communicate without peering. VPC Network Peering connects two VPCs and is bidirectional (but requires setup from both sides).

Azure VNets are regional, similar to AWS VPCs. VNet Peering connects two VNets (same or cross-region, same or cross-subscription). Virtual WAN provides hub-and-spoke connectivity.

The practical difference: on GCP, cross-region communication within a VPC “just works” because the VPC is global. On AWS and Azure, cross-region communication requires explicit VPC/VNet peering or a Transit Gateway/Virtual WAN.
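
A small illustration of the global-VPC model, using hypothetical names (shared-vpc and two regional subnets). Instances in the two subnets can reach each other over internal IPs with no peering:

# One global VPC with custom subnets in two regions
gcloud compute networks create shared-vpc --subnet-mode=custom

gcloud compute networks subnets create app-us \
  --network=shared-vpc --region=us-east1 --range=10.10.0.0/20

gcloud compute networks subnets create app-eu \
  --network=shared-vpc --region=europe-west1 --range=10.20.0.0/20

The AWS equivalent is two regional VPCs plus inter-region peering or a Transit Gateway; the Azure equivalent is two VNets plus global VNet peering.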

Firewall Models#

| Concept | AWS | GCP | Azure |
| --- | --- | --- | --- |
| Instance-level firewall | Security Groups (stateful) | Firewall Rules with target tags (stateful) | NSGs on NIC (stateful) |
| Subnet-level firewall | NACLs (stateless) | N/A (firewall rules are VPC-wide) | NSGs on subnet (stateful) |
| Rule evaluation | All rules evaluated (allow wins) | Priority-ordered (first match wins) | Priority-ordered (first match wins) |
| Default behavior | Deny all inbound, allow all outbound | Deny all inbound, allow all outbound | Deny all inbound by default |
| Scope | Per-instance (SG attached to ENI) | Per-VPC (targeted by tags or service accounts) | Per-NIC or per-subnet |

Gotcha: AWS Security Groups evaluate all rules and allow traffic if any rule permits it. GCP and Azure firewall rules are priority-ordered and stop at the first match. A rule at priority 100 (allow) overrides a rule at priority 200 (deny) on GCP and Azure, but on AWS there is no priority – all allow rules are additive.
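
A quick illustration of priority ordering on GCP (hypothetical network name shared-vpc). Because 100 beats 200, the allow rule wins even though a deny rule exists for the same traffic:

# Allow TCP/8080 at priority 100
gcloud compute firewall-rules create allow-app \
  --network=shared-vpc --priority=100 --direction=INGRESS \
  --action=ALLOW --rules=tcp:8080 --source-ranges=10.0.0.0/8

# Deny the same port at priority 200 – this rule never fires
gcloud compute firewall-rules create deny-app \
  --network=shared-vpc --priority=200 --direction=INGRESS \
  --action=DENY --rules=tcp:8080 --source-ranges=10.0.0.0/8

On AWS there is nothing to order: Security Groups have no deny rules at all, so any matching allow rule admits the traffic.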

Load Balancer Behavior#

| Feature | AWS | GCP | Azure |
| --- | --- | --- | --- |
| L7 load balancer | ALB (Application Load Balancer) | HTTP(S) Load Balancer (global) | Application Gateway |
| L4 load balancer | NLB (Network Load Balancer) | Network Load Balancer (regional/global) | Azure Load Balancer |
| K8s integration | AWS Load Balancer Controller | GCE Ingress Controller (built-in) | cloud-provider-azure (built-in); AGIC for Application Gateway |
| Default scope | Regional | Global (HTTP) or Regional (TCP/UDP) | Regional |
| SSL termination | ACM certificates on ALB | Google-managed certificates | Azure Key Vault certificates |
| Health checks | Target group health checks | Backend service health checks | Health probes |

Gotcha: GCP’s HTTP(S) Load Balancer is global by default – it has a single anycast IP that routes to the nearest backend. AWS ALB and Azure Application Gateway are regional. If you design around GCP’s global load balancing and then migrate to AWS, you need CloudFront or AWS Global Accelerator in front of the ALB to get similar global routing.

Storage Driver Differences#

Block Storage for Kubernetes#

| Aspect | AWS (EBS CSI) | GCP (PD CSI) | Azure (Disk CSI) |
| --- | --- | --- | --- |
| CSI driver | ebs.csi.aws.com | pd.csi.storage.gke.io | disk.csi.azure.com |
| Default storage class | gp3 | standard-rwo (pd-balanced) | managed-csi (StandardSSD_LRS) |
| High-performance option | io2 (up to 256K IOPS) | pd-ssd (up to 100K IOPS) | managed-csi-premium (Premium_LRS) |
| Volume attachment | One AZ, one node | One zone, one node | One zone, one node |
| Resize support | Online resize (gp2, gp3, io1, io2) | Online resize | Online resize |
| Snapshot support | EBS Snapshots | Persistent Disk Snapshots | Azure Disk Snapshots |
| Max volume size | 16 TiB (gp3); 64 TiB (io2 Block Express) | 64 TiB (pd-ssd) | 32 TiB (Premium_LRS) |

Gotcha: EBS volumes are AZ-locked. If a pod is rescheduled to a node in a different AZ, the PVC cannot follow. This is the same on all three clouds for block storage, but the failure manifests differently. On EKS, you get AttachVolume.Attach failed for volume: ...node ... is in different AZ from PV. On GKE, you get a similar zone mismatch error. The fix is the same (topology-aware scheduling), but the error messages and the zone label format differ (topology.kubernetes.io/zone values look like us-east-1a on AWS, us-east1-b on GCP, and eastus-1 on Azure).
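
The usual mitigation is a StorageClass with WaitForFirstConsumer, so the volume is provisioned only after the pod is scheduled and lands in the pod's zone. A minimal sketch for the EBS CSI driver (the class name is arbitrary):

# Delay volume binding until the pod is scheduled so PV and pod share a zone
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-topology-aware
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF

The same volumeBindingMode works unchanged with pd.csi.storage.gke.io and disk.csi.azure.com; only the provisioner and parameters change.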

Object Storage from Kubernetes#

| Aspect | AWS (S3) | GCP (Cloud Storage) | Azure (Blob Storage) |
| --- | --- | --- | --- |
| CSI driver (FUSE) | Mountpoint for S3 (s3.csi.aws.com) | Cloud Storage FUSE (gcsfuse.csi.storage.gke.io) | Blob CSI (blob.csi.azure.com) |
| SDK/API | AWS SDK (S3 API) | Google Cloud Client Libraries | Azure SDK |
| S3-compatible API | Native | Via interop XML API | Not natively (use SDK) |
| Auth from pods | IRSA | Workload Identity | Azure Workload Identity |

Gotcha: GCP Cloud Storage offers an S3-compatible XML API for interoperability, but not all S3 features are available (e.g., no object lock, no S3 Select). Azure Blob Storage does not support the S3 API at all – applications using S3 SDKs must be rewritten against the Azure SDK or fronted with an S3-compatible gateway such as MinIO.
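
A sketch of the interop path, assuming HMAC keys have been created for the GCP service account (for example with gsutil hmac create) and exported as standard AWS-style variables:

# HMAC credentials issued for the GCP service account (placeholders)
export AWS_ACCESS_KEY_ID="GOOG1EXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret"

# Point any S3 client at the Cloud Storage XML API endpoint
aws s3 ls s3://my-gcs-bucket --endpoint-url https://storage.googleapis.com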

Managed Database Behavioral Differences#

Connection Methods#

| Aspect | AWS (RDS) | GCP (Cloud SQL) | Azure (Azure Database) |
| --- | --- | --- | --- |
| Standard connection | Endpoint DNS + username/password | IP + username/password | Hostname + username/password |
| IAM auth | IAM database authentication (token-based) | Cloud SQL IAM database authentication | Azure AD authentication |
| Proxy/sidecar | RDS Proxy (connection pooling + IAM auth) | Cloud SQL Auth Proxy (sidecar container) | No equivalent (direct connection) |
| Private connectivity | VPC endpoints (PrivateLink) | Private Services Access or PSC | Private Endpoints |
| From K8s pods | VPC-internal endpoint or RDS Proxy | Cloud SQL Auth Proxy sidecar | Private endpoint or direct |

Gotcha: Cloud SQL Auth Proxy is almost always required for GKE workloads connecting to Cloud SQL. It handles SSL, IAM authentication, and connection management. There is no equivalent automatic sidecar injection – you must add the proxy as a sidecar container in your pod spec. Forgetting the proxy is a common migration failure when moving from AWS (where RDS is reachable directly from the VPC) to GCP.
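
A minimal sidecar sketch, assuming the v2 proxy image and a hypothetical instance connection name my-project:us-central1:prod-db. The application connects to 127.0.0.1:5432; the proxy handles TLS and IAM auth using the pod's Workload Identity credentials:

# Container to append to the pod spec of the workload that needs the database
cat <<'EOF' > cloud-sql-proxy-sidecar.yaml
- name: cloud-sql-proxy
  image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.11.0  # pin to a current 2.x release
  args:
    - "--port=5432"
    - "my-project:us-central1:prod-db"
  securityContext:
    runAsNonRoot: true
EOF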

Failover Behavior#

| Aspect | AWS (RDS Multi-AZ) | GCP (Cloud SQL HA) | Azure (Azure SQL) |
| --- | --- | --- | --- |
| HA mechanism | Synchronous replication to standby | Regional instance with failover replica | Zone-redundant or geo-replication |
| Failover time | 60-120 seconds | ~60 seconds | Typically under 30 seconds |
| DNS behavior | Same endpoint, DNS TTL update | Same IP, transparent failover | Same connection string |
| Connection drop | Yes – applications must reconnect | Yes – applications must reconnect | Yes – applications must reconnect |
| Read replicas | Cross-region read replicas (async) | Cross-region read replicas (async) | Active geo-replication (async) |

All three clouds drop connections during failover. Applications must handle reconnection. The difference is in DNS propagation – AWS RDS updates the DNS CNAME to point to the new primary, which means applications caching DNS may continue connecting to the old (now standby) instance. GCP Cloud SQL keeps the same IP. Azure keeps the same connection string.
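
A quick way to see the DNS side of an RDS failover (hypothetical endpoint name) is to compare what DNS returns now with what a long-running process resolved earlier:

# The RDS endpoint is a CNAME; after failover it points at the new primary
dig +short CNAME mydb.abc123xyz.us-east-1.rds.amazonaws.com

# Clients that cache lookups longer than the record's TTL (some language
# runtimes and connection pools do) keep dialing the old primary until
# their cache expires or they are restarted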

Backup Patterns#

| Aspect | AWS (RDS) | GCP (Cloud SQL) | Azure (Azure Database) |
| --- | --- | --- | --- |
| Automated backups | Daily, retention 1-35 days | Daily, retention 1-365 days | Daily, retention 1-35 days |
| Point-in-time recovery | To any second within retention | To any second within retention | To any second within retention |
| Manual snapshots | Unlimited, persist until deleted | On-demand backups | Long-term retention (LTR) |
| Cross-region backups | Copy snapshot to another region | Cross-region backup (automated) | Geo-redundant backup storage |
| Backup storage cost | Free up to DB size, then per-GB | Included in instance cost (to a limit) | Included (LRS), extra for GRS |

DNS and Service Discovery#

| Aspect | AWS | GCP | Azure |
| --- | --- | --- | --- |
| Managed DNS | Route 53 | Cloud DNS | Azure DNS |
| Private DNS zones | Route 53 Private Hosted Zones (per VPC) | Cloud DNS Private Zones (per VPC network) | Azure Private DNS Zones (per VNet) |
| Service discovery | Cloud Map | Service Directory | N/A (use Private DNS or Traffic Manager) |
| K8s external DNS | ExternalDNS with Route 53 provider | ExternalDNS with Cloud DNS provider | ExternalDNS with Azure DNS provider |
| Split-horizon DNS | Supported (private + public zones same name) | Supported | Supported |

Gotcha: Route 53 Private Hosted Zones must be explicitly associated with each VPC that needs to resolve the records. If you peer two VPCs, the peered VPC does not automatically get access to the other VPC’s private hosted zones – you must create an association. GCP Private DNS Zones work similarly (must be attached to VPC networks). Azure Private DNS Zones must be linked to each VNet.
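
Associating an existing private hosted zone with an additional VPC looks like this (IDs are placeholders). For cross-account associations, the zone-owning account must first run create-vpc-association-authorization:

# Allow the peered VPC to resolve records in the private hosted zone
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id Z0123456789EXAMPLE \
  --vpc VPCRegion=us-west-2,VPCId=vpc-0abc1234def567890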

Cross-Cloud Gotcha Table#

| Behavior | AWS | GCP | Azure | Trap |
| --- | --- | --- | --- | --- |
| VPC scope | Regional | Global | Regional | GCP cross-region traffic within a VPC just works; on AWS/Azure you need peering |
| Default pod networking | VPC CNI (pods get VPC IPs) | Native GKE networking (alias IPs) | Azure CNI (pods get VNet IPs) or kubenet | IP exhaustion risk differs – the AWS VPC CNI assigns every pod a secondary IP from the subnet, consuming subnet IPs fast |
| Pod identity fallback | No credentials | Node SA (too-broad) | No credentials | GCP Workload Identity misconfiguration silently grants broad node-level access |
| Load balancer scope | Regional | Global (HTTP) | Regional | Moving from GCP global LB to AWS requires adding CloudFront or Global Accelerator |
| IAM policy language | JSON (allow/deny, resource ARNs) | IAM roles (predefined or custom) | RBAC (role definitions + scope) | AWS IAM policies are the most granular; GCP and Azure use role-based, not resource-based, defaults |
| Storage class naming | gp3, gp2, io1, io2 | standard-rwo, premium-rwo | managed-csi, managed-csi-premium | Hardcoded StorageClass names in manifests break on cloud migration |
| Metadata endpoint | 169.254.169.254 (IMDSv2) | 169.254.169.254 | 169.254.169.254 | Same IP, different response formats and auth mechanisms |
| NAT gateway cost | ~$32/mo + $0.045/GB | Cloud NAT per-VM charge + $0.045/GB | Azure NAT Gateway ~$32/mo + $0.045/GB | GCP NAT charges per VM using it, not a flat fee; can be cheaper or more expensive |
| Private DB access | VPC endpoint (PrivateLink) | Private Services Access or PSC | Private Endpoint | Three different private connectivity models with different setup requirements |
| Container registry | ECR (per-region) | Artifact Registry (regional or multi-region) | ACR (regional; geo-replication on Premium) | ECR images are regional; pulling cross-region adds latency and egress cost |
| K8s version lag | EKS: often 1-2 months behind upstream | GKE: Rapid channel available day one | AKS: usually 1-3 months behind | GKE’s Rapid channel gets new K8s versions weeks before EKS/AKS |
| Egress pricing | $0.09/GB (first 10 TB) | $0.12/GB (first 1 TB) | $0.087/GB (first 5 TB) | GCP is the most expensive for egress at low volumes; all three get cheaper at scale |
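
The metadata endpoint row deserves a concrete illustration – same IP, three different handshakes. The GCP and Azure calls work as-is from a VM or pod; the AWS call assumes IMDSv2 is enforced:

# AWS – IMDSv2 requires a session token first
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id

# GCP – requires the Metadata-Flavor header
curl -s -H "Metadata-Flavor: Google" \
  http://169.254.169.254/computeMetadata/v1/instance/id

# Azure – requires the Metadata header and an api-version query parameter
curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/instance?api-version=2021-02-01"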

Practical Translation Guide#

When migrating a workload or translating infrastructure between clouds, work through these layers in order:

  1. Application containers – Usually no changes. OCI images are portable; verify that the CPU architecture (AMD64 vs ARM64) matches the target node pools.

  2. Kubernetes manifests – Remove or translate cloud-specific annotations. Update StorageClass references. Update Ingress annotations for the target cloud’s LB controller (see the annotation sketch after this list).

  3. IAM integration – Rewrite entirely. IRSA trust policies do not translate to Workload Identity bindings or Azure federated credentials. The identity model is different on each cloud.

  4. Networking – Redesign VPC/VNet architecture for the target cloud’s model. Translate security groups to NSGs or firewall rules. Update CIDR ranges if there are conflicts.

  5. Managed services – Replace or reconfigure. RDS becomes Cloud SQL or Azure Database. S3 becomes Cloud Storage or Blob Storage. Update connection strings, authentication methods, and backup configurations.

  6. Terraform/IaC – Rewrite provider-specific resources. The Terraform Kubernetes provider resources are portable. The aws, google, and azurerm provider resources are not.

  7. Monitoring and logging – Replace or use a portable layer (Prometheus, Grafana, OpenTelemetry). CloudWatch, Cloud Monitoring, and Azure Monitor are not interchangeable.
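
As an example of item 2, here is the same Ingress expressed against each cloud’s default L7 controller. This is a sketch of the class and one or two representative annotations, not a complete spec; each controller has a much larger annotation surface of its own:

# AWS – AWS Load Balancer Controller (provisions an ALB)
#   spec.ingressClassName: alb
#   alb.ingress.kubernetes.io/scheme: internet-facing
#   alb.ingress.kubernetes.io/target-type: ip

# GCP – built-in GCE Ingress controller (provisions a global HTTP(S) LB)
#   kubernetes.io/ingress.class: "gce"
#   kubernetes.io/ingress.global-static-ip-name: app-ip   # optional static IP

# Azure – Application Gateway Ingress Controller (AGIC)
#   kubernetes.io/ingress.class: azure/application-gateway
#   appgw.ingress.kubernetes.io/ssl-redirect: "true"      # example AGIC annotation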

Items 1 and 2 are usually days of work. Items 3 through 7 are weeks to months, depending on the complexity of the integration.