# Cloud Behavioral Divergence Guide

Running the “same” workload on AWS, Azure, and GCP does not produce the same behavior. The Kubernetes API is portable, application containers are portable, and SQL queries are portable. Everything else – identity, networking, storage, load balancing, DNS, and managed service behavior – diverges in ways that matter for production reliability.

This guide documents the specific divergence points with practical examples. Use it when translating infrastructure from one cloud to another, when debugging behavior that differs between environments, or when assessing migration risk.

## IAM Model Differences

Identity and access management is the most significant behavioral divergence between clouds. Each cloud has a fundamentally different model for how workloads authenticate and authorize.

### AWS: IAM Roles and IRSA

AWS IAM is built around roles and policies. A role is an identity that can be assumed by users, services, or other AWS accounts. A policy is a JSON document specifying which API actions are allowed on which resources.

For Kubernetes workloads on EKS, IRSA (IAM Roles for Service Accounts) bridges Kubernetes ServiceAccounts to IAM roles using OIDC federation. The flow:

  1. EKS cluster has an OIDC provider registered with IAM
  2. A Kubernetes ServiceAccount is annotated with an IAM role ARN
  3. The EKS pod identity webhook mutates pods that use that ServiceAccount, mounting a projected web identity token
  4. The AWS SDK in the pod calls AWS STS (sts:AssumeRoleWithWebIdentity) to exchange the token for temporary IAM credentials
  5. The pod assumes the IAM role with its attached policies

```shell
# Create the IAM role with a trust policy for the OIDC provider
aws iam create-role --role-name app-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"},
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:default:app-sa"
        }
      }
    }]
  }'

# Attach a policy
aws iam attach-role-policy --role-name app-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# Annotate the Kubernetes ServiceAccount
kubectl annotate serviceaccount app-sa \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/app-role
```

Key behavioral detail: IRSA tokens are projected into the pod at /var/run/secrets/eks.amazonaws.com/serviceaccount/token. AWS SDKs automatically detect and use this token. If the OIDC provider is not configured or the trust policy condition does not match the ServiceAccount’s namespace and name exactly, the pod gets no credentials silently – no error at pod startup, only at the first AWS API call.
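
A quick way to confirm IRSA is wired up is to inspect a running pod for the environment variables the EKS pod identity webhook injects (`deploy/app` is a placeholder for your workload):

```shell
# The webhook should have injected AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE
kubectl exec deploy/app -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'

# Confirm the projected token exists and is non-empty
kubectl exec deploy/app -- \
  wc -c /var/run/secrets/eks.amazonaws.com/serviceaccount/token

# Verify the assumed identity from inside the pod (requires awscli in the image);
# the ARN in the output should be the annotated role, not the node role
kubectl exec deploy/app -- aws sts get-caller-identity
```

If the `grep` returns nothing, the webhook did not mutate the pod — check the ServiceAccount annotation and restart the pod, since the mutation happens only at pod creation.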

### GCP: Service Accounts and Workload Identity

GCP uses service accounts as the machine identity primitive. A service account is a GCP identity with an email address (name@project.iam.gserviceaccount.com) and IAM role bindings at various scopes (organization, folder, project, resource).

For GKE workloads, Workload Identity maps Kubernetes ServiceAccounts to GCP service accounts without key files:

  1. GKE cluster has a Workload Identity pool
  2. A Kubernetes ServiceAccount is annotated with a GCP service account email
  3. The GCP service account has an IAM binding allowing the Kubernetes ServiceAccount to impersonate it
  4. GKE’s metadata server intercepts credential requests from pods and returns GCP tokens

```shell
# Create the GCP service account
gcloud iam service-accounts create app-sa \
  --display-name="Application Service Account"

# Grant the GCP SA permissions
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:app-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# Allow the Kubernetes SA to impersonate the GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  app-sa@my-project.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:my-project.svc.id.goog[default/app-sa]"

# Annotate the Kubernetes ServiceAccount
kubectl annotate serviceaccount app-sa \
  iam.gke.io/gcp-service-account=app-sa@my-project.iam.gserviceaccount.com
```

Key behavioral detail: Workload Identity requires the GKE metadata server. Pods query 169.254.169.254 for tokens, and GKE intercepts this and returns GCP credentials. If Workload Identity is not enabled on the node pool, pods fall back to the node’s default service account, which typically has broader permissions than intended. This is a security risk that does not exist on EKS (where pods with no IRSA annotation simply have no AWS credentials).
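
To check which identity a pod actually received, query the metadata server from inside it (`app-pod` is a placeholder). Seeing the annotated GCP service account means Workload Identity is active; seeing the Compute Engine default service account means the pod fell back to node credentials:

```shell
# Ask the metadata server which service account this pod is using
kubectl exec -it app-pod -- curl -s \
  -H "Metadata-Flavor: Google" \
  "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/email"
# Expected with Workload Identity: app-sa@my-project.iam.gserviceaccount.com
# Fallback (misconfigured): the <project-number>-compute@developer.gserviceaccount.com node SA
```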

### Azure: Managed Identities and Azure Workload Identity

Azure uses Managed Identities (system-assigned or user-assigned) for Azure-hosted resources and Azure Workload Identity for AKS workloads. Managed Identities eliminate credential management entirely – Azure rotates the credentials automatically.

For AKS workloads, Azure Workload Identity (which superseded the now-retired AAD Pod Identity) uses OIDC federation similar to AWS IRSA:

  1. AKS cluster has an OIDC issuer URL
  2. A user-assigned managed identity is created with a federated credential pointing to the AKS OIDC issuer, namespace, and ServiceAccount name
  3. The Kubernetes ServiceAccount is annotated with the managed identity’s client ID
  4. The Azure Identity SDK exchanges the projected token for Azure AD tokens

```shell
# Create a user-assigned managed identity
az identity create --resource-group prod-rg --name app-identity

# Get the client ID
CLIENT_ID=$(az identity show --resource-group prod-rg --name app-identity \
  --query clientId --output tsv)

# Create a federated credential
az identity federated-credential create \
  --identity-name app-identity \
  --resource-group prod-rg \
  --name app-federated-cred \
  --issuer "$(az aks show --resource-group prod-rg --name prod-cluster \
    --query oidcIssuerProfile.issuerUrl --output tsv)" \
  --subject "system:serviceaccount:default:app-sa" \
  --audiences "api://AzureADTokenExchange"

# Assign a role
az role assignment create --assignee $CLIENT_ID \
  --role "Storage Blob Data Reader" \
  --scope /subscriptions/SUB_ID/resourceGroups/prod-rg
```

Key behavioral detail: Azure Workload Identity requires a mutating webhook (azure-workload-identity-webhook) running in the AKS cluster. This webhook injects environment variables (AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE) and a projected token volume into pods that use annotated ServiceAccounts. If the webhook is not installed or the AKS OIDC issuer is not enabled, the annotation does nothing.
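
Beyond the ServiceAccount annotation, the webhook only mutates pods labeled `azure.workload.identity/use: "true"`. A minimal sketch of both pieces, reusing `$CLIENT_ID` from the commands above (names and the container image are placeholders):

```shell
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: default
  annotations:
    azure.workload.identity/client-id: "$CLIENT_ID"   # managed identity client ID
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: default
  labels:
    azure.workload.identity/use: "true"   # required: webhook only mutates labeled pods
spec:
  serviceAccountName: app-sa
  containers:
  - name: app
    image: mcr.microsoft.com/azure-cli   # placeholder image
    command: ["sleep", "infinity"]
EOF
```

Forgetting the pod label is the Azure-specific equivalent of the silent failures described above: the annotation alone does nothing.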

### IAM Summary Table

| Aspect | AWS (IRSA) | GCP (Workload Identity) | Azure (Workload Identity) |
|---|---|---|---|
| Machine identity | IAM Role | GCP Service Account | Managed Identity |
| Pod-to-cloud mapping | ServiceAccount annotation + OIDC trust | ServiceAccount annotation + IAM binding | ServiceAccount annotation + federated credential |
| Token location | /var/run/secrets/eks.amazonaws.com/... | GKE metadata server (169.254.169.254) | Projected volume (path set by webhook) |
| Failure mode if misconfigured | No credentials at API call time | Falls back to node SA (too-broad access) | No credentials (webhook injects nothing) |
| Setup complexity | Medium (OIDC provider + trust policy) | Medium (WI pool + IAM binding) | High (identity + federated cred + webhook) |

## Networking Differences

### VPC/VNet Architecture

AWS VPCs are regional. A VPC spans all availability zones in a region, but subnets are AZ-specific. Each subnet has a route table. VPC peering connects two VPCs (same or cross-region, same or cross-account). Transit Gateway connects many VPCs in a hub-and-spoke model.

GCP VPCs are global. A single VPC spans all regions. Subnets are regional. This means two subnets in different regions within the same VPC can communicate without peering. VPC Network Peering connects two VPCs and is bidirectional (but requires setup from both sides).

Azure VNets are regional, similar to AWS VPCs. VNet Peering connects two VNets (same or cross-region, same or cross-subscription). Virtual WAN provides hub-and-spoke connectivity.

The practical difference: on GCP, cross-region communication within a VPC “just works” because the VPC is global. On AWS and Azure, cross-region communication requires explicit VPC/VNet peering or a Transit Gateway/Virtual WAN.

### Firewall Models

| Concept | AWS | GCP | Azure |
|---|---|---|---|
| Instance-level firewall | Security Groups (stateful) | Firewall Rules with target tags (stateful) | NSGs on NIC (stateful) |
| Subnet-level firewall | NACLs (stateless) | N/A (firewall rules are VPC-wide) | NSGs on subnet (stateful) |
| Rule evaluation | All rules evaluated (allow wins) | Priority-ordered (first match wins) | Priority-ordered (first match wins) |
| Default behavior | Deny all inbound, allow all outbound | Deny all inbound, allow all outbound | Deny all inbound by default |
| Scope | Per-instance (SG attached to ENI) | Per-VPC (targeted by tags or SA) | Per-NIC or per-subnet |

Gotcha: AWS Security Groups evaluate all rules and allow traffic if any rule permits it. GCP and Azure firewall rules are priority-ordered and stop at the first match. A rule at priority 100 (allow) overrides a rule at priority 200 (deny) on GCP and Azure, but on AWS there is no priority – all allow rules are additive.
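
The priority ordering can be demonstrated with two overlapping GCP rules. Because evaluation stops at the first match, the allow at priority 100 wins even though a deny exists at 200 (network name, tag, and CIDR are illustrative):

```shell
# Allow SSH from a bastion range at priority 100
gcloud compute firewall-rules create allow-bastion-ssh \
  --network my-vpc --priority 100 \
  --direction INGRESS --action ALLOW --rules tcp:22 \
  --source-ranges 10.0.1.0/24 --target-tags app

# Deny all SSH at priority 200 -- never evaluated for traffic from 10.0.1.0/24,
# which already matched the rule above
gcloud compute firewall-rules create deny-all-ssh \
  --network my-vpc --priority 200 \
  --direction INGRESS --action DENY --rules tcp:22 \
  --target-tags app
```

On AWS there is no way to express the deny half of this pair in a Security Group at all; only NACLs carry explicit deny rules.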

### Load Balancer Behavior

| Feature | AWS | GCP | Azure |
|---|---|---|---|
| L7 load balancer | ALB (Application Load Balancer) | HTTP(S) Load Balancer (global) | Application Gateway |
| L4 load balancer | NLB (Network Load Balancer) | Network Load Balancer (regional/global) | Azure Load Balancer |
| K8s integration | AWS Load Balancer Controller | GCE Ingress Controller (built-in) | cloud-provider-azure (AGIC for Application Gateway) |
| Default scope | Regional | Global (HTTP) or Regional (TCP/UDP) | Regional |
| SSL termination | ACM certificates on ALB | Google-managed certificates | Azure Key Vault certificates |
| Health checks | Target group health checks | Backend service health checks | Health probes |

Gotcha: GCP’s HTTP(S) Load Balancer is global by default – it has a single anycast IP that routes to the nearest backend. AWS ALB and Azure Application Gateway are regional. If you design for GCP’s global load balancing and then migrate to AWS, you need CloudFront in front of ALB, or a Global Accelerator, to achieve similar global routing.

## Storage Driver Differences

### Block Storage for Kubernetes

| Aspect | AWS (EBS CSI) | GCP (PD CSI) | Azure (Disk CSI) |
|---|---|---|---|
| CSI driver | ebs.csi.aws.com | pd.csi.storage.gke.io | disk.csi.azure.com |
| Default storage class | gp3 | standard-rwo (pd-balanced) | managed-csi (StandardSSD_LRS) |
| High-performance option | io2 (up to 256K IOPS) | pd-ssd (up to 100K IOPS) | managed-csi-premium (Premium_LRS) |
| Volume attachment | One AZ, one node | One zone, one node | One zone, one node |
| Resize support | Online resize (gp2, gp3, io1, io2) | Online resize | Online resize |
| Snapshot support | EBS Snapshots | Persistent Disk Snapshots | Azure Disk Snapshots |
| Max volume size | 64 TiB (gp3) | 64 TiB (pd-ssd) | 64 TiB (Premium_LRS) |

Gotcha: EBS volumes are AZ-locked. If a pod is rescheduled to a node in a different AZ, the PVC cannot follow. This is the same on all three clouds for block storage, but the failure manifests differently. On EKS, you get AttachVolume.Attach failed for volume: ...node ... is in different AZ from PV. On GKE, you get a similar zone mismatch error. The fix is the same (topology-aware scheduling), but the error messages and the zone label format differ (topology.kubernetes.io/zone values look like us-east-1a on AWS, us-east1-b on GCP, and eastus-1 on Azure).
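
The usual fix is a StorageClass with `WaitForFirstConsumer`, which delays provisioning until the pod is scheduled so the volume is created in the pod's zone. An EKS-flavored sketch (the class name is illustrative; the same `volumeBindingMode` works with the GCP and Azure CSI drivers):

```shell
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware-gp3
provisioner: ebs.csi.aws.com    # pd.csi.storage.gke.io / disk.csi.azure.com elsewhere
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer   # provision in the zone the pod lands in
allowVolumeExpansion: true
EOF
```

Note this prevents the zone mismatch only at first binding; a pod with an existing PVC is still pinned to the volume's zone.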

### Object Storage from Kubernetes

| Aspect | AWS (S3) | GCP (Cloud Storage) | Azure (Blob Storage) |
|---|---|---|---|
| CSI driver (FUSE) | Mountpoint for S3 (s3.csi.aws.com) | Cloud Storage FUSE (gcsfuse.csi.storage.gke.io) | Blob CSI (blob.csi.azure.com) |
| SDK/API | AWS SDK (S3 API) | Google Cloud Client Libraries | Azure SDK |
| S3-compatible API | Native | Via interop XML API | Not natively (use SDK) |
| Auth from pods | IRSA | Workload Identity | Azure Workload Identity |

Gotcha: GCP Cloud Storage supports the S3 XML API for interoperability, but not all S3 features are supported (e.g., no object lock, no select). Azure Blob Storage does not support the S3 API at all – applications using S3 SDKs must be rewritten to use Azure SDK or MinIO gateway.
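
The interop XML API means S3 tooling can often talk to Cloud Storage by switching the endpoint and using HMAC keys generated for a GCP service account. A sketch (bucket name and key values are placeholders):

```shell
# Generate HMAC credentials for the GCP service account
# (prints an access ID and secret to use as S3-style keys)
gsutil hmac create app-sa@my-project.iam.gserviceaccount.com

# Point the AWS CLI at the Cloud Storage XML endpoint using those HMAC keys
AWS_ACCESS_KEY_ID='<hmac-access-id>' \
AWS_SECRET_ACCESS_KEY='<hmac-secret>' \
aws s3 ls s3://my-bucket --endpoint-url https://storage.googleapis.com
```

This works for basic object operations; anything relying on unsupported S3 features (object lock, S3 Select) still fails and needs a real rewrite.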

## Managed Database Behavioral Differences

### Connection Methods

| Aspect | AWS (RDS) | GCP (Cloud SQL) | Azure (Azure Database) |
|---|---|---|---|
| Standard connection | Endpoint DNS + username/password | IP + username/password | Hostname + username/password |
| IAM auth | IAM database authentication (token-based) | Cloud SQL IAM database authentication | Azure AD authentication |
| Proxy/sidecar | RDS Proxy (connection pooling + IAM auth) | Cloud SQL Auth Proxy (sidecar container) | No equivalent (direct connection) |
| Private connectivity | VPC endpoints (PrivateLink) | Private Services Access or PSC | Private Endpoints |
| From K8s pods | VPC-internal endpoint or RDS Proxy | Cloud SQL Auth Proxy sidecar | Private endpoint or direct |

Gotcha: Cloud SQL Auth Proxy is almost always required for GKE workloads connecting to Cloud SQL. It handles SSL, IAM authentication, and connection management. There is no equivalent automatic sidecar injection – you must add the proxy as a sidecar container in your pod spec. Forgetting the proxy is a common migration failure when moving from AWS (where RDS is reachable directly from the VPC) to GCP.
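
A minimal sidecar sketch, assuming the v2 cloud-sql-proxy image and placeholder app image, instance connection name, and version tag:

```shell
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  serviceAccountName: app-sa        # mapped to a GCP SA via Workload Identity
  containers:
  - name: app
    image: my-app:latest            # placeholder; connects to 127.0.0.1:5432
  - name: cloud-sql-proxy
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0   # pin a current release
    args:
      - "--port=5432"
      - "my-project:us-central1:my-instance"   # placeholder instance connection name
    securityContext:
      runAsNonRoot: true
EOF
```

The application then connects to `127.0.0.1:5432` instead of the database's IP; the proxy handles TLS and IAM on its behalf.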

### Failover Behavior

| Aspect | AWS (RDS Multi-AZ) | GCP (Cloud SQL HA) | Azure (Azure SQL) |
|---|---|---|---|
| HA mechanism | Synchronous replication to standby | Regional instance with failover replica | Zone-redundant or geo-replication |
| Failover time | 60-120 seconds | ~60 seconds | Typically under 30 seconds |
| DNS behavior | Same endpoint, DNS TTL update | Same IP, transparent failover | Same connection string |
| Connection drop | Yes – applications must reconnect | Yes – applications must reconnect | Yes – applications must reconnect |
| Read replicas | Cross-region read replicas (async) | Cross-region read replicas (async) | Active geo-replication (async) |

All three clouds drop connections during failover. Applications must handle reconnection. The difference is in DNS propagation – AWS RDS updates the DNS CNAME to point to the new primary, which means applications caching DNS may continue connecting to the old (now standby) instance. GCP Cloud SQL keeps the same IP. Azure keeps the same connection string.
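
On the application side, the portable mitigation is a bounded reconnect loop (plus, on AWS, respecting DNS TTLs rather than caching resolved addresses forever). A shell-level sketch of the reconnect pattern; host and database names are placeholders:

```shell
# Retry until the (new) primary accepts connections after a failover.
# Budget ~60s of retries to cover typical failover windows on all three clouds.
for i in $(seq 1 30); do
  if psql "host=$DB_HOST dbname=app connect_timeout=3" -c 'SELECT 1' >/dev/null 2>&1; then
    echo "reconnected after $i attempt(s)"
    break
  fi
  sleep 2
done
```

Most database drivers offer the same behavior natively (connection pool validation plus retry on connect), which is preferable to shell loops in real services.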

### Backup Patterns

| Aspect | AWS (RDS) | GCP (Cloud SQL) | Azure (Azure Database) |
|---|---|---|---|
| Automated backups | Daily, retention 1-35 days | Daily, retention 1-365 days | Daily, retention 1-35 days |
| Point-in-time recovery | To any second within retention | To any second within retention | To any second within retention |
| Manual snapshots | Unlimited, persist until deleted | On-demand backups | Long-term retention (LTR) |
| Cross-region backups | Copy snapshot to another region | Cross-region backup (automated) | Geo-redundant backup storage |
| Backup storage cost | Free up to DB size, then per-GB | Included in instance cost (to a limit) | Included (LRS), extra for GRS |

## DNS and Service Discovery

| Aspect | AWS | GCP | Azure |
|---|---|---|---|
| Managed DNS | Route 53 | Cloud DNS | Azure DNS |
| Private DNS zones | Route 53 Private Hosted Zones (per VPC) | Cloud DNS Private Zones (per VPC network) | Azure Private DNS Zones (per VNet) |
| Service discovery | Cloud Map | Service Directory | N/A (use Private DNS or Traffic Manager) |
| K8s external DNS | ExternalDNS with Route 53 provider | ExternalDNS with Cloud DNS provider | ExternalDNS with Azure DNS provider |
| Split-horizon DNS | Supported (private + public zones same name) | Supported | Supported |

Gotcha: Route 53 Private Hosted Zones must be explicitly associated with each VPC that needs to resolve the records. If you peer two VPCs, the peered VPC does not automatically get access to the other VPC’s private hosted zones – you must create an association. GCP Private DNS Zones work similarly (must be attached to VPC networks). Azure Private DNS Zones must be linked to each VNet.
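
With placeholder zone and VPC IDs, the Route 53 association looks like this; the authorization step is only needed when the VPC lives in a different account:

```shell
# Associate a (peered) VPC with the private hosted zone
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id Z0123456789EXAMPLE \
  --vpc VPCRegion=us-west-2,VPCId=vpc-0abc123def4567890

# Cross-account: the zone owner must authorize the association first,
# then the VPC owner runs the associate command above
aws route53 create-vpc-association-authorization \
  --hosted-zone-id Z0123456789EXAMPLE \
  --vpc VPCRegion=us-west-2,VPCId=vpc-0abc123def4567890
```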

## Cross-Cloud Gotcha Table

| Behavior | AWS | GCP | Azure | Trap |
|---|---|---|---|---|
| VPC scope | Regional | Global | Regional | GCP cross-region traffic within a VPC just works; on AWS/Azure you need peering |
| Default pod networking | VPC CNI (pods get VPC IPs) | Native GKE networking (alias IPs) | Azure CNI (pods get VNet IPs) or kubenet | IP exhaustion risk differs – AWS VPC CNI assigns each pod a VPC IP from node ENIs, consuming subnet IPs fast |
| Pod identity fallback | No credentials | Node SA (too-broad) | No credentials | GCP Workload Identity misconfiguration silently grants broad node-level access |
| Load balancer scope | Regional | Global (HTTP) | Regional | Moving from GCP global LB to AWS requires adding CloudFront or Global Accelerator |
| IAM policy language | JSON (allow/deny, resource ARNs) | IAM roles (predefined or custom) | RBAC (role definitions + scope) | AWS IAM policies are the most granular; GCP and Azure use role-based, not resource-based, defaults |
| Storage class naming | gp3, gp2, io1, io2 | standard-rwo, premium-rwo | managed-csi, managed-csi-premium | Hardcoded StorageClass names in manifests break on cloud migration |
| Metadata endpoint | 169.254.169.254 (IMDSv2) | 169.254.169.254 | 169.254.169.254 | Same IP, different response formats and auth mechanisms |
| NAT gateway cost | $32/mo + $0.045/GB | Cloud NAT per-VM charge + $0.045/GB | Azure NAT Gateway $32/mo + $0.045/GB | GCP NAT charges per VM using it, not a flat fee; can be cheaper or more expensive |
| Private DB access | VPC endpoint (PrivateLink) | Private Services Access or PSC | Private Endpoint | Three different private connectivity models with different setup requirements |
| Container registry | ECR (per-region) | Artifact Registry (multi-region) | ACR (per-resource group) | ECR images are regional; pulling cross-region adds latency and egress cost |
| K8s version lag | EKS: often 1-2 months behind upstream | GKE: Rapid channel available day one | AKS: usually 1-3 months behind | GKE Rapid channel gets new K8s versions weeks before EKS/AKS |
| Egress pricing | $0.09/GB (first 10 TB) | $0.12/GB (first 1 TB) | $0.087/GB (first 5 TB) | GCP is the most expensive for egress at low volumes; all three get cheaper at scale |

## Practical Translation Guide

When migrating a workload or translating infrastructure between clouds, work through these layers in order:

  1. Application containers – No changes needed. OCI images are portable. Verify architecture (AMD64 vs ARM64).

  2. Kubernetes manifests – Remove or translate cloud-specific annotations. Update StorageClass references. Update Ingress annotations for the target cloud’s LB controller.

  3. IAM integration – Rewrite entirely. IRSA trust policies do not translate to Workload Identity bindings or Azure federated credentials. The identity model is different on each cloud.

  4. Networking – Redesign VPC/VNet architecture for the target cloud’s model. Translate security groups to NSGs or firewall rules. Update CIDR ranges if there are conflicts.

  5. Managed services – Replace or reconfigure. RDS becomes Cloud SQL or Azure Database. S3 becomes Cloud Storage or Blob Storage. Update connection strings, authentication methods, and backup configurations.

  6. Terraform/IaC – Rewrite provider-specific resources. The Terraform Kubernetes provider resources are portable. The aws, google, and azurerm provider resources are not.

  7. Monitoring and logging – Replace or use a portable layer (Prometheus, Grafana, OpenTelemetry). CloudWatch, Cloud Monitoring, and Azure Monitor are not interchangeable.
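
As an example of step 2, the same Ingress typically needs a different class and annotation set on each cloud. An AWS-flavored sketch (service name is a placeholder); the comments note the rough GCP and Azure equivalents:

```shell
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  annotations:
    # AWS Load Balancer Controller annotations; on GCP the built-in class
    # is "gce" (no ALB annotations), on AKS with AGIC it is
    # "azure-application-gateway"
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app
            port:
              number: 80
EOF
```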

Items 1 and 2 are usually days of work. Items 3 through 7 are weeks to months, depending on the complexity of the integration.