# Cloud-Native vs Portable Infrastructure
Every infrastructure decision sits on a spectrum between portability and fidelity. On one end, you have generic Kubernetes running on minikube or kind – it works everywhere, costs nothing, and captures the behavior of the Kubernetes API itself. On the other end, you have cloud-native managed services – EKS with IRSA and the AWS Load Balancer Controller, GKE with Workload Identity and Cloud Load Balancing, AKS with Azure Workload Identity and Azure Load Balancer. These capture the behavior of the actual platform your workloads will run on.
The question is not which is “better.” The question is which level of fidelity you need for the task at hand. Getting this wrong in either direction wastes time: testing on cloud-native when generic would suffice is expensive and slow, while testing on generic when cloud-native behavior matters produces results that do not transfer to production.
## What “Portable” Actually Means in Practice
When people say infrastructure is “portable,” they mean different things. Here is what portable infrastructure actually consists of and where its boundaries are.
### The Portable Layer
These components behave the same regardless of the underlying cloud or local environment:
The Kubernetes API itself. A Deployment, Service, ConfigMap, or Secret works identically on minikube, kind, EKS, GKE, and AKS. The scheduling behavior, pod lifecycle, rolling updates, readiness probes, and container runtime interface are all governed by the Kubernetes spec. If your concern is whether your YAML manifests are valid and your pods start correctly, generic Kubernetes is sufficient.
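As a reference point, everything in a manifest like the following is defined by the Kubernetes spec itself, so it behaves the same on kind, minikube, EKS, GKE, and AKS (the names and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  replicas: 3
  strategy:
    type: RollingUpdate        # rolling-update semantics come from the K8s spec,
    rollingUpdate:             # not from the cloud provider
      maxUnavailable: 1
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      containers:
        - name: service-a
          image: registry.example.com/service-a:1.0.0
          readinessProbe:      # probe behavior is likewise identical everywhere
            httpGet:
              path: /healthz
              port: 8080
```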
Helm charts (with generic values). A Helm chart that uses standard Kubernetes resources – Deployments, Services, ConfigMaps, Ingress with no provider-specific annotations – is portable. You can template, lint, and install it on any cluster. The chart structure, value overrides, hooks, and release management work the same everywhere.
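One way to keep a chart on the portable side of the line is to route provider-specific annotations through values, so the default install uses only standard resources and a per-cloud values file adds the annotations where they actually mean something. A minimal sketch, assuming the chart templates copy `.Values.ingress.annotations` onto the Ingress verbatim (file names and keys are illustrative):

```yaml
# values.yaml – portable defaults: no cloud-specific behavior baked in
ingress:
  enabled: true
  annotations: {}
```

```yaml
# values-eks.yaml – applied only when targeting EKS, e.g.
# helm install app ./chart -f values-eks.yaml
ingress:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
```

The same chart then installs cleanly on kind for fast iteration and on EKS with the real annotations, without forking the templates.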
Terraform with generic providers. Terraform’s Kubernetes provider, Helm provider, and local providers work against any cluster. If your Terraform defines Kubernetes resources rather than cloud infrastructure, it is portable. The moment you use the aws, google, or azurerm providers, you are no longer portable.
Application containers. A Docker image runs identically everywhere (assuming architecture compatibility – x86_64 vs ARM64 is a real concern, but not a cloud-specific one). The application code does not know or care whether it is running on EKS or minikube.
Internal networking. Pod-to-pod communication, service discovery via CoreDNS, ClusterIP services, and headless services work the same across all Kubernetes implementations. If service A needs to call service B at http://service-b.namespace.svc.cluster.local:8080, that works everywhere.
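For example, a plain ClusterIP Service resolves to the same in-cluster DNS name on every conformant cluster (names and port are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: service-b
  namespace: backend
spec:
  type: ClusterIP      # no cloud load balancer involved, so nothing provider-specific
  selector:
    app: service-b
  ports:
    - port: 8080       # reachable as http://service-b.backend.svc.cluster.local:8080
      targetPort: 8080
```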
### Where Portability Ends
The portable layer covers maybe 60-70% of a typical production deployment. The remaining 30-40% is where cloud-native behavior diverges, and it is exactly the part that matters most for production reliability.
## What Cloud-Native Adds
Cloud-native managed Kubernetes is not just “Kubernetes but on AWS.” It is a different deployment surface with behaviors that generic Kubernetes does not capture.
### Managed Node Groups and Autoscaling
On generic Kubernetes, you configure cluster-autoscaler to manage node counts. On EKS, you use managed node groups with launch templates, AMI updates, and Capacity Reservations. On GKE, you use node auto-provisioning or Autopilot mode where Google manages nodes entirely. On AKS, you use Virtual Machine Scale Sets with Azure-managed images.
The practical difference: on generic k8s, your autoscaling test tells you whether the HPA and cluster-autoscaler configs are valid. On cloud-native, your test tells you whether the node group will actually provision nodes fast enough, whether the AMI has the required packages, and whether the instance type you selected is available in your target availability zone.
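For illustration, the cloud-specific half on EKS looks roughly like this – a managed node group declared through an eksctl ClusterConfig (cluster name, region, and instance type are placeholders, and the field set shown is a simplified sketch of eksctl's schema):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: sandbox-cluster
  region: us-east-1
managedNodeGroups:
  - name: app-nodes
    instanceType: m6i.large    # whether this type is available in the target AZ is exactly
    minSize: 2                 # the kind of question a generic-k8s test cannot answer
    maxSize: 10
    desiredCapacity: 2
```

The HPA manifest that triggers the scale-up is portable; the node provisioning behind it is not.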
### Cloud IAM Integration
This is the single largest behavioral divergence between generic and cloud-native Kubernetes. On generic k8s, workloads use Kubernetes ServiceAccounts and RBAC. In production, workloads need to access cloud services – S3 buckets, databases, KMS keys – and that requires cloud-native identity.
EKS uses IRSA (IAM Roles for Service Accounts). A Kubernetes ServiceAccount is annotated with an IAM role ARN. The EKS OIDC provider issues tokens that AWS STS trusts. Pods assume the IAM role automatically. The annotation looks like this:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/app-role
```

GKE uses Workload Identity. A Kubernetes ServiceAccount is bound to a GCP service account. GKE’s metadata server intercepts token requests and returns GCP credentials. The binding looks like this:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  annotations:
    iam.gke.io/gcp-service-account: app-sa@project-id.iam.gserviceaccount.com
```

AKS uses Azure Workload Identity (replacing AAD Pod Identity). A Kubernetes ServiceAccount is annotated with the client ID of an Azure Managed Identity whose federated credential trusts the cluster’s OIDC issuer. The annotation:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  annotations:
    azure.workload.identity/client-id: "CLIENT_ID"
  labels:
    azure.workload.identity/use: "true"
```

None of these work on generic Kubernetes. If your application depends on assuming a cloud identity to access a database or object store, testing on minikube will not reveal IAM misconfiguration, missing trust policies, or incorrect scoping. Your pods will start, but they will fail the moment they try to authenticate to a cloud service.
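In every case the workload itself opts in the same way: the pod spec references the annotated ServiceAccount (some providers additionally require pod labels or a mutating webhook to be installed). A minimal sketch, with a placeholder image:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1
  selector:
    matchLabels: { app: app }
  template:
    metadata:
      labels: { app: app }
    spec:
      # The pod deploys fine on minikube too, but the cloud credentials the
      # annotation is supposed to confer never materialize there.
      serviceAccountName: app-sa   # the annotated ServiceAccount from the examples above
      containers:
        - name: app
          image: registry.example.com/app:1.0.0
```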
### Cloud-Specific Storage Classes
Generic Kubernetes provides a standard storage class backed by whatever the local provisioner offers (hostPath on minikube, local volumes on kind). Cloud-native Kubernetes provides storage classes backed by cloud block storage with specific performance characteristics.
EKS: gp3 (default), gp2, io1, io2 backed by EBS via the ebs.csi.aws.com driver. Performance guarantees (IOPS, throughput) are tied to the volume type and size.
GKE: standard-rwo (pd-balanced), premium-rwo (pd-ssd), and Hyperdisk classes such as hyperdisk-balanced, backed by Persistent Disk via the pd.csi.storage.gke.io driver.
AKS: managed-csi (StandardSSD_LRS default), managed-csi-premium (Premium_LRS) backed by Azure Managed Disks via the disk.csi.azure.com driver.
If your StatefulSet depends on io1 IOPS guarantees and you test on minikube with hostPath, the test tells you nothing about storage performance. Worse, the PVC will bind successfully on minikube (giving a false positive) and then fail to provision on EKS if the io1 storage class is not configured.
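As a concrete illustration, a gp3 class and a claim against it might look like this on EKS (the parameter names follow the EBS CSI driver; the IOPS and throughput figures are made up for the example):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com   # this provisioner does not exist on kind or minikube
parameters:
  type: gp3
  iops: "6000"                 # illustrative; gp3 lets IOPS and throughput be set
  throughput: "250"            # independently of volume size
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3        # binds to real EBS on EKS; on a generic cluster it either
  resources:                   # fails or lands on a local volume with different behavior
    requests:
      storage: 100Gi
```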
### Cloud Load Balancers
Generic Kubernetes exposes services via NodePort or a local LoadBalancer (metallb, minikube tunnel). Cloud-native Kubernetes creates actual cloud load balancers with provider-specific annotations.
An AWS ALB Ingress (via AWS Load Balancer Controller):
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/abc123
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
```

A GKE managed certificate ingress:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: "gce"
    networking.gke.io/managed-certificates: "app-cert"
    kubernetes.io/ingress.global-static-ip-name: "app-ip"
```

An AKS Azure Load Balancer service:
```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    service.beta.kubernetes.io/azure-load-balancer-resource-group: "prod-rg"
```

These annotations do nothing on generic Kubernetes. Testing with them on minikube does not validate that the ALB will provision, the certificate will attach, or the load balancer will route traffic correctly.
### Managed Databases
The difference between running PostgreSQL in a container and using RDS, Cloud SQL, or Azure Database is not just operational convenience. The connection methods, authentication mechanisms, failover behavior, and backup patterns are fundamentally different.
On generic k8s, your app connects to postgres://user:password@postgres-service:5432/dbname. On EKS with RDS, it connects via IAM authentication tokens, possibly through an RDS Proxy. On GKE with Cloud SQL, it connects through the Cloud SQL Auth Proxy sidecar. On AKS with Azure Database, it connects using Azure AD tokens via managed identity.
Testing your application’s database connectivity on generic Kubernetes with a containerized Postgres tells you whether your SQL queries work. It does not tell you whether your IAM-based database authentication is configured correctly.
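To make the GKE variant concrete, here is roughly what the Cloud SQL Auth Proxy pattern adds to a Deployment – the application talks to the proxy on localhost and the proxy authenticates outward using the pod's Workload Identity (image tag, flags, and the instance connection name are illustrative and version-dependent):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1
  selector:
    matchLabels: { app: app }
  template:
    metadata:
      labels: { app: app }
    spec:
      serviceAccountName: app-sa   # bound to a GCP service account via Workload Identity
      containers:
        - name: app
          image: registry.example.com/app:1.0.0
          env:
            - name: DATABASE_HOST
              value: "127.0.0.1"   # the app connects to the proxy, not to Cloud SQL directly
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.14.0
          args:
            - "--port=5432"
            - "project-id:region:instance"   # Cloud SQL instance connection name
```

None of this sidecar plumbing, and none of the IAM binding behind it, is exercised when the test database is a Postgres container running on kind.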
## The Decision Matrix
Use this matrix to determine whether generic Kubernetes is sufficient or whether you need cloud-native fidelity.
### Use Generic Kubernetes When
| Scenario | Why Generic Is Sufficient |
|---|---|
| Validating Kubernetes manifests | The K8s API is identical across implementations |
| Testing Helm chart templating and install | Helm behavior is the same everywhere |
| Application-level integration testing | Container behavior does not depend on the cloud |
| Validating pod lifecycle (probes, init containers) | Pod lifecycle is a K8s spec, not cloud-specific |
| Internal service-to-service communication | CoreDNS and ClusterIP work the same everywhere |
| RBAC policy testing (K8s-level) | Kubernetes RBAC is portable |
| Developing and debugging locally | Fast iteration matters more than fidelity |
| CI pipeline unit and integration tests | Speed and cost matter; cloud-specific tests run separately |
### Use Cloud-Native When
| Scenario | Why Cloud-Native Is Required |
|---|---|
| Testing IAM role assumption from pods | IRSA, Workload Identity, Azure WI do not exist on generic k8s |
| Validating cloud load balancer provisioning | ALB, GCE LB, Azure LB annotations need real cloud APIs |
| Testing storage performance characteristics | hostPath does not simulate EBS, PD, or Azure Disk behavior |
| Validating managed database connectivity | IAM auth, Auth Proxy, and managed identity auth need real services |
| Testing cloud-specific networking (VPC CNI) | Pod networking behavior differs between CNI plugins |
| Validating cloud-specific autoscaling | Managed node group scaling is different from cluster-autoscaler on bare k8s |
| Pre-production change validation for CAB review | The test must match the production platform to be evidence |
| Testing DNS integration (Route53, Cloud DNS, Azure DNS) | ExternalDNS behavior depends on the DNS provider |
## The Hybrid Approach
The most practical strategy layers both. Run generic Kubernetes tests first (fast, cheap, catches 70% of issues), then run cloud-native tests for the cloud-specific surface area (slower, costs money, catches the remaining 30%).
A typical CI pipeline:
1. Lint and template – `helm lint`, `helm template`, and `kubeval` or `kubeconform` against the target k8s version. Zero cost. Catches manifest errors.
2. Generic cluster test – Deploy to kind or minikube in CI. Run application-level integration tests. Low cost. Catches application bugs and basic Kubernetes misconfiguration.
3. Cloud-native test – Deploy to a sandbox EKS/GKE/AKS cluster. Run cloud-specific integration tests (IAM, storage, load balancer provisioning). Higher cost. Catches cloud integration issues.
Step 3 only runs on PRs targeting main, not on every commit to a feature branch. This balances cost and fidelity.
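A sketch of that gating in CI – written here as a GitHub Actions workflow, with the job names, scripts, and sandbox cluster all assumed rather than prescribed (tool installation steps are omitted):

```yaml
name: infra-tests
on:
  push:
    branches-ignore: [main]
  pull_request:
    branches: [main]

jobs:
  lint-and-template:            # step 1: zero-cost static checks, every run
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: helm lint charts/app
      - run: helm template charts/app > /tmp/manifests.yaml
      - run: kubeconform -strict /tmp/manifests.yaml

  kind-integration:             # step 2: generic cluster tests, every run
    needs: lint-and-template
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: helm/kind-action@v1          # throwaway kind cluster inside the runner
      - run: ./ci/deploy-and-test.sh       # hypothetical: install chart, run integration tests

  cloud-native:                 # step 3: sandbox cloud tests, only on PRs targeting main
    if: github.event_name == 'pull_request'
    needs: kind-integration
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/deploy-to-sandbox.sh     # hypothetical: deploy to sandbox EKS/GKE/AKS,
                                           # then run IAM, storage, and LB checks
```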
## The Cost-Fidelity Tradeoff
| Approach | Cost per Test | Fidelity | Catches |
|---|---|---|---|
| `helm lint` + `kubeconform` | ~$0 | Low | Syntax errors, schema violations, deprecated APIs |
| kind/minikube in CI | ~$0.01-0.05 | Medium | Application bugs, K8s misconfiguration, service communication |
| Sandbox cloud cluster (shared) | ~$0.10-0.50 | High | Cloud IAM, storage drivers, LB provisioning, managed DB connectivity |
| Sandbox cloud cluster (dedicated) | ~$1-5 | Very High | Full production parity, network policies, node group behavior |
| Production clone | ~$10-50 | Near-perfect | Everything, including real data volumes and traffic patterns |
Most teams should operate at the “kind/minikube in CI” level by default and escalate to “sandbox cloud cluster” for changes that touch cloud-specific integration points. Operating at “production clone” for every change is prohibitively expensive and usually unnecessary.
## Detecting When You Have Crossed the Portability Boundary
An agent working on infrastructure should watch for these signals that the task has moved beyond the portable layer:
Annotations with cloud-specific prefixes. Any annotation starting with eks.amazonaws.com, iam.gke.io, azure.workload.identity, alb.ingress.kubernetes.io, or service.beta.kubernetes.io/azure- indicates cloud-native integration. Testing these on generic k8s is meaningless.
StorageClass references that are not standard. If a PVC references gp3, pd-ssd, managed-csi-premium, or any cloud-specific storage class, generic k8s will not have that storage class and the PVC will either fail or bind to a local volume that behaves differently.
External service dependencies. If a pod needs to reach an RDS endpoint, a Cloud SQL instance, or an Azure Database server, testing on generic k8s requires either mocking those services (reducing fidelity) or port-forwarding to real cloud services (adding complexity without the full integration test).
Terraform or Pulumi with cloud providers. If the infrastructure-as-code defines cloud resources (aws_eks_cluster, google_container_cluster, azurerm_kubernetes_cluster), validating it on generic k8s is not possible. You need terraform plan against the real cloud APIs.
Node selectors or tolerations targeting cloud-specific labels. Labels like node.kubernetes.io/instance-type, topology.kubernetes.io/zone, or kubernetes.azure.com/agentpool will not match anything on generic k8s.
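For instance, a pod spec like the following schedules normally on EKS but will typically sit in Pending on a default kind cluster, because no node carries those labels (the zone and instance type are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: zone-pinned
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-east-1a       # kind/minikube nodes have no cloud zone
    node.kubernetes.io/instance-type: m6i.large   # or instance-type label with this value
  containers:
    - name: app
      image: registry.example.com/app:1.0.0
```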
## Practical Scoping for Agents
When an agent receives a task involving Kubernetes infrastructure, it should categorize the work:
1. Pure Kubernetes – The task involves Deployments, Services, ConfigMaps, RBAC, and Helm charts with no cloud annotations. Scope: generic k8s is sufficient. Use kind or minikube.
2. Kubernetes with cloud integration points – The task involves IAM annotations, cloud load balancer configuration, cloud storage classes, or managed database connectivity. Scope: cloud-native testing is required for the integration points. Test application logic on generic k8s, then test integration points on the target cloud.
3. Cloud infrastructure provisioning – The task involves creating EKS/GKE/AKS clusters, VPCs, subnets, IAM roles, or managed services. Scope: generic k8s is irrelevant. This requires Terraform/Pulumi plan and apply against the target cloud’s APIs.
Getting the scope right means the agent does not over-invest in cloud-native testing when generic would suffice, and does not produce false confidence by testing cloud-specific behavior on a platform that cannot capture it.