Infrastructure Knowledge Scoping for Agents#

An agent working on infrastructure tasks needs to operate at the right level of specificity. Giving generic Kubernetes advice when the user runs EKS with IRSA is unhelpful – the agent misses the IAM integration that will make or break the deployment. Giving EKS-specific advice when the user runs minikube on a laptop is equally unhelpful – the agent references services and configurations that do not exist.

The challenge is that agents often receive tasks without explicit platform context. A request like “deploy a web app with a database” could target a local minikube cluster, an EKS cluster in AWS, a GKE Autopilot cluster in GCP, or an AKS cluster in Azure. Each target requires different knowledge, different configurations, and different validation steps.

This guide defines a scoping hierarchy for infrastructure knowledge and explains how agents should detect, narrow, and apply the correct scope.

The Scoping Hierarchy#

Infrastructure knowledge operates at four levels, from most general to most specific. Each level inherits everything from the levels above it and adds platform-specific constraints and capabilities.

Level 1: Cloud-Agnostic#

Knowledge that applies regardless of the target platform. This is the foundation that transfers everywhere.

What lives here:

  • Kubernetes API (Deployments, Services, ConfigMaps, Secrets, RBAC)
  • Helm chart structure, templating, and lifecycle
  • Container image building and OCI standards
  • SQL database fundamentals (schemas, queries, migrations)
  • General networking concepts (DNS, TCP/IP, TLS, load balancing concepts)
  • Infrastructure-as-code patterns (state management, modularity, idempotency)
  • Application architecture (microservices, monoliths, event-driven, request-response)
  • Monitoring concepts (metrics, logs, traces, alerts)

When to operate here: When the platform is unknown, when the user explicitly wants portable solutions, when the task is purely application-level, or when providing educational content.

Level 2: Cloud-Specific#

Knowledge tied to a particular cloud provider. This layer adds the provider’s resource model, IAM system, managed services, and operational patterns.

What lives here:

  • IAM models (AWS IAM roles/policies, GCP IAM/service accounts, Azure RBAC/managed identities)
  • Managed Kubernetes specifics (EKS, GKE, AKS – node groups, networking plugins, identity integration)
  • Managed database behavior (RDS, Cloud SQL, Azure Database – connection methods, failover, backups)
  • Cloud networking (VPCs/VNets, subnets, security groups/NSGs/firewall rules, peering, load balancers)
  • Cloud storage (S3, Cloud Storage, Blob Storage – APIs, access patterns, lifecycle policies)
  • Cloud-specific tooling (AWS CLI, gcloud, az CLI, cloud-specific Terraform providers)

When to operate here: When the target cloud is known. Most production infrastructure work happens at this level.

Level 3: Region-Specific#

Knowledge tied to a specific region within a cloud provider. Regions determine service availability, compliance posture, and performance characteristics.

What lives here:

  • Service availability (not all services are available in all regions – e.g., some EC2 instance types, GKE features, or Azure services are region-limited)
  • Compliance and data residency requirements (GDPR in eu-west regions, data sovereignty laws)
  • Pricing differences (some regions are significantly cheaper or more expensive)
  • Availability zone count (some regions have 2 AZs, some have 3+, which affects HA architecture)
  • Latency characteristics (region proximity to end users)
  • Capacity constraints (some regions run out of specific instance types, especially GPU instances)

When to operate here: When optimizing for cost, compliance, or performance. When designing multi-region or disaster recovery architectures.
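The service-availability point is directly checkable with the cloud CLIs before committing to a design. A hedged sketch using AWS (the instance type and region are placeholders):

# Does this region offer the GPU instance type we want to schedule?
aws ec2 describe-instance-type-offerings \
  --location-type availability-zone \
  --filters Name=instance-type,Values=p4d.24xlarge \
  --region us-east-1 \
  --query 'InstanceTypeOfferings[].Location'

An empty result means the design needs a different instance type or a different region.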

Level 4: Account-Specific#

Knowledge tied to a specific AWS account, GCP project, or Azure subscription. This is the most concrete level – it reflects the actual state of the user’s environment.

What lives here:

  • Existing VPC/VNet configurations (CIDR ranges, subnet layouts, peering connections)
  • IAM policies and roles already in place
  • Naming conventions and tagging standards
  • Existing deployed services and their configurations
  • Quotas and service limits (default and any limit increases)
  • Cost allocation tags and billing structure
  • Security policies and compliance controls
  • CI/CD pipeline configurations

When to operate here: When making changes to an existing environment. When the agent has access to the actual infrastructure state (via kubectl, cloud CLI, or Terraform state).
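Where the agent has that access, most of this account-level state can be read directly instead of assumed. A hedged sketch of read-only inspection commands (AWS-flavored; the cluster and Terraform state are whatever the project already uses):

# Existing networking: VPCs and their CIDR ranges
aws ec2 describe-vpcs --query 'Vpcs[].{Id:VpcId,Cidr:CidrBlock}' --output table

# Quotas currently applied to EC2 in this account and region
aws service-quotas list-service-quotas --service-code ec2 \
  --query 'Quotas[].{Name:QuotaName,Value:Value}' --output table

# What is already deployed in the cluster and tracked in Terraform state
kubectl get deployments,services,ingresses -A
terraform state list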

What Changes and What Stays the Same#

When moving between clouds, some knowledge transfers directly and some must be replaced entirely. Understanding which is which prevents agents from making incorrect assumptions.

Transfers Directly#

The Kubernetes API. A Deployment spec works identically on EKS, GKE, AKS, and minikube. Pod lifecycle, service discovery, ConfigMaps, Secrets, and RBAC are governed by the Kubernetes spec, not the cloud provider. An agent that knows how to write a Deployment manifest can apply that knowledge anywhere.

SQL and database schemas. PostgreSQL is PostgreSQL regardless of whether it runs in a container, on RDS, on Cloud SQL, or on Azure Database. Schema design, queries, indexes, and migration patterns are portable. (Connection methods and authentication are not – see below.)

Container images. A Docker image built for linux/amd64 runs the same on any cloud’s container runtime. Multi-arch images (linux/amd64 + linux/arm64) run on any cloud’s AMD64 or ARM64 instances.
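For example, a single multi-arch build produces an image that nodes on any cloud, x86 or ARM, can pull (the image name and tag are placeholders):

# Build and push a multi-arch image usable on amd64 and arm64 nodes in any cloud
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t myorg/web-app:latest \
  --push .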

Helm chart structure. Chart.yaml, values.yaml, template syntax, and Helm lifecycle (install, upgrade, rollback) are identical across platforms. The values may need to change for cloud-specific resources, but the chart mechanism itself is portable.

Monitoring and observability concepts. Prometheus scraping, Grafana dashboards, OpenTelemetry instrumentation, and alerting rule patterns work the same everywhere. (The collection infrastructure – CloudWatch vs Cloud Monitoring vs Azure Monitor – is not portable, but the concepts and open-source tooling are.)

Transfers Conceptually but Diverges in Implementation#

Networking. The concept of VPCs, subnets, firewall rules, and load balancers transfers between clouds. The implementations differ significantly: security groups (AWS) vs firewall rules (GCP) vs NSGs (Azure), VPC peering (AWS) vs VPC Network Peering (GCP) vs VNet Peering (Azure), ALB (AWS) vs HTTP(S) LB (GCP) vs Application Gateway (Azure). An agent that understands networking concepts can learn any cloud’s implementation, but cannot assume that AWS networking knowledge directly applies to GCP.

IAM. Every cloud has identity, authentication, and authorization. The conceptual framework (principals, permissions, scoping) transfers. The implementation is completely different: JSON policies on resource ARNs (AWS) vs predefined/custom IAM roles bound at resource hierarchy levels (GCP) vs RBAC role definitions scoped to subscription/resource-group/resource (Azure). An agent must not use AWS IAM terminology when working on GCP, or vice versa.

Storage classes. The concept of tiered storage (fast SSD, standard HDD, archival) transfers. The storage class names, CSI drivers, and performance characteristics are cloud-specific. gp3 means nothing on GCP. pd-ssd means nothing on AWS.
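A portable way to see which cloud's storage implementation a cluster actually exposes is to list its StorageClasses with their provisioners; the command is the same everywhere, but the names it returns are cloud-specific:

# Portable command, cloud-specific output: the provisioner column shows
# ebs.csi.aws.com, pd.csi.storage.gke.io, disk.csi.azure.com, or a local provisioner
kubectl get storageclass -o custom-columns=NAME:.metadata.name,PROVISIONER:.provisioner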

Does Not Transfer#

Cloud CLI commands. aws, gcloud, and az are completely different tools with different syntax, authentication methods, and output formats. Knowledge of aws s3 cp does not help with gcloud storage cp or az storage blob upload.

Terraform provider resources. aws_instance, google_compute_instance, and azurerm_virtual_machine are different resources with different arguments. Terraform’s HCL syntax and state management are portable, but the provider-specific resource definitions are not.

Cloud-specific annotations on Kubernetes resources. eks.amazonaws.com/role-arn is meaningless on GKE. iam.gke.io/gcp-service-account is meaningless on EKS. These must be replaced, not translated.

Managed service configurations. An RDS instance definition does not map to a Cloud SQL instance definition. The parameters, options, and behavioral defaults are different.

Detecting the Target Platform from Context Clues#

When an agent does not receive explicit platform information, it should look for context clues in the available environment data. Here are the detection strategies, ordered from most reliable to least.

kubectl Config (Most Reliable)#

The kubeconfig context reveals the cluster type immediately.

kubectl config current-context

Context names follow recognizable patterns:

  • EKS: arn:aws:eks:REGION:ACCOUNT_ID:cluster/CLUSTER_NAME
  • GKE: gke_PROJECT_ID_ZONE_CLUSTER_NAME or gke_PROJECT_ID_REGION_CLUSTER_NAME
  • AKS: CLUSTER_NAME (set by az aks get-credentials)
  • Minikube: minikube
  • Kind: kind-CLUSTER_NAME

The cluster server URL also helps:

kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'

  • *.eks.amazonaws.com – EKS
  • *.gke.goog or GCP IP ranges – GKE
  • *.azmk8s.io – AKS
  • 127.0.0.1 or localhost – Local cluster (minikube, kind, Docker Desktop)
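These two signals can be combined into a small detection step when the agent needs to branch programmatically. A minimal sketch (the labels it prints are arbitrary):

# Classify the current cluster from the kubeconfig context name and API server URL
context=$(kubectl config current-context)
server=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')

case "$context:$server" in
  arn:aws:eks:*|*eks.amazonaws.com*)        echo "eks" ;;
  gke_*|*gke.goog*)                         echo "gke" ;;
  *azmk8s.io*)                              echo "aks" ;;
  minikube*|kind-*|*127.0.0.1*|*localhost*) echo "local" ;;
  *)                                        echo "unknown" ;;
esac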

Cloud CLI Presence#

Check which cloud CLIs are installed and configured:

# AWS -- check for configured credentials
aws sts get-caller-identity 2>/dev/null && echo "AWS configured"

# GCP -- check for active project
gcloud config get-value project 2>/dev/null && echo "GCP configured"

# Azure -- check for active subscription
az account show 2>/dev/null && echo "Azure configured"

The presence of configured cloud CLIs does not guarantee the Kubernetes cluster runs on that cloud (a developer might have AWS CLI configured but be working on a local minikube cluster), but it indicates the cloud ecosystem the user operates in.

Node Labels#

Kubernetes nodes carry labels that reveal the underlying platform:

kubectl get nodes -o jsonpath='{.items[0].metadata.labels}' | jq .

EKS nodes have labels like:

  • eks.amazonaws.com/nodegroup
  • node.kubernetes.io/instance-type (values like m6g.xlarge)
  • topology.kubernetes.io/zone (values like us-east-1a)

GKE nodes have labels like:

  • cloud.google.com/gke-nodepool
  • cloud.google.com/machine-family
  • topology.kubernetes.io/zone (values like us-east1-b)

AKS nodes have labels like:

  • kubernetes.azure.com/agentpool
  • node.kubernetes.io/instance-type (values like Standard_D4s_v5)
  • topology.kubernetes.io/zone (values like eastus-1)

Minikube nodes have: minikube.k8s.io/name
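A quick way to scan for these platform-identifying labels in one step (a sketch; extend the pattern list as needed):

# Print the first node's label keys and keep only the platform-specific ones
kubectl get nodes -o json \
  | jq -r '.items[0].metadata.labels | keys[]' \
  | grep -E 'eks\.amazonaws\.com|cloud\.google\.com|kubernetes\.azure\.com|minikube\.k8s\.io'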

Service Annotations and Resources#

Existing resources in the cluster reveal the platform through cloud-specific annotations:

# Check for cloud-specific annotations on services
kubectl get svc -A -o json | jq '.items[].metadata.annotations // {} | keys[]' | sort -u

# Check for cloud-specific storage classes
kubectl get storageclass

Annotations starting with service.beta.kubernetes.io/aws-, cloud.google.com/, or service.beta.kubernetes.io/azure- immediately identify the platform.

Storage class provisioners (ebs.csi.aws.com, pd.csi.storage.gke.io, disk.csi.azure.com) are definitive.

Terraform State and IaC Files#

If the agent has access to the project’s infrastructure code:

# Check Terraform providers and backend configuration
# Look for: provider "aws", provider "google", provider "azurerm"
grep -rE 'provider "(aws|google|azurerm)"' --include='*.tf' .
grep -rE 'backend "(s3|gcs|azurerm)"' --include='*.tf' .

The Terraform provider block, backend configuration (S3, GCS, azurerm), and resource types reveal the target cloud.

Building Cloud-Specific Mental Models#

Each cloud organizes resources differently. An agent should build the correct mental model for the target platform to reason about scope, permissions, and resource relationships.

AWS Resource Hierarchy#

Organization
  └── Organizational Unit (OU)
       └── Account (billing + isolation boundary)
            └── Region
                 └── VPC (networking boundary)
                      └── Subnet (AZ-specific)
                           └── Resources (EC2, RDS, EKS nodes)

Key insight: The AWS Account is the fundamental isolation boundary. IAM policies, VPCs, and billing are account-scoped. Cross-account access requires explicit IAM trust policies and role assumption. EKS clusters live inside a VPC in a specific region within an account.

GCP Resource Hierarchy#

Organization
  └── Folder (optional grouping)
       └── Project (billing + API + isolation boundary)
            └── VPC Network (global)
                 └── Subnet (regional)
                      └── Resources (Compute Engine, GKE nodes, Cloud SQL)

Key insight: The GCP Project is the fundamental isolation boundary (analogous to an AWS Account). VPC Networks are global within a project – they span all regions automatically. IAM bindings can be set at the organization, folder, project, or resource level and inherit downward.

Azure Resource Hierarchy#

Management Group
  └── Subscription (billing + policy boundary)
       └── Resource Group (lifecycle + access boundary)
            └── Resources (VMs, AKS, Azure SQL, VNets)

Key insight: Azure has two grouping concepts that other clouds do not: Subscriptions (billing boundary, like AWS Accounts) and Resource Groups (logical grouping within a subscription). Every resource belongs to exactly one Resource Group. Resource Groups are regional but can contain resources from any region. Deleting a Resource Group deletes everything in it.

Why the Hierarchy Matters for Agents#

When an agent needs to scope a permission, create a network boundary, or estimate cost, it must reason within the correct hierarchy:

  • AWS: “This pod needs to read from S3” means creating an IAM role in the same account, attaching an S3 policy, and configuring IRSA on the EKS cluster.
  • GCP: “This pod needs to read from Cloud Storage” means creating a service account in the same project, binding a Storage IAM role, and configuring Workload Identity on the GKE cluster.
  • Azure: “This pod needs to read from Blob Storage” means creating a managed identity in the same subscription (possibly different resource group), assigning a Storage Blob role, and configuring Azure Workload Identity on the AKS cluster.

Same task, three different hierarchies to reason about, three different sets of commands.
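As a rough illustration, here is what those three paths can look like as commands. This is a hedged sketch: the cluster, project, resource-group, and policy names are placeholders, and each cloud offers several equivalent ways to wire this up.

# AWS (EKS + IRSA): create an IAM role bound to the Kubernetes ServiceAccount
eksctl create iamserviceaccount \
  --cluster prod-cluster --namespace default --name web-app-sa \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
  --approve

# GCP (GKE + Workload Identity): let the Kubernetes ServiceAccount impersonate
# a GCP service account that holds the Storage role
gcloud iam service-accounts add-iam-policy-binding \
  web-app-sa@my-project.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:my-project.svc.id.goog[default/web-app-sa]"

# Azure (AKS + Workload Identity): create a managed identity and federate it
# with the Kubernetes ServiceAccount via the cluster's OIDC issuer
az identity create --name web-app-identity --resource-group my-rg
az identity federated-credential create --name web-app-fed \
  --identity-name web-app-identity --resource-group my-rg \
  --issuer "$(az aks show --name my-aks --resource-group my-rg \
      --query oidcIssuerProfile.issuerUrl -o tsv)" \
  --subject system:serviceaccount:default:web-app-sa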

Practical Scoping Example: Deploy a Web App with Database#

Here is how the same task – “deploy a web application with a PostgreSQL database” – plays out at each scoping level.

Cloud-Agnostic Scope#

The agent produces portable artifacts:

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: myorg/web-app:latest
        ports:
        - containerPort: 8080
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
---
# postgres deployment (in-cluster)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        env:
        - name: POSTGRES_DB
          value: appdb
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
        volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: pgdata
        persistentVolumeClaim:
          claimName: postgres-pvc

This works on any Kubernetes cluster, provided the db-credentials Secret and postgres-pvc PersistentVolumeClaim it references exist (see the sketch below). The database runs in a container, the connection uses a username and password from a Secret, and there is no cloud-specific integration.
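The manifests also assume two Services so the pods can reach each other, plus the Secret and PVC mentioned above. A minimal sketch of the imperative equivalents (all values are placeholders; the PVC itself still needs a manifest):

# Create the credentials the Deployments reference
kubectl create secret generic db-credentials \
  --from-literal=username=appuser \
  --from-literal=password=changeme \
  --from-literal=url=postgres://appuser:changeme@postgres:5432/appdb

# Expose Postgres to the app, and the app inside the cluster
kubectl expose deployment postgres --port=5432
kubectl expose deployment web-app --port=80 --target-port=8080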

AWS-Scoped (EKS + RDS)#

The agent replaces the in-cluster Postgres with RDS and adds IRSA for authentication:

# ServiceAccount with IRSA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-app-sa
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/web-app-role
---
# Deployment using IRSA ServiceAccount
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      serviceAccountName: web-app-sa
      containers:
      - name: web-app
        image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/web-app:latest
        ports:
        - containerPort: 8080
        env:
        - name: DATABASE_HOST
          value: prod-postgres.cluster-xxxx.us-east-1.rds.amazonaws.com
        - name: DATABASE_NAME
          value: appdb
        - name: AWS_REGION
          value: us-east-1
        # Using IAM database auth -- no password needed

The database is now a managed RDS instance accessed via IAM authentication. The container image is pulled from ECR. The ServiceAccount has an IRSA annotation for AWS API access.
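For the IAM database authentication hinted at in the last comment, a hedged sketch of the moving parts (identifiers are placeholders): IAM auth must be enabled on the RDS instance, the role attached to web-app-sa needs rds-db:connect permission for the database user, and the application requests a short-lived token in place of a password.

# Generate a short-lived auth token instead of storing a database password
aws rds generate-db-auth-token \
  --hostname prod-postgres.cluster-xxxx.us-east-1.rds.amazonaws.com \
  --port 5432 \
  --username web_app \
  --region us-east-1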

GCP-Scoped (GKE + Cloud SQL)#

The agent uses Cloud SQL with Auth Proxy sidecar and Workload Identity:

# ServiceAccount with Workload Identity
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-app-sa
  annotations:
    iam.gke.io/gcp-service-account: web-app-sa@my-project.iam.gserviceaccount.com
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      serviceAccountName: web-app-sa
      containers:
      - name: web-app
        image: us-east1-docker.pkg.dev/my-project/repo/web-app:latest
        ports:
        - containerPort: 8080
        env:
        - name: DATABASE_HOST
          value: "127.0.0.1"  # Cloud SQL Auth Proxy
        - name: DATABASE_PORT
          value: "5432"
        - name: DATABASE_NAME
          value: appdb
        - name: DATABASE_USER
          value: web-app-sa@my-project.iam
      - name: cloud-sql-proxy
        image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
        args:
        - "--structured-logs"
        - "--auto-iam-authn"
        - "my-project:us-east1:prod-postgres"
        securityContext:
          runAsNonRoot: true

The critical difference: GCP requires a Cloud SQL Auth Proxy sidecar container. The application connects to 127.0.0.1:5432 (the proxy), not directly to the Cloud SQL instance. The proxy handles IAM authentication and SSL. This sidecar pattern does not exist in the AWS or Azure versions.
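On the database side, IAM authentication also has to be enabled on the Cloud SQL instance and the GCP service account added as a database user. A hedged sketch (instance and project names are placeholders):

# Allow IAM-based logins on the instance
gcloud sql instances patch prod-postgres \
  --database-flags=cloudsql.iam_authentication=on

# Register the service account as a Cloud SQL IAM user
# (the resulting database username drops the ".gserviceaccount.com" suffix)
gcloud sql users create web-app-sa@my-project.iam.gserviceaccount.com \
  --instance=prod-postgres \
  --type=cloud_iam_service_account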

Azure-Scoped (AKS + Azure Database)#

The agent uses Azure Database for PostgreSQL with Azure Workload Identity:

# ServiceAccount with Azure Workload Identity
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-app-sa
  annotations:
    azure.workload.identity/client-id: "CLIENT_ID_HERE"
  labels:
    azure.workload.identity/use: "true"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: web-app-sa
      containers:
      - name: web-app
        image: myorgregistry.azurecr.io/web-app:latest
        ports:
        - containerPort: 8080
        env:
        - name: DATABASE_HOST
          value: prod-postgres.postgres.database.azure.com
        - name: DATABASE_NAME
          value: appdb
        - name: DATABASE_USER
          value: web-app-identity
        # AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_FEDERATED_TOKEN_FILE are
        # injected automatically by the Azure Workload Identity webhook

Azure uses a direct connection to the database hostname with Azure AD (Microsoft Entra ID) authentication via the managed identity. No sidecar proxy is needed (unlike GCP), but the Workload Identity webhook must be enabled on the AKS cluster and the azure.workload.identity/use: "true" label must be present on the Pod template; the webhook then injects AZURE_CLIENT_ID, AZURE_TENANT_ID, and the federated token file into the container automatically.
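One cluster-level prerequisite worth calling out: the AKS cluster needs its OIDC issuer and the Workload Identity webhook enabled (the federated credential linking the managed identity to the ServiceAccount was sketched earlier). A hedged sketch with placeholder names:

# Enable the OIDC issuer and Workload Identity webhook on the cluster
az aks update \
  --resource-group my-rg \
  --name my-aks \
  --enable-oidc-issuer \
  --enable-workload-identity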

What the Example Shows#

The same conceptual task – “web app with Postgres” – produces four different configurations. The application code itself barely changes (it connects to a database URL either way). But the infrastructure surrounding it – identity, database connectivity, container registry, service annotations – is entirely platform-specific.

An agent that does not scope its knowledge correctly will produce one of two failure modes:

  1. Under-scoping – Produces the cloud-agnostic version when the user needs cloud-native integration. The deployment works but uses an in-cluster Postgres instead of the managed database, misses IAM integration, and does not match the production architecture.

  2. Over-scoping – Produces an AWS-specific version when the user runs GKE. The IRSA annotations are meaningless, ECR image references do not resolve, and the RDS connection string points nowhere.

The correct approach: detect the platform first (using the context clues described above), then scope the output to match. If the platform cannot be determined, ask. If the platform is explicitly cloud-agnostic (minikube, kind, local development), produce portable artifacts. If a specific cloud is identified, produce cloud-native artifacts with the correct IAM integration, managed service connections, and container registry references.

Agent Scoping Checklist#

Before producing infrastructure artifacts, an agent should run through this checklist:

  1. What is the target cluster? Check kubeconfig context. If local (minikube/kind), use cloud-agnostic scope. If managed (EKS/GKE/AKS), use cloud-specific scope.

  2. What managed services exist? Check for cloud-specific storage classes, CSI drivers, and existing service annotations. These reveal what cloud integrations are already in use.

  3. What identity model is in use? Check for IRSA annotations (EKS), Workload Identity annotations (GKE), or Azure Workload Identity labels (AKS). Match the existing pattern.

  4. What container registry? Check existing deployments for image references. Use the same registry (ECR, Artifact Registry, ACR) for consistency.

  5. What is the resource hierarchy? Understand the account/project/subscription context. Ensure new resources are created in the correct scope with appropriate permissions.

  6. What region and zones? Check node labels for topology.kubernetes.io/zone. Ensure new resources (databases, storage, load balancers) are created in compatible zones.

If any of these cannot be determined from context, the agent should ask rather than guess. An incorrect assumption about the target platform produces artifacts that fail on deployment, which is worse than asking a clarifying question.