AKS Setup and Configuration
Azure Kubernetes Service runs the control plane for you. On the Free tier you pay nothing for it; the Standard tier adds a per-cluster charge in exchange for an SLA. What you configure is node pools, networking, identity, and add-ons. Getting these right at cluster creation matters because several choices (networking model, managed identity) are difficult or impossible to change later without rebuilding the cluster.
Creating a Cluster with az CLI
The minimal command that produces a production-usable cluster:
az aks create \
--resource-group myapp-rg \
--name myapp-aks \
--location eastus2 \
--node-count 3 \
--node-vm-size Standard_D4s_v5 \
--network-plugin azure \
--network-plugin-mode overlay \
--vnet-subnet-id /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Network/virtualNetworks/myapp-vnet/subnets/aks-subnet \
--enable-managed-identity \
--enable-aad \
--aad-admin-group-object-ids <admin-group-id> \
--generate-ssh-keys \
--tier standard
Key flags: --network-plugin azure --network-plugin-mode overlay gives you Azure CNI Overlay, which avoids the IP exhaustion problems of classic Azure CNI. --tier standard enables the financially backed uptime SLA (the free tier has no SLA). --enable-aad integrates Entra ID (formerly Azure AD) for authentication.
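One way to confirm that these flags took effect is to query the cluster after creation. The helper below is a minimal sketch: show_cluster_summary is a hypothetical function name, and the JMESPath field names are an assumption about the shape of the az aks show output, so check them against your CLI version.

```shell
# Print the tier, network plugin, plugin mode, pod CIDR, and identity type
# of an existing cluster. Function name and output column names are illustrative.
show_cluster_summary() {
  local rg=$1 name=$2
  az aks show \
    --resource-group "$rg" \
    --name "$name" \
    --query '{tier: sku.tier, plugin: networkProfile.networkPlugin, mode: networkProfile.networkPluginMode, podCidr: networkProfile.podCidr, identity: identity.type}' \
    --output table
}

# Usage (requires an authenticated az session):
#   show_cluster_summary myapp-rg myapp-aks
```

For the cluster created above, the mode column would be expected to read overlay.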
Terraform Approach
For repeatable infrastructure, Terraform with the azurerm provider is the standard:
resource "azurerm_kubernetes_cluster" "aks" {
  name                = "myapp-aks"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = "myapp"
  sku_tier            = "Standard"
  default_node_pool {
    name                         = "system"
    node_count                   = 3
    vm_size                      = "Standard_D4s_v5"
    vnet_subnet_id               = azurerm_subnet.aks.id
    only_critical_addons_enabled = true
  }
  identity {
    type = "SystemAssigned"
  }
  network_profile {
    network_plugin      = "azure"
    network_plugin_mode = "overlay"
    pod_cidr            = "10.244.0.0/16"
  }
  azure_active_directory_role_based_access_control {
    azure_rbac_enabled     = true
    admin_group_object_ids = [var.admin_group_id]
  }
}
Setting only_critical_addons_enabled = true on the default (system) node pool taints it with CriticalAddonsOnly=true:NoSchedule, preventing application workloads from landing on system nodes. You then add a separate user node pool for your workloads.
Node Pools: System vs User
AKS requires at least one system node pool for cluster-critical components (CoreDNS, metrics-server, kube-proxy). Separate user pools run your applications. This separation prevents your workloads from starving system components.
# Add a user pool for application workloads
az aks nodepool add \
--resource-group myapp-rg \
--cluster-name myapp-aks \
--name workload \
--node-count 5 \
--node-vm-size Standard_D8s_v5 \
--mode User \
--labels environment=production tier=app
# Add a spot instance pool for batch/dev workloads (up to 90% cheaper)
az aks nodepool add \
--resource-group myapp-rg \
--cluster-name myapp-aks \
--name spotnodes \
--node-count 3 \
--node-vm-size Standard_D4s_v5 \
--mode User \
--priority Spot \
--eviction-policy Delete \
--spot-max-price -1
Spot nodes can be evicted at any time. Use them for fault-tolerant workloads (batch jobs, CI runners, dev environments). Set --spot-max-price -1 to accept any price up to the on-demand rate. Spot pools automatically get a kubernetes.azure.com/scalesetpriority=spot:NoSchedule taint, so only pods with a matching toleration will schedule there.
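A matching toleration might look like the following sketch. The function name spot_job_manifest, the Job name, and the busybox image are all hypothetical placeholders. The nodeSelector assumes spot nodes carry the kubernetes.azure.com/scalesetpriority=spot label; the toleration alone only permits scheduling on spot nodes, while the selector pins the pod to them.

```shell
# Emit a Job manifest that tolerates the spot taint and selects spot nodes.
# All names and the image are illustrative placeholders.
spot_job_manifest() {
  cat <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-example
spec:
  template:
    spec:
      restartPolicy: OnFailure
      tolerations:
        - key: kubernetes.azure.com/scalesetpriority
          operator: Equal
          value: spot
          effect: NoSchedule
      nodeSelector:
        kubernetes.azure.com/scalesetpriority: spot
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sh", "-c", "echo processing; sleep 30"]
EOF
}

# Usage (requires cluster credentials):
#   spot_job_manifest | kubectl apply -f -
```

Using restartPolicy: OnFailure lets the Job retry after an eviction, which suits the fault-tolerant workloads described above.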
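Networking Models
AKS supports four networking configurations. Treat this choice as effectively permanent: AKS offers in-place migration for only some combinations, and anything else means rebuilding the cluster.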
Kubenet: Pods get IPs from a separate address space, and traffic leaving a node is NAT'd to the node's VNet IP. Simpler, uses fewer VNet IPs, but no direct pod-to-pod communication across VNets and no Azure Network Policy support.
Azure CNI (traditional): Every pod gets an IP from the Azure VNet subnet. Enables direct communication with other Azure resources. The problem: a subnet with a /24 CIDR only has 251 usable IPs. With 3 nodes running 30 pods each, you burn 90+ IPs. Subnets need to be sized for maximum pod count.
Azure CNI Overlay: Pods get IPs from a private CIDR (default 10.244.0.0/16) overlaid on top of the VNet. Nodes still get VNet IPs, but pods do not consume subnet addresses. This is the best default for most new clusters – you get Azure CNI features without IP exhaustion.
Azure CNI with Cilium: Uses Cilium as the dataplane instead of the default Azure networking. Gives you Cilium network policies, Hubble observability, and eBPF-based networking. Enable it with --network-dataplane cilium.
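The subnet arithmetic for traditional Azure CNI can be checked with a few lines of shell. This is a rough sketch under two assumptions: Azure reserves 5 addresses in every subnet, and each node consumes one IP for itself plus one per pod slot. The function names are made up for illustration.

```shell
# Usable addresses in a subnet of the given prefix length
# (assumes Azure reserves 5 addresses per subnet).
usable_ips() {
  local prefix=$1
  echo $(( (1 << (32 - prefix)) - 5 ))
}

# Addresses consumed by traditional Azure CNI: one per node plus one per pod slot.
required_ips() {
  local nodes=$1 max_pods=$2
  echo $(( nodes * (max_pods + 1) ))
}

# Largest node count that fits in the subnet at a given pods-per-node setting.
max_nodes() {
  local prefix=$1 max_pods=$2
  echo $(( $(usable_ips "$prefix") / (max_pods + 1) ))
}

echo "usable IPs in a /24:      $(usable_ips 24)"      # 251
echo "3 nodes x 30 pods need:   $(required_ips 3 30)"  # 93
echo "max nodes in /24 at 30:   $(max_nodes 24 30)"    # 8
```

Under these assumptions a /24 tops out at 8 nodes with 30 pods each, before leaving any headroom for surge nodes during upgrades. With Azure CNI Overlay only the node IPs come out of the subnet.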
Managed Identity and Azure AD Integration
AKS uses a managed identity to interact with Azure APIs (pull images from ACR, manage load balancers, attach disks). Always use managed identity over service principals: its credentials are rotated automatically and require no secret management on your side.
Attach ACR to your cluster so nodes can pull images without explicit credentials:
az aks update \
--resource-group myapp-rg \
--name myapp-aks \
--attach-acr myappcr
This assigns the AcrPull role to the cluster’s kubelet identity on your Azure Container Registry.
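To check that the role assignment actually lets nodes reach the registry, az aks check-acr can be run against the cluster. The wrapper below is a small sketch: verify_acr_pull is a hypothetical name, and it assumes the registry uses the default public azurecr.io login server.

```shell
# Validate that the cluster can reach and pull from the registry.
# Assumes the default <registry>.azurecr.io login server.
verify_acr_pull() {
  local rg=$1 cluster=$2 registry=$3
  az aks check-acr \
    --resource-group "$rg" \
    --name "$cluster" \
    --acr "${registry}.azurecr.io"
}

# Usage (requires an authenticated az session):
#   verify_acr_pull myapp-rg myapp-aks myappcr
```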
Essential Add-Ons
Enable these at creation time or immediately after:
# Container Insights (monitoring)
az aks enable-addons --resource-group myapp-rg --name myapp-aks \
--addons monitoring --workspace-resource-id /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<workspace>
# Azure Policy (enforce guardrails)
az aks enable-addons --resource-group myapp-rg --name myapp-aks \
--addons azure-policy
# Azure Key Vault Secrets Provider
az aks enable-addons --resource-group myapp-rg --name myapp-aks \
--addons azure-keyvault-secrets-provider
The monitoring add-on deploys a containerized OMS agent that ships logs and metrics to a Log Analytics workspace. Azure Policy installs Gatekeeper and syncs Azure Policy definitions as constraint templates. The Key Vault provider installs the Secrets Store CSI driver configured for Azure Key Vault.
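After enabling add-ons, it can help to confirm what the cluster reports as enabled. The sketch below assumes the add-on state is exposed under the addonProfiles property of az aks show; list_addon_profiles is a hypothetical helper name, and the exact keys in the output may vary by CLI and cluster version.

```shell
# Dump the add-on profiles the cluster reports, including each "enabled" flag.
# The property name addonProfiles is an assumption about the az aks show output.
list_addon_profiles() {
  local rg=$1 name=$2
  az aks show \
    --resource-group "$rg" \
    --name "$name" \
    --query 'addonProfiles' \
    --output json
}

# Usage (requires an authenticated az session):
#   list_addon_profiles myapp-rg myapp-aks
```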
Getting Credentials
After cluster creation, get kubectl credentials:
# Admin credentials (bypasses Azure AD)
az aks get-credentials --resource-group myapp-rg --name myapp-aks --admin
# User credentials (requires Azure AD login)
az aks get-credentials --resource-group myapp-rg --name myapp-aks
# With Azure AD, you also need kubelogin
az aks install-cli # installs kubectl and kubelogin
kubelogin convert-kubeconfig -l azurecli
Use --admin only for initial setup or break-glass scenarios. For day-to-day use, Azure AD credentials ensure audit logging and RBAC enforcement.