Kubernetes API Server: Architecture, Authentication, Authorization, and Debugging#
The API server (kube-apiserver) is the front door to your Kubernetes cluster. Every interaction – kubectl commands, controller reconciliation loops, kubelet status updates, admission webhooks – goes through the API server. It is the only component that reads from and writes to etcd. If the API server is down, the cluster is unmanageable. Everything else (scheduler, controllers, kubelets) can tolerate brief API server outages because they cache state locally, but no mutations happen until the API server is back.
Request Lifecycle#
Every request to the API server follows the same pipeline:
```
Client Request
  --> Authentication (who are you?)
  --> Authorization (are you allowed to do this?)
  --> Admission Control (mutating webhooks, then validating webhooks)
  --> Schema Validation (is the object well-formed?)
  --> etcd Write (persist the object)
  --> Response
```
For read requests, the pipeline skips admission control and the etcd write. For watch requests, the API server streams changes from its internal cache rather than polling etcd directly.
Understanding this pipeline is critical for debugging. A 401 means authentication failed. A 403 means authorization denied the request. A request that is rejected by a webhook returns the webhook’s error message. A validation error returns the specific field that failed.
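One quick way to watch the pipeline from the client side is kubectl's request logging: at verbosity 8 it prints each HTTP call and the response status, so a 401 versus a 403 versus a webhook rejection is immediately visible (exact output format varies by kubectl version):

```bash
# Print the raw HTTP requests and response codes kubectl sends
kubectl get pods -n production -v=8 2>&1 | grep -E 'GET|Response Status'
```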
Authentication#
Authentication answers “who are you?” The API server evaluates authenticators in order and uses the first one that succeeds. If none succeed, the request is rejected with 401.
Client Certificates#
The default method for cluster administrators. Your kubeconfig contains a client certificate signed by the cluster CA. The API server extracts the Common Name (CN) as the username and the Organization (O) fields as group memberships.
```bash
# Inspect your client certificate
openssl x509 -in ~/.kube/client.crt -noout -subject
# subject=O = system:masters, CN = kubernetes-admin
```
The `system:masters` group is hard-coded to bypass RBAC entirely; it always has full access. This is the cluster-admin backdoor. Never distribute certificates with this group beyond break-glass scenarios.
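For issuing additional user certificates, the CertificateSigningRequest API is safer than signing with the CA key by hand. A minimal sketch, with illustrative `dev-alice` / `dev-team` names:

```bash
# Generate a key and CSR; CN becomes the username, O the group
openssl genrsa -out alice.key 2048
openssl req -new -key alice.key -subj "/CN=dev-alice/O=dev-team" -out alice.csr

# Submit, approve, and fetch the signed certificate
# (-w0 is GNU base64; plain `base64` on macOS)
kubectl apply -f - <<EOF
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: dev-alice
spec:
  request: $(base64 -w0 < alice.csr)
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400      # 24 hours
  usages: ["client auth"]
EOF
kubectl certificate approve dev-alice
kubectl get csr dev-alice -o jsonpath='{.status.certificate}' | base64 -d > alice.crt
```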
ServiceAccount Tokens#
Pods authenticate using projected service account tokens. Since Kubernetes 1.22, these are short-lived, audience-bound JWTs rather than the legacy long-lived secrets.
```bash
# Create a short-lived token for a service account
kubectl create token my-service-account -n my-namespace --duration=1h

# Decode a service account JWT (without verification)
kubectl create token my-service-account -n my-namespace | \
  cut -d. -f2 | base64 -d 2>/dev/null | jq .
```
The token includes the service account name, namespace, and expiration. The API server validates the signature against its own signing key.
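The same mechanism backs projected volumes: the kubelet requests a token for the pod and rotates it before expiry. A sketch of a pod requesting a custom-audience token (the `vault` audience is an assumption for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: token-demo
spec:
  serviceAccountName: my-service-account
  containers:
  - name: app
    image: alpine
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: vault-token
      mountPath: /var/run/secrets/tokens
  volumes:
  - name: vault-token
    projected:
      sources:
      - serviceAccountToken:
          path: vault-token
          audience: vault            # token is only valid for this audience
          expirationSeconds: 3600    # kubelet rotates it before expiry
```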
OIDC Integration#
For human users at scale, OIDC connects the API server to an identity provider like Okta, Keycloak, or Azure AD. Users authenticate against the IdP and receive a JWT that the API server validates.
Configure on the API server:
```
--oidc-issuer-url=https://idp.example.com/auth/realms/kubernetes
--oidc-client-id=kubernetes
--oidc-username-claim=email
--oidc-groups-claim=groups
--oidc-username-prefix=oidc:
--oidc-groups-prefix=oidc:
```
The prefix flags prevent collisions between OIDC usernames and built-in Kubernetes identities. With this configuration, an OIDC user `alice@example.com` in group `platform-team` becomes `oidc:alice@example.com` in group `oidc:platform-team` inside Kubernetes RBAC.
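RBAC bindings must then reference the prefixed names. A sketch granting that group read-only access via the built-in view ClusterRole:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: platform-team-view
subjects:
- kind: Group
  name: oidc:platform-team   # must include the --oidc-groups-prefix
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                 # built-in read-only role
  apiGroup: rbac.authorization.k8s.io
```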
Webhook Token Authentication#
For custom auth backends, the API server can call an external webhook to validate bearer tokens:
```
--authentication-token-webhook-config-file=/etc/kubernetes/auth-webhook.yaml
```
The webhook receives the token and returns a user info object (username, groups, UID). This is useful for integrating with internal auth systems that don’t support OIDC.
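The contract is the TokenReview API: the API server POSTs a TokenReview containing the bearer token, and the webhook returns it with the status filled in. Shown as YAML for readability (the wire format is JSON; the user details are illustrative):

```yaml
# What the API server sends
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: "<opaque bearer token>"
---
# What the webhook returns on success
apiVersion: authentication.k8s.io/v1
kind: TokenReview
status:
  authenticated: true
  user:
    username: jane@internal.example.com
    uid: "u-1234"
    groups: ["developers", "platform-team"]
```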
Authorization#
Authorization answers “are you allowed to do this?” The API server evaluates authorization modules in order. If any module allows or denies the request, that decision is final. If a module has no opinion, the next module is consulted. If all modules have no opinion, the request is denied.
RBAC#
The most common mode. Permissions are defined by four objects: Role, ClusterRole, RoleBinding, ClusterRoleBinding. RBAC is purely additive – there are no deny rules. If no binding grants the permission, it is denied.
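For example, granting a deployer service account control over Deployments in a single namespace takes a Role plus a RoleBinding (a sketch; the names are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-manager
  namespace: production
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployer-can-deploy
  namespace: production
subjects:
- kind: ServiceAccount
  name: deployer
  namespace: production
roleRef:
  kind: Role
  name: deployment-manager
  apiGroup: rbac.authorization.k8s.io
```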
```bash
# Quick test: can a service account create deployments?
kubectl auth can-i create deployments -n production \
  --as=system:serviceaccount:production:deployer
```
Node Authorization#
The Node authorizer restricts kubelets to only access resources related to their own node: reading Secrets and ConfigMaps for pods scheduled on that node, updating their own Node status, creating events. Without this, a compromised kubelet could read any Secret in the cluster.
Enabled with `--authorization-mode=Node,RBAC` (order matters: Node is checked first).
Webhook Authorization#
For centralized policy enforcement, an external webhook makes authorization decisions. The API server sends the request attributes (user, verb, resource, namespace) and the webhook returns allow/deny. This integrates with policy engines like OPA/Gatekeeper or custom authorization services.
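The exchange uses the SubjectAccessReview API: the API server POSTs the request attributes, and the webhook sets status.allowed (or status.denied for an explicit deny). Shown as YAML for readability; the reason string is illustrative:

```yaml
# What the API server sends to the authorization webhook
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: oidc:alice@example.com
  groups: ["oidc:platform-team"]
  resourceAttributes:
    verb: create
    group: apps
    resource: deployments
    namespace: production
---
# What the webhook returns
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
status:
  allowed: false
  reason: "platform-team may not create deployments in production"
```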
API Groups and Versions#
The API is organized into groups, each with independent versioning:
```bash
# Core group (legacy, no group prefix)
kubectl get --raw /api/v1 | jq '.resources[].name' | head -5
# "pods", "services", "configmaps", "secrets", "namespaces"

# Named groups
kubectl get --raw /apis/apps/v1 | jq '.resources[].name'
# "deployments", "replicasets", "statefulsets", "daemonsets"

# Discover all available resources
kubectl api-resources --sort-by=name

# Discover all API versions
kubectl api-versions

# Get schema details for a resource
kubectl explain deployment.spec.strategy --api-version=apps/v1
```
Version progression follows a defined path: v1alpha1 (experimental, may break) to v1beta1 (stabilizing, may still have breaking changes) to v1 (stable, backward-compatible). Alpha APIs are disabled by default and must be explicitly enabled on the API server, via `--runtime-config` for the API group, often alongside a feature gate.
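As an illustration, enabling an alpha group means flipping flags on the kube-apiserver itself; the group and feature names below are placeholders, not real APIs:

```
--runtime-config=mygroup.example.io/v1alpha1=true
--feature-gates=MyAlphaFeature=true
```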
Aggregated API Servers#
The API can be extended by registering custom API servers. The main API server proxies requests for certain API groups to the aggregated server. The most common example is metrics-server, which serves the metrics.k8s.io API group:
```bash
# This request is proxied to the metrics-server pod
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | jq '.items[0].usage'
```
Aggregated API servers register via APIService objects:
```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
```
If the aggregated server is down, requests to its API group return 503. This is a common cause of kubectl slowness: discovery tries to enumerate all API groups and stalls on unhealthy aggregated APIs.
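A quick check for this failure mode:

```bash
# Any APIService not reporting Available=True is suspect
kubectl get apiservices | grep -v True

# Inspect why a specific one is unhealthy
kubectl get apiservice v1beta1.metrics.k8s.io \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].message}'
```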
API Priority and Fairness#
In large clusters, a misbehaving controller can flood the API server with requests and starve other clients. API Priority and Fairness (APF) prevents this by classifying requests into priority levels and enforcing fair queuing.
```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: catch-excessive-controller
spec:
  priorityLevelConfiguration:
    name: workload-low
  matchingPrecedence: 1000
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: noisy-controller
        namespace: controllers
    resourceRules:
    - verbs: ["list", "watch"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]
```
Check current APF status:
```bash
# See flow schemas and their priority levels
kubectl get flowschemas
kubectl get prioritylevelconfigurations

# Check if requests are being queued or rejected
kubectl get --raw /metrics | grep apiserver_flowcontrol
```
Debugging the API Server#
Health Checks#
```bash
# Overall health (returns "ok" or details of failures)
kubectl get --raw '/healthz?verbose'

# Liveness -- is the process alive?
kubectl get --raw /livez

# Readiness -- is it ready to serve traffic?
kubectl get --raw '/readyz?verbose'
```
Individual health checks are available as sub-paths: /readyz/etcd, /readyz/informer-sync, /healthz/poststarthook/start-apiextensions-controllers.
Key Metrics#
```bash
kubectl get --raw /metrics | \
  grep -E "^apiserver_request_total|^apiserver_request_duration|^apiserver_current_inflight"
```
Critical metrics to monitor:
| Metric | What It Tells You |
|---|---|
| `apiserver_request_total` | Request rate by verb, resource, and response code |
| `apiserver_request_duration_seconds` | Latency distribution per verb and resource |
| `apiserver_current_inflight_requests` | How many requests are in-flight right now |
| `apiserver_storage_objects` | Object counts per resource (scaling indicator) |
| `etcd_request_duration_seconds` | Latency of API server to etcd calls |
A spike in `apiserver_request_duration_seconds` combined with high `apiserver_current_inflight_requests` indicates the API server is overloaded. Check `apiserver_request_total` by response code: a sudden increase in 429 (rate limited) or 503 (unavailable) confirms resource exhaustion.
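To see which clients and resources are being throttled, filter the request counter by response code (label names assume the standard metric schema):

```bash
# 429s broken down by verb and resource
kubectl get --raw /metrics | grep 'apiserver_request_total' | grep 'code="429"'
```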
Rate Limiting#
The API server has two inflight request limits configured via flags:
```
--max-requests-inflight=400            # read requests
--max-mutating-requests-inflight=200   # write requests
```
When these limits are reached, additional requests receive 429 responses. Note that when APF is enabled (the default on modern clusters), the sum of these two flags becomes the total concurrency budget that APF divides among its priority levels. In production, tune these based on your cluster size and controller count; clusters with many CRDs and operators may need higher limits.
Common Gotchas#
API server OOM with many watchers. Each watch request holds an open connection and consumes memory for its event buffer. Large clusters with many controllers (especially those that watch broad resource types across all namespaces) can push the API server past its memory limit. Monitor apiserver_longrunning_requests and restrict controllers to namespace-scoped watches when possible.
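To see where the watch load comes from, break the long-running request gauge down by resource (assuming the standard verb/resource labels on this metric):

```bash
# Open watch connections per resource
kubectl get --raw /metrics | grep 'apiserver_longrunning_requests' | grep 'verb="watch"'
```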
Expired client certificates. Kubelet client certificates auto-rotate by default, but if the rotation mechanism breaks (kubelet was down during renewal, clock skew), the kubelet loses API access and the node goes NotReady. Check with openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate. The API server’s own serving certificate and etcd client certificates also expire and must be rotated – kubeadm certs check-expiration shows all certificate expiry dates.
Aggregated API server failures causing kubectl slowness. When an aggregated API server is unreachable, kubectl commands that trigger API discovery (like kubectl get with no resource specified, or kubectl api-resources) hang or timeout. Check kubectl get apiservices and look for entries with Available: False. Fix the backing service or remove the APIService registration.