Kubernetes Controllers: Reconciliation Loops, the Controller Manager, and Custom Controllers#
Kubernetes is a declarative system. You tell it what you want (a Deployment with 3 replicas), and controllers make it happen. Controllers are the engines that continuously reconcile desired state with actual state. Without controllers, your YAML manifests would be inert data in etcd.
The Controller Pattern#
Every controller follows the same loop:
1. Watch the API server for changes to a specific resource type
2. For each change, compare desired state (spec) to actual state (status)
3. Take action to bring actual state closer to desired state
4. Update status to reflect current actual state
5. Repeat

This is a level-triggered model, not edge-triggered. A controller does not just react to changes – it reconciles the entire state on each pass. If a controller crashes and restarts, it re-reads all objects and converges to the correct state without needing to replay missed events. This makes controllers resilient to transient failures.
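In code, this loop is exactly what a reconciler implements. Here is a minimal sketch using controller-runtime (the library most custom controllers are built on); the Widget kind and its API package are hypothetical placeholders, not a real resource:

// Minimal reconciler sketch using sigs.k8s.io/controller-runtime.
package controllers

import (
    "context"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"

    widgetsv1 "example.com/widget-operator/api/v1" // hypothetical API package
)

type WidgetReconciler struct {
    client.Client
}

func (r *WidgetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Steps 1-2: read the object named in the request (served from the informer cache).
    var widget widgetsv1.Widget
    if err := r.Get(ctx, req.NamespacedName, &widget); err != nil {
        // Already deleted: nothing to reconcile.
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Step 3: compare widget.Spec to the actual state and create, update, or
    // delete child objects here until the two converge.

    // Step 4: record what was observed.
    if err := r.Status().Update(ctx, &widget); err != nil {
        return ctrl.Result{}, err
    }

    // Step 5: returning an error (or setting Requeue) puts the key back on the queue.
    return ctrl.Result{}, nil
}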
The Controller Manager#
The kube-controller-manager is a single binary that runs approximately 30 built-in controllers. Each controller runs as a goroutine within this process. This is an implementation convenience – they could be separate processes, but bundling them reduces operational overhead.
Key Controllers#
Deployment controller. Watches Deployments and manages ReplicaSets. When you update a Deployment’s pod template, the Deployment controller creates a new ReplicaSet with the updated template and scales it up while scaling the old one down. This is how rolling updates work. Rollbacks work by scaling up a previous ReplicaSet.
ReplicaSet controller. Watches ReplicaSets and ensures the correct number of pod replicas exist. If a pod dies, the ReplicaSet controller creates a replacement. It uses label selectors to identify which pods belong to it.
StatefulSet controller. Like ReplicaSet but with ordering guarantees. Pods are created sequentially (pod-0 must be Running before pod-1 is created) and deleted in reverse order. Each pod gets a stable network identity and persistent storage.
DaemonSet controller. Ensures exactly one pod runs on each node matching the selector. When a new node joins the cluster, the DaemonSet controller creates the pod. When a node is removed, the pod is garbage collected.
Job controller. Manages run-to-completion workloads. Tracks successful and failed completions, handles parallelism, and implements backoff for failed pods.
Node controller. Monitors node health via heartbeats. If a node stops updating its lease object (default grace period: 40 seconds), the node controller sets its condition to NotReady and taints it with node.kubernetes.io/unreachable:NoExecute. The taint-based eviction machinery then evicts pods once their toleration for that taint expires (300 seconds by default, added automatically to pods).
Namespace controller. Handles namespace deletion. When you delete a namespace, the namespace controller must clean up every resource in it – pods, services, configmaps, roles, bindings, custom resources. This is why namespace deletion can be slow or get stuck.
Endpoint/EndpointSlice controller. Watches Services and Pods, and maintains the mapping between service selectors and pod IPs. When a pod becomes Ready or is deleted, the endpoint controller updates the corresponding Endpoints and EndpointSlice objects. This is how kube-proxy and ingress controllers know where to route traffic.
Garbage collector. Handles cascade deletion via owner references. When a parent object is deleted, the garbage collector finds all objects that reference it as an owner and deletes them too.
Leader Election#
In high-availability setups with multiple control plane nodes, multiple instances of the controller manager run simultaneously, but only one is the active leader. The others are standby. Leader election uses a Lease object in the kube-system namespace:
# Check which instance is the current leader
kubectl get lease kube-controller-manager -n kube-system -o yaml
# spec:
# holderIdentity: control-plane-1_<uuid>
# leaseDurationSeconds: 15
# renewTime: "2026-02-22T10:30:45Z"

If the leader crashes, the lease expires after leaseDurationSeconds, and another instance acquires the lease. There is a brief gap during failover where no controllers are running.
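Custom controllers built with controller-runtime use the same mechanism; leader election is a manager option. A minimal sketch (the election ID and namespace are arbitrary names you would choose):

// Sketch: leader election for a custom controller via controller-runtime.
package main

import (
    ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        LeaderElection:          true,
        LeaderElectionID:        "widget-operator-lock", // name of the Lease object
        LeaderElectionNamespace: "widget-system",        // namespace the Lease lives in
    })
    if err != nil {
        panic(err)
    }
    // Reconcilers registered on mgr run only while this instance holds the lease.
    if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
        panic(err)
    }
}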
Owner References and Garbage Collection#
Kubernetes uses owner references to track parent-child relationships between objects. When you create a Deployment, the Deployment controller creates a ReplicaSet with an ownerReferences entry pointing to the Deployment. The ReplicaSet controller in turn creates Pods with an owner reference to the ReplicaSet.
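Custom controllers set the same reference on the children they create, usually via controller-runtime's controllerutil helper; a small sketch (the parent and child types here are just placeholders):

// Sketch: recording ownership on a child object before creating it.
package controllers

import (
    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

func setOwner(parent *appsv1.Deployment, child *appsv1.ReplicaSet, scheme *runtime.Scheme) error {
    // Writes an ownerReferences entry (with controller: true) onto the child's
    // metadata, so the garbage collector deletes the child when the parent goes away.
    return controllerutil.SetControllerReference(parent, child, scheme)
}

The resulting metadata on the child looks like the pod example below.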
# Pod metadata showing its owner
metadata:
  name: my-app-7d4b8c6f9-x2k4p
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: my-app-7d4b8c6f9
    uid: 3f8a1b2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c
    controller: true
    blockOwnerDeletion: true

When you delete a Deployment, three deletion propagation policies determine what happens:
# Foreground: parent waits for children to be deleted first
kubectl delete deployment my-app --cascade=foreground
# Background (default): parent is deleted immediately, children are cleaned up asynchronously
kubectl delete deployment my-app
# Orphan: parent is deleted, children are left behind (ownerReferences removed)
kubectl delete deployment my-app --cascade=orphan

Foreground deletion is useful when you need to ensure all resources are cleaned up before proceeding. Orphan deletion is useful for adopting resources into a different parent (for example, during a migration between controllers).
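The same propagation policies are available programmatically. For example, a controller using the controller-runtime client can request foreground deletion (a sketch):

// Sketch: foreground cascade deletion from a controller.
package controllers

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

func deleteWithChildren(ctx context.Context, c client.Client, obj client.Object) error {
    // Equivalent to kubectl delete --cascade=foreground: the object gets a
    // foregroundDeletion finalizer and is removed only after its children are gone.
    return c.Delete(ctx, obj, client.PropagationPolicy(metav1.DeletePropagationForeground))
}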
Finalizers#
Finalizers prevent an object from being deleted until cleanup is complete. A controller adds a finalizer string to the object’s metadata.finalizers list. When the object is deleted, Kubernetes sets metadata.deletionTimestamp but does not remove the object from etcd. The controller sees the deletion timestamp, performs cleanup, removes its finalizer, and only then does Kubernetes actually delete the object.
metadata:
  name: my-database
  finalizers:
  - database.example.com/cleanup
  deletionTimestamp: "2026-02-22T10:30:00Z"

Common finalizer patterns:
- PersistentVolume protection. The kubernetes.io/pv-protection finalizer prevents a PersistentVolume from being deleted while it is still bound to a claim.
- Namespace cleanup. The kubernetes finalizer on namespaces ensures all resources within are deleted before the namespace itself is removed.
- External resource cleanup. A cloud database operator adds a finalizer to its CRD instance to ensure the cloud database is deleted before the Kubernetes object disappears.
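Inside a controller, the finalizer lifecycle usually looks like the sketch below, using controller-runtime's controllerutil helpers (the Widget type, finalizer name, and cleanup helper are hypothetical):

// Sketch: finalizer handling inside a reconciler.
package controllers

import (
    "context"

    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

    widgetsv1 "example.com/widget-operator/api/v1" // hypothetical API package
)

const cleanupFinalizer = "database.example.com/cleanup"

func (r *WidgetReconciler) handleFinalizer(ctx context.Context, obj *widgetsv1.Widget) error {
    if obj.GetDeletionTimestamp().IsZero() {
        // Object is live: ensure the finalizer is present so deletion waits for us.
        if !controllerutil.ContainsFinalizer(obj, cleanupFinalizer) {
            controllerutil.AddFinalizer(obj, cleanupFinalizer)
            return r.Update(ctx, obj)
        }
        return nil
    }
    // Object is being deleted: clean up, then release it.
    if controllerutil.ContainsFinalizer(obj, cleanupFinalizer) {
        if err := r.cleanupExternalResources(ctx, obj); err != nil { // hypothetical helper
            return err // re-queued; the object stays Terminating until cleanup succeeds
        }
        controllerutil.RemoveFinalizer(obj, cleanupFinalizer)
        return r.Update(ctx, obj)
    }
    return nil
}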
Stuck Finalizers#
The most common finalizer problem is a stuck deletion – an object has deletionTimestamp set but a finalizer that will never be removed, usually because the controller that manages the finalizer is gone, crashed, or has a bug.
# Identify stuck resources
kubectl get all -A -o json | jq '.items[] | select(.metadata.deletionTimestamp != null) |
{name: .metadata.name, namespace: .metadata.namespace, finalizers: .metadata.finalizers}'
# Force-remove a stuck finalizer (DANGEROUS: skips cleanup)
kubectl patch pv stuck-volume -p '{"metadata":{"finalizers":null}}' --type=merge

Removing a finalizer manually bypasses whatever cleanup the controller was supposed to do. For a PersistentVolume backed by a cloud disk, this means the disk is orphaned and you must delete it manually in the cloud console. For namespace finalizers, resources inside the namespace may be orphaned.
Custom Controllers and Operators#
The operator pattern extends Kubernetes with domain-specific automation: define a Custom Resource Definition (CRD) for your application, then build a controller that watches instances of that CRD and manages the underlying resources.
Architecture#
CRD (defines the schema)
+ Custom Controller (watches CRD instances, reconciles state)
= Operator

Example: a PostgreSQL operator defines a PostgresCluster CRD. When you create a PostgresCluster object, the controller provisions a StatefulSet, Services, ConfigMaps, PVCs, and runs initialization SQL. When you update the PostgresCluster object to change the replica count, the controller scales the StatefulSet.
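With controller-runtime, that wiring means watching the custom resource itself plus the objects the controller creates from it, so a change to either triggers reconciliation. A sketch (the PostgresCluster API package and reconciler type are hypothetical):

// Sketch: a PostgresCluster controller that also reconciles when the
// StatefulSets it owns change.
package controllers

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"

    databasev1 "github.com/example/db-operator/api/v1" // hypothetical API package
)

type PostgresClusterReconciler struct {
    client.Client
}

func (r *PostgresClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Fetch the PostgresCluster and drive StatefulSet/Service/ConfigMap/PVC state here.
    return ctrl.Result{}, nil
}

func (r *PostgresClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&databasev1.PostgresCluster{}). // reconcile when the custom resource changes
        Owns(&appsv1.StatefulSet{}).        // ...and when a StatefulSet it owns changes
        Complete(r)
}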
Frameworks#
Kubebuilder (Go) – the standard framework for building operators. Generates scaffolding, handles boilerplate, provides a controller-runtime library with caching, event handling, and leader election built in.
# Initialize a new operator project
kubebuilder init --domain example.com --repo github.com/example/db-operator
kubebuilder create api --group database --version v1 --kind PostgresCluster

Operator SDK (Go, Ansible, Helm) – Red Hat’s framework, builds on Kubebuilder for Go operators and adds Ansible and Helm-based operator support for teams that do not write Go.
Metacontroller – a meta-operator that lets you write controller logic as simple HTTP webhooks in any language. You define a sync hook that receives the parent object and its children, and returns the desired children. Good for simple operators without writing the full controller machinery.
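As a rough illustration, a Metacontroller sync hook is just an HTTP endpoint that receives the parent object and returns the children you want to exist. The sketch below assumes the documented request and response fields (parent in, children and status out); treat the exact schema as something to verify against the Metacontroller docs:

// Sketch: the rough shape of a Metacontroller sync hook (field names assumed
// from the CompositeController sync hook; verify against the Metacontroller docs).
package main

import (
    "encoding/json"
    "net/http"
)

type syncRequest struct {
    Parent map[string]interface{} `json:"parent"`
}

type syncResponse struct {
    Status   map[string]interface{}   `json:"status"`
    Children []map[string]interface{} `json:"children"`
}

func sync(w http.ResponseWriter, r *http.Request) {
    var req syncRequest
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    // Compute the desired children from the parent's spec; this sketch returns none.
    resp := syncResponse{
        Status:   map[string]interface{}{"observed": true}, // hypothetical status field
        Children: []map[string]interface{}{},
    }
    json.NewEncoder(w).Encode(resp)
}

func main() {
    http.HandleFunc("/sync", sync)
    http.ListenAndServe(":8080", nil)
}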
Informers and Work Queues#
Controllers do not poll the API server. They use informers – client-side caches that maintain a synchronized copy of the relevant objects using a watch connection. When an object changes, the informer fires an event handler that enqueues the object’s key (namespace/name) into a work queue. A worker goroutine dequeues keys and runs the reconciliation logic.
API Server --watch--> Informer (local cache) --event--> Work Queue --dequeue--> Reconciler

This architecture is efficient: the controller makes very few API calls (only when it needs to create, update, or delete objects), and the informer cache serves all read operations locally.
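A stripped-down version of that pipeline using client-go directly; the worker here just prints the key where a real controller would run its reconcile logic:

// Sketch: a shared informer feeding a rate-limited work queue (client-go).
package main

import (
    "fmt"
    "time"

    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/workqueue"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    clientset := kubernetes.NewForConfigOrDie(cfg)

    queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
    factory := informers.NewSharedInformerFactory(clientset, 10*time.Minute)

    // Watch Pods; on any change, enqueue the namespace/name key.
    podInformer := factory.Core().V1().Pods().Informer()
    podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
                queue.Add(key)
            }
        },
        UpdateFunc: func(_, newObj interface{}) {
            if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
                queue.Add(key)
            }
        },
        DeleteFunc: func(obj interface{}) {
            if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
                queue.Add(key)
            }
        },
    })

    stop := make(chan struct{})
    factory.Start(stop)
    factory.WaitForCacheSync(stop)

    // Worker: dequeue keys and reconcile.
    for {
        key, shutdown := queue.Get()
        if shutdown {
            return
        }
        fmt.Println("reconcile", key)
        queue.Done(key)
    }
}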
Level-Triggered vs Edge-Triggered#
Kubernetes controllers are level-triggered. The reconciler function receives the full current state of the object and must determine the complete desired state, not just respond to a specific change. This means:
- If the work queue coalesces multiple events into one reconciliation, nothing is lost.
- If the controller restarts, it re-lists all objects and reconciles each one.
- Idempotency is essential: running the reconciler twice with the same input must produce the same result (see the sketch below).
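One common way to keep a reconciler idempotent is controller-runtime's CreateOrUpdate helper: the mutate function computes the desired fields the same way on every pass, and a write is issued only when something actually changed. A sketch (the names and image are illustrative):

// Sketch: an idempotent child-object write with controllerutil.CreateOrUpdate.
package controllers

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

func ensureDeployment(ctx context.Context, c client.Client, replicas int32) error {
    deploy := &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{Name: "my-app", Namespace: "default"},
    }
    labels := map[string]string{"app": "my-app"}
    // CreateOrUpdate fetches the current object (if any), applies the mutate
    // function, and issues a Create or Update only when the result differs.
    _, err := controllerutil.CreateOrUpdate(ctx, c, deploy, func() error {
        deploy.Spec.Replicas = &replicas
        deploy.Spec.Selector = &metav1.LabelSelector{MatchLabels: labels}
        deploy.Spec.Template.Labels = labels
        deploy.Spec.Template.Spec.Containers = []corev1.Container{
            {Name: "app", Image: "nginx:1.27"},
        }
        return nil
    })
    return err
}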
Debugging Controllers#
Controller Manager Logs#
# View controller manager logs (static pod on control plane)
kubectl logs -n kube-system kube-controller-manager-control-plane-1 --tail=100
# Filter for a specific controller
kubectl logs -n kube-system kube-controller-manager-control-plane-1 | grep "deployment_controller"

Key Metrics#
The controller manager exposes metrics on its /metrics endpoint. Critical metrics for diagnosing performance issues:
| Metric | What It Tells You |
|---|---|
| workqueue_depth | Items waiting to be processed per controller |
| workqueue_adds_total | Rate of items being added to each queue |
| workqueue_retries_total | How often items are re-queued after errors |
| workqueue_longest_running_processor_seconds | Slowest active reconciliation |
| workqueue_unfinished_work_seconds | Time spent on items not yet finished |
A high workqueue_depth with increasing workqueue_retries_total means a controller is failing to reconcile objects and falling behind. Check the controller logs for the specific error.
# Quick check from the control plane node
curl -sk https://127.0.0.1:10257/metrics | grep workqueue_depth

Common Gotchas#
Finalizer deadlocks. A namespace has a finalizer managed by an operator. You uninstall the operator. You then try to delete the namespace. The namespace hangs forever in Terminating because the controller that would remove the finalizer no longer exists. Solution: delete the CRD instances (and let the operator finish its cleanup) before uninstalling the operator and removing the CRD, or manually patch out the finalizer.
Controller crashes with leader election conflict. Two controller instances believe they are the leader due to clock skew or a long GC pause. One instance’s lease renewal fails and it panics. This is usually transient – the surviving instance becomes the stable leader. If it persists, check NTP synchronization and resource limits on the controller manager pod.
Cascade deletion surprises. Deleting a namespace deletes everything in it, including resources you may not have expected (RoleBindings, ServiceAccounts, PVCs). Use kubectl get all -n <namespace> to audit before deleting, and remember that get all does not show everything – CRDs, configmaps, secrets, and roles require explicit listing.