DaemonSets#

A DaemonSet ensures that a copy of a pod runs on every node in the cluster – or on a selected subset of nodes. When a new node joins the cluster, the DaemonSet controller automatically schedules a pod on it. When a node is removed, the pod is garbage collected.

This is the right abstraction for infrastructure that needs to run everywhere: log collectors, monitoring agents, network plugins, storage drivers, and security tooling.

When to Use DaemonSets#

DaemonSets solve problems where per-node presence matters:

  • Log collection: Fluent Bit, Fluentd, or Promtail reading container logs from each node’s /var/log and forwarding to a central system.
  • Metrics: Prometheus node-exporter exposing hardware and OS metrics from every node.
  • Networking: Calico, Cilium, or kube-proxy running on every node to provide pod networking and network policy enforcement.
  • Storage: CSI drivers that must run on every node to provide volume mount capabilities.
  • Security: Falco, Sysdig, or other runtime security agents monitoring system calls on each node.

Basic DaemonSet#

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.7.0
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        ports:
        - containerPort: 9100
          hostPort: 9100
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            memory: 128Mi
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      hostNetwork: true
      hostPID: true

Node-exporter uses hostNetwork and hostPID because it needs direct access to node-level metrics, and the --path.procfs and --path.sysfs flags point it at the host filesystems mounted read-only at /host/proc and /host/sys. Most DaemonSets need some form of host access – log collectors mount /var/log, network plugins mount /opt/cni.

Node Selection#

Not every DaemonSet needs to run on every node. Use nodeSelector or nodeAffinity to restrict placement:

spec:
  template:
    spec:
      nodeSelector:
        node-role.kubernetes.io/worker: ""

For more complex rules, use node affinity:

spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values: ["linux"]
              - key: node-role.kubernetes.io/control-plane
                operator: DoesNotExist

This schedules pods only on Linux worker nodes, excluding control plane nodes.
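
To preview which nodes such a rule will match, you can query with an equivalent label selector (the labels mirror the affinity expressions above):

kubectl get nodes -l 'kubernetes.io/os=linux,!node-role.kubernetes.io/control-plane'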

Tolerations#

Tolerations are critical for DaemonSets. Nodes often have taints to prevent regular workloads from scheduling on them – control plane nodes, GPU nodes, dedicated tenant nodes. A DaemonSet pod without the right tolerations will not schedule on tainted nodes, leaving gaps in your coverage.

For cluster-wide agents (logging, monitoring), tolerate everything:

spec:
  template:
    spec:
      tolerations:
      - operator: Exists

The operator: Exists with no key matches all taints. This ensures the DaemonSet runs everywhere regardless of what taints exist.

For more selective targeting, tolerate specific taints:

spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute

The not-ready and unreachable tolerations are important for monitoring agents – you want them running on unhealthy nodes precisely because those nodes need monitoring the most.

Update Strategies#

RollingUpdate (Default)#

Updates DaemonSet pods one (or more) at a time across nodes:

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0

  • maxUnavailable: how many nodes can have their DaemonSet pod down simultaneously during the update. Default is 1. Set higher for large clusters where updating one node at a time would take hours.
  • maxSurge (v1.22+): how many extra pods can exist during the update. With maxSurge: 1, Kubernetes creates the new pod before killing the old one on each node, reducing downtime. Not all DaemonSets support this – if the pod uses hostPort or hostNetwork, two pods cannot coexist on the same node.

# Trigger a rolling update by changing the image
kubectl set image daemonset/fluent-bit fluent-bit=fluent/fluent-bit:3.0 -n logging

# Watch the rollout progress
kubectl rollout status daemonset/fluent-bit -n logging

# Roll back if something goes wrong
kubectl rollout undo daemonset/fluent-bit -n logging

OnDelete#

Pods are only replaced when manually deleted:

spec:
  updateStrategy:
    type: OnDelete

This gives you full control over the update pace. Use it for sensitive node-level agents where you want to update one node, verify it works, then proceed. The tradeoff is operational overhead – you must delete pods yourself to trigger the update.

# Update the DaemonSet spec, then manually roll one node at a time
kubectl delete pod fluent-bit-7k2x4 -n logging
# Verify the replacement pod is healthy
kubectl get pod -l app=fluent-bit -n logging --field-selector spec.nodeName=worker-1
# Proceed to next node
kubectl delete pod fluent-bit-9m3z8 -n logging

Resource Management#

DaemonSet pods compete with workload pods for node resources. A log collector with aggressive resource requests can starve application pods on small nodes.

Set requests conservatively and limits generously:

resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    memory: 256Mi

This reserves minimal resources for scheduling purposes but allows the pod to burst for short periods. Avoid setting CPU limits on DaemonSets – a log collector that gets throttled during a burst of application logs will fall behind and potentially lose data.

On nodes with limited capacity (e.g., small worker nodes, edge nodes), DaemonSet pods with high requests may be unable to schedule, leaving the pod stuck in Pending and the node without coverage – the DaemonSet's desired pod count stays above its ready count.
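
To spot this, list Pending pods for the DaemonSet's label and read the scheduler events on one of them (the label, namespace, and pod name follow the node-exporter example above and are illustrative):

# Find DaemonSet pods stuck in Pending
kubectl get pods -l app=node-exporter -n monitoring --field-selector=status.phase=Pending

# The pod's events show why it cannot schedule, e.g. "Insufficient cpu" or "Insufficient memory"
kubectl describe pod node-exporter-abc12 -n monitoring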

Priority and Preemption#

Use a PriorityClass to ensure critical DaemonSet pods are not evicted when a node is under resource pressure. For node-level infrastructure, reference one of the built-in classes directly – they already exist in every cluster, so there is no need to define them yourself:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
spec:
  template:
    spec:
      priorityClassName: system-node-critical

Kubernetes provides two built-in priority classes: system-node-critical and system-cluster-critical. Use these for infrastructure DaemonSets that must not be evicted. Application-level DaemonSets should use a custom PriorityClass with a lower value.
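
A custom class for an application-level DaemonSet might look like this (the name and value are illustrative – anything well below the built-in values works, and custom classes must stay at or below one billion):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: daemonset-important
value: 100000
globalDefault: false
description: "Application-level DaemonSets that should outrank normal workloads"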

DaemonSets vs Static Pods#

Static pods are managed directly by the kubelet, not by the Kubernetes API. They are defined as YAML files in a directory the kubelet watches on each node (/etc/kubernetes/manifests/ by default in kubeadm clusters). In kubeadm clusters, the control plane components (kube-apiserver, etcd, kube-scheduler, kube-controller-manager) run as static pods.

DaemonSets are managed by the DaemonSet controller through the Kubernetes API. Use DaemonSets for everything except the control plane itself. They support rolling updates, label selectors, resource quotas, and all the lifecycle management that static pods lack.
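
A quick way to tell the two apart in a running cluster (paths and naming below assume kubeadm defaults):

# Static pod manifests are plain files read by the kubelet on each node
ls /etc/kubernetes/manifests/

# In the API, static pods appear as read-only mirror pods named <pod>-<node-name>
kubectl get pods -n kube-system -o wide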

Debugging DaemonSets#

# Check rollout status
kubectl rollout status daemonset/fluent-bit -n logging

# See which nodes have pods and which are missing
kubectl get pods -l app=fluent-bit -n logging -o wide

# Compare against expected node count
kubectl get nodes --no-headers | wc -l
kubectl get pods -l app=fluent-bit -n logging --no-headers | wc -l
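
# Or read the DaemonSet's own counters (DESIRED vs READY)
kubectl get daemonset fluent-bit -n logging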

# If a pod is missing from a node, check for scheduling issues
kubectl describe daemonset fluent-bit -n logging
# Look for "Pods Status" and events showing why pods cannot schedule

# Check a specific node for taints that might block scheduling
kubectl describe node worker-3 | grep -A5 Taints

When a DaemonSet pod is missing from a node, the cause is almost always one of: the pod does not tolerate the node’s taints, the pod’s nodeSelector or nodeAffinity excludes the node, or the pod’s resource requests exceed the node’s available capacity.

Common Gotchas#

Node drain blocked by DaemonSet pods: kubectl drain refuses to proceed when the node runs DaemonSet-managed pods unless you pass --ignore-daemonsets, which skips them – evicting them would be pointless anyway, because the DaemonSet controller immediately recreates pods on any node that still matches its selectors and tolerations. A PodDisruptionBudget whose selector accidentally matches DaemonSet pods only adds confusion, so keep PDB selectors scoped to the workloads they are meant to protect. In practice, drains are usually run with --ignore-daemonsets, plus --delete-emptydir-data if pods on the node use emptyDir volumes.
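
For reference, a typical drain invocation looks like this (node name is illustrative):

kubectl drain worker-3 --ignore-daemonsets --delete-emptydir-data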

Resource requests too high: If a DaemonSet requests 1 CPU and 2Gi memory, and your nodes have 4 CPUs and 8Gi, you have given 25% of every node’s resources to a single infrastructure pod. Multiply by several DaemonSets (logging, monitoring, networking, security) and you can lose half your node capacity before any application pods schedule. Audit your DaemonSet resource requests regularly.
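
One way to audit them is to print every DaemonSet's per-container requests in a single pass:

# List CPU and memory requests for all DaemonSets across the cluster
kubectl get daemonsets -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,CPU-REQ:.spec.template.spec.containers[*].resources.requests.cpu,MEM-REQ:.spec.template.spec.containers[*].resources.requests.memory'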

Forgetting tolerations after adding taints: You add a new taint to a node group and your monitoring agent stops running on those nodes. Always audit DaemonSet tolerations when modifying node taints.
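
A periodic audit of node taints makes this easy to catch (this prints only the taint keys; drop the .key suffix to see the full taint objects):

# List every node with the keys of its taints
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'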

Practical Example: Fluent Bit DaemonSet#

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    app: fluent-bit
spec:
  selector:
    matchLabels:
      app: fluent-bit
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      priorityClassName: system-node-critical
      serviceAccountName: fluent-bit
      tolerations:
      - operator: Exists
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:3.0
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            memory: 256Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config
          mountPath: /fluent-bit/etc/
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containers
        hostPath:
          path: /var/lib/docker/containers
      - name: config
        configMap:
          name: fluent-bit-config

This DaemonSet tolerates all taints (runs on every node including control plane), uses system-node-critical priority (will not be evicted under pressure), mounts host log directories read-only, and uses conservative resource requests to avoid starving application pods. The rolling update strategy updates one node at a time, ensuring log collection continues on other nodes during the rollout.