How Kubernetes Captures Logs#
Containers write to stdout and stderr. The container runtime (containerd, CRI-O) captures these streams and writes them to files on the node. The kubelet manages these files at /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/ with symlinks from /var/log/containers/.
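A quick way to see this layout is to inspect a node directly. The commands below are a sketch: substitute real pod and container names for the placeholders, and run them as root on the node (or from a debug pod with the host filesystem mounted). The file inside the container directory is typically 0.log, incrementing with restarts.
ls -l /var/log/containers/ | head
# each entry is a symlink named <pod>_<namespace>_<container>-<container-id>.log
tail -n 5 /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/0.log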
The format depends on the runtime. Containerd writes CRI-format logs: a timestamp, the stream (stdout or stderr), a tag marking whether the line is complete (F) or partial (P), and the log content:
2026-02-22T10:15:32.123456789Z stdout F {"level":"info","msg":"request handled","status":200}
2026-02-22T10:15:32.456789012Z stderr F error: connection refused to database
kubectl logs reads these files. It only works while the pod exists – once a pod is deleted, its log files are eventually cleaned up. This is why centralized log collection is essential.
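While the pod is still around, the usual ways to read those files through the API (pod, container, and deployment names below are placeholders):
kubectl logs <pod-name> -n production                   # current container logs
kubectl logs <pod-name> -c <container-name> --previous  # logs from the previous (crashed) instance
kubectl logs deploy/<deployment-name> -f --since=10m    # follow, starting 10 minutes back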
Log Rotation#
Without rotation, container logs consume all disk space on the node. Kubelet handles rotation with two settings:
# kubelet configuration
containerLogMaxSize: "50Mi" # rotate when a log file reaches this size
containerLogMaxFiles: 5 # keep this many rotated files per container
Set these in the kubelet config or via flags. On managed Kubernetes (EKS, GKE, AKS), these are typically set to reasonable defaults (10Mi/5 files), but verify for high-volume workloads. A container logging 100MB/minute will cycle through rotation files fast, and anything not collected by a log agent before rotation is lost.
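To confirm what a given node is actually running with, the kubelet configuration can be read through the API server's node proxy. A sketch, assuming jq is installed and <node-name> is replaced:
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz" \
  | jq '.kubeletconfig | {containerLogMaxSize, containerLogMaxFiles}'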
Node-Level Logging Agents#
The standard pattern is a DaemonSet that runs one log collector pod per node. The agent reads log files from /var/log/pods and ships them to a centralized backend. Three main options:
Fluent Bit#
Lightweight, low memory footprint (~15MB), written in C. Best for high-throughput environments or resource-constrained nodes.
# fluent-bit DaemonSet values (Helm)
config:
  inputs: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            containerd
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
  filters: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Keep_Log            Off
        K8S-Logging.Parser  On
  outputs: |
    [OUTPUT]
        Name         loki
        Match        *
        Host         loki-gateway.logging
        Port         80
        Labels       job=fluent-bit
        Label_keys   $kubernetes['namespace_name'],$kubernetes['pod_name'],$kubernetes['container_name']
        Remove_keys  kubernetes,stream
Install with Helm:
helm repo add fluent https://fluent.github.io/helm-charts
helm install fluent-bit fluent/fluent-bit \
  --namespace logging --create-namespace \
  --values fluent-bit-values.yaml
Fluentd#
More mature plugin ecosystem, higher memory usage (~100-200MB), written in Ruby. Better when you need complex log transformation or routing to multiple backends.
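As a sketch of that kind of routing (the plugin names are the common community plugins; hosts, bucket, and region are placeholders), a Fluentd match block can copy every event to both Elasticsearch and an S3 archive:
<match kube.**>
  @type copy
  <store>
    @type elasticsearch        # fluent-plugin-elasticsearch
    host elasticsearch-master.logging
    port 9200
    logstash_format true
  </store>
  <store>
    @type s3                   # fluent-plugin-s3; credentials via IAM role or aws_key_id/aws_sec_key
    s3_bucket my-log-archive
    s3_region us-east-1
    path k8s-logs/
    <buffer time>
      timekey 3600
      timekey_wait 10m
    </buffer>
  </store>
</match>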
Promtail#
Grafana’s log agent, purpose-built for Loki. Lowest configuration overhead when your backend is Loki. See the Loki article for config details.
Sidecar Logging Pattern#
Some applications write logs to files instead of stdout. In Kubernetes, these logs are invisible to node-level agents. The sidecar pattern adds a log-streaming container to the pod:
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  volumes:
  - name: log-volume
    emptyDir: {}
  containers:
  - name: app
    image: legacy-app:1.0
    volumeMounts:
    - name: log-volume
      mountPath: /var/log/app
  - name: log-sidecar
    image: busybox:1.36
    command: ["sh", "-c", "tail -F /var/log/app/application.log"]
    volumeMounts:
    - name: log-volume
      mountPath: /var/log/app
      readOnly: true
The sidecar tails the file and writes to stdout, where the node agent picks it up. Alternatively, run a Fluent Bit sidecar that ships directly to the backend, skipping the stdout step for higher throughput.
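A minimal sketch of that alternative: add a Fluent Bit container to the containers list above and mount its configuration at Fluent Bit's default path. The container and ConfigMap names here are hypothetical, and the ConfigMap would hold a tail input on /var/log/app plus an output for your backend.
  # hypothetical container added to the pod above; the pod would also need a
  # fluent-bit-config volume entry referencing the ConfigMap
  - name: log-shipper
    image: fluent/fluent-bit:2.2.0
    volumeMounts:
    - name: log-volume
      mountPath: /var/log/app
      readOnly: true
    - name: fluent-bit-config    # fluent-bit.conf: [INPUT] tail + your backend [OUTPUT]
      mountPath: /fluent-bit/etc/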
Use sidecars sparingly. Each one adds an extra container, and its memory overhead, to every replica of the pod. The better fix is usually to configure the application to write to stdout.
Structured Logging (JSON) vs Unstructured#
Unstructured logs are free-form text:
2026-02-22 10:15:32 ERROR Failed to process order 12345: connection timeout to payment service
Structured logs are machine-parseable, typically JSON:
{"timestamp":"2026-02-22T10:15:32Z","level":"error","msg":"failed to process order","order_id":"12345","error":"connection timeout","service":"payment","duration_ms":5003}Structured logging is strongly preferred in Kubernetes because:
- Log agents parse it without custom regex patterns.
- Fields become queryable in Loki (| json | order_id="12345") or Elasticsearch.
- Context (request ID, trace ID, user ID) is attached consistently.
- Aggregation and alerting can operate on specific fields.
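For example, a LogQL query over structured lines like the one above might look like this (the namespace and app labels are illustrative; use whatever labels your agent attaches):
{namespace="production", app="orders"} | json | level="error" | duration_ms > 1000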
Configure your logging library to output JSON. In Go (zerolog):
log.Info().
    Str("order_id", orderID).
    Str("trace_id", span.SpanContext().TraceID().String()).
    Int("status", resp.StatusCode).
    Dur("duration", elapsed).
    Msg("order processed")
Log Levels and When to Use Each#
- ERROR: Something failed and requires attention. A request could not be fulfilled, data was lost, a dependency is down. Alert on this.
- WARN: Something unexpected happened but the system recovered. Retries succeeded, fallback was used, configuration is suboptimal. Review periodically.
- INFO: Normal operational events. Request handled, job completed, service started. The default level for production.
- DEBUG: Detailed diagnostic information. Request payloads, SQL queries, cache hits/misses. Disabled in production by default; enable per-service when debugging.
Never log at ERROR for expected conditions (user sends invalid input – that is a WARN at most). Never log sensitive data (passwords, tokens, PII) at any level.
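A minimal way to wire levels up with zerolog is to read the level from an environment variable and default to info. This is a sketch; the LOG_LEVEL variable name is just a convention, not a standard:
package main

import (
    "os"

    "github.com/rs/zerolog"
    "github.com/rs/zerolog/log"
)

func init() {
    // Default to info; allow per-service overrides via LOG_LEVEL (debug, warn, ...).
    level, err := zerolog.ParseLevel(os.Getenv("LOG_LEVEL"))
    if err != nil || level == zerolog.NoLevel {
        level = zerolog.InfoLevel
    }
    zerolog.SetGlobalLevel(level)
    log.Logger = log.With().Timestamp().Logger()
}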
Centralized Logging Stacks#
Fluent Bit to Loki (lightweight): Fluent Bit DaemonSet collects and parses logs, ships to Loki via the Loki output plugin. Query with LogQL in Grafana. Low resource cost, good Grafana integration.
Fluentd to Elasticsearch (full-featured): Fluentd DaemonSet collects logs, transforms and routes them, ships to Elasticsearch via the fluent-plugin-elasticsearch plugin. Query with Kibana or Elasticsearch API. Higher resource cost but powerful full-text search.
Both stacks can coexist. A common pattern is Fluent Bit on nodes forwarding to a Fluentd aggregator (Deployment) that handles complex routing, buffering, and multi-destination output.
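In that layered setup, the node-level Fluent Bit swaps its backend output for a forward output pointed at the aggregator's Service (the Service name is illustrative; 24224 is Fluentd's standard forward port):
[OUTPUT]
    Name   forward
    Match  kube.*
    Host   fluentd-aggregator.logging.svc.cluster.local
    Port   24224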
Log-Based Alerting#
Loki supports alerting rules evaluated by the Loki ruler:
# loki-alert-rules.yaml
groups:
- name: application-errors
  rules:
  - alert: HighErrorRate
    expr: |
      sum(rate({namespace="production"} |= "error" [5m])) by (app) > 10
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High error rate in {{ $labels.app }}"
In Elasticsearch, use ElastAlert or Kibana alerting. The principle is the same: define a query, set a threshold, fire an alert when exceeded.
Retention and Cost Management#
Logs are the highest-volume telemetry signal. Control costs by:
- Setting retention periods per namespace or log level. Keep ERROR logs for 90 days, INFO for 14 days, DEBUG for 3 days.
- Dropping noisy logs at the agent level. Fluent Bit can exclude health check logs, Kubernetes event spam, or verbose libraries before they reach the backend.
- Compressing and tiering storage. Loki stores chunks in object storage with configurable retention. Elasticsearch supports ILM (Index Lifecycle Management) to move old indices to cheaper storage.
- Sampling verbose logs. If a service produces 10,000 lines/second of DEBUG output, sample it at the agent (keep 1%, say) or rate-limit it rather than storing all of it; a throttle sketch follows at the end of this section.
# Fluent Bit filter to drop health check noise
[FILTER]
    Name     grep
    Match    kube.*
    Exclude  log /health|/ready|/live
Monitor your log volume with loki_distributor_bytes_received_total (Loki) or index size in Elasticsearch. Set up alerts for unexpected volume spikes – a misbehaving service can blow through your storage budget in hours.
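For the sampling point above, Fluent Bit's throttle filter is one option: it rate-limits per tag rather than randomly sampling, but it achieves the same cost control. The Match pattern and rates here are arbitrary examples.
# Fluent Bit filter to rate-limit a noisy workload's logs
[FILTER]
    Name          throttle
    Match         kube.var.log.containers.chatty-service*
    Rate          100
    Window        300
    Interval      1s
    Print_Status  true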