Setting Up Full Observability from Scratch#
This operational sequence deploys a complete observability stack on Kubernetes: metrics (Prometheus + Grafana), logs (Loki + Promtail), traces (Tempo + OpenTelemetry), and alerting (Alertmanager). Each phase is self-contained with verification steps. Complete them in order – later phases depend on earlier infrastructure.
Prerequisite: a running Kubernetes cluster with Helm installed and a monitoring namespace created.
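Before creating anything, it is worth confirming the tooling and the storage the later values files assume – in particular, the PVC templates below use a StorageClass named standard, so adjust the values files if your cluster offers something different:
kubectl version
helm version
kubectl get storageclass   # note the default class; the examples assume "standard"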
kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
Phase 1 – Metrics (Prometheus + Grafana)#
Metrics are the foundation: the logging and tracing phases surface their data through the Grafana instance installed here, so this phase must be solid before continuing.
Step 1: Install kube-prometheus-stack#
This single Helm chart deploys Prometheus, Grafana, Alertmanager, node-exporter, kube-state-metrics, and the Prometheus Operator with ServiceMonitor CRDs.
cat <<'EOF' > /tmp/prometheus-values.yaml
prometheus:
prometheusSpec:
retention: 15d
retentionSize: "45GB"
resources:
requests:
cpu: 500m
memory: 2Gi
limits:
memory: 4Gi
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: standard
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 50Gi
serviceMonitorSelectorNilUsesHelmValues: false
podMonitorSelectorNilUsesHelmValues: false
grafana:
persistence:
enabled: true
size: 10Gi
adminPassword: "change-me-immediately"
dashboardProviders:
dashboardproviders.yaml:
apiVersion: 1
providers:
- name: default
orgId: 1
folder: ""
type: file
disableDeletion: false
editable: true
options:
path: /var/lib/grafana/dashboards/default
sidecar:
dashboards:
enabled: true
searchNamespace: monitoring
datasources:
enabled: true
alertmanager:
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: standard
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 5Gi
resources:
requests:
cpu: 50m
memory: 64Mi
nodeExporter:
resources:
requests:
cpu: 50m
memory: 32Mi
EOF
helm install kube-prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--values /tmp/prometheus-values.yaml
The key setting is serviceMonitorSelectorNilUsesHelmValues: false. Left at its default of true, Prometheus only discovers ServiceMonitors labelled as belonging to this Helm release and ignores all others – the single most common reason people install Prometheus and then wonder why their application metrics never appear.
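For reference, this is roughly what a ServiceMonitor for one of your own workloads looks like once the stack is up. The names here (my-app, app-production, the http-metrics port) are placeholders for illustration, not resources created by this sequence:
cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: app-production
  labels:
    app.kubernetes.io/name: my-app
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app    # must match the labels on the application's Service
  endpoints:
    - port: http-metrics                # name of the Service port that exposes /metrics
      path: /metrics
      interval: 30s
EOF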
Step 2: Verify ServiceMonitor Discovery#
# Wait for Prometheus to be ready
kubectl wait --for=condition=Ready pods -l app.kubernetes.io/name=prometheus -n monitoring --timeout=300s
# Check that Prometheus has discovered targets
kubectl port-forward svc/kube-prometheus-prometheus 9090:9090 -n monitoring &
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets | length'
Pass: the target count is greater than 0. A typical fresh install discovers 10-15 targets (kubelet, apiserver, node-exporter, kube-state-metrics, etc.).
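If the count is 0 or lower than expected, the same port-forward can be used to list unhealthy targets together with the scrape error Prometheus recorded:
curl -s http://localhost:9090/api/v1/targets | \
  jq -r '.data.activeTargets[] | select(.health != "up") | "\(.labels.job): \(.lastError)"'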
Step 3: Import Essential Dashboards#
The kube-prometheus-stack installs several dashboards automatically via ConfigMap sidecar. Verify they loaded:
kubectl port-forward svc/kube-prometheus-grafana 3000:80 -n monitoring &
# Open http://localhost:3000 -- login with admin / change-me-immediately
# Navigate to Dashboards -- you should see:
# - Kubernetes / Compute Resources / Cluster
# - Kubernetes / Compute Resources / Namespace (Pods)
# - Node Exporter / Nodes
# - Prometheus / Overview
Step 4: Configure Remote Write (Optional)#
If you need long-term metric storage beyond 15 days, configure remote write to Thanos or Grafana Mimir:
# Add to prometheus-values.yaml under prometheus.prometheusSpec:
remoteWrite:
- url: "http://mimir-distributor.monitoring:8080/api/v1/push"
queueConfig:
maxSamplesPerSend: 1000
maxShards: 10
capacity: 2500
Phase 1 Verification#
kubectl top nodes # metrics-server (separate from Prometheus) works
kubectl top pods -n monitoring # all monitoring pods running
kubectl get servicemonitor -n monitoring # ServiceMonitors exist
kubectl get prometheusrule -n monitoring # Alerting rules loaded
curl -s http://localhost:9090/api/v1/targets | \
jq '[.data.activeTargets[] | select(.health=="up")] | length' # all targets healthy
Rollback: helm uninstall kube-prometheus -n monitoring. CRDs remain and must be deleted manually: kubectl delete crd prometheuses.monitoring.coreos.com servicemonitors.monitoring.coreos.com podmonitors.monitoring.coreos.com alertmanagers.monitoring.coreos.com prometheusrules.monitoring.coreos.com (newer chart versions also install probes, alertmanagerconfigs, scrapeconfigs, thanosrulers, and prometheusagents CRDs – list them with kubectl get crd | grep monitoring.coreos.com).
Phase 2 – Logging (Loki + Promtail)#
Decision Point: Promtail vs Fluent Bit#
- Promtail: Simple, purpose-built for Loki. If Loki is your only log destination, use Promtail. Less configuration surface, fewer moving parts.
- Fluent Bit: Use if you need to send logs to multiple destinations (Loki + Elasticsearch + S3) or need advanced processing (multiline parsing, Lua scripting).
This sequence uses Promtail. Substitute Fluent Bit if your requirements demand it.
Step 7: Install Loki#
cat <<'EOF' > /tmp/loki-values.yaml
loki:
auth_enabled: false
commonConfig:
replication_factor: 1
storage:
type: filesystem
schemaConfig:
configs:
- from: "2024-01-01"
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: index_
period: 24h
singleBinary:
replicas: 1
resources:
requests:
cpu: 200m
memory: 512Mi
limits:
memory: 1Gi
persistence:
enabled: true
size: 20Gi
gateway:
enabled: false
monitoring:
selfMonitoring:
enabled: false
grafanaAgent:
installOperator: false
lokiCanary:
enabled: false
test:
enabled: false
EOF
helm install loki grafana/loki \
--namespace monitoring \
--values /tmp/loki-values.yaml
For production workloads with high log volume (above 100GB/day), use the simple-scalable or distributed deployment mode instead of singleBinary. The single-binary deployment is suitable for small-to-medium clusters.
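A rough sketch of what the simple-scalable values look like, assuming the grafana/loki 6.x chart and an S3-compatible object store (the endpoint and bucket names are placeholders; value keys differ between chart versions, and filesystem storage is not suitable once you run multiple replicas):
# loki-values.yaml for simple-scalable mode (replaces the singleBinary section)
deploymentMode: SimpleScalable
loki:
  storage:
    type: s3
    bucketNames:
      chunks: loki-chunks
      ruler: loki-ruler
      admin: loki-admin
    s3:
      endpoint: s3.example.internal   # placeholder object-store endpoint
      region: us-east-1
write:
  replicas: 3
read:
  replicas: 3
backend:
  replicas: 3
singleBinary:
  replicas: 0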
Step 8: Install Promtail#
cat <<'EOF' > /tmp/promtail-values.yaml
config:
clients:
- url: http://loki:3100/loki/api/v1/push
snippets:
pipelineStages:
- cri: {}
- json:
expressions:
level: level
msg: msg
ts: ts
- labels:
level:
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
memory: 128Mi
tolerations:
- operator: Exists
effect: NoSchedule
EOF
helm install promtail grafana/promtail \
--namespace monitoring \
--values /tmp/promtail-values.yaml
Step 9: Add Loki as Grafana Data Source#
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-datasource-loki
namespace: monitoring
labels:
grafana_datasource: "1"
data:
loki-datasource.yaml: |
apiVersion: 1
datasources:
- name: Loki
type: loki
url: http://loki:3100
access: proxy
isDefault: false
jsonData:
derivedFields:
- datasourceUid: tempo
matcherRegex: "trace_id=(\\w+)"
name: TraceID
url: "$${__value.raw}"
EOF
The derivedFields section pre-configures log-to-trace correlation. It will activate once Tempo is installed in Phase 3. Note: give this data source an explicit uid (add uid: loki under - name: Loki) so that the tracesToLogsV2 reference in Step 17 resolves; without it, Grafana assigns a random uid.
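To sanity-check the matcherRegex before Tempo exists, you can push a synthetic log line through Loki's API and query it back. The smoke-test job label and the trace ID below are throwaway values:
kubectl port-forward svc/loki 3100:3100 -n monitoring &
curl -s -X POST http://localhost:3100/loki/api/v1/push -H 'Content-Type: application/json' -d @- <<EOF
{"streams":[{"stream":{"job":"smoke-test"},"values":[["$(date +%s)000000000","level=info msg=\"smoke test\" trace_id=4bf92f3577b34da6a3ce929d0e0e4736"]]}]}
EOF
curl -G -s http://localhost:3100/loki/api/v1/query_range --data-urlencode 'query={job="smoke-test"}' | jq '.data.result[0].values'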
Phase 2 Verification#
# Promtail running on every node
kubectl get daemonset promtail -n monitoring
# desired == ready == number of nodes
# Loki accepting data
kubectl port-forward svc/loki 3100:3100 -n monitoring &
curl -s http://localhost:3100/ready
# Should return "ready"
# Query recent logs through Grafana
# In Grafana Explore, select Loki data source
# Query: {namespace="monitoring"} -- should return log lines
Rollback: helm uninstall promtail -n monitoring && helm uninstall loki -n monitoring. PVCs will remain.
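To delete the leftover PVCs as well (this destroys stored log data; the PVC name shown is typical for the singleBinary StatefulSet, so list first and adjust to what your cluster reports):
kubectl get pvc -n monitoring
kubectl delete pvc storage-loki-0 -n monitoring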
Phase 3 – Tracing (Tempo + OpenTelemetry)#
Decision Point: Tempo vs Jaeger#
- Tempo: Native Grafana integration, lower operational overhead, cost-effective (uses object storage). Preferred when Grafana is your single pane of glass.
- Jaeger: More mature, richer query UI, better for teams that need advanced trace analysis outside Grafana. Higher operational overhead (requires Elasticsearch or Cassandra).
This sequence uses Tempo for Grafana-native correlation.
Step 14: Install Tempo#
cat <<'EOF' > /tmp/tempo-values.yaml
tempo:
resources:
requests:
cpu: 200m
memory: 512Mi
limits:
memory: 1Gi
storage:
trace:
backend: local
local:
path: /var/tempo/traces
wal:
path: /var/tempo/wal
receivers:
otlp:
protocols:
grpc:
endpoint: "0.0.0.0:4317"
http:
endpoint: "0.0.0.0:4318"
retention: 72h
persistence:
enabled: true
size: 20Gi
tempoQuery:
enabled: true
EOF
helm install tempo grafana/tempo \
--namespace monitoring \
--values /tmp/tempo-values.yaml
Step 15: Install OpenTelemetry Collector#
The OTel Collector receives traces from applications and forwards them to Tempo. Deploy as a DaemonSet for node-level collection.
cat <<'EOF' > /tmp/otel-collector-values.yaml
mode: daemonset
config:
receivers:
otlp:
protocols:
grpc:
endpoint: "0.0.0.0:4317"
http:
endpoint: "0.0.0.0:4318"
processors:
batch:
timeout: 5s
send_batch_size: 1000
memory_limiter:
check_interval: 5s
limit_mib: 256
spike_limit_mib: 128
exporters:
otlp/tempo:
endpoint: "tempo:4317"
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [otlp/tempo]
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
memory: 256Mi
EOF
helm install otel-collector open-telemetry/opentelemetry-collector \
--namespace monitoring \
--values /tmp/otel-collector-values.yaml
Step 16: Configure Application Instrumentation#
Applications need to send traces to the OTel Collector. The approach depends on language and framework:
Option A – OTel Operator Auto-Instrumentation (zero code changes):
# Install the OTel Operator
helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
--namespace monitoring \
--set admissionWebhooks.certManager.enabled=true
# Create an Instrumentation resource
cat <<'EOF' | kubectl apply -f -
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
name: auto-instrumentation
namespace: app-production
spec:
exporter:
endpoint: http://otel-collector-opentelemetry-collector.monitoring:4317
propagators:
- tracecontext
- baggage
sampler:
type: parentbased_traceidratio
argument: "0.1"
python:
image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
java:
image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
nodejs:
image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest
EOF
# Annotate deployments to opt in
kubectl patch deployment my-app -n app-production -p '
{"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-java":"true"}}}}}'Option B – Manual SDK instrumentation:
Set environment variables on your application pods:
env:
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://otel-collector-opentelemetry-collector.monitoring:4317"
- name: OTEL_SERVICE_NAME
value: "my-service"
- name: OTEL_TRACES_SAMPLER
value: "parentbased_traceidratio"
- name: OTEL_TRACES_SAMPLER_ARG
value: "0.1"Step 17: Add Tempo as Grafana Data Source#
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-datasource-tempo
namespace: monitoring
labels:
grafana_datasource: "1"
data:
tempo-datasource.yaml: |
apiVersion: 1
datasources:
- name: Tempo
type: tempo
uid: tempo
url: http://tempo:3100
access: proxy
isDefault: false
jsonData:
tracesToLogsV2:
datasourceUid: loki
filterByTraceID: true
filterBySpanID: false
tracesToMetrics:
datasourceUid: prometheus
queries:
- name: Request rate
query: "sum(rate(http_server_request_duration_seconds_count{$$__tags}[5m]))"
serviceMap:
datasourceUid: prometheus
nodeGraph:
enabled: true
EOF
Phase 3 Verification#
# Tempo ready
kubectl port-forward svc/tempo 3100:3100 -n monitoring &
curl -s http://localhost:3100/ready
# Should return "ready"
# OTel Collector running on all nodes
kubectl get daemonset otel-collector-opentelemetry-collector -n monitoring
# Generate a test trace (if an instrumented app is running)
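# If nothing is instrumented yet, you can emit a synthetic trace over OTLP/HTTP.
# A sketch: assumes the collector Service name used earlier in this phase, plus openssl
# and GNU coreutils on the workstation running the commands.
kubectl port-forward svc/otel-collector-opentelemetry-collector 4318:4318 -n monitoring &
TRACE_ID=$(openssl rand -hex 16); SPAN_ID=$(openssl rand -hex 8)
START="$(date +%s)000000000"; END="$(($(date +%s)+1))000000000"
curl -s -X POST http://localhost:4318/v1/traces -H 'Content-Type: application/json' -d @- <<EOF
{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"curl-smoke-test"}}]},
  "scopeSpans":[{"spans":[{"traceId":"${TRACE_ID}","spanId":"${SPAN_ID}","name":"smoke-test","kind":1,
  "startTimeUnixNano":"${START}","endTimeUnixNano":"${END}"}]}]}]}
EOF
echo "search Tempo for service curl-smoke-test or trace ${TRACE_ID}"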
# Then in Grafana Explore, select Tempo, search by service name
# A trace should appear with spans
Rollback: helm uninstall otel-collector -n monitoring && helm uninstall tempo -n monitoring. If the OTel Operator was installed: helm uninstall opentelemetry-operator -n monitoring.
Phase 4 – Correlation#
This phase connects the three signals so you can move seamlessly between metrics, logs, and traces.
Step 19: Configure Grafana Data Source Links#
The data source ConfigMaps from Phases 2 and 3 already include cross-references (derivedFields in Loki pointing to Tempo, tracesToLogsV2 in Tempo pointing to Loki). Verify they work.
Update the Prometheus data source to link to Tempo via exemplars:
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-datasource-prometheus-exemplars
namespace: monitoring
labels:
grafana_datasource: "1"
data:
prometheus-exemplars.yaml: |
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
uid: prometheus
url: http://kube-prometheus-prometheus:9090
access: proxy
isDefault: true
jsonData:
exemplarTraceIdDestinations:
- name: traceID
datasourceUid: tempo
httpMethod: POST
EOF
Step 20: Add Trace IDs to Logs#
Applications must include trace context in their log output. The exact implementation depends on language, but the pattern is the same: extract the trace ID from the OTel context and include it as a structured field.
Example for a Go application using slog:
// Middleware that adds a trace_id-aware logger to the request context.
// Assumes a package-level context key, e.g.: type ctxKey string; const loggerKey ctxKey = "logger".
// Imports: "context", "log/slog", "net/http", "go.opentelemetry.io/otel/trace".
func tracingMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        span := trace.SpanFromContext(r.Context())
        if span.SpanContext().IsValid() {
            // Attach the active trace ID as a structured field so every log line
            // written through this logger carries trace_id=<hex id>.
            logger := slog.With("trace_id", span.SpanContext().TraceID().String())
            ctx := context.WithValue(r.Context(), loggerKey, logger)
            r = r.WithContext(ctx)
        }
        next.ServeHTTP(w, r)
    })
}
The key is that the field name trace_id matches the regex in the Loki data source derivedFields configuration from Step 9.
Step 21: Enable Exemplars in Prometheus#
Exemplars link individual metric samples to the trace that generated them. Your application must emit exemplars in its Prometheus metrics:
// Go example with the prometheus client (github.com/prometheus/client_golang).
// Observe() does not accept an exemplar; use the ExemplarObserver interface instead.
if eo, ok := histogram.With(labels).(prometheus.ExemplarObserver); ok {
    eo.ObserveWithExemplar(duration, prometheus.Labels{"traceID": span.SpanContext().TraceID().String()})
}
Prometheus must also be configured to store exemplars: exemplar storage sits behind the exemplar-storage feature flag, which kube-prometheus-stack does not enable by default.
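To enable it, add the feature flag to the values file from Step 1 and upgrade the release (enableFeatures is passed through to the Prometheus custom resource; the flag name comes from the upstream Prometheus feature-flag list):
# Add to prometheus-values.yaml under prometheus.prometheusSpec:
enableFeatures:
  - exemplar-storage

# Then apply the change:
helm upgrade kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values /tmp/prometheus-values.yaml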
Phase 4 Verification#
The acid test for correlation: starting from a single alert, you should be able to navigate the full path.
- Open an alert in Grafana
- Click through to the dashboard panel that triggered it (metrics)
- Click “Explore” on a spike in the graph
- Split the Explore view, add Loki as second panel
- Click a log line that contains a trace_id
- The trace opens in Tempo showing the full request path
If any link in this chain is broken, check the data source uid references match between ConfigMaps.
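A quick way to eyeball the uid wiring across all provisioned data sources:
kubectl get configmap -n monitoring -l grafana_datasource=1 -o yaml | grep -E 'uid:|datasourceUid:'
# Every datasourceUid value (loki, tempo, prometheus) should have a matching explicit uid: on the corresponding data source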
Rollback: Correlation is purely configuration. Delete the data source ConfigMaps and restart the Grafana pod to revert.
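The corresponding commands, assuming the Grafana deployment follows the kube-prometheus release naming used throughout:
kubectl delete configmap grafana-datasource-loki grafana-datasource-tempo grafana-datasource-prometheus-exemplars -n monitoring
kubectl rollout restart deployment kube-prometheus-grafana -n monitoring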
Phase 5 – Alerting#
Step 23: Configure Alertmanager Receivers#
# Create a secret with the Alertmanager configuration.
# Note: with the Prometheus Operator, a standalone secret is only picked up if the Alertmanager
# spec references it (alertmanager.alertmanagerSpec.configSecret in the Helm values);
# alternatively, set the configuration inline via the chart's alertmanager.config value.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
name: alertmanager-config
namespace: monitoring
stringData:
alertmanager.yaml: |
global:
resolve_timeout: 5m
slack_api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
route:
receiver: "default-slack"
group_by: ["alertname", "namespace", "job"]
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
routes:
- match:
severity: critical
receiver: "pagerduty-oncall"
group_wait: 10s
repeat_interval: 1h
- match:
severity: warning
receiver: "team-slack"
repeat_interval: 12h
receivers:
- name: "default-slack"
slack_configs:
- channel: "#alerts"
title: '[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}'
text: '{{ range .Alerts }}*{{ .Annotations.summary }}*\n{{ .Annotations.description }}\n{{ end }}'
send_resolved: true
- name: "pagerduty-oncall"
pagerduty_configs:
- service_key: "YOUR_PAGERDUTY_SERVICE_KEY"
severity: '{{ .CommonLabels.severity }}'
- name: "team-slack"
slack_configs:
- channel: "#team-alerts"
send_resolved: true
EOF
Step 24-25: Create Alerting Rules#
cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: infrastructure-alerts
namespace: monitoring
labels:
release: kube-prometheus
spec:
groups:
- name: node-health
rules:
- alert: NodeNotReady
expr: kube_node_status_condition{condition="Ready",status="true"} == 0
for: 5m
labels:
severity: critical
annotations:
summary: "Node {{ $labels.node }} is not ready"
description: "Node has been NotReady for more than 5 minutes."
- alert: NodeHighCPU
expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
for: 15m
labels:
severity: warning
annotations:
summary: "High CPU on {{ $labels.instance }}"
description: "CPU usage above 85% for 15 minutes."
- alert: NodeDiskPressure
expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
for: 10m
labels:
severity: warning
annotations:
summary: "Disk space low on {{ $labels.instance }}"
description: "Root filesystem has less than 15% free space."
- name: application-health
rules:
- alert: HighErrorRate
expr: sum(rate(http_server_request_duration_seconds_count{http_status_code=~"5.."}[5m])) by (service) / sum(rate(http_server_request_duration_seconds_count[5m])) by (service) > 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "High error rate for {{ $labels.service }}"
description: "Error rate above 5% for 5 minutes."
- alert: HighLatencyP99
expr: histogram_quantile(0.99, sum(rate(http_server_request_duration_seconds_bucket[5m])) by (le, service)) > 2
for: 10m
labels:
severity: warning
annotations:
summary: "High P99 latency for {{ $labels.service }}"
description: "P99 latency above 2 seconds for 10 minutes."
EOF
Step 26: Dead Man’s Switch#
A dead man’s switch alert fires continuously by design. If its notifications stop arriving at the receiver, something in the Prometheus → Alertmanager → notification path is broken.
# Add to the PrometheusRule above
- name: meta
rules:
- alert: DeadMansSwitch
expr: vector(1)
labels:
severity: none
annotations:
summary: "Dead man's switch - alerting pipeline is healthy"Configure a receiver (like Healthchecks.io or PagerDuty’s heartbeat) that expects to receive this alert periodically and pages you if it stops.
Phase 5 Verification#
# Check that alerting rules are loaded
kubectl port-forward svc/kube-prometheus-prometheus 9090:9090 -n monitoring &
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups | length'
# Should be greater than 0
# Check Alertmanager configuration
kubectl port-forward svc/alertmanager-operated 9093:9093 -n monitoring &
curl -s http://localhost:9093/api/v2/status | jq '.config.original'
# Should show your custom config
# Trigger a test alert
curl -XPOST http://localhost:9093/api/v2/alerts -H "Content-Type: application/json" -d '[
{
"labels": {"alertname": "TestAlert", "severity": "warning", "namespace": "test"},
"annotations": {"summary": "Test alert - please ignore"},
"startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
}
]'
# Verify it appears in Slack within 30 seconds
Rollback: Delete the PrometheusRule resources and the alertmanager-config secret. Alertmanager will revert to its default configuration on the next restart.
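The corresponding commands for the resources created in this phase:
kubectl delete prometheusrule infrastructure-alerts -n monitoring
kubectl delete secret alertmanager-config -n monitoring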
Complete Stack Summary#
| Component | Purpose | Port | Helm Release |
|---|---|---|---|
| Prometheus | Metrics collection and storage | 9090 | kube-prometheus |
| Grafana | Visualization and dashboards | 3000 | kube-prometheus (bundled) |
| Alertmanager | Alert routing and notification | 9093 | kube-prometheus (bundled) |
| Loki | Log aggregation | 3100 | loki |
| Promtail | Log shipping (DaemonSet) | – | promtail |
| Tempo | Trace storage | 3100 | tempo |
| OTel Collector | Trace collection (DaemonSet) | 4317/4318 | otel-collector |
Total resource overhead for a small cluster: approximately 4 CPU cores and 8GB memory for the full stack. Scale individual components based on ingestion volume.