Setting Up Full Observability from Scratch#

This operational sequence deploys a complete observability stack on Kubernetes: metrics (Prometheus + Grafana), logs (Loki + Promtail), traces (Tempo + OpenTelemetry), and alerting (Alertmanager). Each phase is self-contained with verification steps. Complete them in order – later phases depend on earlier infrastructure.

Prerequisites: a running Kubernetes cluster and Helm installed. Create the monitoring namespace and add the chart repositories used throughout this sequence:

kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

Phase 1 – Metrics (Prometheus + Grafana)#

Metrics are the foundation. Logging and tracing integrations all route through Grafana, so this phase must be solid before continuing.

Step 1: Install kube-prometheus-stack#

This single Helm chart deploys Prometheus, Grafana, Alertmanager, node-exporter, kube-state-metrics, and the Prometheus Operator with ServiceMonitor CRDs.

cat <<'EOF' > /tmp/prometheus-values.yaml
prometheus:
  prometheusSpec:
    retention: 15d
    retentionSize: "45GB"
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        memory: 4Gi
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false

grafana:
  persistence:
    enabled: true
    size: 10Gi
  adminPassword: "change-me-immediately"
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
      - name: default
        orgId: 1
        folder: ""
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default
  sidecar:
    dashboards:
      enabled: true
      searchNamespace: monitoring
    datasources:
      enabled: true

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi
    resources:
      requests:
        cpu: 50m
        memory: 64Mi

nodeExporter:
  resources:
    requests:
      cpu: 50m
      memory: 32Mi
EOF

helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values /tmp/prometheus-values.yaml

The key setting is serviceMonitorSelectorNilUsesHelmValues: false. Without this, Prometheus only discovers ServiceMonitors created by this Helm release, ignoring all others. This is the single most common reason people install Prometheus and wonder why their application metrics do not appear.
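
For example, a minimal ServiceMonitor for a hypothetical my-app Service in the app-production namespace (assumed to expose /metrics on a named port called http) is now discovered automatically:

cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: app-production
  labels:
    app: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: http        # must match a named port on the Service
    path: /metrics
    interval: 30s
EOF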

Step 2: Verify ServiceMonitor Discovery#

# Wait for Prometheus to be ready
kubectl wait --for=condition=Ready pods -l app.kubernetes.io/name=prometheus -n monitoring --timeout=300s

# Check that Prometheus has discovered targets
kubectl port-forward svc/kube-prometheus-prometheus 9090:9090 -n monitoring &
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets | length'

Pass: Target count is greater than 0. Typical fresh install discovers 10-15 targets (kubelet, apiserver, node-exporter, kube-state-metrics, etc.).
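
If the count looks low, list the targets that are not healthy along with their last scrape error (reusing the same port-forward):

curl -s http://localhost:9090/api/v1/targets | \
  jq -r '.data.activeTargets[] | select(.health != "up") | "\(.labels.job): \(.lastError)"'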

Step 3: Import Essential Dashboards#

The kube-prometheus-stack installs several dashboards automatically via ConfigMap sidecar. Verify they loaded:

kubectl port-forward svc/kube-prometheus-grafana 3000:80 -n monitoring &
# Open http://localhost:3000 -- login with admin / change-me-immediately
# Navigate to Dashboards -- you should see:
#   - Kubernetes / Compute Resources / Cluster
#   - Kubernetes / Compute Resources / Namespace (Pods)
#   - Node Exporter / Nodes
#   - Prometheus / Overview

Step 4: Configure Remote Write (Optional)#

If you need long-term metric storage beyond 15 days, configure remote write to Thanos or Grafana Mimir:

# Add to /tmp/prometheus-values.yaml under prometheus.prometheusSpec:
remoteWrite:
- url: "http://mimir-distributor.monitoring:8080/api/v1/push"
  queueConfig:
    maxSamplesPerSend: 1000
    maxShards: 10
    capacity: 2500
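
Apply the change by upgrading the release with the updated values file:

helm upgrade kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values /tmp/prometheus-values.yaml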

Phase 1 Verification#

kubectl top nodes                                          # metrics-server (separate from Prometheus) works
kubectl top pods -n monitoring                             # all monitoring pods running
kubectl get servicemonitor -n monitoring                   # ServiceMonitors exist
kubectl get prometheusrule -n monitoring                   # Alerting rules loaded
curl -s http://localhost:9090/api/v1/targets | \
  jq '[.data.activeTargets[] | select(.health=="up")] | length'  # all targets healthy

Rollback: helm uninstall kube-prometheus -n monitoring. CRDs remain and must be deleted manually: kubectl delete crd prometheuses.monitoring.coreos.com servicemonitors.monitoring.coreos.com podmonitors.monitoring.coreos.com alertmanagers.monitoring.coreos.com prometheusrules.monitoring.coreos.com. Newer chart versions install additional monitoring.coreos.com CRDs (such as probes and alertmanagerconfigs); kubectl get crd | grep monitoring.coreos.com lists them all.


Phase 2 – Logging (Loki + Promtail)#

Decision Point: Promtail vs Fluent Bit#

  • Promtail: Simple, purpose-built for Loki. If Loki is your only log destination, use Promtail. Less configuration surface, fewer moving parts.
  • Fluent Bit: Use if you need to send logs to multiple destinations (Loki + Elasticsearch + S3) or need advanced processing (multiline parsing, Lua scripting).

This sequence uses Promtail. Substitute Fluent Bit if your requirements demand it.
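
If you do choose Fluent Bit, a minimal sketch using the fluent/fluent-bit Helm chart and its built-in loki output might look like the following. Chart structure and option names vary by version, and the label mapping here is a placeholder, so treat this as a starting point rather than a drop-in config:

helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

cat <<'EOF' > /tmp/fluent-bit-values.yaml
config:
  outputs: |
    [OUTPUT]
        Name    loki
        Match   kube.*
        Host    loki.monitoring
        Port    3100
        Labels  job=fluent-bit
EOF

helm install fluent-bit fluent/fluent-bit \
  --namespace monitoring \
  --values /tmp/fluent-bit-values.yaml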

Step 7: Install Loki#

cat <<'EOF' > /tmp/loki-values.yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
    - from: "2024-01-01"
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

singleBinary:
  replicas: 1
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      memory: 1Gi
  persistence:
    enabled: true
    size: 20Gi

gateway:
  enabled: false

monitoring:
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
  lokiCanary:
    enabled: false

test:
  enabled: false
EOF

helm install loki grafana/loki \
  --namespace monitoring \
  --values /tmp/loki-values.yaml

For production workloads with high log volume (above 100GB/day), use the simple-scalable or distributed deployment mode instead of singleBinary. The single binary deployment is suitable for small-to-medium clusters.
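
As a rough sketch of what the switch looks like in the same chart (exact keys depend on the Loki chart version, and the scalable modes expect object storage such as S3 or GCS rather than the filesystem backend):

# Sketch only -- replaces the singleBinary section in /tmp/loki-values.yaml
deploymentMode: SimpleScalable

write:
  replicas: 3
read:
  replicas: 2
backend:
  replicas: 2

singleBinary:
  replicas: 0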

Step 8: Install Promtail#

cat <<'EOF' > /tmp/promtail-values.yaml
config:
  clients:
  - url: http://loki:3100/loki/api/v1/push

  snippets:
    pipelineStages:
    - cri: {}
    - json:
        expressions:
          level: level
          msg: msg
          ts: ts
    - labels:
        level:

resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    memory: 128Mi

tolerations:
- operator: Exists
  effect: NoSchedule
EOF

helm install promtail grafana/promtail \
  --namespace monitoring \
  --values /tmp/promtail-values.yaml

Step 9: Add Loki as Grafana Data Source#

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasource-loki
  namespace: monitoring
  labels:
    grafana_datasource: "1"
data:
  loki-datasource.yaml: |
    apiVersion: 1
    datasources:
    - name: Loki
      type: loki
      url: http://loki:3100
      access: proxy
      isDefault: false
      jsonData:
        derivedFields:
        - datasourceUid: tempo
          matcherRegex: "trace_id=(\\w+)"
          name: TraceID
          url: "$${__value.raw}"
EOF

The derivedFields section pre-configures log-to-trace correlation. It will activate once Tempo is installed in Phase 3.
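
For illustration, a logfmt-style line like the following (hypothetical output) matches that regex, and Grafana renders the captured value as a clickable TraceID link:

level=info msg="order created" trace_id=4bf92f3577b34da6a3ce929d0e0e4736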

Phase 2 Verification#

# Promtail running on every node
kubectl get daemonset promtail -n monitoring
# desired == ready == number of nodes

# Loki accepting data
kubectl port-forward svc/loki 3100:3100 -n monitoring &
curl -s http://localhost:3100/ready
# Should return "ready"

# Query recent logs through Grafana
# In Grafana Explore, select Loki data source
# Query: {namespace="monitoring"} -- should return log lines
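
# Or confirm ingestion from the command line: a Loki that has received logs
# reports label names such as namespace, pod, and level
curl -s http://localhost:3100/loki/api/v1/labels | jq '.data'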

Rollback: helm uninstall promtail -n monitoring && helm uninstall loki -n monitoring. PVCs will remain.


Phase 3 – Tracing (Tempo + OpenTelemetry)#

Decision Point: Tempo vs Jaeger#

  • Tempo: Native Grafana integration, lower operational overhead, cost-effective (uses object storage). Preferred when Grafana is your single pane of glass.
  • Jaeger: More mature, richer query UI, better for teams that need advanced trace analysis outside Grafana. Higher operational overhead (requires Elasticsearch or Cassandra).

This sequence uses Tempo for Grafana-native correlation.

Step 14: Install Tempo#

cat <<'EOF' > /tmp/tempo-values.yaml
tempo:
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      memory: 1Gi
  storage:
    trace:
      backend: local
      local:
        path: /var/tempo/traces
      wal:
        path: /var/tempo/wal
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"
        http:
          endpoint: "0.0.0.0:4318"
  retention: 72h

persistence:
  enabled: true
  size: 20Gi

tempoQuery:
  enabled: true
EOF

helm install tempo grafana/tempo \
  --namespace monitoring \
  --values /tmp/tempo-values.yaml

Step 15: Install OpenTelemetry Collector#

The OTel Collector receives traces from applications and forwards them to Tempo. Deploy as a DaemonSet for node-level collection.

cat <<'EOF' > /tmp/otel-collector-values.yaml
mode: daemonset

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"
        http:
          endpoint: "0.0.0.0:4318"

  processors:
    batch:
      timeout: 5s
      send_batch_size: 1000
    memory_limiter:
      check_interval: 5s
      limit_mib: 256
      spike_limit_mib: 128

  exporters:
    otlp/tempo:
      endpoint: "tempo:4317"
      tls:
        insecure: true

  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [otlp/tempo]

resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    memory: 256Mi
EOF

helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace monitoring \
  --values /tmp/otel-collector-values.yaml

Step 16: Configure Application Instrumentation#

Applications need to send traces to the OTel Collector. The approach depends on language and framework:

Option A – OTel Operator Auto-Instrumentation (zero code changes):

# Install the OTel Operator
# (admissionWebhooks.certManager.enabled=true assumes cert-manager is already installed in the cluster)
helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
  --namespace monitoring \
  --set admissionWebhooks.certManager.enabled=true

# Create an Instrumentation resource
cat <<'EOF' | kubectl apply -f -
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: auto-instrumentation
  namespace: app-production
spec:
  exporter:
    endpoint: http://otel-collector-opentelemetry-collector.monitoring:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"
  # Pin these images to specific versions in production; :latest is used here for brevity
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest
EOF

# Annotate deployments to opt in
kubectl patch deployment my-app -n app-production -p '
  {"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-java":"true"}}}}}'

Option B – Manual SDK instrumentation:

Set environment variables on your application pods:

env:
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: "http://otel-collector-opentelemetry-collector.monitoring:4317"
- name: OTEL_SERVICE_NAME
  value: "my-service"
- name: OTEL_TRACES_SAMPLER
  value: "parentbased_traceidratio"
- name: OTEL_TRACES_SAMPLER_ARG
  value: "0.1"

Step 17: Add Tempo as Grafana Data Source#

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasource-tempo
  namespace: monitoring
  labels:
    grafana_datasource: "1"
data:
  tempo-datasource.yaml: |
    apiVersion: 1
    datasources:
    - name: Tempo
      type: tempo
      uid: tempo
      url: http://tempo:3100
      access: proxy
      isDefault: false
      jsonData:
        tracesToLogsV2:
          datasourceUid: loki
          filterByTraceID: true
          filterBySpanID: false
        tracesToMetrics:
          datasourceUid: prometheus
          queries:
          - name: Request rate
            query: "sum(rate(http_server_request_duration_seconds_count{$$__tags}[5m]))"
        serviceMap:
          datasourceUid: prometheus
        nodeGraph:
          enabled: true
EOF

Phase 3 Verification#

# Tempo ready
kubectl port-forward svc/tempo 3100:3100 -n monitoring &
curl -s http://localhost:3100/ready
# Should return "ready"

# OTel Collector running on all nodes
kubectl get daemonset otel-collector-opentelemetry-collector -n monitoring

# Generate a test trace (if an instrumented app is running)
# Then in Grafana Explore, select Tempo, search by service name
# A trace should appear with spans

Rollback: helm uninstall otel-collector -n monitoring && helm uninstall tempo -n monitoring. If OTel Operator was installed: helm uninstall opentelemetry-operator -n monitoring.


Phase 4 – Correlation#

This phase connects the three signals so you can move seamlessly between metrics, logs, and traces.

The data source ConfigMaps from Phases 2 and 3 already include cross-references (derivedFields in Loki pointing to Tempo, tracesToLogsV2 in Tempo pointing to Loki). Verify they work.

Update the Prometheus data source to link to Tempo via exemplars:

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasource-prometheus-exemplars
  namespace: monitoring
  labels:
    grafana_datasource: "1"
data:
  prometheus-exemplars.yaml: |
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      uid: prometheus
      url: http://kube-prometheus-prometheus:9090
      access: proxy
      isDefault: true
      jsonData:
        exemplarTraceIdDestinations:
        - name: traceID
          datasourceUid: tempo
        httpMethod: POST
EOF

Step 20: Add Trace IDs to Logs#

Applications must include trace context in their log output. The exact implementation depends on language, but the pattern is the same: extract the trace ID from the OTel context and include it as a structured field.

Example for a Go application using slog:

package main

import (
    "context"
    "log/slog"
    "net/http"

    "go.opentelemetry.io/otel/trace"
)

// loggerKey is the context key under which the request-scoped logger is stored.
type loggerKeyType struct{}

var loggerKey = loggerKeyType{}

// tracingMiddleware attaches a logger carrying trace_id to the request context.
func tracingMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        span := trace.SpanFromContext(r.Context())
        if span.SpanContext().IsValid() {
            logger := slog.With("trace_id", span.SpanContext().TraceID().String())
            r = r.WithContext(context.WithValue(r.Context(), loggerKey, logger))
        }
        next.ServeHTTP(w, r)
    })
}

The key is that the emitted field name, trace_id, matches the matcherRegex in the Loki data source derivedFields configuration from Step 9. That regex (trace_id=(\w+)) matches key=value style output; if your application emits JSON logs, adjust the matcherRegex to match the quoted JSON form of the field instead.

Step 21: Enable Exemplars in Prometheus#

Exemplars link individual metric samples to the trace that generated them. Your application must emit exemplars in its Prometheus metrics:

// Go example with the Prometheus client; Observe has no exemplar parameter,
// so use ObserveWithExemplar via the ExemplarObserver interface
histogram.With(labels).(prometheus.ExemplarObserver).
    ObserveWithExemplar(duration, prometheus.Labels{"traceID": span.SpanContext().TraceID().String()})

Prometheus must also be configured to store exemplars. Exemplar storage sits behind the exemplar-storage feature flag, which kube-prometheus-stack does not enable by default.
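
One way to turn it on, assuming the enableFeatures field exposed under prometheus.prometheusSpec in the Phase 1 values file:

# Add to /tmp/prometheus-values.yaml under prometheus.prometheusSpec, then helm upgrade
enableFeatures:
  - exemplar-storage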

Phase 4 Verification#

The acid test for correlation: starting from a single alert, you should be able to navigate the full path.

  1. Open an alert in Grafana
  2. Click through to the dashboard panel that triggered it (metrics)
  3. Click “Explore” on a spike in the graph
  4. Split the Explore view, add Loki as second panel
  5. Click a log line that contains a trace_id
  6. The trace opens in Tempo showing the full request path

If any link in this chain is broken, check the data source uid references match between ConfigMaps.
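
A quick way to dump every provisioned data source definition and compare the uid and datasourceUid values in one place:

kubectl get configmaps -n monitoring -l grafana_datasource=1 \
  -o go-template='{{range .items}}{{range .data}}{{.}}{{end}}{{end}}' | \
  grep -E 'name:|uid:|datasourceUid:'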

Rollback: Correlation is purely configuration. Delete the data source ConfigMaps and restart the Grafana pod to revert.


Phase 5 – Alerting#

Step 23: Configure Alertmanager Receivers#

# Create a secret with the Alertmanager configuration
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-config
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
      slack_api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"

    route:
      receiver: "default-slack"
      group_by: ["alertname", "namespace", "job"]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      routes:
      - match:
          severity: critical
        receiver: "pagerduty-oncall"
        group_wait: 10s
        repeat_interval: 1h
      - match:
          severity: warning
        receiver: "team-slack"
        repeat_interval: 12h

    receivers:
    - name: "default-slack"
      slack_configs:
      - channel: "#alerts"
        title: '[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}'
        text: '{{ range .Alerts }}*{{ .Annotations.summary }}*\n{{ .Annotations.description }}\n{{ end }}'
        send_resolved: true

    - name: "pagerduty-oncall"
      pagerduty_configs:
      - service_key: "YOUR_PAGERDUTY_SERVICE_KEY"
        severity: '{{ .CommonLabels.severity }}'

    - name: "team-slack"
      slack_configs:
      - channel: "#team-alerts"
        send_resolved: true
EOF
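
With the Prometheus Operator, a standalone Secret is not picked up automatically; the Alertmanager resource has to reference it. One way to wire it up, assuming the configSecret field exposed through the kube-prometheus-stack values:

# Add to /tmp/prometheus-values.yaml, then helm upgrade kube-prometheus as before
alertmanager:
  alertmanagerSpec:
    configSecret: alertmanager-config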

Step 24-25: Create Alerting Rules#

cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: infrastructure-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus
spec:
  groups:
  - name: node-health
    rules:
    - alert: NodeNotReady
      expr: kube_node_status_condition{condition="Ready",status="true"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.node }} is not ready"
        description: "Node has been NotReady for more than 5 minutes."

    - alert: NodeHighCPU
      expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "High CPU on {{ $labels.instance }}"
        description: "CPU usage above 85% for 15 minutes."

    - alert: NodeDiskPressure
      expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Disk space low on {{ $labels.instance }}"
        description: "Root filesystem has less than 15% free space."

  - name: application-health
    rules:
    - alert: HighErrorRate
      expr: sum(rate(http_server_request_duration_seconds_count{http_status_code=~"5.."}[5m])) by (service) / sum(rate(http_server_request_duration_seconds_count[5m])) by (service) > 0.05
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High error rate for {{ $labels.service }}"
        description: "Error rate above 5% for 5 minutes."

    - alert: HighLatencyP99
      expr: histogram_quantile(0.99, sum(rate(http_server_request_duration_seconds_bucket[5m])) by (le, service)) > 2
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High P99 latency for {{ $labels.service }}"
        description: "P99 latency above 2 seconds for 10 minutes."
EOF

Step 26: Dead Man’s Switch#

A dead man’s switch alert fires continuously by design. If notifications for it stop arriving at the external receiver, something in the Prometheus-to-Alertmanager notification pipeline is broken.

# Add to the PrometheusRule above
- name: meta
  rules:
  - alert: DeadMansSwitch
    expr: vector(1)
    labels:
      severity: none
    annotations:
      summary: "Dead man's switch - alerting pipeline is healthy"

Configure a receiver (like Healthchecks.io or PagerDuty’s heartbeat) that expects to receive this alert periodically and pages you if it stops.
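
A sketch of the Alertmanager side, assuming a Healthchecks.io-style ping URL (the URL below is a placeholder): route the DeadMansSwitch alert to a dedicated webhook receiver with a short repeat_interval so the external service is pinged continuously.

# Add to the route and receivers in the alertmanager-config Secret
route:
  routes:
  - match:
      alertname: DeadMansSwitch
    receiver: "deadmans-switch"
    repeat_interval: 5m

receivers:
- name: "deadmans-switch"
  webhook_configs:
  - url: "https://hc-ping.com/YOUR-CHECK-UUID"
    send_resolved: false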

Phase 5 Verification#

# Check that alerting rules are loaded
kubectl port-forward svc/kube-prometheus-prometheus 9090:9090 -n monitoring &
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups | length'
# Should be greater than 0

# Check Alertmanager configuration
kubectl port-forward svc/alertmanager-operated 9093:9093 -n monitoring &
curl -s http://localhost:9093/api/v2/status | jq '.config.original'
# Should show your custom config

# Trigger a test alert
curl -XPOST http://localhost:9093/api/v2/alerts -H "Content-Type: application/json" -d '[
  {
    "labels": {"alertname": "TestAlert", "severity": "warning", "namespace": "test"},
    "annotations": {"summary": "Test alert - please ignore"},
    "startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
  }
]'
# Verify it appears in Slack within 30 seconds

Rollback: Delete the PrometheusRule resources and the alertmanager-config secret. Alertmanager will revert to its default configuration on the next restart.


Complete Stack Summary#

Component        Purpose                          Port        Helm Release
Prometheus       Metrics collection and storage   9090        kube-prometheus
Grafana          Visualization and dashboards     3000        kube-prometheus (bundled)
Alertmanager     Alert routing and notification   9093        kube-prometheus (bundled)
Loki             Log aggregation                  3100        loki
Promtail         Log shipping (DaemonSet)         –           promtail
Tempo            Trace storage                    3100        tempo
OTel Collector   Trace collection (DaemonSet)     4317/4318   otel-collector

Total resource overhead for a small cluster: approximately 4 CPU cores and 8GB memory for the full stack. Scale individual components based on ingestion volume.