What OpenTelemetry Is#

OpenTelemetry (OTel) is a vendor-neutral framework for generating, collecting, and exporting telemetry data: traces, metrics, and logs. It provides APIs, SDKs, and the Collector – a standalone binary that receives, processes, and exports telemetry. OTel replaces the fragmented landscape of Jaeger client libraries, Zipkin instrumentation, Prometheus client libraries, and proprietary agents with a single standard.

The three signal types:

  • Traces: Record the path of a request through distributed services as a tree of spans. Each span has a name, duration, attributes, and parent reference.
  • Metrics: Numeric measurements (counters, gauges, histograms) emitted by applications and infrastructure. OTel metrics can be exported to Prometheus.
  • Logs: Structured log records correlated with trace context. OTel log support bridges existing logging libraries with trace correlation.

The OTel Collector Pipeline#

The Collector is the central hub. It has three pipeline stages:

Receivers ingest data. They listen on network ports or pull from sources; the pull-based ones are sketched after the list:

  • otlp: Receives OTLP over gRPC (4317) and HTTP (4318). The primary receiver.
  • prometheus: Scrapes Prometheus metrics endpoints.
  • jaeger: Accepts Jaeger Thrift or gRPC spans.
  • filelog: Tails log files (useful for node-level log collection).
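
The otlp receiver appears in the full config below; the pull-based receivers take their own blocks under the same receivers: key. A minimal sketch of the prometheus and filelog receivers (the scrape target and log path are illustrative):

receivers:
  prometheus:
    config:
      scrape_configs:                       # standard Prometheus scrape config, embedded verbatim
        - job_name: app-metrics
          scrape_interval: 30s
          static_configs:
            - targets: ["app.default:8080"]
  filelog:
    include:
      - /var/log/pods/*/*/*.log             # container logs on the node (DaemonSet mode)
    start_at: end                           # only tail new lines; use "beginning" to backfill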

Processors transform data in flight; the attributes and filter processors are sketched after the list:

  • batch: Batches telemetry before export to reduce network overhead.
  • memory_limiter: Prevents OOM by dropping data when memory is high.
  • attributes: Adds, removes, or modifies span/metric attributes.
  • filter: Drops telemetry matching specified conditions.
  • tail_sampling: Makes sampling decisions based on complete traces.
  • k8sattributes: Enriches telemetry with Kubernetes metadata (pod name, namespace, node).
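
The batch, memory_limiter, and k8sattributes processors appear in the full config below, and tail_sampling is covered under Sampling Strategies. A minimal sketch of the attributes and filter processors (the attribute keys and health-check route are illustrative):

processors:
  attributes:
    actions:
      - key: deployment.environment         # add an attribute if it is not already set
        value: production
        action: insert
      - key: http.request.header.authorization
        action: delete                      # strip sensitive data before export
  filter/drop-health-checks:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'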

Exporters send data to backends:

  • otlp: Forwards to another OTLP-compatible endpoint (Tempo, Jaeger, vendor backends).
  • prometheus: Exposes a Prometheus scrape endpoint for collected metrics.
  • loki: Ships logs to Grafana Loki.
  • debug: Prints telemetry to stdout for development.

Collector Configuration#

A real collector config for Kubernetes:

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.deployment.name
        - k8s.node.name
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip

exporters:
  otlp/tempo:
    endpoint: tempo.observability:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://prometheus.observability:9090/api/v1/write
  loki:
    endpoint: http://loki.logging:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [loki]

Deployment Modes on Kubernetes#

DaemonSet: One Collector pod per node. Best for collecting node-level telemetry (logs from files, host metrics) and as a local aggregation point. Applications send telemetry to the Collector on their node via NODE_IP:4317.
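
One way to wire that up is the Kubernetes downward API, assuming the DaemonSet exposes 4317 as a hostPort (or runs with host networking); the variable names are the standard OTel SDK ones:

env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP            # IP of the node this pod is scheduled on
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://$(NODE_IP):4317"         # node-local Collector, OTLP over gRPC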

Sidecar: A Collector container in each application pod. Useful when apps need a dedicated processing pipeline or when the Collector must share a network namespace with the app. Higher resource overhead.
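
With the OTel Operator (covered under Auto-Instrumentation below), sidecar mode means creating an OpenTelemetryCollector resource with mode: sidecar and opting pods in with an annotation. A minimal sketch, assuming the sidecar forwards to the gateway Service described next:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar
  namespace: default
spec:
  mode: sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      otlp:
        endpoint: otel-collector.observability:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]

Pods opt in with the annotation sidecar.opentelemetry.io/inject: "true" on their template metadata.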

Deployment (Gateway): A centralized Collector pool behind a Service. Applications send telemetry to otel-collector.observability:4317. The gateway handles tail sampling, enrichment, and routing. Scale replicas based on throughput. This is the most common production pattern.

Deploy a gateway with the OpenTelemetry Collector Helm chart:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace observability --create-namespace \
  --set mode=deployment \
  --set replicaCount=2 \
  --values otel-collector-values.yaml
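
The referenced otel-collector-values.yaml nests the Collector configuration under a top-level config: key, which recent chart versions merge over their built-in defaults. A minimal sketch (mode and replica count are already set on the command line; the contrib image is chosen because k8sattributes, loki, and tail_sampling ship in the contrib distribution):

# otel-collector-values.yaml
image:
  repository: otel/opentelemetry-collector-contrib
config:
  exporters:
    otlp/tempo:
      endpoint: tempo.observability:4317
      tls:
        insecure: true
  service:
    pipelines:
      traces:
        exporters: [otlp/tempo]             # receivers/processors come from the chart defaults on merge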

Auto-Instrumentation#

The OTel Operator supports automatic instrumentation for Java, Python, Node.js, and Go without code changes. Install the operator, then create an Instrumentation resource:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: auto-instrumentation
  namespace: default
spec:
  exporter:
    endpoint: http://otel-collector.observability:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest

Annotate pods to activate injection:

metadata:
  annotations:
    instrumentation.opentelemetry.io/inject-java: "true"
    # or inject-python, inject-nodejs, inject-go

For Java, Python, and Node.js, the operator injects an init container that copies the language agent into the pod, plus environment variables that configure the SDK; no application code changes are needed. Go auto-instrumentation works differently: it runs as an eBPF-based sidecar with elevated privileges and needs an extra annotation naming the target binary.

Context Propagation#

Traces span multiple services because context is propagated in HTTP headers. The two main formats:

  • W3C Trace Context: traceparent: 00-<trace-id>-<parent-span-id>-<trace-flags>. The standard. Use this unless you have legacy Zipkin/Jaeger services.
  • B3 (Zipkin): X-B3-TraceId, X-B3-SpanId, X-B3-Sampled. Used by older Zipkin-instrumented services.

Configure propagators in the SDK or Instrumentation resource. If your mesh includes both old and new services, set propagators: [tracecontext, b3multi] to inject and extract both formats.
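
For SDKs configured through the environment, the same propagator list is set with the standard OTEL_PROPAGATORS variable on the workload:

env:
  - name: OTEL_PROPAGATORS
    value: "tracecontext,baggage,b3multi"   # inject and extract both W3C and multi-header B3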

Sampling Strategies#

Sampling controls how many traces are recorded, reducing storage and cost.

Head-based sampling: Decided at trace creation. The parentbased_traceidratio sampler keeps a percentage of traces (e.g., 10% with argument "0.1"). Simple but blind – it drops traces before knowing if they contain errors.
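
In the SDK this is driven by the standard sampler environment variables (the sampler block of the Instrumentation resource shown earlier sets the same thing):

env:
  - name: OTEL_TRACES_SAMPLER
    value: "parentbased_traceidratio"
  - name: OTEL_TRACES_SAMPLER_ARG
    value: "0.1"                            # sample ~10% of new traces; children follow the parent's decision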

Tail-based sampling: Decided after the full trace is assembled. The Collector’s tail_sampling processor can keep all error traces, slow traces, or traces matching specific attributes. Requires a gateway Collector that sees all spans for a trace:

processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-requests
        type: latency
        latency: {threshold_ms: 2000}
      - name: percentage
        type: probabilistic
        probabilistic: {sampling_percentage: 5}

This keeps all error traces, all traces over 2 seconds, and 5% of everything else. Tail sampling is more powerful but requires careful memory management since traces must be held in memory until the decision is made.
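
With more than one gateway replica, plain round-robin load balancing spreads a trace's spans across replicas and breaks tail sampling. A common fix is a two-tier layout: a thin front tier that routes by trace ID using the loadbalancing exporter, and a sampling tier behind a headless Service that runs tail_sampling. A sketch of the front tier's exporter, assuming a headless Service named otel-gateway in the observability namespace:

exporters:
  loadbalancing:
    routing_key: traceID                    # every span of a trace lands on the same backend replica
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      k8s:
        service: otel-gateway.observability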

Resource Attributes#

Resource attributes describe the entity producing telemetry. Set them via environment variables:

env:
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "service.name=payment-api,service.version=1.4.2,deployment.environment=production"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.observability:4317"

The k8sattributes processor in the Collector adds Kubernetes-specific attributes automatically, so applications only need to set service.name and service.version.