Prometheus Architecture#

Prometheus pulls metrics from targets at regular intervals (scraping). Each target exposes an HTTP endpoint (typically /metrics) that returns metrics in the Prometheus text exposition format. Prometheus stores the scraped samples in a local time-series database and evaluates alerting rules against them. Grafana connects to Prometheus as a data source and renders dashboards.
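
Exposing that endpoint usually means instrumenting the application with a client library. A minimal sketch in Go using the official client library, github.com/prometheus/client_golang (the metric name, label, and port are illustrative):

// main.go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// httpRequestsTotal counts handled requests, labeled by status code.
var httpRequestsTotal = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total number of HTTP requests handled.",
	},
	[]string{"status_code"},
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		httpRequestsTotal.WithLabelValues("200").Inc()
		w.Write([]byte("ok"))
	})
	// Expose all registered metrics at /metrics for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}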

Scrape Configuration#

The core of Prometheus configuration is the scrape config. Each scrape_config block defines a set of targets and how to scrape them.

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "app"
    metrics_path: /metrics
    static_configs:
      - targets: ["app:8080"]
        labels:
          env: "production"

  - job_name: "node"
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "postgres"
    static_configs:
      - targets: ["postgres-exporter:9187"]

For dynamic environments, use service discovery instead of static configs. In Kubernetes, Prometheus discovers pods and services via the API server:

scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_ip]
        action: replace
        target_label: __address__
        regex: (.+);(.+)
        replacement: $2:$1

This scrapes any pod with the annotation prometheus.io/scrape: "true". The relabel configs extract the metrics path and port from pod annotations.
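
For example, a Deployment whose pods should be scraped by this job only needs the annotations on its pod template. A minimal sketch (the name, image, and port are placeholders; the annotation keys are a convention enforced by the relabel rules above, not built into Kubernetes):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: /metrics
        prometheus.io/port: "8080"
    spec:
      containers:
        - name: myapp
          image: myapp:1.0.0   # placeholder image
          ports:
            - containerPort: 8080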

PromQL Essentials#

PromQL is Prometheus’s query language. An expression evaluates to an instant vector (one sample per matching time series), a range vector (a window of samples per series, used as input to functions such as rate()), or a scalar.

# CPU usage rate per core, excluding idle (last 5 minutes)
rate(node_cpu_seconds_total{mode!="idle"}[5m])

# Total HTTP requests per second by status code
sum by (status_code) (rate(http_requests_total[5m]))

# 95th percentile request latency
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Disk space remaining percentage
(node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100

# Container CPU usage in a Kubernetes cluster
sum by (pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))

# Container memory working set
sum by (pod) (container_memory_working_set_bytes{container!=""})

Key functions: rate() computes the per-second average rate of increase of a counter over the range, increase() gives the total increase over the range, sum by (...) aggregates across label dimensions, and histogram_quantile() estimates percentiles from histogram buckets.
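
For example, increase() answers "how many requests in the last hour" from the same counter used above, and pairs naturally with an aggregation:

# Total requests over the last hour, per job
sum by (job) (increase(http_requests_total[1h]))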

Alerting Rules#

Alerting rules are evaluated by Prometheus and fire when their conditions hold for a specified duration. Alerts are sent to Alertmanager, which handles routing, deduplication, and notification.

# alert-rules.yml
groups:
  - name: infrastructure
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage above 80% for 10 minutes. Current: {{ $value | printf \"%.1f\" }}%"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low on {{ $labels.instance }}"

      - alert: HighErrorRate
        expr: sum by (job) (rate(http_requests_total{status_code=~"5.."}[5m])) / sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5% for {{ $labels.job }}"

      - alert: PodCrashLooping
        expr: rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is crash looping"

The for duration prevents alerting on brief spikes. A condition must hold continuously for the entire duration before the alert fires.
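
Routing, grouping, and notification are configured on the Alertmanager side. A minimal sketch of an alertmanager.yml that groups alerts and sends critical ones to a separate receiver (receiver names and webhook URLs are placeholders):

# alertmanager.yml
route:
  receiver: default
  group_by: [alertname, job]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - 'severity="critical"'
      receiver: pager
receivers:
  - name: default
    webhook_configs:
      - url: http://alert-webhook.internal/notify   # placeholder
  - name: pager
    webhook_configs:
      - url: http://alert-webhook.internal/page     # placeholder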

Grafana Data Sources and Dashboards#

Connect Grafana to Prometheus by adding it as a data source. This can be provisioned declaratively:

# grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false

Dashboard provisioning loads JSON dashboards from files on startup:

# grafana/provisioning/dashboards/dashboards.yml
apiVersion: 1
providers:
  - name: default
    orgId: 1
    folder: ""
    type: file
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true

Place dashboard JSON files in the configured path. Export dashboards from the Grafana UI (Share > Export > Save to file) and commit them to version control for reproducibility.

USE and RED Methods#

Structure your monitoring around established methodologies.

USE method (for infrastructure resources – CPU, memory, disk, network):

  • Utilization: What percentage of the resource is in use? (node_cpu_seconds_total, node_memory_MemAvailable_bytes)
  • Saturation: Is work queuing? (node_load1 vs CPU count, swap usage)
  • Errors: Are there error conditions? (node_disk_io_errors, node_network_receive_errs_total)

RED method (for request-driven services – APIs, web servers):

  • Rate: Requests per second (rate(http_requests_total[5m]))
  • Errors: Error rate or ratio (rate(http_requests_total{status_code=~"5.."}[5m]))
  • Duration: Latency distribution (histogram_quantile(0.95, ...))
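
As a concrete example, the USE signals for CPU map to node-exporter queries like these (the RED queries above come straight from the PromQL section):

# Utilization: per-instance CPU busy percentage
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Saturation: 1-minute load average relative to the number of CPUs
node_load1 / count by (instance) (node_cpu_seconds_total{mode="idle"})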

kube-prometheus-stack for Kubernetes#

The kube-prometheus-stack Helm chart deploys Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics as a single release. It is the de facto standard way to monitor a Kubernetes cluster.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=admin \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

This deploys Prometheus scraping the Kubernetes API, kubelet, node-exporter, and kube-state-metrics. Grafana comes with dashboards for cluster health, node resources, and pod workloads. Access Grafana with kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80.

To add custom scrape targets, create a ServiceMonitor resource:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
  namespace: monitoring
  labels:
    release: monitoring    # must match the Helm release label selector
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics

The Prometheus Operator watches for ServiceMonitor resources and updates the scrape configuration automatically. The release: monitoring label is critical – with the chart's default selector, the Operator ignores any ServiceMonitor that lacks it.
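
The ServiceMonitor selects Services, not pods, so the application also needs a Service carrying the matching label and a named port that the endpoints entry refers to. A minimal sketch (the port number is a placeholder):

apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: default
  labels:
    app: myapp               # matched by the ServiceMonitor selector
spec:
  selector:
    app: myapp
  ports:
    - name: metrics          # referenced by the ServiceMonitor's port field
      port: 8080
      targetPort: 8080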