When to Write a Custom Exporter#
The Prometheus ecosystem has exporters for most infrastructure components: node_exporter for Linux hosts, kube-state-metrics for Kubernetes objects, mysqld_exporter for MySQL, and hundreds more. You write a custom exporter when your application or service does not have a Prometheus endpoint, you need business metrics that no generic exporter can provide (revenue, signups, queue depth), or you need to adapt a non-Prometheus system that exposes metrics in a proprietary format.
The simplest approach is instrumenting your own application code directly – adding a /metrics endpoint to your existing service. Writing a standalone exporter is the right choice when you cannot modify the source application, or when the metrics come from an external system you query on each scrape.
Metric Types#
Prometheus defines four metric types, and choosing the right one matters for correct query behavior.
Counter: a monotonically increasing value that only goes up (or resets to zero on restart). Use for: total request count, total errors, total bytes transferred. Query with rate() or increase() to get per-second or per-interval change. Never use a counter for values that can decrease.
Gauge: a value that can go up and down. Use for: current temperature, queue depth, active connections, memory usage, number of goroutines. Read the raw value or use max_over_time(), min_over_time(), avg_over_time() for trends.
Histogram: samples observations and counts them in configurable buckets. Use for: request latency, response sizes, batch job durations. Stores data in multiple series: _bucket (cumulative counts per bucket), _sum (total sum of observations), _count (total number of observations). Query with histogram_quantile() for percentiles that can be aggregated across instances.
Summary: similar to histogram but calculates quantiles on the client side. The calculated quantiles (p50, p90, p99) cannot be aggregated across instances – you cannot meaningfully average two p99 values. Prefer histograms for almost all use cases. Summaries exist mainly for backward compatibility.
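To make the histogram/summary distinction concrete, here is a minimal client_golang sketch; the metric names, help strings, and Objectives values are illustrative and not taken from the examples below.
var (
    // Histogram: observations are counted into server-side buckets, so
    // histogram_quantile() can aggregate them across instances.
    demoLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name:    "demo_request_duration_seconds",
        Help:    "Request latency in seconds.",
        Buckets: prometheus.DefBuckets, // defaults from 5ms to 10s
    })
    // Summary: quantiles are estimated inside the process; the resulting
    // p50/p90/p99 series cannot be meaningfully aggregated across instances.
    demoLatencySummary = prometheus.NewSummary(prometheus.SummaryOpts{
        Name:       "demo_request_duration_quantiles_seconds",
        Help:       "Request latency quantiles in seconds.",
        Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
    })
)

func recordDemo(d time.Duration) {
    demoLatency.Observe(d.Seconds())
    demoLatencySummary.Observe(d.Seconds())
}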
Naming Conventions#
Prometheus naming conventions are not enforced, but following them makes metrics understandable across teams and tools.
The pattern is <namespace>_<subsystem>_<name>_<unit>:
- http_server_request_duration_seconds – namespace: http, subsystem: server, name: request_duration, unit: seconds
- myapp_orders_processed_total – namespace: myapp, name: orders_processed, suffix: total (counter)
Rules to follow:
- Use base units: seconds (not milliseconds), bytes (not megabytes), meters (not kilometers)
- Counters must end in _total
- Use snake_case, not camelCase
- Prefix with a namespace that identifies the application or subsystem
- Avoid the prometheus_ prefix unless you are instrumenting Prometheus itself
Go Exporter Example#
Using prometheus/client_golang, the standard Go client:
package main
import (
"net/http"
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
var (
requestsTotal = prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "myapp_http_requests_total",
Help: "Total HTTP requests processed.",
},
[]string{"method", "endpoint", "status_code"},
)
activeConnections = prometheus.NewGauge(
prometheus.GaugeOpts{
Name: "myapp_active_connections",
Help: "Number of active connections.",
},
)
requestDuration = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "myapp_http_request_duration_seconds",
Help: "HTTP request latency in seconds.",
Buckets: []float64{0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10},
},
[]string{"method", "endpoint"},
)
)
func init() {
prometheus.MustRegister(requestsTotal)
prometheus.MustRegister(activeConnections)
prometheus.MustRegister(requestDuration)
}
func handleRequest(w http.ResponseWriter, r *http.Request) {
start := time.Now()
activeConnections.Inc()
defer activeConnections.Dec()
// ... handle the request ...
w.WriteHeader(http.StatusOK)
duration := time.Since(start).Seconds()
requestsTotal.WithLabelValues(r.Method, r.URL.Path, "200").Inc()
requestDuration.WithLabelValues(r.Method, r.URL.Path).Observe(duration)
}
func main() {
http.HandleFunc("/api/", handleRequest)
http.Handle("/metrics", promhttp.Handler())
http.ListenAndServe(":8080", nil)
}
The /metrics endpoint is served by promhttp.Handler(), which outputs all registered metrics in the Prometheus exposition format. Each metric includes its HELP text and TYPE declaration.
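For reference, scraping that endpoint returns plain text along these lines; the sample values are illustrative and most of the histogram bucket lines are omitted:
# HELP myapp_http_requests_total Total HTTP requests processed.
# TYPE myapp_http_requests_total counter
myapp_http_requests_total{endpoint="/api/",method="GET",status_code="200"} 42
# HELP myapp_active_connections Number of active connections.
# TYPE myapp_active_connections gauge
myapp_active_connections 3
# HELP myapp_http_request_duration_seconds HTTP request latency in seconds.
# TYPE myapp_http_request_duration_seconds histogram
myapp_http_request_duration_seconds_bucket{endpoint="/api/",method="GET",le="0.01"} 12
myapp_http_request_duration_seconds_bucket{endpoint="/api/",method="GET",le="+Inf"} 42
myapp_http_request_duration_seconds_sum{endpoint="/api/",method="GET"} 1.7
myapp_http_request_duration_seconds_count{endpoint="/api/",method="GET"} 42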
Python Exporter Example#
Using the prometheus_client library:
from prometheus_client import Counter, Gauge, Histogram, start_http_server
import time
import random
# Define metrics
requests_total = Counter(
'myapp_http_requests_total',
'Total HTTP requests processed',
['method', 'endpoint', 'status_code']
)
active_connections = Gauge(
'myapp_active_connections',
'Number of active connections'
)
request_duration = Histogram(
'myapp_http_request_duration_seconds',
'HTTP request latency in seconds',
['method', 'endpoint'],
buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
)
# Instrument a function
@request_duration.labels(method='GET', endpoint='/api/data').time()
def process_request():
time.sleep(random.uniform(0.01, 0.5))
return {"status": "ok"}
# Manual instrumentation
def handle_request(method, endpoint):
active_connections.inc()
start = time.time()
try:
# ... handle request ...
requests_total.labels(method=method, endpoint=endpoint, status_code='200').inc()
except Exception:
requests_total.labels(method=method, endpoint=endpoint, status_code='500').inc()
raise
finally:
active_connections.dec()
duration = time.time() - start
request_duration.labels(method=method, endpoint=endpoint).observe(duration)
if __name__ == '__main__':
start_http_server(8000) # Serves /metrics on port 8000
while True:
handle_request('GET', '/api/data')
        time.sleep(1)
The start_http_server() call launches a background HTTP server that serves the /metrics endpoint. For web frameworks like Flask or FastAPI, use the middleware integrations instead.
Collector Pattern#
The examples above continuously update metrics as events happen. The collector pattern is different: metrics are gathered fresh on each scrape. Implement the Collector interface when your metrics come from an external source that you query on demand.
type DatabaseCollector struct {
db *sql.DB
activeConns *prometheus.Desc
queryTime *prometheus.Desc
}
func (c *DatabaseCollector) Describe(ch chan<- *prometheus.Desc) {
ch <- c.activeConns
ch <- c.queryTime
}
func (c *DatabaseCollector) Collect(ch chan<- prometheus.Metric) {
// Query the database for current stats on each scrape
var conns float64
c.db.QueryRow("SELECT count(*) FROM pg_stat_activity").Scan(&conns)
ch <- prometheus.MustNewConstMetric(c.activeConns, prometheus.GaugeValue, conns)
var avgTime float64
c.db.QueryRow("SELECT avg(total_exec_time) FROM pg_stat_statements").Scan(&avgTime)
ch <- prometheus.MustNewConstMetric(c.queryTime, prometheus.GaugeValue, avgTime)
}
This pattern is the right choice for standalone exporters that scrape external APIs, read from databases, or poll system interfaces. The data is always fresh because it is collected at scrape time.
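The snippet above leaves out how the Desc fields are built and how the collector is wired up. A minimal sketch, assuming the DatabaseCollector struct above; the metric names, help text, connection helper, and port are illustrative (imports needed: database/sql, net/http, and the prometheus and promhttp packages):
func NewDatabaseCollector(db *sql.DB) *DatabaseCollector {
    return &DatabaseCollector{
        db: db,
        activeConns: prometheus.NewDesc(
            "myapp_db_active_connections",
            "Connections currently reported by pg_stat_activity.",
            nil, nil, // no variable labels, no constant labels
        ),
        queryTime: prometheus.NewDesc(
            "myapp_db_mean_statement_time_milliseconds",
            "Mean statement time from pg_stat_statements (convert to seconds in Collect if you want to follow the base-unit rule).",
            nil, nil,
        ),
    }
}

func main() {
    db := mustConnect() // hypothetical helper: obtain a *sql.DB however your application does
    prometheus.MustRegister(NewDatabaseCollector(db))
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":9187", nil) // port is illustrative
}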
ServiceMonitor for Kubernetes#
Once your application exposes a /metrics endpoint, create a ServiceMonitor resource so the Prometheus Operator discovers and scrapes it automatically:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: myapp
labels:
release: prometheus # must match Prometheus operator's serviceMonitorSelector
spec:
selector:
matchLabels:
app: myapp
endpoints:
- port: http
path: /metrics
interval: 30s
namespaceSelector:
matchNames:
      - production
The selector.matchLabels must match the labels on your Service (not your Deployment). The release: prometheus label (or whatever your Prometheus Operator uses for serviceMonitorSelector) must be present or the ServiceMonitor will be silently ignored.
Labels and Cardinality#
Labels add dimensions to metrics. http_requests_total{method="GET", endpoint="/api/users", status_code="200"} is a specific time series. Each unique combination of label values creates a new series.
Keep cardinality low. Good label values are bounded and predictable: HTTP methods (GET, POST, PUT, DELETE), status code classes (2xx, 3xx, 4xx, 5xx), environment names, service names. Bad label values are unbounded: user IDs, request IDs, email addresses, full URL paths with dynamic segments, IP addresses.
A metric with 5 labels, each with 100 unique values, creates a theoretical maximum of 10 billion series. In practice not all combinations occur, but even a fraction of that will overwhelm Prometheus.
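One practical way to keep label values bounded is to normalize them before they reach a metric. A small sketch, reusing the requestsTotal counter from the Go example above; labelling by status class instead of the exact code is a variation, and how you obtain the route pattern depends on your router:
// Collapse individual status codes into a bounded set of classes.
func statusClass(code int) string {
    switch {
    case code >= 500:
        return "5xx"
    case code >= 400:
        return "4xx"
    case code >= 300:
        return "3xx"
    case code >= 200:
        return "2xx"
    default:
        return "1xx"
    }
}

// Record against the registered route pattern (e.g. "/api/users/{id}"),
// never the raw URL path, so user IDs and other unbounded values never
// become label values.
func recordRequest(method, routePattern string, statusCode int) {
    requestsTotal.WithLabelValues(method, routePattern, statusClass(statusCode)).Inc()
}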
Business Metrics Examples#
Custom exporters are most valuable when they expose business-level metrics alongside technical metrics:
- myapp_orders_processed_total (Counter, labels: payment_method, region) – order volume
- myapp_cart_value_dollars (Histogram, buckets: 10, 25, 50, 100, 250, 500, 1000) – cart value distribution
- myapp_active_users (Gauge) – current active user sessions
- myapp_payment_processing_seconds (Histogram) – payment gateway latency
- myapp_signup_total (Counter, labels: source, plan_type) – user acquisition tracking
- myapp_queue_depth (Gauge, labels: queue_name) – background job queue sizes
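As a sketch, the first two could be declared like this with client_golang; the help text and the example label values are illustrative:
var (
    ordersProcessed = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "myapp_orders_processed_total",
            Help: "Total orders processed.",
        },
        []string{"payment_method", "region"},
    )
    cartValue = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name:    "myapp_cart_value_dollars",
        Help:    "Cart value at checkout, in dollars.",
        Buckets: []float64{10, 25, 50, 100, 250, 500, 1000},
    })
)

// Register with prometheus.MustRegister as in the earlier example, then in the
// order-processing path:
//   ordersProcessed.WithLabelValues("card", "eu-west").Inc() // label values illustrative
//   cartValue.Observe(74.99)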
Displaying business and technical metrics side by side in Grafana dashboards is powerful. When orders_processed_total drops while http_requests_total stays constant, the problem is in business logic, not infrastructure. When both drop simultaneously, look at the infrastructure layer.
Common Gotchas#
High-cardinality labels are the most frequent mistake in custom exporters. A label with user IDs generates one series per user per metric. With 100,000 users and 10 metrics, that is 1 million series from one application. Prometheus memory usage scales linearly with series count. Never use unbounded values as labels.
Using a Gauge for something that should be a Counter causes incorrect rate calculations. If a value only increases (total requests, total errors), use a Counter. Prometheus can calculate rate() and increase() from a Counter, and handles counter resets correctly. A Gauge that only increases looks similar but loses reset detection.
Summary quantiles cannot be aggregated across instances. If you have three instances each reporting p99 = 200ms, the aggregate p99 is not the average of those values. Use Histogram instead – histogram_quantile() aggregates correctly across instances because it works on raw bucket counts.