## Instant Vectors vs Range Vectors
An instant vector returns one sample per time series at a single point in time. A range vector returns multiple samples per time series over a time window.
```promql
# Instant vector: current value of each series
http_requests_total{job="api"}

# Range vector: last 5 minutes of samples for each series
http_requests_total{job="api"}[5m]
```

You cannot graph a range vector directly. Functions like rate() and increase() consume a range vector and return an instant vector, which Grafana can then plot.
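increase() is mentioned above but not shown elsewhere in this section; as a quick sketch using the same example metric, it performs the same range-to-instant conversion, returning the counter's total growth over the window:

```promql
# Total requests accumulated over the last hour, one value per series
increase(http_requests_total{job="api"}[1h])
```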
## rate() vs irate()
Both compute per-second rates from counter metrics, but they behave differently.
rate() calculates the average per-second increase over the entire range window. It smooths out spikes and is the right choice for alerting and dashboards where you want stable trends:
```promql
# Average requests per second over last 5 minutes
rate(http_requests_total[5m])
```

irate() uses only the last two data points in the range window. It reacts to spikes immediately but is noisy:
```promql
# Instantaneous rate based on last two samples
irate(http_requests_total[5m])
```

The range window in irate() is only a lookback to find two samples. The actual rate is computed over the gap between those two scrapes, regardless of the window size.
Rule of thumb: use rate() for alerts and recording rules, irate() only for high-resolution interactive dashboards.
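To see the difference on your own data, one quick exploratory sketch (not a production query) is to divide the two; values far above 1.0 mark spikes that rate() has smoothed away:

```promql
# Ratio of instantaneous to averaged rate; recent spikes push this well above 1.0
irate(http_requests_total{job="api"}[5m])
  / rate(http_requests_total{job="api"}[5m])
```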
## histogram_quantile() for Latency Percentiles
Histogram metrics store observations in buckets. To get percentiles, use histogram_quantile():
```promql
# p99 latency across all instances
histogram_quantile(0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)

# p95 latency broken down by endpoint
histogram_quantile(0.95,
  sum by (le, handler) (rate(http_request_duration_seconds_bucket[5m]))
)

# p50 (median) latency
histogram_quantile(0.50,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)
```

The le label (less-than-or-equal) is required in the by clause because it carries the bucket boundaries. Always apply rate() before histogram_quantile() to compute per-second bucket fill rates; otherwise you get cumulative counts that produce meaningless percentiles.
Bucket boundaries are set in your application instrumentation. Default Go client buckets are [.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]. Choose buckets that span your realistic latency range for accurate percentiles.
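You can sanity-check bucket coverage from PromQL itself. This sketch assumes the default buckets above, so le="10" is the largest finite bound; if the ratio sits well below 1, many observations overflow the top bucket and high percentiles get clamped to it:

```promql
# Fraction of observations at or below the largest finite bucket (le="10" assumed)
sum(rate(http_request_duration_seconds_bucket{le="10"}[5m]))
  / sum(rate(http_request_duration_seconds_bucket{le="+Inf"}[5m]))
```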
## Aggregation Operators
Aggregation operators reduce dimensions by collapsing label sets.
```promql
# Sum across all instances, keep only the job label
sum by (job) (rate(http_requests_total[5m]))

# Similar using 'without' -- drops the instance label and keeps everything else
# (matches 'by (job)' only when job and instance are the only labels)
sum without (instance) (rate(http_requests_total[5m]))

# Average CPU usage per node
avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))

# Count the number of pods per namespace
count by (namespace) (kube_pod_info)

# Maximum memory usage across all pods in a deployment
max by (deployment) (container_memory_working_set_bytes{container!=""})

# Minimum available disk space across all nodes
min by (instance) (node_filesystem_avail_bytes{mountpoint="/"})

# Top 5 pods by CPU usage
topk(5, sum by (pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m])))
```
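Two related operators, sketched here against the same metrics, may also be useful: bottomk() mirrors topk(), and the quantile() aggregator computes a quantile across series at query time (not to be confused with histogram_quantile(), which reads histogram buckets):

```promql
# Bottom 3 nodes by available root-filesystem space
bottomk(3, node_filesystem_avail_bytes{mountpoint="/"})

# 90th-percentile per-pod CPU usage within each namespace
quantile by (namespace) (0.9,
  sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
)
```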
## Offset Modifier and Subqueries

The offset modifier shifts a query back in time. Useful for comparing current values to historical baselines:
```promql
# Request rate now vs 1 hour ago
rate(http_requests_total[5m]) / rate(http_requests_total[5m] offset 1h)

# Request rate now vs same time yesterday
rate(http_requests_total[5m]) - rate(http_requests_total[5m] offset 1d)
```

Subqueries evaluate a query over a range at a specified resolution:
```promql
# Maximum 5-minute error rate over the last hour, evaluated every minute
max_over_time(
  rate(http_requests_total{status_code=~"5.."}[5m])[1h:1m]
)

# Average of p99 latency over the last 24 hours
avg_over_time(
  histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))[24h:5m]
)
```

## Label Matching
PromQL supports four label matchers: = (exact match), != (not equal), =~ (regex match), !~ (negative regex match):
```promql
# All 5xx status codes
http_requests_total{status_code=~"5.."}

# All non-GET methods
http_requests_total{method!="GET"}

# Multiple namespaces
kube_pod_info{namespace=~"production|staging"}

# Exclude system containers
container_memory_working_set_bytes{container!="", container!="POD"}
```
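One gotcha: PromQL regex matchers are fully anchored, so the pattern must match the entire label value. A hypothetical illustration with the same metric:

```promql
# Matches method="GET" or method="POST" exactly; a value like "GETX" does not
# match, because the regex is anchored at both ends
http_requests_total{method=~"GET|POST"}
```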
## 10 Common Monitoring Queries

```promql
# 1. Error rate as a percentage
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m])) * 100

# 2. Request throughput by endpoint
sum by (handler) (rate(http_requests_total[5m]))

# 3. p95 latency per service
histogram_quantile(0.95,
  sum by (le, job) (rate(http_request_duration_seconds_bucket[5m])))

# 4. CPU saturation (load average vs core count); drop the mode label too,
#    so the label sets match for the division
node_load1 / count without (cpu, mode) (node_cpu_seconds_total{mode="idle"})

# 5. Memory usage percentage per node
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# 6. Disk IOPS
sum by (instance) (rate(node_disk_reads_completed_total[5m])
  + rate(node_disk_writes_completed_total[5m]))

# 7. Container restart count in the last hour
increase(kube_pod_container_status_restarts_total[1h])

# 8. Network receive throughput per pod, in bits per second
sum by (pod) (rate(container_network_receive_bytes_total[5m])) * 8

# 9. PVC usage percentage
kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes * 100

# 10. Deployment replica availability
kube_deployment_status_available_replicas / kube_deployment_spec_replicas
```

## Recording Rules
Expensive queries that run frequently (dashboard panels refreshing every 15s, alerts evaluating every minute) should be converted to recording rules. Prometheus pre-computes the result and stores it as a new time series.
```yaml
groups:
  - name: http_recording_rules
    interval: 30s
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      - record: job:http_errors:ratio5m
        expr: |
          sum by (job) (rate(http_requests_total{status_code=~"5.."}[5m]))
            / sum by (job) (rate(http_requests_total[5m]))
      - record: job:http_latency:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (le, job) (rate(http_request_duration_seconds_bucket[5m])))
  - name: node_recording_rules
    rules:
      - record: instance:node_cpu:utilization5m
        expr: |
          1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```

Naming convention: level:metric_name:operations. The level indicates the aggregation level (job, instance, namespace). The operations suffix describes what was applied (rate5m, ratio5m, p99_5m).
Use recording rules in your alerts and dashboards by referencing the recorded metric name directly: job:http_errors:ratio5m > 0.05 instead of the full expression. This reduces query load on Prometheus and makes alert rules more readable.
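As a sketch, an alerting rule built on the recorded series might look like the following; the 5% threshold, 10m hold, and label values are illustrative, not recommendations:

```yaml
groups:
  - name: http_alerts
    rules:
      - alert: HighErrorRatio
        # References the recorded series instead of re-evaluating the full expression
        expr: job:http_errors:ratio5m > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "5xx ratio above 5% for job {{ $labels.job }}"
```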