Grafana Mimir for Long-Term Prometheus Storage#
Prometheus stores metrics on local disk with a practical retention limit of weeks to a few months. Beyond that, you need a long-term storage solution. Grafana Mimir is a horizontally scalable, multi-tenant time series database designed for exactly this purpose. It is API-compatible with Prometheus – Grafana queries Mimir using the same PromQL, and Prometheus pushes data to Mimir via remote_write.
Mimir is the successor to Cortex. Grafana Labs forked Cortex, rewrote significant portions for performance, and released Mimir under the AGPLv3 license. If you see references to Cortex architecture, the concepts map directly to Mimir with improvements.
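On the query side, Grafana talks to Mimir like any Prometheus server. As a minimal sketch, assuming a query-frontend Service named mimir-query-frontend.mimir.svc on port 8080 (Mimir serves its Prometheus-compatible API under the /prometheus prefix by default), a provisioned Grafana data source could look like:

# grafana/provisioning/datasources/mimir.yaml (illustrative)
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus          # Mimir is queried like any Prometheus server
    access: proxy
    # Assumed service name and port; adjust to your deployment
    url: http://mimir-query-frontend.mimir.svc:8080/prometheus
    isDefault: true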
Architecture Overview#
Mimir splits the metrics pipeline into specialized components. Each component can be scaled independently based on the bottleneck.
Write path:   Prometheus --> [Distributor] --> [Ingester] --> Object Storage (S3/GCS/MinIO)
Read path:    Grafana <-- [Query Frontend] <-- [Querier] <-- [Store-Gateway] <-- Object Storage
Background:   [Compactor] --> Object Storage

Distributor#
The distributor is the entry point for all incoming metrics. It receives samples via the Prometheus remote_write API, validates them (label names, sample timestamps, series limits), and distributes them to ingesters using consistent hashing.
Key responsibilities:
- Validation: Rejects samples with invalid label names, too many labels, labels exceeding length limits, or timestamps too far in the past or future.
- HA deduplication: When running multiple Prometheus replicas for high availability, the distributor deduplicates samples from replicas so only one copy is stored.
- Sharding: Uses a hash ring (stored in Consul, etcd, or memberlist) to determine which ingesters should receive each series.
- Replication: Writes each series to replication_factor ingesters (default 3) for durability.
# Distributor configuration
distributor:
  ring:
    kvstore:
      store: memberlist   # No external dependency for ring
  ha_tracker:
    enable_ha_tracker: true
    # The HA tracker needs a CAS-capable KV store (etcd or Consul);
    # memberlist is not supported for this ring
    kvstore:
      store: etcd
      etcd:
        endpoints: [etcd.mimir.svc:2379]   # adjust to your etcd service
    # Prometheus replicas that share the same cluster label (and differ only
    # in the replica label) are treated as an HA pair

The distributor is stateless and can be scaled horizontally behind a load balancer. It is CPU-bound during high-ingestion periods due to validation and hashing.
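On the Prometheus side, the HA tracker keys off external labels. A minimal sketch, assuming Mimir's default cluster and __replica__ label names (the values are placeholders):

# prometheus.yaml on each HA replica (illustrative)
global:
  external_labels:
    cluster: prod-us-east-1     # same value on every replica in the pair
    __replica__: prometheus-0   # unique per replica (e.g. prometheus-1 on the other)

The distributor accepts samples only from the replica it currently elects for each cluster and fails over if that replica stops sending.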
Ingester#
Ingesters hold recent data in memory and periodically flush it to long-term object storage as TSDB blocks. Each ingester owns a portion of the hash ring and is responsible for the series assigned to it.
Key responsibilities:
- In-memory storage: Recent samples (last ~2 hours) are stored in an in-memory TSDB head block with a write-ahead log (WAL) for crash recovery.
- Block flushing: Every 2 hours, the head block is compacted into an immutable TSDB block and uploaded to object storage.
- Replication: Each series is replicated across replication_factor ingesters. During reads, the querier deduplicates data from replicas.
# Ingester configuration
ingester:
  ring:
    replication_factor: 3
    kvstore:
      store: memberlist

blocks_storage:
  tsdb:
    # Local TSDB directory; the write-ahead log (WAL) used for crash
    # recovery lives inside this directory
    dir: /data/ingester/tsdb
    # How long to keep data in memory before flushing a block
    block_ranges_period: [2h]
    retention_period: 13h   # Keep recent blocks locally for fast queries

Ingesters are the most resource-intensive component. They are memory-bound (holding active series in RAM) and require persistent storage for the WAL. Plan for 1-2GB of memory per 100,000 active series as a starting estimate.
Scaling ingesters: Adding or removing ingesters triggers ring rebalancing. Mimir handles this gracefully – the leaving ingester flushes its data, and the joining ingester starts receiving new writes. However, rebalancing creates temporary increased load. Scale ingesters during low-traffic periods when possible.
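Two ring settings help separate routine rolling restarts from real scale-downs; the sketch below is illustrative and the token file path is an assumption:

# Ingester ring settings for restarts vs. scale-downs (illustrative)
ingester:
  ring:
    # Persist ring tokens so a restarted ingester reclaims the same series
    # instead of triggering a rebalance
    tokens_file_path: /data/ingester/tokens
    # Stay registered in the ring during rolling restarts; when permanently
    # removing an ingester, unregister it and let it flush instead
    unregister_on_shutdown: false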
Store-Gateway#
The store-gateway provides efficient access to historical blocks in object storage. It downloads block index files and caches them locally, enabling fast series lookups without scanning entire blocks.
# Store-gateway configuration
store_gateway:
  sharding_ring:
    replication_factor: 3
    kvstore:
      store: memberlist

blocks_storage:
  bucket_store:
    # Cache block index headers on local disk
    sync_dir: /data/store-gateway/sync
    index_cache:
      backend: memcached
      memcached:
        addresses: memcached.mimir.svc:11211
    chunks_cache:
      backend: memcached
      memcached:
        addresses: memcached.mimir.svc:11211

The store-gateway shards blocks across instances using a hash ring, so each instance only downloads and caches a subset of blocks. Caching (index headers on local disk, index lookups and chunks in Memcached) is critical for query performance.
Querier#
The querier executes PromQL queries by combining data from two sources: ingesters (for recent, not-yet-flushed data) and store-gateways (for historical blocks in object storage).
# Querier configuration
querier:
  # Maximum time range for a single query
  max_query_lookback: 30d
  # Query ingesters for recent data
  query_ingesters_within: 13h

The query_ingesters_within setting is important. It tells the querier to contact ingesters only for data within the last 13 hours (matching the ingester retention). Data older than this is served entirely from store-gateways, reducing ingester query load.
Query Frontend#
The query frontend sits in front of queriers and optimizes query execution. It is optional but strongly recommended for production.
Key optimizations:
- Splitting: Splits long-range queries into smaller time ranges that can be executed in parallel.
- Caching: Caches query results so repeated dashboard loads do not re-execute queries.
- Queuing: Queues requests and distributes them fairly across queriers, preventing a single heavy query from consuming all querier resources.
# Query frontend configuration
query_frontend:
  # Split queries by day for parallel execution
  split_queries_by_interval: 24h
  # Cache results
  results_cache:
    backend: memcached
    memcached:
      addresses: memcached.mimir.svc:11211
      max_item_size: 10485760   # 10MB

Compactor#
The compactor runs periodically to merge small TSDB blocks in object storage into larger, more efficient blocks. It also handles retention enforcement by deleting blocks that exceed the configured retention period.
# Compactor configuration
compactor:
  data_dir: /data/compactor
  compaction_interval: 1h
  # Retention
  deletion_delay: 12h        # Delay deletion to allow in-flight queries to complete
  tenant_cleanup_delay: 6h

The compactor is a singleton per tenant (or uses sharding for multi-tenant setups). It requires temporary local disk space to download, merge, and re-upload blocks. Plan for local storage equal to 2-3x the size of your largest tenant’s uncompacted blocks.
Deployment Modes#
Monolithic#
All components run in a single binary/process. Suitable for development, testing, and small production deployments handling up to a few million active series.
# Run monolithic Mimir
mimir -target=all \
  -blocks-storage.backend=s3 \
  -blocks-storage.s3.endpoint=s3.amazonaws.com \
  -blocks-storage.s3.bucket-name=mimir-blocks \
  -blocks-storage.s3.region=us-east-1

Helm deployment in monolithic mode:
# mimir-values.yaml (monolithic)
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: s3.amazonaws.com
          bucket_name: mimir-blocks
          region: us-east-1
    blocks_storage:
      storage_prefix: blocks
    limits:
      max_global_series_per_user: 1500000

deploymentMode: SingleBinary

singleBinary:
  replicas: 1
  resources:
    requests:
      cpu: "2"
      memory: 8Gi

Microservices#
Each component runs as a separate Deployment or StatefulSet. Required for large-scale deployments (tens of millions of active series, hundreds of Prometheus instances).
helm repo add grafana https://grafana.github.io/helm-charts
helm install mimir grafana/mimir-distributed \
  --namespace mimir --create-namespace \
  -f mimir-values.yaml

# mimir-values.yaml (microservices)
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: s3.amazonaws.com
          bucket_name: mimir-blocks
          region: us-east-1
    limits:
      max_global_series_per_user: 5000000
      ingestion_rate: 200000
      ingestion_burst_size: 400000

distributor:
  replicas: 3
  resources:
    requests:
      cpu: "1"
      memory: 2Gi

ingester:
  replicas: 6
  persistentVolume:
    enabled: true
    size: 50Gi
  resources:
    requests:
      cpu: "2"
      memory: 8Gi

store_gateway:
  replicas: 3
  persistentVolume:
    enabled: true
    size: 20Gi

querier:
  replicas: 4
  resources:
    requests:
      cpu: "1"
      memory: 4Gi

query_frontend:
  replicas: 2

compactor:
  replicas: 1
  persistentVolume:
    enabled: true
    size: 100Gi

Tenant Isolation#
Mimir is natively multi-tenant. Each tenant gets isolated storage and configurable limits. Tenants are identified by the X-Scope-OrgID header on write and read requests.
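On the read path, Grafana must send the same header. As a sketch, assuming Grafana data source provisioning and the same query-frontend address as in the overview (both are assumptions), the tenant header can be attached per data source:

# grafana/provisioning/datasources/mimir-team-platform.yaml (illustrative)
apiVersion: 1
datasources:
  - name: Mimir (team-platform)
    type: prometheus
    access: proxy
    url: http://mimir-query-frontend.mimir.svc:8080/prometheus
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: team-platform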
Prometheus Configuration per Tenant#
# prometheus.yaml for tenant "team-platform"
remote_write:
  - url: http://mimir-distributor.mimir.svc:8080/api/v1/push
    headers:
      X-Scope-OrgID: team-platform
    queue_config:
      max_samples_per_send: 1000
      max_shards: 10
      capacity: 2500

Per-Tenant Limits#
Configure limits per tenant to prevent one team from consuming all resources:
# Mimir runtime configuration (overrides.yaml)
overrides:
  team-platform:
    max_global_series_per_user: 2000000
    ingestion_rate: 100000
    ingestion_burst_size: 200000
    max_query_lookback: 90d
    max_query_parallelism: 16
  team-frontend:
    max_global_series_per_user: 500000
    ingestion_rate: 25000
    ingestion_burst_size: 50000
    max_query_lookback: 30d
    max_query_parallelism: 8

Mimir polls the runtime configuration file and reloads it automatically, so updating the ConfigMap is enough; no restart or signal is required. The /runtime_config endpoint returns the currently active values, which is useful for verifying that a change has been picked up.
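The overrides file is wired in through Mimir's runtime_config block; the file path below is an assumption matching a typical ConfigMap mount:

# Mimir configuration: load per-tenant overrides as runtime config
runtime_config:
  file: /etc/mimir/overrides.yaml   # mounted from the ConfigMap (assumed path)
  period: 10s                       # how often Mimir re-reads the file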
Remote Write Configuration#
Prometheus remote_write Tuning#
The default Prometheus remote_write configuration is conservative. For production, tune these parameters:
remote_write:
  - url: http://mimir-distributor.mimir.svc:8080/api/v1/push
    headers:
      X-Scope-OrgID: my-tenant
    queue_config:
      capacity: 10000              # Buffer size before blocking
      max_shards: 30               # Parallel senders (increase for high volume)
      min_shards: 1
      max_samples_per_send: 2000   # Batch size per request
      batch_send_deadline: 5s      # Max time to wait for a full batch
      min_backoff: 30ms
      max_backoff: 5s
    metadata_config:
      send: true
      send_interval: 1m

Key tuning knobs:
- max_shards: Increase when remote_write is falling behind (check prometheus_remote_storage_samples_pending). Each shard is a parallel sender.
- capacity: Increase when you see prometheus_remote_storage_samples_dropped_total increasing. This is the in-memory buffer before samples are dropped.
- max_samples_per_send: Larger batches are more efficient but increase latency. 1000-2000 is the sweet spot.
Monitoring remote_write Health#
Essential Prometheus metrics to monitor:
# Samples pending in the remote write queue (should stay low)
prometheus_remote_storage_samples_pending
# Samples failed to send (should be zero)
rate(prometheus_remote_storage_samples_failed_total[5m])
# Samples successfully sent per second
rate(prometheus_remote_storage_samples_total[5m])
# Highest timestamp successfully sent (for lag calculation)
prometheus_remote_storage_queue_highest_sent_timestamp_seconds
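As a sketch of how these metrics can back an alert (the rule group, alert name, and thresholds are arbitrary choices, not from the source):

# prometheus-rules.yaml (illustrative)
groups:
  - name: remote-write-health
    rules:
      - alert: RemoteWriteBehind
        expr: |
          (
            max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m])
            - ignoring(remote_name, url) group_right
              max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m])
          ) > 120
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "remote_write is more than 2 minutes behind the newest scraped sample"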
Retention Policies#
Configure retention at two levels:
Block-Level Retention (Compactor)#
The compactor enforces retention by deleting blocks older than the configured period:
limits:
  compactor_blocks_retention_period: 365d   # Keep metrics for 1 year

Per-tenant retention:
overrides:
  team-platform:
    compactor_blocks_retention_period: 365d
  team-staging:
    compactor_blocks_retention_period: 30d

Object Storage Lifecycle (Belt and Suspenders)#
As a safety net, configure object storage lifecycle rules to delete objects older than your longest retention period plus a buffer:
# Terraform: S3 lifecycle rule
resource "aws_s3_bucket_lifecycle_configuration" "mimir" {
  bucket = aws_s3_bucket.mimir_blocks.id

  rule {
    id     = "expire-old-blocks"
    status = "Enabled"

    expiration {
      days = 400   # 365 + 35 day buffer
    }
  }
}

Performance Tuning#
Ingester Memory#
The primary scaling dimension. Each active series consumes approximately 5-10KB of memory in the ingester.
Active series:         2,000,000
Memory per series:     ~8KB (average)
Series memory:         2,000,000 x 8KB = ~16GB
Replication factor:    3 (so ~48GB held across the cluster)
Ingesters:             6
Memory per ingester:   ~48GB / 6 = ~8GB

Monitor with:
# Active series per ingester
cortex_ingester_memory_series
# Memory used by ingesters
process_resident_memory_bytes{container="ingester"}

Store-Gateway Caching#
Without caching, every query to historical data requires downloading index files from object storage. This is slow and expensive.
Index cache: Stores series index lookups. Size it to hold the index for your most-queried time ranges. Start with 1GB per million series per day of retention.
Chunks cache: Stores actual sample data. Size it to hold the chunks for your most common queries. Start with 2GB per million series per frequently-queried day.
Use Memcached for both caches in production:
memcached:
  replicas: 3
  resources:
    requests:
      cpu: "500m"
      memory: 4Gi
  maxItemMemory: 3072   # MB, leave room for overhead

Query Performance#
Slow queries are usually caused by high cardinality (too many series matching a selector) or long time ranges without splitting.
- Enable query splitting in the query frontend (split_queries_by_interval: 24h).
- Set max_query_parallelism per tenant to limit resource consumption per query.
- Use max_fetched_series_per_query to prevent runaway queries from loading millions of series.
limits:
  max_fetched_series_per_query: 100000
  max_fetched_chunks_per_query: 2000000
  max_query_parallelism: 16
  max_query_length: 30d   # Maximum time range per query

Practical Checklist for Agents#
When deploying or operating Mimir:
- Start monolithic for clusters with fewer than 1 million active series. Move to microservices when scaling demands it.
- Use memberlist for the hash ring instead of Consul or etcd. It eliminates an external dependency and works well up to dozens of ring members.
- Set per-tenant limits from day one. A single team sending high-cardinality metrics can degrade the entire cluster.
- Monitor ingester memory closely. OOM-killed ingesters lose in-flight data between WAL checkpoints. Set memory requests conservatively with headroom.
- Deploy Memcached for index and chunks caching. Without caching, historical queries are unacceptably slow.
- Tune remote_write on the Prometheus side. The default settings are too conservative for most production workloads. Increase max_shards and capacity if samples are pending or dropping.
- Configure retention in both Mimir (compactor) and object storage (lifecycle rules) as defense in depth.
- Test failover by killing an ingester and verifying that queries still return correct data (due to replication) and that the replacement ingester catches up via WAL replay.