Grafana Mimir for Long-Term Prometheus Storage#

Prometheus stores metrics on local disk with a practical retention limit of weeks to a few months. Beyond that, you need a long-term storage solution. Grafana Mimir is a horizontally scalable, multi-tenant time series database designed for exactly this purpose. It is API-compatible with Prometheus – Grafana queries Mimir using the same PromQL, and Prometheus pushes data to Mimir via remote_write.

Mimir is the successor to Cortex. Grafana Labs forked Cortex, rewrote significant portions for performance, and released Mimir under the AGPLv3 license. If you see references to Cortex architecture, the concepts map directly to Mimir with improvements.

Architecture Overview#

Mimir splits the metrics pipeline into specialized components. Each component can be scaled independently based on the bottleneck.

Write path:
  Prometheus --> [Distributor] --> [Ingester] --> Object Storage (S3/GCS/MinIO)

Read path:
  Grafana <-- [Query Frontend] <-- [Querier] <-- [Ingester]                         (recent data)
                                             <-- [Store-Gateway] <-- Object Storage (historical blocks)

Background:
  [Compactor] <--> Object Storage   (merges blocks, enforces retention)

Distributor#

The distributor is the entry point for all incoming metrics. It receives samples via the Prometheus remote_write API, validates them (label names, sample timestamps, series limits), and distributes them to ingesters using consistent hashing.

Key responsibilities:

  • Validation: Rejects samples with invalid label names, too many labels, labels exceeding length limits, or timestamps too far in the past or future.
  • HA deduplication: When running multiple Prometheus replicas for high availability, the distributor deduplicates samples from replicas so only one copy is stored.
  • Sharding: Uses a hash ring (stored in Consul, etcd, or memberlist) to determine which ingesters should receive each series.
  • Replication: Writes each series to replication_factor ingesters (default 3) for durability.

# Distributor configuration
distributor:
  ring:
    kvstore:
      store: memberlist   # No external dependency for ring
  ha_tracker:
    enable_ha_tracker: true
    kvstore:
      store: memberlist
    # Prometheus instances with the same cluster/replica labels
    # are treated as HA pairs
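
For HA deduplication to work, each Prometheus replica must attach a cluster label and a replica label to everything it sends. A minimal sketch using external_labels, assuming Mimir's default HA label names (cluster and __replica__); the cluster name and replica values shown are illustrative:

# prometheus.yaml on each HA replica
global:
  external_labels:
    cluster: prod-us-east-1    # Same value on both replicas
    __replica__: replica-0     # Unique per replica (replica-1 on the second instance)

The distributor keeps samples from whichever replica it currently considers the leader and drops the duplicate stream from the other.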

The distributor is stateless and can be scaled horizontally behind a load balancer. It is CPU-bound during high-ingestion periods due to validation and hashing.

Ingester#

Ingesters hold recent data in memory and periodically flush it to long-term object storage as TSDB blocks. Each ingester owns a portion of the hash ring and is responsible for the series assigned to it.

Key responsibilities:

  • In-memory storage: Recent samples (last ~2 hours) are stored in an in-memory TSDB head block with a write-ahead log (WAL) for crash recovery.
  • Block flushing: Every 2 hours, the head block is compacted into an immutable TSDB block and uploaded to object storage.
  • Replication: Each series is replicated across replication_factor ingesters. During reads, the querier deduplicates data from replicas.

# Ingester configuration
ingester:
  ring:
    replication_factor: 3
    kvstore:
      store: memberlist

# TSDB settings live under the top-level blocks_storage block;
# the WAL is kept inside the TSDB directory for crash recovery
blocks_storage:
  tsdb:
    dir: /data/ingester/tsdb
    block_ranges_period: [2h]
    retention_period: 13h   # Keep recent blocks locally for fast queries

Ingesters are the most resource-intensive component. They are memory-bound (holding active series in RAM) and require persistent storage for WAL. Plan for 1-2GB of memory per 100,000 active series as a starting estimate.

Scaling ingesters: Adding or removing ingesters triggers ring rebalancing. Mimir handles this gracefully – the leaving ingester flushes its data, and the joining ingester starts receiving new writes. However, rebalancing creates temporary increased load. Scale ingesters during low-traffic periods when possible.
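
A minimal scaling sketch, assuming the mimir-distributed Helm chart's default StatefulSet and Service names in a mimir namespace:

# Scale ingesters during a low-traffic window
kubectl -n mimir scale statefulset/mimir-ingester --replicas=8

# Check the ring page and confirm the new ingesters report ACTIVE
kubectl -n mimir port-forward svc/mimir-distributor 8080:8080 &
curl -s http://localhost:8080/ingester/ring | grep -o ACTIVE | wc -l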

Store-Gateway#

The store-gateway provides efficient access to historical blocks in object storage. It downloads block index files and caches them locally, enabling fast series lookups without scanning entire blocks.

# Store-gateway configuration
store_gateway:
  sharding_ring:
    replication_factor: 3
    kvstore:
      store: memberlist

# Block sync and caching are configured under the top-level blocks_storage block
blocks_storage:
  bucket_store:
    # Cache block index headers on local disk
    sync_dir: /data/store-gateway/sync
    index_cache:
      backend: memcached
      memcached:
        addresses: memcached.mimir.svc:11211
    chunks_cache:
      backend: memcached
      memcached:
        addresses: memcached.mimir.svc:11211

The store-gateway shards blocks across instances using a hash ring, so each instance only downloads and caches a subset of blocks. Caching (index headers in memory, full index and chunks in Memcached) is critical for query performance.

Querier#

The querier executes PromQL queries by combining data from two sources: ingesters (for recent, not-yet-flushed data) and store-gateways (for historical blocks in object storage).

# Querier configuration
querier:
  # Only query ingesters for data within this window
  query_ingesters_within: 13h

# Maximum lookback is a per-tenant limit
limits:
  max_query_lookback: 30d

The query_ingesters_within setting is important. It tells the querier to only contact ingesters for data within the last 13 hours (matching the ingester retention). Data older than this is served entirely from store-gateways, reducing ingester query load.

Query Frontend#

The query frontend sits in front of queriers and optimizes query execution. It is optional but strongly recommended for production.

Key optimizations:

  • Splitting: Splits long-range queries into smaller time ranges that can be executed in parallel.
  • Caching: Caches query results so repeated dashboard loads do not re-execute queries.
  • Queuing: Queues requests and distributes them fairly across queriers, preventing a single heavy query from consuming all querier resources.

# Query-frontend configuration (the YAML block is named "frontend")
frontend:
  # Cache query results
  cache_results: true
  results_cache:
    backend: memcached
    memcached:
      addresses: memcached.mimir.svc:11211
      max_item_size: 10485760  # 10MB

# Query splitting is a per-tenant limit
limits:
  split_queries_by_interval: 24h   # Split queries by day for parallel execution

Compactor#

The compactor runs periodically to merge small TSDB blocks in object storage into larger, more efficient blocks. It also handles retention enforcement by deleting blocks that exceed the configured retention period.

# Compactor configuration
compactor:
  data_dir: /data/compactor
  compaction_interval: 1h
  # Retention
  deletion_delay: 12h   # Delay deletion to allow queries to complete
  tenant_cleanup_delay: 6h

The compactor is a singleton per tenant (or uses sharding for multi-tenant setups). It requires temporary local disk space to download, merge, and re-upload blocks. Plan for local storage equal to 2-3x the size of your largest tenant’s uncompacted blocks.
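
To catch a stalled compactor before small blocks pile up, a common check is the age of the last successful run. A PromQL sketch, assuming the cortex_compactor_last_successful_run_timestamp_seconds metric exposed by Mimir's compactor and an illustrative 24-hour threshold:

# Alert if the compactor has not completed a run in the last 24 hours
time() - cortex_compactor_last_successful_run_timestamp_seconds > 86400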

Deployment Modes#

Monolithic#

All components run in a single binary/process. Suitable for development, testing, and small production deployments handling up to a few million active series.

# Run monolithic Mimir
mimir -target=all \
  -blocks-storage.backend=s3 \
  -blocks-storage.s3.endpoint=s3.amazonaws.com \
  -blocks-storage.s3.bucket-name=mimir-blocks \
  -blocks-storage.s3.region=us-east-1
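
Once the process is up, a quick readiness check against the default HTTP port (8080, unless you changed server.http_listen_port):

# Returns HTTP 200 once all modules have started
curl -s http://localhost:8080/ready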

Helm deployment in monolithic mode:

# mimir-values.yaml (monolithic)
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: s3.amazonaws.com
          bucket_name: mimir-blocks
          region: us-east-1
    blocks_storage:
      storage_prefix: blocks
    limits:
      max_global_series_per_user: 1500000
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
  resources:
    requests:
      cpu: "2"
      memory: 8Gi

Microservices#

Each component runs as a separate Deployment or StatefulSet. Required for large-scale deployments (tens of millions of active series, hundreds of Prometheus instances).

helm repo add grafana https://grafana.github.io/helm-charts
helm install mimir grafana/mimir-distributed \
  --namespace mimir --create-namespace \
  -f mimir-values.yaml
# mimir-values.yaml (microservices)
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: s3.amazonaws.com
          bucket_name: mimir-blocks
          region: us-east-1
    limits:
      max_global_series_per_user: 5000000
      ingestion_rate: 200000
      ingestion_burst_size: 400000

distributor:
  replicas: 3
  resources:
    requests:
      cpu: "1"
      memory: 2Gi

ingester:
  replicas: 6
  persistentVolume:
    enabled: true
    size: 50Gi
  resources:
    requests:
      cpu: "2"
      memory: 8Gi

store_gateway:
  replicas: 3
  persistentVolume:
    enabled: true
    size: 20Gi

querier:
  replicas: 4
  resources:
    requests:
      cpu: "1"
      memory: 4Gi

query_frontend:
  replicas: 2

compactor:
  replicas: 1
  persistentVolume:
    enabled: true
    size: 100Gi

Tenant Isolation#

Mimir is natively multi-tenant. Each tenant gets isolated storage and configurable limits. Tenants are identified by the X-Scope-OrgID header on write and read requests.
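
For example, querying the Prometheus-compatible API directly as a given tenant (the service name and port assume a Helm deployment in the mimir namespace):

# Instant query as tenant "team-platform"
curl -s -H 'X-Scope-OrgID: team-platform' \
  'http://mimir-query-frontend.mimir.svc:8080/prometheus/api/v1/query?query=up'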

Prometheus Configuration per Tenant#

# prometheus.yaml for tenant "team-platform"
remote_write:
  - url: http://mimir-distributor.mimir.svc:8080/api/v1/push
    headers:
      X-Scope-OrgID: team-platform
    queue_config:
      max_samples_per_send: 1000
      max_shards: 10
      capacity: 2500
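
On the read side, Grafana must send the same header. A provisioned data source sketch, assuming the query frontend is reachable at mimir-query-frontend.mimir.svc:8080 (the service name is deployment-specific):

# grafana-datasource.yaml for tenant "team-platform"
apiVersion: 1
datasources:
  - name: Mimir (team-platform)
    type: prometheus
    url: http://mimir-query-frontend.mimir.svc:8080/prometheus
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: team-platform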

Per-Tenant Limits#

Configure limits per tenant to prevent one team from consuming all resources:

# Mimir runtime configuration (overrides.yaml)
overrides:
  team-platform:
    max_global_series_per_user: 2000000
    ingestion_rate: 100000
    ingestion_burst_size: 200000
    max_query_lookback: 90d
    max_query_parallelism: 16

  team-frontend:
    max_global_series_per_user: 500000
    ingestion_rate: 25000
    ingestion_burst_size: 50000
    max_query_lookback: 30d
    max_query_parallelism: 8

Mimir polls the runtime configuration file on an interval (10 seconds by default), so updating the ConfigMap is enough; no restart is required. Use the /runtime_config endpoint to verify which values are currently loaded.
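
For example (the service name assumes a Helm deployment in the mimir namespace):

# Show the currently loaded runtime configuration, including per-tenant overrides
curl -s http://mimir-distributor.mimir.svc:8080/runtime_config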

Remote Write Configuration#

Prometheus remote_write Tuning#

The default Prometheus remote_write configuration is conservative. For production, tune these parameters:

remote_write:
  - url: http://mimir-distributor.mimir.svc:8080/api/v1/push
    headers:
      X-Scope-OrgID: my-tenant
    queue_config:
      capacity: 10000            # Buffer size before blocking
      max_shards: 30             # Parallel senders (increase for high-volume)
      min_shards: 1
      max_samples_per_send: 2000 # Batch size per request
      batch_send_deadline: 5s    # Max time to wait for a full batch
      min_backoff: 30ms
      max_backoff: 5s
    metadata_config:
      send: true
      send_interval: 1m

Key tuning knobs:

  • max_shards: Increase when remote_write is falling behind (check prometheus_remote_storage_samples_pending). Each shard is a parallel sender.
  • capacity: The per-shard in-memory buffer. When it fills, reads from the WAL block and remote_write falls further behind, risking data loss if Prometheus stays behind for longer than its WAL retention. Increase it together with max_shards when the queue is persistently backed up.
  • max_samples_per_send: Larger batches are more efficient but increase latency. 1000-2000 is the sweet spot.

Monitoring remote_write Health#

Essential Prometheus metrics to monitor:

# Samples pending in the remote write queue (should stay low)
prometheus_remote_storage_samples_pending

# Samples failed to send (should be zero)
rate(prometheus_remote_storage_samples_failed_total[5m])

# Samples successfully sent per second
rate(prometheus_remote_storage_samples_total[5m])

# Highest timestamp successfully sent (for lag calculation)
prometheus_remote_storage_queue_highest_sent_timestamp_seconds
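
A lag alert ties these together. A sketch of a rule using prometheus_remote_storage_highest_timestamp_in_seconds (the newest sample seen by the WAL watcher, not listed above) and an illustrative two-minute threshold:

# remote-write-alerts.yaml
groups:
  - name: remote-write
    rules:
      - alert: RemoteWriteFallingBehind
        expr: >
          (
            prometheus_remote_storage_highest_timestamp_in_seconds
              - ignoring(remote_name, url) group_right
            prometheus_remote_storage_queue_highest_sent_timestamp_seconds
          ) > 120
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "remote_write to {{ $labels.url }} is more than 2 minutes behind"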

Retention Policies#

Configure retention at two levels:

Block-Level Retention (Compactor)#

The compactor enforces retention by deleting blocks older than the configured period:

limits:
  compactor_blocks_retention_period: 365d  # Keep metrics for 1 year

Per-tenant retention:

overrides:
  team-platform:
    compactor_blocks_retention_period: 365d
  team-staging:
    compactor_blocks_retention_period: 30d

Object Storage Lifecycle (Belt and Suspenders)#

As a safety net, configure object storage lifecycle rules to delete objects older than your longest retention period plus a buffer:

# Terraform: S3 lifecycle rule
resource "aws_s3_bucket_lifecycle_configuration" "mimir" {
  bucket = aws_s3_bucket.mimir_blocks.id

  rule {
    id     = "expire-old-blocks"
    status = "Enabled"
    expiration {
      days = 400  # 365 + 35 day buffer
    }
  }
}

Performance Tuning#

Ingester Memory#

The primary scaling dimension. Each active series consumes approximately 5-10KB of memory in the ingester.

Active series: 2,000,000
Memory per series: ~8KB (average)
Memory for one copy: 2,000,000 x 8KB ≈ 16GB
Replication factor: 3 → ~48GB across the cluster
Ingesters: 6

Memory per ingester: ~48GB / 6 ≈ 8GB

Monitor with:

# Active series per ingester
cortex_ingester_memory_series

# Memory used by ingesters
process_resident_memory_bytes{container="ingester"}
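
To check your real-world cost against the 5-10KB estimate, divide resident memory by in-memory series (the container label selector is an assumption about how your pods are labelled):

# Approximate bytes per in-memory series, per replica copy
sum(process_resident_memory_bytes{container="ingester"})
  /
sum(cortex_ingester_memory_series)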

Store-Gateway Caching#

Without caching, every query to historical data requires downloading index files from object storage. This is slow and expensive.

Index cache: Stores series index lookups. Size it to hold the index for your most-queried time ranges. Start with 1GB per million series per day of retention.

Chunks cache: Stores actual sample data. Size it to hold the chunks for your most common queries. Start with 2GB per million series per frequently-queried day.

Use Memcached for both caches in production:

memcached:
  replicas: 3
  resources:
    requests:
      cpu: "500m"
      memory: 4Gi
  maxItemMemory: 3072  # MB, leave room for overhead

Query Performance#

Slow queries are usually caused by high cardinality (too many series matching a selector) or long time ranges without splitting.

  • Enable query splitting in the query frontend (split_queries_by_interval: 24h).
  • Set max_query_parallelism per tenant to limit resource consumption per query.
  • Use max_fetched_series_per_query to prevent runaway queries from loading millions of series.

limits:
  max_fetched_series_per_query: 100000
  max_fetched_chunks_per_query: 2000000
  max_query_parallelism: 16
  max_query_length: 30d  # Maximum time range per query

Practical Checklist for Agents#

When deploying or operating Mimir:

  1. Start monolithic for clusters with fewer than 1 million active series. Move to microservices when scaling demands it.
  2. Use memberlist for the hash ring instead of Consul or etcd. It eliminates an external dependency and works well up to dozens of ring members.
  3. Set per-tenant limits from day one. A single team sending high-cardinality metrics can degrade the entire cluster.
  4. Monitor ingester memory closely. An OOM-killed ingester must replay its WAL on restart, which is slow, and any samples not yet written to the WAL on that replica are lost. Set memory requests conservatively with headroom.
  5. Deploy Memcached for index and chunks caching. Without caching, historical queries are unacceptably slow.
  6. Tune remote_write on the Prometheus side. The default settings are too conservative for most production workloads. Increase max_shards and capacity if samples are pending or dropping.
  7. Configure retention in both Mimir (compactor) and object storage (lifecycle rules) as defense in depth.
  8. Test failover by killing an ingester and verifying that queries still return correct data (due to replication) and that the replacement ingester catches up via WAL replay.
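
A sketch of that failover test, assuming the mimir-distributed Helm chart's default pod names and a replication factor of 3 (so one lost replica should not affect query results):

# Kill one ingester and confirm the cluster absorbs it
kubectl -n mimir delete pod mimir-ingester-2

# Re-run a dashboard query covering the last two hours and compare results,
# then watch the replacement pod replay its WAL and rejoin the ring
kubectl -n mimir logs -f mimir-ingester-2 | grep -i wal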