MongoDB Operational Patterns#

MongoDB operations center on three areas: keeping the cluster healthy (replica sets and sharding), protecting data (backups), and keeping queries fast (indexes and explain plans). This reference covers the practical commands and patterns for each.

Replica Set Setup#

A replica set is the minimum production deployment – three data-bearing members that elect a primary and maintain identical copies of the data.

Launching Members#

Each member runs mongod with the same --replSet name:

# Node 1
mongod --replSet rs0 --dbpath /data/rs0-1 --port 27017 --bind_ip 0.0.0.0

# Node 2
mongod --replSet rs0 --dbpath /data/rs0-2 --port 27018 --bind_ip 0.0.0.0

# Node 3
mongod --replSet rs0 --dbpath /data/rs0-3 --port 27019 --bind_ip 0.0.0.0

Initiating the Replica Set#

Connect to one member and initiate:

// mongosh
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017" },
    { _id: 1, host: "mongo2:27018" },
    { _id: 2, host: "mongo3:27019" }
  ]
})

Check status with rs.status(). The stateStr field on each member shows PRIMARY, SECONDARY, or a problem state such as RECOVERING.
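
A quick mongosh one-liner to summarize each member's state:

// One line per member: name, state, health
rs.status().members.map(m => ({ name: m.name, state: m.stateStr, health: m.health }))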

Priority and Hidden Members#

Control election behavior and read routing:

// Make a member ineligible for primary (reporting/analytics replica)
cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].hidden = true
rs.reconfig(cfg)

Hidden members do not receive client reads via connection strings but still replicate data. Useful for dedicated backup or analytics nodes.
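
To run analytics or backup jobs against a hidden member, connect to it directly – replica-set connection strings will not route reads to it. The hostname below assumes the third member from the earlier config:

# Direct connection to the hidden member (hostname assumed)
mongosh "mongodb://mongo3:27019/?directConnection=true"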

Arbiter Nodes#

An arbiter votes in elections but holds no data. Use one only when you cannot afford a third data-bearing member:

rs.addArb("arbiter-host:27020")

Arbiters introduce operational risk – if one data-bearing member goes down, you have zero redundancy. Prefer three full members.

Sharding Strategies#

Sharding distributes data across multiple replica sets. Each shard holds a portion of the data, and mongos routes queries.

Architecture#

A sharded cluster requires config servers (metadata), shards (data), and mongos routers (query routing):

# Config servers (replica set)
mongod --configsvr --replSet cfgrs --dbpath /data/cfg --port 27019

# Shard servers (each is its own replica set)
mongod --shardsvr --replSet shard1rs --dbpath /data/shard1 --port 27018

# mongos router
mongos --configdb cfgrs/cfg1:27019,cfg2:27019,cfg3:27019 --port 27017
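
Once each replica set is initiated, register the shards and enable sharding for a database from a mongosh session on the router. The shard member hostnames below are assumptions for illustration:

// Run against the mongos router (shard hostnames assumed)
sh.addShard("shard1rs/shard1a:27018,shard1b:27018,shard1c:27018")
sh.enableSharding("mydb")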

Choosing a Shard Key#

The shard key determines data distribution. Bad shard key choices cause hotspots that cannot be fixed without migrating data.

Hashed shard key – even distribution, but range queries scan all shards:

sh.shardCollection("mydb.orders", { _id: "hashed" })

Range shard key – supports range queries on the shard key, but monotonically increasing values (timestamps, ObjectIds) create a hot shard:

sh.shardCollection("mydb.events", { tenant_id: 1, created_at: 1 })

Compound shard key – combine a high-cardinality field with a range field. tenant_id distributes documents across shards, timestamp supports time-range queries within each tenant:

sh.shardCollection("mydb.logs", { tenant_id: 1, timestamp: 1 })

Check chunk distribution:

sh.status()
db.orders.getShardDistribution()

If one shard holds significantly more chunks, the balancer should migrate them. If it does not, check sh.isBalancerRunning() and the balancer window settings.
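
A quick way to check both from a mongosh session on a mongos (the config.settings document holds the balancer window, if one has been configured):

// Is a balancing round in progress right now?
sh.isBalancerRunning()

// Is the balancer enabled at all?
sh.getBalancerState()

// Balancer settings document -- activeWindow appears here if a window is configured
db.getSiblingDB("config").settings.findOne({ _id: "balancer" })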

Backup with mongodump and mongorestore#

Full Backup#

# Backup all databases
mongodump --host rs0/mongo1:27017,mongo2:27018,mongo3:27019 \
  --readPreference secondaryPreferred \
  --out /backup/$(date +%Y%m%d)

# Backup a single database
mongodump --db mydb --out /backup/mydb-$(date +%Y%m%d)

# Compressed backup (significantly smaller)
mongodump --db mydb --gzip --out /backup/mydb-$(date +%Y%m%d)

The --readPreference secondaryPreferred flag reads from a secondary to avoid loading the primary. For sharded clusters, connect mongodump to a mongos router.
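
For example, a compressed dump of a sharded cluster taken through a mongos (router hostname assumed):

# Dump a sharded cluster through a mongos router (hostname assumed)
mongodump --host mongos1:27017 --gzip --out /backup/sharded-$(date +%Y%m%d)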

Point-in-Time Backup#

# Capture oplog for point-in-time recovery
mongodump --oplog --out /backup/full-$(date +%Y%m%d)

The --oplog flag includes oplog entries captured during the dump, enabling consistent restores even while writes continue.

Restore#

# Restore all databases
mongorestore /backup/20260222/

# Restore a single database
mongorestore --db mydb /backup/mydb-20260222/mydb/

# Restore with oplog replay
mongorestore --oplogReplay /backup/full-20260222/

# Drop existing collections before restoring
mongorestore --drop --db mydb /backup/mydb-20260222/mydb/

Always test restores on a non-production instance. A backup you have never restored is not a backup.
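
A minimal verification sketch, assuming a scratch mongod listening on port 27117 and the single-database dump from above:

# Restore into a scratch instance (port assumed)
mongorestore --port 27117 --db mydb /backup/mydb-20260222/mydb/

# Spot-check a collection count against production
mongosh --port 27117 --quiet --eval 'db.getSiblingDB("mydb").orders.countDocuments({})'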

Index Management#

Creating Indexes#

// Single field index
db.orders.createIndex({ customer_id: 1 })

// Compound index (field order matters for query optimization)
db.orders.createIndex({ status: 1, created_at: -1 })

// Unique index
db.users.createIndex({ email: 1 }, { unique: true })

// TTL index (auto-delete documents after 30 days)
db.sessions.createIndex({ created_at: 1 }, { expireAfterSeconds: 2592000 })

// Partial index (index only documents matching a filter)
db.orders.createIndex(
  { status: 1, total: 1 },
  { partialFilterExpression: { status: "pending" } }
)

// Background index build -- the background option is ignored in 4.2+ (all builds
// use the optimized hybrid method) and only matters on older versions
db.orders.createIndex({ region: 1 }, { background: true })

Reviewing Index Usage#

// List all indexes
db.orders.getIndexes()

// Index usage statistics -- identifies unused indexes
db.orders.aggregate([{ $indexStats: {} }])

Unused indexes slow writes and consume storage. If $indexStats shows an index with zero accesses.ops over a meaningful time window, drop it:

db.orders.dropIndex("region_1")

Index Build Impact#

In MongoDB 4.2+, index builds use an optimized process that holds an exclusive lock only at the beginning and end. For large collections, building indexes during off-peak hours is still recommended. Monitor build progress:

db.currentOp({ "msg": /Index Build/ })

Query Optimization with explain()#

Reading Explain Output#

db.orders.find({ customer_id: "abc123", status: "shipped" })
  .sort({ created_at: -1 })
  .explain("executionStats")

Key fields in the output:

  • winningPlan.stage: IXSCAN (index scan, good), COLLSCAN (collection scan, bad), SORT (in-memory sort, expensive).
  • executionStats.totalKeysExamined: Number of index entries scanned. Should be close to nReturned.
  • executionStats.totalDocsExamined: Number of documents fetched. If much larger than nReturned, the index is not selective enough.
  • executionStats.executionTimeMillis: Actual query execution time.
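
A small mongosh sketch that pulls just these headline numbers from the query above:

// Extract the headline executionStats fields
var e = db.orders.find({ customer_id: "abc123", status: "shipped" })
  .sort({ created_at: -1 })
  .explain("executionStats").executionStats;
printjson({
  nReturned: e.nReturned,
  totalKeysExamined: e.totalKeysExamined,
  totalDocsExamined: e.totalDocsExamined,
  executionTimeMillis: e.executionTimeMillis
});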

Optimization Patterns#

Collection scan to index scan:

// Before: COLLSCAN, examines 5 million documents
db.orders.find({ customer_id: "abc123" }).explain("executionStats")

// Fix: add an index
db.orders.createIndex({ customer_id: 1 })

// After: IXSCAN, examines 47 documents

Covered query – when the index contains all fields the query needs, MongoDB returns results directly from the index without fetching documents:

// Index covers the query entirely
db.orders.createIndex({ customer_id: 1, status: 1, total: 1 })
db.orders.find(
  { customer_id: "abc123" },
  { status: 1, total: 1, _id: 0 }
).explain("executionStats")
// Look for: "totalDocsExamined": 0

Sort optimization – if the sort field is included in the index, MongoDB avoids an in-memory sort:

// Index supports both filter and sort
db.orders.createIndex({ customer_id: 1, created_at: -1 })
db.orders.find({ customer_id: "abc123" }).sort({ created_at: -1 })
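// Look for: no SORT stage in winningPlan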

Monitoring with mongostat and mongotop#

mongostat#

Real-time server statistics, similar to vmstat:

# Poll every 2 seconds until interrupted (--rowcount 0 removes the row limit)
mongostat --host mongo1:27017 --rowcount 0 2

Key columns: insert, query, update, delete (operations per second), getmore (cursor batches), command (admin commands), dirty and used (WiredTiger cache percentages), qrw and arw (queue lengths – read/write operations queued and active).

Watch for: dirty consistently above 5% (writes outpacing cache eviction), qrw growing (operations queuing), conn approaching maxIncomingConnections.

mongotop#

Shows time spent reading and writing per collection:

# Refresh every 10 seconds
mongotop --host mongo1:27017 10

The output shows which collections consume the most I/O. If a collection you do not expect dominates, investigate – it could be a missing index causing full collection scans, or a background process writing heavily.

Server Status Commands#

// Comprehensive server stats
db.serverStatus()

// Connection pool
db.serverStatus().connections
// { current: 45, available: 51155, totalCreated: 12847 }

// WiredTiger cache
db.serverStatus().wiredTiger.cache
// Check "bytes currently in the cache" vs "maximum bytes configured"

// Replication lag
rs.printReplicationInfo()
rs.printSecondaryReplicationInfo()

// Current operations (find long-running queries)
db.currentOp({ "secs_running": { $gte: 5 } })

// Kill a long-running operation
db.killOp(12345)

Operational Checklist#

When taking over an existing MongoDB deployment, run through these checks:

  1. Replica set health: rs.status() – all members should be PRIMARY or SECONDARY, not RECOVERING or unreachable.
  2. Replication lag: rs.printSecondaryReplicationInfo() – lag should be under a few seconds (see the lag sketch after this list).
  3. Index coverage: Run explain() on the top 10 queries. Any COLLSCAN on a large collection needs an index.
  4. Unused indexes: $indexStats across all collections. Drop indexes with zero operations.
  5. Backup verification: Restore the latest backup to a test instance and verify data integrity.
  6. Disk space: Check data directory usage and oplog size. The oplog should cover at least 24 hours of operations.
  7. Connection count: Compare db.serverStatus().connections.current against the configured limit.
  8. WiredTiger cache: If dirty percentage stays above 5% in mongostat, the cache is undersized or writes are too heavy.
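
For items 1 and 2, a small mongosh sketch that derives per-secondary lag from rs.status():

// Sketch: per-secondary replication lag in seconds (assumes a reachable primary)
var s = rs.status();
var primary = s.members.find(m => m.stateStr === "PRIMARY");
s.members
  .filter(m => m.stateStr === "SECONDARY")
  .forEach(m => print(m.name + " lag: " + (primary.optimeDate - m.optimeDate) / 1000 + "s"));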