MongoDB Indexing Strategies

Introduction to MongoDB Indexing

Indexing is one of the most critical aspects of MongoDB performance optimization. A well-designed indexing strategy can transform slow queries into lightning-fast operations, while poor indexing can cripple even the most powerful hardware. MongoDB indexes are data structures that improve the speed of data retrieval operations on a collection at the cost of additional storage space and write performance overhead.

Think of indexes like the index in a book - instead of reading every page to find information about a specific topic, you can use the index to jump directly to the relevant pages. Similarly, MongoDB indexes allow the database to quickly locate documents without scanning every document in a collection.

Key Benefits of Proper Indexing

Query Performance: Dramatically reduce query execution time from seconds to milliseconds
Memory Efficiency: Keep frequently accessed data in memory
Scalability: Maintain performance as data volume grows
Resource Optimization: Reduce CPU and I/O usage
User Experience: Ensure responsive application performance

Understanding Index Fundamentals

How MongoDB Indexes Work

MongoDB indexes use B-tree data structures that maintain sorted pointers to documents. When a query is executed, MongoDB’s query optimizer determines whether an index can be used to satisfy the query efficiently.

Index Structure:
```
// Collection document
{ _id: ObjectId("..."), name: "Alice", age: 30, city: "New York" }

// Index on 'name' field creates a sorted structure:
// Index Entry -> Document Location
// "Alice"     -> Document 1
// "Bob"       -> Document 3
// "Charlie"   -> Document 2
```
Query Execution Without Index:
- Collection scan: O(n) complexity
- Examines every document in collection
- Performance degrades linearly with data size
Query Execution With Index:
- Index scan: O(log n) complexity
- Directly locates relevant documents
- Consistent performance regardless of collection size
Index Types Overview

MongoDB supports various index types, each optimized for specific use cases:

Primary Index Types:
- Single Field: Index on one field
- Compound: Index on multiple fields
- Multikey: Automatically handles array values
- Text: Full-text search capabilities
- Geospatial: Location-based queries
- Hashed: Evenly distributed shard keys
Specialized Index Types:
- Partial: Index subset of documents
- Sparse: Skip documents missing indexed fields
- TTL: Automatically expire documents
- Unique: Enforce uniqueness constraints

Core Indexing Strategies

Single Field Indexes

Single field indexes are the foundation of MongoDB indexing, created on individual document fields to optimize queries and sorting operations.

Creating Single Field Indexes:

// Create ascending index on email field
db.users.createIndex({ email: 1 });

// Create descending index on timestamp
db.events.createIndex({ timestamp: -1 });

// Create index with options
db.users.createIndex(
  { username: 1 },
  {
    unique: true,
    name: "unique_username_idx",
    background: true,
  }
);

Optimized Query Patterns:

// Efficient queries using single field indexes

// Equality queries
db.users.find({ email: "alice@example.com" });

// Range queries
db.events.find({ timestamp: { $gte: ISODate("2023-01-01") } });

// Sorting operations
db.products.find().sort({ price: -1 });

// Prefix matching with regular expressions
db.users.find({ username: /^alice/ });

Performance Considerations:

Index selectivity: High cardinality fields perform better
Query patterns: Match index sort order for optimal performance
Write overhead: Each index adds overhead to insert/update operations

Compound Indexes

Compound indexes span multiple fields and are essential for optimizing complex queries. The order of fields in compound indexes significantly impacts performance.

Index Prefix Rule: A compound index can support queries on any prefix of its fields:

// Create compound index
db.users.createIndex({ status: 1, age: 1, city: 1 });

// Supported query patterns (index prefixes):
// 1. { status: ... }
// 2. { status: ..., age: ... }
// 3. { status: ..., age: ..., city: ... }

// NOT efficiently supported:
// { age: ... }           - skips first field
// { city: ... }          - skips first two fields
// { age: ..., city: ... } - skips first field

Field Ordering Strategies:

ESR Rule (Equality, Sort, Range):

Equality: Fields used in equality conditions first
Sort: Fields used for sorting second
Range: Fields used in range queries last

// Query: Find active users in NYC, aged 25-35, sorted by registration date
db.users
  .find({
    status: "active", // Equality
    city: "NYC", // Equality
    age: { $gte: 25, $lte: 35 }, // Range
  })
  .sort({ registrationDate: -1 }); // Sort

// Optimal compound index:
db.users.createIndex({
  status: 1, // Equality
  city: 1, // Equality
  registrationDate: -1, // Sort
  age: 1, // Range
});

Advanced Compound Index Examples:

// E-commerce product catalog
db.products.createIndex({
  category: 1, // Filter by category
  featured: 1, // Filter by featured status
  price: 1, // Sort/filter by price
  rating: -1, // Sort by rating (descending)
});

// Time-series data
db.metrics.createIndex({
  deviceId: 1, // Partition by device
  timestamp: -1, // Sort by time (newest first)
  metricType: 1, // Filter by metric type
});

// User activity logs
db.logs.createIndex({
  userId: 1, // Filter by user
  action: 1, // Filter by action type
  timestamp: -1, // Sort by time
});

Text Indexes

Text indexes enable full-text search capabilities across string fields, supporting language-specific stemming, case-insensitive search, and relevance scoring.

Creating Text Indexes:

// Single field text index
db.articles.createIndex({ content: "text" });

// Multiple field text index
db.articles.createIndex({
  title: "text",
  content: "text",
  tags: "text",
});

// Weighted text index (title is more important)
db.articles.createIndex(
  {
    title: "text",
    content: "text",
  },
  {
    weights: {
      title: 10,
      content: 1,
    },
    name: "article_text_idx",
  }
);

Text Search Queries:

// Basic text search
db.articles.find({ $text: { $search: "mongodb indexing" } });

// Phrase search
db.articles.find({ $text: { $search: '"database optimization"' } });

// Exclude terms
db.articles.find({ $text: { $search: "mongodb -sql" } });

// Search with score (relevance ranking)
db.articles
  .find(
    { $text: { $search: "mongodb performance" } },
    { score: { $meta: "textScore" } }
  )
  .sort({ score: { $meta: "textScore" } });

// Language-specific search
db.articles.find({ $text: { $search: "base de données", $language: "fr" } });

Text Index Limitations:

Only one text index per collection
Cannot be used with sort operations (except textScore)
No support for partial word matching
Limited language support for stemming

Geospatial Indexes

MongoDB provides specialized indexes for location-based queries, supporting both 2D and spherical (Earth-like) coordinate systems.

2dsphere Indexes (Recommended):

// Create 2dsphere index for GeoJSON data
db.places.createIndex({ location: "2dsphere" });

// Sample GeoJSON document
db.places.insertOne({
  name: "Central Park",
  location: {
    type: "Point",
    coordinates: [-73.9665, 40.7812], // [longitude, latitude]
  },
});

Geospatial Query Examples:

// Find nearby locations (within radius)
db.places.find({
  location: {
    $near: {
      $geometry: {
        type: "Point",
        coordinates: [-73.9857, 40.7484], // Times Square
      },
      $maxDistance: 1000, // 1000 meters
    },
  },
});

// Find locations within polygon
db.places.find({
  location: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [
          [
            [-74.0, 40.7],
            [-74.0, 40.8],
            [-73.9, 40.8],
            [-73.9, 40.7],
            [-74.0, 40.7],
          ],
        ],
      },
    },
  },
});

// Find intersecting areas
db.regions.find({
  boundaries: {
    $geoIntersects: {
      $geometry: {
        type: "Point",
        coordinates: [-73.9857, 40.7484],
      },
    },
  },
});

Partial and Sparse Indexes

These specialized indexes reduce storage requirements and improve performance by indexing only specific documents.

Partial Indexes: Index only documents matching specified criteria:

// Index only active users
db.users.createIndex(
  { email: 1 },
  {
    partialFilterExpression: {
      status: "active",
    },
  }
);

// Index only high-value orders
db.orders.createIndex(
  { customerId: 1, orderDate: -1 },
  {
    partialFilterExpression: {
      totalAmount: { $gt: 100 },
    },
  }
);

// Index only recent documents
db.logs.createIndex(
  { userId: 1, action: 1 },
  {
    partialFilterExpression: {
      createdAt: {
        $gte: new Date("2023-01-01"),
      },
    },
  }
);

Sparse Indexes: Skip documents that don’t contain the indexed field:

// Index only documents with phone number
db.users.createIndex({ phoneNumber: 1 }, { sparse: true });

// Compound sparse index
db.products.createIndex(
  { category: 1, discount: 1 },
  { sparse: true } // Skip products without discount
);

TTL (Time To Live) Indexes

TTL indexes automatically remove documents after a specified time period, perfect for managing temporary or time-sensitive data.

Creating TTL Indexes:

// Expire documents 30 days after creation
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 2592000 } // 30 days in seconds
);

// Expire at specific date
db.notifications.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 });

// Sample documents
db.sessions.insertOne({
  userId: "user123",
  sessionToken: "abc123",
  createdAt: new Date(), // Will expire in 30 days
});

db.notifications.insertOne({
  message: "Special offer!",
  expireAt: new Date("2023-12-31T23:59:59Z"),
});

TTL Index Considerations:

Only works on Date fields or arrays containing Dates
MongoDB runs a background task every 60 seconds to remove expired documents
Cannot be compound indexes
Useful for sessions, logs, cache data, and temporary collections

Index Performance Analysis

Query Execution Plans

Understanding execution plans is crucial for optimizing query performance and validating index effectiveness.

Using explain() Method:

// Basic execution plan
db.users.find({ email: "alice@example.com" }).explain();

// Detailed execution plan
db.users
  .find({ status: "active", age: { $gte: 25 } })
  .explain("executionStats");

// All available plans comparison
db.users.find({ city: "NYC" }).explain("allPlansExecution");

Key Metrics to Analyze:

// Example execution plan output
{
  "executionStats": {
    "totalDocsExamined": 1,      // Documents scanned
    "totalKeysExamined": 1,      // Index keys scanned
    "executionTimeMillis": 0,    // Query execution time
    "indexesUsed": ["email_1"],  // Which indexes were used
    "stage": "IXSCAN",          // Index scan (good)
    // vs "COLLSCAN"             // Collection scan (bad)
  }
}

Performance Indicators:

Good Performance: totalKeysExamined ≈ totalDocsExamined ≈ documents returned
Poor Performance: High ratio of examined to returned documents
Index Usage: Stage should be IXSCAN not COLLSCAN

Index Usage Monitoring

Regular monitoring helps identify unused indexes and optimization opportunities.

Index Usage Statistics:

// View index usage statistics
db.users.aggregate([{ $indexStats: {} }]);

// Find unused indexes
db.runCommand({ collStats: "users", indexDetails: true });

// Monitor slow operations
db.setProfilingLevel(2, { slowms: 100 });
db.system.profile.find().limit(5).sort({ ts: -1 }).pretty();

Index Maintenance Commands:

// List all indexes
db.users.getIndexes();

// Get index sizes
db.users.stats().indexSizes;

// Rebuild indexes
db.users.reIndex();

// Drop unused index
db.users.dropIndex("unused_field_1");

Advanced Indexing Strategies

Index Intersection

MongoDB can use multiple indexes simultaneously to optimize complex queries through index intersection.

// Create separate indexes
db.products.createIndex({ category: 1 });
db.products.createIndex({ price: 1 });
db.products.createIndex({ rating: 1 });

// MongoDB can intersect indexes for this query
db.products.find({
  category: "electronics",
  price: { $lt: 500 },
  rating: { $gte: 4.0 },
});

// Check if intersection is used
db.products
  .find({
    category: "electronics",
    price: { $lt: 500 },
    rating: { $gte: 4.0 },
  })
  .explain("executionStats");

Index Hints

Force MongoDB to use specific indexes when the query optimizer doesn’t choose optimally.

// Force use of specific index
db.users.find({ name: "Alice", age: 30 }).hint({ name: 1 });

// Force collection scan (disable indexes)
db.users.find({ status: "active" }).hint({ $natural: 1 });

// Use index by name
db.products.find({ category: "electronics" }).hint("category_price_idx");

Collation and Indexes

Handle locale-specific sorting and comparison rules with collation-aware indexes.

// Create index with collation
db.users.createIndex(
  { name: 1 },
  {
    collation: {
      locale: "en",
      strength: 2, // Case insensitive
    },
  }
);

// Query must use same collation
db.users.find({ name: "Alice" }).collation({ locale: "en", strength: 2 });

Indexing Best Practices

Index Design Principles

Selectivity First:

Index highly selective fields first
High cardinality = better performance
Avoid indexing boolean fields alone

Query Pattern Analysis:

// Analyze actual query patterns
db.setProfilingLevel(2, { slowms: 0 });

// After collecting data, analyze common patterns
db.system.profile.aggregate([
  { $group: { _id: "$command.filter", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
]);

ESR Rule Application:

// Bad: Range field first
db.orders.createIndex({ amount: 1, status: 1, customerId: 1 });

// Good: Equality fields first, range last
db.orders.createIndex({ status: 1, customerId: 1, amount: 1 });

Index Maintenance

Regular Index Review:

Monitor index usage monthly
Remove unused indexes
Update indexes based on new query patterns

Background Index Creation:

// Create indexes in background (production safe)
db.users.createIndex({ email: 1 }, { background: true });

Index Size Management:

// Monitor index sizes
db.users.stats().indexSizes;

// Total collection size vs index size ratio
db.users.stats().size / db.users.stats().totalIndexSize;

Sharding Considerations

Shard Key Indexes:

// Shard key must be indexed
sh.shardCollection("mydb.users", { userId: "hashed" });

// Compound shard key
sh.shardCollection("mydb.orders", { customerId: 1, orderDate: 1 });

Cross-Shard Query Optimization:

Include shard key in queries when possible
Use targeted queries to avoid scatter-gather operations

Common Indexing Pitfalls

Over-Indexing

Problems:

Increased storage requirements
Slower write operations
Memory overhead

Solutions:

// Instead of multiple single-field indexes:
db.users.createIndex({ firstName: 1 });
db.users.createIndex({ lastName: 1 });
db.users.createIndex({ email: 1 });

// Use compound index for common query patterns:
db.users.createIndex({ email: 1, firstName: 1, lastName: 1 });

Incorrect Field Order

Wrong Order:

// Query: Find active users in specific city
db.users.find({ status: "active", city: "NYC" });

// Bad index (low selectivity first)
db.users.createIndex({ status: 1, city: 1 });

Correct Order:

// Good index (high selectivity first)
db.users.createIndex({ city: 1, status: 1 });

Ignoring Sort Operations

Inefficient:

// Query with sort
db.products.find({ category: "electronics" }).sort({ price: -1 });

// Index doesn't support sort efficiently
db.products.createIndex({ category: 1 });

Efficient:

// Index supports both filter and sort
db.products.createIndex({ category: 1, price: -1 });

Real-World Indexing Examples

E-commerce Platform

// Product catalog
db.products.createIndex({
  category: 1, // Filter by category
  subcategory: 1, // Filter by subcategory
  price: 1, // Sort/filter by price
  rating: -1, // Sort by rating (descending)
});

// User orders
db.orders.createIndex({
  customerId: 1, // Customer lookup
  status: 1, // Filter by status
  createdAt: -1, // Sort by date (newest first)
});

// Inventory tracking
db.inventory.createIndex({
  sku: 1, // Unique product identifier
  warehouse: 1, // Filter by location
  quantity: 1, // Filter by availability
});

// User posts timeline
db.posts.createIndex({
  authorId: 1, // Filter by author
  visibility: 1, // Public/private filter
  createdAt: -1, // Chronological order
});

// Friend connections
db.connections.createIndex({
  userId: 1, // User lookup
  status: 1, // Connection status
  connectedAt: -1, // Connection date
});

// Content search
db.posts.createIndex(
  {
    content: "text",
    tags: "text",
  },
  {
    weights: { content: 1, tags: 5 },
  }
);

IoT Data Platform

// Sensor readings
db.readings.createIndex({
  deviceId: 1, // Partition by device
  timestamp: -1, // Time-based queries
  sensorType: 1, // Filter by sensor
});

// Device management
db.devices.createIndex({
  location: "2dsphere", // Geospatial queries
});

// TTL for old data
db.readings.createIndex(
  { timestamp: 1 },
  { expireAfterSeconds: 7776000 } // 90 days
);

Performance Testing and Optimization

Benchmarking Strategies

Create Test Data:

// Generate realistic test data
for (let i = 0; i < 1000000; i++) {
  db.users.insertOne({
    _id: i,
    email: `user${i}@example.com`,
    status: Math.random() > 0.8 ? "inactive" : "active",
    city: cities[Math.floor(Math.random() * cities.length)],
    age: Math.floor(Math.random() * 60) + 18,
    createdAt: new Date(
      Date.now() - Math.random() * 365 * 24 * 60 * 60 * 1000
    ),
  });
}

Test Query Performance:

// Measure query performance
let startTime = new Date();
db.users.find({ city: "NYC", status: "active" }).count();
let endTime = new Date();
print("Query time: " + (endTime - startTime) + "ms");

A/B Test Index Strategies: Compare different indexing approaches using identical queries and measure the performance differences.

Conclusion

Effective MongoDB indexing is both an art and a science. It requires understanding your data patterns, query requirements, and performance goals. The key to success lies in:

Analyzing Query Patterns: Understanding how your application queries data
Strategic Index Design: Creating indexes that support multiple query patterns
Continuous Monitoring: Regularly reviewing index performance and usage
Iterative Improvement: Refining indexes as application requirements evolve

Remember that indexing is about finding the right balance between query performance, storage overhead, and write performance. Start with essential indexes for your most critical queries, then expand strategically based on actual usage patterns and performance requirements.

MongoDB Indexing Strategies

Introduction to MongoDB Indexing

Key Benefits of Proper Indexing

Understanding Index Fundamentals

How MongoDB Indexes Work

Index Types Overview

Core Indexing Strategies

Single Field Indexes

Compound Indexes

Text Indexes

Geospatial Indexes

Partial and Sparse Indexes

TTL (Time To Live) Indexes

Index Performance Analysis

Query Execution Plans

Index Usage Monitoring

Advanced Indexing Strategies

Index Intersection

Index Hints

Collation and Indexes

Indexing Best Practices

Index Design Principles

Index Maintenance

Sharding Considerations

Common Indexing Pitfalls

Over-Indexing

Incorrect Field Order

Ignoring Sort Operations

Real-World Indexing Examples

E-commerce Platform

IoT Data Platform

Performance Testing and Optimization

Benchmarking Strategies

Conclusion

Additional Resources

MongoDB Documentation

Tools and Utilities

Advanced Topics

MongoDB Indexing Strategies

Introduction to MongoDB Indexing

Key Benefits of Proper Indexing

Understanding Index Fundamentals

How MongoDB Indexes Work

Index Types Overview

Core Indexing Strategies

Single Field Indexes

Compound Indexes

Text Indexes

Geospatial Indexes

Partial and Sparse Indexes

TTL (Time To Live) Indexes

Index Performance Analysis

Query Execution Plans

Index Usage Monitoring

Advanced Indexing Strategies

Index Intersection

Index Hints

Collation and Indexes

Indexing Best Practices

Index Design Principles

Index Maintenance

Sharding Considerations

Common Indexing Pitfalls

Over-Indexing

Incorrect Field Order

Ignoring Sort Operations

Real-World Indexing Examples

E-commerce Platform

Social Media Application

IoT Data Platform

Performance Testing and Optimization

Benchmarking Strategies

Conclusion

Additional Resources

MongoDB Documentation

Tools and Utilities

Advanced Topics