How to Find Documents by _id in MongoDB: An In-Depth Guide

As a MongoDB developer, you need to be able to query and retrieve documents efficiently using the unique _id field. In this guide, we'll explore how MongoDB generates _id values, discuss the technical details of ObjectId, learn various techniques for querying and sorting by _id, look at real-world use cases, and review best practices for using _id in your applications.

When you insert a document into MongoDB without an explicit _id field, one is added automatically with a special ObjectId value. This autogeneration makes using _id for lookups easy: you don't have to worry about generating unique IDs yourself in your application code. Let's start by understanding what these ObjectId values contain.

Anatomy of a MongoDB ObjectId

A MongoDB ObjectId contains:

A 4 byte timestamp value – seconds since the Unix epoch, which lets ObjectIds be sorted roughly chronologically.
A 5 byte random value – generated once per process, it keeps ids created by different machines and processes from colliding.
A 3 byte incrementing counter – initialized to a random value, it distinguishes ObjectIds created by the same process within the same second.

Older versions of the ObjectId format split the 5 byte middle field into a 3 byte machine identifier (typically derived from the hostname) and a 2 byte process id. Current drivers use a random value instead.

When combined, these fields create a 96-bit (12-byte) value that is, for practical purposes, unique across MongoDB clusters, processes, machines, and time.

Here's an example ObjectId broken down into its component parts:

ObjectId("63f961d268efea2690e2ab01")

63f961d2   # Timestamp - 2023-02-25T01:18:10Z
68efea2690 # Random value
e2ab01     # Counter

As you can see, the timestamp allows ObjectIds to be chronologically sorted by creation time. The additional fields ensure uniqueness.
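The breakdown can be reproduced in a few lines of JavaScript. This is a minimal sketch that assumes the 4/5/3 byte layout (timestamp, random value, counter); `parseObjectIdHex` is a hypothetical helper written for this guide, not a driver function.

```javascript
// Split a 24-character ObjectId hex string into its three fields.
// parseObjectIdHex is a hypothetical helper, not part of any MongoDB driver.
function parseObjectIdHex(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new TypeError("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // 4-byte timestamp
  return {
    createdAt: new Date(seconds * 1000),      // seconds since the Unix epoch
    random: hex.slice(8, 18).toLowerCase(),   // 5-byte per-process random value
    counter: parseInt(hex.slice(18, 24), 16), // 3-byte incrementing counter
  };
}

const parts = parseObjectIdHex("63f961d268efea2690e2ab01");
console.log(parts.createdAt.toISOString()); // 2023-02-25T01:18:10.000Z
console.log(parts.random);                  // 68efea2690
console.log(parts.counter);                 // 14854913
```

In practice you rarely need to do this by hand: ObjectId values in the shell and in the drivers expose a getTimestamp() method that returns the creation time.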

Now let's look at how many ObjectIds can be generated based on the sizing of these fields:

Timestamp – 4 bytes covers about 136 years of one-second values when read as an unsigned integer
Random value – 5 bytes = 1,099,511,627,776 possible values per process
Counter – 3 bytes = 16,777,216 unique counter values

With a 3 byte counter, a single process can generate up to 16,777,216 distinct ObjectIds in any one second before the counter wraps. Combined with timestamps for chronological ordering, ObjectIds are well suited to high volume inserts and queries by _id.
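The arithmetic behind the field sizes is easy to check. The sketch below assumes the current layout (4 byte timestamp, 5 byte random value, 3 byte counter) and treats the timestamp as an unsigned integer; the variable names are illustrative only.

```javascript
// Capacity implied by each ObjectId field width (plain arithmetic, no driver needed).
const counterValues = 2 ** 24;    // 3-byte counter
const randomValues = 2 ** 40;     // 5-byte random value
const timestampSeconds = 2 ** 32; // 4-byte timestamp, read as unsigned
const secondsPerYear = 365.25 * 24 * 60 * 60;
const timestampYears = Math.floor(timestampSeconds / secondsPerYear);

console.log(counterValues);  // 16777216
console.log(randomValues);   // 1099511627776
console.log(timestampYears); // 136
```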

Inserting Documents with `_id`

When inserting a document into a MongoDB collection without providing an explicit _id field, the driver adds an _id with a generated ObjectId (the server adds one if the driver did not):

> db.users.insertOne({name: "John"})

> db.users.findOne()
{
  _id: ObjectId("63f9620e3fed7d5c0ac505e4"),
  name: "John"
}

As you can see, MongoDB added a 96-bit ObjectId as the _id.

You can also specify a custom _id when inserting:

> db.users.insertOne({
  _id: "user1",
  name: "Jane"
})

> db.users.findOne({_id: "user1"})   
{
  _id: "user1",
  name: "Jane"
}

Here we inserted a document with _id: "user1", a string id instead of ObjectId.

Some benefits of custom ids:

Readable, application-specific ids
Control over id generation logic
Configuration of shard keys
Pre-calculated ids for bulk inserts

Downsides of custom ids:

Must ensure uniqueness yourself
No built-in chronological sorting

In most cases, it's best to leverage the automatic ObjectId generation. But for some use cases like migrating data or integrating with other systems, a custom _id may be preferable.
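For the bulk-insert case, ids can be assigned on the client before the documents are sent. The following is only a sketch of the idea, assuming Node.js and its built-in crypto module; `newObjectIdHex` is a hypothetical helper, and a real application would normally call the driver's own ObjectId constructor instead.

```javascript
const { randomBytes } = require("node:crypto");

// Hypothetical helper: builds ObjectId-shaped hex ids on the client.
// In a real application, prefer the driver's own ObjectId constructor.
const processRandom = randomBytes(5).toString("hex"); // fixed for this process
let counter = randomBytes(3).readUIntBE(0, 3);        // random starting point

function newObjectIdHex(now = new Date()) {
  const seconds = Math.floor(now.getTime() / 1000);
  counter = (counter + 1) % 0x1000000; // wrap at 3 bytes
  return (
    seconds.toString(16).padStart(8, "0") +
    processRandom +
    counter.toString(16).padStart(6, "0")
  );
}

// Assign ids before the batch is sent, so the application knows them up front.
const batch = [{ name: "John" }, { name: "Jane" }].map((doc) => ({
  _id: newObjectIdHex(),
  ...doc,
}));
console.log(batch);
```

Note that these ids are plain strings. To store them as real ObjectId values, each hex string would be passed to the driver's ObjectId constructor before inserting.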

OK, now that we've seen how _id fields are created, let's look at querying by _id.

Finding Documents by `_id` Value

To find a document by its _id value in MongoDB, use the find() method and pass an _id query filter like:

db.users.find({
  _id: ObjectId("63f9620e3fed7d5c0ac505e4") 
})

For ObjectIds, you need to wrap the hex string value in ObjectId() to query correctly.

Because _id values are unique within a collection, this returns at most one document:

{
  _id: ObjectId("63f9620e3fed7d5c0ac505e4"), 
  name: "John"
}

Every collection has a unique index on _id, so the matching document is found through the index rather than by scanning all documents in the collection.

For custom string or integer _id values, you can query directly, without ObjectId:

db.users.find({
   _id: "user1" 
})

find() returns a cursor that lazily yields the matching documents. To get the document itself (or null when nothing matches), use findOne():

db.users.findOne({
  _id: "user1" 
})

This is useful for lookups by _id when you know the query returns a single document.
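Because ObjectId() throws when it is given a malformed string, it can help to check the input before building the query. Here is a minimal sketch; `isObjectIdHex` is a hypothetical helper that accepts only the 24-character hex form used throughout this guide.

```javascript
// Hypothetical helper: true only for 24-character hex strings,
// the ObjectId string form used throughout this guide.
function isObjectIdHex(value) {
  return typeof value === "string" && /^[0-9a-fA-F]{24}$/.test(value);
}

console.log(isObjectIdHex("63f9620e3fed7d5c0ac505e4")); // true
console.log(isObjectIdHex("user1"));                    // false
console.log(isObjectIdHex("63f9620e3fed7d5c0ac505e"));  // false (23 characters)
```

A check like this only applies to collections that use ObjectId values; custom ids such as "user1" are queried as-is.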

Sorting Query Results by `_id`

To sort find results by the _id field, use the sort() method:

// Ascending sort
db.users.find().sort({_id: 1})

// Descending sort
db.users.find().sort({_id: -1})

The _id index can be scanned in either direction, so both sorts are served directly from the index without an in-memory sort.

Sorting by _id is handy for use cases like:

Paginating results by chronological _id order
Replaying inserts in order of document creation time
Migrating documents from oldest to newest _id

This enables large result sets to be sorted by _id efficiently.
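To see why an _id sort is chronological for ObjectIds, note that the timestamp occupies the leading bytes. The sketch below uses the three example ids from this guide as plain hex strings, an assumption made so it runs without a database.

```javascript
// ObjectId hex strings from this guide, deliberately out of order.
const ids = [
  "63f96345ca3e144a085cc753",
  "63f961d268efea2690e2ab01",
  "63f9620e3fed7d5c0ac505e4",
];

// Fixed-width lowercase hex compares the same way as the underlying bytes,
// so a plain string sort puts the oldest id first, like sort({_id: 1}).
const ascending = [...ids].sort();
const createdAt = (hex) => new Date(parseInt(hex.slice(0, 8), 16) * 1000);

for (const id of ascending) {
  console.log(id, createdAt(id).toISOString());
}
```

The ordering is only as fine as one second: ids created in the same second by different processes are ordered by their random bytes, not by insertion order.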

Filtering by `_id` Range

In addition to exact _id match lookups, you can filter by _id ranges using comparison query operators:

// `_id` greater than a value 
db.users.find({
  _id: { $gt: ObjectId("63f9620e3fed7d5c0ac505e4") }
})

// `_id` less than a value
db.users.find({
  _id: { $lt: ObjectId("63f96345ca3e144a085cc753") }  
}) 

// `_id` in between two values
db.users.find({
  _id: { 
    $gt: ObjectId("63f9620e3fed7d5c0ac505e4"),
    $lt: ObjectId("63f96345ca3e144a085cc753")
  }
})

You can also combine _id filters with queries on other fields:

db.users.find({
  age: { $gte: 21 }, 
  _id: { $lt: ObjectId("63f96345ca3e144a085cc753") }
})

The _id range lets the query planner use the _id index to narrow the scan before the age condition is applied.

Some use cases for _id range queries:

Pagination – fetch next page of results by _id
Change data capture – process newly inserted documents incrementally by _id range
Time-series data – query data for time range by _id timestamp

So _id queries give us exact matching, sorting, and range-based filtering out of the box!
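For the time-range use case, the range bounds can be derived from dates, since the timestamp is the first 4 bytes of an ObjectId. This is a sketch under that assumption; `objectIdHexFromDate` is a hypothetical helper, not a driver function.

```javascript
// Hypothetical helper: the smallest ObjectId hex value for a given instant.
// The timestamp fills the first 4 bytes; the remaining 8 bytes are zeroed.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

const from = objectIdHexFromDate(new Date("2023-02-25T00:00:00Z"));
const to = objectIdHexFromDate(new Date("2023-02-26T00:00:00Z"));

// In the shell, these bounds would be wrapped before use, for example:
// db.users.find({ _id: { $gte: ObjectId(from), $lt: ObjectId(to) } })
console.log(from); // 63f94f800000000000000000
console.log(to);   // 63faa1000000000000000000
```

The Node.js driver offers ObjectId.createFromTime(seconds) for building a similar boundary value. Either way, this only works when _id values are driver-generated ObjectIds, not custom ids.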

Use Cases for Queries by `_id`

Now that we've seen how to query by _id, let's discuss some common use cases:

Web and Mobile Apps

In web and mobile apps, it's very common to pass document _id values in request parameters to fetch data:

// Get user profile
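const { ObjectId } = require('mongodb')

// Assumes an Express `router` and a connected driver `db` handle
router.get('/users/:userId', async (req, res) => {
  const { userId } = req.params
  // A plain string never matches an ObjectId _id, so validate and convert it
  if (!ObjectId.isValid(userId)) return res.status(400).json({ error: 'invalid id' })

  const user = await db.collection('users').findOne({ _id: new ObjectId(userId) })
  if (!user) return res.status(404).json({ error: 'not found' })

  res.json(user)
})

Here the UI passes the userId to the server, which converts it to an ObjectId and fetches the document directly by _id.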

Background Jobs

Background jobs like analytics computations often need to iterate through all documents in a collection by _id:

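let lastId = null

while (true) {
  // Fetch the next batch in _id order; the first pass has no lower bound
  const filter = lastId ? { _id: { $gt: lastId } } : {}
  const users = db.users.find(filter).sort({ _id: 1 }).limit(100).toArray()

  // Stop once every document has been seen
  if (users.length === 0) break

  // Process users

  // Resume after the last _id in this batch
  lastId = users[users.length - 1]._id
}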

Because each batch resumes from the last _id seen, the job can stop and restart without rescanning documents it has already processed.
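The same batching pattern can be written as a reusable generator. This is a sketch only: `fetchPage` is a hypothetical callback standing in for a find({_id: {$gt: ...}}).sort({_id: 1}).limit(n) query, and an in-memory array plays the role of the collection so the example runs on its own.

```javascript
// Keyset iteration sketch. fetchPage(afterId, limit) is a hypothetical callback
// standing in for find({_id: {$gt: afterId}}).sort({_id: 1}).limit(limit).
function* iterateInBatches(fetchPage, batchSize) {
  let lastId = null;
  while (true) {
    const batch = fetchPage(lastId, batchSize);
    if (batch.length === 0) return;       // nothing left to process
    yield batch;
    lastId = batch[batch.length - 1]._id; // resume after the last id seen
  }
}

// In-memory stand-in for a collection, using integer _id values for brevity.
const docs = [1, 2, 3, 4, 5].map((n) => ({ _id: n }));
const fetchFromArray = (afterId, limit) =>
  docs
    .filter((d) => afterId === null || d._id > afterId)
    .sort((a, b) => a._id - b._id)
    .slice(0, limit);

for (const batch of iterateInBatches(fetchFromArray, 2)) {
  console.log(batch.map((d) => d._id)); // [1, 2], then [3, 4], then [5]
}
```

Keeping the query behind a callback makes the loop logic testable without a database; with a real driver the callback would be asynchronous and the generator would become an async generator.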

Change Streams

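Change streams let an application watch a collection for inserts, updates, and deletes. Events arrive in the order the changes were applied, and each event identifies the affected document by its _id:

// Node.js driver; assumes a connected `db` handle
const changeStream = db.collection('users').watch()

changeStream.on('change', (change) => {
  switch (change.operationType) {
    case 'insert':
      // Process insert; change.documentKey._id is the new document's _id
      break

    case 'update':
      // Process update
      break
  }
})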

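Each change event also carries its own _id, which is a resume token. Storing the token of the last processed event and passing it back through the resumeAfter option lets a consumer pick up where it left off and rebuild collection state incrementally.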
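To make "rebuilding collection state" concrete, here is a sketch that applies change events to an in-memory Map keyed by document _id. The event fields used (operationType, documentKey, fullDocument, updateDescription) follow the change event format; `applyChange` itself and the sample events are hypothetical, and nested (dotted) update paths are deliberately not handled.

```javascript
// Apply one change event to an in-memory copy of a collection.
// Nested (dotted) update paths are not handled in this sketch.
function applyChange(state, change) {
  const key = String(change.documentKey._id);
  switch (change.operationType) {
    case "insert":
      state.set(key, change.fullDocument);
      break;
    case "update": {
      const doc = { ...(state.get(key) || { _id: change.documentKey._id }) };
      const { updatedFields = {}, removedFields = [] } = change.updateDescription;
      Object.assign(doc, updatedFields);
      for (const field of removedFields) delete doc[field];
      state.set(key, doc);
      break;
    }
    case "delete":
      state.delete(key);
      break;
  }
  return state;
}

// Hand-written sample events, shaped like change stream output.
const events = [
  { operationType: "insert", documentKey: { _id: "user1" }, fullDocument: { _id: "user1", name: "Jane" } },
  { operationType: "update", documentKey: { _id: "user1" }, updateDescription: { updatedFields: { age: 30 }, removedFields: [] } },
];
const state = events.reduce(applyChange, new Map());
console.log(state.get("user1")); // { _id: 'user1', name: 'Jane', age: 30 }
```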

There are many other examples like looking up documents across services by _id, migrating databases by _id ranges, deduplication, and more.

Now let's discuss some best practices when using _id for queries.

Best Practices for Queries by `_id`

Here are some tips for working with _id fields efficiently:

Be deliberate about exposing _id values – an ObjectId reveals a document's creation time, so map to application-specific ids in public APIs and UIs where that matters.
Rely on the built-in _id index – it is created automatically with every collection and cannot be dropped, so there is nothing extra to create.
Use findOne() when expecting 1 match – avoid cursors if the result set will only be a single document.
If sorting by _id, let the _id index do the work – results come back in index order, which avoids sorting documents in RAM.
Prefer business fields in aggregations – group on meaningful fields; grouping on the unique _id yields one group per document.
Reference documents by _id – keep the _id value on entities returned to clients so they can be fetched again directly.
Choose the shard key carefully – ObjectIds increase over time, so ranged sharding on _id sends new inserts to a single shard; a hashed _id or another key usually distributes writes better.

Following these practices helps keep _id queries efficient and predictable.

While _id queries are extremely useful in MongoDB, let's briefly discuss some alternatives.

Alternatives to Queries by `_id`

In some cases, queries by _id may not be the best solution:

If your application or external system requires different primary keys than ObjectId.
When needing sequential, monotonically increasing keys like auto-incrementing integers.
If documents need to be identifiable by multiple primary keys.
When sharding on _id alone is insufficient.

For these use cases, some alternatives are:

Generate custom ids – specify _id values manually on insert.
Create one or more additional unique indexes – index application-specific id fields.
Use composite keys – sharding and querying on multiple fields.
Store lookup values denormalized on documents – avoids secondary lookups.
Implement application layer mappings – translate app ids to _id.

So while _id is great for general purpose lookups, in some specific cases an alternative solution may be better suited.

Summary

MongoDB's _id field provides automatic, performant indexing and unique object identification out of the box. Queries, sorting, filtering, and lookups using _id are easy and efficient in MongoDB.

Here are some key takeaways:

_id is automatically set on insert with unique ObjectId values if not specified.
ObjectIds combine a timestamp, a per-process random value, and a counter for practical global uniqueness.
Find documents by exact _id value using find() and findOne().
Sort query results by _id using .sort().
Filter by _id ranges using comparison query operators.
Reference documents in apps by _id instead of extra lookups.
Avoid exposing raw ObjectId values in public APIs when creation time is sensitive.

I hope this guide gave you a comprehensive overview of finding documents by _id in MongoDB. Now you're ready to leverage _id queries like a pro! Let me know if you have any other questions.
