As a MongoDB developer, being able to efficiently query and retrieve documents from your collections using the unique _id field is an essential skill. In this comprehensive guide, we'll explore how MongoDB generates _id values, discuss the technical details of ObjectId, learn various techniques for querying and sorting by _id, look at real-world use cases, and review best practices for using _id in your applications.
When you insert a document into MongoDB without an explicit _id field, it automatically adds one with a special ObjectId value. This autogeneration makes using _id for lookups easy – you don't have to worry about generating unique IDs yourself on the client or in your application code. Let's start by understanding what these ObjectId values contain.
Anatomy of a MongoDB ObjectId
A MongoDB ObjectId is a 12-byte (96-bit) value made up of three fields:
- A 4-byte timestamp – seconds since the Unix epoch, which lets ObjectIds be sorted roughly chronologically.
- A 5-byte random value – generated once per process, so different machines and processes are very unlikely to collide.
- A 3-byte incrementing counter – initialized to a random value, it distinguishes ObjectIds created by the same process within the same second.
Older MongoDB versions and drivers filled the middle 5 bytes with a 3-byte machine identifier and a 2-byte process id instead of a random value.
Combined, these fields produce a value that is unique in practice across MongoDB clusters, processes, machines, and time.
Here's an example ObjectId broken down into its component parts:
ObjectId("63f961d268efea2690e2ab01")
63f961d2 # Timestamp - 2023-02-25T01:18:10Z
68efea2690 # Random value
e2ab01 # Counter
As you can see, the timestamp allows ObjectIds to be chronologically sorted by creation time. The additional fields ensure uniqueness.
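The breakdown can be reproduced with a few lines of plain JavaScript. This is a minimal sketch that assumes the current layout (4-byte timestamp, 5-byte random value, 3-byte counter); `parseObjectIdHex` is a hypothetical helper written for illustration, not part of any MongoDB driver.

```javascript
// Split a 24-character ObjectId hex string into the three fields of the
// current layout: 4-byte timestamp, 5-byte random value, 3-byte counter.
function parseObjectIdHex(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error('expected a 24-character hex string');
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // seconds since the Unix epoch
  return {
    timestamp: new Date(seconds * 1000),
    random: hex.slice(8, 18).toLowerCase(),   // 5 bytes = 10 hex characters
    counter: parseInt(hex.slice(18, 24), 16), // 3 bytes = 6 hex characters
  };
}

const parts = parseObjectIdHex('63f961d268efea2690e2ab01');
console.log(parts.timestamp.toISOString()); // 2023-02-25T01:18:10.000Z
console.log(parts.random);                  // 68efea2690
console.log(parts.counter.toString(16));    // e2ab01
```

In mongosh and the Node.js driver, an ObjectId instance already exposes the timestamp through `getTimestamp()`, so a helper like this is mainly useful for understanding the format or for inspecting ids outside of a driver.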
Now let's look at how much room each of these fields provides:
- Timestamp – 4 bytes hold 2^32 seconds, about 136 years when read as an unsigned integer
- Random value – 5 bytes = 1,099,511,627,776 possible values per process
- Counter – 3 bytes = 16,777,216 values before the counter wraps around
A single process can therefore generate up to 16,777,216 distinct ObjectIds per second. Combined with timestamps for chronological ordering, ObjectIds are great for high volume inserts and queries by _id.
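The capacity of each field follows directly from its width. The arithmetic below is a quick check, assuming the 4-byte timestamp, 5-byte random value, and 3-byte counter layout and an unsigned reading of the timestamp; the constant names are illustrative only.

```javascript
const SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60; // 31,557,600

const timestampYears = 2 ** 32 / SECONDS_PER_YEAR; // 4-byte timestamp, unsigned
const randomValues = 2 ** 40;                      // 5-byte random value
const counterValues = 2 ** 24;                     // 3-byte counter

console.log(timestampYears.toFixed(1)); // 136.1
console.log(randomValues);              // 1099511627776
console.log(counterValues);             // 16777216
```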
Inserting Documents with _id
When you insert a document into a MongoDB collection without an explicit _id field, the driver adds an _id with a generated ObjectId before sending the document (if the driver doesn't, the server adds one):
> db.users.insertOne({name: "John"})
> db.users.findOne()
{
  _id: ObjectId("63f9620e3fed7d5c0ac505e4"),
  name: "John"
}
As you can see, MongoDB added a 96-bit ObjectId as the _id.
You can also specify a custom _id when inserting:
> db.users.insertOne({
  _id: "user1",
  name: "Jane"
})
> db.users.findOne({_id: "user1"})
{
  _id: "user1",
  name: "Jane"
}
Here we inserted a document with _id: "user1", a string id instead of ObjectId.
Some benefits of custom ids:
- Readable, application-specific ids
- Control over id generation logic
- Configuration of shard keys
- Pre-calculated ids for bulk inserts
Downsides of custom ids:
- Must ensure uniqueness yourself
- No built-in chronological sorting
In most cases, it's best to leverage the automatic ObjectId generation. But for some use cases like migrating data or integrating with other systems, a custom _id may be preferable.
Ok, now that we've seen how _id fields are created, let's look at querying by _id…
Finding Documents by _id Value
To find a document by its _id value in MongoDB, use the find() method and pass an _id query filter like:
db.users.find({
  _id: ObjectId("63f9620e3fed7d5c0ac505e4")
})
For ObjectIds, you need to wrap the hex string value in ObjectId() to query correctly.
Because _id values are unique within a collection, this returns at most one document:
{
  _id: ObjectId("63f9620e3fed7d5c0ac505e4"),
  name: "John"
}
Every collection has a unique index on _id, so the matching document is found through an index lookup rather than a scan of the whole collection.
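Since ObjectId() rejects strings that are not valid hex, it can help to check untrusted input before wrapping it. The following is a small sketch; `isObjectIdHex` is a hypothetical helper, and it deliberately accepts only the 24-character hex form.

```javascript
// Returns true only for 24-character hexadecimal strings,
// the textual form of a 12-byte ObjectId.
function isObjectIdHex(value) {
  return typeof value === 'string' && /^[0-9a-fA-F]{24}$/.test(value);
}

console.log(isObjectIdHex('63f9620e3fed7d5c0ac505e4')); // true
console.log(isObjectIdHex('user1'));                     // false
console.log(isObjectIdHex('63f9620e3fed7d5c0ac505e'));  // false (23 characters)
```

A check like this lets an application return a clear "invalid id" response instead of surfacing a driver error.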
For custom string or integer _id values, you can query directly, without ObjectId:
db.users.find({
  _id: "user1"
})
By default, find() will return a Cursor instance that lazily returns all matching documents. To return only the first match, use findOne():
db.users.findOne({
  _id: "user1"
})
This is useful for lookups by _id when you know the query returns a single document.
Sorting Query Results by _id
To sort find results by the _id field, use the sort() method:
// Ascending sort
db.users.find().sort({_id: 1})
// Descending sort
db.users.find().sort({_id: -1})
The _id index can be scanned forward or backward, so either sort order is served directly from the index without an in-memory sort.
Sorting by _id is handy for use cases like:
- Paginating results in chronological _id order
- Replaying a feed of inserted documents in creation order
- Migrating documents from oldest to newest _id
This enables large result sets to be sorted by _id efficiently.
Filtering by _id Range
In addition to exact _id match lookups, you can filter by _id ranges using comparison query operators:
// `_id` greater than a value
db.users.find({
  _id: { $gt: ObjectId("63f9620e3fed7d5c0ac505e4") }
})
// `_id` less than a value
db.users.find({
  _id: { $lt: ObjectId("63f96345ca3e144a085cc753") }
})
// `_id` in between two values
db.users.find({
  _id: {
    $gt: ObjectId("63f9620e3fed7d5c0ac505e4"),
    $lt: ObjectId("63f96345ca3e144a085cc753")
  }
})
You can also combine _id filters with queries on other fields:
db.users.find({
  age: { $gte: 21 },
  _id: { $lt: ObjectId("63f96345ca3e144a085cc753") }
})
Range filters on _id can use the _id index, so only the matching slice of the collection has to be examined.
Some use cases for _id range queries:
- Pagination – fetch the next page of results by _id
- Change data capture – process newly inserted documents incrementally by _id range
- Time-series data – query data for a time range using the timestamp embedded in _id
So _id queries give us exact matching, sorting, and range-based filtering out of the box!
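For the time-range use case, a common approach is to build boundary ObjectIds whose timestamp bytes encode the start and end of the window and whose remaining bytes are zero. The sketch below shows the idea in plain JavaScript; `objectIdHexFromDate` is a hypothetical helper, and the approach assumes _id values are auto-generated ObjectIds whose timestamps come from reasonably accurate clocks.

```javascript
// Build the hex form of the smallest ObjectId for a given instant:
// 4 timestamp bytes (seconds since the Unix epoch) followed by 8 zero bytes.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, '0') + '0'.repeat(16);
}

const lower = objectIdHexFromDate(new Date('2023-02-25T00:00:00Z'));
const upper = objectIdHexFromDate(new Date('2023-02-26T00:00:00Z'));
console.log(lower); // 63f94f800000000000000000
console.log(upper); // 63faa1000000000000000000

// In mongosh, the boundaries could then be used as:
// db.users.find({ _id: { $gte: ObjectId(lower), $lt: ObjectId(upper) } })
```

The Node.js driver's ObjectId.createFromTime() builds the same kind of boundary value from a number of seconds, which avoids the string handling when a driver is available.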
Use Cases for Queries by _id
Now that we've seen how to query by _id, let's discuss some common use cases:
Web and Mobile Apps
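In web and mobile apps, it's very common to pass document _id values in request parameters to fetch data:
// Get user profile
// Assumes `db` is a connected Node.js driver Db and `ObjectId` is imported from 'mongodb'
router.get('/users/:userId', async (req, res) => {
  const { userId } = req.params
  // The route parameter is a string, so validate it and convert it to an ObjectId
  if (!/^[0-9a-fA-F]{24}$/.test(userId)) return res.status(400).json({ error: 'invalid id' })
  const user = await db.collection('users').findOne({ _id: new ObjectId(userId) })
  if (!user) return res.status(404).json({ error: 'not found' })
  res.json(user)
})
Here the client passes the userId in the URL, the server converts it to an ObjectId, and the document is fetched directly by _id.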
Background Jobs
Background jobs like analytics computations often need to iterate through all documents in a collection by _id:
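// mongosh: walk the collection in _id order, 100 documents at a time
let lastId = null
while (true) {
  // Fetch the next batch, starting after the last _id already processed
  const filter = lastId === null ? {} : { _id: { $gt: lastId } }
  const users = db.users.find(filter).sort({ _id: 1 }).limit(100).toArray()
  if (users.length === 0) break // no more documents
  // Process users
  lastId = users[users.length - 1]._id
}
Processing in _id-ordered batches keeps every query on the _id index and lets a job resume from the last _id it handled.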
Change Streams
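Change streams let you watch a collection for inserts, updates, and other changes. Events are delivered in the order the operations were applied, and each event's own _id is a resume token that lets you restart the stream from that point:
// Assumes `db` is a connected Node.js driver Db instance
const changeStream = db.collection('users').watch()
changeStream.on('change', (change) => {
  switch (change.operationType) {
    case 'insert':
      // Process insert
      break
    case 'update':
      // Process update
      break
  }
})
Because events arrive in order and carry the changed document's _id in documentKey, a consumer can rebuild collection state incrementally.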
There are many other examples like looking up documents across services by _id, migrating databases by _id ranges, deduplication, and more.
Now let's discuss some best practices when using _id for queries.
Best Practices for Queries by _id
Here are some tips for working with _id fields efficiently:
- Hide raw _id values – don't expose unprocessed ObjectId values directly in APIs and UIs. Map to application-specific ids.
- Rely on the _id index – MongoDB creates it automatically on every collection and it cannot be dropped, so there is no need to create it yourself.
- Use findOne() when expecting 1 match – avoid cursors if the result set will only be a single document.
- If sorting by _id, let the _id index provide the order – this avoids an in-memory sort of the result set.
- Avoid _id in aggregations – match and group using business fields, not raw _id values.
- Reference documents by _id – don't re-query, keep the _id value on entities returned to clients.
- Choose the shard key deliberately – a hashed _id shard key spreads inserts evenly, while a ranged key on ever-increasing ObjectIds routes new inserts to a single chunk; other keys may be better.
Following these best practices will optimize application performance and make _id queries more efficient.
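As one way to keep raw ObjectId values out of API responses, a small mapping function can convert the stored document into the shape the client sees. This is an illustrative sketch: `toPublicUser` and the `id` field name are assumptions, and the stand-in `fakeId` object only mimics the toString() behaviour of a driver ObjectId.

```javascript
// Convert a stored document into an API-facing object:
// the _id is replaced by a plain string `id`, other fields are copied.
function toPublicUser(doc) {
  const { _id, ...rest } = doc;
  return { id: String(_id), ...rest };
}

// Stand-in for a driver ObjectId, which also renders as hex via toString()
const fakeId = { toString: () => '63f9620e3fed7d5c0ac505e4' };
const publicUser = toPublicUser({ _id: fakeId, name: 'John' });
console.log(publicUser); // { id: '63f9620e3fed7d5c0ac505e4', name: 'John' }
```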
While _id queries are extremely useful in MongoDB, let's briefly discuss some alternatives…
Alternatives to Queries by _id
In some cases, queries by _id may not be the best solution:
- If your application or external system requires different primary keys than ObjectId.
- When needing sequential, monotonically increasing keys like auto-incrementing integers.
- If documents need to be identifiable by multiple primary keys.
- When sharding on _id alone is insufficient.
For these use cases, some alternatives are:
- Generate custom ids – specify _id values manually on insert.
- Create one or more additional unique indexes – index application-specific id fields.
- Use composite keys – sharding and querying on multiple fields.
- Store lookup values denormalized on documents – avoids secondary lookups.
- Implement application layer mappings – translate app ids to _id.
So while _id is great for general purpose lookups, in some specific cases an alternative solution may be better suited.
Summary
MongoDB's _id field provides automatic, performant indexing and unique object identification out of the box. Queries, sorting, filtering, and lookups using _id are easy and efficient in MongoDB.
Here are some key takeaways:
- _id is automatically set on insert with unique ObjectId values if not specified.
- ObjectIds combine a timestamp, a random value, and a counter for global uniqueness.
- Find documents by exact _id value using find() and findOne().
- Sort query results by _id using .sort().
- Filter by _id ranges using comparison query operators.
- Reference documents in apps by _id instead of extra lookups.
- Avoid exposing raw _id values in APIs and UIs.
I hope this guide gave you a comprehensive overview of finding documents by _id in MongoDB. Now you're ready to leverage _id queries like a pro! Let me know if you have any other questions.



