Mutable in-memory ID tracker without RocksDB by timvisee · Pull Request #6150 · qdrant/qdrant

timvisee · 2025-03-11T10:33:58Z

Tracked in: #6157

Add an initial implementation of a mutable ID tracker. It keeps its state in memory and persists it to disk. The key selling point is that it does not rely on RocksDB, only on simple files.

The idea is simple: the ID tracker holds a list of mappings and a list of point versions. Every change is appended to a file on disk. When loading from disk, we read through the whole file and deduplicate in memory so that only the latest mapping and version for each point are kept.
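A minimal sketch of the append-and-replay idea, assuming a made-up fixed-size record format (a 1-byte tag, a `u64` external ID and a `u32` internal ID). It ignores point versions and UUID point IDs, and `Change`, `append_changes` and `load_mappings` are hypothetical names, not Qdrant's API or on-disk format. Writes go to the end of the file and are fsynced before returning; loading replays every record in order, so later records overwrite earlier ones.

```rust
use std::collections::HashMap;
use std::fs::{self, OpenOptions};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical change record. External IDs are simplified to u64 here.
#[derive(Debug, Clone, Copy)]
enum Change {
    Insert { external: u64, internal: u32 },
    Delete { external: u64 },
}

/// Hypothetical record layout: tag (1 byte) + external ID (8) + internal ID (4).
const RECORD_LEN: usize = 13;

fn encode(change: Change) -> [u8; RECORD_LEN] {
    let (tag, external, internal) = match change {
        Change::Insert { external, internal } => (0u8, external, internal),
        Change::Delete { external } => (1u8, external, 0),
    };
    let mut buf = [0u8; RECORD_LEN];
    buf[0] = tag;
    buf[1..9].copy_from_slice(&external.to_le_bytes());
    buf[9..13].copy_from_slice(&internal.to_le_bytes());
    buf
}

/// Append changes to the end of the file, then fsync so they survive a crash.
fn append_changes(path: &Path, changes: &[Change]) -> io::Result<()> {
    let file = OpenOptions::new().create(true).append(true).open(path)?;
    let mut writer = BufWriter::new(file);
    for &change in changes {
        writer.write_all(&encode(change))?;
    }
    writer.flush()?;
    writer.get_ref().sync_all()
}

/// Replay the whole file; the last record for an external ID wins.
fn load_mappings(path: &Path) -> io::Result<HashMap<u64, u32>> {
    let bytes = fs::read(path)?;
    if bytes.len() % RECORD_LEN != 0 {
        // Reject a truncated trailing record instead of silently skipping it
        return Err(io::Error::new(io::ErrorKind::InvalidData, "truncated record"));
    }
    let mut mappings = HashMap::new();
    for record in bytes.chunks_exact(RECORD_LEN) {
        let external = u64::from_le_bytes(record[1..9].try_into().unwrap());
        let internal = u32::from_le_bytes(record[9..13].try_into().unwrap());
        match record[0] {
            0 => {
                mappings.insert(external, internal);
            }
            1 => {
                mappings.remove(&external);
            }
            tag => {
                let msg = format!("unknown tag {tag}");
                return Err(io::Error::new(io::ErrorKind::InvalidData, msg));
            }
        }
    }
    Ok(mappings)
}
```

Because the file only ever grows, overwritten and deleted mappings stay on disk as garbage until the file is rebuilt.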

Left unchecked, this structure can grow without bound. That's why it relies on Qdrant's optimizers: once the ID tracker has collected too many changes, an optimizer picks it up and creates a new ID tracker. The new ID tracker starts from scratch, dropping all the garbage collected along the way.
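The sketch below illustrates why rebuilding from scratch drops the garbage. It is an in-memory model only: `LogTracker`, `garbage`, `needs_rebuild` and the ratio threshold are hypothetical and are not the criteria Qdrant's optimizers actually use. Every change increments the log length, while only the live mappings survive a rebuild.

```rust
use std::collections::HashMap;

/// Hypothetical model of an append-only ID tracker: `log_len` counts every
/// change ever appended, `live` holds the current mapping per external ID.
struct LogTracker {
    log_len: usize,
    live: HashMap<u64, u32>,
}

impl LogTracker {
    fn new() -> Self {
        Self { log_len: 0, live: HashMap::new() }
    }

    fn set(&mut self, external: u64, internal: u32) {
        self.log_len += 1; // overwrites are appended too, never rewritten in place
        self.live.insert(external, internal);
    }

    fn delete(&mut self, external: u64) {
        self.log_len += 1;
        self.live.remove(&external);
    }

    /// Log entries that no longer contribute to the live state.
    fn garbage(&self) -> usize {
        self.log_len - self.live.len()
    }

    /// Hypothetical trigger: rebuild once garbage exceeds `ratio` of the log.
    fn needs_rebuild(&self, ratio: f64) -> bool {
        self.log_len > 0 && self.garbage() as f64 / self.log_len as f64 > ratio
    }

    /// Build a fresh tracker holding only the live mappings.
    fn rebuild(&self) -> LogTracker {
        let mut fresh = LogTracker::new();
        for (&external, &internal) in &self.live {
            fresh.set(external, internal);
        }
        fresh
    }
}
```

A fresh tracker has a log exactly as long as its live state, so it starts with zero garbage.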

The new ID tracker is ported from our simple ID tracker. The only thing that changed in this type is the backing storage, which now uses simple files.

I'm implementing this one step at a time. Tests are in the next PR, and there's more to come. Please see the tracking issue for more information.

All Submissions:

Contributions should target the dev branch. Did you create your branch from dev?
Have you followed the guidelines in our Contributing document?
Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

Does your submission pass tests?
Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

Have you added an explanation of what your changes do and why you'd like us to include them?
~~Have you written new tests for your core changes, as applicable?~~
Have you successfully run tests with your changes locally?

timvisee · 2025-03-12T12:59:38Z

lib/segment/src/id_tracker/mutable_id_tracker.rs

+        // Take out pending mappings to flush and replace it with a preallocated vector to avoid
+        // frequent reallocation on a busy segment
+        let pending_mappings = {
+            let mut pending_mappings = self.pending_mappings.lock();
+            let count = pending_mappings.len();
+            mem::replace(&mut *pending_mappings, Vec::with_capacity(count))
+        };


This tries to be intelligent about memory allocation.

When we flush, we immediately replace the buffer with a new one that has capacity for as many entries as were just flushed.

This saves a number of (expensive) reallocations on a hot ID tracker receiving many upserts.

If there are no point ID mapping changes for some time, this eventually preallocates nothing, which is identical to not having this optimization.
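A self-contained sketch of the same take-and-preallocate pattern. It uses `std::sync::Mutex` instead of the lock type in the diff (whose `lock()` returns the guard directly), and `Pending` and `take_for_flush` are hypothetical names. The lock is held only for the swap; the flushed batch is written out afterwards.

```rust
use std::mem;
use std::sync::Mutex;

/// Hypothetical stand-in for the `pending_mappings` buffer.
struct Pending<T> {
    items: Mutex<Vec<T>>,
}

impl<T> Pending<T> {
    fn new() -> Self {
        Self { items: Mutex::new(Vec::new()) }
    }

    fn push(&self, item: T) {
        self.items.lock().unwrap().push(item);
    }

    /// Take all pending items and leave behind an empty vector preallocated
    /// for a batch of the same size as the one just taken.
    fn take_for_flush(&self) -> Vec<T> {
        let mut items = self.items.lock().unwrap();
        let count = items.len();
        mem::replace(&mut *items, Vec::with_capacity(count))
    }

    fn capacity(&self) -> usize {
        self.items.lock().unwrap().capacity()
    }
}
```

After a busy flush the buffer keeps room for a similar batch; after an idle flush it is replaced by `Vec::with_capacity(0)`, which does not allocate.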


github-actions bot mentioned this pull request Mar 11, 2025

Flaky test hnsw_discover_test::hnsw_discover_precision #2973

Open

timvisee added 12 commits March 12, 2025 13:38

Add initial mutable ID tracker

8676b9c

Correctly handle duplicate point mappings and deleted flags

3d6922e

Improve error handling in flush

4ffc1a6

Preallocate capacity for pending mappings/versions more intelligently

bf1a294

Warn or error about missing ID tracker files

1d88992

Reformat

ed00d2e

Don't crash if just the last mapping/version entry is corrupt

a261a80

Move mapping and point parsing into separate functions

8cf4655

Extract loading logic into separate functions

bc10c67

Do not allow partially corrupted ID tracker files for now

0d01247

Remove TODOs

426606f

Minor improvements

055e9b5

timvisee force-pushed the mutable-id-tracker branch from 370af0d to 055e9b5 March 12, 2025 12:42

This was referenced Mar 12, 2025

Tracking issue: mutable ID tracker without RocksDB #6157

Closed

Add mutable ID tracker tests #6158

Merged

timvisee commented Mar 12, 2025


timvisee changed the title ~~WIP: mutable in-memory ID tracker without RocksDB~~ Mutable in-memory ID tracker without RocksDB Mar 12, 2025

timvisee marked this pull request as ready for review March 12, 2025 13:01


timvisee requested review from JojiiOfficial, agourlay, ffuugoo and generall March 12, 2025 14:03

JojiiOfficial approved these changes Mar 12, 2025


agourlay reviewed Mar 12, 2025


lib/segment/src/id_tracker/mutable_id_tracker.rs (resolved)

agourlay approved these changes Mar 12, 2025


Fsync mappings and versions file after writing to it

f0c92aa


Return error when fsync fails

4a5f95d


ffuugoo reviewed Mar 13, 2025


lib/segment/src/id_tracker/mutable_id_tracker.rs (resolved)

timvisee merged commit 6a1b9de into dev Mar 13, 2025
17 checks passed

timvisee deleted the mutable-id-tracker branch March 13, 2025 16:12

This was referenced Jul 22, 2025

[map index] use roaring bitmap in mutable map index #6926

Merged

Track payload index schema version #6819

Merged

coderabbitai bot mentioned this pull request Oct 13, 2025

fix segment repair on load #7400

Merged

coderabbitai bot mentioned this pull request Dec 18, 2025

Clone pending updates in buffered storages #7801

Merged

This was referenced Jan 6, 2026

Id tracker single write of pending updates #7872

Closed

id tracker persisted mappings offset #7877

Merged

log mutable id tracker mapping updates #7894

Merged

timvisee merged 14 commits into dev from mutable-id-tracker