Mutable in-memory ID tracker without RocksDB #6150
Merged
timvisee commented on Mar 12, 2025, on lines +236 to +242:
    // Take out pending mappings to flush and replace it with a preallocated vector to avoid
    // frequent reallocation on a busy segment
    let pending_mappings = {
        let mut pending_mappings = self.pending_mappings.lock();
        let count = pending_mappings.len();
        mem::replace(&mut *pending_mappings, Vec::with_capacity(count))
    };
This tries to be smart about memory allocation.
When we flush, we immediately replace the buffer with one that has capacity for at least as many entries as were pending in this flush.
This saves a number of (expensive) reallocations on a hot ID tracker that receives a lot of upserts.
If there are no point ID mapping changes for some time, this eventually preallocates nothing, which is identical to not having this optimization.
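To make the take-and-replace pattern easier to try in isolation, here is a minimal sketch. `PendingBuffer`, `PendingMapping` and `take_for_flush` are hypothetical names for illustration, not the types from this PR, and it uses `std::sync::Mutex` (hence the `unwrap()` on `lock()`), which is an assumption about nothing more than what compiles with the standard library alone.

```rust
use std::mem;
use std::sync::Mutex;

/// Hypothetical stand-in for a pending point mapping change.
type PendingMapping = (u64, u32);

/// Hypothetical buffer of changes waiting to be flushed.
struct PendingBuffer {
    pending: Mutex<Vec<PendingMapping>>,
}

impl PendingBuffer {
    fn new() -> Self {
        Self {
            pending: Mutex::new(Vec::new()),
        }
    }

    fn push(&self, mapping: PendingMapping) {
        self.pending.lock().unwrap().push(mapping);
    }

    /// Take all pending changes, leaving behind an empty vector that is
    /// preallocated for as many entries as were just taken.
    fn take_for_flush(&self) -> Vec<PendingMapping> {
        let mut guard = self.pending.lock().unwrap();
        let count = guard.len();
        mem::replace(&mut *guard, Vec::with_capacity(count))
    }

    /// Current capacity of the buffer, exposed only for inspection.
    fn capacity(&self) -> usize {
        self.pending.lock().unwrap().capacity()
    }
}
```

The lock is held only for the swap itself, so the actual write to disk can happen on the taken vector without blocking new changes. After a flush of 100 entries the buffer keeps room for at least 100 more; after a flush of zero entries it holds no allocation at all.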
JojiiOfficial approved these changes on Mar 12, 2025
agourlay reviewed on Mar 12, 2025
agourlay approved these changes on Mar 12, 2025
ffuugoo reviewed on Mar 13, 2025
timvisee added a commit that referenced this pull request on Mar 21, 2025:
* Add initial mutable ID tracker
* Correctly handle duplicate point mappings and deleted flags
* Improve error handling in flush
* Preallocate capacity for pending mappings/versions more intelligently
* Warn or error about missing ID tracker files
* Reformat
* Don't crash if just the last mapping/version entry is corrupt
* Move mapping and point parsing into separate functions
* Extract loading logic into separate functions
* Do not allow partially corrupted ID tracker files for now
* Remove TODOs
* Minor improvements
* Fsync mappings and versions file after writing to it
* Return error when fsync fails
Tracked in: #6157
Add an initial implementation of a mutable ID tracker. It's in-memory and persisted on disk. The key selling point is that it does not rely on RocksDB, but on simple files.
The idea is simple: the ID tracker holds a list of mappings and a list of point versions. All changes are simply appended to a file on disk. When loading from disk we read through the whole file and deduplicate in memory so that only the latest mapping for each point is kept.
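As a rough illustration of the append-and-deduplicate idea, here is a minimal sketch. The record layout (a little-endian `u64` external ID followed by a `u32` internal ID) and the names `append_mapping` and `load_mappings` are assumptions made for this example; the actual on-disk format in this PR is not shown here and may differ.

```rust
use std::collections::HashMap;
use std::io::{self, Read, Write};

/// Assumed record layout: 8 bytes external ID + 4 bytes internal ID.
const RECORD_SIZE: usize = 12;

/// Append a single mapping change to the end of the log.
fn append_mapping<W: Write>(writer: &mut W, external_id: u64, internal_id: u32) -> io::Result<()> {
    writer.write_all(&external_id.to_le_bytes())?;
    writer.write_all(&internal_id.to_le_bytes())?;
    Ok(())
}

/// Read the whole log and keep only the latest mapping per external ID.
/// A log whose length is not a multiple of the record size is rejected.
fn load_mappings<R: Read>(reader: &mut R) -> io::Result<HashMap<u64, u32>> {
    let mut bytes = Vec::new();
    reader.read_to_end(&mut bytes)?;
    if bytes.len() % RECORD_SIZE != 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "truncated mapping record",
        ));
    }

    let mut mappings = HashMap::new();
    for record in bytes.chunks_exact(RECORD_SIZE) {
        let mut external = [0u8; 8];
        external.copy_from_slice(&record[..8]);
        let mut internal = [0u8; 4];
        internal.copy_from_slice(&record[8..]);
        // Later records overwrite earlier ones for the same external ID.
        mappings.insert(u64::from_le_bytes(external), u32::from_le_bytes(internal));
    }
    Ok(mappings)
}
```

The sketch is generic over `Write` and `Read`, so it works with a `Vec<u8>` in a test as well as with a `std::fs::File`. With a real file, `File::sync_all` is the standard library call for syncing to disk after a write, which is what the "Fsync mappings and versions file" commit in this PR refers to.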
Obviously, this structure can grow forever if we're not careful. That's why it relies on Qdrant's optimizers. Once the ID tracker collects too many changes, the optimizer will pick it up and create a new ID tracker. The new ID tracker will start from scratch, dropping all the garbage we had collected along the way.
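To show why starting from scratch drops the garbage, here is a small sketch under loud assumptions: `garbage_ratio` and `rebuild` are hypothetical helpers, the log is modeled as an in-memory slice of `(external ID, internal ID)` pairs, and none of this reflects how Qdrant's optimizers actually decide when to pick up a segment.

```rust
use std::collections::HashMap;

/// Fraction of log entries that were superseded by a later entry
/// for the same external ID (hypothetical metric, for illustration).
fn garbage_ratio(log: &[(u64, u32)]) -> f64 {
    if log.is_empty() {
        return 0.0;
    }
    // Collecting into a map keeps the last value seen per key.
    let live: HashMap<u64, u32> = log.iter().copied().collect();
    1.0 - live.len() as f64 / log.len() as f64
}

/// Build a fresh log that holds only the latest mapping per external ID.
fn rebuild(log: &[(u64, u32)]) -> Vec<(u64, u32)> {
    let live: HashMap<u64, u32> = log.iter().copied().collect();
    let mut fresh: Vec<(u64, u32)> = live.into_iter().collect();
    // Sort for a deterministic order; a hash map iterates in arbitrary order.
    fresh.sort_unstable();
    fresh
}
```

A log with four entries of which two are superseded has a garbage ratio of 0.5; the rebuilt log contains two entries and a ratio of 0.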
The new ID tracker is ported from our simple ID tracker. What changed in this type is the backing storage - now using simple files.
I'm implementing this one step at a time. Tests are in the next PR, and there's more to come. Please see the tracking issue for more information.
All Submissions:
* dev branch. Did you create your branch from dev?
New Feature Submissions:
* cargo +nightly fmt --all command prior to submission?
* cargo clippy --all --all-features command?
Changes to Core Features:
* Have you written new tests for your core changes, as applicable?