
single thread flush #7388

Merged
timvisee merged 7 commits into dev from single-thread-flush
Oct 14, 2025

Conversation


@generall generall commented Oct 12, 2025

Currently, we rely on segment flush ordering to keep copy-on-write behaviour consistent.
For example:

  • We move a point from an immutable segment to a mutable one
  • The immutable segment is flushed (the moved points are marked as deleted)
  • The server gets killed
  • On restart, there is no record of the points having been moved

To compensate for this, we currently perform flushes in a very specific order: we flush immutable segments first.
There is, however, a problem. If we use sync=false, each segment flush operation is spawned in a dedicated thread. From that point on, there is no guarantee that segments are flushed in the order they were scheduled.

To demonstrate this, it is enough to add a random sleep to the flush_op function:

2025-10-12T18:33:03.166297Z  INFO actix_web::middleware::logger: 127.0.0.1 "PUT /collections/benchmark/points/payload?wait=true HTTP/1.1" 200 73 "http://localhost:6333/dashboard" 0.005162
Flushing appendable=true segment at ./storage/collections/benchmark/0/segments/dbcca35d-8d40-4228-97cd-d8c82469e1ee after sleeping for 103ms
Flushing appendable=false segment at ./storage/collections/benchmark/0/segments/e3ff4c91-ede5-4406-b569-207683977a40 after sleeping for 261ms

2025-10-12T18:33:40.819012Z  INFO actix_web::middleware::logger: 127.0.0.1 "PUT /collections/benchmark/points/payload?wait=true HTTP/1.1" 200 73 "http://localhost:6333/dashboard" 0.014875
Flushing appendable=true segment at ./storage/collections/benchmark/0/segments/dbcca35d-8d40-4228-97cd-d8c82469e1ee after sleeping for 53ms
Flushing appendable=false segment at ./storage/collections/benchmark/0/segments/e3ff4c91-ede5-4406-b569-207683977a40 after sleeping for 294ms

2025-10-12T18:34:00.287522Z  INFO actix_web::middleware::logger: 127.0.0.1 "PUT /collections/benchmark/points/payload?wait=true HTTP/1.1" 200 73 "http://localhost:6333/dashboard" 0.013730
Flushing appendable=false segment at ./storage/collections/benchmark/0/segments/e3ff4c91-ede5-4406-b569-207683977a40 after sleeping for 236ms
Flushing appendable=true segment at ./storage/collections/benchmark/0/segments/dbcca35d-8d40-4228-97cd-d8c82469e1ee after sleeping for 355ms

What this PR does:

  • instead of letting each segment schedule its own flush, segments now only return a flusher object
  • SegmentHolder owns a single shared thread that flushes all segments sequentially
  • the persisted version is read directly from the corresponding variable

  • flush all segments in one thread sequentially
  • bonus: move flush-related functions into dedicated file


// Joins flush thread if exists
// Returns lock to guarantee that there will be no other flush in a different thread
pub(super) fn lock_flushing(
Member Author

moved to SegmentHolder

}

#[test]
fn test_background_flush() {
Member Author

Maybe we need a new test to replace this one?

///
/// If there are unsaved changes after flush - detects lowest unsaved change version.
/// If all changes are saved - returns max version.
fn get_max_persisted_version(
Member Author

This function doesn't require flush to collect persisted versions anymore

@generall generall requested review from agourlay and timvisee October 12, 2025 22:12
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
lib/segment/src/segment/entry.rs (1)

630-643: Prefer explicit match arm over catch-all pattern.

The catch-all pattern (_, _) => {} at line 642 handles the remaining case (Some(_), None) where data exists but hasn't been persisted yet. Per coding guidelines, prefer explicit patterns to make the intent clearer and catch future enum variants.

Apply this diff to make the pattern explicit:

         match (self.version, current_persisted_version) {
             (None, _) => {
                 // Segment is empty, nothing to flush
                 return None;
             }
             (Some(version), Some(persisted_version)) => {
                 if !force && version == persisted_version {
                     log::trace!("not flushing because version == persisted_version");
                     // Segment is already flushed
                     return None;
                 }
             }
-            (_, _) => {}
+            (Some(_), None) => {
+                // Segment has data but hasn't been persisted yet
+            }
         }
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between e7fe24b and 7ab935a.

📒 Files selected for processing (13)
  • lib/segment/src/entry/entry_point.rs (3 hunks)
  • lib/segment/src/segment/entry.rs (4 hunks)
  • lib/segment/src/segment/mod.rs (2 hunks)
  • lib/segment/src/segment/segment_ops.rs (1 hunks)
  • lib/segment/src/segment/snapshot.rs (1 hunks)
  • lib/segment/src/segment/tests.rs (0 hunks)
  • lib/segment/src/segment_constructor/segment_constructor_base.rs (0 hunks)
  • lib/segment/tests/integration/segment_tests.rs (2 hunks)
  • lib/segment/tests/integration/sparse_vector_index_search_tests.rs (1 hunks)
  • lib/shard/src/proxy_segment/segment_entry.rs (3 hunks)
  • lib/shard/src/proxy_segment/tests.rs (1 hunks)
  • lib/shard/src/segment_holder/flush.rs (1 hunks)
  • lib/shard/src/segment_holder/mod.rs (3 hunks)
💤 Files with no reviewable changes (2)
  • lib/segment/src/segment/tests.rs
  • lib/segment/src/segment_constructor/segment_constructor_base.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

  • lib/segment/src/segment/snapshot.rs
  • lib/segment/tests/integration/sparse_vector_index_search_tests.rs
  • lib/shard/src/segment_holder/mod.rs
  • lib/segment/src/segment/entry.rs
  • lib/shard/src/proxy_segment/segment_entry.rs
  • lib/shard/src/segment_holder/flush.rs
  • lib/segment/src/entry/entry_point.rs
  • lib/segment/src/segment/mod.rs
  • lib/segment/tests/integration/segment_tests.rs
  • lib/shard/src/proxy_segment/tests.rs
  • lib/segment/src/segment/segment_ops.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

  • lib/segment/src/segment/snapshot.rs
  • lib/shard/src/segment_holder/mod.rs
  • lib/segment/src/segment/entry.rs
  • lib/shard/src/proxy_segment/segment_entry.rs
  • lib/shard/src/segment_holder/flush.rs
  • lib/segment/src/entry/entry_point.rs
  • lib/segment/src/segment/mod.rs
  • lib/shard/src/proxy_segment/tests.rs
  • lib/segment/src/segment/segment_ops.rs
**/{tests,benches}/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged

Files:

  • lib/segment/tests/integration/sparse_vector_index_search_tests.rs
  • lib/segment/tests/integration/segment_tests.rs
🧠 Learnings (1)
📚 Learning: 2025-09-01T11:42:06.964Z
Learnt from: timvisee
PR: qdrant/qdrant#7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.

Applied to files:

  • lib/shard/src/segment_holder/mod.rs
  • lib/shard/src/segment_holder/flush.rs
🧬 Code graph analysis (7)
lib/shard/src/segment_holder/mod.rs (1)
lib/segment/src/entry/entry_point.rs (1)
  • flush (277-282)
lib/segment/src/segment/entry.rs (4)
lib/segment/src/entry/entry_point.rs (3)
  • persistent_version (36-36)
  • flusher (274-274)
  • version (33-33)
lib/shard/src/proxy_segment/segment_entry.rs (3)
  • persistent_version (30-32)
  • flusher (630-634)
  • version (26-28)
lib/segment/src/index/struct_payload_index.rs (1)
  • flusher (1093-1127)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • flusher (113-113)
  • flusher (1174-1218)
lib/shard/src/proxy_segment/segment_entry.rs (2)
lib/segment/src/entry/entry_point.rs (2)
  • persistent_version (36-36)
  • flusher (274-274)
lib/segment/src/segment/entry.rs (2)
  • persistent_version (43-45)
  • flusher (627-761)
lib/shard/src/segment_holder/flush.rs (1)
lib/shard/src/segment_holder/mod.rs (4)
  • segments (941-944)
  • aloha_lock_segment_read (590-606)
  • segment_ids (306-311)
  • get (262-266)
lib/segment/src/entry/entry_point.rs (2)
lib/segment/src/segment/entry.rs (2)
  • persistent_version (43-45)
  • flusher (627-761)
lib/shard/src/proxy_segment/segment_entry.rs (2)
  • persistent_version (30-32)
  • flusher (630-634)
lib/shard/src/proxy_segment/tests.rs (1)
lib/shard/src/proxy_segment/mod.rs (1)
  • new (43-69)
lib/segment/src/segment/segment_ops.rs (2)
lib/segment/src/entry/entry_point.rs (1)
  • flusher (274-274)
lib/segment/src/segment/entry.rs (1)
  • flusher (627-761)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: e2e-tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests-consensus
  • GitHub Check: test-consistency
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: lint
  • GitHub Check: rust-tests (macos-latest)
🔇 Additional comments (8)
lib/segment/src/segment/segment_ops.rs (1)

484-485: Sequential flusher usage looks solid.

Using the returned flusher closure here keeps the repair path consistent with the single-threaded flush orchestration.

lib/shard/src/segment_holder/mod.rs (1)

1-77: Centralized flush thread holder fits the redesign.

The shared flush_thread guard and module split give SegmentHolder a clean single-thread flush coordination point.

lib/shard/src/proxy_segment/segment_entry.rs (1)

30-634: Proxy delegates to the new flush API correctly.

Forwarding persistent_version and reusing the wrapped flusher keeps proxy segments aligned with the core segment semantics.

lib/segment/src/segment/mod.rs (1)

138-149: Return type qualification is welcome.

Exporting destroy_rocksdb with the explicit OperationResult path avoids type inference surprises for callers.

lib/segment/src/entry/entry_point.rs (1)

35-282: Trait surface matches the single-thread flusher model.

Exposing persistent_version and offering a default flush that simply drives the optional flusher gives downstream code a clean, consistent contract.
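The trait shape this review comment refers to can be sketched as below. The names mirror the quoted signatures, but the bodies, version numbers, and error type are invented for illustration: flush() is a default method that simply drives the optional flusher, falling back to the already-persisted version when there is nothing to do.

```rust
type Flusher = Box<dyn FnOnce() -> Result<u64, String>>;

trait SegmentEntry {
    fn persistent_version(&self) -> u64;
    fn flusher(&self, force: bool) -> Option<Flusher>;

    // Default: run the flusher if there is one; otherwise nothing changed,
    // so the persisted version is already the guaranteed one.
    fn flush(&self, force: bool) -> Result<u64, String> {
        match self.flusher(force) {
            Some(flusher) => flusher(),
            None => Ok(self.persistent_version()),
        }
    }
}

// A segment with unsaved changes: flushing advances the persisted version.
struct Dirty;
impl SegmentEntry for Dirty {
    fn persistent_version(&self) -> u64 {
        3
    }
    fn flusher(&self, _force: bool) -> Option<Flusher> {
        Some(Box::new(|| Ok(4)))
    }
}

// A fully persisted segment: nothing to flush.
struct Clean;
impl SegmentEntry for Clean {
    fn persistent_version(&self) -> u64 {
        9
    }
    fn flusher(&self, _force: bool) -> Option<Flusher> {
        None
    }
}

fn main() {
    assert_eq!(Dirty.flush(false), Ok(4));
    assert_eq!(Clean.flush(false), Ok(9));
}
```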

lib/segment/src/segment/entry.rs (3)

12-14: LGTM: Flusher import added.

The addition of Flusher to the imports aligns with the new flush mechanism where segments return closures instead of performing inline flushes.


43-45: LGTM: Persistent version accessor added.

The new persistent_version() method correctly reads the persisted version with proper locking and provides a sensible default of 0 for None.


627-761: Well-designed flush deferral mechanism.

The refactoring from inline flush to closure-based flusher enables sequential execution by the caller (SegmentHolder), which is the core objective of this PR. The implementation correctly:

  • Returns None early when there's nothing to flush
  • Captures all necessary flushers and state
  • Implements proper flush ordering (id_tracker mapping → vectors → payload → versions)
  • Guards against concurrent flushes with persisted_version locking
  • Maintains atomicity by checking versions before and after flushing

The detailed comments (lines 662-707) excellently document the flush ordering rationale and recovery scenarios.


/// Returns maximum version number which is guaranteed to be persisted.
fn flush(&self, sync: bool, force: bool) -> OperationResult<SeqNumberType>;
/// Returned function returns maximum version number which is guaranteed to be persisted.
fn flusher(&self, force: bool) -> Option<Flusher>;
@timvisee timvisee Oct 13, 2025
I really like that we have one boolean parameter now.

This currently does not explicitly block parallelism. I'm not sure how this behaves if we flush from two different places. If you haven't done so yet, this might need careful checking.

We do have #7388 (comment), but I'm not confident it is sound.

Note-to-self: check flush implementation in all storages

Member
I manually checked all flushers. All are fine, with one exception and one note:

  1. the mutable ID tracker does not support concurrent flushes; we cannot open the files on disk twice concurrently
  2. the last persisted version in the segment holder is not sound when concurrent flushes happen; we might acknowledge the wrong version

Since these two problems don't currently manifest, I propose to handle them in a separate PR once all our current work is merged.

@timvisee
Member

To fully fix segment ordering, even with proxies, I think we also need a change like this. Though I suggest to keep the PRs separate.

@agourlay
Member

FYI running Crasher on this branch locally shows no sign of missing points after 30 minutes of runtime 🤞

@timvisee timvisee mentioned this pull request Oct 13, 2025
6 tasks
@timvisee
Member

#7381 is now adjusted and rebased on top of this PR.

@generall
Member Author

depends on #7400 to fix tests

@generall generall force-pushed the single-thread-flush branch from 995bf4f to 239e6c6 on October 14, 2025 13:16
@timvisee timvisee merged commit b8590a9 into dev Oct 14, 2025
15 checks passed
@timvisee timvisee deleted the single-thread-flush branch October 14, 2025 14:39
timvisee added a commit that referenced this pull request Nov 14, 2025
* flush all segments in one thread sequentially

* bonus: move flush-related functions into dedicated file

* Minor comment tweaks

* await for flush on segment holder level

* fmt

* Minor improvement, preallocate vector for payload index flushers

* Remove invalid comment

---------

Co-authored-by: timvisee <tim@visee.me>
@timvisee timvisee mentioned this pull request Nov 14, 2025