
single thread flush #7388

Merged
timvisee merged 7 commits into dev from single-thread-flush
Oct 14, 2025

Conversation


@generall generall commented Oct 12, 2025

Currently, we rely on segment flush ordering to keep copy-on-write behaviour consistent.
For example:

  • We move a point from an immutable segment to a mutable one
  • The immutable segment is flushed (the moved points are marked as deleted)
  • The server gets killed
  • On restart, there is no record of the points having been moved

To compensate for this, we currently perform flushes in a very specific order: we flush immutable segments first.
There is, however, a problem. If we use sync=false, each segment flush operation is spawned in a dedicated thread. From that point on, there is no guarantee that segments are flushed in the order they were scheduled.

To demonstrate this, it is enough to add a random sleep to the flush_op function:

2025-10-12T18:33:03.166297Z  INFO actix_web::middleware::logger: 127.0.0.1 "PUT /collections/benchmark/points/payload?wait=true HTTP/1.1" 200 73 "http://localhost:6333/dashboard" 0.005162
Flushing appendable=true segment at ./storage/collections/benchmark/0/segments/dbcca35d-8d40-4228-97cd-d8c82469e1ee after sleeping for 103ms
Flushing appendable=false segment at ./storage/collections/benchmark/0/segments/e3ff4c91-ede5-4406-b569-207683977a40 after sleeping for 261ms

2025-10-12T18:33:40.819012Z  INFO actix_web::middleware::logger: 127.0.0.1 "PUT /collections/benchmark/points/payload?wait=true HTTP/1.1" 200 73 "http://localhost:6333/dashboard" 0.014875
Flushing appendable=true segment at ./storage/collections/benchmark/0/segments/dbcca35d-8d40-4228-97cd-d8c82469e1ee after sleeping for 53ms
Flushing appendable=false segment at ./storage/collections/benchmark/0/segments/e3ff4c91-ede5-4406-b569-207683977a40 after sleeping for 294ms

2025-10-12T18:34:00.287522Z  INFO actix_web::middleware::logger: 127.0.0.1 "PUT /collections/benchmark/points/payload?wait=true HTTP/1.1" 200 73 "http://localhost:6333/dashboard" 0.013730
Flushing appendable=false segment at ./storage/collections/benchmark/0/segments/e3ff4c91-ede5-4406-b569-207683977a40 after sleeping for 236ms
Flushing appendable=true segment at ./storage/collections/benchmark/0/segments/dbcca35d-8d40-4228-97cd-d8c82469e1ee after sleeping for 355ms

What this PR does:

  • instead of letting each segment schedule its own flush, segments now only return a flusher object
  • SegmentHolder owns a single shared thread that flushes all segments sequentially
  • the persisted version is read directly from the corresponding variable

  • flush all segments in one thread sequentially
  • bonus: move flush-related functions into dedicated file


// Joins flush thread if exists
// Returns lock to guarantee that there will be no other flush in a different thread
pub(super) fn lock_flushing(
Member Author

moved to SegmentHolder

}

#[test]
fn test_background_flush() {
Member Author

Maybe we need a new test to replace this one?

///
/// If there are unsaved changes after flush - detects lowest unsaved change version.
/// If all changes are saved - returns max version.
fn get_max_persisted_version(
Member Author

This function doesn't require flush to collect persisted versions anymore

@generall generall requested review from agourlay and timvisee October 12, 2025 22:12
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
lib/segment/src/segment/entry.rs (1)

630-643: Prefer explicit match arm over catch-all pattern.

The catch-all pattern (_, _) => {} at line 642 handles the remaining case (Some(_), None) where data exists but hasn't been persisted yet. Per coding guidelines, prefer explicit patterns to make the intent clearer and catch future enum variants.

Apply this diff to make the pattern explicit:

         match (self.version, current_persisted_version) {
             (None, _) => {
                 // Segment is empty, nothing to flush
                 return None;
             }
             (Some(version), Some(persisted_version)) => {
                 if !force && version == persisted_version {
                     log::trace!("not flushing because version == persisted_version");
                     // Segment is already flushed
                     return None;
                 }
             }
-            (_, _) => {}
+            (Some(_), None) => {
+                // Segment has data but hasn't been persisted yet
+            }
         }
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between e7fe24b and 7ab935a.

📒 Files selected for processing (13)
  • lib/segment/src/entry/entry_point.rs (3 hunks)
  • lib/segment/src/segment/entry.rs (4 hunks)
  • lib/segment/src/segment/mod.rs (2 hunks)
  • lib/segment/src/segment/segment_ops.rs (1 hunks)
  • lib/segment/src/segment/snapshot.rs (1 hunks)
  • lib/segment/src/segment/tests.rs (0 hunks)
  • lib/segment/src/segment_constructor/segment_constructor_base.rs (0 hunks)
  • lib/segment/tests/integration/segment_tests.rs (2 hunks)
  • lib/segment/tests/integration/sparse_vector_index_search_tests.rs (1 hunks)
  • lib/shard/src/proxy_segment/segment_entry.rs (3 hunks)
  • lib/shard/src/proxy_segment/tests.rs (1 hunks)
  • lib/shard/src/segment_holder/flush.rs (1 hunks)
  • lib/shard/src/segment_holder/mod.rs (3 hunks)
💤 Files with no reviewable changes (2)
  • lib/segment/src/segment/tests.rs
  • lib/segment/src/segment_constructor/segment_constructor_base.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

  • lib/segment/src/segment/snapshot.rs
  • lib/segment/tests/integration/sparse_vector_index_search_tests.rs
  • lib/shard/src/segment_holder/mod.rs
  • lib/segment/src/segment/entry.rs
  • lib/shard/src/proxy_segment/segment_entry.rs
  • lib/shard/src/segment_holder/flush.rs
  • lib/segment/src/entry/entry_point.rs
  • lib/segment/src/segment/mod.rs
  • lib/segment/tests/integration/segment_tests.rs
  • lib/shard/src/proxy_segment/tests.rs
  • lib/segment/src/segment/segment_ops.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

  • lib/segment/src/segment/snapshot.rs
  • lib/shard/src/segment_holder/mod.rs
  • lib/segment/src/segment/entry.rs
  • lib/shard/src/proxy_segment/segment_entry.rs
  • lib/shard/src/segment_holder/flush.rs
  • lib/segment/src/entry/entry_point.rs
  • lib/segment/src/segment/mod.rs
  • lib/shard/src/proxy_segment/tests.rs
  • lib/segment/src/segment/segment_ops.rs
**/{tests,benches}/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged

Files:

  • lib/segment/tests/integration/sparse_vector_index_search_tests.rs
  • lib/segment/tests/integration/segment_tests.rs
🧠 Learnings (1)
📚 Learning: 2025-09-01T11:42:06.964Z
Learnt from: timvisee
PR: qdrant/qdrant#7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.

Applied to files:

  • lib/shard/src/segment_holder/mod.rs
  • lib/shard/src/segment_holder/flush.rs
🧬 Code graph analysis (7)
lib/shard/src/segment_holder/mod.rs (1)
lib/segment/src/entry/entry_point.rs (1)
  • flush (277-282)
lib/segment/src/segment/entry.rs (4)
lib/segment/src/entry/entry_point.rs (3)
  • persistent_version (36-36)
  • flusher (274-274)
  • version (33-33)
lib/shard/src/proxy_segment/segment_entry.rs (3)
  • persistent_version (30-32)
  • flusher (630-634)
  • version (26-28)
lib/segment/src/index/struct_payload_index.rs (1)
  • flusher (1093-1127)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • flusher (113-113)
  • flusher (1174-1218)
lib/shard/src/proxy_segment/segment_entry.rs (2)
lib/segment/src/entry/entry_point.rs (2)
  • persistent_version (36-36)
  • flusher (274-274)
lib/segment/src/segment/entry.rs (2)
  • persistent_version (43-45)
  • flusher (627-761)
lib/shard/src/segment_holder/flush.rs (1)
lib/shard/src/segment_holder/mod.rs (4)
  • segments (941-944)
  • aloha_lock_segment_read (590-606)
  • segment_ids (306-311)
  • get (262-266)
lib/segment/src/entry/entry_point.rs (2)
lib/segment/src/segment/entry.rs (2)
  • persistent_version (43-45)
  • flusher (627-761)
lib/shard/src/proxy_segment/segment_entry.rs (2)
  • persistent_version (30-32)
  • flusher (630-634)
lib/shard/src/proxy_segment/tests.rs (1)
lib/shard/src/proxy_segment/mod.rs (1)
  • new (43-69)
lib/segment/src/segment/segment_ops.rs (2)
lib/segment/src/entry/entry_point.rs (1)
  • flusher (274-274)
lib/segment/src/segment/entry.rs (1)
  • flusher (627-761)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: e2e-tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests-consensus
  • GitHub Check: test-consistency
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: lint
  • GitHub Check: rust-tests (macos-latest)
🔇 Additional comments (8)
lib/segment/src/segment/segment_ops.rs (1)

484-485: Sequential flusher usage looks solid.

Using the returned flusher closure here keeps the repair path consistent with the single-threaded flush orchestration.

lib/shard/src/segment_holder/mod.rs (1)

1-77: Centralized flush thread holder fits the redesign.

The shared flush_thread guard and module split give SegmentHolder a clean single-thread flush coordination point.

lib/shard/src/proxy_segment/segment_entry.rs (1)

30-634: Proxy delegates to the new flush API correctly.

Forwarding persistent_version and reusing the wrapped flusher keeps proxy segments aligned with the core segment semantics.

lib/segment/src/segment/mod.rs (1)

138-149: Return type qualification is welcome.

Exporting destroy_rocksdb with the explicit OperationResult path avoids type inference surprises for callers.

lib/segment/src/entry/entry_point.rs (1)

35-282: Trait surface matches the single-thread flusher model.

Exposing persistent_version and offering a default flush that simply drives the optional flusher gives downstream code a clean, consistent contract.
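The trait shape this review comment refers to can be sketched as below. The names mirror the quoted signatures, but the bodies, version numbers, and error type are invented for illustration: flush() is a default method that simply drives the optional flusher, falling back to the already-persisted version when there is nothing to do.

```rust
type Flusher = Box<dyn FnOnce() -> Result<u64, String>>;

trait SegmentEntry {
    fn persistent_version(&self) -> u64;
    fn flusher(&self, force: bool) -> Option<Flusher>;

    // Default: run the flusher if there is one; otherwise nothing changed,
    // so the persisted version is already the guaranteed one.
    fn flush(&self, force: bool) -> Result<u64, String> {
        match self.flusher(force) {
            Some(flusher) => flusher(),
            None => Ok(self.persistent_version()),
        }
    }
}

// A segment with unsaved changes: flushing advances the persisted version.
struct Dirty;
impl SegmentEntry for Dirty {
    fn persistent_version(&self) -> u64 {
        3
    }
    fn flusher(&self, _force: bool) -> Option<Flusher> {
        Some(Box::new(|| Ok(4)))
    }
}

// A fully persisted segment: nothing to flush.
struct Clean;
impl SegmentEntry for Clean {
    fn persistent_version(&self) -> u64 {
        9
    }
    fn flusher(&self, _force: bool) -> Option<Flusher> {
        None
    }
}

fn main() {
    assert_eq!(Dirty.flush(false), Ok(4));
    assert_eq!(Clean.flush(false), Ok(9));
}
```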

lib/segment/src/segment/entry.rs (3)

12-14: LGTM: Flusher import added.

The addition of Flusher to the imports aligns with the new flush mechanism where segments return closures instead of performing inline flushes.


43-45: LGTM: Persistent version accessor added.

The new persistent_version() method correctly reads the persisted version with proper locking and provides a sensible default of 0 for None.


627-761: Well-designed flush deferral mechanism.

The refactoring from inline flush to closure-based flusher enables sequential execution by the caller (SegmentHolder), which is the core objective of this PR. The implementation correctly:

  • Returns None early when there's nothing to flush
  • Captures all necessary flushers and state
  • Implements proper flush ordering (id_tracker mapping → vectors → payload → versions)
  • Guards against concurrent flushes with persisted_version locking
  • Maintains atomicity by checking versions before and after flushing

The detailed comments (lines 662-707) excellently document the flush ordering rationale and recovery scenarios.


/// Returns maximum version number which is guaranteed to be persisted.
fn flush(&self, sync: bool, force: bool) -> OperationResult<SeqNumberType>;
/// Returned function returns maximum version number which is guaranteed to be persisted.
fn flusher(&self, force: bool) -> Option<Flusher>;
@timvisee timvisee Oct 13, 2025
I really like that we have one boolean parameter now.

This currently does not explicitly block parallelism. I'm not sure how this behaves if we flush from two different places. If you haven't done so yet, this might need careful checking.

We do have #7388 (comment), but I'm not confident it is sound.

Note-to-self: check flush implementation in all storages

Member
I manually checked all flushers. All are fine, with one exception and one note:

  1. the mutable ID tracker does not support concurrent flushes; we cannot open the files on disk twice concurrently
  2. the last persisted version in the segment holder is not sound when concurrent flushes happen; we might acknowledge the wrong version

Since these two problems don't currently manifest, I propose to handle them in a separate PR once all our current work is merged.

@timvisee
Member

To fully fix segment ordering, even with proxies, I think we also need a change like this. Though I suggest to keep the PRs separate.

@agourlay
Member

FYI running Crasher on this branch locally shows no sign of missing points after 30 minutes of runtime 🤞

@timvisee timvisee mentioned this pull request Oct 13, 2025
6 tasks
@timvisee
Member

#7381 is now adjusted and rebased on top of this PR.

@generall
Member Author

depends on #7400 to fix tests

@generall generall force-pushed the single-thread-flush branch from 995bf4f to 239e6c6 on October 14, 2025 13:16
@timvisee timvisee merged commit b8590a9 into dev Oct 14, 2025
15 checks passed
@timvisee timvisee deleted the single-thread-flush branch October 14, 2025 14:39
timvisee added a commit that referenced this pull request Nov 14, 2025
* flush all segments in one thread sequentially

* bonus: move flush-related functions into dedicated file

* Minor comment tweaks

* await for flush on segment holder level

* fmt

* Minor improvement, preallocate vector for payload index flushers

* Remove invalid comment

---------

Co-authored-by: timvisee <tim@visee.me>
@timvisee timvisee mentioned this pull request Nov 14, 2025