Conversation
```rust
// Joins flush thread if exists
// Returns lock to guarantee that there will be no other flush in a different thread
pub(super) fn lock_flushing(
    // ...
}
```
```rust
#[test]
fn test_background_flush() {
```
Maybe we need a new test to replace this one?
```rust
///
/// If there are unsaved changes after flush - detects lowest unsaved change version.
/// If all changes are saved - returns max version.
fn get_max_persisted_version(
```
This function doesn't require flush to collect persisted versions anymore
Actionable comments posted: 2
🧹 Nitpick comments (1)
lib/segment/src/segment/entry.rs (1)
630-643: Prefer explicit match arm over catch-all pattern.

The catch-all pattern `(_, _) => {}` at line 642 handles the remaining case `(Some(_), None)` where data exists but hasn't been persisted yet. Per coding guidelines, prefer explicit patterns to make the intent clearer and catch future enum variants. Apply this diff to make the pattern explicit:

```diff
 match (self.version, current_persisted_version) {
     (None, _) => {
         // Segment is empty, nothing to flush
         return None;
     }
     (Some(version), Some(persisted_version)) => {
         if !force && version == persisted_version {
             log::trace!("not flushing because version == persisted_version");
             // Segment is already flushed
             return None;
         }
     }
-    (_, _) => {}
+    (Some(_), None) => {
+        // Segment has data but hasn't been persisted yet
+    }
 }
```
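The guideline's motivation can be shown with a minimal, hypothetical sketch (the enum and function below are illustrative, not code from this PR): an exhaustive match forces a compile error when a new variant appears later, while a `_` arm would silently swallow it.

```rust
// Hypothetical enum illustrating the exhaustive-match guideline.
#[derive(Debug)]
pub enum FlushState {
    Clean,
    Dirty,
    // If a `Flushing` variant were added later, the exhaustive match
    // below would fail to compile until the new case is handled.
}

/// Exhaustive match: no `_` arm, so every variant must be named.
pub fn describe(state: &FlushState) -> &'static str {
    match state {
        FlushState::Clean => "nothing to flush",
        FlushState::Dirty => "needs flush",
    }
}
```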
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- lib/segment/src/entry/entry_point.rs (3 hunks)
- lib/segment/src/segment/entry.rs (4 hunks)
- lib/segment/src/segment/mod.rs (2 hunks)
- lib/segment/src/segment/segment_ops.rs (1 hunks)
- lib/segment/src/segment/snapshot.rs (1 hunks)
- lib/segment/src/segment/tests.rs (0 hunks)
- lib/segment/src/segment_constructor/segment_constructor_base.rs (0 hunks)
- lib/segment/tests/integration/segment_tests.rs (2 hunks)
- lib/segment/tests/integration/sparse_vector_index_search_tests.rs (1 hunks)
- lib/shard/src/proxy_segment/segment_entry.rs (3 hunks)
- lib/shard/src/proxy_segment/tests.rs (1 hunks)
- lib/shard/src/segment_holder/flush.rs (1 hunks)
- lib/shard/src/segment_holder/mod.rs (3 hunks)
💤 Files with no reviewable changes (2)
- lib/segment/src/segment/tests.rs
- lib/segment/src/segment_constructor/segment_constructor_base.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
- lib/segment/src/segment/snapshot.rs
- lib/segment/tests/integration/sparse_vector_index_search_tests.rs
- lib/shard/src/segment_holder/mod.rs
- lib/segment/src/segment/entry.rs
- lib/shard/src/proxy_segment/segment_entry.rs
- lib/shard/src/segment_holder/flush.rs
- lib/segment/src/entry/entry_point.rs
- lib/segment/src/segment/mod.rs
- lib/segment/tests/integration/segment_tests.rs
- lib/shard/src/proxy_segment/tests.rs
- lib/segment/src/segment/segment_ops.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
- lib/segment/src/segment/snapshot.rs
- lib/shard/src/segment_holder/mod.rs
- lib/segment/src/segment/entry.rs
- lib/shard/src/proxy_segment/segment_entry.rs
- lib/shard/src/segment_holder/flush.rs
- lib/segment/src/entry/entry_point.rs
- lib/segment/src/segment/mod.rs
- lib/shard/src/proxy_segment/tests.rs
- lib/segment/src/segment/segment_ops.rs
**/{tests,benches}/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged
Files:
- lib/segment/tests/integration/sparse_vector_index_search_tests.rs
- lib/segment/tests/integration/segment_tests.rs
🧠 Learnings (1)
📚 Learning: 2025-09-01T11:42:06.964Z
Learnt from: timvisee
PR: qdrant/qdrant#7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.
Applied to files:
- lib/shard/src/segment_holder/mod.rs
- lib/shard/src/segment_holder/flush.rs
🧬 Code graph analysis (7)
lib/shard/src/segment_holder/mod.rs (1)
lib/segment/src/entry/entry_point.rs (1)
- flush (277-282)
lib/segment/src/segment/entry.rs (4)
lib/segment/src/entry/entry_point.rs (3)
- persistent_version (36-36), flusher (274-274), version (33-33)

lib/shard/src/proxy_segment/segment_entry.rs (3)
- persistent_version (30-32), flusher (630-634), version (26-28)

lib/segment/src/index/struct_payload_index.rs (1)
- flusher (1093-1127)

lib/segment/src/vector_storage/vector_storage_base.rs (2)
- flusher (113-113), flusher (1174-1218)
lib/shard/src/proxy_segment/segment_entry.rs (2)
lib/segment/src/entry/entry_point.rs (2)
- persistent_version (36-36), flusher (274-274)

lib/segment/src/segment/entry.rs (2)
- persistent_version (43-45), flusher (627-761)
lib/shard/src/segment_holder/flush.rs (1)
lib/shard/src/segment_holder/mod.rs (4)
- segments (941-944), aloha_lock_segment_read (590-606), segment_ids (306-311), get (262-266)
lib/segment/src/entry/entry_point.rs (2)
lib/segment/src/segment/entry.rs (2)
- persistent_version (43-45), flusher (627-761)

lib/shard/src/proxy_segment/segment_entry.rs (2)
- persistent_version (30-32), flusher (630-634)
lib/shard/src/proxy_segment/tests.rs (1)
lib/shard/src/proxy_segment/mod.rs (1)
- new (43-69)
lib/segment/src/segment/segment_ops.rs (2)
lib/segment/src/entry/entry_point.rs (1)
- flusher (274-274)

lib/segment/src/segment/entry.rs (1)
- flusher (627-761)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: e2e-tests
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests-consensus
- GitHub Check: test-consistency
- GitHub Check: integration-tests
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: lint
- GitHub Check: rust-tests (macos-latest)
🔇 Additional comments (8)
lib/segment/src/segment/segment_ops.rs (1)
484-485: Sequential flusher usage looks solid. Using the returned flusher closure here keeps the repair path consistent with the single-threaded flush orchestration.
lib/shard/src/segment_holder/mod.rs (1)
1-77: Centralized flush thread holder fits the redesign. The shared `flush_thread` guard and module split give SegmentHolder a clean single-thread flush coordination point.

lib/shard/src/proxy_segment/segment_entry.rs (1)
30-634: Proxy delegates to the new flush API correctly. Forwarding `persistent_version` and reusing the wrapped flusher keeps proxy segments aligned with the core segment semantics.

lib/segment/src/segment/mod.rs (1)
138-149: Return type qualification is welcome. Exporting `destroy_rocksdb` with the explicit `OperationResult` path avoids type inference surprises for callers.

lib/segment/src/entry/entry_point.rs (1)
35-282: Trait surface matches the single-thread flusher model. Exposing `persistent_version` and offering a default `flush` that simply drives the optional flusher gives downstream code a clean, consistent contract.

lib/segment/src/segment/entry.rs (3)
12-14: LGTM: Flusher import added. The addition of `Flusher` to the imports aligns with the new flush mechanism where segments return closures instead of performing inline flushes.
43-45: LGTM: Persistent version accessor added. The new `persistent_version()` method correctly reads the persisted version with proper locking and provides a sensible default of 0 for `None`.
627-761: Well-designed flush deferral mechanism. The refactoring from inline flush to a closure-based flusher enables sequential execution by the caller (SegmentHolder), which is the core objective of this PR. The implementation correctly:

- Returns `None` early when there's nothing to flush
- Captures all necessary flushers and state
- Implements proper flush ordering (id_tracker mapping → vectors → payload → versions)
- Guards against concurrent flushes with `persisted_version` locking
- Maintains atomicity by checking versions before and after flushing

The detailed comments (lines 662-707) excellently document the flush ordering rationale and recovery scenarios.
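This capture-then-run shape can be sketched with toy types standing in for the real segment (nothing below is the actual Qdrant implementation): `flusher()` snapshots the current version and collects sub-flushers, and the returned closure runs them in a fixed order before advancing the persisted version.

```rust
use std::sync::{Arc, Mutex};

// Toy flusher type; the real one also deals with I/O errors.
pub type Flusher = Box<dyn FnOnce() -> u64 + Send>;

pub struct ToySegment {
    pub version: u64,
    pub persisted_version: Arc<Mutex<u64>>,
}

impl ToySegment {
    pub fn flusher(&self, force: bool) -> Option<Flusher> {
        let version = self.version;
        let persisted = Arc::clone(&self.persisted_version);
        if !force && *persisted.lock().unwrap() == version {
            return None; // already flushed, nothing to do
        }
        // Sub-flushers are captured up front; the real code collects
        // id-tracker, vector-storage and payload-index flushers here.
        let steps: Vec<Box<dyn FnOnce() + Send>> = vec![
            Box::new(|| { /* flush id tracker mappings */ }),
            Box::new(|| { /* flush vector storages */ }),
            Box::new(|| { /* flush payload index */ }),
        ];
        Some(Box::new(move || {
            for step in steps {
                step(); // fixed order, one step after another
            }
            // The version is persisted last, so all data it refers to
            // is already on disk by then.
            *persisted.lock().unwrap() = version;
            version
        }))
    }
}
```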
```rust
/// Returns maximum version number which is guaranteed to be persisted.
fn flush(&self, sync: bool, force: bool) -> OperationResult<SeqNumberType>;

/// Returned function returns maximum version number which is guaranteed to be persisted.
fn flusher(&self, force: bool) -> Option<Flusher>;
```
I really like that we have one boolean parameter now.
This currently does not explicitly block parallelism. I'm not sure how this behaves if we flush from two different places. If you haven't done so yet, this might need careful checking.
We do have #7388 (comment), but I'm not confident it is sound.
Note-to-self: check flush implementation in all storages
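As a rough illustration of how the `flusher`-based trait surface quoted above can still offer a convenient `flush`, here is a toy sketch with simplified stand-in types (not the actual Qdrant trait): the default `flush` just drives the optional flusher.

```rust
// Simplified stand-ins for the real SeqNumberType/OperationResult.
pub type SeqNumberType = u64;
pub type Flusher = Box<dyn FnOnce() -> SeqNumberType + Send>;

pub trait SegmentLike {
    /// Version that is already guaranteed to be on disk.
    fn persistent_version(&self) -> SeqNumberType;

    /// `None` means there is nothing to flush right now.
    fn flusher(&self, force: bool) -> Option<Flusher>;

    /// Default: run the flusher if there is one, otherwise just
    /// report the already-persisted version.
    fn flush(&self, force: bool) -> SeqNumberType {
        match self.flusher(force) {
            Some(flush_fn) => flush_fn(),
            None => self.persistent_version(),
        }
    }
}

pub struct CleanSegment {
    pub persisted: SeqNumberType,
}

impl SegmentLike for CleanSegment {
    fn persistent_version(&self) -> SeqNumberType {
        self.persisted
    }
    fn flusher(&self, _force: bool) -> Option<Flusher> {
        None // nothing changed since the last flush
    }
}
```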
I manually checked all flushers. All are fine, with one exception and one note:
- the mutable ID tracker does not support concurrent flushes; we cannot open the files on disk twice concurrently
- the last persisted version in the segment holder is not sound when concurrent flushes are happening; we might acknowledge the wrong version
Since these two problems currently don't clearly show, I propose to handle them in a separate PR once all our current work is merged.
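The `lock_flushing` helper quoted at the top of this review hints at how a single-flush-thread guarantee can be enforced; a toy sketch under assumed names (not the actual Qdrant code): join any in-flight flush thread, then return the guard so no other flush can start while it is held.

```rust
use std::sync::{Mutex, MutexGuard};
use std::thread::JoinHandle;

pub struct FlushCoordinator {
    flush_thread: Mutex<Option<JoinHandle<()>>>,
}

impl FlushCoordinator {
    pub fn new() -> Self {
        Self { flush_thread: Mutex::new(None) }
    }

    /// Spawns a background flush, first waiting out any previous one.
    pub fn spawn_flush(&self, flush: impl FnOnce() + Send + 'static) {
        let mut guard = self.lock_flushing();
        *guard = Some(std::thread::spawn(flush));
    }

    /// Joins the flush thread if it exists and returns the lock,
    /// guaranteeing there is no flush running in a different thread
    /// for as long as the guard is alive.
    pub fn lock_flushing(&self) -> MutexGuard<'_, Option<JoinHandle<()>>> {
        let mut guard = self.flush_thread.lock().unwrap();
        if let Some(handle) = guard.take() {
            handle.join().unwrap(); // wait for the previous flush to finish
        }
        guard
    }
}
```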
To fully fix segment ordering, even with proxies, I think we also need a change like this, though I suggest keeping the PRs separate.

FYI: running Crasher on this branch locally shows no sign of missing points after 30 minutes of runtime 🤞

#7381 is now adjusted and rebased on top of this PR.
depends on #7400 to fix tests |
995bf4f to 239e6c6
* flush all segments in one thread sequentially
* bonus: move flush-related functions into dedicated file
* Minor comment tweaks
* await for flush on segment holder level
* fmt
* Minor improvement, preallocate vector for payload index flushers
* Remove invalid comment

Co-authored-by: timvisee <tim@visee.me>
Currently, we rely on segment flush ordering in case of copy-on-write behaviour.
For example:
To compensate for this, we currently perform flush in a very specific order: we flush immutable segments first.

There is, however, a problem. If we use `sync=false`, each segment flush operation is spawned in a dedicated thread. Starting from this point, there are no guarantees that a segment is going to be flushed in the same order as it was scheduled for flushing. To demonstrate this, it is enough to include a random sleep into the `flush_op` function.

What this PR does:

* flush logic is extracted into a `flusher` object
* `SegmentHolder` owns a single common thread for flushing all segments sequentially
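A minimal sketch of that sequential flush loop, with toy types (the real code works with segment locks, proxies, and `OperationResult`): each segment contributes either a flusher closure or its already-persisted version, and the holder acknowledges the minimum, so the WAL can never be acknowledged past the slowest segment.

```rust
// Toy flusher type returning the persisted version.
pub type Flusher = Box<dyn FnOnce() -> u64>;

/// Flushes every segment sequentially on the calling thread and
/// returns the minimum persisted version across all segments.
pub fn flush_all(segments: Vec<(u64, Option<Flusher>)>) -> u64 {
    segments
        .into_iter()
        .map(|(persisted_version, flusher)| match flusher {
            // Flushers run right here, in order, on a single thread.
            Some(flush_fn) => flush_fn(),
            // Clean segment: report what is already persisted.
            None => persisted_version,
        })
        .min()
        // Zero segments would acknowledge a bogus version and could
        // corrupt the WAL, so failing fast here is intentional.
        .expect("flush_all called with no segments")
}
```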