Fix payload indexes drop/flush sequence by agourlay · Pull Request #7626 · qdrant/qdrant

agourlay · 2025-11-27T11:27:12Z

Follow up on #7621

I performed an audit to make sure that other types of payload index also:

drop their mmap handles before removing the files on disk
do not drop during flush
do not have concurrent flush

The MutableBoolIndex uses BufferedDynamicFlags which flushes properly since #7621

All the others indexes modified are built on top of MmapBitSliceBufferedUpdateWrapper which this PR fixes.

I also implemented the following previous suggestions:

agourlay · 2025-11-27T20:25:00Z

lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs


    pub fn wipe(self) -> OperationResult<()> {
        let files = self.files();
-        let Self { path, .. } = self;


the deconstruction is semantically equivalent but feels a bit hacky compared to an explicit drop

coszio

LGTM

timvisee · 2025-12-01T16:06:02Z

lib/segment/src/common/flags/buffered_dynamic_flags.rs

+        // Wait for all background flush operations to finish, and cancel future flushes
+        _ = self.flushing_lock.lock();


I don't think this works.

When we obtain this lock, the Arcs are still alive. And so a flusher thread is still able to upgrade, lock and flush after this drop function if the timing is right.

Thanks for catching this, I will revert all usages for having a lock of a bool to signal properly that the drop occurred

Applied revert b1003fb

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 824c5ad and 7ebd324.

📒 Files selected for processing (8)

lib/gridstore/src/gridstore.rs (1 hunks)
lib/segment/src/common/flags/buffered_dynamic_flags.rs (3 hunks)
lib/segment/src/common/mmap_bitslice_buffered_update_wrapper.rs (3 hunks)
lib/segment/src/index/field_index/bool_index/mutable_bool_index.rs (1 hunks)
lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (1 hunks)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1 hunks)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1 hunks)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust
In new code, don't use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice - use bytemuck or zerocopy crates instead
Prefer explicit exhaustive matches over catch-all _ arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using : _ over using .. in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs
lib/segment/src/common/flags/buffered_dynamic_flags.rs
lib/segment/src/common/mmap_bitslice_buffered_update_wrapper.rs
lib/gridstore/src/gridstore.rs
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs
lib/segment/src/index/field_index/map_index/mmap_map_index.rs
lib/segment/src/index/field_index/bool_index/mutable_bool_index.rs

🧠 Learnings (8)

📓 Common learnings

Learnt from: JojiiOfficial
Repo: qdrant/qdrant PR: 5951
File: lib/api/src/grpc/qdrant.rs:6493-6496
Timestamp: 2025-03-12T09:25:16.340Z
Learning: The `payload_index_io_read` field in `HardwareUsage` was introduced in PR #5951 and is safe to modify as it was not used in production before this PR.

📚 Learning: 2025-08-15T11:41:10.629Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs
lib/segment/src/index/field_index/map_index/mmap_map_index.rs

📚 Learning: 2025-09-01T11:42:06.964Z

Learnt from: timvisee
Repo: qdrant/qdrant PR: 7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.

Applied to files:

lib/segment/src/common/flags/buffered_dynamic_flags.rs

📚 Learning: 2025-10-13T09:34:22.740Z

Learnt from: timvisee
Repo: qdrant/qdrant PR: 7388
File: lib/shard/src/segment_holder/flush.rs:73-76
Timestamp: 2025-10-13T09:34:22.740Z
Learning: In Qdrant's SegmentHolder flush logic, when `sync=false` and a background flush is already running, returning early is acceptable because: (1) flushes normally only start if there are changes, (2) the lowest version number is acknowledged in the WAL when changes exist, and (3) missed operations will replay on restart. However, an edge case exists when the `force` flag is used: if the first forced flush has no changes but a follow-up non-force flush with changes is ignored, the system might acknowledge a version that is too recent.

Applied to files:

lib/segment/src/common/flags/buffered_dynamic_flags.rs

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/segment/src/common/mmap_bitslice_buffered_update_wrapper.rs
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs

📚 Learning: 2025-06-06T07:52:38.478Z

Learnt from: timvisee
Repo: qdrant/qdrant PR: 6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.

Applied to files:

lib/gridstore/src/gridstore.rs

📚 Learning: 2025-04-17T10:52:59.516Z

Learnt from: ffuugoo
Repo: qdrant/qdrant PR: 6352
File: lib/segment/src/id_tracker/mutable_id_tracker.rs:289-299
Timestamp: 2025-04-17T10:52:59.516Z
Learning: In Qdrant's MutableIdTracker, FileVersionTracker flushers are designed to capture the pending version at creation time (not at flush time) to ensure they remain in sync with pending_mappings flusher. This design ensures consistency between data and version information during flush operations.

Applied to files:

lib/gridstore/src/gridstore.rs

📚 Learning: 2025-09-01T11:19:26.371Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.

Applied to files:

lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs

🧬 Code graph analysis (4)

lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (4)

lib/gridstore/src/bitmask/gaps.rs (1)

path (90-92)

lib/segment/src/fixtures/payload_context_fixture.rs (1)

drop (86-89)

lib/segment/src/segment/mod.rs (1)

drop (109-143)

lib/common/memory/src/fadvise.rs (1)

drop (98-104)

lib/segment/src/common/flags/buffered_dynamic_flags.rs (1)

lib/segment/src/common/mmap_bitslice_buffered_update_wrapper.rs (1)

new (23-32)

lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (2)

lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (1)

path (53-55)

lib/gridstore/src/bitmask/gaps.rs (1)

path (90-92)

lib/segment/src/index/field_index/bool_index/mutable_bool_index.rs (1)

lib/segment/src/common/flags/buffered_dynamic_flags.rs (1)

drop (97-100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: e2e-tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-consensus-compose
GitHub Check: test-consistency
GitHub Check: integration-tests
GitHub Check: integration-tests-consensus
GitHub Check: lint
GitHub Check: Build Qdrant Edge Python bindings
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (macos-latest)

🔇 Additional comments (10)

lib/gridstore/src/gridstore.rs (1)

576-579: LGTM!

The updated comment clearly explains the purpose of using weak references in the flusher pattern—detecting if the Gridstore instance has been dropped to abort the flush gracefully.

lib/segment/src/common/mmap_bitslice_buffered_update_wrapper.rs (2)

18-31: LGTM on the flushing_lock field and initialization.

The weak reference pattern combined with the flushing lock provides proper lifecycle management for concurrent flushes.

73-101: LGTM on the flusher implementation.

The flusher correctly:

Uses weak references to detect if the instance has been dropped

Acquires the flushing lock with let _flushing_lock_guard = flushing_lock.lock() which keeps the guard alive for the duration of the flush

lib/segment/src/common/flags/buffered_dynamic_flags.rs (2)

24-35: LGTM on the flushing_lock field and initialization.

Consistent with the pattern used in MmapBitSliceBufferedUpdateWrapper.

52-93: LGTM on the flusher implementation.

The flusher correctly uses let _flushing_lock_guard = flushing_lock.lock() to keep the guard alive during the flush operation.

lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)

200-210: LGTM on the wipe() implementation.

The updated sequence correctly ensures mmap handles are released before file deletion:

Collect files and clone path while self is valid

Drop self to release mmap resources

Delete files and directory

This prevents potential file locking issues (especially on Windows) and aligns with the PR's objective of ensuring proper resource cleanup order.

lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (1)

57-67: LGTM on the wipe() implementation.

Consistent with the pattern applied across other mmap index implementations in this PR. The explicit drop(self) before file deletion ensures mmap handles are properly released.

lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)

153-163: LGTM! Correct mmap handle cleanup before file deletion.

The pattern is correctly implemented: files() is captured while self is still valid, then path is cloned, self is dropped to release mmap handles, and finally files are deleted. This ensures proper resource cleanup order.

lib/segment/src/index/field_index/bool_index/mutable_bool_index.rs (1)

310-319: LGTM! Correct mmap handle cleanup before directory deletion.

The pattern correctly ensures mmap handles are released before filesystem operations. The Drop implementation in BufferedDynamicFlags (seen in relevant snippets) will wait for any in-flight flush operations to complete before the directory is removed.

lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)

343-353: LGTM! Correct mmap handle cleanup before file deletion.

The explicit drop(self) approach is cleaner than deconstruction (addressing the prior review feedback). The pattern is consistent with other index implementations in this PR: capture files, clone path, drop self to release mmap handles, then perform filesystem operations.

lib/segment/src/common/flags/buffered_dynamic_flags.rs

lib/segment/src/common/mmap_bitslice_buffered_update_wrapper.rs

This reverts commit 7ebd324.

timvisee

Nice!

timvisee · 2025-12-02T09:24:57Z

lib/segment/src/common/mmap_bitslice_buffered_update_wrapper.rs

+            let (Some(bitslice), Some(pending_updates_arc)) =
+                (bitslice.upgrade(), pending_updates_arc.upgrade())
+            else {
+                log::debug!(
+                    "Aborted flushing on a dropped MmapBitSliceBufferedUpdateWrapper instance"
+                );
+                return Ok(());
+            };


I like that we still use weak refs.

It allows us to more reliably drop the Arc'ed structures together with the structure itself. Rather than having to wait on the flushers to 'recognize' the stop too.

* Fix release mmap handles before removing files * weak references to handle after drop * protect from concurrent drop/flush * simplify locking * Revert "simplify locking" This reverts commit 7ebd324.

agourlay mentioned this pull request Nov 27, 2025

Fix BufferedDynamicFlags flushing with concurrent drop/deletion #7621

Merged

timvisee added the release:1.16.2 label Nov 27, 2025

agourlay force-pushed the audit-release-mmap-handles-before-removing-files branch from a3d2681 to 6663b69 Compare November 27, 2025 15:59

agourlay marked this pull request as ready for review November 27, 2025 20:06

This comment was marked as resolved.

Sign in to view

agourlay changed the title ~~Fix release mmap handles before removing files~~ Fix payload indexes drop/flush sequence Nov 27, 2025

agourlay force-pushed the audit-release-mmap-handles-before-removing-files branch from 7ada4e5 to 31f5893 Compare November 27, 2025 20:14

agourlay commented Nov 27, 2025

View reviewed changes

agourlay requested review from coszio and timvisee November 27, 2025 20:26

coszio approved these changes Nov 28, 2025

View reviewed changes

qdrant deleted a comment from coderabbitai bot Nov 28, 2025

timvisee reviewed Dec 1, 2025

View reviewed changes

agourlay added 4 commits December 1, 2025 19:23

Fix release mmap handles before removing files

ce1ee08

weak references to handle after drop

f7ab784

protect from concurrent drop/flush

dde7304

simplify locking

7ebd324

agourlay force-pushed the audit-release-mmap-handles-before-removing-files branch from 31f5893 to 7ebd324 Compare December 1, 2025 18:23

coderabbitai bot reviewed Dec 1, 2025

View reviewed changes

lib/segment/src/common/flags/buffered_dynamic_flags.rs Show resolved Hide resolved

lib/segment/src/common/mmap_bitslice_buffered_update_wrapper.rs Show resolved Hide resolved

Revert "simplify locking"

b1003fb

This reverts commit 7ebd324.

qdrant deleted a comment from coderabbitai bot Dec 2, 2025

timvisee approved these changes Dec 2, 2025

View reviewed changes

timvisee merged commit 86ced6d into dev Dec 2, 2025
15 checks passed

timvisee deleted the audit-release-mmap-handles-before-removing-files branch December 2, 2025 09:27

timvisee mentioned this pull request Dec 3, 2025

Bump version to 1.16.2 #7680

Merged

1 task

claude bot mentioned this pull request Dec 4, 2025

chore(deps): update qdrant/qdrant docker tag to v1.16.2 cbcoutinho/nextcloud-mcp-server#372

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix payload indexes drop/flush sequence#7626

Fix payload indexes drop/flush sequence#7626
timvisee merged 5 commits intodevfrom
audit-release-mmap-handles-before-removing-files

agourlay commented Nov 27, 2025 •

edited

Loading

Uh oh!

This comment was marked as resolved.

Uh oh!

agourlay Nov 27, 2025

Uh oh!

coszio left a comment

Uh oh!

timvisee Dec 1, 2025

Uh oh!

agourlay Dec 1, 2025 •

edited

Loading

Uh oh!

agourlay Dec 1, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

timvisee left a comment

Uh oh!

timvisee Dec 2, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		// Wait for all background flush operations to finish, and cancel future flushes
		_ = self.flushing_lock.lock();

Conversation

agourlay commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

agourlay Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

coszio left a comment

Choose a reason for hiding this comment

Uh oh!

timvisee Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

agourlay Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

agourlay Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

timvisee left a comment

Choose a reason for hiding this comment

Uh oh!

timvisee Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

agourlay commented Nov 27, 2025 •

edited

Loading

agourlay Dec 1, 2025 •

edited

Loading