Conversation
📝 WalkthroughWalkthroughThis set of changes introduces a new Suggested reviewers
Tip ⚡💬 Agentic Chat (Pro Plan, General Availability)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
⏰ Context from checks skipped due to timeout of 90000ms (13)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/id_tracker/mutable_id_tracker.rs (1)
659-675: Consider usingOneshotFilefor loading versions as well.The
load_versionsfunction still uses a standardFilewhile theload_mappingsfunction was updated to useOneshotFile. For consistency and to fully implement the cache optimization strategy, consider updating this function as well.- let file = File::open(versions_path)?; + let file = OneshotFile::open(versions_path)?; let file_len = file.metadata()?.len(); if file_len % VERSION_ELEMENT_SIZE != 0 { log::warn!( "Corrupted ID tracker versions storage, file size not a multiple of a version, assuming automatic recovery by WAL" ); } let version_count = file_len / VERSION_ELEMENT_SIZE; let mut reader = BufReader::new(file); - Ok((0..version_count) + let result = Ok((0..version_count) .map(|_| reader.read_u64::<FileEndianess>()) - .collect::<Result<_, _>>()?) + .collect::<Result<_, _>>()?); + + // Drop the cache after reading + let file = reader.into_inner(); + file.drop_cache()?; + + result
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (23)
lib/common/memory/Cargo.toml(2 hunks)lib/common/memory/build.rs(1 hunks)lib/common/memory/src/fadvise.rs(1 hunks)lib/common/memory/src/lib.rs(1 hunks)lib/common/memory/src/madvise.rs(0 hunks)lib/gridstore/src/bitmask/gaps.rs(1 hunks)lib/gridstore/src/bitmask/mod.rs(1 hunks)lib/gridstore/src/page.rs(1 hunks)lib/quantization/src/encoded_storage.rs(2 hunks)lib/segment/src/id_tracker/mutable_id_tracker.rs(3 hunks)lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs(1 hunks)lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs(1 hunks)lib/segment/src/index/field_index/map_index/mmap_map_index.rs(1 hunks)lib/segment/src/index/field_index/mmap_point_to_values.rs(1 hunks)lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs(1 hunks)lib/segment/src/index/hnsw_index/hnsw.rs(1 hunks)lib/segment/src/vector_storage/chunked_mmap_vectors.rs(1 hunks)lib/segment/src/vector_storage/chunked_vectors.rs(3 hunks)lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs(1 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(1 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(1 hunks)lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs(1 hunks)lib/sparse/src/index/inverted_index/inverted_index_mmap.rs(1 hunks)
💤 Files with no reviewable changes (1)
- lib/common/memory/src/madvise.rs
🧰 Additional context used
🧬 Code Graph Analysis (15)
lib/gridstore/src/bitmask/mod.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise(11-13)clear_disk_cache(18-26)lib/common/memory/src/madvise.rs (4)
madvise(93-95)madvise(101-101)madvise(107-113)madvise(131-137)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
lib/common/memory/src/fadvise.rs (2)
fadvise(11-13)clear_disk_cache(18-26)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
lib/common/memory/src/fadvise.rs (2)
fadvise(11-13)clear_disk_cache(18-26)
lib/gridstore/src/page.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise(11-13)clear_disk_cache(18-26)lib/common/memory/src/madvise.rs (4)
madvise(93-95)madvise(101-101)madvise(107-113)madvise(131-137)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
lib/common/memory/src/fadvise.rs (2)
fadvise(11-13)clear_disk_cache(18-26)
lib/segment/src/index/field_index/mmap_point_to_values.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise(11-13)clear_disk_cache(18-26)lib/common/memory/src/madvise.rs (4)
madvise(93-95)madvise(101-101)madvise(107-113)madvise(131-137)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise(11-13)clear_disk_cache(18-26)lib/common/memory/src/madvise.rs (4)
madvise(93-95)madvise(101-101)madvise(107-113)madvise(131-137)
lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise(11-13)clear_disk_cache(18-26)lib/common/memory/src/madvise.rs (4)
madvise(93-95)madvise(101-101)madvise(107-113)madvise(131-137)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise(11-13)clear_disk_cache(18-26)lib/common/memory/src/madvise.rs (4)
madvise(93-95)madvise(101-101)madvise(107-113)madvise(131-137)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise(11-13)clear_disk_cache(18-26)lib/common/memory/src/madvise.rs (4)
madvise(93-95)madvise(101-101)madvise(107-113)madvise(131-137)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise(11-13)clear_disk_cache(18-26)lib/common/memory/src/madvise.rs (4)
madvise(93-95)madvise(101-101)madvise(107-113)madvise(131-137)
lib/quantization/src/encoded_storage.rs (1)
lib/common/memory/src/fadvise.rs (1)
fadvise(11-13)
lib/gridstore/src/bitmask/gaps.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise(11-13)clear_disk_cache(18-26)lib/common/memory/src/madvise.rs (4)
madvise(93-95)madvise(101-101)madvise(107-113)madvise(131-137)
lib/common/memory/src/lib.rs (1)
lib/common/memory/src/fadvise.rs (1)
fadvise(11-13)
lib/segment/src/id_tracker/mutable_id_tracker.rs (1)
lib/common/memory/src/fadvise.rs (1)
fadvise(11-13)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: integration-tests
- GitHub Check: lint
- GitHub Check: storage-compat-test
🔇 Additional comments (32)
lib/common/memory/Cargo.toml (2)
16-16: Dependency addition is appropriate.Adding the
delegatecrate as a workspace dependency is justified for trait delegation in the newfadvisemodule.
28-28: Good formatting: newline at end of file.Adding a newline at EOF is a standard formatting best practice.
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
12-12: Import update is correct.Switching the import of
clear_disk_cachetomemory::fadviseis accurate and aligns with the new implementation.lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
16-16: Import update is correct.Switching the import of
clear_disk_cachetomemory::fadviseis accurate and consistent with the new implementation.lib/gridstore/src/page.rs (1)
4-4: Import update is correct.Switching the import of
clear_disk_cachetomemory::fadviseis accurate and aligns with the new implementation.lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)
9-9: Import update is correct.Switching the import of
clear_disk_cachetomemory::fadviseis accurate and consistent with the new implementation.lib/segment/src/vector_storage/chunked_mmap_vectors.rs (1)
11-11: Import path update is correct.The import of
clear_disk_cachefrommemory::fadviseis accurate and aligns with the refactor. No further action needed.lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
11-11: Import path update is correct.The import of
clear_disk_cachefrommemory::fadviseis correct and matches the refactor. No further action needed.lib/gridstore/src/bitmask/gaps.rs (1)
5-5: Import path update is correct.The import of
clear_disk_cachefrommemory::fadviseis correct and matches the refactor. No further action needed.lib/segment/src/index/field_index/mmap_point_to_values.rs (1)
6-6: Import path update is correct.The import of
clear_disk_cachefrommemory::fadviseis correct and matches the refactor. No further action needed.lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (1)
11-11: Import path update is correct.The import of
clear_disk_cachefrommemory::fadviseis correct and matches the refactor. No further action needed.lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (1)
11-12: Import source correctly changed forclear_disk_cacheThe import was appropriately moved from
memory::madvisetomemory::fadvise, which aligns with the PR objective of providing explicit OS file cache management usingposix_fadvise.lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (1)
8-9: Import source correctly changed forclear_disk_cacheThe import was appropriately moved from
memory::madvisetomemory::fadvise, which aligns with the PR objective of providing explicit OS file cache management usingposix_fadvise.lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (1)
13-14: Import source correctly changed forclear_disk_cacheThe import was appropriately moved from
memory::madvisetomemory::fadvise, which aligns with the PR objective of providing explicit OS file cache management usingposix_fadvise.lib/gridstore/src/bitmask/mod.rs (1)
9-10: Import source correctly changed forclear_disk_cacheThe import was appropriately moved from
memory::madvisetomemory::fadvise, which aligns with the PR objective of providing explicit OS file cache management usingposix_fadvise.lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
9-9: Import updated to use the new fadvise module.The import of
clear_disk_cachehas been updated to use the new location frommemory::fadviseinstead ofmemory::madvise. This is part of the refactoring described in the PR objectives, where cache management functionality has been consolidated in the new fadvise module.lib/segment/src/index/hnsw_index/hnsw.rs (1)
18-18: Import updated to use the new fadvise module.The import of
clear_disk_cachehas been updated to use the new location frommemory::fadviseinstead ofmemory::madvise. This matches the refactoring described in the PR objectives, moving the cache management functionality to the more appropriate fadvise module.lib/common/memory/src/lib.rs (1)
2-2: Export of new fadvise module.Adding the public export of the new
fadvisemodule makes its functionality available to the rest of the codebase. This module houses the implementation of theOneshotFilestruct and theclear_disk_cachefunction that were previously in themadvisemodule, providing more precise OS cache management through theposix_fadvisesystem call.lib/segment/src/vector_storage/chunked_vectors.rs (3)
8-8: Import OneshotFile from the new fadvise module.Added import for the
OneshotFilestruct from the newfadvisemodule. This struct will be used to optimize file access patterns and explicitly manage OS cache.
193-193: Use OneshotFile instead of standard File for improved cache management.Replaced standard
File::openwithOneshotFile::open, which optimizes file access by applyingposix_fadvisehints (POSIX_FADV_SEQUENTIALandPOSIX_FADV_NOREUSE) for better sequential file reading performance.
204-204: Explicitly drop OS file cache after reading.Added call to
drop_cache()after reading the file to explicitly inform the OS that the cached pages for this file are no longer needed, which helps prevent unnecessary memory usage as described in the PR objectives.lib/quantization/src/encoded_storage.rs (2)
5-5: LGTM: Good import for the newOneshotFilestruct.The import aligns with the PR's goal of optimizing OS file cache usage after reading files.
39-42: Excellent change to improve file cache handling.Replacing
File::openwithOneshotFile::openand addingdrop_cache()call addresses the issue described in the PR objectives. This change helps prevent the OS file cache from being polluted with quantized vectors that are unlikely to be read again soon.lib/segment/src/id_tracker/mutable_id_tracker.rs (3)
11-11: LGTM: Good import for the newOneshotFilestruct.The import aligns with the PR's goal of optimizing OS file cache usage.
409-417: Clean implementation ofOneshotFileusage.Using
OneshotFileinstead of standardFileprovides file access pattern hints to the OS. The change from dropping the reader to getting the underlying file withinto_inner()is a necessary adjustment to preserve the file handle for truncation and cache dropping.
431-431: Good addition of cache dropping after file operation.Explicitly dropping the file cache after truncation and syncing aligns with the PR objective of reducing OS file cache usage.
lib/common/memory/build.rs (1)
1-15: Well-designed build script for conditional compilation.The build script correctly detects OS and environment conditions to enable
posix_fadvisefunctionality on supported platforms. The comment referencing the actual implementation in thenixcrate provides good context for maintainers.lib/common/memory/src/fadvise.rs (5)
6-14: Well-implemented conditional wrapper forposix_fadvise.Good use of conditional compilation with
#[cfg(posix_fadvise_supported)]and clean implementation of the wrapper function. The function appropriately uses the entire file (offset 0, length 0) for the advice.
15-26: Robust implementation of publicclear_disk_cachefunction.This function properly handles platforms where
posix_fadviseis not supported by doing nothing silently. The documentation clearly states this behavior, which is important for cross-platform code.
28-59: Excellent design for theOneshotFilestruct and its public API.The
OneshotFilestruct is well-designed for one-time sequential reading with appropriate file access hints. The implementation ofdrop_cachewith explicit error handling and theOption<File>design to prevent double drops is particularly good.
62-87: Good trait implementations for file operations.The implementations for
Deref,Read, andSeekare clean and effectively delegate to the inner file. Thedelegate!macro usage makes the code concise and maintainable.
89-97: RobustDropimplementation for automatic cache dropping.The
Dropimplementation correctly handles the case wheredrop_cache()has already been called explicitly. It also ignores any errors that might occur during the advisory call, which is appropriate for destructors.
e5c5748 to
51e227b
Compare
51e227b to
6414477
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR addresses the file cache pollution issue by replacing the previous disk cache clearing mechanism with a new fadvise‐based implementation using an OneshotFile wrapper. Key changes include:
- Replacing all usages of memory::madvise::clear_disk_cache with memory::fadvise::clear_disk_cache.
- Updating file read operations to use OneshotFile::open and invoking drop_cache() to clear the OS file cache after reading.
- Adding a new fadvise module along with necessary build script and dependency adjustments.
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs | Updated clear_disk_cache import to use fadvise module. |
| lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs | Updated ordering of import for clear_disk_cache to fadvise. |
| lib/segment/src/vector_storage/chunked_vectors.rs | Replaced File::open with OneshotFile::open and added drop_cache call. |
| lib/segment/src/vector_storage/chunked_mmap_vectors.rs | Updated clear_disk_cache import to use fadvise module. |
| lib/segment/src/index/hnsw_index/hnsw.rs | Updated clear_disk_cache import to use fadvise module. |
| lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs | Updated clear_disk_cache import to use fadvise module. |
| lib/segment/src/index/field_index/mmap_point_to_values.rs | Updated clear_disk_cache import to use fadvise module. |
| lib/segment/src/index/field_index/map_index/mmap_map_index.rs | Updated clear_disk_cache import to use fadvise module. |
| lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs | Updated clear_disk_cache import to use fadvise module. |
| lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs | Updated clear_disk_cache import to use fadvise module. |
| lib/segment/src/id_tracker/mutable_id_tracker.rs | Replaced File::open with OneshotFile::open and added drop_cache call. |
| lib/quantization/src/encoded_storage.rs | Replaced File::open with OneshotFile::open and added drop_cache call. |
| lib/gridstore/src/page.rs | Updated clear_disk_cache import to use fadvise module. |
| lib/gridstore/src/bitmask/mod.rs | Updated clear_disk_cache import to use fadvise module. |
| lib/gridstore/src/bitmask/gaps.rs | Updated clear_disk_cache import to use fadvise module. |
| lib/common/memory/src/madvise.rs | Removed clear_disk_cache implementation. |
| lib/common/memory/src/lib.rs | Added export for new fadvise module. |
| lib/common/memory/src/fadvise.rs | New module that implements clear_disk_cache and the OneshotFile wrapper. |
| lib/common/memory/build.rs | New build script to enable posix_fadvise support conditionally. |
| lib/common/memory/Cargo.toml | Added new dependency on delegate. |
* Move clear_disk_cache to memory::fadvise * Add memory::fadvise::OneshotFile * Use OneshotFile
Related: #6323.
Consider this simplified example:
The problem with this code is that after calling this function, the file contents is stored twice in the memory: once as
Vec<u8>(as anonymous memory), and once implicitly as OS file cache, making it more likely to evict other files from the cache.Particularly, after reproducing #6323, I've found that quantized vectors and id tracker files are still in the OS file cache when the mmaped index cache gets evicted. Since we are unlikely to read quantized vectors again, we'd better to avoid polluting the OS file cache.
This PR adds a new wrapper around
File,OneshotFile, that callsposix_fadvise(…, POSIX_FADV_DONTNEED)when the file is closed. Not ideal, but should improve the situation a bit.