Conversation
📝 WalkthroughWalkthroughThis PR introduces the fs-err crate across the workspace (Cargo.toml updates) and adds a Clippy config disallowing many std/tokio filesystem APIs. It replaces std::fs and tokio::fs imports/usages with fs_err equivalents (aliases like Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
lib/storage/src/content_manager/snapshots/download.rs (2)
67-68: Fix typo in doc comment.“returen” → “return”.
Apply this diff:
-/// May returen a `TempPath` if a file was downloaded from a remote source. If it is dropped the +/// May return a `TempPath` if a file was downloaded from a remote source. If it is dropped the
91-95: Wording: “schema” → “scheme”.Correct terminology for URLs is “scheme”.
Apply this diff:
- "URL {} with schema {} is not supported", + "URL {} with scheme {} is not supported",
🧹 Nitpick comments (25)
lib/common/memory/src/fadvise.rs (2)
22-27: Avoid the extraPathBufallocation inclear_disk_cache.
fs_err::File::openalready accepts anyAsRef<Path>and clones the path internally for error context. Cloning here adds an avoidable allocation. Please pass the original&Pathdirectly.- match File::open(file_path.to_path_buf()) { + match File::open(file_path) {
48-55: Skip cloning the path before opening theOneshotFile.Similar to the above,
fs_err::File::opencan take the borrowed path and will capture it for diagnostics on its own. Dropping the manualto_path_buf()avoids redundant work in a hot path.- let file = File::open(path.as_ref().to_path_buf())?; + let file = File::open(path.as_ref())?;lib/quantization/src/encoded_vectors_binary.rs (1)
460-474: Use fs_err for directory creation — optional readability tweakBehavior is correct and now includes path context via fs_err. Consider a small readability refactor to avoid the chained and_then.
Apply within this block:
- meta_path - .parent() - .ok_or_else(|| { - std::io::Error::new( - std::io::ErrorKind::InvalidInput, - "Path must have a parent directory", - ) - }) - .and_then(fs::create_dir_all) - .map_err(|e| { - EncodingError::EncodingError(format!( - "Failed to create metadata directory: {e}", - )) - })?; + let dir = meta_path.parent().ok_or_else(|| { + EncodingError::EncodingError("Path must have a parent directory".to_string()) + })?; + fs::create_dir_all(dir).map_err(|e| { + EncodingError::EncodingError(format!("Failed to create metadata directory: {e}")) + })?;lib/storage/src/content_manager/snapshots/download.rs (2)
11-11: Remove redundant crate import of reqwest.In Rust 2018, the extern prelude makes
reqwest::...usable withoutuse reqwest;. Drop it to avoid a no-op import.Apply this diff:
-use {fs_err as fs, reqwest}; +use fs_err as fs;
37-41: Construction pipeline looks correct; minor readability tweak optional.
fs::File::from_parts(...)->tokio_fs::File::from_std(...)is the right pattern for preserving path context. Consider dropping the explicit turbofish; type inference suffices.Apply this diff to Line 37:
- let file = fs::File::from_parts::<&Path>(file, temp_path.as_ref()); + let file = fs::File::from_parts(file, temp_path.as_ref());lib/common/common/src/mmap_hashmap.rs (1)
122-127: Minor: avoid unnecessary turbofish and prefer owning the path for clarity.
File::from_partsdoesn’t need the<&Path>turbofish; passing an owned PathBuf improves clarity and avoids any lifetime second-guessing.Apply:
- let file = File::from_parts::<&Path>(file, temp_path.as_ref()); + let file = File::from_parts(file, temp_path.as_ref().to_path_buf());lib/common/io/src/file_operations.rs (1)
10-13: Avoid exposing std::fs::File in the API; accept a generic Write instead.You can drop the clippy allow and make the API more general by taking &mut dyn Write. This keeps all current call sites working and avoids disallowed type usage in the signature.
Apply:
-#[allow( - clippy::disallowed_types, - reason = "can't use `fs_err::File` since `atomicwrites` only provides `&mut std::fs::File`" -)] pub fn atomic_save<E, F>(path: &Path, write: F) -> Result<(), E> where E: From<io::Error>, - F: FnOnce(&mut BufWriter<&mut std::fs::File>) -> Result<(), E>, + F: FnOnce(&mut dyn Write) -> Result<(), E>, { let af = AtomicFile::new(path, OverwriteBehavior::AllowOverwrite); af.write(|f| { let mut writer = BufWriter::new(f); - write(&mut writer)?; + let mut w: &mut dyn Write = &mut writer; + write(&mut w)?; writer.flush()?; Ok(()) })Also applies to: 17-25
lib/common/memory/src/mmap_ops.rs (3)
39-39: Rename via fs_err is fine; consider closing TOCTOU on Windows.There’s a small race between exists() and rename() if another process creates the target; on Windows rename fails if dest appears in the meantime. Consider using tempfile::NamedTempFile and persist (or atomicwrites, like elsewhere) to make this more robust.
46-51: Avoid opening with append(true) and create(true) for read-only mmap.Appending isn’t needed for read-only mapping, and creating a 0-byte file can cause mapping failures. Prefer opening read-only and ensuring the file exists/size is valid beforehand.
Apply:
- let file = OpenOptions::new() - .read(true) - .append(true) - .create(true) - .open(path)?; + let file = OpenOptions::new() + .read(true) + .open(path)?;
80-145: Consider bytemuck/zerocopy instead of custom transmute helpers.These helpers are brittle and easy to misuse. Libraries like bytemuck or zerocopy provide safer conversions with compile-time checks. This isn’t new code, but worth planning a migration.
As per coding guidelines
lib/segment/src/index/field_index/full_text_index/inverted_index/mmap_inverted_index/mmap_postings.rs (2)
162-167: Tempfile handling: add path context to InvalidInput and drop turbofish.
- Improve diagnostics by attaching a message to InvalidInput.
- The turbofish on File::from_parts is unnecessary; type inference suffices.
Apply this diff:
- let (file, temp_path) = tempfile::Builder::new() - .prefix(path.file_name().ok_or(io::ErrorKind::InvalidInput)?) - .tempfile_in(path.parent().ok_or(io::ErrorKind::InvalidInput)?)? - .into_parts(); - let file = File::from_parts::<&Path>(file, temp_path.as_ref()); + let (file, temp_path) = tempfile::Builder::new() + .prefix( + path.file_name().ok_or_else(|| io::Error::new( + io::ErrorKind::InvalidInput, + format!("invalid path (no file name): {}", path.display()), + ))? + ) + .tempfile_in( + path.parent().ok_or_else(|| io::Error::new( + io::ErrorKind::InvalidInput, + format!("invalid path (no parent directory): {}", path.display()), + ))? + )? + .into_parts(); + let file = File::from_parts(file, temp_path.as_ref());
5-5: Remove unused Path import.After dropping the turbofish, Path is no longer needed.
Apply this diff:
-use std::path::{Path, PathBuf}; +use std::path::PathBuf;lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
204-207: Make wipe tolerant to already-missing files; log non-NotFound dir errors.Avoid failing the wipe if some files were removed concurrently, and don’t silently swallow unexpected remove_dir errors.
Apply this diff:
- for file in files { - fs::remove_file(file)?; - } - let _ = fs::remove_dir(path); + for file in files { + if let Err(err) = fs::remove_file(&file) { + if err.kind() != std::io::ErrorKind::NotFound { + return Err(err.into()); + } + } + } + if let Err(err) = fs::remove_dir(&path) { + if err.kind() != std::io::ErrorKind::NotFound { + log::warn!("Failed to remove index directory {}: {err}", path.display()); + } + }lib/shard/src/wal.rs (1)
247-249: Import swap is fine; avoid to_str() for paths in metadata.While reviewing the import change, noticed later in the test you do
fs::metadata(dir.path().join("open-1").to_str().unwrap()). Prefer passing a Path directly to avoid lossy UTF-8 conversion:Example adjustment in the test body:
let metadata = fs::metadata(dir.path().join("open-1")).unwrap();This also keeps the richer path context from fs_err intact.
lib/segment/src/segment_constructor/segment_constructor_base.rs (2)
953-959: Cleanup: ignore NotFound to reduce log noise and TOCTOU races.When removing files after a failed migration, NotFound is benign; don’t log it as an error.
Apply:
- if let Err(err) = fs::remove_file(&file) { - log::error!( - "ID tracker migration to mutable failed, failed to remove mutable file {} for cleanup: {err}", - file.display(), - ); - } + if let Err(err) = fs::remove_file(&file) { + if err.kind() != std::io::ErrorKind::NotFound { + log::error!( + "ID tracker migration to mutable failed, failed to remove mutable file {} for cleanup: {err}", + file.display(), + ); + } + }
1454-1461: Avoid pre-check; handle NotFound after remove_dir_all to prevent TOCTOU.Directly attempt remove_dir_all; ignore NotFound, log others.
Apply:
- if storage_dir.is_dir() - && let Err(err) = fs::remove_dir_all(&storage_dir) - { - log::error!( - "Payload storage migration to mmap failed, failed to remove mmap files in {} for cleanup: {err}", - storage_dir.display(), - ); - } + if let Err(err) = fs::remove_dir_all(&storage_dir) { + if err.kind() != std::io::ErrorKind::NotFound { + log::error!( + "Payload storage migration to mmap failed, failed to remove mmap files in {} for cleanup: {err}", + storage_dir.display(), + ); + } + }lib/collection/src/common/file_utils.rs (1)
42-51: Create parent directories too.create_dir will fail if parents don’t exist. Prefer create_dir_all.
Apply:
- tokio_fs::create_dir(&to).await.map_err(|err| { + tokio_fs::create_dir_all(&to).await.map_err(|err| { CollectionError::service_error(format!( "failed to move directory {} to {}: \ failed to create destination directory: \ {err}", from.display(), to.display(), )) })?;lib/collection/src/operations/snapshot_storage_ops.rs (1)
82-99: Avoid usize cast for file sizes; use u64 and convert safely.Prevents potential overflow on 32-bit and improves correctness for very large files.
Apply:
- let file_meta = tokio_fs::metadata(local_source_path).await?; - let file_size = file_meta.len() as usize; + let file_meta = tokio_fs::metadata(local_source_path).await?; + let file_size = file_meta.len(); - if file_size > MAX_UPLOAD_SIZE { + if file_size > MAX_UPLOAD_SIZE { return Err(CollectionError::service_error(format!( "File size exceeds the maximum upload size: {MAX_UPLOAD_SIZE}" ))); } - if file_size > DEFAULT_CHUNK_SIZE * MAX_PART_NUMBER { - let chunk_size = ((file_size - 1) / MAX_PART_NUMBER) + 1; // ceil((file_size) / MAX_PART_NUMBER) - return Ok(chunk_size); - } - Ok(DEFAULT_CHUNK_SIZE) + if file_size > DEFAULT_CHUNK_SIZE * MAX_PART_NUMBER { + let chunk_size_bytes = ((file_size - 1) / MAX_PART_NUMBER) + 1; // ceil((file_size) / MAX_PART_NUMBER) + let chunk_size = usize::try_from(chunk_size_bytes).map_err(|_| { + CollectionError::service_error( + "Calculated chunk size exceeds addressable memory on this platform", + ) + })?; + return Ok(chunk_size); + } + Ok(DEFAULT_CHUNK_SIZE as usize)Also update the constants to u64:
- const DEFAULT_CHUNK_SIZE: usize = 50 * 1024 * 1024; - const MAX_PART_NUMBER: usize = 10000; - const MAX_UPLOAD_SIZE: usize = 5 * 1024 * 1024 * 1024 * 1024; + const DEFAULT_CHUNK_SIZE: u64 = 50 * 1024 * 1024; + const MAX_PART_NUMBER: u64 = 10_000; + const MAX_UPLOAD_SIZE: u64 = 5 * 1024 * 1024 * 1024 * 1024;clippy.toml (1)
16-96: Potentially invalid disallowed paths (std doesn’t define these).Entries like:
- std::fs::File::{lock, lock_shared, try_lock, try_lock_shared, unlock} (Lines 21-23, 31-33)
- std::fs::soft_link (Line 49)
don’t exist in std. Consider removing them or marking as allow-invalid to avoid Clippy config issues.
Apply this minimal adjustment to avoid toolchain breakage:
- { path = "std::fs::File::lock", replacement = "fs_err::File::lock" }, - { path = "std::fs::File::lock_shared", replacement = "fs_err::File::lock_shared" }, + { path = "std::fs::File::lock", replacement = "fs_err::File::lock", allow-invalid = true }, + { path = "std::fs::File::lock_shared", replacement = "fs_err::File::lock_shared", allow-invalid = true }, @@ - { path = "std::fs::File::try_lock", replacement = "fs_err::File::try_lock" }, - { path = "std::fs::File::try_lock_shared", replacement = "fs_err::File::try_lock_shared" }, - { path = "std::fs::File::unlock", replacement = "fs_err::File::unlock" }, + { path = "std::fs::File::try_lock", replacement = "fs_err::File::try_lock", allow-invalid = true }, + { path = "std::fs::File::try_lock_shared", replacement = "fs_err::File::try_lock_shared", allow-invalid = true }, + { path = "std::fs::File::unlock", replacement = "fs_err::File::unlock", allow-invalid = true }, @@ - { path = "std::fs::soft_link", replacement = "fs_err::soft_link" }, + { path = "std::fs::soft_link", replacement = "fs_err::soft_link", allow-invalid = true },Please confirm Clippy loads this config cleanly in CI. If it complains, drop these entries instead. As per coding guidelines.
lib/collection/src/shards/shard_holder/mod.rs (1)
989-991: Avoid blocking create_dir_all in async context.Prefer tokio_fs::create_dir_all to prevent blocking the runtime thread.
Apply:
- if !temp_dir.exists() { - fs::create_dir_all(temp_dir)?; - } + if !temp_dir.exists() { + tokio_fs::create_dir_all(temp_dir).await?; + }lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
308-310: Drop unnecessary PathBuf allocation.OpenOptions::open accepts AsRef; cloning the path isn’t needed.
Apply:
-fn open_append<P: AsRef<Path>>(path: P) -> io::Result<File> { - let path = path.as_ref().to_path_buf(); - OpenOptions::new().append(true).open(path) -} +fn open_append<P: AsRef<Path>>(path: P) -> io::Result<File> { + OpenOptions::new().append(true).open(path) +}lib/collection/src/shards/replica_set/snapshots.rs (1)
134-201: Avoid blocking read_dir in async function.fs::read_dir and Path metadata checks run on the core runtime thread. Prefer tokio_fs::read_dir to prevent blocking during snapshot enumeration.
Apply this pattern:
- for segment_entry in fs::read_dir(segments_path)? { - let segment_path = segment_entry?.path(); + let mut rd = tokio_fs::read_dir(&segments_path).await?; + while let Some(segment_entry) = rd.next_entry().await? { + let segment_path = segment_entry.path(); // ... rest of the loop unchanged ... - } + }Optionally consider replacing other per-iteration stat calls with async equivalents if this path proves hot.
lib/segment/src/utils/fs.rs (1)
61-63: Add EXDEV fallback for cross-device moves.
fs::renamefails with CrossDeviceLink across mount points. Add a fallback: copy+remove for files; for directories, createdest_path, recursively move contents, then remove source.Apply this diff:
- fs::rename(&path, &dest_path) - .map_err(|err| failed_to_move_error(&path, &dest_path, err))?; + if let Err(err) = fs::rename(&path, &dest_path) { + if err.kind() == std::io::ErrorKind::CrossDeviceLink { + if path.is_dir() { + fs::create_dir_all(&dest_path).map_err(|e| { + failed_to_move_error(&path, &dest_path, e) + })?; + move_all_impl(base, &path, &dest_path)?; + fs::remove_dir_all(&path) + .map_err(|e| failed_to_move_error(&path, &dest_path, e))?; + } else { + fs::copy(&path, &dest_path) + .map_err(|e| failed_to_move_error(&path, &dest_path, e))?; + fs::remove_file(&path) + .map_err(|e| failed_to_move_error(&path, &dest_path, e))?; + } + } else { + return Err(failed_to_move_error(&path, &dest_path, err)); + } + }lib/collection/src/common/snapshots_manager.rs (1)
274-275: Avoid blocking fs ops on async paths: use tokio_fs::create_dir_all.These synchronous
create_dir_allcalls block the async runtime. Prefer the fs_err Tokio wrappers.Apply this diff:
- fs::create_dir_all(target_dir)?; + tokio_fs::create_dir_all(target_dir).await?;- fs::create_dir_all(target_dir)?; + tokio_fs::create_dir_all(target_dir).await?;- fs::create_dir_all(target_dir)?; + tokio_fs::create_dir_all(target_dir).await?;Also applies to: 306-307, 417-417
lib/collection/src/shards/local_shard/mod.rs (1)
165-171: Handle NotFound races when deleting shard data.Between
exists()andremove_dir_all, the directory can disappear. Treat NotFound as benign.Apply this diff:
- if wal_path.exists() { - tokio_fs::remove_dir_all(wal_path).await?; - } + if wal_path.exists() { + if let Err(e) = tokio_fs::remove_dir_all(&wal_path).await { + if e.kind() != std::io::ErrorKind::NotFound { + return Err(e.into()); + } + } + } @@ - if segments_path.exists() { - tokio_fs::remove_dir_all(segments_path).await?; - } + if segments_path.exists() { + if let Err(e) = tokio_fs::remove_dir_all(&segments_path).await { + if e.kind() != std::io::ErrorKind::NotFound { + return Err(e.into()); + } + } + }
...nt/src/index/field_index/full_text_index/inverted_index/mmap_inverted_index/mmap_postings.rs
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/segment/src/id_tracker/mutable_id_tracker.rs (1)
376-382: Don’t unwrap BufWriter::into_inner; map the error (panic risk).
into_inner()can still fail (it flushes). Unwrap would panic instead of returning a structured error.Apply:
- // Explicitly fsync file contents to ensure durability - writer.flush()?; - let file = writer.into_inner().unwrap(); + // Explicitly fsync file contents to ensure durability + writer.flush()?; + let file = writer + .into_inner() + .map_err(|e| { + let err = e.into_error(); + OperationError::service_error(format!( + "Failed to finalize ID tracker point mappings writer ({}): {err}", + mappings_path.display(), + )) + })?; file.sync_all().map_err(|err| { OperationError::service_error(format!("Failed to fsync ID tracker point mappings: {err}")) })?;
🧹 Nitpick comments (25)
lib/segment/tests/integration/segment_builder_test.rs (1)
70-70: Optional: factor out a tiny helper to reduce repetitionIf you want to DRY the repeated directory-count pattern across tests, consider:
fn dir_count<P: AsRef<std::path::Path>>(p: P) -> usize { fs::read_dir(p).unwrap().count() }Then replace occurrences like
fs::read_dir(dir.path()).unwrap().count()withdir_count(dir.path()).Also applies to: 74-74, 88-88, 156-156, 160-160, 174-174, 289-289, 293-293, 307-307, 580-580, 597-597
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (3)
137-146: Fix over-allocation: compute bytes, then align to 64 bytesCurrent length uses 64 * ceil(bits/64), which allocates 8× more bytes than needed. Compute bytes = ceil(bits/8) first, then align bytes to 64.
Apply this diff:
- const BITS_IN_BYTE: usize = 8; - let deleted_flags_count = in_memory_index.point_to_values.len(); - let deleted_file = create_and_ensure_length( - &deleted_path, - BITS_IN_BYTE - * BITS_IN_BYTE - * deleted_flags_count.div_ceil(BITS_IN_BYTE * BITS_IN_BYTE), - )?; + const BITS_IN_BYTE: usize = 8; + let deleted_flags_count = in_memory_index.point_to_values.len(); + // Compute exact bytes needed for `deleted_flags_count` bits, then align to 64 bytes. + let bytes_needed = deleted_flags_count.div_ceil(BITS_IN_BYTE); + let align_bytes = BITS_IN_BYTE * BITS_IN_BYTE; // 64 bytes + let file_len = align_bytes * bytes_needed.div_ceil(align_bytes); + let deleted_file = create_and_ensure_length(&deleted_path, file_len)?;
200-207: Make wipe idempotent and resilient
- Ignore NotFound when removing files (safe to call repeatedly).
- Prefer
remove_dir_allat the end to clean up any empty subdirs left by subcomponents.Apply this diff:
pub fn wipe(self) -> OperationResult<()> { let files = self.files(); let Self { path, .. } = self; for file in files { - fs::remove_file(file)?; + if let Err(err) = fs::remove_file(&file) { + if err.kind() != std::io::ErrorKind::NotFound { + return Err(OperationError::from(err)); + } + } } - let _ = fs::remove_dir(path); + // Best-effort: remove the index directory even if nested dirs remain. + let _ = fs::remove_dir_all(&path); Ok(()) }
322-329: Clarify values_range_size semantics vs. deletions (optional)
end_index - start_indexcounts raw span, not filtered (non-deleted) items. If callers expect the iterator’s effective size, considerself.values_range_iterator(...).count()or rename the method to reflect “span.”lib/segment/src/id_tracker/mutable_id_tracker.rs (4)
134-146: Include path in load error messages for consistency with fs-err.Top-level error mapping for load paths omits the file path; add it for parity with store paths.
- let mappings = if has_mappings { - load_mappings(&mappings_path).map_err(|err| { - OperationError::service_error(format!("Failed to load ID tracker mappings: {err}")) - })? + let mappings = if has_mappings { + load_mappings(&mappings_path).map_err(|err| { + OperationError::service_error(format!( + "Failed to load ID tracker mappings ({}): {err}", + mappings_path.display(), + )) + })? } else { PointMappings::default() }; @@ - let internal_to_version = if has_versions { - load_versions(&versions_path).map_err(|err| { - OperationError::service_error(format!("Failed to load ID tracker versions: {err}")) - })? + let internal_to_version = if has_versions { + load_versions(&versions_path).map_err(|err| { + OperationError::service_error(format!( + "Failed to load ID tracker versions ({}): {err}", + versions_path.display(), + )) + })? } else { vec![] };Based on learnings
724-733: Optional: add context when taking versions writer into inner (consistency).Currently relies on
From<io::Error>; add the path in the message like mappings.- let file = writer.into_inner().map_err(|err| err.into_error())?; + let file = writer.into_inner().map_err(|e| { + let err = e.into_error(); + OperationError::service_error(format!( + "Failed to finalize ID tracker point versions writer ({}): {err}", + versions_path.display(), + )) + })?;
411-420: Optional: consider preserving path context in OneshotFile errors.
OneshotFile::openlikely wrapsstd::fs::File, so errors here may miss the path context fs-err provides. If feasible, adaptOneshotFileto hold/wrapfs_err::File(e.g., viafs_err::File::from_parts::<&Path>as used elsewhere) so subsequent ops (metadata/seek) retain path context.
441-476: Doc mismatch: iterator no longer yields “(change, bytes_read)”.The docs still mention returning a tuple, but the iterator yields
OperationResult<MappingChange>.-/// Each non-errorous item is a tuple of the mapping change and the number of bytes read so far. +/// Each item is a mapping change (`Ok`) or an error (`Err`) if malformed data is encountered.lib/segment/src/vector_storage/async_io.rs (4)
124-135: Use checked 64-bit offset math to avoid overflow and clarify intentOffset can overflow on large datasets or narrow platforms. Compute it in u64 with checked_add/checked_mul and surface a clear error.
- let offset = self.header_size + self.raw_size * point as usize; + // Compute file offset safely in u64 to match io_uring expectations. + let offset = (self.header_size as u64) + .checked_add( + (self.raw_size as u64) + .checked_mul(point as u64) + .ok_or_else(|| OperationError::service_error("offset multiplication overflow"))?, + ) + .ok_or_else(|| OperationError::service_error("offset addition overflow"))?; @@ - .offset(offset as _) + .offset(offset as _)
137-142: Document unsafe rationale at submission siteAdd a brief SAFETY comment explaining buffer lifetime and invariants for io_uring SQE push.
unsafe { - // self.io_uring.submission().push(&read_e).unwrap(); + // SAFETY: + // - `buffer.as_mut_ptr()` points into `self.buffers`, which outlives the I/O and + // is not moved while the request is in flight. + // - We only reuse `buffer_id` after completion (see completion handling below). + // - `offset`/`len` are validated above. io_uring.submission().push(&read_e).map_err(|err| { OperationError::service_error(format!("Failed using io-uring: {err}")) })?; }
180-189: Return richer OS errors from CQE failuresMap negative CQE results to std::io::Error for human-friendly diagnostics (errno message), aligning with the PR’s better error context goal.
- if result < 0 { - return Err(OperationError::service_error(format!( - "io_uring operation failed with {result} error" - ))); + if result < 0 { + let errno = -result; + let io_err = std::io::Error::from_raw_os_error(errno); + return Err(OperationError::service_error(format!( + "io_uring read failed: {io_err} (errno {errno})" + )));
194-195: Avoid transmute-style conversions for slicesConsider replacing transmute_from_u8_to_slice with bytemuck::cast_slice (or zerocopy) to tighten safety contracts. This aligns with our guideline to avoid transmute_* helpers in new code. If T: PrimitiveVectorElement implies Pod/FromBytes, this should be feasible; otherwise, keep as-is.
As per coding guidelines
Example (if T: bytemuck::Pod):
let vector: &[T] = bytemuck::cast_slice(buffer);lib/quantization/src/encoded_vectors_binary.rs (1)
460-474: Use fs_err for directory creation — optional simplification.Logic is correct and benefits from fs_err’s pathful errors. You could simplify the flow to avoid combinators for readability.
- if let Some(meta_path) = meta_path { - meta_path - .parent() - .ok_or_else(|| { - std::io::Error::new( - std::io::ErrorKind::InvalidInput, - "Path must have a parent directory", - ) - }) - .and_then(fs::create_dir_all) - .map_err(|e| { - EncodingError::EncodingError(format!( - "Failed to create metadata directory: {e}", - )) - })?; + if let Some(meta_path) = meta_path { + let parent = meta_path.parent().ok_or_else(|| { + std::io::Error::new( + std::io::ErrorKind::InvalidInput, + "Path must have a parent directory", + ) + })?; + fs::create_dir_all(parent).map_err(|e| { + EncodingError::EncodingError(format!("Failed to create metadata directory: {e}",)) + })?; atomic_save_json(meta_path, &metadata).map_err(|e| { EncodingError::EncodingError(format!("Failed to save metadata: {e}",)) })?; }lib/collection/src/operations/snapshot_ops.rs (1)
117-121: Swap to tokio_fs::metadata — and consider removing unwraps.Change itself is good. While here, consider avoiding panics on non‑UTF8 names:
- let name = path.file_name().unwrap().to_str().unwrap(); + let name = path + .file_name() + .map(|s| s.to_string_lossy().into_owned()) + .unwrap_or_else(|| path.as_os_str().to_string_lossy().into_owned());lib/common/io/src/storage_version.rs (1)
28-36: Preserve path context on read errors using fs_err::read_to_string
file.read_to_stringwon't carry path context if it fails. Since the PR’s goal is richer IO errors, preferfs_err::read_to_string(&version_file)and handle NotFound similarly.- let mut contents = String::new(); - let mut file = match File::open(&version_file) { - Ok(file) => file, - Err(err) if err.kind() == std::io::ErrorKind::NotFound => { - return Ok(None); - } - Err(err) => return Err(err.into()), - }; - file.read_to_string(&mut contents)?; + let contents = match fs_err::read_to_string(&version_file) { + Ok(s) => s, + Err(err) if err.kind() == std::io::ErrorKind::NotFound => { + return Ok(None); + } + Err(err) => return Err(err.into()), + };lib/collection/src/config.rs (1)
262-264: Use fs_err::read_to_string for better diagnostics
read_to_stringon aFileomits path context. Replace withfs_err::read_to_string(&config_path)to keep enriched errors consistent with the PR’s intent.- let mut contents = String::new(); - let mut file = File::open(config_path)?; - file.read_to_string(&mut contents)?; - Ok(serde_json::from_str(&contents)?) + let contents = fs_err::read_to_string(&config_path)?; + Ok(serde_json::from_str(&contents)?)lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (1)
139-147: Avoid TOCTOU: call create_dir_all unconditionallyThe pre-check with Path::exists() is unnecessary and slightly racy. create_dir_all is idempotent.
- // Ensure temp_path exists - if !self.temp_path().exists() { - fs::create_dir_all(self.temp_path()).map_err(|err| { - CollectionError::service_error(format!( - "Could not create temp directory `{}`: {}", - self.temp_path().display(), - err - )) - })?; - } + // Ensure temp_path exists (idempotent) + fs::create_dir_all(self.temp_path()).map_err(|err| { + CollectionError::service_error(format!( + "Could not create temp directory `{}`: {}", + self.temp_path().display(), + err + )) + })?;lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
306-310: Avoid extra allocation in open_append
to_path_buf()is unnecessary here;fs_err::OpenOptions::openacceptsAsRef<Path>. This saves an allocation without changing behavior.fn open_append<P: AsRef<Path>>(path: P) -> io::Result<File> { - let path = path.as_ref().to_path_buf(); - OpenOptions::new().append(true).open(path) + OpenOptions::new().append(true).open(path) }lib/collection/src/shards/local_shard/mod.rs (3)
164-171: Avoid TOCTOU on removal: treat NotFound as success.
exists()+remove_dir_all()can race. Consider ignoringErrorKind::NotFoundto make cleanup idempotent.- if wal_path.exists() { - tokio_fs::remove_dir_all(wal_path).await?; - } + if wal_path.exists() { + if let Err(err) = tokio_fs::remove_dir_all(&wal_path).await { + if err.kind() != std::io::ErrorKind::NotFound { + return Err(err.into()); + } + } + } - if segments_path.exists() { - tokio_fs::remove_dir_all(segments_path).await?; - } + if segments_path.exists() { + if let Err(err) = tokio_fs::remove_dir_all(&segments_path).await { + if err.kind() != std::io::ErrorKind::NotFound { + return Err(err.into()); + } + } + }
347-353: Be lenient on leftover segment purge (ignore NotFound).During concurrent startup/cleanup, the segment directory might already be gone; ignore ENOENT.
- fs::remove_dir_all(&segment_path).map_err(|err| { + fs::remove_dir_all(&segment_path).or_else(|err| { + if err.kind() == std::io::ErrorKind::NotFound { + Ok(()) + } else { + Err(err) + } + }).map_err(|err| { CollectionError::service_error(format!( "failed to remove leftover segment {}: {err}", segment_path.display(), )) })?;
1168-1173: Ignore NotFound when deleting clock files.Races can surface if files are concurrently removed; treat ENOENT as success.
- if newest_clocks_path.exists() { - tokio_fs::remove_file(newest_clocks_path).await?; - } + if newest_clocks_path.exists() { + if let Err(err) = tokio_fs::remove_file(&newest_clocks_path).await { + if err.kind() != std::io::ErrorKind::NotFound { + return Err(err.into()); + } + } + } - if oldest_clocks_path.exists() { - tokio_fs::remove_file(oldest_clocks_path).await?; - } + if oldest_clocks_path.exists() { + if let Err(err) = tokio_fs::remove_file(&oldest_clocks_path).await { + if err.kind() != std::io::ErrorKind::NotFound { + return Err(err.into()); + } + } + }lib/collection/src/shards/replica_set/snapshots.rs (1)
290-298: Make cleanup idempotent: ignore NotFound on removals.Removals during recovery can race with prior cleanup or partial state. Treat ENOENT as success to avoid spurious failures.
- tokio_fs::remove_dir_all(&segment_path) - .await - .map_err(|err| { + if let Err(err) = tokio_fs::remove_dir_all(&segment_path).await { + if err.kind() != io::ErrorKind::NotFound { + return Err(CollectionError::service_error(format!( + "failed to remove outdated segment {}: {err}", + segment_path.display(), + ))); + } + } - })?;- tokio_fs::remove_file(&path).await.map_err(|err| { - CollectionError::service_error(format!( - "failed to remove outdated segment file {}: {err}", - path.display(), - )) - })?; + if let Err(err) = tokio_fs::remove_file(&path).await { + if err.kind() != io::ErrorKind::NotFound { + return Err(CollectionError::service_error(format!( + "failed to remove outdated segment file {}: {err}", + path.display(), + ))); + } + }- tokio_fs::remove_dir_all(&wal_path).await.map_err(|err| { - CollectionError::service_error(format!( - "failed to remove WAL {}: {err}", - wal_path.display(), - )) - })?; + if let Err(err) = tokio_fs::remove_dir_all(&wal_path).await { + if err.kind() != io::ErrorKind::NotFound { + return Err(CollectionError::service_error(format!( + "failed to remove WAL {}: {err}", + wal_path.display(), + ))); + } + }- tokio_fs::remove_file(&shard_flag).await?; + if let Err(err) = tokio_fs::remove_file(&shard_flag).await { + if err.kind() != io::ErrorKind::NotFound { + return Err(err.into()); + } + }Also applies to: 335-341, 375-381, 410-410
lib/collection/src/common/snapshots_manager.rs (1)
274-275: Avoid blocking the async runtime: use tokio_fs::create_dir_all in async fnsThese synchronous directory creations can block the runtime under load. Switch to fs_err::tokio’s async create_dir_all and await it.
@@ - if let Some(target_dir) = target_path.parent() { - fs::create_dir_all(target_dir)?; - } + if let Some(target_dir) = target_path.parent() { + tokio_fs::create_dir_all(target_dir).await?; + } @@ - if let Some(target_dir) = local_path.parent() + if let Some(target_dir) = local_path.parent() && !target_dir.exists() { - fs::create_dir_all(target_dir)?; + tokio_fs::create_dir_all(target_dir).await?; } @@ - if let Some(target_dir) = local_path.parent() + if let Some(target_dir) = local_path.parent() && !target_dir.exists() { - fs::create_dir_all(target_dir)?; + tokio_fs::create_dir_all(target_dir).await?; }Also applies to: 306-307, 416-417
lib/collection/src/collection/snapshots.rs (1)
7-7: Good switch to fs_err::File for richer errors during archive creationUsing fs_err::File surfaces the path on failure when creating the archive, which aligns with the PR’s goals. Looks good.
Minor: File::create inside an async fn can be wrapped in spawn_blocking if you anticipate slow FS in some environments, but it’s typically fine for a single create.
Also applies to: 83-83
lib/collection/benches/prof.rs (1)
6-7: Bench: fs_err integration LGTM
- Using fs_err::{self as fs, File} in benches is consistent.
- create_dir_all via fs_err adds path context; unwraps are acceptable in benches.
Optionally replace the unwrap with expect("failed to create benchmark dir") for a clearer panic.
Also applies to: 65-65
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (107)
Cargo.toml(2 hunks)clippy.toml(1 hunks)lib/api/build.rs(1 hunks)lib/collection/Cargo.toml(1 hunks)lib/collection/benches/prof.rs(2 hunks)lib/collection/src/collection/shard_transfer.rs(2 hunks)lib/collection/src/collection/snapshots.rs(1 hunks)lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs(2 hunks)lib/collection/src/collection_manager/optimizers/segment_optimizer.rs(2 hunks)lib/collection/src/common/file_utils.rs(6 hunks)lib/collection/src/common/sha_256.rs(2 hunks)lib/collection/src/common/snapshots_manager.rs(10 hunks)lib/collection/src/config.rs(1 hunks)lib/collection/src/operations/snapshot_ops.rs(3 hunks)lib/collection/src/operations/snapshot_storage_ops.rs(4 hunks)lib/collection/src/optimizers_builder.rs(2 hunks)lib/collection/src/shards/local_shard/mod.rs(6 hunks)lib/collection/src/shards/local_shard/snapshot.rs(3 hunks)lib/collection/src/shards/local_shard/snapshot_tests.rs(3 hunks)lib/collection/src/shards/mod.rs(2 hunks)lib/collection/src/shards/replica_set/snapshots.rs(8 hunks)lib/collection/src/shards/shard_holder/mod.rs(3 hunks)lib/collection/src/shards/shard_holder/shard_mapping.rs(1 hunks)lib/collection/tests/integration/collection_test.rs(1 hunks)lib/common/common/Cargo.toml(1 hunks)lib/common/common/src/disk.rs(2 hunks)lib/common/common/src/mmap_hashmap.rs(3 hunks)lib/common/common/src/save_on_disk.rs(3 hunks)lib/common/dataset/Cargo.toml(1 hunks)lib/common/dataset/src/lib.rs(3 hunks)lib/common/io/Cargo.toml(1 hunks)lib/common/io/src/file_operations.rs(1 hunks)lib/common/io/src/storage_version.rs(1 hunks)lib/common/memory/Cargo.toml(1 hunks)lib/common/memory/src/checkfs.rs(4 hunks)lib/common/memory/src/chunked_utils.rs(2 hunks)lib/common/memory/src/fadvise.rs(3 hunks)lib/common/memory/src/mmap_ops.rs(2 hunks)lib/common/memory/src/mmap_type.rs(1 hunks)lib/common/memory/src/mmap_type_readonly.rs(1 hunks)lib/gridstore/Cargo.toml(1 hunks)lib/gridstore/benches/real_data_bench.rs(3 hunks)lib/gridstore/src/bitmask/gaps.rs(1 hunks)lib/gridstore/src/gridstore.rs(6 hunks)lib/gridstore/src/page.rs(2 hunks)lib/quantization/Cargo.toml(1 hunks)lib/quantization/src/encoded_storage.rs(2 hunks)lib/quantization/src/encoded_vectors_binary.rs(3 hunks)lib/quantization/src/encoded_vectors_pq.rs(3 hunks)lib/quantization/src/encoded_vectors_u8.rs(4 hunks)lib/segment/Cargo.toml(1 hunks)lib/segment/benches/fixture.rs(3 hunks)lib/segment/benches/hnsw_incremental_build.rs(2 hunks)lib/segment/benches/mmap_bitslice_buffered_update_wrapper.rs(1 hunks)lib/segment/benches/prof.rs(2 hunks)lib/segment/src/common/flags/dynamic_mmap_flags.rs(2 hunks)lib/segment/src/common/validate_snapshot_archive.rs(1 hunks)lib/segment/src/id_tracker/immutable_id_tracker.rs(1 hunks)lib/segment/src/id_tracker/mutable_id_tracker.rs(3 hunks)lib/segment/src/index/field_index/bool_index/mutable_bool_index.rs(3 hunks)lib/segment/src/index/field_index/full_text_index/inverted_index/mmap_inverted_index/mmap_postings.rs(3 hunks)lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs(3 hunks)lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs(3 hunks)lib/segment/src/index/field_index/map_index/mmap_map_index.rs(3 hunks)lib/segment/src/index/field_index/null_index/mutable_null_index.rs(3 hunks)lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs(3 hunks)lib/segment/src/index/hnsw_index/graph_layers.rs(2 hunks)lib/segment/src/index/hnsw_index/hnsw.rs(2 hunks)lib/segment/src/index/plain_payload_index.rs(2 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs(4 hunks)lib/segment/src/index/struct_payload_index.rs(2 hunks)lib/segment/src/payload_storage/mmap_payload_storage.rs(2 hunks)lib/segment/src/rocksdb_backup.rs(1 hunks)lib/segment/src/segment/entry.rs(1 hunks)lib/segment/src/segment/segment_ops.rs(4 hunks)lib/segment/src/segment/snapshot.rs(1 hunks)lib/segment/src/segment/tests.rs(3 hunks)lib/segment/src/segment_constructor/segment_builder.rs(3 hunks)lib/segment/src/segment_constructor/segment_constructor_base.rs(7 hunks)lib/segment/src/utils/fs.rs(3 hunks)lib/segment/src/vector_storage/async_io.rs(1 hunks)lib/segment/src/vector_storage/async_io_mock.rs(1 hunks)lib/segment/src/vector_storage/chunked_mmap_vectors.rs(2 hunks)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs(1 hunks)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs(3 hunks)lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs(4 hunks)lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs(2 hunks)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs(3 hunks)lib/segment/tests/integration/payload_index_test.rs(2 hunks)lib/segment/tests/integration/segment_builder_test.rs(9 hunks)lib/segment/tests/integration/segment_on_disk_snapshot.rs(3 hunks)lib/segment/tests/integration/segment_tests.rs(2 hunks)lib/segment/tests/integration/sparse_vector_index_search_tests.rs(2 hunks)lib/shard/Cargo.toml(1 hunks)lib/shard/src/proxy_segment/tests.rs(3 hunks)lib/shard/src/wal.rs(1 hunks)lib/sparse/Cargo.toml(1 hunks)lib/sparse/benches/prof.rs(2 hunks)lib/sparse/src/index/loaders.rs(2 hunks)lib/storage/Cargo.toml(1 hunks)lib/storage/src/content_manager/alias_mapping.rs(1 hunks)lib/storage/src/content_manager/consensus/consensus_wal.rs(1 hunks)lib/storage/src/content_manager/consensus/persistent.rs(2 hunks)lib/storage/src/content_manager/snapshots/download.rs(2 hunks)
⛔ Files not processed due to max files limit (15)
- lib/storage/src/content_manager/snapshots/mod.rs
- lib/storage/src/content_manager/snapshots/recover.rs
- lib/storage/src/content_manager/toc/collection_meta_ops.rs
- lib/storage/src/content_manager/toc/mod.rs
- lib/storage/src/content_manager/toc/snapshots.rs
- lib/storage/src/content_manager/toc/temp_directories.rs
- src/actix/api/snapshot_api.rs
- src/actix/certificate_helpers.rs
- src/common/helpers.rs
- src/common/http_client.rs
- src/main.rs
- src/settings.rs
- src/snapshots.rs
- src/startup.rs
- src/tracing/on_disk.rs
🚧 Files skipped from review as they are similar to previous changes (59)
- lib/api/build.rs
- lib/segment/src/segment_constructor/segment_builder.rs
- lib/common/memory/src/chunked_utils.rs
- lib/collection/src/collection/shard_transfer.rs
- lib/quantization/src/encoded_storage.rs
- lib/segment/src/common/validate_snapshot_archive.rs
- lib/segment/tests/integration/segment_on_disk_snapshot.rs
- lib/segment/src/index/sparse_index/sparse_vector_index.rs
- lib/common/memory/src/mmap_type.rs
- lib/common/memory/src/mmap_ops.rs
- lib/segment/src/index/struct_payload_index.rs
- clippy.toml
- lib/collection/src/common/file_utils.rs
- lib/segment/src/index/hnsw_index/hnsw.rs
- lib/gridstore/Cargo.toml
- lib/common/common/src/save_on_disk.rs
- lib/quantization/src/encoded_vectors_u8.rs
- lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs
- lib/collection/src/shards/shard_holder/mod.rs
- lib/gridstore/src/gridstore.rs
- lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
- lib/segment/src/index/field_index/bool_index/mutable_bool_index.rs
- lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs
- lib/common/common/Cargo.toml
- lib/storage/src/content_manager/snapshots/download.rs
- lib/segment/benches/hnsw_incremental_build.rs
- lib/segment/src/index/field_index/map_index/mmap_map_index.rs
- lib/collection/src/shards/local_shard/snapshot_tests.rs
- lib/common/common/src/mmap_hashmap.rs
- lib/collection/src/common/sha_256.rs
- lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
- lib/collection/src/shards/shard_holder/shard_mapping.rs
- lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
- lib/segment/src/vector_storage/chunked_mmap_vectors.rs
- lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs
- lib/segment/src/id_tracker/immutable_id_tracker.rs
- lib/segment/src/index/plain_payload_index.rs
- lib/segment/tests/integration/sparse_vector_index_search_tests.rs
- lib/segment/src/segment/segment_ops.rs
- lib/segment/benches/prof.rs
- lib/collection/Cargo.toml
- lib/sparse/benches/prof.rs
- lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
- lib/common/dataset/Cargo.toml
- lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs
- lib/segment/src/index/hnsw_index/graph_layers.rs
- lib/collection/src/optimizers_builder.rs
- lib/common/memory/src/fadvise.rs
- lib/quantization/Cargo.toml
- lib/collection/src/operations/snapshot_storage_ops.rs
- lib/segment/src/index/field_index/null_index/mutable_null_index.rs
- lib/segment/tests/integration/payload_index_test.rs
- lib/common/memory/Cargo.toml
- lib/shard/Cargo.toml
- lib/segment/Cargo.toml
- lib/segment/src/segment/entry.rs
- lib/gridstore/src/page.rs
- lib/sparse/src/index/loaders.rs
- lib/segment/benches/mmap_bitslice_buffered_update_wrapper.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/collection/src/collection_manager/optimizers/segment_optimizer.rslib/storage/src/content_manager/alias_mapping.rslib/storage/src/content_manager/consensus/persistent.rslib/segment/tests/integration/segment_builder_test.rslib/common/io/src/storage_version.rslib/storage/src/content_manager/consensus/consensus_wal.rslib/segment/src/id_tracker/mutable_id_tracker.rslib/segment/src/vector_storage/async_io.rslib/collection/src/operations/snapshot_ops.rslib/gridstore/benches/real_data_bench.rslib/collection/tests/integration/collection_test.rslib/collection/src/collection/snapshots.rslib/shard/src/wal.rslib/segment/src/segment/tests.rslib/common/dataset/src/lib.rslib/segment/src/utils/fs.rslib/gridstore/src/bitmask/gaps.rslib/segment/tests/integration/segment_tests.rslib/segment/src/segment_constructor/segment_constructor_base.rslib/common/memory/src/mmap_type_readonly.rslib/collection/src/shards/local_shard/mod.rslib/segment/src/index/field_index/full_text_index/inverted_index/mmap_inverted_index/mmap_postings.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/common/flags/dynamic_mmap_flags.rslib/collection/src/shards/replica_set/snapshots.rslib/segment/src/rocksdb_backup.rslib/shard/src/proxy_segment/tests.rslib/segment/src/segment/snapshot.rslib/common/memory/src/checkfs.rslib/segment/src/vector_storage/quantized/quantized_mmap_storage.rslib/collection/src/shards/mod.rslib/collection/src/common/snapshots_manager.rslib/quantization/src/encoded_vectors_pq.rslib/quantization/src/encoded_vectors_binary.rslib/segment/benches/fixture.rslib/collection/src/shards/local_shard/snapshot.rslib/collection/src/config.rslib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/collection/benches/prof.rslib/common/common/src/disk.rslib/segment/src/vector_storage/async_io_mock.rslib/common/io/src/file_operations.rslib/segment/src/payload_storage/mmap_payload_storage.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/collection/src/collection_manager/optimizers/segment_optimizer.rslib/storage/src/content_manager/alias_mapping.rslib/storage/src/content_manager/consensus/persistent.rslib/common/io/src/storage_version.rslib/storage/src/content_manager/consensus/consensus_wal.rslib/segment/src/id_tracker/mutable_id_tracker.rslib/segment/src/vector_storage/async_io.rslib/collection/src/operations/snapshot_ops.rslib/collection/src/collection/snapshots.rslib/shard/src/wal.rslib/segment/src/segment/tests.rslib/common/dataset/src/lib.rslib/segment/src/utils/fs.rslib/gridstore/src/bitmask/gaps.rslib/segment/src/segment_constructor/segment_constructor_base.rslib/common/memory/src/mmap_type_readonly.rslib/collection/src/shards/local_shard/mod.rslib/segment/src/index/field_index/full_text_index/inverted_index/mmap_inverted_index/mmap_postings.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/common/flags/dynamic_mmap_flags.rslib/collection/src/shards/replica_set/snapshots.rslib/segment/src/rocksdb_backup.rslib/shard/src/proxy_segment/tests.rslib/segment/src/segment/snapshot.rslib/common/memory/src/checkfs.rslib/segment/src/vector_storage/quantized/quantized_mmap_storage.rslib/collection/src/shards/mod.rslib/collection/src/common/snapshots_manager.rslib/quantization/src/encoded_vectors_pq.rslib/quantization/src/encoded_vectors_binary.rslib/collection/src/shards/local_shard/snapshot.rslib/collection/src/config.rslib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/common/common/src/disk.rslib/segment/src/vector_storage/async_io_mock.rslib/common/io/src/file_operations.rslib/segment/src/payload_storage/mmap_payload_storage.rs
**/{tests,benches}/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged
Files:
lib/segment/tests/integration/segment_builder_test.rslib/gridstore/benches/real_data_bench.rslib/collection/tests/integration/collection_test.rslib/segment/tests/integration/segment_tests.rslib/segment/benches/fixture.rslib/collection/benches/prof.rs
🧠 Learnings (11)
📚 Learning: 2025-07-11T11:35:21.549Z
Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.
Applied to files:
lib/segment/tests/integration/segment_builder_test.rslib/segment/src/id_tracker/mutable_id_tracker.rslib/segment/tests/integration/segment_tests.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Applied to files:
lib/segment/tests/integration/segment_builder_test.rslib/segment/src/id_tracker/mutable_id_tracker.rslib/segment/src/index/field_index/full_text_index/inverted_index/mmap_inverted_index/mmap_postings.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/shard/src/proxy_segment/tests.rslib/segment/src/vector_storage/quantized/quantized_mmap_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
📚 Learning: 2025-03-12T16:23:44.715Z
Learnt from: timvisee
PR: qdrant/qdrant#6150
File: lib/segment/src/id_tracker/mutable_id_tracker.rs:62-71
Timestamp: 2025-03-12T16:23:44.715Z
Learning: In the MutableIdTracker implementation in Qdrant, the error case where versions file exists but mappings file is missing is considered a system-level error (potential corruption) rather than a user error, and is handled via logging and debug assertions without surfacing user-facing errors.
Applied to files:
lib/segment/src/id_tracker/mutable_id_tracker.rs
📚 Learning: 2025-06-14T20:35:10.603Z
Learnt from: generall
PR: qdrant/qdrant#6635
File: lib/segment/src/index/field_index/full_text_index/text_index.rs:784-832
Timestamp: 2025-06-14T20:35:10.603Z
Learning: Functions gated with `#[cfg(feature = "testing")]` within `#[cfg(test)]` modules are safe from compilation issues since both the function and its call sites are restricted to test builds. The additional feature gate is often used for optional heavy test utilities.
Applied to files:
lib/shard/src/wal.rs
📚 Learning: 2025-08-29T08:16:28.521Z
Learnt from: CR
PR: qdrant/qdrant#0
File: .github/review-rules.md:0-0
Timestamp: 2025-08-29T08:16:28.521Z
Learning: Applies to **/{tests,benches}/**/*.rs : Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged
Applied to files:
lib/shard/src/wal.rs
📚 Learning: 2025-08-20T15:03:18.522Z
Learnt from: generall
PR: qdrant/qdrant#7100
File: lib/segment/src/index/field_index/full_text_index/inverted_index/mutable_inverted_index.rs:82-100
Timestamp: 2025-08-20T15:03:18.522Z
Learning: The merge_postings_iterator function in lib/segment/src/index/field_index/full_text_index/inverted_index/postings_iterator.rs handles deduplication internally, so there's no need to add .dedup() when using it for OR-merge operations in inverted indexes.
Applied to files:
lib/segment/src/index/field_index/full_text_index/inverted_index/mmap_inverted_index/mmap_postings.rs
📚 Learning: 2025-09-01T11:19:26.371Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.
Applied to files:
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_mmap_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
📚 Learning: 2025-08-18T10:56:43.707Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.
Applied to files:
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_mmap_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
📚 Learning: 2025-06-06T07:52:38.478Z
Learnt from: timvisee
PR: qdrant/qdrant#6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.
Applied to files:
lib/segment/src/payload_storage/mmap_payload_storage.rs
📚 Learning: 2025-04-07T23:31:22.515Z
Learnt from: generall
PR: qdrant/qdrant#6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` does not return a `Result` because the underlying `mmap.populate()` method doesn't return a Result either. Adding an unnecessary `Ok(())` would cause Clippy to complain.
Applied to files:
lib/segment/src/payload_storage/mmap_payload_storage.rs
🧬 Code graph analysis (33)
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (2)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/storage/src/content_manager/consensus/persistent.rs (4)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)src/actix/api/snapshot_api.rs (1)
fs(747-747)lib/common/common/src/mmap_hashmap.rs (1)
File(126-126)lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/segment/tests/integration/segment_builder_test.rs (1)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)
lib/storage/src/content_manager/consensus/consensus_wal.rs (2)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)src/actix/api/snapshot_api.rs (1)
fs(747-747)
lib/segment/src/id_tracker/mutable_id_tracker.rs (1)
lib/common/common/src/mmap_hashmap.rs (1)
File(126-126)
lib/segment/src/vector_storage/async_io.rs (1)
lib/common/common/src/mmap_hashmap.rs (1)
File(126-126)
lib/collection/tests/integration/collection_test.rs (1)
lib/common/common/src/mmap_hashmap.rs (1)
File(126-126)
lib/shard/src/wal.rs (1)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)
lib/segment/src/segment/tests.rs (1)
lib/common/common/src/tar_ext.rs (3)
tar(49-51)new_seekable_owned(84-86)new(113-128)
lib/common/dataset/src/lib.rs (1)
lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/segment/src/utils/fs.rs (1)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)
lib/gridstore/src/bitmask/gaps.rs (1)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)
lib/segment/tests/integration/segment_tests.rs (1)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)
lib/segment/src/segment_constructor/segment_constructor_base.rs (4)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)src/actix/api/snapshot_api.rs (1)
fs(747-747)lib/common/common/src/mmap_hashmap.rs (1)
File(126-126)lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/collection/src/shards/local_shard/mod.rs (3)
lib/collection/src/shards/local_shard/snapshot.rs (1)
fs(37-38)lib/collection/src/shards/mod.rs (1)
shard_path(41-43)lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/collection/src/shards/replica_set/snapshots.rs (2)
src/common/http_client.rs (1)
io(155-157)lib/collection/src/shards/local_shard/mod.rs (2)
fs(298-305)segments_path(467-469)
lib/shard/src/proxy_segment/tests.rs (1)
lib/common/common/src/tar_ext.rs (2)
tar(49-51)new_seekable_owned(84-86)
lib/segment/src/segment/snapshot.rs (2)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)src/actix/api/snapshot_api.rs (1)
fs(747-747)
lib/common/memory/src/checkfs.rs (1)
lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs (2)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/collection/src/common/snapshots_manager.rs (3)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)
create(86-118)
lib/quantization/src/encoded_vectors_pq.rs (2)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/quantization/src/encoded_vectors_binary.rs (2)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/segment/benches/fixture.rs (1)
lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/collection/src/shards/local_shard/snapshot.rs (1)
lib/collection/src/shards/local_shard/mod.rs (2)
fs(298-305)segments_path(467-469)
lib/collection/src/config.rs (1)
lib/common/common/src/mmap_hashmap.rs (1)
File(126-126)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (2)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/collection/benches/prof.rs (1)
lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
lib/segment/src/vector_storage/async_io_mock.rs (1)
lib/common/common/src/mmap_hashmap.rs (1)
File(126-126)
lib/common/io/src/file_operations.rs (1)
lib/common/common/src/mmap_hashmap.rs (1)
File(126-126)
lib/segment/src/payload_storage/mmap_payload_storage.rs (2)
lib/collection/src/shards/local_shard/mod.rs (1)
fs(298-305)lib/segment/src/rocksdb_backup.rs (1)
create_dir_all(49-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
- GitHub Check: lint
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: storage-compat-test
This PR integrates the
fs-errcrate that provides wrappers aroundstd::fsandtokio::fswith more helpful messages on errors.Almost every use of the original modules is replaced with
fs-errwrappers, and clippy is configured to discourage using the original modules.Showcase
To illustrate the difference, I've provoked an I/O error by deleting
storage/collections/benchmark/0/temp_segments/*during running optimization and checked the logs.Logs before this PR:
2025-09-26T19:53:42.555172Z ERROR qdrant::startup: Panic occurred in file lib/collection/src/update_handler.rs at line 456: Optimization error: Service internal error: IO Error: No such file or directory (os error 2)Logs after this PR:
2025-09-26T19:52:40.733049Z ERROR qdrant::startup: Panic occurred in file lib/collection/src/update_handler.rs at line 456: Optimization error: Service internal error: IO Error: failed to open file `/qdrant_data/./storage/collections/benchmark/0/temp_segments/segment_builder_izOsWw/id_tracker.tmp`: No such file or directory (os error 2)