Conversation
b8b0626 to
3431c31
Compare
8b4b057 to
137384d
Compare
This comment was marked as resolved.
This comment was marked as resolved.
137384d to
db224df
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (5)
lib/segment/src/types.rs (1)
3161-3165: Borrowed-From wrapper delegates via clone — OK; optional: avoid enum-wide clone.Current impl satisfies earlier guidance and uses WithPayload::from(interface.clone()). If you want to avoid cloning the whole enum, match on the borrowed value and clone only inner data.
-impl From<&WithPayloadInterface> for WithPayload { - fn from(interface: &WithPayloadInterface) -> Self { - WithPayload::from(interface.clone()) - } -} +impl From<&WithPayloadInterface> for WithPayload { + fn from(interface: &WithPayloadInterface) -> Self { + match interface { + WithPayloadInterface::Bool(enable) => WithPayload { enable: *enable, payload_selector: None }, + WithPayloadInterface::Fields(fields) => WithPayload { + enable: true, + payload_selector: Some(PayloadSelector::new_include(fields.clone())), + }, + WithPayloadInterface::Selector(selector) => WithPayload { + enable: true, + payload_selector: Some(selector.clone()), + }, + } + } +}lib/edge/src/lib.rs (4)
128-134: Don’t panic on missing config/segments; return a service error and prefer PathBuf::from.Avoids crashing in embeddable use; also adhere to style (SomeType::from over into).
- let shard = Self { - _path: path.into(), - config: config.expect("config was provided or at least one segment was loaded"), - wal: parking_lot::Mutex::new(wal), - segments: Arc::new(parking_lot::RwLock::new(segments)), - }; + let shard = Self { + _path: PathBuf::from(path), + config: match config { + Some(c) => c, + None => return Err(service_error("no segments found and no config provided")), + }, + wal: Mutex::new(wal), + segments: Arc::new(parking_lot::RwLock::new(segments)), + };
242-247: Don’t expect() on missing vector config; return a clear error.User input can reference an unknown vector name; avoid panic.
- let distance = self - .config - .vector_data - .get(&vector_name) - .expect("vector config exist") - .distance; + let distance = self + .config + .vector_data + .get(&vector_name) + .ok_or_else(|| service_error(format!("vector '{vector_name}' not found in shard config")))? + .distance;
104-116: Strict config equality across segments is too strong; perform compatibility checks.Segments in one shard may differ in index/storage while sharing the vector schema. Validate only invariants (names, sizes, distances, required options), not full equality.
- if let Some(config) = &config { - // TODO: Is simple equality check sufficient? 🤔 - if config != segment.config() { - return Err(OperationError::service_error(format!( - "segment {} does not match expected segment config: expected {:?}, but received {:?}", - segment_path.display(), - config, - segment.config(), - ))); - } - } else { - config = Some(segment.config().clone()); - } + if let Some(cfg) = &config { + if let Err(e) = validate_segment_config_compat(cfg, segment.config()) { + return Err(service_error(format!( + "segment {} incompatible with shard config: {e}", + segment_path.display() + ))); + } + } else { + config = Some(segment.config().clone()); + }Additional helper (place near other free fns):
fn validate_segment_config_compat(base: &SegmentConfig, seg: &SegmentConfig) -> OperationResult<()> { // 1) vector schemas must exist and match name->(size,distance) for (name, base_vec) in &base.vector_data { let Some(seg_vec) = seg.vector_data.get(name) else { return Err(service_error(format!("missing vector '{name}'"))); }; if base_vec.size != seg_vec.size || base_vec.distance != seg_vec.distance { return Err(service_error(format!( "vector '{name}' mismatch: size/distance differ (base: {} {:?}, seg: {} {:?})", base_vec.size, base_vec.distance, seg_vec.size, seg_vec.distance ))); } } // Allow differences in storage/index/quantization unless you have stricter requirements. Ok(()) }
267-268: Offset drain can panic; clamp to length.Drain panics when offset > points.len().
- let _ = points.drain(..offset); + let _ = points.drain(..offset.min(points.len()));
🧹 Nitpick comments (3)
lib/shard/src/search_result_aggregator.rs (1)
82-93: Guard batch_id in debug builds.Add a bounds check to make misuse fail-fast in debug.
pub fn update_batch_results( &mut self, batch_id: usize, search_results: impl IntoIterator<Item = ScoredPoint>, ) { + debug_assert!(batch_id < self.batch_aggregators.len()); let aggregator = &mut self.batch_aggregators[batch_id]; for scored_point in search_results { debug_assert!(self.point_versions.contains_key(&scored_point.id));lib/edge/src/lib.rs (2)
142-147: Limit WAL lock scope to the write.Avoid holding the mutex during heavy processing.
- let mut wal = self.wal.lock(); - - let operation_id = wal.write(&operation).map_err(service_error)?; + let operation_id = { + let mut wal = self.wal.lock(); + wal.write(&operation).map_err(service_error)? + };
46-53: Handle missing segments directory gracefully.Treat a non-existent segments/ as empty when config is provided; only error on other IO failures.
- let segments_dir = fs::read_dir(&segments_path).map_err(|err| { - OperationError::service_error(format!( - "failed to read segments directory {}: {err}", - segments_path.display() - )) - })?; + let segments_dir = match fs::read_dir(&segments_path) { + Ok(it) => Some(it), + Err(e) if e.kind() == std::io::ErrorKind::NotFound => None, + Err(err) => { + return Err(service_error(format!( + "failed to read segments directory {}: {err}", + segments_path.display() + ))) + } + };And iterate only if Some:
let mut segments = SegmentHolder::default(); if let Some(dir) = segments_dir { for entry in dir { // unchanged… } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
Cargo.toml(1 hunks)lib/collection/src/collection_manager/mod.rs(0 hunks)lib/collection/src/collection_manager/segments_searcher.rs(1 hunks)lib/collection/src/shards/local_shard/mod.rs(2 hunks)lib/collection/src/wal_delta.rs(1 hunks)lib/edge/Cargo.toml(1 hunks)lib/edge/examples/edge-cli.rs(1 hunks)lib/edge/src/lib.rs(1 hunks)lib/segment/src/types.rs(5 hunks)lib/shard/src/lib.rs(1 hunks)lib/shard/src/search_result_aggregator.rs(3 hunks)lib/shard/src/wal.rs(2 hunks)src/wal_inspector.rs(1 hunks)
💤 Files with no reviewable changes (1)
- lib/collection/src/collection_manager/mod.rs
🚧 Files skipped from review as they are similar to previous changes (8)
- lib/collection/src/collection_manager/segments_searcher.rs
- lib/collection/src/wal_delta.rs
- Cargo.toml
- lib/edge/Cargo.toml
- lib/edge/examples/edge-cli.rs
- lib/shard/src/wal.rs
- src/wal_inspector.rs
- lib/collection/src/shards/local_shard/mod.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/shard/src/lib.rslib/shard/src/search_result_aggregator.rslib/edge/src/lib.rslib/segment/src/types.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/shard/src/lib.rslib/shard/src/search_result_aggregator.rslib/edge/src/lib.rslib/segment/src/types.rs
🧠 Learnings (2)
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/edge/src/lib.rslib/segment/src/types.rs
📚 Learning: 2025-09-01T11:42:06.964Z
Learnt from: timvisee
PR: qdrant/qdrant#7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.
Applied to files:
lib/edge/src/lib.rs
🧬 Code graph analysis (1)
lib/edge/src/lib.rs (3)
lib/segment/src/data_types/query_context.rs (1)
hardware_counter(192-194)lib/segment/src/segment_constructor/segment_constructor_base.rs (1)
load_segment(687-764)lib/shard/src/update.rs (4)
process_point_operation(27-60)process_vector_operation(62-79)process_payload_operation(81-147)process_field_index_operation(149-167)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: e2e-tests
- GitHub Check: test-consensus-compose
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: lint
🔇 Additional comments (9)
lib/shard/src/search_result_aggregator.rs (2)
48-53: Broadened API to IntoIterator: good ergonomics with zero-cost.Accepting IntoIterator on new() improves call-site flexibility without runtime overhead.
60-71: IntoIterator on update_point_versions: LGTM.Keeps internals unchanged; iterator consumption is still explicit in the for-loop.
lib/segment/src/types.rs (5)
1278-1288: Add PartialEq to SegmentConfig: sensible and useful.Enables config comparisons in callers (e.g., shard/edge). No concerns.
1431-1450: Add PartialEq to VectorDataConfig: LGTM.Matches usage patterns for schema compatibility checks.
1471-1473: Derive Eq/PartialEq for SparseVectorStorageType: LGTM.No behavioral change; improves comparisons and map/set use.
1498-1507: Add PartialEq to SparseVectorDataConfig: LGTM.Consistent with surrounding config types.
3142-3159: By-value From mapping looks correct.Covers all variants and aligns with existing selector constructors.
lib/shard/src/lib.rs (1)
6-6: Expose search_result_aggregator publicly: LGTM.Matches its new cross-crate usage (collection, edge).
lib/edge/src/lib.rs (1)
93-103: Confirm safety of removing “leftover” segments.load_segment returns None for several cases (deleted marker, missing version). Deleting the directory is destructive; confirm this matches server semantics and expectations for embeddable mode.
db224df to
9025929
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
lib/edge/src/lib.rs (2)
86-91: Optional: pass a shared cancel flag to load_segment.Consider threading an
&AtomicBoolfromShard::loadargs to support cancellation (e.g., SIGINT).
209-226: Avoid panics on single-batch assumptions.Both
.expect("single batched search result")sites will abort the process on API misuse. Prefer returningOperationError.Also applies to: 237-241
lib/segment/src/types.rs (2)
1409-1435: Revisit sparse vector compatibility semantics (keys must match?).
SegmentConfig::is_compatibleenforces identical key sets forsparse_vector_data. Earlier discussion suggested sparse configs can differ; if presence/keys may vary, this will reject valid segments.Would you like to relax the sparse check to not require identical maps (e.g., permit differing keys while validating overlaps)?
Also applies to: 1438-1458
1541-1561: Confirm strictness for multivector/datatype in vector compatibility.You require
multivector_configanddatatypeto match exactly. If differing between segments is acceptable, loosen these checks; if not, keep as-is.Also applies to: 1580-1602
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/edge/src/lib.rs(1 hunks)lib/segment/src/types.rs(8 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/edge/src/lib.rslib/segment/src/types.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/edge/src/lib.rslib/segment/src/types.rs
🧠 Learnings (3)
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/edge/src/lib.rslib/segment/src/types.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/edge/src/lib.rs
📚 Learning: 2025-09-01T11:42:06.964Z
Learnt from: timvisee
PR: qdrant/qdrant#7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.
Applied to files:
lib/edge/src/lib.rs
🧬 Code graph analysis (2)
lib/edge/src/lib.rs (3)
lib/segment/src/data_types/query_context.rs (1)
hardware_counter(192-194)lib/segment/src/segment_constructor/segment_constructor_base.rs (1)
load_segment(687-764)lib/shard/src/update.rs (4)
process_point_operation(27-60)process_vector_operation(62-79)process_payload_operation(81-147)process_field_index_operation(149-167)
lib/segment/src/types.rs (3)
lib/edge/src/lib.rs (1)
config(138-140)lib/segment/src/entry/entry_point.rs (1)
config(258-258)lib/segment/src/segment/entry.rs (1)
config(609-611)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: e2e-tests
- GitHub Check: test-consistency
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: storage-compat-test
- GitHub Check: integration-tests-consensus
- GitHub Check: lint
- GitHub Check: integration-tests
🔇 Additional comments (6)
lib/edge/src/lib.rs (4)
216-216: Use saturating_add for per-segment limit and aggregator top‑k.
offset + limitcan overflowusize.- offset + limit, + offset.saturating_add(limit),- let mut aggregator = BatchResultAggregator::new([offset + limit]); + let mut aggregator = BatchResultAggregator::new([offset.saturating_add(limit)]);Also applies to: 230-230
128-134: Don't panic on missing config; adhere to From-over-Into.
- Avoid
.expect(...); return a proper error when no segments were loaded and no config provided.- Follow the repo style: use
PathBuf::from(path)instead ofpath.into().- Also use the imported
Mutexalias.- let shard = Self { - _path: path.into(), - config: config.expect("config was provided or at least one segment was loaded"), - wal: parking_lot::Mutex::new(wal), - segments: Arc::new(parking_lot::RwLock::new(segments)), - }; + let shard = Self { + _path: PathBuf::from(path), + config: match config { + Some(c) => c, + None => return Err(service_error("no segments found and no config provided")), + }, + wal: Mutex::new(wal), + segments: Arc::new(parking_lot::RwLock::new(segments)), + };
242-248: Return an error if vector config is missing (no panics).Replace
.expect("vector config exist")with a service error to handle unknown vector names gracefully.- let distance = self - .config - .vector_data - .get(&vector_name) - .expect("vector config exist") - .distance; + let distance = self + .config + .vector_data + .get(&vector_name) + .ok_or_else(|| service_error(format!("vector '{vector_name}' not found in shard config")))? + .distance;
277-279: Guard drain against offset > len to avoid panic.Use a capped range.
- let _ = points.drain(..offset); + let _ = points.drain(..offset.min(points.len()));lib/segment/src/types.rs (2)
334-340: Good addition: explicit ordering helper.
Distance::is_orderedcleanly encodes sort direction for debug checks.
3395-3418: Nice: by‑value and by‑ref conversions for WithPayloadInterface.Adds ergonomic
From<WithPayloadInterface>and keepsFrom<&WithPayloadInterface>delegating via clone.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
lib/edge/src/lib.rs (2)
46-53: Create segments dir if missing; fail-fast only on actual I/O errors.Load should initialize an empty shard location. Currently it errors when the segments directory doesn’t exist.
Apply this diff:
- let segments_path = path.join(SEGMENTS_PATH); - let segments_dir = fs::read_dir(&segments_path).map_err(|err| { - OperationError::service_error(format!( - "failed to read segments directory {}: {err}", - segments_path.display() - )) - })?; + let segments_path = path.join(SEGMENTS_PATH); + fs::create_dir_all(&segments_path).map_err(|err| { + service_error(format!( + "failed to create segments directory {}: {err}", + segments_path.display() + )) + })?; + let segments_dir = fs::read_dir(&segments_path).map_err(|err| { + service_error(format!( + "failed to read segments directory {}: {err}", + segments_path.display() + )) + })?;
142-147: Release WAL mutex before heavy segment ops.You hold the WAL lock across all segment updates. Write to WAL, then drop the lock to reduce contention.
Apply this diff:
- let mut wal = self.wal.lock(); - - let operation_id = wal.write(&operation).map_err(service_error)?; + let operation_id = { + let mut wal = self.wal.lock(); + wal.write(&operation).map_err(service_error)? + };Please confirm there’s no ordering/ack coupling requiring the lock to be held during apply for your edge use case.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/edge/src/lib.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/edge/src/lib.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/edge/src/lib.rs
🧠 Learnings (3)
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/edge/src/lib.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/edge/src/lib.rs
📚 Learning: 2025-09-01T11:42:06.964Z
Learnt from: timvisee
PR: qdrant/qdrant#7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.
Applied to files:
lib/edge/src/lib.rs
🧬 Code graph analysis (1)
lib/edge/src/lib.rs (3)
lib/segment/src/data_types/query_context.rs (1)
hardware_counter(192-194)lib/segment/src/segment_constructor/segment_constructor_base.rs (1)
load_segment(687-764)lib/shard/src/update.rs (4)
process_point_operation(27-60)process_vector_operation(62-79)process_payload_operation(81-147)process_field_index_operation(149-167)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
- GitHub Check: lint
- GitHub Check: test-consistency
- GitHub Check: storage-compat-test
🔇 Additional comments (6)
lib/edge/src/lib.rs (6)
86-92: Expose stoppable flag instead ofAtomicBool::new(false)to support SIGINT.Forward a stop flag from the caller so load can be interrupted during long multi-segment loads.
Minimal change now (without API change): accept a parameter later on
Shard::load(path, config, stopped: &AtomicBool)and forward it toload_segment. For now, consider a TODO to wire this through.
128-134: Don’t panic on empty segments + no config; use PathBuf::from.
expectwill abort the process in embeddable scenarios. Return an error instead, and follow the repo style for conversions.Apply this diff:
- let shard = Self { - _path: path.into(), - config: config.expect("config was provided or at least one segment was loaded"), - wal: parking_lot::Mutex::new(wal), - segments: Arc::new(parking_lot::RwLock::new(segments)), - }; + let shard = Self { + _path: PathBuf::from(path), + config: match config { + Some(c) => c, + None => return Err(service_error("no segments found and no config provided")), + }, + wal: Mutex::new(wal), + segments: Arc::new(parking_lot::RwLock::new(segments)), + };
216-219: Use saturating_add for per-segment limit to avoid usize overflow.Apply this diff:
- offset + limit, + offset.saturating_add(limit),
230-231: Use saturating_add for aggregator top-k.Apply this diff:
- let mut aggregator = BatchResultAggregator::new([offset + limit]); + let mut aggregator = BatchResultAggregator::new([offset.saturating_add(limit)]);
242-247: Return a service error if vector config is missing (avoid panic).Apply this diff:
- let distance = self - .config - .vector_data - .get(&vector_name) - .expect("vector config exist") - .distance; + let distance = self + .config + .vector_data + .get(&vector_name) + .ok_or_else(|| service_error(format!("vector '{vector_name}' not found in shard config")))? + .distance;
277-279: Avoid panic when offset > results length.
Vec::drain(..offset)panics ifoffset > len. Clamp the range.Apply this diff:
- let _ = points.drain(..offset); + let _ = points.drain(..offset.min(points.len()));
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/segment/src/types.rs(9 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/segment/src/types.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/segment/src/types.rs
🧠 Learnings (1)
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/segment/src/types.rs
🧬 Code graph analysis (1)
lib/segment/src/types.rs (3)
lib/edge/src/lib.rs (1)
config(138-140)lib/segment/src/entry/entry_point.rs (1)
config(258-258)lib/segment/src/segment/entry.rs (1)
config(609-611)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: lint
- GitHub Check: storage-compat-test
- GitHub Check: test-consistency
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: integration-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: e2e-tests
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
There was a problem hiding this comment.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/edge/src/lib.rs(1 hunks)lib/segment/src/types.rs(9 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/edge/src/lib.rslib/segment/src/types.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/edge/src/lib.rslib/segment/src/types.rs
🧠 Learnings (4)
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/edge/src/lib.rslib/segment/src/types.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/edge/src/lib.rs
📚 Learning: 2025-09-01T11:42:06.964Z
Learnt from: timvisee
PR: qdrant/qdrant#7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.
Applied to files:
lib/edge/src/lib.rs
📚 Learning: 2025-04-22T23:17:41.734Z
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/index/hnsw_index/point_scorer.rs:116-121
Timestamp: 2025-04-22T23:17:41.734Z
Learning: The method `is_none_or` exists in the Qdrant codebase and compiles correctly, despite not being part of standard Rust's Option type. Code reviews should verify compilation issues before reporting them.
Applied to files:
lib/segment/src/types.rs
🧬 Code graph analysis (1)
lib/edge/src/lib.rs (5)
lib/segment/src/data_types/query_context.rs (1)
hardware_counter(192-194)lib/segment/src/segment_constructor/segment_constructor_base.rs (1)
load_segment(687-764)lib/common/common/src/save_on_disk.rs (1)
load_or_init_default(38-43)lib/shard/src/segment_holder/mod.rs (1)
has_appendable_segment(273-275)lib/shard/src/update.rs (4)
process_point_operation(27-60)process_vector_operation(62-79)process_payload_operation(81-147)process_field_index_operation(149-167)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: test-consistency
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: lint
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: storage-compat-test
🔇 Additional comments (12)
lib/segment/src/types.rs (9)
334-340: Good addition: explicit distance-based ordering helper.
Distance::is_orderedcleanly encodes the ordering semantics.
1338-1458: SegmentConfig compatibility logic looks sound.Key sets must match; vector configs checked with per-type rules; sparse/payload allowed to differ. Fits shard-level constraints.
Confirm this strict “same vector name set” requirement matches intended behavior for mixed-index/plain segment shards.
1438-1458: Map compatibility helper is correct and minimal.
1521-1530: Clarify MultiVector comparator requirement.Currently requires exact comparator match. If future comparators are order-equivalent, we might allow broader compatibility.
1552-1617: VectorDataConfig compatibility: verify datatype equality constraint.You allow differing storage/index/quantization, but require equal
datatype. Validate this with maintainers; mixing datatypes across segments might be undesirable or okay depending on read path.
1619-1625: Option-compat fix LGTM.
(None, None) => trueresolves the false-negative on defaults.
1627-1639: Expanding derives for SparseVectorStorageType is fine.
1654-1693: SparseVectorDataConfig always-compatible: confirm intent.Always returning
trueskips cross-segment sparse config validation. If storage/index mismatches can affect search correctness/perf, consider a minimal check.
3418-3441: WithPayload conversions look good (by-value and by-ref).Matches prior suggestion and simplifies call sites.
lib/edge/src/lib.rs (3)
154-159: Don’t panic on missing config; prefer PathBuf::from and idiomatic Mutex.Return an error instead of
expect(...), usePathBuf::from(path), and use the importedMutex.- let shard = Self { - _path: path.into(), - config: config.expect("config was provided or at least one segment was loaded"), - wal: parking_lot::Mutex::new(wal), - segments: Arc::new(parking_lot::RwLock::new(segments)), - }; + let shard = Self { + _path: PathBuf::from(path), + config: match config { + Some(c) => c, + None => { + return Err(OperationError::service_error( + "no segments found and no config provided", + )) + } + }, + wal: Mutex::new(wal), + segments: Arc::new(parking_lot::RwLock::new(segments)), + };
242-242: Use saturating_add for offset + limit to avoid usize overflow.- offset + limit, + offset.saturating_add(limit),- let mut aggregator = BatchResultAggregator::new([offset + limit]); + let mut aggregator = BatchResultAggregator::new([offset.saturating_add(limit)]);Also applies to: 256-256
268-273: Avoid panic on missing vector config; return a service error.- let distance = self - .config - .vector_data - .get(&vector_name) - .expect("vector config exist") - .distance; + let distance = self + .config + .vector_data + .get(&vector_name) + .ok_or_else(|| { + OperationError::service_error(format!( + "vector '{}' not found in shard config", + vector_name + )) + })? + .distance;
68d7250 to
c87074c
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
lib/segment/src/types.rs (1)
3437-3441: Avoid cloning in From<&WithPayloadInterface>.Tiny nit: match on
&WithPayloadInterfaceto avoid cloning the enum; clone only inner data as needed.Apply this diff:
impl From<&WithPayloadInterface> for WithPayload { fn from(interface: &WithPayloadInterface) -> Self { - WithPayload::from(interface.clone()) + match interface { + WithPayloadInterface::Bool(enable) => WithPayload { + enable: *enable, + payload_selector: None, + }, + WithPayloadInterface::Fields(fields) => WithPayload { + enable: true, + payload_selector: Some(PayloadSelector::new_include(fields.clone())), + }, + WithPayloadInterface::Selector(selector) => WithPayload { + enable: true, + payload_selector: Some(selector.clone()), + }, + } } }lib/edge/Cargo.toml (1)
1-7: Prevent accidental publish to crates.ioIf this crate is not intended for immediate publishing, set publish = false to avoid accidental releases.
[package] name = "edge" version = "0.1.0" authors = ["Qdrant Team <info@qdrant.tech>"] license = "Apache-2.0" edition = "2024" +publish = falselib/edge/examples/edge-cli.rs (1)
1-13: Use args_os to support non‑UTF‑8 paths; simplify arity handlingenv::args() can panic on non‑UTF‑8 input. args_os avoids that and better suits filesystem paths. Also, removing take(2) preserves full unexpected arguments in the error.
use std::env; use std::path::Path; fn main() -> anyhow::Result<()> { - let args: Vec<_> = env::args().skip(1).take(2).collect(); - - let [edge_shard_path] = args - .try_into() - .map_err(|args| anyhow::format_err!("unexpected arguments {args:?}"))?; + let args: Vec<_> = env::args_os().skip(1).collect(); + let [edge_shard_path] = args.try_into().map_err(|args| { + anyhow::format_err!("usage: edge-cli <shard-path>; unexpected arguments {args:?}") + })?; let _edge_shard = edge::Shard::load(Path::new(&edge_shard_path), None)?; Ok(()) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
Cargo.toml(1 hunks)lib/collection/src/collection_manager/mod.rs(0 hunks)lib/collection/src/collection_manager/segments_searcher.rs(1 hunks)lib/collection/src/shards/local_shard/mod.rs(2 hunks)lib/collection/src/wal_delta.rs(1 hunks)lib/edge/Cargo.toml(1 hunks)lib/edge/examples/edge-cli.rs(1 hunks)lib/edge/src/lib.rs(1 hunks)lib/segment/src/types.rs(9 hunks)lib/shard/src/lib.rs(1 hunks)lib/shard/src/search_result_aggregator.rs(3 hunks)lib/shard/src/wal.rs(2 hunks)src/wal_inspector.rs(1 hunks)
💤 Files with no reviewable changes (1)
- lib/collection/src/collection_manager/mod.rs
🚧 Files skipped from review as they are similar to previous changes (6)
- Cargo.toml
- lib/shard/src/lib.rs
- lib/collection/src/wal_delta.rs
- lib/collection/src/shards/local_shard/mod.rs
- lib/edge/src/lib.rs
- src/wal_inspector.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/shard/src/wal.rslib/edge/examples/edge-cli.rslib/shard/src/search_result_aggregator.rslib/collection/src/collection_manager/segments_searcher.rslib/segment/src/types.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/shard/src/wal.rslib/shard/src/search_result_aggregator.rslib/collection/src/collection_manager/segments_searcher.rslib/segment/src/types.rs
🧠 Learnings (2)
📚 Learning: 2025-04-22T23:17:41.734Z
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/index/hnsw_index/point_scorer.rs:116-121
Timestamp: 2025-04-22T23:17:41.734Z
Learning: The method `is_none_or` exists in the Qdrant codebase and compiles correctly, despite not being part of standard Rust's Option type. Code reviews should verify compilation issues before reporting them.
Applied to files:
lib/segment/src/types.rs
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/segment/src/types.rs
🧬 Code graph analysis (2)
lib/edge/examples/edge-cli.rs (1)
lib/edge/src/lib.rs (1)
load(37-162)
lib/segment/src/types.rs (4)
lib/edge/src/lib.rs (1)
config(164-166)lib/segment/src/entry/entry_point.rs (1)
config(258-258)lib/segment/src/segment/entry.rs (1)
config(609-611)lib/segment/src/data_types/named_vectors.rs (3)
config(349-351)config(352-354)config(355-357)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: lint
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
🔇 Additional comments (14)
lib/shard/src/search_result_aggregator.rs (1)
48-52: Nice improvement on iterator ergonomics.Switching these APIs to accept
IntoIteratorgives the new edge shard and other call sites more flexibility without any downside since the internal loops consume the iterator immediately. Good call.Also applies to: 62-70, 82-92
lib/collection/src/collection_manager/segments_searcher.rs (1)
21-21: Import update lines up with the module move.The new path correctly reflects the aggregator living under
shard, and everything else stays untouched. Looks good.lib/shard/src/wal.rs (1)
30-35: Path-based constructor looks goodAccepting a
&Pathdirectly removes the string round-trips and keeps the WAL state file handling consistent withWal::with_options.lib/segment/src/types.rs (10)
334-340: LGTM on distance ordering API.Consistent with
distance_order. Please confirm callers expect ties to be treated as ordered (>=/<=).
1338-1349: Derives on SegmentConfig look good.Adding
PartialEqand reordering derives is harmless and useful.
1438-1458: Helper is_map_compatible is sound.Exact key-set match + per-entry check is clear and efficient.
1521-1530: Decide and document multivector compatibility semantics.Currently requires identical
comparator. If that’s intentional, drop the TODO and add a brief doc; otherwise, adjust before this API is relied upon.
1552-1571: Derives on VectorDataConfig are fine.
PartialEqis appropriate here.
1591-1617: VectorDataConfig::is_compatible: confirm datatype and multivector requirements.You enforce
size,distance, anddatatypeequality and delegate multivector viais_opt_compatible. Ifdatatypeor multivector are allowed to differ in practice, relax accordingly; otherwise, remove the TODO.
1619-1625: Option-compatibility default corrected.
(None, None) => truefixes false incompatibilities for defaults. Thanks for addressing.
1627-1630: Derives on SparseVectorStorageType look good.Adding
Eq/PartialEqis appropriate.
1682-1692: SparseVectorDataConfig::is_compatible always true.Intentional permissiveness? If so, document it; otherwise, consider checking fields you deem invariant before we rely on this in merges.
3418-3435: WithPayload From implementation is correct.Good use of explicit
WithPayload::from(...)instead of.into(). As per coding guidelines.lib/edge/Cargo.toml (1)
9-21: Manifest looks goodDependencies and dev-dependencies align with the example and workspace usage.
| pub fn is_compatible(&self, other: &Self) -> bool { | ||
| // Vector data have to be compatible between two segments. | ||
| // Sparse vector data can be different, but a placeholder check is implemented to catch | ||
| // and enforce compatibility check for future changes. | ||
| // Payload storage type can be different. | ||
|
|
||
| // Assert segment config fields | ||
| let Self { | ||
| vector_data: _, | ||
| sparse_vector_data: _, | ||
| payload_storage_type: _, | ||
| } = self; | ||
|
|
||
| let is_vector_config_compatible = is_map_compatible( | ||
| &self.vector_data, | ||
| &other.vector_data, | ||
| VectorDataConfig::is_compatible, | ||
| ); | ||
|
|
||
| let is_sparse_vector_config_compatible = is_map_compatible( | ||
| &self.sparse_vector_data, | ||
| &other.sparse_vector_data, | ||
| SparseVectorDataConfig::is_compatible, | ||
| ); | ||
|
|
||
| is_vector_config_compatible && is_sparse_vector_config_compatible | ||
| } | ||
| } |
There was a problem hiding this comment.
Clarify sparse-vector compatibility comment vs implementation.
Code requires identical key sets for both vector_data and sparse_vector_data (via is_map_compatible), yet the comment says “Sparse vector data can be different”. Either update the comment to match current behavior or relax the check to allow differing sparse maps.
Apply this diff to align the comment with behavior:
- // Sparse vector data can be different, but a placeholder check is implemented to catch
- // and enforce compatibility check for future changes.
+ // Sparse vector data currently also require identical key sets; value compatibility is a placeholder
+ // that always returns true. We may relax the key-set requirement in the future.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pub fn is_compatible(&self, other: &Self) -> bool { | |
| // Vector data have to be compatible between two segments. | |
| // Sparse vector data can be different, but a placeholder check is implemented to catch | |
| // and enforce compatibility check for future changes. | |
| // Payload storage type can be different. | |
| // Assert segment config fields | |
| let Self { | |
| vector_data: _, | |
| sparse_vector_data: _, | |
| payload_storage_type: _, | |
| } = self; | |
| let is_vector_config_compatible = is_map_compatible( | |
| &self.vector_data, | |
| &other.vector_data, | |
| VectorDataConfig::is_compatible, | |
| ); | |
| let is_sparse_vector_config_compatible = is_map_compatible( | |
| &self.sparse_vector_data, | |
| &other.sparse_vector_data, | |
| SparseVectorDataConfig::is_compatible, | |
| ); | |
| is_vector_config_compatible && is_sparse_vector_config_compatible | |
| } | |
| } | |
| pub fn is_compatible(&self, other: &Self) -> bool { | |
| // Vector data have to be compatible between two segments. | |
| // Sparse vector data currently also require identical key sets; value compatibility is a placeholder | |
| // that always returns true. We may relax the key-set requirement in the future. | |
| // Payload storage type can be different. | |
| // Assert segment config fields | |
| let Self { | |
| vector_data: _, | |
| sparse_vector_data: _, | |
| payload_storage_type: _, | |
| } = self; | |
| let is_vector_config_compatible = is_map_compatible( | |
| &self.vector_data, | |
| &other.vector_data, | |
| VectorDataConfig::is_compatible, | |
| ); | |
| let is_sparse_vector_config_compatible = is_map_compatible( | |
| &self.sparse_vector_data, | |
| &other.sparse_vector_data, | |
| SparseVectorDataConfig::is_compatible, | |
| ); | |
| is_vector_config_compatible && is_sparse_vector_config_compatible | |
| } | |
| } |
🤖 Prompt for AI Agents
In lib/segment/src/types.rs around lines 1409 to 1436, the comment claims
“Sparse vector data can be different” but the code uses is_map_compatible which
requires identical key sets; update the comment to reflect the actual behavior
(sparse_vector_data keys must be compatible/identical between segments) or, if
you prefer to allow differing sparse maps, replace the is_map_compatible call
for sparse_vector_data with a permissive comparison that only checks
compatibility for keys present in both maps (e.g., iterate the intersection and
call SparseVectorDataConfig::is_compatible) and adjust tests accordingly.
- Create WAL/segments directories (if don't exist) when loading edge shard - Fix panic when handling offset during search - Add trailing commas to `format` macro calls
Fix a typo 🥲
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/edge/src/lib.rs (3)
40-47: Create directories with create_dir_all and drop existence check.Avoid TOCTOU and support missing parents.
- if !wal_path.exists() { - fs::create_dir(&wal_path).map_err(|err| { - OperationError::service_error(format!( - "failed to create WAL directory {}: {err}", - wal_path.display(), - )) - })?; - } + fs::create_dir_all(&wal_path).map_err(|err| { + OperationError::service_error(format!( + "failed to create/open WAL directory {}: {err}", + wal_path.display(), + )) + })?;
59-66: Same here for segments path: use create_dir_all.Handles fresh paths and missing parents.
- if !segments_path.exists() { - fs::create_dir(&segments_path).map_err(|err| { - OperationError::service_error(format!( - "failed to create segments directory {}: {err}", - segments_path.display(), - )) - })?; - } + fs::create_dir_all(&segments_path).map_err(|err| { + OperationError::service_error(format!( + "failed to create/open segments directory {}: {err}", + segments_path.display(), + )) + })?;
36-38: Consider exposing a stoppable flag for segment loading.Plumb a &AtomicBool into load() and pass it to load_segment to allow SIGINT/cancel support.
Would you like a follow-up patch to add
pub fn load(path: &Path, config: Option<SegmentConfig>, stopped: &AtomicBool)and update call sites?Also applies to: 107-112
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/edge/src/lib.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/edge/src/lib.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/edge/src/lib.rs
🧠 Learnings (4)
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/edge/src/lib.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/edge/src/lib.rs
📚 Learning: 2025-08-18T10:56:43.707Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.
Applied to files:
lib/edge/src/lib.rs
📚 Learning: 2025-09-01T11:42:06.964Z
Learnt from: timvisee
PR: qdrant/qdrant#7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.
Applied to files:
lib/edge/src/lib.rs
🧬 Code graph analysis (1)
lib/edge/src/lib.rs (5)
lib/segment/src/data_types/query_context.rs (1)
hardware_counter(192-194)lib/segment/src/segment_constructor/segment_constructor_base.rs (1)
load_segment(687-764)lib/common/common/src/save_on_disk.rs (1)
load_or_init_default(38-43)lib/shard/src/segment_holder/mod.rs (1)
has_appendable_segment(255-257)lib/shard/src/update.rs (4)
process_point_operation(27-60)process_vector_operation(62-79)process_payload_operation(81-147)process_field_index_operation(149-167)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: integration-tests-consensus
- GitHub Check: lint
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: storage-compat-test
🔇 Additional comments (5)
lib/edge/src/lib.rs (5)
188-205: Release WAL lock immediately after write.Reduces contention for concurrent writers.
- pub fn update(&self, operation: CollectionUpdateOperations) -> OperationResult<()> { - let mut wal = self.wal.lock(); - - let operation_id = wal.write(&operation).map_err(service_error)?; + pub fn update(&self, operation: CollectionUpdateOperations) -> OperationResult<()> { + let operation_id = { + let mut wal = self.wal.lock(); + wal.write(&operation).map_err(service_error)? + };
255-265: Use saturating_add for per-segment top-k (offset + limit).Prevents usize wrap on large user inputs.
- offset + limit, + offset.saturating_add(limit),
276-277: Same overflow guard for aggregator capacity.- let mut aggregator = BatchResultAggregator::new([offset + limit]); + let mut aggregator = BatchResultAggregator::new([offset.saturating_add(limit)]);
174-179: Don’t panic on missing config; also prefer PathBuf::from over into.Return a service error instead of expect, and follow style guide.
- let shard = Self { - _path: path.into(), - config: config.expect("config was provided or at least one segment was loaded"), - wal: parking_lot::Mutex::new(wal), - segments: Arc::new(parking_lot::RwLock::new(segments)), - }; + let shard = Self { + _path: PathBuf::from(path), // As per coding guidelines + config: match config { + Some(c) => c, + None => return Err(service_error("no segments found and no config provided")), + }, + wal: Mutex::new(wal), + segments: Arc::new(parking_lot::RwLock::new(segments)), + };
288-293: Avoid panic on unknown vector name; return a service error.User input can reference a missing vector.
- let distance = self - .config - .vector_data - .get(&vector_name) - .expect("vector config exist") - .distance; + let distance = self + .config + .vector_data + .get(&vector_name) + .ok_or_else(|| service_error(format!("vector '{vector_name}' not found in shard config")))? + .distance;
Adds
edgecrate with a simplifiedLocalShardre-implementation, that could be used as "embeddable" (e.g., SQLite) version of Qdrant. Only implements update and search, but more stuff would be added in follow-up PRs.All Submissions:
devbranch. Did you create your branch fromdev?New Feature Submissions:
cargo +nightly fmt --allcommand prior to submission?cargo clippy --all --all-featurescommand?Changes to Core Features: