Handle operations when having no shards with custom sharding#7856
Handle operations when having no shards with custom sharding#7856
Conversation
📝 WalkthroughWalkthroughReplaces Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.rs📄 CodeRabbit inference engine (.github/review-rules.md)
Files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/storage/src/content_manager/toc/point_ops.rs (1)
558-560: Consider making the error message more actionable.The error message is clear about the problem, but could be enhanced to guide users toward resolution.
💡 Optional enhancement for error message
return Err(CollectionError::bad_input( - "No shard keys exist to apply operation in custom sharding", + "Cannot apply operation: collection uses custom sharding but has no shard keys. Create at least one shard key first.", ));
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/storage/src/content_manager/toc/point_ops.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicitSomeType::from(x)over implicitx.into()in Rust
In new code, don't usetransmute_from_u8,transmute_to_u8,transmute_from_u8_to_slice,transmute_from_u8_to_mut_slice,transmute_to_u8_slice- usebytemuckorzerocopycrates instead
Prefer explicit exhaustive matches over catch-all_arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using: _over using..in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)
Files:
lib/storage/src/content_manager/toc/point_ops.rs
🧠 Learnings (1)
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.
Applied to files:
lib/storage/src/content_manager/toc/point_ops.rs
🧬 Code graph analysis (1)
lib/storage/src/content_manager/toc/point_ops.rs (1)
lib/collection/src/operations/types.rs (1)
bad_input(970-974)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: lint
- GitHub Check: test-consistency
- GitHub Check: e2e-tests
- GitHub Check: test-consensus-compose
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
🔇 Additional comments (2)
lib/storage/src/content_manager/toc/point_ops.rs (2)
7-7: LGTM!The import is necessary for the new custom sharding validation logic.
554-561: Correctly prevents operations on custom sharding with no shard keys.The logic properly addresses the PR objective by returning a clear error when custom sharding is configured but no shard keys exist. The fallback to the existing flow for non-custom sharding (e.g., auto-sharding) with empty keys is appropriate. The
get_sharding_method_and_keys()method exists in the collection module and returns the expected(ShardingMethod, Vec<ShardKey>)tuple as used here.
generall
left a comment
There was a problem hiding this comment.
Why do we want to reject index creation without shard present?
As far as I understand, we can create payload index if shard_key s1 exist, and then create shard_key s2 on creation will receive the same payload index automatically.
I would expect the same behavior without s1
| // Duplicates the information from `key_mapping` for faster access, does not use locking | ||
| shard_id_to_key_mapping: AHashMap<ShardId, ShardKey>, | ||
| sharding_method: ShardingMethod, | ||
| pub(crate) sharding_method: ShardingMethod, |
There was a problem hiding this comment.
please avoid making fields public and use getter instead
| let (sharding_method, shard_keys) = collection.get_sharding_method_and_keys().await; | ||
|
|
||
| if shard_keys.is_empty() { | ||
| if sharding_method == ShardingMethod::Custom { |
There was a problem hiding this comment.
please prefer full match in cases like this
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/collection/src/collection/mod.rs (1)
357-368: LGTM! API updated to provide both sharding method and keys.The method correctly returns both the sharding method and shard keys as a tuple, enabling callers to determine the sharding configuration and available keys in a single call.
Optional: Consider removing clone if ShardingMethod is Copy
If
ShardingMethodimplementsCopy, the explicit.clone()on line 360 is unnecessary:- let sharding_method = shards_holder.get_sharding_method().clone(); + let sharding_method = *shards_holder.get_sharding_method();This is a minor optimization and only applicable if
ShardingMethodis a simple enum implementingCopy.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
lib/collection/src/collection/collection_ops.rslib/collection/src/collection/mod.rslib/collection/src/operations/types.rslib/collection/src/shards/shard_holder/mod.rslib/storage/src/content_manager/toc/point_ops.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/storage/src/content_manager/toc/point_ops.rs
- lib/collection/src/shards/shard_holder/mod.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicitSomeType::from(x)over implicitx.into()in Rust
In new code, don't usetransmute_from_u8,transmute_to_u8,transmute_from_u8_to_slice,transmute_from_u8_to_mut_slice,transmute_to_u8_slice- usebytemuckorzerocopycrates instead
Prefer explicit exhaustive matches over catch-all_arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using: _over using..in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)
Files:
lib/collection/src/collection/collection_ops.rslib/collection/src/operations/types.rslib/collection/src/collection/mod.rs
🧬 Code graph analysis (2)
lib/collection/src/collection/collection_ops.rs (1)
lib/collection/src/operations/types.rs (2)
empty(208-226)empty(1573-1575)
lib/collection/src/operations/types.rs (2)
lib/segment/src/index/struct_payload_index.rs (1)
payload_schema(223-247)lib/segment/src/types.rs (6)
new(426-439)new(1804-1807)new(3185-3187)new(3468-3470)new(3481-3483)new(3600-3607)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: lint
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests
- GitHub Check: test-consistency
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: Test Python bindings
🔇 Additional comments (2)
lib/collection/src/collection/collection_ops.rs (1)
330-334: LGTM! Payload schema now populated when no shards exist.The updated call to
CollectionInfo::emptycorrectly passes the payload schema, ensuring it's available in the collection info even when there are no shard responses. This aligns with the PR objective to return payload schema regardless of shard availability.lib/collection/src/operations/types.rs (1)
207-226: LGTM! Constructor properly initializes payload schema for empty collections.The updated
emptyconstructor correctly accepts aPayloadIndexSchemaand initializes thepayload_schemafield by mapping each schema entry toPayloadIndexInfowith 0 points. This ensures payload schema metadata is available even for collections with no data.
Yes it happens but payload index isn't shown in API response. Now that I think about it, we can just return acknowledge here and show payload index from collection if no shards are present. Did these changes. |
* Handle operations when having no shards with custom sharding * Return error instead of acknowledge * fmt * clippy * shorter msg * Return collection payload schema if exists even without any shards * minor clippy fix
* Handle operations when having no shards with custom sharding * Return error instead of acknowledge * fmt * clippy * shorter msg * Return collection payload schema if exists even without any shards * minor clippy fix
When you have custom sharding but no shard keys in the collection, requests return error saying
Shard key not specifiedwhich is wrong.Also, this operation was already updating collection's
payload_index.jsonbut it isn't returned in theGET /collections/benchmarkAPI'spayload_schemafield. So I've fixed that too.