Skip to content

Handle operations when having no shards with custom sharding#7856

Merged
KShivendu merged 7 commits intodevfrom
custom-sharding-index
Jan 6, 2026
Merged

Handle operations when having no shards with custom sharding#7856
KShivendu merged 7 commits intodevfrom
custom-sharding-index

Conversation

@KShivendu
Copy link
Member

@KShivendu KShivendu commented Jan 5, 2026

When you have custom sharding but no shard keys in the collection, requests return error saying Shard key not specified which is wrong.

# Request:
PUT /collections/demo/index
{
  "field_name": "group_id",
  "field_schema": {
    "type": "keyword",
    "is_tenant": true
  }
}

# Before:
{
  "error": "Wrong input: Shard key not specified"
}

# After:
{
  "result": {
    "status": "acknowledged"
  },
  "status": "ok",
  "time": 0.019617464
}

Also, this operation was already updating collection's payload_index.json but it isn't returned in the GET /collections/benchmark API's payload_schema field. So I've fixed that too.

# Request:
GET /collection/demo

# Before:
{
  "result": {
    // ...
    "payload_schema": {},
  "status": "ok",
  "time": 0.000223741
}

# After:
{
  "result": {
    // ...
    "payload_schema": {
      "group_id": {
        "data_type": "keyword",
        "params": {
          "type": "keyword",
          "is_tenant": true
        },
        "points": 0
      }
    }
  },
  "status": "ok",
  "time": 0.000223741
}

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 5, 2026

📝 Walkthrough

Walkthrough

Replaces get_shard_keys() with get_sharding_method_and_keys() returning (ShardingMethod, Vec<ShardKey>). Adds pub fn get_sharding_method(&self) -> ShardingMethod on ShardHolder. Updates TableOfContent point-ops to call the new API and, for ShardSelector::All, return an acknowledged/bad-input result when sharding_method is Custom and the returned shard keys are empty. Updates CollectionInfo::empty to accept a PayloadIndexSchema and initialize payload index info from it.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • timvisee
  • generall

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: handling operations when having no shards with custom sharding, which directly matches the PR's core objective.
Description check ✅ Passed The description is well-related to the changeset, providing concrete examples of the bug (incorrect error for payload index operations) and the fix (returning acknowledged status and exposing payload_schema).
✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68bb4c3 and 55b930b.

📒 Files selected for processing (2)
  • lib/collection/src/collection/mod.rs
  • lib/collection/src/shards/shard_holder/mod.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/collection/src/collection/mod.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust
In new code, don't use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice - use bytemuck or zerocopy crates instead
Prefer explicit exhaustive matches over catch-all _ arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using : _ over using .. in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

  • lib/collection/src/shards/shard_holder/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: integration-tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: e2e-tests
  • GitHub Check: integration-tests-consensus
  • GitHub Check: Test Python bindings
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-consistency
  • GitHub Check: lint
🔇 Additional comments (1)
lib/collection/src/shards/shard_holder/mod.rs (1)

157-159: LGTM! Clean getter implementation.

The getter appropriately exposes the sharding method via a public accessor instead of making the field public, which aligns with the previous review feedback. Returning ShardingMethod by value is suitable for what appears to be a Copy enum type.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/storage/src/content_manager/toc/point_ops.rs (1)

558-560: Consider making the error message more actionable.

The error message is clear about the problem, but could be enhanced to guide users toward resolution.

💡 Optional enhancement for error message
                     return Err(CollectionError::bad_input(
-                        "No shard keys exist to apply operation in custom sharding",
+                        "Cannot apply operation: collection uses custom sharding but has no shard keys. Create at least one shard key first.",
                     ));
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 29feba9 and 467931f.

📒 Files selected for processing (1)
  • lib/storage/src/content_manager/toc/point_ops.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust
In new code, don't use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice - use bytemuck or zerocopy crates instead
Prefer explicit exhaustive matches over catch-all _ arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using : _ over using .. in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

  • lib/storage/src/content_manager/toc/point_ops.rs
🧠 Learnings (1)
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

  • lib/storage/src/content_manager/toc/point_ops.rs
🧬 Code graph analysis (1)
lib/storage/src/content_manager/toc/point_ops.rs (1)
lib/collection/src/operations/types.rs (1)
  • bad_input (970-974)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: lint
  • GitHub Check: test-consistency
  • GitHub Check: e2e-tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: integration-tests-consensus
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
🔇 Additional comments (2)
lib/storage/src/content_manager/toc/point_ops.rs (2)

7-7: LGTM!

The import is necessary for the new custom sharding validation logic.


554-561: Correctly prevents operations on custom sharding with no shard keys.

The logic properly addresses the PR objective by returning a clear error when custom sharding is configured but no shard keys exist. The fallback to the existing flow for non-custom sharding (e.g., auto-sharding) with empty keys is appropriate. The get_sharding_method_and_keys() method exists in the collection module and returns the expected (ShardingMethod, Vec<ShardKey>) tuple as used here.

Copy link
Member

@generall generall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to reject index creation without shard present?

As far as I understand, we can create payload index if shard_key s1 exist, and then create shard_key s2 on creation will receive the same payload index automatically.

I would expect the same behavior without s1

// Duplicates the information from `key_mapping` for faster access, does not use locking
shard_id_to_key_mapping: AHashMap<ShardId, ShardKey>,
sharding_method: ShardingMethod,
pub(crate) sharding_method: ShardingMethod,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid making fields public and use getter instead

let (sharding_method, shard_keys) = collection.get_sharding_method_and_keys().await;

if shard_keys.is_empty() {
if sharding_method == ShardingMethod::Custom {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please prefer full match in cases like this

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/collection/src/collection/mod.rs (1)

357-368: LGTM! API updated to provide both sharding method and keys.

The method correctly returns both the sharding method and shard keys as a tuple, enabling callers to determine the sharding configuration and available keys in a single call.

Optional: Consider removing clone if ShardingMethod is Copy

If ShardingMethod implements Copy, the explicit .clone() on line 360 is unnecessary:

-    let sharding_method = shards_holder.get_sharding_method().clone();
+    let sharding_method = *shards_holder.get_sharding_method();

This is a minor optimization and only applicable if ShardingMethod is a simple enum implementing Copy.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d65541f and 68bb4c3.

📒 Files selected for processing (5)
  • lib/collection/src/collection/collection_ops.rs
  • lib/collection/src/collection/mod.rs
  • lib/collection/src/operations/types.rs
  • lib/collection/src/shards/shard_holder/mod.rs
  • lib/storage/src/content_manager/toc/point_ops.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/storage/src/content_manager/toc/point_ops.rs
  • lib/collection/src/shards/shard_holder/mod.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust
In new code, don't use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice - use bytemuck or zerocopy crates instead
Prefer explicit exhaustive matches over catch-all _ arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using : _ over using .. in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

  • lib/collection/src/collection/collection_ops.rs
  • lib/collection/src/operations/types.rs
  • lib/collection/src/collection/mod.rs
🧬 Code graph analysis (2)
lib/collection/src/collection/collection_ops.rs (1)
lib/collection/src/operations/types.rs (2)
  • empty (208-226)
  • empty (1573-1575)
lib/collection/src/operations/types.rs (2)
lib/segment/src/index/struct_payload_index.rs (1)
  • payload_schema (223-247)
lib/segment/src/types.rs (6)
  • new (426-439)
  • new (1804-1807)
  • new (3185-3187)
  • new (3468-3470)
  • new (3481-3483)
  • new (3600-3607)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: lint
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: integration-tests
  • GitHub Check: test-consistency
  • GitHub Check: e2e-tests
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-consensus-compose
  • GitHub Check: Test Python bindings
🔇 Additional comments (2)
lib/collection/src/collection/collection_ops.rs (1)

330-334: LGTM! Payload schema now populated when no shards exist.

The updated call to CollectionInfo::empty correctly passes the payload schema, ensuring it's available in the collection info even when there are no shard responses. This aligns with the PR objective to return payload schema regardless of shard availability.

lib/collection/src/operations/types.rs (1)

207-226: LGTM! Constructor properly initializes payload schema for empty collections.

The updated empty constructor correctly accepts a PayloadIndexSchema and initializes the payload_schema field by mapping each schema entry to PayloadIndexInfo with 0 points. This ensures payload schema metadata is available even for collections with no data.

@KShivendu
Copy link
Member Author

KShivendu commented Jan 5, 2026

I would expect the same behavior without s1

Yes it happens but payload index isn't shown in API response. Now that I think about it, we can just return acknowledge here and show payload index from collection if no shards are present. Did these changes.

@KShivendu KShivendu merged commit 73eab2c into dev Jan 6, 2026
15 checks passed
@KShivendu KShivendu deleted the custom-sharding-index branch January 6, 2026 05:07
coszio pushed a commit that referenced this pull request Jan 8, 2026
* Handle operations when having no shards with custom sharding

* Return error instead of acknowledge

* fmt

* clippy

* shorter msg

* Return collection payload schema if exists even without any shards

* minor clippy fix
generall pushed a commit that referenced this pull request Feb 9, 2026
* Handle operations when having no shards with custom sharding

* Return error instead of acknowledge

* fmt

* clippy

* shorter msg

* Return collection payload schema if exists even without any shards

* minor clippy fix
@timvisee timvisee mentioned this pull request Feb 17, 2026
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants