Conversation
4628518 to
27040c4
Compare
8af2106 to
61bb4f2
Compare
📝 WalkthroughWalkthroughThis change adds shard activity checking functionality across the replica set and shard holder modules. A new public method Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/collection/src/shards/replica_set/mod.rs (1)
1083-1093: Shard-wide activity check is correct; could reuse existing helpersThe implementation correctly reports the shard as active if any peer is in an
is_activestate, which is exactly what you need to distinguish “all Partial/Dead” shards in custom sharding. Note that this usesReplicaState::is_active, so shards that are onlyActiveReadwill be treated as inactive, which aligns with howActiveReadis treated as unhealthy in other health paths. Based on learnings, this is desirable.For consistency with the rest of the codebase you could optionally implement this in terms of existing helpers, e.g.:
pub fn shard_is_active(&self) -> bool { let state = self.replica_state.read(); !state.active_peers().is_empty() // or: !self.active_shards(false).is_empty() }lib/collection/src/shards/shard_holder/mod.rs (1)
588-591: Custom-sharding-only filtering of inactive shards inselect_shards(All)looks rightThis
is_custom_shardingflag plus the!shard.shard_is_active()guard cleanly restricts the “skip inactive shards” behavior to custom sharding, leaving auto sharding unchanged and keeping the existing resharding-up guard intact. That matches the intent to ignore shard keys that currently have no active replicas without masking unexpected full-shard outages in auto mode.As a tiny style nit, you could write the flag as
let is_custom_sharding = matches!(self.sharding_method, ShardingMethod::Custom);but the current match is perfectly fine.Also applies to: 606-613
tests/consensus_tests/test_tenant_promotion.py (1)
29-42:scroll_pointshelper correctly exercises the scroll APIThe helper is straightforward and matches the existing HTTP-test style: it hits
/points/scroll, asserts HTTP OK viaassert_http_ok, and returns the points, withshard_keypassed through so you can test both global and keyed reads. That should be enough to surface the regression around shards with no active replicas.If you ever want Ruff F405 to be fully quiet in this file, you could replace the star imports with explicit ones (e.g.,
import requestsandfrom .assertions import assert_http_ok/from .utils import get_collection_info), but that’s purely a linting concern and consistent with other consensus tests.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
lib/collection/src/shards/replica_set/mod.rs(1 hunks)lib/collection/src/shards/shard_holder/mod.rs(2 hunks)tests/consensus_tests/test_tenant_promotion.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicitSomeType::from(x)over implicitx.into()in Rust
In new code, don't usetransmute_from_u8,transmute_to_u8,transmute_from_u8_to_slice,transmute_from_u8_to_mut_slice,transmute_to_u8_slice- usebytemuckorzerocopycrates instead
Prefer explicit exhaustive matches over catch-all_arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using: _over using..in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)
Files:
lib/collection/src/shards/shard_holder/mod.rslib/collection/src/shards/replica_set/mod.rs
🧠 Learnings (2)
📓 Common learnings
Learnt from: generall
Repo: qdrant/qdrant PR: 7474
File: src/common/health.rs:328-330
Timestamp: 2025-11-10T14:50:02.722Z
Learning: In Qdrant's `ReplicaState`, the `ActiveRead` variant is intentionally considered unhealthy (returns `false` from `is_healthy()`) because it requires a shard transfer operation to finish. This is the correct behavior for health checks.
📚 Learning: 2025-11-10T14:50:02.722Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7474
File: src/common/health.rs:328-330
Timestamp: 2025-11-10T14:50:02.722Z
Learning: In Qdrant's `ReplicaState`, the `ActiveRead` variant is intentionally considered unhealthy (returns `false` from `is_healthy()`) because it requires a shard transfer operation to finish. This is the correct behavior for health checks.
Applied to files:
lib/collection/src/shards/replica_set/mod.rs
🧬 Code graph analysis (1)
tests/consensus_tests/test_tenant_promotion.py (2)
tests/consensus_tests/assertions.py (1)
assert_http_ok(6-13)tests/consensus_tests/utils.py (1)
get_collection_info(314-318)
🪛 Ruff (0.14.8)
tests/consensus_tests/test_tenant_promotion.py
30-30: requests may be undefined, or defined from star imports
(F405)
39-39: assert_http_ok may be undefined, or defined from star imports
(F405)
140-140: get_collection_info may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: Build Qdrant Edge Python bindings
- GitHub Check: e2e-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: lint
- GitHub Check: test-consensus-compose
- GitHub Check: test-consistency
- GitHub Check: integration-tests
🔇 Additional comments (1)
tests/consensus_tests/test_tenant_promotion.py (1)
136-141: New global scroll and collection-info checks cover the intended regressionCalling
scroll_points(..., shard_key=None, ...)followed byget_collection_info(...)right after creating a Partial tenant shard ensures both the global scroll path and collection info endpoint stay available in the presence of shard keys without active replicas, which directly validates the behavior introduced in this PR.
KShivendu
left a comment
There was a problem hiding this comment.
LGTM and working as expected. Thanks!
Motivation
For tiered multitenancy we recommend to create a
shard_keywithPatrialstate.It works, however, while the patrial shard key exist, the collection info API stops being available, because one of the shard ids have no active replicas.
Solution
This PR introduces an exception logic for the case of custom sharding and unavalable shard.