Conversation
892735f to
f4288d7
Compare
1d650d0 to
210fc80
Compare
51f9bf3 to
c97e534
Compare
9dd2e19 to
97fea54
Compare
7001789 to
a580671
Compare
97fea54 to
86bb3bc
Compare
timvisee
left a comment
There was a problem hiding this comment.
See the review remark below, I think that description is a bit more on-point:
Other than that, all good!
2f25d7d to
d382289
Compare
04ac942 to
ea878bc
Compare
6a6386c to
e7cbba6
Compare
Co-authored-by: Tim Visée <tim+github@visee.me>
ea878bc to
9f39ddf
Compare
📝 WalkthroughWalkthroughThe change adds global dead shard tracking to the metrics system in Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/common/metrics.rs(4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
src/common/metrics.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
src/common/metrics.rs
🧬 Code graph analysis (1)
src/common/metrics.rs (2)
lib/collection/src/collection/mod.rs (1)
state(542-562)src/actix/api/service_api.rs (1)
metrics(67-96)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: lint
- GitHub Check: e2e-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: test-consensus-compose
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: integration-tests
🔇 Additional comments (2)
src/common/metrics.rs (2)
197-197: LGTM - Variable initialization is correct.The accumulator is properly initialized and placed alongside other metric accumulators.
355-360: Metric definition structure is correct.The metric family definition is properly structured with an appropriate GAUGE type and clear description. However, ensure the calculation logic at lines 297-302 is fixed to match this description.
| total_dead_shards += collection | ||
| .shards | ||
| .iter() | ||
| .flatten() | ||
| .filter(|i| i.replicate_states.values().any(|state| !state.is_active())) | ||
| .count(); |
There was a problem hiding this comment.
Critical: Code counts shards, not replicas as described.
The current implementation counts shards that have at least one non-active replica, but the metric description at line 357 states "total amount of shard replicas in non-active state". This mismatch will produce incorrect metric values.
For example, if a shard has 3 replicas with 2 inactive, this code counts 1 (one problematic shard) rather than 2 (two inactive replicas).
Apply this diff to count actual non-active replicas:
- total_dead_shards += collection
- .shards
- .iter()
- .flatten()
- .filter(|i| i.replicate_states.values().any(|state| !state.is_active()))
- .count();
+ total_dead_shards += collection
+ .shards
+ .iter()
+ .flatten()
+ .flat_map(|shard| shard.replicate_states.values())
+ .filter(|state| !state.is_active())
+ .count();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| total_dead_shards += collection | |
| .shards | |
| .iter() | |
| .flatten() | |
| .filter(|i| i.replicate_states.values().any(|state| !state.is_active())) | |
| .count(); | |
| total_dead_shards += collection | |
| .shards | |
| .iter() | |
| .flatten() | |
| .flat_map(|shard| shard.replicate_states.values()) | |
| .filter(|state| !state.is_active()) | |
| .count(); |
🤖 Prompt for AI Agents
In src/common/metrics.rs around lines 297 to 302, the code currently counts
shards that contain any non-active replica instead of counting the actual number
of non-active replicas; replace the shard-level filter/count with logic that
iterates the shard replica states and sums each replica where state.is_active()
is false (i.e., count all non-active replicate_states values across all shards
and collections) so the metric reflects the total number of non-active replicas
rather than the number of affected shards.
* Dead shards in /metrics * Update src/common/metrics.rs Co-authored-by: Tim Visée <tim+github@visee.me> --------- Co-authored-by: Tim Visée <tim+github@visee.me>
Depends on #7307
Adds dead-shards metrics to /metrics API