Dead shards in /metrics by JojiiOfficial · Pull Request #7310 · qdrant/qdrant

JojiiOfficial · 2025-09-25T10:44:01Z

Depends on #7307

Adds dead-shards metrics to /metrics API

lib/collection/src/shards/replica_set/mod.rs

timvisee

See the review remark below, I think that description is a bit more on-point:

Other than that, all good!

src/common/metrics.rs

Co-authored-by: Tim Visée <tim+github@visee.me>

coderabbitai · 2025-10-29T15:48:50Z

📝 Walkthrough

Walkthrough

The change adds global dead shard tracking to the metrics system in src/common/metrics.rs. A new total_dead_shards accumulator is introduced and incremented while iterating through collections. A new gauge metric dead_shards_total is populated with this accumulated value, providing a global view of non-active shard replicas across all collections alongside existing per-collection metrics.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Single file modification with straightforward metric addition
Accumulator pattern is likely consistent with existing code style
No public API changes or complex logic density

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title "Dead shards in /metrics" is directly and specifically related to the main change in the pull request. The changeset adds a new metric "dead_shards_total" to track dead shards across collections in the /metrics API, which is exactly what the title communicates. The title is concise, clear, and specific enough for a developer scanning history to understand the primary change without being overly verbose.
Description Check	✅ Passed	The pull request description states "Adds dead-shards metrics to /metrics API" which is directly related to the changeset that introduces a new "dead_shards_total" metric for tracking total dead shards across collections. The description also appropriately notes the dependency on PR #7307. The description is clearly on-topic and directly addresses what is being changed in this pull request.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch metrics-dead-shards

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c890d3a and 9f39ddf.

📒 Files selected for processing (1)

src/common/metrics.rs (4 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

src/common/metrics.rs

**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

src/common/metrics.rs

🧬 Code graph analysis (1)

src/common/metrics.rs (2)

lib/collection/src/collection/mod.rs (1)

state (542-562)

src/actix/api/service_api.rs (1)

metrics (67-96)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: lint
GitHub Check: e2e-tests
GitHub Check: integration-tests-consensus
GitHub Check: test-consensus-compose
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-consistency
GitHub Check: integration-tests

🔇 Additional comments (2)

src/common/metrics.rs (2)

197-197: LGTM - Variable initialization is correct.

The accumulator is properly initialized and placed alongside other metric accumulators.

355-360: Metric definition structure is correct.

The metric family definition is properly structured with an appropriate GAUGE type and clear description. However, ensure the calculation logic at lines 297-302 is fixed to match this description.

coderabbitai · 2025-10-29T15:48:53Z

src/common/metrics.rs

+            total_dead_shards += collection
+                .shards
+                .iter()
+                .flatten()
+                .filter(|i| i.replicate_states.values().any(|state| !state.is_active()))
+                .count();


⚠️ Potential issue | 🔴 Critical

Critical: Code counts shards, not replicas as described.

The current implementation counts shards that have at least one non-active replica, but the metric description at line 357 states "total amount of shard replicas in non-active state". This mismatch will produce incorrect metric values.

For example, if a shard has 3 replicas with 2 inactive, this code counts 1 (one problematic shard) rather than 2 (two inactive replicas).

Apply this diff to count actual non-active replicas:

- total_dead_shards += collection - .shards - .iter() - .flatten() - .filter(|i| i.replicate_states.values().any(|state| !state.is_active())) - .count(); + total_dead_shards += collection + .shards + .iter() + .flatten() + .flat_map(|shard| shard.replicate_states.values()) + .filter(|state| !state.is_active()) + .count();

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

total_dead_shards += collection

.shards

.iter()

.flatten()

.filter(|i| i.replicate_states.values().any(|state| !state.is_active()))

.count();

total_dead_shards += collection

.shards

.iter()

.flatten()

.flat_map(|shard| shard.replicate_states.values())

.filter(|state| !state.is_active())

.count();

🤖 Prompt for AI Agents

In src/common/metrics.rs around lines 297 to 302, the code currently counts shards that contain any non-active replica instead of counting the actual number of non-active replicas; replace the shard-level filter/count with logic that iterates the shard replica states and sums each replica where state.is_active() is false (i.e., count all non-active replicate_states values across all shards and collections) so the metric reflects the total number of non-active replicas rather than the number of affected shards.

* Dead shards in /metrics * Update src/common/metrics.rs Co-authored-by: Tim Visée <tim+github@visee.me> --------- Co-authored-by: Tim Visée <tim+github@visee.me>

timvisee approved these changes Sep 25, 2025

View reviewed changes

lib/collection/src/shards/replica_set/mod.rs Outdated Show resolved Hide resolved

JojiiOfficial force-pushed the metrics-unindexed-vectors branch from 892735f to f4288d7 Compare September 26, 2025 15:39

JojiiOfficial force-pushed the metrics-dead-shards branch 2 times, most recently from 1d650d0 to 210fc80 Compare September 26, 2025 15:47

JojiiOfficial requested review from generall and timvisee September 26, 2025 15:48

JojiiOfficial force-pushed the metrics-unindexed-vectors branch from 51f9bf3 to c97e534 Compare October 22, 2025 14:23

JojiiOfficial force-pushed the metrics-dead-shards branch 2 times, most recently from 9dd2e19 to 97fea54 Compare October 22, 2025 15:05

JojiiOfficial mentioned this pull request Oct 22, 2025

Metrics vectors by name per collection #7441

Merged

JojiiOfficial force-pushed the metrics-unindexed-vectors branch from 7001789 to a580671 Compare October 23, 2025 09:06

JojiiOfficial force-pushed the metrics-dead-shards branch from 97fea54 to 86bb3bc Compare October 23, 2025 09:10

timvisee approved these changes Oct 28, 2025

View reviewed changes

src/common/metrics.rs Outdated Show resolved Hide resolved

JojiiOfficial force-pushed the metrics-unindexed-vectors branch from 2f25d7d to d382289 Compare October 29, 2025 14:14

JojiiOfficial force-pushed the metrics-dead-shards branch from 04ac942 to ea878bc Compare October 29, 2025 14:17

JojiiOfficial force-pushed the metrics-unindexed-vectors branch 2 times, most recently from 6a6386c to e7cbba6 Compare October 29, 2025 15:41

Base automatically changed from metrics-unindexed-vectors to dev October 29, 2025 15:44

JojiiOfficial and others added 2 commits October 29, 2025 16:44

Dead shards in /metrics

0541bff

Update src/common/metrics.rs

9f39ddf

Co-authored-by: Tim Visée <tim+github@visee.me>

JojiiOfficial force-pushed the metrics-dead-shards branch from ea878bc to 9f39ddf Compare October 29, 2025 15:45

timvisee merged commit c1e5b17 into dev Oct 29, 2025
12 checks passed

timvisee deleted the metrics-dead-shards branch October 29, 2025 15:48

coderabbitai bot reviewed Oct 29, 2025

View reviewed changes

coderabbitai bot mentioned this pull request Oct 30, 2025

Various metrics tweaks #7480

Merged

3 tasks

This was referenced Nov 11, 2025

Metrics for snapshots #7497

Merged

Final metrics tweaks for Qdrant 1.16.0 #7516

Merged

timvisee added a commit that referenced this pull request Nov 14, 2025

Dead shards in /metrics (#7310)

cc4bff1

* Dead shards in /metrics * Update src/common/metrics.rs Co-authored-by: Tim Visée <tim+github@visee.me> --------- Co-authored-by: Tim Visée <tim+github@visee.me>

timvisee mentioned this pull request Nov 14, 2025

Bump version to 1.16.0 #7535

Merged

coderabbitai bot mentioned this pull request Dec 8, 2025

Metrics incoming and outgoing shard transfers #7701

Merged

coderabbitai bot mentioned this pull request Mar 14, 2026

Per collection counts and metrics (REST) #8401

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dead shards in /metrics#7310

Dead shards in /metrics#7310
timvisee merged 2 commits intodevfrom
metrics-dead-shards

JojiiOfficial commented Sep 25, 2025

Uh oh!

Uh oh!

timvisee left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot commented Oct 29, 2025

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Oct 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

JojiiOfficial commented Sep 25, 2025

Uh oh!

Uh oh!

timvisee left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot commented Oct 29, 2025

Walkthrough

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants