Skip to content

Make indexed-only-excluded-points metric default to 0 #7649

Merged
JojiiOfficial merged 2 commits intodevfrom
metric_indexed_only_excluded_defaults_to_zero
Dec 2, 2025
Merged

Make indexed-only-excluded-points metric default to 0 #7649
JojiiOfficial merged 2 commits intodevfrom
metric_indexed_only_excluded_defaults_to_zero

Conversation

@JojiiOfficial
Copy link
Contributor

@JojiiOfficial JojiiOfficial commented Dec 1, 2025

This PR makes the collection_indexed_only_excluded_points metric report 0 if no vector is excluded.

@qdrant qdrant deleted a comment from coderabbitai bot Dec 1, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eba78a6 and 464a2c2.

📒 Files selected for processing (2)
  • lib/collection/src/shards/local_shard/telemetry.rs (3 hunks)
  • tests/openapi/test_service.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust
In new code, don't use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice - use bytemuck or zerocopy crates instead
Prefer explicit exhaustive matches over catch-all _ arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using : _ over using .. in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)

Files:

  • lib/collection/src/shards/local_shard/telemetry.rs
🧠 Learnings (6)
📚 Learning: 2025-09-01T11:19:26.371Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.

Applied to files:

  • lib/collection/src/shards/local_shard/telemetry.rs
📚 Learning: 2025-05-21T12:52:15.229Z
Learnt from: timvisee
Repo: qdrant/qdrant PR: 6569
File: lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs:56-76
Timestamp: 2025-05-21T12:52:15.229Z
Learning: In volatile sparse vector storage implementations, the `total_sparse_size` field should be updated when vectors are inserted, updated, or deleted to maintain an accurate count of the total number of non-zero elements across all vectors.

Applied to files:

  • lib/collection/src/shards/local_shard/telemetry.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

  • lib/collection/src/shards/local_shard/telemetry.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

  • lib/collection/src/shards/local_shard/telemetry.rs
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

  • lib/collection/src/shards/local_shard/telemetry.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

  • lib/collection/src/shards/local_shard/telemetry.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: integration-tests-consensus
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: e2e-tests
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: test-consistency
  • GitHub Check: Build Qdrant Edge Python bindings
  • GitHub Check: test-consensus-compose
  • GitHub Check: lint
🔇 Additional comments (2)
tests/openapi/test_service.py (1)

99-100: LGTM! Test expectations correctly updated.

The test now expects the indexed_only_excluded_vectors field in the telemetry response at detail level 4 and above, which aligns with the implementation changes in telemetry.rs.

lib/collection/src/shards/local_shard/telemetry.rs (1)

189-189: LGTM! Comment accurately describes the new behavior.

The updated documentation clearly states that vector names without excluded vectors will report 0 rather than being omitted, which aligns with the PR objective.

.filter_map(move |vector_name| {
let segment_config = segment_guard.config().vector_data.get(&vector_name)?;

let points = segment_guard.available_point_count();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the file around line 220
wc -l lib/collection/src/shards/local_shard/telemetry.rs

Repository: qdrant/qdrant

Length of output: 111


🏁 Script executed:

#!/bin/bash
# Read the relevant section
sed -n '200,260p' lib/collection/src/shards/local_shard/telemetry.rs

Repository: qdrant/qdrant

Length of output: 2226


🏁 Script executed:

#!/bin/bash
# Search for available_point_count method definition
rg -n 'fn available_point_count' --type rust

Repository: qdrant/qdrant

Length of output: 1355


🏁 Script executed:

#!/bin/bash
# Search for methods in SegmentState or similar that provide point counts
rg -n 'available_point_count|point_count|points_count' --type rust -A 2 | head -50

Repository: qdrant/qdrant

Length of output: 3864


🏁 Script executed:

#!/bin/bash
# Search for vector-specific count methods
rg -n 'fn.*vector.*count|fn.*available.*vector' --type rust | head -30

Repository: qdrant/qdrant

Length of output: 3614


🏁 Script executed:

#!/bin/bash
# Read the implementation of available_point_count in entry.rs
sed -n '448,460p' lib/segment/src/segment/entry.rs

Repository: qdrant/qdrant

Length of output: 523


🏁 Script executed:

#!/bin/bash
# Read the implementation in entry_point.rs (trait definition)
sed -n '236,250p' lib/segment/src/entry/entry_point.rs

Repository: qdrant/qdrant

Length of output: 725


🏁 Script executed:

#!/bin/bash
# Search for multi-vector related tests or code
rg -n 'multi.*vector|optional.*vector' --type rust -i | head -20

Repository: qdrant/qdrant

Length of output: 2763


🏁 Script executed:

#!/bin/bash
# Check if there's any mention of vector sparsity or optional vectors
rg -n 'sparse.*vector|optional.*vector|vector.*sparse' --type rust -B 2 -A 2 | head -40

Repository: qdrant/qdrant

Length of output: 2922


🏁 Script executed:

#!/bin/bash
# Search for vector_data structure definition
rg -n 'struct VectorDataConfig|pub vector_data' --type rust -A 5 | head -50

Repository: qdrant/qdrant

Length of output: 2698


🏁 Script executed:

#!/bin/bash
# Check the multi-vector test case
sed -n '421,480p' lib/shard/src/proxy_segment/tests.rs

Repository: qdrant/qdrant

Length of output: 1958


🏁 Script executed:

#!/bin/bash
# Search for vector insertion/update validation
rg -n 'insert.*vector|add.*point' --type rust lib/segment/src/entry/ -A 3 | head -50

Repository: qdrant/qdrant

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check if there's any optional vector tracking or per-vector availability
rg -n 'available_vectors|vector_exists|has_vector' --type rust | head -20

Repository: qdrant/qdrant

Length of output: 2711


🏁 Script executed:

#!/bin/bash
# Look at vector storage trait to understand how vectors are stored per point
rg -n 'fn get_vector|get_raw_vector' --type rust lib/segment/src/vector_storage/ -A 2 | head -50

Repository: qdrant/qdrant

Length of output: 4896


🏁 Script executed:

#!/bin/bash
# Check if VectorStorage methods work per-vector or segment-wide
sed -n '1,100p' lib/segment/src/vector_storage/vector_storage.rs

Repository: qdrant/qdrant

Length of output: 148


🏁 Script executed:

#!/bin/bash
# Search for HasVectorCondition usage
rg -n 'HasVectorCondition|has_vector' --type rust -B 2 -A 2 | head -40

Repository: qdrant/qdrant

Length of output: 3267


🏁 Script executed:

#!/bin/bash
# Check point insertion to verify if all vectors are required
rg -n 'upsert_point|insert_point' --type rust lib/segment/src/entry/ -A 5 | head -60

Repository: qdrant/qdrant

Length of output: 471


🏁 Script executed:

#!/bin/bash
# Check the id_tracker implementation to understand point lifecycle
sed -n '100,150p' lib/segment/src/id_tracker/id_tracker_base.rs

Repository: qdrant/qdrant

Length of output: 1822


🏁 Script executed:

#!/bin/bash
# Verify that points are tracked at segment level, not per-vector
rg -n 'available_point_count|deleted_point_count' --type rust lib/segment/src/id_tracker/ -A 2 | head -40

Repository: qdrant/qdrant

Length of output: 3084


🏁 Script executed:

#!/bin/bash
# Check the available_vector_count method in vector storage
rg -n 'fn available_vector_count' --type rust -A 2 | head -30

Repository: qdrant/qdrant

Length of output: 672


🏁 Script executed:

#!/bin/bash
# Look at the is_deleted_vector implementation to understand vector-level deletion
rg -n 'fn is_deleted_vector' --type rust -A 3 | head -40

Repository: qdrant/qdrant

Length of output: 3486


🏁 Script executed:

#!/bin/bash
# Verify how vectors are updated/deleted independently from points
rg -n 'delete_vector|remove_vector' --type rust lib/segment/src/ -B 1 -A 2 | head -50

Repository: qdrant/qdrant

Length of output: 4580


🏁 Script executed:

#!/bin/bash
# Check the full implementation of available_vector_count in segment_ops.rs
sed -n '624,635p' lib/segment/src/segment/segment_ops.rs

Repository: qdrant/qdrant

Length of output: 509


🏁 Script executed:

#!/bin/bash
# Verify the method is called in the entry.rs (SegmentEntry trait)
rg -n 'fn available_vector_count' lib/segment/src/entry/ -A 3

Repository: qdrant/qdrant

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check if available_vector_count is used elsewhere correctly
rg -n 'available_vector_count.*vector_name' --type rust | head -15

Repository: qdrant/qdrant

Length of output: 194


Use available_vector_count(&vector_name) instead of available_point_count() to get per-vector counts.

Line 220 calls segment_guard.available_point_count() which returns the segment-wide point count, but this should use segment_guard.available_vector_count(&vector_name)? to get the count of available vectors for the specific named vector. Vectors can be independently deleted from points, so different named vectors may have different availability counts. Using the segment-wide point count causes over-counting of excluded points when vectors have been deleted.

🤖 Prompt for AI Agents
In lib/collection/src/shards/local_shard/telemetry.rs around line 220, the code
uses segment_guard.available_point_count() which returns a segment-wide point
count; replace it with segment_guard.available_vector_count(&vector_name)? to
obtain the per-named-vector available count, adjust the variable name
accordingly (e.g., vectors_count or available_vectors) and propagate/handle the
Result (using ? or map_err) consistent with surrounding error handling so
compilation and semantics reflect per-vector availability.

Copy link
Member

@KShivendu KShivendu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would suggest adding a proper test too.

But overall LGTM. I could see:

"indexed_only_excluded_vectors": {
  "": 0
}

@qdrant qdrant deleted a comment from coderabbitai bot Dec 1, 2025
Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this one up

@JojiiOfficial JojiiOfficial merged commit 8da4c22 into dev Dec 2, 2025
15 checks passed
@JojiiOfficial JojiiOfficial deleted the metric_indexed_only_excluded_defaults_to_zero branch December 2, 2025 08:36
timvisee pushed a commit that referenced this pull request Dec 3, 2025
* Make indexed_only_excluded in metrics default to 0

* Fix tests
@timvisee timvisee mentioned this pull request Dec 3, 2025
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants