Make indexed-only-excluded-points metric default to 0 #7649
Make indexed-only-excluded-points metric default to 0 #7649JojiiOfficial merged 2 commits intodevfrom
Conversation
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/collection/src/shards/local_shard/telemetry.rs(3 hunks)tests/openapi/test_service.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicitSomeType::from(x)over implicitx.into()in Rust
In new code, don't usetransmute_from_u8,transmute_to_u8,transmute_from_u8_to_slice,transmute_from_u8_to_mut_slice,transmute_to_u8_slice- usebytemuckorzerocopycrates instead
Prefer explicit exhaustive matches over catch-all_arm in Rust match expressions, so we don't miss handling new enum variants when they are added later (exceptions: tests and benchmarks, or when sure new variants won't affect logic)
Prefer explicit field ignoring using: _over using..in Rust struct destructuring (exceptions: tests and benchmarks, or when sure new fields won't affect logic)
Files:
lib/collection/src/shards/local_shard/telemetry.rs
🧠 Learnings (6)
📚 Learning: 2025-09-01T11:19:26.371Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.
Applied to files:
lib/collection/src/shards/local_shard/telemetry.rs
📚 Learning: 2025-05-21T12:52:15.229Z
Learnt from: timvisee
Repo: qdrant/qdrant PR: 6569
File: lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs:56-76
Timestamp: 2025-05-21T12:52:15.229Z
Learning: In volatile sparse vector storage implementations, the `total_sparse_size` field should be updated when vectors are inserted, updated, or deleted to maintain an accurate count of the total number of non-zero elements across all vectors.
Applied to files:
lib/collection/src/shards/local_shard/telemetry.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Applied to files:
lib/collection/src/shards/local_shard/telemetry.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/collection/src/shards/local_shard/telemetry.rs
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.
Applied to files:
lib/collection/src/shards/local_shard/telemetry.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Applied to files:
lib/collection/src/shards/local_shard/telemetry.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: e2e-tests
- GitHub Check: integration-tests
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: test-consistency
- GitHub Check: Build Qdrant Edge Python bindings
- GitHub Check: test-consensus-compose
- GitHub Check: lint
🔇 Additional comments (2)
tests/openapi/test_service.py (1)
99-100: LGTM! Test expectations correctly updated.The test now expects the
indexed_only_excluded_vectorsfield in the telemetry response at detail level 4 and above, which aligns with the implementation changes intelemetry.rs.lib/collection/src/shards/local_shard/telemetry.rs (1)
189-189: LGTM! Comment accurately describes the new behavior.The updated documentation clearly states that vector names without excluded vectors will report
0rather than being omitted, which aligns with the PR objective.
| .filter_map(move |vector_name| { | ||
| let segment_config = segment_guard.config().vector_data.get(&vector_name)?; | ||
|
|
||
| let points = segment_guard.available_point_count(); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read the file around line 220
wc -l lib/collection/src/shards/local_shard/telemetry.rsRepository: qdrant/qdrant
Length of output: 111
🏁 Script executed:
#!/bin/bash
# Read the relevant section
sed -n '200,260p' lib/collection/src/shards/local_shard/telemetry.rsRepository: qdrant/qdrant
Length of output: 2226
🏁 Script executed:
#!/bin/bash
# Search for available_point_count method definition
rg -n 'fn available_point_count' --type rustRepository: qdrant/qdrant
Length of output: 1355
🏁 Script executed:
#!/bin/bash
# Search for methods in SegmentState or similar that provide point counts
rg -n 'available_point_count|point_count|points_count' --type rust -A 2 | head -50Repository: qdrant/qdrant
Length of output: 3864
🏁 Script executed:
#!/bin/bash
# Search for vector-specific count methods
rg -n 'fn.*vector.*count|fn.*available.*vector' --type rust | head -30Repository: qdrant/qdrant
Length of output: 3614
🏁 Script executed:
#!/bin/bash
# Read the implementation of available_point_count in entry.rs
sed -n '448,460p' lib/segment/src/segment/entry.rsRepository: qdrant/qdrant
Length of output: 523
🏁 Script executed:
#!/bin/bash
# Read the implementation in entry_point.rs (trait definition)
sed -n '236,250p' lib/segment/src/entry/entry_point.rsRepository: qdrant/qdrant
Length of output: 725
🏁 Script executed:
#!/bin/bash
# Search for multi-vector related tests or code
rg -n 'multi.*vector|optional.*vector' --type rust -i | head -20Repository: qdrant/qdrant
Length of output: 2763
🏁 Script executed:
#!/bin/bash
# Check if there's any mention of vector sparsity or optional vectors
rg -n 'sparse.*vector|optional.*vector|vector.*sparse' --type rust -B 2 -A 2 | head -40Repository: qdrant/qdrant
Length of output: 2922
🏁 Script executed:
#!/bin/bash
# Search for vector_data structure definition
rg -n 'struct VectorDataConfig|pub vector_data' --type rust -A 5 | head -50Repository: qdrant/qdrant
Length of output: 2698
🏁 Script executed:
#!/bin/bash
# Check the multi-vector test case
sed -n '421,480p' lib/shard/src/proxy_segment/tests.rsRepository: qdrant/qdrant
Length of output: 1958
🏁 Script executed:
#!/bin/bash
# Search for vector insertion/update validation
rg -n 'insert.*vector|add.*point' --type rust lib/segment/src/entry/ -A 3 | head -50Repository: qdrant/qdrant
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Check if there's any optional vector tracking or per-vector availability
rg -n 'available_vectors|vector_exists|has_vector' --type rust | head -20Repository: qdrant/qdrant
Length of output: 2711
🏁 Script executed:
#!/bin/bash
# Look at vector storage trait to understand how vectors are stored per point
rg -n 'fn get_vector|get_raw_vector' --type rust lib/segment/src/vector_storage/ -A 2 | head -50Repository: qdrant/qdrant
Length of output: 4896
🏁 Script executed:
#!/bin/bash
# Check if VectorStorage methods work per-vector or segment-wide
sed -n '1,100p' lib/segment/src/vector_storage/vector_storage.rsRepository: qdrant/qdrant
Length of output: 148
🏁 Script executed:
#!/bin/bash
# Search for HasVectorCondition usage
rg -n 'HasVectorCondition|has_vector' --type rust -B 2 -A 2 | head -40Repository: qdrant/qdrant
Length of output: 3267
🏁 Script executed:
#!/bin/bash
# Check point insertion to verify if all vectors are required
rg -n 'upsert_point|insert_point' --type rust lib/segment/src/entry/ -A 5 | head -60Repository: qdrant/qdrant
Length of output: 471
🏁 Script executed:
#!/bin/bash
# Check the id_tracker implementation to understand point lifecycle
sed -n '100,150p' lib/segment/src/id_tracker/id_tracker_base.rsRepository: qdrant/qdrant
Length of output: 1822
🏁 Script executed:
#!/bin/bash
# Verify that points are tracked at segment level, not per-vector
rg -n 'available_point_count|deleted_point_count' --type rust lib/segment/src/id_tracker/ -A 2 | head -40Repository: qdrant/qdrant
Length of output: 3084
🏁 Script executed:
#!/bin/bash
# Check the available_vector_count method in vector storage
rg -n 'fn available_vector_count' --type rust -A 2 | head -30Repository: qdrant/qdrant
Length of output: 672
🏁 Script executed:
#!/bin/bash
# Look at the is_deleted_vector implementation to understand vector-level deletion
rg -n 'fn is_deleted_vector' --type rust -A 3 | head -40Repository: qdrant/qdrant
Length of output: 3486
🏁 Script executed:
#!/bin/bash
# Verify how vectors are updated/deleted independently from points
rg -n 'delete_vector|remove_vector' --type rust lib/segment/src/ -B 1 -A 2 | head -50Repository: qdrant/qdrant
Length of output: 4580
🏁 Script executed:
#!/bin/bash
# Check the full implementation of available_vector_count in segment_ops.rs
sed -n '624,635p' lib/segment/src/segment/segment_ops.rsRepository: qdrant/qdrant
Length of output: 509
🏁 Script executed:
#!/bin/bash
# Verify the method is called in the entry.rs (SegmentEntry trait)
rg -n 'fn available_vector_count' lib/segment/src/entry/ -A 3Repository: qdrant/qdrant
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Check if available_vector_count is used elsewhere correctly
rg -n 'available_vector_count.*vector_name' --type rust | head -15Repository: qdrant/qdrant
Length of output: 194
Use available_vector_count(&vector_name) instead of available_point_count() to get per-vector counts.
Line 220 calls segment_guard.available_point_count() which returns the segment-wide point count, but this should use segment_guard.available_vector_count(&vector_name)? to get the count of available vectors for the specific named vector. Vectors can be independently deleted from points, so different named vectors may have different availability counts. Using the segment-wide point count causes over-counting of excluded points when vectors have been deleted.
🤖 Prompt for AI Agents
In lib/collection/src/shards/local_shard/telemetry.rs around line 220, the code
uses segment_guard.available_point_count() which returns a segment-wide point
count; replace it with segment_guard.available_vector_count(&vector_name)? to
obtain the per-named-vector available count, adjust the variable name
accordingly (e.g., vectors_count or available_vectors) and propagate/handle the
Result (using ? or map_err) consistent with surrounding error handling so
compilation and semantics reflect per-vector availability.
KShivendu
left a comment
There was a problem hiding this comment.
Would suggest adding a proper test too.
But overall LGTM. I could see:
"indexed_only_excluded_vectors": {
"": 0
}
timvisee
left a comment
There was a problem hiding this comment.
Thanks for picking this one up
* Make indexed_only_excluded in metrics default to 0 * Fix tests
This PR makes the
collection_indexed_only_excluded_pointsmetric report0if no vector is excluded.