Add unindexed vectors to telemetry and metrics API #7307
Conversation
Force-pushed 3a0d050 to dd7d9c9 (Compare)
generall
left a comment
If this metric is intended to cover
"Number of points (per collection) which will be excluded from search if indexed_only is used"
then it is not computed correctly.
A segment is only excluded from search with indexed_only if its size is bigger than
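The exclusion condition under discussion can be sketched as follows; the function name and the byte values are illustrative, not Qdrant's actual API:

```rust
// A segment is excluded from an `indexed_only` search only when its vector
// data is larger than the search-optimized threshold, i.e. too big to
// full-scan quickly without an index.
fn is_excluded_from_indexed_only(vector_size_bytes: usize, threshold_bytes: usize) -> bool {
    vector_size_bytes > threshold_bytes
}

fn main() {
    // A small segment below the threshold stays searchable without an index.
    assert!(!is_excluded_from_indexed_only(1_024, 10_240));
    // A large unindexed segment above the threshold is excluded.
    assert!(is_excluded_from_indexed_only(20_480, 10_240));
}
```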
Force-pushed 9f27753 to 3d94324 (Compare)
Force-pushed 892735f to f4288d7 (Compare)
src/common/metrics.rs (Outdated)

```rust
if !indexed_only_excluded.is_empty() {
    metrics.push(metric_family(
        "collection_index_only_excluded",
```
Following the OpenMetrics convention, shouldn't this be called something like `collection_index_only_excluded_vectors_total`?
Where we have:
- 'vectors' as the unit for the count
- 'total' to emphasize this is an aggregated count
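With that naming, an exposition line might look like the following; the collection and vector names, the value, and the metric type are illustrative and depend on the final implementation:

```
# TYPE collection_index_only_excluded_vectors_total gauge
collection_index_only_excluded_vectors_total{id="my_collection",vector="image-embeddings"} 1200
```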
Force-pushed 3d94324 to 39f8520 (Compare)
Force-pushed 51f9bf3 to c97e534 (Compare)
Force-pushed 39f8520 to 086d4ff (Compare)
Force-pushed 7001789 to a580671 (Compare)
```rust
.filter(|(_, vector_size_bytes, _)| {
    // Filter out only large segments that do not support full-scan, as smaller segments can
    // be searched quickly without using an index and are included in index-only searches.
    *vector_size_bytes > search_optimized_threshold_bytes
})
```
Thought: maybe we can add a function to segments itself that tells us whether to include it in indexed_only search. That would make the logic here simpler.
Implemented in #7463
I suggest merging it into this one if you like it; then we can merge this PR as a whole.
We will merge it later to not block our current PR queue.
Force-pushed 938b102 to 2f25d7d (Compare)
Force-pushed 086d4ff to bc6b84c (Compare)
Force-pushed 2f25d7d to d382289 (Compare)
Force-pushed bc6b84c to 9666988 (Compare)
Force-pushed d382289 to 6a6386c (Compare)
Force-pushed 6a6386c to e7cbba6 (Compare)
📝 Walkthrough
This PR extends the telemetry system with tracking for vectors excluded from index-only searches. The change adds a new optional
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Actionable comments posted: 2
♻️ Duplicate comments (1)
lib/collection/src/shards/local_shard/telemetry.rs (1)
183-204: Defer "should this segment be included in indexed_only?" to segment API (once available).

Per earlier discussion, prefer a single per-segment decision method to centralize logic (index presence + threshold checks). Since this was implemented in PR #7463, consider using it here when merged to avoid drift.
Example (pseudocode):
```rust
if !segment_guard.include_in_indexed_only(&vector_name, search_optimized_threshold_bytes) {
    // accumulate excluded
}
```
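The per-segment decision above can be sketched as a method; `SegmentInfo`, its fields, and `include_in_indexed_only` are illustrative names mirroring the pseudocode, not the actual segment API:

```rust
struct SegmentInfo {
    has_index: bool,
    vector_size_bytes: usize,
}

impl SegmentInfo {
    /// A segment participates in an `indexed_only` search if it is indexed,
    /// or small enough to full-scan below the optimized threshold.
    fn include_in_indexed_only(&self, threshold_bytes: usize) -> bool {
        self.has_index || self.vector_size_bytes <= threshold_bytes
    }
}

fn main() {
    let small_unindexed = SegmentInfo { has_index: false, vector_size_bytes: 512 };
    let large_unindexed = SegmentInfo { has_index: false, vector_size_bytes: 1 << 30 };
    let large_indexed = SegmentInfo { has_index: true, vector_size_bytes: 1 << 30 };
    assert!(small_unindexed.include_in_indexed_only(10_240));
    assert!(!large_unindexed.include_in_indexed_only(10_240));
    assert!(large_indexed.include_in_indexed_only(10_240));
}
```

Centralizing the check this way keeps the filter in telemetry and the search path from drifting apart.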
🧹 Nitpick comments (6)
src/actix/api/service_api.rs (1)
81-84: Confirm Level4 is required for metrics population

Raising `TelemetryDetail.level` to `Level4` makes sense if `indexed_only_excluded_vectors` is only populated at `Level4`. Please confirm there's no cheaper level that still provides this field, and consider noting this rationale in a short comment to avoid future regressions.

lib/collection/src/shards/telemetry.rs (1)
68-70: Telemetry privacy: do we anonymize map keys (vector names)?

`indexed_only_excluded_vectors` is `HashMap<String, usize>`. When `anonymize()` is requested, do HashMap keys get anonymized? If not, vector names may leak. Consider adding an anonymization strategy for keys or documenting that names are intended to be public.

Additionally, consider `u64` for counts to avoid platform-dependent `usize` serialization differences.

docs/redoc/master/openapi.json (2)
12217-12225: Please document semantics and provide an example for `indexed_only_excluded_vectors`.

Add a short description clarifying:
- It maps vector names to the number of points excluded when `indexed_only=true`.
- The empty string key represents the default unnamed vector (if applicable).

An optional example improves discoverability. Apply this minimal patch:

```diff
 "indexed_only_excluded_vectors": {
+  "description": "Per-vector counts of points excluded from search when `indexed_only` is enabled. Key is the vector name; use empty string \"\" for the default unnamed vector.",
   "type": "object",
   "additionalProperties": {
     "type": "integer",
     "format": "uint",
     "minimum": 0
   },
-  "nullable": true
+  "nullable": true,
+  "example": { "": 89500, "image-embeddings": 1200 }
 }
```
12217-12225: Rename struct field to align with metric and semantic meaning.

The struct field `indexed_only_excluded_vectors` represents point counts (confirmed by metric `collection_indexed_only_excluded_points` and its description). Rename the field to `indexed_only_excluded_points` in:
- `lib/collection/src/shards/telemetry.rs` (line 69)
- all usages in `lib/collection/src/shards/local_shard/telemetry.rs` and `lib/collection/src/shards/dummy_shard.rs`
- `docs/redoc/master/openapi.json`, to reflect the field name change

This aligns the struct field naming with the Prometheus metric `collection_indexed_only_excluded_points` and eliminates confusion about what the values represent.

lib/collection/src/shards/local_shard/telemetry.rs (2)
21-43: Good use of spawn_blocking; avoid name shadowing for clarity.

Logic is sound. Minor readability: don't reuse `segments` for telemetry results (it shadows the earlier `self.segments`). Rename to `segments_telemetry`. Apply:

```diff
-let (segments, index_only_excluded_vectors) = segments_data.unwrap_or_default();
+let (segments_telemetry, index_only_excluded_vectors) = segments_data.unwrap_or_default();
@@
-segments: if segments.is_empty() {
+segments: if segments_telemetry.is_empty() {
     None
 } else {
-    Some(segments)
+    Some(segments_telemetry)
 },
```

Also applies to: 48-49, 80-84
171-198: Avoid unwrap; simplify error handling when fetching vector storage size.

Refactor to a `match` to remove the `unwrap()` after the error branch and reduce footguns.

```diff
-let vector_storage_size =
-    segment_guard.available_vectors_size_in_bytes(&vector_name);
-
-if let Err(err) = vector_storage_size {
-    log::error!("Failed to get vector size from segment: {err:?}");
-    return None;
-}
-
-let points = segment_guard.available_point_count();
-Some((vector_name, vector_storage_size.unwrap(), points))
+match segment_guard.available_vectors_size_in_bytes(&vector_name) {
+    Ok(size_bytes) => {
+        let points = segment_guard.available_point_count();
+        Some((vector_name, size_bytes, points))
+    }
+    Err(err) => {
+        log::error!("Failed to get vector size from segment: {err:?}");
+        None
+    }
+}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- docs/redoc/master/openapi.json (1 hunks)
- lib/collection/src/shards/dummy_shard.rs (1 hunks)
- lib/collection/src/shards/local_shard/telemetry.rs (3 hunks)
- lib/collection/src/shards/telemetry.rs (1 hunks)
- lib/common/common/src/types.rs (0 hunks)
- src/actix/api/service_api.rs (1 hunks)
- src/common/metrics.rs (3 hunks)
- src/common/telemetry_reporting.rs (0 hunks)
💤 Files with no reviewable changes (2)
- lib/common/common/src/types.rs
- src/common/telemetry_reporting.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
- src/actix/api/service_api.rs
- lib/collection/src/shards/telemetry.rs
- lib/collection/src/shards/dummy_shard.rs
- lib/collection/src/shards/local_shard/telemetry.rs
- src/common/metrics.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
- src/actix/api/service_api.rs
- lib/collection/src/shards/telemetry.rs
- lib/collection/src/shards/dummy_shard.rs
- lib/collection/src/shards/local_shard/telemetry.rs
- src/common/metrics.rs
🧬 Code graph analysis (1)
lib/collection/src/shards/local_shard/telemetry.rs (3)
lib/segment/src/vector_storage/common.rs (1)
- get_async_scorer (12-14)

lib/collection/src/shards/dummy_shard.rs (1)
- get_telemetry_data (61-75)

lib/collection/src/shards/local_shard/mod.rs (1)
- segments (271-273)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: integration-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: lint
🔇 Additional comments (6)
lib/collection/src/shards/dummy_shard.rs (1)
73-73: Initialization of new telemetry field looks correct

Setting `indexed_only_excluded_vectors: None` keeps DummyShard aligned with the expanded telemetry without overhead.

lib/collection/src/shards/local_shard/telemetry.rs (5)
52-61: LGTM on optimizations aggregation.

Fold + telemetry retrieval is correct and efficient.
88-94: LGTM on Level4 gating and Option construction.

The `.then(..)` and `.then_some(..)` usage is clean and avoids unnecessary payload.
200-204: Semantics look correct for excluding only large unindexed segments.

Matches the intent to not count small segments under the threshold.
161-170: Unit mismatch concern is incorrect; overflow risk remains valid.

Both `indexing_threshold` and `hnsw_config.full_scan_threshold` use the same unit (KB). The HnswConfig field notes: 1 KB = 1 vector of size 256, and segments_searcher.rs treats both as KB without conversion.

However, use `saturating_mul` to prevent overflow when multiplying by `BYTES_IN_KB`:

```diff
-let search_optimized_threshold_bytes = indexing_threshold.max(collection_config.hnsw_config.full_scan_threshold)
-    // convert KB to bytes
-    * BYTES_IN_KB;
+let threshold_kb = indexing_threshold.max(collection_config.hnsw_config.full_scan_threshold) as u64;
+let search_optimized_threshold_bytes =
+    threshold_kb.saturating_mul(BYTES_IN_KB as u64) as usize;
```
196-199: Use segment.available_vector_count(&vector_name)? to get accurate per-vector counts instead of total points.

The current code counts `available_point_count()`, which returns the total points in the segment; when some points lack `vector_name`, this overstates the vector count for that name. The API `segment.available_vector_count(vector_name)` exists and should be used instead to obtain the accurate count of available vectors for the specific vector name.

```rust
let points = segment_guard.available_vector_count(&vector_name)?;
Some((vector_name, vector_storage_size.unwrap(), points))
```

If not already, confirm this method exists on the Segment trait and wire it through here. Also apply this to lines 205-211.
```rust
/// Returns the number of vectors which will be excluded from requests with `indexed_only` enabled.
///
/// This effectively counts vectors in large unindexed segments.
fn get_index_only_excluded_vectors(
```
Docstring mismatches what is returned (points vs. vectors).

You aggregate counts of excluded points per vector name. Update docs to prevent confusion and align with the metric name.

```diff
-/// Returns the number of vectors which will be excluded from requests with `indexed_only` enabled.
+/// Returns the number of points that will be excluded from requests with `indexed_only` enabled.
 ///
-/// This effectively counts vectors in large unindexed segments.
+/// This effectively counts points in large segments where the given vector is unindexed.
```
+/// This effectively counts points in large segments where the given vector is unindexed.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Returns the number of vectors which will be excluded from requests with `indexed_only` enabled. | |
| /// | |
| /// This effectively counts vectors in large unindexed segments. | |
| fn get_index_only_excluded_vectors( | |
| /// Returns the number of points that will be excluded from requests with `indexed_only` enabled. | |
| /// | |
| /// This effectively counts points in large segments where the given vector is unindexed. | |
| fn get_index_only_excluded_vectors( |
```rust
let points_excluded_from_index_only = collection
    .shards
    .iter()
    .flatten()
    .filter_map(|shard| shard.local.as_ref())
    .filter_map(|local| local.indexed_only_excluded_vectors.as_ref())
    .flatten()
    .fold(
        HashMap::<&str, usize>::default(),
        |mut acc, (name, vector_size)| {
            *acc.entry(name).or_insert(0) += vector_size;
            acc
        },
    );

for (name, vector_size) in points_excluded_from_index_only {
    indexed_only_excluded.push(gauge(
        vector_size as f64,
        &[("id", &collection.id), ("vector", name)],
    ))
}
```
Fix type mismatch: adding &usize instead of usize

`vector_size` here is `&usize` (from iterating a `HashMap<String, usize>`), so `+= vector_size` won't compile. Dereference it.

Apply this minimal fix:

```diff
-.fold(
-    HashMap::<&str, usize>::default(),
-    |mut acc, (name, vector_size)| {
-        *acc.entry(name).or_insert(0) += vector_size;
-        acc
-    },
-);
+.fold(
+    HashMap::<&str, usize>::default(),
+    |mut acc, (name, vector_size)| {
+        *acc.entry(name).or_insert(0) += *vector_size;
+        acc
+    },
+);
```

Optional clarity and lifetime simplification (own keys; also rename vector_size -> count):

```diff
-.fold(
-    HashMap::<&str, usize>::default(),
-    |mut acc, (name, vector_size)| {
-        *acc.entry(name).or_insert(0) += *vector_size;
-        acc
-    },
-);
+.fold(
+    HashMap::<String, usize>::default(),
+    |mut acc, (name, count)| {
+        *acc.entry(name.clone()).or_insert(0) += *count;
+        acc
+    },
+);
@@
-for (name, vector_size) in points_excluded_from_index_only {
-    indexed_only_excluded.push(gauge(
-        vector_size as f64,
-        &[("id", &collection.id), ("vector", name)],
-    ))
-}
+for (name, count) in points_excluded_from_index_only {
+    indexed_only_excluded.push(gauge(
+        count as f64,
+        &[("id", &collection.id), ("vector", &name)],
+    ))
+}
```
* New metric for excluded vectors in indexed_only requests
* Make metric name align with prometheus naming convention
* Fix nit
* Fix tests
* Fix openapi
* Rename index_only into indexed_only

---------

Co-authored-by: timvisee <tim@visee.me>
Depends on #7302
Adds information about how many vectors will be excluded from requests with `indexed_only` enabled.