Improve telemetry logic and test by KShivendu · Pull Request #6399 · qdrant/qdrant

KShivendu · 2025-04-18T07:51:36Z

Some improvements on top of #6390.

Also checked latency of telemetry requests with level 3 vs 10 in chaos testing.

Before: (level 10)

p50 0.8347805
p99 1.5783391899999997

After: (level 3)

p50 0.7328605
p99 0.93947373

That's roughly a 40% reduction in p99 latency in this case (about 12% at p50) :)
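The percentages can be checked against the figures above. This is a small sketch of the arithmetic only; the variable and function names are illustrative, and the unit of the measurements is not stated in the description.

```python
# Latency figures copied from the PR description (unit not stated there).
before = {"p50": 0.8347805, "p99": 1.5783391899999997}  # details level 10
after = {"p50": 0.7328605, "p99": 0.93947373}           # details level 3


def relative_reduction(old: float, new: float) -> float:
    """Return the fractional drop from old to new (0.25 means 25% lower)."""
    return (old - new) / old


reductions = {q: relative_reduction(before[q], after[q]) for q in before}

for quantile, value in reductions.items():
    print(f"{quantile}: {value:.1%} lower")
```

The improvement is concentrated in the tail: the p99 drop is several times larger than the p50 drop.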

All Submissions:

Contributions should target the dev branch. Did you create your branch from dev?
Have you followed the guidelines in our Contributing document?
Have you checked to ensure there aren't other open Pull Requests for the same update/change?

coderabbitai · 2025-04-18T07:54:14Z

## Walkthrough

The changes span multiple modules and test files.

- Telemetry module: the `LocalShardTelemetry` struct's `segments` field was changed from a non-optional vector to an optional `Option<Vec<SegmentTelemetry>>` with a serde attribute to skip serialization if `None`. Code initializing or assigning this field now uses `None` when no segments are present instead of an empty vector, including in the `DummyShard` and `LocalShard` telemetry data methods.
- OpenAPI schema: `LocalShardTelemetry` was updated to make the `segments` property nullable and no longer required.
- Operation time statistics module: the `Add` trait implementation for `OperationDurationStatistics` was simplified by replacing explicit pattern matching with concise logic for merging optional fields, preserving the original behavior.
- Storage types module: the anonymization function used on the `peers` field of the `ClusterInfo` struct was changed to one designed for collections with u64 hashable keys.
- Telemetry module: the `count_vectors` method was simplified to sum vector counts directly from shard telemetry.
- Test suite: the telemetry endpoint test was refactored from a single test function into a parameterized test running over multiple `details_level` values, verifying the structure and content of the telemetry response with increasing detail and nested data checks.

No public API signatures were modified except for the renaming and parameterization of the test function.

## Possibly related PRs

- qdrant/qdrant#6390: Introduces optional telemetry fields and nullable properties in `LocalShardTelemetry`, including telemetry detail level refinements and anonymization improvements for shard telemetry data.

## Suggested reviewers

- coszio  
- timvisee

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05cf12f and 50920ad.

📒 Files selected for processing (1)

tests/openapi/test_service.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/openapi/test_service.py

⏰ Context from checks skipped due to timeout of 90000ms (11)

GitHub Check: test-snapshot-operations-s3-minio
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-low-resources
GitHub Check: test-consistency
GitHub Check: integration-tests-consensus
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: storage-compat-test
GitHub Check: lint
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: integration-tests

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c57d748 and 24fad49.

📒 Files selected for processing (4)

lib/collection/src/shards/telemetry.rs (1 hunks)
lib/segment/src/common/operation_time_statistics.rs (2 hunks)
lib/storage/src/types.rs (2 hunks)
tests/openapi/test_service.py (2 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

tests/openapi/test_service.py (1)

tests/openapi/helpers/helpers.py (1)

request_with_validation (39-93)

🪛 Ruff (0.8.2)

tests/openapi/test_service.py

65-65: Pointless comparison. Did you mean to assign a value? Otherwise, prepend assert or remove it.

(B015)

⏰ Context from checks skipped due to timeout of 90000ms (13)

GitHub Check: test-snapshot-operations-s3-minio
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: Basic TLS/HTTPS tests
GitHub Check: test-low-resources
GitHub Check: test-consistency
GitHub Check: test-consensus-compose
GitHub Check: rust-tests (macos-latest)
GitHub Check: integration-tests-consensus
GitHub Check: rust-tests (windows-latest)
GitHub Check: integration-tests
GitHub Check: lint
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: storage-compat-test

🔇 Additional comments (7)

lib/segment/src/common/operation_time_statistics.rs (2)

122-122: Improved pattern matching with more concise code

The change simplifies the pattern matching by using Option::or method instead of explicit pattern matching for None/Some cases. This makes the code more readable while maintaining the same behavior.

142-142: Consistent simplification of pattern matching

Similar to the change above, this uses the more idiomatic Option::or method to handle the option combination instead of explicit pattern matching. The behavior remains equivalent but the code is now more concise.
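For readers less familiar with Rust, the semantics of `Option::or` can be approximated in Python. This is only an analogue under the assumption that `None` stands in for Rust's `None`; it is not the code from the PR, and the helper name `option_or` is hypothetical.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def option_or(first: Optional[T], second: Optional[T]) -> Optional[T]:
    """Mimic Rust's Option::or: keep `first` if it is present, else `second`."""
    return first if first is not None else second


# None plays the role of Rust's `None`; any other value is `Some(value)`.
assert option_or(5, 7) == 5
assert option_or(None, 7) == 7
assert option_or(None, None) is None

# A falsy value such as 0 still counts as present, which is why the helper
# tests `is not None` instead of relying on Python's `first or second`.
assert option_or(0, 7) == 0
assert (0 or 7) == 7
```

The last two assertions show why a direct `a or b` in Python would not be an equivalent translation.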

lib/collection/src/shards/telemetry.rs (1)

61-61: Good optimization for telemetry serialization

Adding skip_serializing_if = "Vec::is_empty" will omit the segments field from serialization when it's empty, reducing the size of telemetry payloads. This aligns well with the parameterized tests that verify telemetry response structure at different detail levels.

lib/storage/src/types.rs (2)

18-18: Updated import for more appropriate anonymization utility

Changed from anonymize_collection_values to the more specific anonymize_collection_with_u64_hashable_key, which better aligns with anonymizing the peers HashMap that's keyed by PeerId.

207-207: Using more appropriate anonymization function for PeerId-keyed HashMap

This change uses a more specific anonymization function that's designed for collections with u64 hashable keys, which is appropriate for the peers HashMap keyed by PeerId.

tests/openapi/test_service.py (2)

47-48: Good test improvement with parameterization

Replacing a single test with a parameterized test provides better coverage of the telemetry API across different detail levels. This is a good testing practice.

75-102: Well-structured test assertions for different detail levels

The test now properly verifies the response structure at each level of detail, checking for presence of expected fields based on the details_level parameter. This aligns well with the behavior of skipping empty segments in serialization introduced in the telemetry module.

tests/openapi/test_service.py

generall · 2025-04-18T08:03:12Z

lib/collection/src/shards/telemetry.rs

    /// Do NOT rely on this number unless you know what you are doing
    #[serde(skip_serializing_if = "Option::is_none")]
    pub num_vectors: Option<usize>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]


Empty vectors will break OpenAPI validation - it should be Option<Vec<SegmentTelemetry>> instead
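One way to read this concern, assuming the schema at that point still listed `segments` as a required property: skipping an empty vector drops the key from the response, so a validator that checks required properties rejects the payload. The toy check below is a hypothetical stand-in for real OpenAPI validation; both helper functions and the property lists are illustrative.

```python
def missing_required(payload: dict, required: list[str]) -> list[str]:
    """Return the required property names that are absent from payload."""
    return [name for name in required if name not in payload]


def serialize_shard(segments: list, skip_if_empty: bool) -> dict:
    """Hypothetical serializer mimicking skip_serializing_if on an empty list."""
    payload = {"variant_name": "local"}
    if not (skip_if_empty and not segments):
        payload["segments"] = segments
    return payload


# Schema that still marks `segments` as required (assumed for illustration).
strict_required = ["variant_name", "segments"]
# Schema after making `segments` optional, as the walkthrough describes.
relaxed_required = ["variant_name"]

empty_shard = serialize_shard([], skip_if_empty=True)

print(missing_required(empty_shard, strict_required))   # ['segments']
print(missing_required(empty_shard, relaxed_required))  # []
```

Switching the field to an optional type lets the schema and the serialized output agree on when the key may be absent.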

generall · 2025-04-18T08:03:56Z

lib/storage/src/types.rs

    pub peer_id: PeerId,
    /// Peers composition of the cluster with main information
-    #[anonymize(with = anonymize_collection_values)]
+    #[anonymize(with = anonymize_collection_with_u64_hashable_key)]


generall · 2025-04-18T08:05:45Z

tests/openapi/test_service.py

+
+    collection = result['collections']['collections'][0]
+
+    if level == 1:
+        assert list(collection.keys()) == ['vectors', 'optimizers_status', 'params']
+    elif level == 2:
+        assert list(collection.keys()) == ['id', 'init_time_ms', 'config']
+    elif level >= 3:
+        assert list(collection.keys()) == ['id', 'init_time_ms', 'config', 'shards', 'transfers', 'resharding']
+
+    if level >= 3:
+        shard = collection['shards'][0]
+        assert list(shard.keys()) == ['id', 'key', 'local', 'remote', 'replicate_states']
+
+        local_shard = shard['local']
+
+        if level == 3:
+            assert list(local_shard.keys()) == [
+                'variant_name', 'status', 'total_optimized_points', 'vectors_size_bytes',
+                'payloads_size_bytes', 'num_points', 'num_vectors', 'optimizations', 'async_scorer'
+            ]
+        elif level > 3:
+            assert list(local_shard.keys()) == [
+                'variant_name', 'status', 'total_optimized_points', 'vectors_size_bytes',
+                'payloads_size_bytes', 'num_points', 'num_vectors', 'segments', 'optimizations', 'async_scorer'
+            ]
+
+    if level >= 4:
+        segment = local_shard['segments'][0]
+        assert list(segment.keys()) == ['info', 'config', 'vector_index_searches', 'payload_field_indices']


I am not sure we care that much about those levels. It makes this test kind of fragile. Also, `.keys()` ordering is not guaranteed.

> Also, .keys() ordering is not guaranteed

It returns an ordered dict and Qdrant always returns them in the same order. There's nothing flaky about this and we don't change it often.

> I am not sure we care that much about those levels

Yeah, it's not a critical piece of code. Better to have tests than not have them :)

> It returns ordered_dict and Qdrant always returns them in same order.

Nothing really enforces that.

Using a set to avoid order-change issues now.
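The difference between the two styles of assertion can be sketched as follows. The dictionaries are made-up stand-ins for a telemetry response, not real Qdrant output, and the final form of the merged test is not shown in this thread.

```python
# Two responses with the same keys in a different order.
first = {"id": 1, "init_time_ms": 12, "config": {}}
second = {"config": {}, "id": 1, "init_time_ms": 12}

expected = ["id", "init_time_ms", "config"]

# List comparison depends on key order ...
assert list(first.keys()) == expected
assert list(second.keys()) != expected

# ... while set comparison only checks which keys are present.
assert set(first.keys()) == set(expected)
assert set(second.keys()) == set(expected)
```

Python dictionaries preserve insertion order, so the list form passes only as long as the server keeps emitting keys in the same order; the set form drops that dependency at the cost of no longer detecting reordering.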

* Improve telemetry logic and test
* Parametrize telemetry test
* Consistency hash peeer ID across telemetry
* clean test
* Use Option in segments telemetry
* updat openapi spec
* Avoid test failure on change in order of params

KShivendu added 3 commits April 18, 2025 12:21

Improve telemetry logic and test

fa16244

Parametrize telemetry test

913c0d8

Consistency hash peeer ID across telemetry

24fad49

coderabbitai bot reviewed Apr 18, 2025

generall reviewed Apr 18, 2025

KShivendu added 2 commits April 18, 2025 13:37

clean test

3bed69a

Use Option in segments telemetry

9ed1ea2

github-actions bot mentioned this pull request Apr 18, 2025

Flaky test hnsw_discover_test::hnsw_discover_precision #2973

Open

KShivendu added 2 commits April 18, 2025 14:19

updat openapi spec

05cf12f

Avoid test failure on change in order of params

50920ad

KShivendu requested review from generall and timvisee April 18, 2025 10:26

github-actions bot mentioned this pull request Apr 18, 2025

Flaky test index::tests::hw_counter_test::test_hw_counter_for_plain_sparse_search #6231

Closed

generall approved these changes Apr 18, 2025

generall merged commit d3d639e into dev Apr 18, 2025
17 checks passed

generall deleted the telemetry-improvements branch April 18, 2025 13:31

This was referenced Oct 29, 2025

Add unindexed vectors to telemetry and metrics API #7307

Merged

Metrics vectors by name per collection #7441

Merged

coderabbitai bot mentioned this pull request Dec 2, 2025

Metrics for LocalShard updates #7499

Open

coderabbitai bot mentioned this pull request Dec 23, 2025

[cluster telemetry] Expose endpoint #7729

Merged

3 tasks

coderabbitai bot mentioned this pull request Feb 4, 2026

Update queue status in metrics and telemetry #8060

Merged
