
Improve telemetry logic and test#6399

Merged
generall merged 7 commits into dev from telemetry-improvements
Apr 18, 2025

@KShivendu
Member

KShivendu commented Apr 18, 2025

Some improvements on top of #6390.

Also checked latency of telemetry requests with level 3 vs 10 in chaos testing.

Before: (level 10)

p50 0.8347805
p99 1.5783391899999997

After: (level 3)

p50 0.7328605
p99 0.93947373

That's roughly a 40% reduction in p99 latency in this case :)
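A quick check of which figure the "40%" refers to, using only the numbers above (the units are not stated in the description, so the sketch treats them as unitless). It distinguishes the relative latency reduction from the speed-up factor, since the two are easy to conflate.

```python
# Latency figures copied from the PR description; the units are not stated there.
BEFORE = {"p50": 0.8347805, "p99": 1.5783391899999997}  # details level 10
AFTER = {"p50": 0.7328605, "p99": 0.93947373}  # details level 3


def reduction_pct(old: float, new: float) -> float:
    """Relative latency reduction in percent: how much lower `new` is than `old`."""
    return (old - new) / old * 100


def speedup(old: float, new: float) -> float:
    """Speed-up factor: how many times faster `new` is than `old`."""
    return old / new


for q in ("p50", "p99"):
    print(f"{q}: {reduction_pct(BEFORE[q], AFTER[q]):.1f}% lower latency, "
          f"{speedup(BEFORE[q], AFTER[q]):.2f}x faster")
# p50: 12.2% lower latency, 1.14x faster
# p99: 40.5% lower latency, 1.68x faster
```

So the 40% is the drop in p99 latency; expressed as a ratio, the p99 request is about 1.68x faster, while the median improves by about 12%.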

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

@coderabbitai
Contributor

coderabbitai bot commented Apr 18, 2025

## Walkthrough

The changes span multiple modules and test files:

  • Telemetry (`LocalShardTelemetry`): the `segments` field was changed from a non-optional vector to `Option<Vec<SegmentTelemetry>>`, with a serde attribute that skips serialization when it is `None`. Code that initializes or assigns this field, including the `DummyShard` and `LocalShard` telemetry data methods, now uses `None` instead of an empty vector when no segments are present.
  • OpenAPI schema: the `segments` property of `LocalShardTelemetry` is now nullable and no longer required.
  • Operation time statistics: the `Add` trait implementation for `OperationDurationStatistics` replaces explicit pattern matching with more concise logic for merging optional fields, preserving the original behavior.
  • Storage types: the anonymization function used on the `peers` field of the `ClusterInfo` struct was changed to one designed for collections with u64 hashable keys.
  • Telemetry: the `count_vectors` method now sums vector counts directly from shard telemetry.
  • Tests: the telemetry endpoint test was refactored from a single test function into a test parameterized over several `details_level` values, verifying the structure and content of the telemetry response, including nested data, as the detail level increases.

No public API signatures were modified, apart from the renaming and parameterization of the test function.

## Possibly related PRs

- qdrant/qdrant#6390: Introduces optional telemetry fields and nullable properties in `LocalShardTelemetry`, including telemetry detail level refinements and anonymization improvements for shard telemetry data.

## Suggested reviewers

- coszio  
- timvisee


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05cf12f and 50920ad.

📒 Files selected for processing (1)
  • tests/openapi/test_service.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/openapi/test_service.py
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: storage-compat-test
  • GitHub Check: lint
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: integration-tests


Contributor

coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c57d748 and 24fad49.

📒 Files selected for processing (4)
  • lib/collection/src/shards/telemetry.rs (1 hunks)
  • lib/segment/src/common/operation_time_statistics.rs (2 hunks)
  • lib/storage/src/types.rs (2 hunks)
  • tests/openapi/test_service.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/openapi/test_service.py (1)
tests/openapi/helpers/helpers.py (1)
  • request_with_validation (39-93)
🪛 Ruff (0.8.2)
tests/openapi/test_service.py

65-65: Pointless comparison. Did you mean to assign a value? Otherwise, prepend assert or remove it.

(B015)
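Line 65 of the test file is not reproduced in this thread, so the functions below are hypothetical. They only illustrate what Ruff's B015 warning means: a bare comparison statement is evaluated and discarded, so it can never fail, whereas the suggested `assert` makes it an actual check.

```python
def check_pointless(status: str) -> None:
    # Flagged by Ruff B015: the comparison is evaluated and its result discarded,
    # so this "check" can never fail.
    status == "green"


def check_asserted(status: str) -> None:
    # Prepending `assert` turns the comparison into a real check.
    assert status == "green", f"unexpected status: {status}"
```

Calling `check_pointless("red")` silently returns `None`, while `check_asserted("red")` raises `AssertionError`.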

⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: integration-tests
  • GitHub Check: lint
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: storage-compat-test
🔇 Additional comments (7)
lib/segment/src/common/operation_time_statistics.rs (2)

122-122: Improved pattern matching with more concise code

The change simplifies the pattern matching by using Option::or method instead of explicit pattern matching for None/Some cases. This makes the code more readable while maintaining the same behavior.


142-142: Consistent simplification of pattern matching

Similar to the change above, this uses the more idiomatic Option::or method to handle the option combination instead of explicit pattern matching. The behavior remains equivalent but the code is now more concise.
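The actual fields of `OperationDurationStatistics` are not shown here, so this is only a sketch of the semantics being relied on, modelling Rust's `Option` with Python's `None`. It assumes, as the review comment states, that the original match arms were equivalent to `Option::or`: keep the left value if present, otherwise fall back to the right one. Both function names are hypothetical.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def or_explicit(a: Optional[T], b: Optional[T]) -> Optional[T]:
    """Long-hand version: spell out every Some/None combination."""
    if a is not None and b is not None:
        return a
    if a is not None:
        return a
    if b is not None:
        return b
    return None


def option_or(a: Optional[T], b: Optional[T]) -> Optional[T]:
    """Same semantics as Rust's `a.or(b)`: keep `a` if present, else fall back to `b`."""
    return a if a is not None else b
```

Note that Python's built-in `a or b` is not a faithful translation: it treats falsy values such as `0.0` as missing, so `0.0 or 5.0` yields `5.0`, while `option_or(0.0, 5.0)` (like Rust's `Some(0.0).or(Some(5.0))`) keeps `0.0`.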

lib/collection/src/shards/telemetry.rs (1)

61-61: Good optimization for telemetry serialization

Adding skip_serializing_if = "Vec::is_empty" will omit the segments field from serialization when it's empty, reducing the size of telemetry payloads. This aligns well with the parameterized tests that verify telemetry response structure at different detail levels.

lib/storage/src/types.rs (2)

18-18: Updated import for more appropriate anonymization utility

Changed from anonymize_collection_values to the more specific anonymize_collection_with_u64_hashable_key, which better aligns with anonymizing the peers HashMap that's keyed by PeerId.


207-207: Using more appropriate anonymization function for PeerId-keyed HashMap

This change uses a more specific anonymization function that's designed for collections with u64 hashable keys, which is appropriate for the peers HashMap keyed by PeerId.
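One of the commits is about hashing the peer ID consistently across telemetry. A minimal sketch of why that matters, assuming anonymization replaces each u64 ID with a deterministic hash: if the standalone `peer_id` and the keys of the `peers` map go through the same function, the anonymized `peer_id` still points at its own entry in `peers`. SHA-256 and all function names here are illustrative assumptions, not Qdrant's actual implementation, and this sketch leaves the map values untouched.

```python
import hashlib
from typing import Any, Dict


def anonymize_u64(value: int) -> int:
    """Hypothetical stand-in: map a u64 to a stable pseudo-random u64 via SHA-256."""
    digest = hashlib.sha256(value.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:8], "little")


def anonymize_collection_u64_keys(peers: Dict[int, Any]) -> Dict[int, Any]:
    """Hash every u64 key with the same function used for standalone IDs."""
    return {anonymize_u64(key): value for key, value in peers.items()}


def anonymize_cluster_info(info: Dict[str, Any]) -> Dict[str, Any]:
    # Both the standalone peer_id and the map keys go through anonymize_u64,
    # so the anonymized peer_id still identifies its entry in `peers`.
    return {
        "peer_id": anonymize_u64(info["peer_id"]),
        "peers": anonymize_collection_u64_keys(info["peers"]),
    }
```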

tests/openapi/test_service.py (2)

47-48: Good test improvement with parameterization

Replacing a single test with a parameterized test provides better coverage of the telemetry API across different detail levels. This is a good testing practice.


75-102: Well-structured test assertions for different detail levels

The test now properly verifies the response structure at each level of detail, checking for presence of expected fields based on the details_level parameter. This aligns well with the behavior of skipping empty segments in serialization introduced in the telemetry module.

```rust
/// Do NOT rely on this number unless you know what you are doing
#[serde(skip_serializing_if = "Option::is_none")]
pub num_vectors: Option<usize>,
#[serde(skip_serializing_if = "Vec::is_empty")]
```
Member


Empty vectors will break OpenAPI validation - it should be Option<Vec<SegmentTelemetry>> instead
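A small simulation of the reviewer's point, using plain dicts instead of the Rust structs. It assumes, as the walkthrough implies, that with a non-optional `Vec` the generated schema lists `segments` as required, while the `Option` version makes it nullable and not required. The helper names and the toy `missing_required` check are hypothetical stand-ins for serde and OpenAPI validation.

```python
import json
from typing import Any, Callable, Dict, List


def serialize(fields: Dict[str, Any], skip_if: Callable[[Any], bool]) -> str:
    """Mimic serde's skip_serializing_if: omit any field for which skip_if(value) is true."""
    return json.dumps({name: value for name, value in fields.items() if not skip_if(value)})


def missing_required(payload: str, required: List[str]) -> List[str]:
    """Return required properties absent from a JSON object (toy schema validation)."""
    obj = json.loads(payload)
    return [name for name in required if name not in obj]


def is_empty_list(value: Any) -> bool:
    return isinstance(value, list) and not value


def is_none(value: Any) -> bool:
    return value is None


# Vec + skip_serializing_if = "Vec::is_empty": the field disappears,
# but the schema still lists `segments` as required.
before = serialize({"num_points": 10, "segments": []}, is_empty_list)
print(missing_required(before, ["num_points", "segments"]))  # ['segments']

# Option<Vec> + skip_serializing_if = "Option::is_none": the schema no longer requires it.
after = serialize({"num_points": 10, "segments": None}, is_none)
print(missing_required(after, ["num_points"]))  # []
```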

```diff
 pub peer_id: PeerId,
 /// Peers composition of the cluster with main information
-#[anonymize(with = anonymize_collection_values)]
+#[anonymize(with = anonymize_collection_with_u64_hashable_key)]
```
Member


👍

Comment on lines +72 to +101

```python
collection = result['collections']['collections'][0]

if level == 1:
    assert list(collection.keys()) == ['vectors', 'optimizers_status', 'params']
elif level == 2:
    assert list(collection.keys()) == ['id', 'init_time_ms', 'config']
elif level >= 3:
    assert list(collection.keys()) == ['id', 'init_time_ms', 'config', 'shards', 'transfers', 'resharding']

if level >= 3:
    shard = collection['shards'][0]
    assert list(shard.keys()) == ['id', 'key', 'local', 'remote', 'replicate_states']

    local_shard = shard['local']

    if level == 3:
        assert list(local_shard.keys()) == [
            'variant_name', 'status', 'total_optimized_points', 'vectors_size_bytes',
            'payloads_size_bytes', 'num_points', 'num_vectors', 'optimizations', 'async_scorer'
        ]
    elif level > 3:
        assert list(local_shard.keys()) == [
            'variant_name', 'status', 'total_optimized_points', 'vectors_size_bytes',
            'payloads_size_bytes', 'num_points', 'num_vectors', 'segments', 'optimizations', 'async_scorer'
        ]

    if level >= 4:
        segment = local_shard['segments'][0]
        assert list(segment.keys()) == ['info', 'config', 'vector_index_searches', 'payload_field_indices']
```
Member


I am not sure we care that much about those levels. It makes this test kind of fragile. Also, .keys() ordering is not guaranteed

Member Author

KShivendu Apr 18, 2025


> Also, .keys() ordering is not guaranteed

It returns an ordered_dict and Qdrant always returns them in the same order. There's nothing flaky about this, and we don't change it often.

> I am not sure we care that much about those levels

Yeah, it's not a critical piece of code. Better to have tests than not to have them :)

Member


> It returns ordered_dict and Qdrant always returns them in same order.

nothing really enforces that

Member Author


Now using sets, so the test no longer fails if the key order changes.
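A sketch of what order-insensitive checks could look like, reusing the key lists from the reviewed snippet. This is a hypothetical helper for illustration, not necessarily the code that was merged; it assumes `level >= 1`, as the original snippet does, and that `segments` is only serialized above level 3.

```python
from typing import Any, Dict, Set

# Key sets copied from the list-based assertions in the reviewed snippet.
SHARD_KEYS = {"id", "key", "local", "remote", "replicate_states"}
LOCAL_SHARD_KEYS = {
    "variant_name", "status", "total_optimized_points", "vectors_size_bytes",
    "payloads_size_bytes", "num_points", "num_vectors", "optimizations", "async_scorer",
}
SEGMENT_KEYS = {"info", "config", "vector_index_searches", "payload_field_indices"}


def expected_collection_keys(level: int) -> Set[str]:
    """Expected top-level collection keys for a given details level (level >= 1)."""
    if level == 1:
        return {"vectors", "optimizers_status", "params"}
    if level == 2:
        return {"id", "init_time_ms", "config"}
    return {"id", "init_time_ms", "config", "shards", "transfers", "resharding"}


def check_collection(collection: Dict[str, Any], level: int) -> None:
    """Order-insensitive version of the per-level key checks."""
    assert set(collection) == expected_collection_keys(level)
    if level < 3:
        return
    shard = collection["shards"][0]
    assert set(shard) == SHARD_KEYS
    local_shard = shard["local"]
    # `segments` only appears above level 3.
    expected_local = LOCAL_SHARD_KEYS | ({"segments"} if level > 3 else set())
    assert set(local_shard) == expected_local
    if level >= 4:
        assert set(local_shard["segments"][0]) == SEGMENT_KEYS
```

Comparing sets trades away the ordering check the reviewer was worried about while still catching missing or unexpected fields at each level.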

generall merged commit d3d639e into dev Apr 18, 2025
17 checks passed
generall deleted the telemetry-improvements branch April 18, 2025 13:31
pull bot pushed a commit to kp-forks/qdrant that referenced this pull request Apr 21, 2025
* Improve telemetry logic and test

* Parametrize telemetry test

* Consistently hash peer ID across telemetry

* clean test

* Use Option in segments telemetry

* Update openapi spec

* Avoid test failure on change in order of params
coderabbitai bot mentioned this pull request Dec 23, 2025