kv: add metrics for cross-region, cross-zone batch requests / responses, snapshots, raft activities #103983

@wenyihu6

Is your feature request related to a problem? Please describe.

Currently, it is difficult to observe cross-region (and cross-zone) traffic for
batch requests / responses, raft activities, and snapshots. This limitation
becomes problematic when we need to assess the volume of cross-region traffic
handled by nodes. Metrics for this traffic would also let us evaluate potential
optimizations. For example, consistent follower reads should reduce cross-region
traffic.

Describe the solution you'd like

One solution is to collect byte count metrics at several levels.

Batch requests sent and batch responses received at a node

  • We can capture these metrics at the DistSender level, which runs on the
    gateway node receiving the SQL queries. We should also collect metrics
    at the destination range node, which is responsible for the accessed data.
  • Cross Region
  • Cross Zone
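
The classification behind these counters can be sketched roughly as follows. This is a minimal illustration, not CockroachDB's implementation: the `Tier`, `Locality`, `Class`, and `ByteCounters` types are hypothetical stand-ins, and it assumes every node is configured with `region` and `zone` locality tiers. Counting cross-region traffic as not also cross-zone is likewise an assumption of this sketch.

```go
package main

import "fmt"

// Tier is one key/value pair of a node's locality, e.g. region=us-east1.
// It is a simplified stand-in for illustration, not a CockroachDB type.
type Tier struct {
	Key, Value string
}

// Locality is an ordered list of tiers describing where a node runs.
type Locality []Tier

// find returns the value stored under key and whether the key was present.
func (l Locality) find(key string) (string, bool) {
	for _, t := range l {
		if t.Key == key {
			return t.Value, true
		}
	}
	return "", false
}

// Class describes which locality boundary a message crosses.
type Class int

const (
	Unknown     Class = iota // a region or zone tier is missing on one side
	SameZone                 // same region and same zone
	CrossZone                // same region, different zone
	CrossRegion              // different regions
)

// Classify compares the "region" and "zone" tiers of two localities.
// Treating cross-region traffic as not also cross-zone is an assumption.
func Classify(src, dst Locality) Class {
	srcRegion, ok1 := src.find("region")
	dstRegion, ok2 := dst.find("region")
	if !ok1 || !ok2 {
		return Unknown
	}
	if srcRegion != dstRegion {
		return CrossRegion
	}
	srcZone, ok1 := src.find("zone")
	dstZone, ok2 := dst.find("zone")
	if !ok1 || !ok2 {
		return Unknown
	}
	if srcZone != dstZone {
		return CrossZone
	}
	return SameZone
}

// ByteCounters holds byte totals for one direction (sent or received).
// It is not safe for concurrent use; a real metric would need to be.
type ByteCounters struct {
	Total, CrossRegion, CrossZone int64
}

// Record adds n bytes to the total and to the matching cross-locality counter.
func (c *ByteCounters) Record(src, dst Locality, n int64) {
	c.Total += n
	switch Classify(src, dst) {
	case CrossRegion:
		c.CrossRegion += n
	case CrossZone:
		c.CrossZone += n
	}
}

func main() {
	gateway := Locality{{Key: "region", Value: "us-east1"}, {Key: "zone", Value: "b"}}
	sameRegion := Locality{{Key: "region", Value: "us-east1"}, {Key: "zone", Value: "c"}}
	remote := Locality{{Key: "region", Value: "us-west1"}, {Key: "zone", Value: "a"}}

	var sent ByteCounters
	sent.Record(gateway, sameRegion, 512) // crosses a zone boundary only
	sent.Record(gateway, remote, 2048)    // crosses a region boundary
	fmt.Printf("%+v\n", sent)
}
```

The same comparison would apply on the receiving side, with the gateway's locality as `src` and the local node's as `dst`. Returning `Unknown` when a tier is missing keeps misconfigured nodes from being silently counted as same-zone.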

Snapshot bytes sent from and received at a store

  • Cross Region (This has been mentioned in another issue.)
  • Cross Zone

Raft messages sent from and received at a store through RaftTransport

  • Cross Region
  • Cross Zone

Limitations of the solution

  • When tracking the transmission of bytes between regions and zones, we currently
    rely on the byte count of the uncompressed data rather than the actual physical
    bytes sent or received. This approach is acceptable when the compression
    factor remains consistent, but when the factor varies, these metrics may not
    accurately reflect the true cross-region traffic volume.

  • Ideally, we would want to aggregate the cross-region metrics based on the
    originating or forwarding region of the messages. This would allow us to assess
    the workload of individual regions. However, this tracking across multiple
    regions may lead to high cardinality.
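
To illustrate the first limitation, the sketch below compresses two payloads of identical uncompressed length and compares the resulting sizes. It is only an illustration: `compress/gzip` from the Go standard library stands in for whatever compression the transport actually applies, and the helper names are hypothetical.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"math/rand"
)

// CompressedSize returns the gzip-compressed length of p.
// gzip is an assumption here; the real transport may use another codec.
func CompressedSize(p []byte) (int, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(p); err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// RepetitivePayload returns n identical bytes, which compress very well.
func RepetitivePayload(n int) []byte {
	return bytes.Repeat([]byte("a"), n)
}

// RandomPayload returns n pseudo-random bytes from a fixed seed,
// which are effectively incompressible.
func RandomPayload(n int) []byte {
	r := rand.New(rand.NewSource(1))
	p := make([]byte, n)
	for i := range p {
		p[i] = byte(r.Intn(256))
	}
	return p
}

func main() {
	const n = 64 << 10 // both payloads count as 64 KiB of uncompressed bytes
	payloads := []struct {
		name string
		data []byte
	}{
		{"repetitive", RepetitivePayload(n)},
		{"random", RandomPayload(n)},
	}
	for _, p := range payloads {
		size, err := CompressedSize(p.data)
		if err != nil {
			fmt.Println("compress failed:", err)
			return
		}
		fmt.Printf("%-10s uncompressed=%d compressed=%d\n", p.name, len(p.data), size)
	}
}
```

Both payloads would add the same amount to an uncompressed byte counter, yet the bytes that would cross the network differ by orders of magnitude, which is the gap the metric cannot see.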

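The cardinality concern in the second limitation can be made concrete with a small sketch. It assumes, hypothetically, that each counter is labelled with a directed (source region, destination region) pair; the issue does not specify a labelling scheme, and the type and function names are illustrative only.

```go
package main

import "fmt"

// regionPair identifies a directed (source, destination) region pair.
// Labelling by pair is a hypothetical scheme used only for illustration.
type regionPair struct {
	From, To string
}

// PairBytes accumulates bytes per directed cross-region pair.
type PairBytes map[regionPair]int64

// Record adds n bytes for traffic from one region to another.
// Same-region traffic is ignored because it is not cross-region.
func (p PairBytes) Record(from, to string, n int64) {
	if from == to {
		return
	}
	p[regionPair{From: from, To: to}] += n
}

// FullMesh records one n-byte message between every ordered pair of
// regions and returns the resulting counters.
func FullMesh(regions []string, n int64) PairBytes {
	p := PairBytes{}
	for _, from := range regions {
		for _, to := range regions {
			p.Record(from, to, n)
		}
	}
	return p
}

func main() {
	for _, count := range []int{3, 10, 30} {
		regions := make([]string, count)
		for i := range regions {
			regions[i] = fmt.Sprintf("region-%d", i)
		}
		series := len(FullMesh(regions, 1))
		fmt.Printf("regions=%d distinct label values=%d\n", count, series)
	}
}
```

With R regions that all talk to each other, the map ends up with R × (R − 1) entries, and that count applies to every such metric on every node or store that exports it. A single aggregate cross-region counter has one entry regardless of R.
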
Jira issue: CRDB-28287

Labels: A-kv-observability, C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception), T-kv (KV Team)