kv: add metrics for cross-region, cross-zone batch requests / responses, snapshots, raft activities #103983
Description
Is your feature request related to a problem? Please describe.
Currently, it is difficult to observe cross-region (and cross-zone) traffic for
batch requests / responses, raft activities, and snapshots. This limitation
becomes problematic when we need to assess the volume of cross-region traffic
handled by nodes. Adding such metrics would also allow us to evaluate potential
optimizations. For example, consistent follower reads should reduce cross-region
traffic.
Describe the solution you'd like
One solution is to collect byte count metrics across different levels.
Batch requests sent and batch responses received at a node
- We can capture these metrics at the DistSender level, which runs on the
gateway node receiving the SQL queries. We should also collect metrics
at the destination range node, which is responsible for the accessed data.
- Cross Region
- Cross Zone
Snapshot bytes sent from and received at a store
- Cross Region (This has been mentioned in another issue.)
- Cross Zone
Raft messages sent from and received at a store through RaftTransport
- Cross Region
- Cross Zone
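The classification behind all three groups of metrics could be sketched as follows. This is a minimal illustration, not CockroachDB's actual implementation: the types `Tier`, `Locality`, `Comparison`, and `ByteCounters` are hypothetical, and the sketch assumes that each node's locality carries `region` and `zone` tiers and that "cross zone" means same region but different zone.

```go
package main

import "fmt"

// Tier is one key=value pair of a node's locality, e.g. region=us-east1.
// All types in this sketch are hypothetical and exist only for illustration.
type Tier struct {
	Key, Value string
}

// Locality is an ordered list of tiers describing where a node runs.
type Locality []Tier

// find returns the value of the first tier with the given key.
func (l Locality) find(key string) (string, bool) {
	for _, t := range l {
		if t.Key == key {
			return t.Value, true
		}
	}
	return "", false
}

// Comparison is the result of comparing two localities.
type Comparison int

const (
	Undefined   Comparison = iota // a required tier is missing on either side
	CrossRegion                   // the regions differ
	CrossZone                     // same region, different zone (an assumption of this sketch)
	SameZone                      // same region and same zone
)

// Compare classifies traffic flowing between two localities.
func Compare(src, dst Locality) Comparison {
	srcRegion, okSrc := src.find("region")
	dstRegion, okDst := dst.find("region")
	if !okSrc || !okDst {
		return Undefined
	}
	if srcRegion != dstRegion {
		return CrossRegion
	}
	srcZone, okSrc := src.find("zone")
	dstZone, okDst := dst.find("zone")
	if !okSrc || !okDst {
		return Undefined
	}
	if srcZone != dstZone {
		return CrossZone
	}
	return SameZone
}

// ByteCounters holds byte counts for one traffic type and direction,
// e.g. batch requests sent or raft messages received.
type ByteCounters struct {
	Total       int64
	CrossRegion int64
	CrossZone   int64
}

// Record adds n bytes to the total and to the matching cross-locality counter.
func (c *ByteCounters) Record(src, dst Locality, n int64) {
	c.Total += n
	switch Compare(src, dst) {
	case CrossRegion:
		c.CrossRegion += n
	case CrossZone:
		c.CrossZone += n
	}
}

func main() {
	east1a := Locality{{"region", "us-east1"}, {"zone", "a"}}
	east1b := Locality{{"region", "us-east1"}, {"zone", "b"}}
	west1a := Locality{{"region", "us-west1"}, {"zone", "a"}}

	var sent ByteCounters
	sent.Record(east1a, west1a, 100) // crosses regions
	sent.Record(east1a, east1b, 40)  // crosses zones within one region
	sent.Record(east1a, east1a, 10)  // stays within one zone
	fmt.Printf("%+v\n", sent)
}
```

Keeping one `ByteCounters` value per traffic type and direction (batch requests, batch responses, snapshots, raft messages) would give the total alongside the cross-region and cross-zone portions, so the share of cross-region traffic can be derived from the counters.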
Limitations of the solution
- When tracking the transmission of bytes between regions and zones, we currently
rely on the byte count of the uncompressed data rather than the actual physical
bytes sent or received. This approach is acceptable when the compression
factor remains consistent; when it varies, these metrics may not accurately
reflect the true cross-region traffic volume.
- Ideally, we would want to aggregate the cross-region metrics based on the
originating or forwarding region of the messages. This would allow us to assess
the workload of individual regions. However, this tracking across multiple
regions may lead to high cardinality.
Jira issue: CRDB-28287