Is your feature request related to a problem? Please describe.
Currently, it is difficult to observe cross-region (and cross-zone) traffic for
batch requests and responses, Raft activity, and snapshots. This limitation
becomes problematic when we need to assess the volume of cross-region traffic
handled by nodes. Better visibility would also let us evaluate potential
optimizations. For example, consistent follower reads should reduce cross-region
traffic.
Describe the solution you'd like
One solution is to collect byte count metrics at several levels:
- Batch requests sent and batch responses received at a node. These should be
  collected at the gateway node receiving the SQL queries. Additionally, we
  should collect metrics at the destination range node, which is responsible
  for the accessed data.
- Snapshot bytes sent from and received at a store
- Raft messages sent from and received at a store through RaftTransport
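
To make the proposal concrete, here is a minimal sketch of how a byte counter could classify traffic by comparing the sender's and receiver's localities. The `Locality`, `TrafficClass`, `Classify`, and `ByteCounters` names are hypothetical and simplified for illustration; they are not the actual CockroachDB types or metric names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Locality is a hypothetical, simplified stand-in for a node's locality tiers.
type Locality struct {
	Region string
	Zone   string
}

// TrafficClass describes how the sender and receiver localities relate.
type TrafficClass int

const (
	SameZone    TrafficClass = iota // same region and same zone
	CrossZone                       // same region, different zone
	CrossRegion                     // different region
	Unknown                         // a required locality tier is missing
)

// Classify compares regions first, then zones.
func Classify(src, dst Locality) TrafficClass {
	if src.Region == "" || dst.Region == "" {
		return Unknown
	}
	if src.Region != dst.Region {
		return CrossRegion
	}
	if src.Zone == "" || dst.Zone == "" {
		return Unknown
	}
	if src.Zone != dst.Zone {
		return CrossZone
	}
	return SameZone
}

// ByteCounters holds one family of byte counts, e.g. "batch requests sent".
type ByteCounters struct {
	Total, CrossRegion, CrossZone atomic.Int64
}

// Record adds n bytes to the total and, if applicable, to the matching
// cross-region or cross-zone counter.
func (c *ByteCounters) Record(src, dst Locality, n int64) {
	c.Total.Add(n)
	switch Classify(src, dst) {
	case CrossRegion:
		c.CrossRegion.Add(n)
	case CrossZone:
		c.CrossZone.Add(n)
	}
}

func main() {
	var batchSent ByteCounters
	gateway := Locality{Region: "us-east1", Zone: "us-east1-b"}

	batchSent.Record(gateway, Locality{Region: "us-east1", Zone: "us-east1-b"}, 100)
	batchSent.Record(gateway, Locality{Region: "us-east1", Zone: "us-east1-c"}, 200)
	batchSent.Record(gateway, Locality{Region: "eu-west1", Zone: "eu-west1-a"}, 400)

	// Prints: total=700 cross_region=400 cross_zone=200
	fmt.Printf("total=%d cross_region=%d cross_zone=%d\n",
		batchSent.Total.Load(), batchSent.CrossRegion.Load(), batchSent.CrossZone.Load())
}
```

Under this sketch, one `ByteCounters` instance would exist per direction and per level (batch, snapshot, Raft), so the cross-region share of each kind of traffic can be read off as `CrossRegion / Total`.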
Limitations of the solution.
When tracking the transmission of bytes between regions and zones, we currently
rely on the byte count of the uncompressed data rather than the actual physical
bytes sent or received. This approach is acceptable when the compression
factor remains consistent, but when the factor varies across message types or
workloads, these metrics may not accurately reflect the true cross-region
traffic volume.
Ideally, we would want to aggregate the cross-region metrics based on the
originating or forwarding region of the messages. This would allow us to assess
the workload of individual regions. However, tracking each metric per region
may lead to high metric cardinality as the number of regions grows.
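
A rough back-of-the-envelope count shows why per-region tracking is a cardinality concern. The sketch below assumes six base metrics (batch, snapshot, and Raft bytes, each sent and received) and a hypothetical labeling scheme in which each metric carries one label value per peer region; the cluster sizes are made up for illustration.

```go
package main

import "fmt"

// aggregateSeries counts time series when each node exports, per base metric,
// a total plus one cross-region and one cross-zone counter.
func aggregateSeries(nodes, baseMetrics int) int {
	return nodes * baseMetrics * 3
}

// perRegionSeries counts time series when each node exports every base metric
// once per peer region (a hypothetical per-region label).
func perRegionSeries(nodes, baseMetrics, regions int) int {
	return nodes * baseMetrics * regions
}

func main() {
	const nodes, baseMetrics = 30, 6

	fmt.Println(aggregateSeries(nodes, baseMetrics))     // 540
	fmt.Println(perRegionSeries(nodes, baseMetrics, 5))  // 900
	fmt.Println(perRegionSeries(nodes, baseMetrics, 20)) // 3600
}
```

The aggregate scheme stays constant per node, while the per-region scheme grows linearly with the number of regions (and faster still if both source and destination regions were used as labels).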
Jira issue: CRDB-28287