kvserver: add cross-zone snapshot byte metrics to StoreMetrics#104417
kvserver: add cross-zone snapshot byte metrics to StoreMetrics#104417craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
ebe5321 to
d5d780c
Compare
|
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
17c4bc9 to
075eea9
Compare
kvoli
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r1, 7 of 7 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6)
-- commits line 29 at r2:
Could you mention the locality-tier keys which are recognized as a "zone" here.
-- commits line 35 at r2:
Could we shorten the release note and move some of this up above.
I think it suffices to mention the supported zone locality-tier keys and then the requirement that there be at most one and consistent across nodes. The example is good!
pkg/kv/kvserver/store_snapshot.go line 977 at r2 (raw file):
// shouldIncrementCrossLocalitySnapshotMetrics returns (bool, bool) - indicating // if the two given replicas are cross-region and cross-zone respectively. func (s *Store) shouldIncrementCrossLocalitySnapshotMetrics(
nit: It may be worth mentioning why this function deals with both x-region and x-zone are handled in the same method, perf - so that in the future someone doesn't increase the overhead because they were unaware.
pkg/roachpb/metadata.go line 622 at r2 (raw file):
// getFirstRegionFirstZone iterates through the locality tiers and returns // multiple values containing: // 1.The value of the first encountered "region" tier.
nit: add some spacing between "1.The..."
pkg/roachpb/metadata.go line 628 at r2 (raw file):
func (l Locality) getFirstRegionFirstZone() (string, bool, string, string, bool) { hasRegion, hasZone := false, false firstRegionValue, firstZoneKey, firstZoneValue := "", "", ""
nit: probably more idiomatic to do something like
var (
hasRegion, hasZone bool
firstRegionValue, firstZoneKey, firstZoneValue string
)Or get that for free if you name the function return values like
func (l Locality) getFirstRegionFirstZone()
(firstRegion string, hasRegion bool, ...) {pkg/roachpb/metadata.go line 654 at r2 (raw file):
// 2. Error indicating if either locality does not have a "region" tier key. // 3. A boolean value indicating if this and the provided locality are // cross-region.
cross-zone*
pkg/roachpb/metadata.go line 667 at r2 (raw file):
// be checked thoroughly for duplicate zone tier keys and key mismatches. // However, due to frequent invocation of this function, we prefer to terminate // the check after examining the first encountered zone tier key-value pairs.
Just checking my understanding. Ignore possibility of any invalid locality keys, so they're all either region or zone.
Is the intention for x-region to imply x-zone? i.e. should it ever be possible to have two localities which are x-region but not x-zone?
When you want to estimate out how many bytes (roughly) you are paying for at x-region rates (excluding cheaper x-zone), you would do sum(x-region) - sum(x-zone)?
If so, do we need to additionally assume clusters won't have a mix of locality=region=...,zone=..., locality=region=... and locality=zone=...?
For example
L1: region=us-east, zone=us-east-1
L2: zone=us-east-2
L3: region=us-west
L1 x L2 = x-zone
L1 x L3 = x-region
L2 x L3 = none
Presume 3 snaps/requests each with value 1 for each of the Li x Lj enumerated above. Thesum(x-region)-sum(x-zone) metric query would return 0 despite there really being 1 x-region and 1 x-zone.
I know we previously chatted about the x-zone condition requiring a region flag also be set, what are your thoughts on this?
pkg/roachpb/metadata.go line 668 at r2 (raw file):
// However, due to frequent invocation of this function, we prefer to terminate // the check after examining the first encountered zone tier key-value pairs. func (l Locality) IsCrossRegionCrossZone(other Locality) (bool, error, bool, error) {
nit: Consider naming the return values.
pkg/roachpb/metadata_test.go line 224 at r2 (raw file):
} func TestLocalityIsCrossRegionCrossZone(t *testing.T) {
Great test!
35fbfe1 to
d49f7d7
Compare
Previously, kvoli (Austen) wrote…
Done. |
Previously, kvoli (Austen) wrote…
I wasn't sure the level of details we want to include for the commit and release notes. Let me know if this is too wordy. |
|
Previously, kvoli (Austen) wrote…
I added a comment for the locality helper instead. I think if someone is looking to add more locality tiers comparison here, they would come across that function eventually. I'm adding |
|
Previously, kvoli (Austen) wrote…
Done. |
|
Previously, kvoli (Austen) wrote…
Done. |
|
Previously, kvoli (Austen) wrote…
Discussed this offline:
In addition, the initial commit always increments cross-zone metrics. If we want to calculate for cross zone within same region activities, we would have to use DetailsThe new commit now only increments if it is cross-zone, within-region activities for properly configured region tiers. This makes the metrics more meaningful. |
1fcd7e4 to
ad60de0
Compare
c719251 to
6c70039
Compare
kvoli
left a comment
There was a problem hiding this comment.
Reviewed 4 of 6 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @wenyihu6)
-- commits line 19 at r4:
nit: github will pickup the link fine with just #104111. No need for markdown.
-- commits line 33 at r4:
nit: Are these commit messages formatted to have correct line length?
pkg/kv/kvserver/store_snapshot.go line 1004 at r4 (raw file):
isCrossRegion, isCrossZone := s.shouldIncrementCrossLocalitySnapshotMetrics(ctx, firstReplica, secReplica) switch sentOrReceived { case "sent":
Should this instead be a bool? It would be better to be as strict as possible.
|
Previously, kvoli (Austen) wrote…
Done. |
This commit refactors `getSnapshotBytesMetrics` function in `replica_learner_test`. Instead of rigidly defining an array of metrics information to be extracted within the function, the refactoring now allows the caller to pass a metrics slice array to retrieve specific metrics of the caller’s choice. This commit does not change any existing functionality, and the main purpose is to make future commits cleaner. Part of: cockroachdb#103983 Release note: none
Previously, we [added](cockroachdb#104111) cross-region snapshot byte metrics to track the aggregate of snapshot bytes sent from and received at a store across different regions. We should add metrics to track cross-zone snapshots as well. To improve this issue, this commit adds two new store metrics - ``` range.snapshots.cross-zone.sent-bytes range.snapshots.cross-zone.rcvd-bytes ``` These metrics track the aggregate of snapshot bytes sent from and received at a store across different zones within the same region if the zone and region tier keys are properly configured across nodes. To ensure accurate metrics and consistent error logging, it is important to follow the assumption when configuring locality tiers across nodes: 1. For cross-region metrics, region tier keys must be present. 2. For cross-zone metrics, zone tier keys must be present. It is also essential to maintain consistency in the zone tier key across nodes. (e.g. using different keys, such as “az” and “zone”, can lead to unexpected behavior). 3. Within a node locality, region and zone tier keys should be unique. 4. Ensure consistent configuration of region and zone tiers across nodes. If all nodes configure both region and zone tiers: Cross-region and cross-zone metrics can be used to track following information: a. Aggregate of cross-region, cross-zone activities: `range.snapshots.cross-region.(sent|received)-bytes` b. Aggregate of same-region, cross-zone activities: `range.snapshots.cross-zone.(sent|received)-bytes` c. Aggregate of same-region, same-zone activities: `range.snapshots.(sent|received)-bytes` - a - b d. Cross-region, same zone activities will be considered as misconfigured, and an error will be logged. If all nodes have zone tiers configured, but not regions: cross-zone metrics will still be accurate. Problems arise if some nodes have region tier keys configured while others do not. If all nodes have region tiers configured: cross-region metrics will be accurate regardless of the nodes’ zone tier setup. Part of: cockroachdb#103983 Release note (ops change): Two new store metrics - `range.snapshots.cross-zone.sent-bytes` and `range.snapshots.cross-zone.rcvd-bytes` - are now added to track the aggregate of snapshot bytes sent from and received at a store across different zones. For accurate metrics, follow these assumptions: - Configure region and zone tier keys consistently across nodes. - Within a node locality, ensure unique region and zone tier keys. - Maintain consistent configuration of region and zone tiers across nodes.
|
TFTR! bors r=kvoli |
|
Build succeeded: |
kvserver: refactor getSnapshotBytesMetrics
This commit refactors
getSnapshotBytesMetricsfunction inreplica_learner_test. Instead of rigidly defining an array of metricsinformation to be extracted within the function, the refactoring now allows the
caller to pass a metrics slice array to retrieve specific metrics of the
caller’s choice.
This commit does not change any existing functionality, and the main purpose is
to make future commits cleaner.
Part of: #103983
Release note: none
kvserver: add cross-zone snapshot byte metrics to StoreMetrics
Previously, we added cross-region snapshot byte metrics to track the
aggregate of snapshot bytes sent from and received at a store across different
regions. We should add metrics to track cross-zone snapshots as well.
To improve this issue, this commit adds two new store metrics -
These metrics track the aggregate of snapshot bytes sent from and received at a
store across different zones within the same region if the zone and region tier
keys are properly configured across nodes.
To ensure accurate metrics and consistent error logging, it is important to
follow the assumption when configuring locality tiers across nodes:
to maintain consistency in the zone tier key across nodes. (e.g. using different
keys, such as “az” and “zone”, can lead to unexpected behavior).
If all nodes configure both region and zone tiers:
Cross-region and cross-zone metrics can be used to track following information:
a. Aggregate of cross-region, cross-zone activities:
range.snapshots.cross-region.(sent|received)-bytesb. Aggregate of same-region, cross-zone activities:
range.snapshots.cross-zone.(sent|received)-bytesc. Aggregate of same-region, same-zone activities:
range.snapshots.(sent|received)-bytes- a - bd. Cross-region, same zone activities will be considered as misconfigured, and
an error will be logged.
If all nodes have zone tiers configured, but not regions: cross-zone metrics
will still be accurate. Problems arise if some nodes have region tier keys
configured while others do not.
If all nodes have region tiers configured: cross-region metrics will be accurate
regardless of the nodes’ zone tier setup.
Part of: #103983
Release note (ops change): Two new store metrics -
range.snapshots.cross-zone.sent-bytesandrange.snapshots.cross-zone.rcvd-bytes- are now added to track the aggregateof snapshot bytes sent from and received at a store across different zones.
For accurate metrics, follow these assumptions: