kvserver/closedts: add metrics for policy refresher by wenyihu6 · Pull Request #144518 · cockroachdb/cockroach

wenyihu6 · 2025-04-15T21:51:29Z

kvserver: add kv.closed_timestamp.policy_change

Previously, it was difficult to measure how often policies changed for ranges,
which is important because such changes can trigger additional range updates
sent via the side transport.

This commit adds a metric to track the number of policy changes on replicas.

Part of: #143890
Release note: none

kvserver: add more metrics for policies

Previously, it was difficult to determine how many ranges fell into each latency
bucket policy. This commit adds 18 new metrics to StoreMetrics to track the
number of ranges per policy bucket for every store.

Part of: #143890
Release note: none
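Here is a minimal Go sketch of how a store might tally ranges by policy bucket. It is not the PR's implementation. Policy, its constants, and countRangesByPolicy are hypothetical names, and the sketch models only four illustrative policies rather than the full set behind the 18 new metrics.

```go
package main

import "fmt"

// Policy is a hypothetical stand-in for a range's closed timestamp policy.
// The real enum has more values; only a few illustrative buckets are modeled.
type Policy int32

const (
	LagByClusterSetting Policy = iota
	LeadNoLatencyInfo
	LeadLatencyLessThan20ms
	LeadLatencyLessThan40ms
	numPolicies // sentinel: number of modeled policies
)

// countRangesByPolicy tallies how many ranges on a store currently use each
// policy. Index i of the result holds the count for Policy(i); a store could
// publish each entry as a separate gauge.
func countRangesByPolicy(policies []Policy) [numPolicies]int64 {
	var counts [numPolicies]int64
	for _, p := range policies {
		if p >= 0 && p < numPolicies { // ignore out-of-range values
			counts[p]++
		}
	}
	return counts
}

func main() {
	store := []Policy{LagByClusterSetting, LeadLatencyLessThan20ms, LagByClusterSetting, LeadNoLatencyInfo}
	fmt.Println(countRangesByPolicy(store)) // prints [2 1 1 0]
}
```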

kvserver: add kv.closed_timestamp.policy_latency_info_missing

When a replica refreshes its policies, it looks up its peer replicas' latency
info via a map passed by PolicyRefresher, which in turn periodically pulls node
latency info from RPCContext. If latency data for a node is missing, a default
hardcoded max RTT of 150ms is used.

Previously, it was hard to tell when this was happening. This commit adds metrics
to track how often the closed timestamp policy refresh falls back to the default
RTT due to missing node latency info. A high count might indicate the latency
cache isn’t refreshed frequently enough, suggesting we should consider lowering
kv.closed_timestamp.policy_latency_refresh_interval.

Resolves: #143890
Release note: none

cockroach-teamcity · 2025-04-15T21:51:45Z

This change is

blathers-crl · 2025-04-23T19:24:28Z

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

arulajmani

Reviewed 3 of 3 files at r1, 4 of 4 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @wenyihu6)

pkg/kv/kvserver/replica.go line 1385 at r1 (raw file):

	newPolicy := computeNewPolicy(oldPolicy)
	r.cachedClosedTimestampPolicy.Store(int32(newPolicy))
	if oldPolicy != newPolicy {

nit:

if ctpb.RangeClosedTimestampPolicy(r.cachedClosedTimestampPolicy.Load()) != newPolicy {
...
// update metric
// store new policy
}

wenyihu6

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani)

pkg/kv/kvserver/replica.go line 1385 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit:

if ctpb.RangeClosedTimestampPolicy(r.cachedClosedTimestampPolicy.Load()) != newPolicy {
...
// update metric
// store new policy
}

Moved the policy update to be inside the if statement. I think this is what you had in mind, but lmk if I'm off. We still need oldPolicy as a separate variable because computeNewPolicy requires it for dampening, and the comparison that drives the metric update needs it as well.
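The shape discussed in this thread can be modeled in Go. This is not the code in replica.go. The names replica, Policy, refreshPolicy, and dampened are illustrative. The dampening rule is a simple hysteresis over assumed 20ms buckets. The sketch shows why oldPolicy is loaded once: the dampening rule needs it to decide whether to stay in the current bucket, and the same value drives the change comparison.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Policy is a hypothetical stand-in for ctpb.RangeClosedTimestampPolicy.
type Policy int32

// replica models only the fields this sketch needs.
type replica struct {
	cachedPolicy  atomic.Int32 // mirrors r.cachedClosedTimestampPolicy
	policyChanges atomic.Int64 // plays the role of kv.closed_timestamp.policy_change
}

// refreshPolicy loads oldPolicy once, passes it to computeNewPolicy (which may
// dampen), and only updates the metric and stores the new policy on a change.
func (r *replica) refreshPolicy(computeNewPolicy func(old Policy) Policy) {
	oldPolicy := Policy(r.cachedPolicy.Load())
	newPolicy := computeNewPolicy(oldPolicy)
	if oldPolicy != newPolicy {
		r.policyChanges.Add(1)
		r.cachedPolicy.Store(int32(newPolicy))
	}
}

// dampened returns a hypothetical computeNewPolicy that maps latency to 20ms
// buckets but keeps the old bucket while latency stays within 5ms of its
// range, so small fluctuations around a boundary do not flip the policy.
func dampened(latencyMs int) func(Policy) Policy {
	return func(old Policy) Policy {
		lo := int(old) * 20
		if latencyMs >= lo-5 && latencyMs < lo+20+5 {
			return old
		}
		return Policy(latencyMs / 20)
	}
}

func main() {
	var r replica
	r.refreshPolicy(dampened(42)) // bucket 0 -> bucket 2: counted
	r.refreshPolicy(dampened(38)) // within 5ms of bucket 2: no change
	r.refreshPolicy(dampened(70)) // bucket 2 -> bucket 3: counted
	fmt.Println(r.cachedPolicy.Load(), r.policyChanges.Load()) // prints "3 2"
}
```

Because dampening keeps the 38ms sample in the existing bucket, only two of the three refreshes count as policy changes.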

wenyihu6 · 2025-04-24T02:21:08Z

TFTR!

bors r=arulajmani

craig · 2025-04-24T02:52:35Z

Build succeeded:

wenyihu6 force-pushed the dampeningmetrics branch 10 times, most recently from 6f7167f to 009dccf, April 15, 2025 22:52

wenyihu6 mentioned this pull request Apr 16, 2025

kvserver: follow up work for auto-tuning closed ts #143890

Closed

3 tasks

wenyihu6 force-pushed the dampeningmetrics branch 2 times, most recently from 29955aa to d5c445e, April 17, 2025 13:36

wenyihu6 mentioned this pull request Apr 17, 2025

kvserver: add kv.closed_timestamp.policy_switch_latency_bucket_exceed_threshold #144115

Merged

wenyihu6 force-pushed the dampeningmetrics branch 3 times, most recently from dd36f87 to d614235, April 23, 2025 19:24

wenyihu6 force-pushed the dampeningmetrics branch 2 times, most recently from 62012ba to 6e1f757, April 23, 2025 19:53

wenyihu6 marked this pull request as ready for review April 23, 2025 19:53

wenyihu6 requested a review from a team as a code owner April 23, 2025 19:53

wenyihu6 requested a review from arulajmani April 23, 2025 19:53

arulajmani approved these changes Apr 23, 2025

wenyihu6 added 3 commits April 23, 2025 19:56

wenyihu6 force-pushed the dampeningmetrics branch from 6e1f757 to 2375f70, April 23, 2025 23:56

wenyihu6 commented Apr 24, 2025

craig bot merged commit fc4a855 into cockroachdb:master Apr 24, 2025
23 checks passed

celeste-cockroachdb bot added the target-release-25.3.0 label Apr 24, 2025

celeste-cockroachdb bot added v25.3.0-prerelease and removed target-release-25.3.0 labels Jun 9, 2025

craig[bot] merged 3 commits into cockroachdb:master from wenyihu6:dampeningmetrics