kvserver: add kv.closed_timestamp.policy_switch_latency_bucket_exceed_threshold#144115
kvserver: add kv.closed_timestamp.policy_switch_latency_bucket_exceed_threshold#144115craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
d4bbe76 to
b4e5703
Compare
c45ece2 to
7602164
Compare
2df8bda to
c7316c3
Compare
|
@arulajmani friendly ping on this pr in case it slipped through the cracks |
arulajmani
left a comment
There was a problem hiding this comment.
modulo comments. This did fall through the cracks, thanks for pinging!
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @wenyihu6)
pkg/kv/kvserver/closedts/policy_calculation.go line 31 at r1 (raw file):
// the new policy is returned. // // 2. If there's an adjacent bucket jump, the new policy is returned.
The wording here is a bit confusing. Could you say more about what this means?
EDIT: did you mean "jump to a non-adjacent bucket"?
pkg/kv/kvserver/closedts/policy_calculation.go line 87 at r1 (raw file):
return newPolicy } return oldPolicy
Should we add metrics for this case (in a followup should be fine)?
pkg/kv/kvserver/closedts/setting.go line 103 at r1 (raw file):
"the fraction of the closed timestamp policy bucket width which need be "+ "exceeded before the closed timestamp policy will be changed", 0,
Should we enable this by default? Maybe 20%?
pkg/kv/kvserver/closedts/policy_calculation_test.go line 496 at r1 (raw file):
}, { name: "full dampening - multi-bucket jump",
Should we add a test case for multi-bucket jump with a lower dampening fraction as well?
4274d22 to
34abf04
Compare
wenyihu6
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani)
pkg/kv/kvserver/closedts/policy_calculation.go line 31 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
The wording here is a bit confusing. Could you say more about what this means?
EDIT: did you mean "jump to a non-adjacent bucket"?
yes, the blame is on cursor : )
pkg/kv/kvserver/closedts/policy_calculation.go line 87 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Should we add metrics for this case (in a followup should be fine)?
Yeah, this will be addressed in a follow-up PR: #144518. I've been considering whether it makes more sense to add the metrics to the side transport layer instead, to track the number of range updates there. Or we could track at both places (at policy refresher + side transport). Wdyt?
pkg/kv/kvserver/closedts/setting.go line 103 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Should we enable this by default? Maybe 20%?
I changed the default to 20%. I plan to put up a separate pr to override it to 0 in the follower reads roachtests. I want to first stabilize and gain confidence in the current behavior using the roachtests. Adding dampening could introduce inaccuracies, making it hard to tell if test failures are due to the inadequacy of policy refresher itself or dampening. Does that sound good to you?
pkg/kv/kvserver/closedts/policy_calculation_test.go line 496 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Should we add a test case for multi-bucket jump with a lower dampening fraction as well?
Done.
arulajmani
left a comment
There was a problem hiding this comment.
Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @wenyihu6)
pkg/kv/kvserver/closedts/policy_calculation.go line 31 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
yes, the blame is on cursor : )
😞
pkg/kv/kvserver/closedts/policy_calculation.go line 87 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
Yeah, this will be addressed in a follow-up PR: #144518. I've been considering whether it makes more sense to add the metrics to the side transport layer instead, to track the number of range updates there. Or we could track at both places (at policy refresher + side transport). Wdyt?
Let's just do both. These are counters, so they're cheap. It's nice to be able to say "we didn't move X number of ranges because of the threshold"; contrast that to the side transport, where we may or may not move ranges for wider range of reasons.
pkg/kv/kvserver/closedts/setting.go line 103 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
I changed the default to 20%. I plan to put up a separate pr to override it to 0 in the follower reads roachtests. I want to first stabilize and gain confidence in the current behavior using the roachtests. Adding dampening could introduce inaccuracies, making it hard to tell if test failures are due to the inadequacy of policy refresher itself or dampening. Does that sound good to you?
Sounds reasonable to me!
|
Last push was an empty commit to manually trigger the stuck CI - reported in https://cockroachlabs.slack.com/archives/CJ0H8Q97C/p1745243652330699. |
…_threshold This commit introduces a new cluster setting: kv.closed_timestamp.policy_switch_latency_bucket_exceed_threshold. It defines the fraction of the closed timestamp policy bucket width that must be exceeded before a policy switch is triggered. This helps prevent aggressive policy changes for ranges near bucket boundaries, reducing excessive updates sent via side transport. Part of: cockroachdb#143890 Release note: none Epic: none
|
Last push was to rebase on master to pick up 8cb77d6 (more in https://cockroachlabs.slack.com/archives/C05B1BPMJLE/p1745253521900019). |
|
TFTR! bors r=arulajmani |
|
This PR was included in a batch that successfully built, but then failed to merge into master (it was a non-fast-forward update). It will be automatically retried. |
|
Build succeeded: |
This commit introduces a new cluster setting:
kv.closed_timestamp.policy_switch_latency_bucket_crossing_threshold. It defines
the fraction of the closed timestamp policy bucket width that must be exceeded
before a policy switch is triggered.
This helps prevent aggressive policy changes for ranges near bucket boundaries,
reducing excessive updates sent via side transport.
Part of: #143890
Release note: none
Epic: none