kv: configure leading closed timestamp target for global_read ranges#61386
kv: configure leading closed timestamp target for global_read ranges#61386craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
It returns! I guess we do need to support AOST with future-time timestamps after all. I'm going to patch in what I had for that to see if CI likes it, but I'll pull out the real thing into a separate PR. |
|
Supporting AOST with future-time timestamps fixed the testing failures. However, I am seeing some serious slowness in some tests like That would do the trick. I don't think we want the @dt do you have any time tomorrow for a quick discussion on this? I'd like to make sure I'm not dramatically slowing down any bulk operations that break work into many small transactions. |
|
FWIW if I don't commit-wait in |
93a34e9 to
e131973
Compare
e131973 to
e14b64e
Compare
|
This is ready for a look now. I updated the PR with a commit to allow future-time AOST queries. I also opened #61444, which tracks the egregious slowdown in the |
andreimatei
left a comment
There was a problem hiding this comment.
but I wouldn't merge until we mess with the code around the assertion in #61443
Reviewed 2 of 9 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/ccl/multiregionccl/regional_by_row_test.go, line 370 at r2 (raw file):
// Drop the closed timestamp target lead for GLOBAL tables. // The test passes with it configured to its default, but it // is very slow due to #61444 (2.5s vs. 35s).
please make a note on the issue to come back to this code
pkg/kv/kvserver/closedts/policy.go, line 34 at r2 (raw file):
// Simple calculation: lag now by desired duration. return now.ToTimestamp().Add(-lagTargetDuration.Nanoseconds(), 0) case roachpb.LEAD_FOR_GLOBAL_READS:
please comment what the default ends up being
pkg/kv/kvserver/closedts/setting.go, line 53 at r2 (raw file):
// LEAD_FOR_GLOBAL_READS closed timestamp policy use to publish close // timestamps, if it is set to a non-zero value. Meant as an escape hatch. var LeadForGlobalReadsOverride = settings.RegisterDurationSetting(
pls link to where the target is computed when this setting is not set
pkg/sql/sem/tree/as_of.go, line 112 at r1 (raw file):
// Parse synthetic flag. syn := false if strings.HasSuffix(s, "?") && evalCtx.Settings.Version.IsActive(context.Background(), clusterversion.PriorReadSummaries) {
doesn't the EvalContext have a ctx in it?
pkg/sql/sem/tree/as_of.go, line 112 at r1 (raw file):
// Parse synthetic flag. syn := false if strings.HasSuffix(s, "?") && evalCtx.Settings.Version.IsActive(context.Background(), clusterversion.PriorReadSummaries) {
pls explain why we refuse to parse if the version is old. And consider generating an explicit error here directly when we don't accept it
This commit updates the handling of AS OF SYSTEM TIME clauses to accept future-time synthetic timestamps. This functionality is needed when performing certain kinds of schema changes on GLOBAL tables, as the operating transaction will get its commit timestamp bumped into the future and will then try to call into `CountLeases` from `CheckTwoVersionInvariant` with the commit timestamp. We're already protected from users shooting themselves in the foot too badly with this, as we only allow timestamps to lead present time by a few seconds before they are rejected by KV. Release justification: Needed for new functionality
e14b64e to
d0eb544
Compare
nvb
left a comment
There was a problem hiding this comment.
TFTR! I'll wait on #61443.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei)
pkg/ccl/multiregionccl/regional_by_row_test.go, line 370 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please make a note on the issue to come back to this code
Done.
pkg/kv/kvserver/closedts/policy.go, line 34 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please comment what the default ends up being
Done.
pkg/kv/kvserver/closedts/setting.go, line 53 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls link to where the target is computed when this setting is not set
Done.
pkg/sql/sem/tree/as_of.go, line 112 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
doesn't the
EvalContexthave a ctx in it?
Done.
pkg/sql/sem/tree/as_of.go, line 112 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls explain why we refuse to parse if the version is old. And consider generating an explicit error here directly when we don't accept it
Done.
Informs cockroachdb#59680. Informs cockroachdb#52745. This commit updates `closedts.TargetForPolicy` to calculate a target closed timestamp that leads present time for ranges with the LEAD_FOR_GLOBAL_READS closed timestamp policy. This is needed for non-blocking transactions, which require ranges to closed time in the future. TargetForPolicy's LEAD_FOR_GLOBAL_READS calculation is more complex than its LAG_BY_CLUSTER_SETTING calculation. Instead of the policy defining an offset from the publisher's perspective, the policy defines a goal from the consumer's perspective - the goal being that present time reads (with a possible uncertainty interval) can be served from all followers. To accomplish this, we must work backwards to establish a lead time to publish closed timestamps at. The calculation looks something like the following: ``` // this should be sufficient for any present-time transaction, // because its global uncertainty limit should be <= this time. // For more, see (*Transaction).RequiredFrontier. closed_ts_at_follower = now + max_offset // the sender must account for the time it takes to propagate a // closed timestamp update to its followers. closed_ts_at_sender = closed_ts_at_follower + propagation_time // closed timestamps propagate in two ways. Both need to make it to // followers in time. propagation_time = max(raft_propagation_time, side_propagation_time) // raft propagation takes 3 network hops to go from a leader proposing // a write (with a closed timestamp update) to the write being applied. // 1. leader sends MsgProp with entry // 2. followers send MsgPropResp with vote // 3. leader sends MsgProp with higher commit index // // we also add on a small bit of overhead for request evaluation, log // sync, and state machine apply latency. raft_propagation_time = max_network_rtt * 1.5 + raft_overhead // side-transport propagation takes 1 network hop, as there is no voting. // However, it is delayed by the full side_transport_close_interval in // the worst-case. side_propagation_time = max_network_rtt * 0.5 + side_transport_close_interval // put together, we get the following result closed_ts_at_sender = now + max_offset + max( max_network_rtt * 1.5 + raft_overhead, max_network_rtt * 0.5 + side_transport_close_interval, ) ``` While writing this, I explored what it would take to use dynamic network latency measurements in this calculation to complete cockroachdb#59680. The code for that wasn't too bad, but brought up a number of questions, including how far into the tail we care about and whether we place upper and lower bounds on this value. To avoid needing to immediately answer these questions, the commit hardcodes a maximum network RTT of 150ms, which should be an overestimate for almost any cluster we expect to run on. The commit also adds a new `kv.closed_timestamp.lead_for_global_reads_override` cluster setting, which, if nonzero, overrides the lead time that global_read ranges use to publish closed timestamps. The cluster setting is hidden, but should provide an escape hatch for cases where we get the calculation (especially when it becomes dynamic) wrong. Release justification: needed for new functionality
d0eb544 to
b320c03
Compare
|
Merging, because it sounds like we understand #61443 and that it's an artifact of the test bumping the clock, so this won't make it worse. bors r+ |
|
Build failed (retrying...): |
|
bors r- |
|
bors r+ |
|
Build succeeded: |
Informs #59680.
Informs #52745.
This commit updates
closedts.TargetForPolicyto calculate a target closedtimestamp that leads present time for ranges with the LEAD_FOR_GLOBAL_READS
closed timestamp policy. This is needed for non-blocking transactions, which
require ranges to closed time in the future.
TargetForPolicy's LEAD_FOR_GLOBAL_READS calculation is more complex than its
LAG_BY_CLUSTER_SETTING calculation. Instead of the policy defining an offset
from the publisher's perspective, the policy defines a goal from the consumer's
perspective - the goal being that present time reads (with a possible
uncertainty interval) can be served from all followers. To accomplish this, we
must work backwards to establish a lead time to publish closed timestamps at.
The calculation looks something like the following:
While writing this, I explored what it would take to use dynamic network latency
measurements in this calculation to complete #59680. The code for that wasn't
too bad, but brought up a number of questions, including how far into the tail
we care about and whether we place upper and lower bounds on this value. To
avoid needing to immediately answer these questions, the commit hardcodes a
maximum network RTT of 150ms, which should be an overestimate for almost any
cluster we expect to run on.
The commit also adds a new
kv.closed_timestamp.lead_for_global_reads_overridecluster setting, which, if nonzero, overrides the lead time that global_read
ranges use to publish closed timestamps. The cluster setting is hidden, but
should provide an escape hatch for cases where we get the calculation
(especially when it becomes dynamic) wrong.
Release justification: needed for new functionality