kv: avoid regressions in closed timestamp sidetransportAccess#68313
kv: avoid regressions in closed timestamp sidetransportAccess#68313craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
This commit strengthens the guarantees of the sidetransportAccess, which now promises that the returned timestamp from its get method will never regress across sequential calls. This is made possible by retaining two closed timestamps in the access - a `cur` closed timestamp that is known to be applied and an optional `next` closed timestamp that is not yet applied. Using two timestamps ensures that we never lose information about an applied closed timestamp, so we can make the stronger guarantee to users of the abstraction that returned timestamp will always increase monotonically across consecutive calls. This was causing flakiness in tests added in cockroachdb#68194. The property-based tests in that PR were asserting (in one form or another) that after a `QueryResolvedTimestamp` request returns a resolved timestamp, a subsequent read-only request at that timestamp will never block on replication (i.e. the closed timestamp) or on conflicting transactions when executed. These tests were very rarely finding this not to be the case. The reason for this was that it was possible for the `sidetransportAccess` to regress across two sequential calls.
a26a81f to
f40a038
Compare
|
Friendly ping @aayushshah15. |
aayushshah15
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/replica_closedts.go, line 343 at r1 (raw file):
// stored, so what we have stored is not usable. There's no point in going // to the receiver, as that one can only have an even higher LAI. return cur.ts
Does the following test case for TestSideTransportClosed look correct to you?
{
name: "current + next + receiver set, next not reached, receiver not reached",
curSet: true,
nextSet: true,
recSet: true,
applied: cur.lai - 1,
expClosed: hlc.Timestamp{},
},
It produces cur.ts as the result, but shouldn't it produce hlc.Timestamp{}?
pkg/kv/kvserver/replica_closedts_internal_test.go, line 385 at r1 (raw file):
// TestSideTransportClosedMonotonic tests that sequential calls to // sidetransportAccess.get return monotonically increasing timestamps. func TestSideTransportClosedMonotonic(t *testing.T) {
nice test!
nvb
left a comment
There was a problem hiding this comment.
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15)
pkg/kv/kvserver/replica_closedts.go, line 343 at r1 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
Does the following test case for
TestSideTransportClosedlook correct to you?{ name: "current + next + receiver set, next not reached, receiver not reached", curSet: true, nextSet: true, recSet: true, applied: cur.lai - 1, expClosed: hlc.Timestamp{}, },It produces
cur.tsas the result, but shouldn't it producehlc.Timestamp{}?
Hm, that's somewhere between an invalid test case and the wrong result. curSet means that cur.ts has at one point been proven to be applied (by definition), so even if a later call to sidetransportAccess.get provides a smaller applied index, we still want to return cur.ts. That's actually an important part of the contract that prevents regressions.
aayushshah15
left a comment
There was a problem hiding this comment.
(but I'd appreciate it if you can clear my understanding about the thread below)
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)
pkg/kv/kvserver/replica_closedts.go, line 343 at r1 (raw file):
curSet means that cur.ts has at one point been proven to be applied
oh, that makes sense.
even if a later call to sidetransportAccess.get provides a smaller applied index, we still want to return cur.ts
I'm still confused about why this is needed to prevent regressions though: sideTransportAccess.get() is always called with a replica's applied LAI (which can never regress, yes?).
nvb
left a comment
There was a problem hiding this comment.
TFTR!
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
pkg/kv/kvserver/replica_closedts.go, line 343 at r1 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
curSet means that cur.ts has at one point been proven to be applied
oh, that makes sense.
even if a later call to sidetransportAccess.get provides a smaller applied index, we still want to return cur.ts
I'm still confused about why this is needed to prevent regressions though:
sideTransportAccess.get()is always called with a replica's applied LAI (which can never regress, yes?).
You're correct that the applied LAI cannot regress. However, sidetransportAccess.get is not called under the replica mutex, so it's possible that the LAI passed to sidetransportAccess.get can regress between concurrent callers because the callers are not synchronized. But we stall want those callers to see a monotonic view of the applied side transport closed timestamp.
|
Build succeeded: |
andreimatei
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 2 of 0 LGTMs obtained
This commit strengthens the guarantees of the
sidetransportAccess, which now promises that the returned timestamp from its get method will never regress across sequential calls. This is made possible by retaining two closed timestamps in the access - acurclosed timestamp that is known to be applied and an optionalnextclosed timestamp that is not yet applied. Using two timestamps ensures that we never lose information about an applied closed timestamp, so we can make the stronger guarantee to users of the abstraction that returned timestamp will always increase monotonically across consecutive calls.This was causing flakiness in tests added in #68194. The property-based tests in that PR were asserting (in one form or another) that after a
QueryResolvedTimestamprequest returns a resolved timestamp, a subsequent read-only request at that timestamp will never block on replication (i.e. the closed timestamp) or on conflicting transactions when executed. These tests were very rarely finding this not to be the case. The reason for this was that it was possible for thesidetransportAccessto regress across two sequential calls.cc. @andreimatei, but not assigning because you're away.