Skip to content

kv: avoid regressions in closed timestamp sidetransportAccess#68313

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/closedTsRegress2
Aug 10, 2021
Merged

kv: avoid regressions in closed timestamp sidetransportAccess#68313
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/closedTsRegress2

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Aug 2, 2021

This commit strengthens the guarantees of the sidetransportAccess, which now promises that the returned timestamp from its get method will never regress across sequential calls. This is made possible by retaining two closed timestamps in the access - a cur closed timestamp that is known to be applied and an optional next closed timestamp that is not yet applied. Using two timestamps ensures that we never lose information about an applied closed timestamp, so we can make the stronger guarantee to users of the abstraction that returned timestamp will always increase monotonically across consecutive calls.

This was causing flakiness in tests added in #68194. The property-based tests in that PR were asserting (in one form or another) that after a QueryResolvedTimestamp request returns a resolved timestamp, a subsequent read-only request at that timestamp will never block on replication (i.e. the closed timestamp) or on conflicting transactions when executed. These tests were very rarely finding this not to be the case. The reason for this was that it was possible for the sidetransportAccess to regress across two sequential calls.

cc. @andreimatei, but not assigning because you're away.

@nvb nvb requested review from a team and aayushshah15 August 2, 2021 01:46
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

This commit strengthens the guarantees of the sidetransportAccess, which
now promises that the returned timestamp from its get method will never
regress across sequential calls. This is made possible by retaining two
closed timestamps in the access - a `cur` closed timestamp that is known
to be applied and an optional `next` closed timestamp that is not yet
applied. Using two timestamps ensures that we never lose information
about an applied closed timestamp, so we can make the stronger guarantee
to users of the abstraction that returned timestamp will always increase
monotonically across consecutive calls.

This was causing flakiness in tests added in cockroachdb#68194. The property-based
tests in that PR were asserting (in one form or another) that after a
`QueryResolvedTimestamp` request returns a resolved timestamp, a
subsequent read-only request at that timestamp will never block on
replication (i.e. the closed timestamp) or on conflicting transactions
when executed. These tests were very rarely finding this not to be the
case. The reason for this was that it was possible for the
`sidetransportAccess` to regress across two sequential calls.
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 9, 2021

Friendly ping @aayushshah15.

Copy link
Copy Markdown
Contributor

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/replica_closedts.go, line 343 at r1 (raw file):

			// stored, so what we have stored is not usable. There's no point in going
			// to the receiver, as that one can only have an even higher LAI.
			return cur.ts

Does the following test case for TestSideTransportClosed look correct to you?

{
	name:      "current + next + receiver set, next not reached, receiver not reached",
	curSet:    true,
	nextSet:   true,
	recSet:    true,
	applied:   cur.lai - 1,
	expClosed: hlc.Timestamp{},
},

It produces cur.ts as the result, but shouldn't it produce hlc.Timestamp{}?


pkg/kv/kvserver/replica_closedts_internal_test.go, line 385 at r1 (raw file):

// TestSideTransportClosedMonotonic tests that sequential calls to
// sidetransportAccess.get return monotonically increasing timestamps.
func TestSideTransportClosedMonotonic(t *testing.T) {

nice test!

Copy link
Copy Markdown
Contributor Author

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/kv/kvserver/replica_closedts.go, line 343 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Does the following test case for TestSideTransportClosed look correct to you?

{
	name:      "current + next + receiver set, next not reached, receiver not reached",
	curSet:    true,
	nextSet:   true,
	recSet:    true,
	applied:   cur.lai - 1,
	expClosed: hlc.Timestamp{},
},

It produces cur.ts as the result, but shouldn't it produce hlc.Timestamp{}?

Hm, that's somewhere between an invalid test case and the wrong result. curSet means that cur.ts has at one point been proven to be applied (by definition), so even if a later call to sidetransportAccess.get provides a smaller applied index, we still want to return cur.ts. That's actually an important part of the contract that prevents regressions.

Copy link
Copy Markdown
Contributor

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: (but I'd appreciate it if you can clear my understanding about the thread below)

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)


pkg/kv/kvserver/replica_closedts.go, line 343 at r1 (raw file):

curSet means that cur.ts has at one point been proven to be applied

oh, that makes sense.

even if a later call to sidetransportAccess.get provides a smaller applied index, we still want to return cur.ts

I'm still confused about why this is needed to prevent regressions though: sideTransportAccess.get() is always called with a replica's applied LAI (which can never regress, yes?).

Copy link
Copy Markdown
Contributor Author

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/kv/kvserver/replica_closedts.go, line 343 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

curSet means that cur.ts has at one point been proven to be applied

oh, that makes sense.

even if a later call to sidetransportAccess.get provides a smaller applied index, we still want to return cur.ts

I'm still confused about why this is needed to prevent regressions though: sideTransportAccess.get() is always called with a replica's applied LAI (which can never regress, yes?).

You're correct that the applied LAI cannot regress. However, sidetransportAccess.get is not called under the replica mutex, so it's possible that the LAI passed to sidetransportAccess.get can regress between concurrent callers because the callers are not synchronized. But we stall want those callers to see a monotonic view of the applied side transport closed timestamp.

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 10, 2021

Build succeeded:

@craig craig bot merged commit 679ee6e into cockroachdb:master Aug 10, 2021
@nvb nvb deleted the nvanbenschoten/closedTsRegress2 branch August 10, 2021 20:55
Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:LGTM:

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants