
spanconfig: introduce the ProtectedTSReader interface #75285

Merged
craig[bot] merged 2 commits into cockroachdb:master from arulajmani:pts-reader
Mar 2, 2022

Conversation

@arulajmani
Collaborator

@arulajmani arulajmani commented Jan 21, 2022

See individual commits for details.

Release justification: low risk, high benefit changes to existing
functionality.

@cockroach-teamcity
Member

This change is Reviewable

@arulajmani arulajmani force-pushed the pts-reader branch 2 times, most recently from 4f79636 to 7afc163 on January 24, 2022 20:20
@arulajmani arulajmani changed the title [WIP, DNM] spanconfig: introduce the ProtectedTSReader interface spanconfig: introduce the ProtectedTSReader interface Jan 24, 2022
@arulajmani arulajmani marked this pull request as ready for review January 24, 2022 20:21
@arulajmani arulajmani requested a review from a team as a code owner January 24, 2022 20:21
@irfansharif irfansharif removed their request for review February 10, 2022 22:14
Collaborator Author

@arulajmani arulajmani left a comment


This had been on the back burner with all the other PTS-related span config changes. Now that those are all in, I've dusted this off. Also changed the PTSReader to return a list of protection timestamps instead of encapsulating the "validity" logic, based on our conversations in the pod.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @irfansharif, and @nvanbenschoten)
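The reshaped reader described above — handing back the raw protection timestamps and letting the caller decide validity — could be sketched as follows. This is a minimal, self-contained sketch: `Timestamp` and `Span` here are stand-ins for CockroachDB's `hlc.Timestamp` and `roachpb.Span`, and `fixedReader` is a toy implementation, not the real `ptcache.Cache`.

```go
package main

import "fmt"

// Timestamp and Span are simplified stand-ins for hlc.Timestamp and
// roachpb.Span.
type Timestamp struct{ WallTime int64 }

type Span struct{ Key, EndKey string }

// ProtectedTSReader returns the raw protection timestamps that apply to a
// span, leaving the "does my GC threshold violate a protection?" decision
// to the caller instead of encapsulating that validity logic itself.
type ProtectedTSReader interface {
	// GetProtectionTimestamps returns all protection timestamps that apply
	// to any part of the given span, along with the time at which this
	// state was read (its freshness).
	GetProtectionTimestamps(sp Span) (protections []Timestamp, readAt Timestamp)
}

// fixedReader is a toy implementation for illustration: it returns a
// pre-canned answer regardless of the span.
type fixedReader struct {
	protections []Timestamp
	readAt      Timestamp
}

func (f fixedReader) GetProtectionTimestamps(Span) ([]Timestamp, Timestamp) {
	return f.protections, f.readAt
}

// Compile-time check that fixedReader satisfies the interface.
var _ ProtectedTSReader = fixedReader{}

func main() {
	r := fixedReader{
		protections: []Timestamp{{WallTime: 5}, {WallTime: 3}},
		readAt:      Timestamp{WallTime: 10},
	}
	protections, readAt := r.GetProtectionTimestamps(Span{Key: "a", EndKey: "z"})
	fmt.Println(len(protections), readAt.WallTime) // 2 10
}
```

The caller (e.g. a replica computing a GC threshold) iterates the returned list itself, which is the "validity logic moved out of the reader" change described above.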

ClosedTimestampReceiver: ctReceiver,
ExternalStorage: externalStorage,
ExternalStorageFromURI: externalStorageFromURI,
ProtectedTimestampCache: protectedtsProvider,
Contributor


can we delete ProtectedTimestampCache?

Reconciler: reconciler,
Struct: reconciler.Metrics(),
Storage: storage,
Cache: cache,
Contributor


can we delete the Cache field?

@@ -40,6 +41,7 @@ type Provider interface {
Cache
Contributor


can we delete Cache from here too?

protectionTimestamps, ts.readAt = r.store.protectedtsReader.GetProtectionTimestamps(ctx, sp)
earliestTS := hlc.Timestamp{}
for _, protectionTimestamp := range protectionTimestamps {
if protectionTimestamp.Less(earliestTS) && gcThreshold.LessEq(protectionTimestamp) {
Contributor


I believe this condition is not required/incorrect?

continue
}
if earliestTS.IsEmpty() || protectionTimestamp.Less(earliestTS) {
earliestTS = protectionTimestamp
Contributor


can we just assign to ts.earliestProtectionTimestamp and get rid of earliestTS?

Collaborator Author

@arulajmani arulajmani left a comment


Dismissed @adityamaru from 5 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @irfansharif, and @nvanbenschoten)


pkg/kv/kvserver/replica_protected_timestamp.go, line 81 at r3 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

I believe this condition is not required/incorrect?

Yikes, not sure how that ended up here ... thanks for pointing it out!


pkg/kv/kvserver/replica_protected_timestamp.go, line 90 at r3 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

can we just assign to ts.earliestProtectionTimestamp and get rid of earliestTS?

To confirm, you're still saying we should clear out the old ts.earliestProtectionTimestamp field here by setting it to an empty timestamp above, right?

Looking at the diff, the old code didn't do that -- it should have, right?
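The fix being discussed — dropping the extra condition, clearing the old value, and assigning directly to the field — could look something like this sketch. The type and field names are simplified stand-ins for the replica's cached protection state (the real code operates on `hlc.Timestamp` in `replica_protected_timestamp.go`).

```go
package main

import "fmt"

// Timestamp is a stand-in for hlc.Timestamp.
type Timestamp struct{ WallTime int64 }

func (t Timestamp) Less(o Timestamp) bool { return t.WallTime < o.WallTime }
func (t Timestamp) IsEmpty() bool         { return t.WallTime == 0 }

// cachedProtectedTimestampState is a stand-in for the replica's cached
// protection state.
type cachedProtectedTimestampState struct {
	readAt                      Timestamp
	earliestProtectionTimestamp Timestamp
}

// updateEarliest clears any stale value first, then assigns directly to
// the field, avoiding both the intermediate earliestTS variable and the
// extra gcThreshold condition flagged in review.
func (ts *cachedProtectedTimestampState) updateEarliest(protections []Timestamp) {
	ts.earliestProtectionTimestamp = Timestamp{} // clear the old value
	for _, p := range protections {
		if ts.earliestProtectionTimestamp.IsEmpty() || p.Less(ts.earliestProtectionTimestamp) {
			ts.earliestProtectionTimestamp = p
		}
	}
}

func main() {
	var ts cachedProtectedTimestampState
	ts.updateEarliest([]Timestamp{{WallTime: 5}, {WallTime: 3}, {WallTime: 9}})
	fmt.Println(ts.earliestProtectionTimestamp.WallTime) // 3
}
```

Resetting the field up front matters: without it, a protection that has since been released would linger from a previous pass, which is the bug the "it should have, right?" question above is getting at.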


pkg/kv/kvserver/protectedts/ptprovider/provider.go, line 68 at r3 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

can we delete the Cache field?

See above.


pkg/server/server.go, line 530 at r3 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

can we delete ProtectedTimestampCache?

Done.


pkg/kv/kvserver/protectedts/protectedts.go, line 41 at r3 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

can we delete Cache from here too?

We talked about this offline, but capturing the conversation here as well -- I tried doing so, but then realized that the Cache is used by the GC job as well. Thinking about this a bit more, adding the spanconfig.ProtectedTSReader on the Provider wasn't correct -- the Provider is a SQL concept, whereas we expect the spanconfig.ProtectedTSReader to be a per-store thing going forward. To that end, instead of providing the spanconfig.ProtectedTSReader to KV via the Provider, I think it makes more sense to instantiate one using the system tenant's ptcache.Cache (old subsystem) and KVSubscriber (going forward). Made the change in server.go, PTAL.

The GC job realization above does lead to something we hadn't considered yet -- secondary tenants need to consult their protected TS information in their GC job. We can't re-use the protectedts.Cache as-is there, because the Cache works on spans (which secondary tenants never set, as we've moved to protecting schema objects). We'll need to consult something similar to the spanconfig.ProtectedTSStateReader, which we use in the SQLTranslator, instead.

As part of re-working the PTS subsystem to work with secondary tenants
we are dropping verification of protection records. None of the
existing users of `Verification` (currently only backup) would fail
non-destructively without it; this allows us to circumvent the
complexity involved in making `Verification` work in the new subsystem.

This patch removes code that was previously used to serve the
`AdminVerifyProtectedTimestamp` request; we instead vacuously
return true. This is to account for the {21.2, 22.1} mixed version case
where 22.1 binary nodes still need to be able to serve this request.
This can happen if the request is initiated on a 21.2 node and the
leaseholder of the range it is trying to verify is on a 22.1 node. By
always returning true we ensure the (backup) job upstream doesn't fail.
This is okay even if the PTS record being verified does not apply as
the failure mode is non-destructive (which is also why we're okay
removing the call to verification in 22.1).

This patch is required in preparation for the `ProtectedTSReader`.

Release note: None

Release justification: low risk, high benefit changes to existing
functionality.
@arulajmani arulajmani force-pushed the pts-reader branch 2 times, most recently from 873a101 to 17d97f4 on February 28, 2022 18:27
@adityamaru
Contributor

The GC job realization above does lead to something we hadn't considered yet -- secondary tenants need to consult their protected TS information in their GC Job

Filed #77156 as a release blocker.

Contributor

@adityamaru adityamaru left a comment


Nice! LGTM but I'll defer to KV for a final stamp.

// Verify that it indeed did fail.
verifyErr := ptv.Verify(ctx, failedRec.ID.GetUUID())
require.Regexp(t, "failed to verify protection", verifyErr)
// replica can read it from. We then verify (below) that the failed record
Contributor


nit: comment seems truncated at the start

Contributor

@nvb nvb left a comment


:lgtm:

@ajwerner might also want to take a look at this, as he was the original author of the protected timestamp system.

Reviewed 3 of 4 files at r2, 15 of 15 files at r4, 21 of 21 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @irfansharif)


pkg/spanconfig/spanconfig.go, line 406 at r5 (raw file):

type ProtectedTSReader interface {
	// GetProtectionTimestamps returns all protected timestamps that apply to the
	// any part of the given key span. The time at which this protection timestamp

"to the any"


pkg/spanconfig/spanconfigptsreader/adapter.go, line 27 at r5 (raw file):

// V1 of the protected timestamp subsystem only allowed protections to be set
// over spans belonging to the system tenant. These protections were stored in
// memory by ptcache.Cache. V2 of the subsystem allows protections to be set

Is the "in memory" qualifier here meaningful? These protected timestamps are still backed by a system table, which is no different than with V2, so what are you trying to indicate?

Would we accomplish the same thing by saying "These protections were cached on each node in the cluster by ptcache.Cache."?


pkg/spanconfig/spanconfigptsreader/adapter.go, line 38 at r5 (raw file):

//  subsystem, and we'd be able to get rid of this interface.
//
// TODO(arul): Embed the KVSubscriber here as well and actually encapsulate PTS

Since we plan to follow this PR up with this TODO, let's not embed the Cache in adapter. Instead, let's name the field and implement ProtectedTSReader explicitly. We don't want adapter to inherit all of Cache's methods anyway.
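The suggestion amounts to something like the following sketch: name the field instead of embedding it, and forward the one method explicitly so the adapter's method set stays exactly `ProtectedTSReader`. Names here are illustrative stand-ins for `ptcache.Cache` and the adapter in `pkg/spanconfig/spanconfigptsreader`.

```go
package main

import "fmt"

type Timestamp struct{ WallTime int64 }
type Span struct{ Key, EndKey string }

type ProtectedTSReader interface {
	GetProtectionTimestamps(sp Span) ([]Timestamp, Timestamp)
}

// Cache is a stand-in for ptcache.Cache, which carries methods beyond
// GetProtectionTimestamps that we don't want the adapter to expose.
type Cache struct{}

func (c *Cache) GetProtectionTimestamps(sp Span) ([]Timestamp, Timestamp) {
	return nil, Timestamp{WallTime: 1}
}

// Refresh stands in for one of the extra methods that embedding would
// have promoted onto the adapter.
func (c *Cache) Refresh() {}

// adapter names the field rather than embedding Cache, so it implements
// ProtectedTSReader explicitly and does not inherit Cache's other methods.
type adapter struct {
	cache *Cache
}

// GetProtectionTimestamps forwards to the wrapped cache explicitly.
func (a *adapter) GetProtectionTimestamps(sp Span) ([]Timestamp, Timestamp) {
	return a.cache.GetProtectionTimestamps(sp)
}

// Compile-time interface check.
var _ ProtectedTSReader = (*adapter)(nil)

func main() {
	var r ProtectedTSReader = &adapter{cache: &Cache{}}
	_, readAt := r.GetProtectionTimestamps(Span{})
	fmt.Println(readAt.WallTime) // 1
}
```

With the named field, `(&adapter{}).Refresh()` simply doesn't compile, which is the encapsulation the review comment is after; embedding would have promoted all of `Cache`'s exported methods onto the adapter.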


pkg/kv/kvserver/client_protectedts_test.go, line 230 at r4 (raw file):

	require.NoError(t, err)

	// replica can read it from. We then verify (below) that the failed record

Did this comment get cut off?


pkg/kv/kvserver/client_protectedts_test.go, line 274 at r4 (raw file):

}

func verifyProtectionTimestampExistsOnSpans(

Let's give this a comment.


pkg/kv/kvserver/protectedts/protectedts.go, line 41 at r3 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

We talked about this offline, but capturing the conversation here as well -- I tried doing so, but then realized that the Cache is used by the GC job as well. Thinking about this a bit more, adding the spanconfig.ProtectedTSReader on the Provider wasn't correct -- the Provider is a SQL concept, whereas we expect the spanconfig.ProtectedTSReader to be a per-store thing going forward. To that end, instead of providing the spanconfig.ProtectedTSReader to KV via the Provider, I think it makes more sense to instantiate one using the system tenant's ptcache.Cache (old subsystem) and KVSubscriber (going forward). Made the change in server.go, PTAL.

The GC job realization above does lead to something we hadn't considered yet -- secondary tenants need to consult their protected TS information in their GC job. We can't re-use the protectedts.Cache as-is there, because the Cache works on spans (which secondary tenants never set, as we've moved to protecting schema objects). We'll need to consult something similar to the spanconfig.ProtectedTSStateReader, which we use in the SQLTranslator, instead.

Do we have an issue tracking the GC job changes we'll need to land for this release?

This patch introduces the `spanconfig.ProtectedTSReader` interface.
It is intended to replace the `protectedts.Cache` as KV's integration
point for querying protected timestamp metadata in v2 of the protected
timestamp subsystem.

For now, the `protectedts.Cache` is the only implementor of it. We will
have the `KVSubscriber` implement it as well once we start shipping
protected timestamp information on span configurations.

The `ProtectedTSReader` interface is also intended to serve as an adapter
interface between v1 and v2 of the PTS subsystem. In particular, for
v21.2, we will consult both the `protectedts.Cache` and `KVSubscriber`
for PTS information. The logic here will be gated behind a call to
`GetProtectionTimestamps`, which is the only method this interface
provides.

Release note: None

Release justification: low risk, high benefit changes to existing
functionality.
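The adapter role described in this commit message — consulting both the old cache and the new subscriber behind a single `GetProtectionTimestamps` call — could be sketched as below. The merge strategy shown (concatenate both protection lists, report the older of the two freshness timestamps) is an assumption for illustration; the real adapter may differ.

```go
package main

import "fmt"

type Timestamp struct{ WallTime int64 }

func (t Timestamp) Less(o Timestamp) bool { return t.WallTime < o.WallTime }

type Span struct{ Key, EndKey string }

// source stands in for either the v1 ptcache.Cache or the v2 KVSubscriber.
type source interface {
	GetProtectionTimestamps(sp Span) ([]Timestamp, Timestamp)
}

// adapter consults both subsystems during the transition.
type adapter struct {
	cache, subscriber source
}

// GetProtectionTimestamps combines protections from both sources. The
// returned freshness is the older (minimum) of the two readAt values,
// since the combined answer is only as fresh as its stalest source.
func (a *adapter) GetProtectionTimestamps(sp Span) ([]Timestamp, Timestamp) {
	p1, at1 := a.cache.GetProtectionTimestamps(sp)
	p2, at2 := a.subscriber.GetProtectionTimestamps(sp)
	readAt := at1
	if at2.Less(at1) {
		readAt = at2
	}
	return append(append([]Timestamp{}, p1...), p2...), readAt
}

// fixed is a toy source returning canned answers.
type fixed struct {
	protections []Timestamp
	readAt      Timestamp
}

func (f fixed) GetProtectionTimestamps(Span) ([]Timestamp, Timestamp) {
	return f.protections, f.readAt
}

func main() {
	a := &adapter{
		cache:      fixed{[]Timestamp{{WallTime: 4}}, Timestamp{WallTime: 10}},
		subscriber: fixed{[]Timestamp{{WallTime: 6}}, Timestamp{WallTime: 8}},
	}
	p, readAt := a.GetProtectionTimestamps(Span{})
	fmt.Println(len(p), readAt.WallTime) // 2 8
}
```

Gating everything behind this one method means callers never need to know which subsystem a given protection came from, which is what lets the v1 path be deleted later without touching KV.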
Collaborator Author

@arulajmani arulajmani left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @irfansharif, and @nvanbenschoten)


pkg/spanconfig/spanconfigptsreader/adapter.go, line 27 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is the "in memory" qualifier here meaningful? These protected timestamps are still backed by a system table, which is no different than with V2, so what are you trying to indicate?

Would we accomplish the same thing by saying "These protections were cached on each node in the cluster by ptcache.Cache."?

Done.


pkg/spanconfig/spanconfigptsreader/adapter.go, line 38 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Since we plan to follow this PR up with this TODO, let's not embed the Cache in adapter. Instead, let's name the field and implement ProtectedTSReader explicitly. We don't want adapter to inherit all of Cache's methods anyway.

Done. "embed" wasn't the right phrasing for the TODO above either.


pkg/kv/kvserver/protectedts/protectedts.go, line 41 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we have an issue tracking the GC job changes we'll need to land for this release?

Aditya filed #77156 and we've marked it a release blocker.

Contributor

@nvb nvb left a comment


Reviewed 4 of 4 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @irfansharif)

Contributor

@ajwerner ajwerner left a comment


:lgtm:

Though only gave it a cursory review.

Reviewed 3 of 4 files at r2, 15 of 15 files at r4, 14 of 21 files at r5, 4 of 4 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif)

@arulajmani
Collaborator Author

TFTRs! CI failure is a flake, merging.

bors r=adityamaru,nvanbenschoten,ajwerner

@craig
Contributor

craig bot commented Mar 1, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Mar 2, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Mar 2, 2022

Build failed:

@arulajmani
Collaborator Author

Flakes, plus didn't realize the PR description needs a release justification in addition to the commit messages. Let's try again!

bors r=adityamaru,nvanbenschoten,ajwerner

@craig
Contributor

craig bot commented Mar 2, 2022

Build succeeded:

@craig craig bot merged commit 1cc1725 into cockroachdb:master Mar 2, 2022