spanconfig: introduce the ProtectedTSReader interface #75285
craig[bot] merged 2 commits into cockroachdb:master
Conversation
Force-pushed 4f79636 → 7afc163 → 087f3a0 → e556a7d.
arulajmani
left a comment
This had been on the back burner with all the other PTS-related span config changes. Now that all those are in, I've dusted this off. I also changed the PTSReader to return a list of protection timestamps instead of encapsulating the "validity" logic, based on our conversations in the pod.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @irfansharif, and @nvanbenschoten)
Force-pushed e556a7d → c58cdaa.
pkg/server/server.go
Outdated
    ClosedTimestampReceiver: ctReceiver,
    ExternalStorage:         externalStorage,
    ExternalStorageFromURI:  externalStorageFromURI,
    ProtectedTimestampCache: protectedtsProvider,
can we delete ProtectedTimestampCache?
    Reconciler: reconciler,
    Struct:     reconciler.Metrics(),
    Storage:    storage,
    Cache:      cache,
can we delete the Cache field?
@@ -40,6 +41,7 @@ type Provider interface {
    Cache
can we delete Cache from here too?
    protectionTimestamps, ts.readAt = r.store.protectedtsReader.GetProtectionTimestamps(ctx, sp)
    earliestTS := hlc.Timestamp{}
    for _, protectionTimestamp := range protectionTimestamps {
        if protectionTimestamp.Less(earliestTS) && gcThreshold.LessEq(protectionTimestamp) {
I believe this condition is not required/incorrect?
            continue
        }
        if earliestTS.IsEmpty() || protectionTimestamp.Less(earliestTS) {
            earliestTS = protectionTimestamp
can we just assign to ts.earliestProtectionTimestamp and get rid of earliestTS?
Force-pushed c58cdaa → 1ae4de8.
arulajmani
left a comment
Dismissed @adityamaru from 5 discussions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @irfansharif, and @nvanbenschoten)
pkg/kv/kvserver/replica_protected_timestamp.go, line 81 at r3 (raw file):
Previously, adityamaru (Aditya Maru) wrote…
I believe this condition is not required/incorrect?
Yikes, not sure how that ended up here ... thanks for pointing it out!
pkg/kv/kvserver/replica_protected_timestamp.go, line 90 at r3 (raw file):
Previously, adityamaru (Aditya Maru) wrote…
can we just assign to ts.earliestProtectionTimestamp and get rid of earliestTS?
To confirm, you're still saying we should clear out the old ts.earliestProtectionTimestamp field here by setting it to an empty timestamp above, right?
Looking at the diff, the old code didn't do that -- it should have, right?
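For reference, here is a self-contained sketch of what the loop computes once the spurious condition is dropped, as the review suggests. The `Timestamp` type below is a simplified stand-in for `hlc.Timestamp` (only a wall-clock component), not the real cockroachdb API, and the assumption is that the fix is simply to take the minimum of all protection timestamps:

```go
package main

import "fmt"

// Timestamp is a simplified stand-in for hlc.Timestamp; only a wall-clock
// component is modeled here.
type Timestamp struct {
	WallTime int64
}

func (t Timestamp) IsEmpty() bool         { return t.WallTime == 0 }
func (t Timestamp) Less(o Timestamp) bool { return t.WallTime < o.WallTime }

// earliestProtectionTimestamp returns the earliest of the given protection
// timestamps, or the empty timestamp if there are none. This mirrors the
// loop under review with the extra condition removed.
func earliestProtectionTimestamp(protectionTimestamps []Timestamp) Timestamp {
	earliestTS := Timestamp{}
	for _, protectionTimestamp := range protectionTimestamps {
		if earliestTS.IsEmpty() || protectionTimestamp.Less(earliestTS) {
			earliestTS = protectionTimestamp
		}
	}
	return earliestTS
}

func main() {
	pts := []Timestamp{{WallTime: 30}, {WallTime: 5}, {WallTime: 20}}
	fmt.Println(earliestProtectionTimestamp(pts).WallTime) // prints 5
}
```

The `IsEmpty()` guard matters because the zero timestamp would otherwise compare as "less than" every real protection timestamp and the loop would never pick one up.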
pkg/kv/kvserver/protectedts/ptprovider/provider.go, line 68 at r3 (raw file):
Previously, adityamaru (Aditya Maru) wrote…
can we delete the Cache field?
See above.
pkg/server/server.go, line 530 at r3 (raw file):
Previously, adityamaru (Aditya Maru) wrote…
can we delete ProtectedTimestampCache?
Done.
pkg/kv/kvserver/protectedts/protectedts.go, line 41 at r3 (raw file):
Previously, adityamaru (Aditya Maru) wrote…
can we delete Cache from here too?
We talked about this offline, but capturing the conversation here as well -- I tried doing so, but then I realized that the Cache is used by the GCJob as well. Thinking about this a bit more, adding the `spanconfig.ProtectedTSReader` on the provider wasn't correct -- the `Provider` is a SQL concept, whereas we expect the `spanconfig.ProtectedTSReader` to be a per-store thing going forward. To that end, instead of providing the `spanconfig.ProtectedTSReader` to KV via the `Provider`, I think it makes more sense to instantiate one using the system tenant's `ptcache.Cache` (old subsystem) and `KVSubscriber` (going forward). Made the change in server.go, PTAL.
The GC job realization above does lead to something we hadn't considered yet -- secondary tenants need to consult their protected TS information in their GC job. We can't re-use the `protectedts.Cache` as-is there, because the Cache works on spans (which secondary tenants never set, as we've moved to protecting schema objects). We'll need to consult something similar to the `spanconfig.ProtectedTSStateReader`, which we use in the SQLTranslator, instead.
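To make the per-store adapter idea above concrete, here is a minimal sketch of an adapter that consults both sources behind the single-method interface. All types here (`Span`, `Timestamp`, `fixedReader`) are simplified stand-ins for `roachpb.Span`, `hlc.Timestamp`, and the real `ptcache.Cache`/`KVSubscriber` implementations, and the union-plus-stalest-readAt merge policy is an assumption, not the committed code:

```go
package main

import "fmt"

// Simplified stand-ins for roachpb.Span and hlc.Timestamp.
type Span struct{ Key, EndKey string }
type Timestamp struct{ WallTime int64 }

// ProtectedTSReader mirrors the single-method interface this PR introduces,
// using the simplified types above.
type ProtectedTSReader interface {
	GetProtectionTimestamps(sp Span) (protections []Timestamp, readAt Timestamp)
}

// fixedReader is a toy implementation that returns canned state.
type fixedReader struct {
	protections []Timestamp
	readAt      Timestamp
}

func (f fixedReader) GetProtectionTimestamps(Span) ([]Timestamp, Timestamp) {
	return f.protections, f.readAt
}

// adapter consults both the old subsystem (the ptcache.Cache role) and the
// new one (the KVSubscriber role), returning the union of protections and
// the staler of the two readAt timestamps so callers act on the more
// conservative view.
type adapter struct {
	cache      ProtectedTSReader // v1: system-tenant span protections
	subscriber ProtectedTSReader // v2: protections shipped on span configs
}

func (a adapter) GetProtectionTimestamps(sp Span) ([]Timestamp, Timestamp) {
	cacheProtections, cacheReadAt := a.cache.GetProtectionTimestamps(sp)
	subProtections, subReadAt := a.subscriber.GetProtectionTimestamps(sp)
	readAt := cacheReadAt
	if subReadAt.WallTime < readAt.WallTime {
		readAt = subReadAt // report the staler of the two freshness timestamps
	}
	return append(cacheProtections, subProtections...), readAt
}

func main() {
	a := adapter{
		cache:      fixedReader{[]Timestamp{{WallTime: 10}}, Timestamp{WallTime: 100}},
		subscriber: fixedReader{[]Timestamp{{WallTime: 20}}, Timestamp{WallTime: 90}},
	}
	protections, readAt := a.GetProtectionTimestamps(Span{Key: "a", EndKey: "z"})
	fmt.Println(len(protections), readAt.WallTime) // prints: 2 90
}
```

Because the adapter itself satisfies the interface, the store only ever holds a `ProtectedTSReader` and doesn't care which subsystem the protections came from.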
As part of re-working the PTS subsystem to work with secondary tenants
we are dropping verification of protection records. None of the
existing users of `Verification` (currently only backup) would fail
non-destructively without it; this allows us to circumvent the
complexity involved in making `Verification` work in the new subsystem.
This patch removes code that was previously used to serve the
`AdminVerifyProtectedTimestamp` request; we instead vacuously
return true. This is to account for the {21.2, 22.1} mixed version case
where 22.1 binary nodes still need to be able to serve this request.
This can happen if the request is initiated on a 21.2 node and the
leaseholder of the range it is trying to verify is on a 22.1 node. By
always returning true we ensure the (backup) job upstream doesn't fail.
This is okay even if the PTS record being verified does not apply as
the failure mode is non-destructive (which is also why we're okay
removing the call to verification in 22.1).
This patch is required in preparation for the `ProtectedTSReader`.
Release note: None
Release justification: low risk, high benefit changes to existing
functionality.
Force-pushed 873a101 → 17d97f4.
Filed #77156 as a release blocker.
adityamaru
left a comment
Nice! LGTM but I'll defer to KV for a final stamp.
    // Verify that it indeed did fail.
    verifyErr := ptv.Verify(ctx, failedRec.ID.GetUUID())
    require.Regexp(t, "failed to verify protection", verifyErr)
    // replica can read it from. We then verify (below) that the failed record
nit: comment seems truncated at the start
nvb
left a comment
@ajwerner might also want to take a look at this, as he was the original author of the protected timestamp system.
Reviewed 3 of 4 files at r2, 15 of 15 files at r4, 21 of 21 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @irfansharif)
pkg/spanconfig/spanconfig.go, line 406 at r5 (raw file):
type ProtectedTSReader interface {
    // GetProtectionTimestamps returns all protected timestamps that apply to the
    // any part of the given key span. The time at which this protection timestamp
"to the any"
pkg/spanconfig/spanconfigptsreader/adapter.go, line 27 at r5 (raw file):
// V1 of the protected timestamp subsystem only allowed protections to be set
// over spans belonging to the system tenant. These protections were stored in
// memory by ptcache.Cache. V2 of the subsystem allows protections to be set
Is the "in memory" qualifier here meaningful? These protected timestamps are still backed by a system table, which is no different than with V2, so what are you trying to indicate?
Would we accomplish the same thing by saying "These protections were cached on each node in the cluster by ptcache.Cache."?
pkg/spanconfig/spanconfigptsreader/adapter.go, line 38 at r5 (raw file):
// subsystem, and we'd be able to get rid of this interface.
//
// TODO(arul): Embed the KVSubscriber here as well and actually encapsulate PTS
Since we plan to follow this PR up with this TODO, let's not embed the Cache in adapter. Instead, let's name the field and implement ProtectedTSReader explicitly. We don't want adapter to inherit all of Cache's methods anyway.
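The distinction being requested comes down to Go's method promotion rules for embedded fields. A small illustration, with toy `cache`/`Reader` types standing in for the real `ptcache.Cache` and `spanconfig.ProtectedTSReader`:

```go
package main

import "fmt"

type Reader interface {
	Get() int
}

// cache has the method Reader needs plus one we'd rather not expose.
type cache struct{}

func (cache) Get() int { return 42 }
func (cache) Purge()   {}

// embedded promotes ALL of cache's methods, so embedded{}.Purge() compiles
// and the wrapper leaks cache's full method set.
type embedded struct {
	cache
}

// named keeps cache as a private field and forwards only Get, so the wrapper
// satisfies Reader without inheriting the rest of cache's methods.
type named struct {
	c cache
}

func (n named) Get() int { return n.c.Get() }

func main() {
	var r Reader = named{}
	fmt.Println(r.Get()) // prints 42
	embedded{}.Purge()   // legal only because embedding promoted Purge
}
```

With the named-field form, adding a method to `cache` later cannot accidentally widen the adapter's exported surface, which is the point of the review comment.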
pkg/kv/kvserver/client_protectedts_test.go, line 230 at r4 (raw file):
require.NoError(t, err)
// replica can read it from. We then verify (below) that the failed record
Did this comment get cut off?
pkg/kv/kvserver/client_protectedts_test.go, line 274 at r4 (raw file):
}

func verifyProtectionTimestampExistsOnSpans(
Let's give this a comment.
pkg/kv/kvserver/protectedts/protectedts.go, line 41 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
We talked about this offline, but capturing the conversation here as well -- I tried doing so, but then I realized that the Cache is used by the GCJob as well. Thinking about this a bit more, adding the `spanconfig.ProtectedTSReader` on the provider wasn't correct -- the `Provider` is a SQL concept, whereas we expect the `spanconfig.ProtectedTSReader` to be a per-store thing going forward. To that end, instead of providing the `spanconfig.ProtectedTSReader` to KV via the `Provider`, I think it makes more sense to instantiate one using the system tenant's `ptcache.Cache` (old subsystem) and `KVSubscriber` (going forward). Made the change in server.go, PTAL.
The GC job realization above does lead to something we hadn't considered yet -- secondary tenants need to consult their protected TS information in their GC job. We can't re-use the `protectedts.Cache` as-is there, because the Cache works on spans (which secondary tenants never set, as we've moved to protecting schema objects). We'll need to consult something similar to the `spanconfig.ProtectedTSStateReader`, which we use in the SQLTranslator, instead.
Do we have an issue tracking the GC job changes we'll need to land for this release?
This patch introduces the `spanconfig.ProtectedTSReader` interface. It is intended to replace the `protectedts.Cache` as KV's integration point for querying protected TS metadata in v2 of the protected TS subsystem. For now, the `protectedts.Cache` is the only implementor of it. We will have the `KVSubscriber` implement it as well once we start shipping protected timestamp information on span configurations.

The `ProtectedTSReader` interface is also intended to serve as an adapter interface between v1 and v2 of the PTS subsystem. In particular, for v21.2, we will consult both the `protectedts.Cache` and `KVSubscriber` for PTS information. The logic here will be gated behind a call to `GetProtectionTimestamps`, which is the only method this interface provides.

Release note: None

Release justification: low risk, high benefit changes to existing functionality.
arulajmani
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @irfansharif, and @nvanbenschoten)
pkg/spanconfig/spanconfigptsreader/adapter.go, line 27 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is the "in memory" qualifier here meaningful? These protected timestamps are still backed by a system table, which is no different than with V2, so what are you trying to indicate?
Would we accomplish the same thing by saying "These protections were cached on each node in the cluster by ptcache.Cache."?
Done.
pkg/spanconfig/spanconfigptsreader/adapter.go, line 38 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Since we plan to follow this PR up with this TODO, let's not embed the `Cache` in `adapter`. Instead, let's name the field and implement `ProtectedTSReader` explicitly. We don't want `adapter` to inherit all of `Cache`'s methods anyway.
Done. "embed" wasn't the right phrasing for the TODO above either.
pkg/kv/kvserver/protectedts/protectedts.go, line 41 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we have an issue tracking the GC job changes we'll need to land for this release?
Aditya filed #77156 and we've marked it a release blocker.
nvb
left a comment
Reviewed 4 of 4 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @irfansharif)
ajwerner
left a comment
Though I only gave it a cursory review.
Reviewed 3 of 4 files at r2, 15 of 15 files at r4, 14 of 21 files at r5, 4 of 4 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif)
TFTRs! CI failure is a flake, merging.

bors r=adityamaru,nvanbenschoten,ajwerner

Build failed (retrying...):

Build failed (retrying...):

Build failed:

Flakes, plus I didn't realize the PR description needs a release justification in addition to the commit messages. Let's try again!

bors r=adityamaru,nvanbenschoten,ajwerner

Build succeeded:
See individual commits for details.
Release justification: low risk, high benefit changes to existing
functionality.