multitenant: add AdminRelocateRange capability#98483
multitenant: add AdminRelocateRange capability#98483craig[bot] merged 4 commits intocockroachdb:masterfrom ecwall:multitenant_admin_transfer_lease
Conversation
knz
left a comment
There was a problem hiding this comment.
Don't forget to update the tests reporting failing in CI.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 7 of 7 files at r3, 7 of 7 files at r4, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)
pkg/multitenant/tenantcapabilities/capabilities.go line 101 at r3 (raw file):
// // https://github.com/cockroachdb/cockroach/issues/96736. var DefaultCapabilities TenantCapabilities
tip: make this a function (func DefaultCapabilities() (res TenantCapabilities) { return res } to ensure that no code can modify this global variable by accident. This is useful because Go doesn't have the const keyword.
pkg/multitenant/tenantcapabilities/tenantcapabilitiespb/capabilities.go line 20 at r3 (raw file):
func init() { tenantcapabilities.DefaultCapabilities = &TenantCapabilities{}
Why do you need this?
pkg/sql/opt_exec_factory.go line 2041 at r4 (raw file):
fromStoreID tree.TypedExpr, ) (exec.Node, error) { if relocateSubject == tree.RelocateLease {
Do you expect we'll use separate capabilities to relocate replicas and leases? Why?
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)
pkg/multitenant/tenantcapabilities/capabilities.go line 101 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
tip: make this a function (
func DefaultCapabilities() (res TenantCapabilities) { return res }to ensure that no code can modify this global variable by accident. This is useful because Go doesn't have theconstkeyword.
Sorry I forgot this is an interface. So this instead:
func DefaultCapabilities{} TenantCapabilities { return defaultCaps }
var defaultCaps TenantCapabilities
// RegisterDefaultCapabilities will be called from the pb package.
func RegisterDefaultCapabilities(caps TenantCapabilities) { defaultCaps = caps }
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)
pkg/multitenant/tenantcapabilities/capabilities.go line 109 at r3 (raw file):
var builder strings.Builder builder.WriteByte('{') isFirst := true
tip to remove this isFirst thing
space := ""
builder.WriteByte('{'}
for _, capID := range ... {
...
if value != defaultValue {
builder.WriteString(space)
...
space = " "
}
}
ecwall
left a comment
There was a problem hiding this comment.
Updated.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, and @rafiss)
pkg/multitenant/tenantcapabilities/capabilities.go line 101 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Sorry I forgot this is an interface. So this instead:
func DefaultCapabilities{} TenantCapabilities { return defaultCaps } var defaultCaps TenantCapabilities // RegisterDefaultCapabilities will be called from the pb package. func RegisterDefaultCapabilities(caps TenantCapabilities) { defaultCaps = caps }
Updated.
pkg/multitenant/tenantcapabilities/capabilities.go line 109 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
tip to remove this
isFirstthingspace := "" builder.WriteByte('{'} for _, capID := range ... { ... if value != defaultValue { builder.WriteString(space) ... space = " " } }
Done.
pkg/multitenant/tenantcapabilities/tenantcapabilitiespb/capabilities.go line 20 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Why do you need this?
tenantcapabilitiespb depends on tenantcapabilities so this is to prevent a dependency cycle.
I made this change ATM to reduce test churn, but this code will be revisited when I work on actually supporting default capability values in #96736.
pkg/sql/opt_exec_factory.go line 2041 at r4 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Do you expect we'll use separate capabilities to relocate replicas and leases? Why?
They have different endpoints. Isn't there a capability per endpoint or is this an exception?
knz
left a comment
There was a problem hiding this comment.
Reviewed 5 of 16 files at r5, 4 of 7 files at r7, 9 of 9 files at r8, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)
pkg/sql/opt_exec_factory.go line 2041 at r4 (raw file):
Previously, ecwall (Evan Wall) wrote…
They have different endpoints. Isn't there a capability per endpoint or is this an exception?
IMHO capabilities should be per "user-facing feature". In this case, ALTER RANGE RELOCATE is just "one feature", wether you move replicas or leases.
knz
left a comment
There was a problem hiding this comment.
As discussed: it may be valuable to group the various Relocate operations under a single capability to simplify operational overhead.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)
ecwall
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, and @rafiss)
pkg/sql/opt_exec_factory.go line 2041 at r4 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
IMHO capabilities should be per "user-facing feature". In this case, ALTER RANGE RELOCATE is just "one feature", wether you move replicas or leases.
I changed all of these to use one can_admin_relocate_range capability:
RELOCATE LEASE -> AdminTransferLease
RELOCATE VOTERS -> AdminChangeReplicas
RELOCATE NONVOTERS -> AdminChangeReplicas
EXPERIMENTAL_RELOCATE LEASE -> AdminTransferLease
EXPERIMENTAL_RELOCATE VOTERS -> AdminRelocateRange
EXPERIMENTAL_RELOCATE NONVOTERS -> AdminRelocateRange
knz
left a comment
There was a problem hiding this comment.
Reviewed 10 of 10 files at r9.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 133 at r9 (raw file):
// TODO(ecwall): The following should also be authorized via specific capabilities. kvpb.AdminMerge: noCapCheckNeeded,
NB: it's absolutely essential we either restrict AdminMerge to a capability or change this to onlySystemTenant. Having secondary tenants use AdminMerge directly without verification is dangerous.
(You can do this in this PR or another followup PR)
ecwall
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, and @rafiss)
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 133 at r9 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
NB: it's absolutely essential we either restrict AdminMerge to a capability or change this to
onlySystemTenant. Having secondary tenants use AdminMerge directly without verification is dangerous.(You can do this in this PR or another followup PR)
I'll do this in a follow up because I have to familiarize myself with AdminMerge first. I am not sure I have a test case for it in multitenant_admin_function_test.go.
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 133 at r9 (raw file):
Previously, ecwall (Evan Wall) wrote…
I'll do this in a follow up because I have to familiarize myself with
AdminMergefirst. I am not sure I have a test case for it inmultitenant_admin_function_test.go.
You cannot trigger it from SQL so that is why you didn't see it before. But it can be used by end-users via the SendKVBatch HTTP API.
|
bors r=knz |
|
Build failed (retrying...): |
|
Random drive-by comment: I see that CI failed on |
|
bors r- |
|
Canceled. |
I think it may be some non-deterministic behavior that needs to be accounted for in this PR. I will update the tests to be more lenient. |
arulajmani
left a comment
There was a problem hiding this comment.
Mostly looks good. Few minor comments that we should address before merging, however.
Reviewed 1 of 1 files at r1, 15 of 15 files at r15, 1 of 1 files at r16, 7 of 7 files at r17, 11 of 12 files at r18.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ecwall, @knz, and @rafiss)
pkg/multitenant/tenantcapabilities/capabilities.go line 169 at r16 (raw file):
// because excessive KV range scatter can overwhelm the storage // cluster. CanAdminScatter CapabilityID = iota + MinCapabilityID // can_admin_scatter
This doesn't feel idiomatic. Can we please revert this change?
pkg/multitenant/tenantcapabilities/capabilities.go line 99 at r17 (raw file):
// defaultCaps is the default state of capabilities. // TODO(ewall): Redesign as part of //
nit: remove this new line.
pkg/multitenant/tenantcapabilities/capabilities.go line 100 at r17 (raw file):
// TODO(ewall): Redesign as part of // // https://github.com/cockroachdb/cockroach/issues/96736.
Let's create a new issue for this with a proposed redesign. This is mostly for me, and other readers, who would want to follow along with what you may have in mind when you say redesign. If you don't want to create an issue here, you could consider adding more words to the TODO with what you have in mind.
pkg/multitenant/tenantcapabilities/capabilities.go line 109 at r17 (raw file):
// longer match DefaultCapabilities. This is different from // TenantCapabilities.String which only prints non-zero value fields. func AlteredCapabilitiesString(capabilities TenantCapabilities) string {
Let's make this a testing only function and put it in the testutils package instead.
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 75 at r17 (raw file):
// CanAdminSplit and CanAdminScatter have special handling here for tests // that rely on them defaulting to true. // TODO(ewall): Redesign as part of
Again, add more words about what you mean here by redesign.
pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/testdata/rangefeed_errors line 93 at r17 (raw file):
ten=11 altered-cap={} ten=12 altered-cap={can_admin_split:true} ten=50 altered-cap={}
Instead of printing empty capabilities, let's print something like {default}. Look at PrintSpanConfigDiffedAgainstDefaults and how it's used in datadriven tests for inspiration.
pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/watcher_test.go line 228 at r17 (raw file):
lastUpdateTS = ts.Clock().Now() case "get-altered-capabilities":
Let's continue calling the directive get-capabilities. However, the fact that we're diffing against default, should be highlighted in the DSL description in the comment.
knz
left a comment
There was a problem hiding this comment.
Thanks the structure of this PR looks fine to me now. I have a suggestion to clarify the capability ID numbering, see below.
Reviewed 5 of 16 files at r14, 4 of 6 files at r29, 11 of 12 files at r30, 1 of 1 files at r31.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)
pkg/multitenant/tenantcapabilities/capabilities.go line 169 at r16 (raw file):
Previously, ecwall (Evan Wall) wrote…
Previously the hard coded
1value being passed toStringToEnumValueMapandEnumValuerepresented the number of_'s at the start of this const block.Using
MinCapabilityIDin both places shows this relationship whereas the previous code did not indicate one at all.Do you have an idiomatic way of doing this?
This looks fine to me now.
pkg/multitenant/tenantcapabilities/capabilities.go line 223 at r31 (raw file):
// MaxCapabilityID is the value of the maximum CapabilityID. MaxCapabilityID CapabilityID = iota + MinCapabilityID - 1
You could call this constant just NumCapabilityIDs (without an = assignment)
Then use NumCapabilityIDs-MinCapabilityID in various places below.
knz
left a comment
There was a problem hiding this comment.
Thanks
Reviewed 16 of 16 files at r32, 2 of 2 files at r33, 6 of 6 files at r34, 12 of 12 files at r35, 1 of 1 files at r36, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)
ecwall
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, and @rafiss)
pkg/multitenant/tenantcapabilities/capabilities.go line 223 at r31 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
You could call this constant just
NumCapabilityIDs(without an=assignment)Then use
NumCapabilityIDs-MinCapabilityIDin various places below.
I think that only works if MinCapabilityID is 0. I was trying to come up with a general purpose way of having MinCapabilityID point to the lowest value and MaxCapabilityID point to the highest without requiring manual changes when capabilities are added or removed.
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)
pkg/multitenant/tenantcapabilities/capabilities.go line 223 at r31 (raw file):
Previously, ecwall (Evan Wall) wrote…
I think that only works if
MinCapabilityIDis 0. I was trying to come up with a general purpose way of havingMinCapabilityIDpoint to the lowest value andMaxCapabilityIDpoint to the highest without requiring manual changes when capabilities are added or removed.
With all due respect, this complexity here (and the risks associated with having people thinking about when to use + 1 vs - 1) is a lot of complexity just to enable this stringToCapabilityIDMap below. This is not even particularly "interesting" because the real solution would be to have stringer generate the mapping for us - so we wouldn't even need to implement CapabilityIDFromString ourselves.
I recommend you find a way forward that clarifies this and removes footguns, and see if one of the possible solutions would be to hand-code the full map for now and also make a plan to customize stringer in the future.
arulajmani
left a comment
There was a problem hiding this comment.
@ecwall I checked out your PR, and seems like this is the source of your dependency cycle:
cockroach/pkg/multitenant/tenantcapabilities/capabilities.go
Lines 131 to 133 in de3a5e5
This feels off -- this is production code, but we're calling into a testing only function to format an Entry. I don't see why a commit that's trying to reduce test churn is also changing how the struct is printed elsewhere, such as in logs.
If you wanted to get rid of this dependency, you could write a Print function for the Entry struct in tenantcapabilitiestestutils that formats things using this diffing method like you want just for tests.
All this being said, if @knz is okay with the way this code looks, I'll step aside.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
ecwall
left a comment
There was a problem hiding this comment.
I moved this function to tenantcapabilitiestestutils.AlteredCapabilitiesString and modified the test code to customize the output instead of relying on Update.String(), Entry.String().
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, and @rafiss)
pkg/multitenant/tenantcapabilities/capabilities.go line 223 at r31 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
With all due respect, this complexity here (and the risks associated with having people thinking about when to use
+ 1vs- 1) is a lot of complexity just to enable thisstringToCapabilityIDMapbelow. This is not even particularly "interesting" because the real solution would be to havestringergenerate the mapping for us - so we wouldn't even need to implementCapabilityIDFromStringourselves.I recommend you find a way forward that clarifies this and removes footguns, and see if one of the possible solutions would be to hand-code the full map for now and also make a plan to customize
stringerin the future.
Talked about this offline - I will revert this for now and look at using a code generator to make the map and slice instead of using stringer + these min/max constants.
Though stringer already exists, it seems to be the wrong tool to use here if this calculation of min/max is too convoluted.
knz
left a comment
There was a problem hiding this comment.
Reviewed 2 of 14 files at r41, 2 of 12 files at r42, 1 of 1 files at r43.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)
Thanks for making the change. |
Exclude capabilities from `tenant_builtins` logic test. These tests do not set or depend on capabilities and they are tested elsewhere. Release note: None
Change `watcher_test` to only output capabilities that were altered for the test. Release note: None
Fixes #98467 1) Add a new `tenantcapabilities.AdminRelocateRange` capability. 2) Check capability in `Authorizer.HasCapabilityForBatch` for `AdminChangeReplicas`, `AdminRelocateRange`, `AdminTransferLease`. 3) Remove unused `SkipSQLSystemTentantCheck`. Release note: None
Ignore ranges that can change between test runs in result verification. Release note: None
|
bors r=knz |
|
Build failed (retrying...): |
|
the function TestExperimentalRelocateNonVoters flaked in CI. If we're not 100% sure about it, i'd rather have it skipped before it causes CI failures for others. |
|
Build succeeded: |
98617: sql: fix data race in MemberOfWithAdminOption r=rafiss a=andyyang890 This patch fixes a race condition in MemberOfWithAdminOption by using a fresh transaction within the singleflight call. Fixes #95642 Fixes #96539 Release note: None 98801: multitenant: unskip multitenant_admin_function_test r=knz a=ecwall Fixes #95201 This test was flaking because it was verifying exact range start keys which are non-deterministic. This was fixed as part of #98483 which allows any value for the range start key and can now be unskipped. Release note: None Co-authored-by: Andy Yang <yang@cockroachlabs.com> Co-authored-by: Evan Wall <wall@cockroachlabs.com>
Fixes #98467
tenantcapabilities.AdminRelocateRangecapability.Authorizer.HasCapabilityForBatchforAdminChangeReplicas,AdminRelocateRange,AdminTransferLease.SkipSQLSystemTentantCheck.Release note: None