Skip to content

multitenant: add AdminRelocateRange capability#98483

Merged
craig[bot] merged 4 commits intocockroachdb:masterfrom
ecwall:multitenant_admin_transfer_lease
Mar 15, 2023
Merged

multitenant: add AdminRelocateRange capability#98483
craig[bot] merged 4 commits intocockroachdb:masterfrom
ecwall:multitenant_admin_transfer_lease

Conversation

@ecwall
Copy link
Copy Markdown
Contributor

@ecwall ecwall commented Mar 13, 2023

Fixes #98467

  1. Add a new tenantcapabilities.AdminRelocateRange capability.
  2. Check capability in Authorizer.HasCapabilityForBatch for
    AdminChangeReplicas, AdminRelocateRange, AdminTransferLease.
  3. Remove unused SkipSQLSystemTentantCheck.

Release note: None

@ecwall ecwall requested review from a team as code owners March 13, 2023 12:43
@ecwall ecwall requested review from arulajmani, knz, mgartner and rafiss and removed request for a team and mgartner March 13, 2023 12:43
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to update the tests reporting failing in CI.

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 7 of 7 files at r3, 7 of 7 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)


pkg/multitenant/tenantcapabilities/capabilities.go line 101 at r3 (raw file):

//
//	https://github.com/cockroachdb/cockroach/issues/96736.
var DefaultCapabilities TenantCapabilities

tip: make this a function (func DefaultCapabilities() (res TenantCapabilities) { return res } to ensure that no code can modify this global variable by accident. This is useful because Go doesn't have the const keyword.


pkg/multitenant/tenantcapabilities/tenantcapabilitiespb/capabilities.go line 20 at r3 (raw file):

func init() {
	tenantcapabilities.DefaultCapabilities = &TenantCapabilities{}

Why do you need this?


pkg/sql/opt_exec_factory.go line 2041 at r4 (raw file):

	fromStoreID tree.TypedExpr,
) (exec.Node, error) {
	if relocateSubject == tree.RelocateLease {

Do you expect we'll use separate capabilities to relocate replicas and leases? Why?

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)


pkg/multitenant/tenantcapabilities/capabilities.go line 101 at r3 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

tip: make this a function (func DefaultCapabilities() (res TenantCapabilities) { return res } to ensure that no code can modify this global variable by accident. This is useful because Go doesn't have the const keyword.

Sorry I forgot this is an interface. So this instead:

func DefaultCapabilities{} TenantCapabilities { return defaultCaps }
var defaultCaps TenantCapabilities

// RegisterDefaultCapabilities will be called from the pb package.
func RegisterDefaultCapabilities(caps TenantCapabilities) { defaultCaps = caps }

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)


pkg/multitenant/tenantcapabilities/capabilities.go line 109 at r3 (raw file):

	var builder strings.Builder
	builder.WriteByte('{')
	isFirst := true

tip to remove this isFirst thing

space := ""
builder.WriteByte('{'}
for _, capID := range ... {
   ...
   if value !=  defaultValue {
       builder.WriteString(space)
       ...
       space = " "
   }
}

Copy link
Copy Markdown
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, and @rafiss)


pkg/multitenant/tenantcapabilities/capabilities.go line 101 at r3 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Sorry I forgot this is an interface. So this instead:

func DefaultCapabilities{} TenantCapabilities { return defaultCaps }
var defaultCaps TenantCapabilities

// RegisterDefaultCapabilities will be called from the pb package.
func RegisterDefaultCapabilities(caps TenantCapabilities) { defaultCaps = caps }

Updated.


pkg/multitenant/tenantcapabilities/capabilities.go line 109 at r3 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

tip to remove this isFirst thing

space := ""
builder.WriteByte('{'}
for _, capID := range ... {
   ...
   if value !=  defaultValue {
       builder.WriteString(space)
       ...
       space = " "
   }
}

Done.


pkg/multitenant/tenantcapabilities/tenantcapabilitiespb/capabilities.go line 20 at r3 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Why do you need this?

tenantcapabilitiespb depends on tenantcapabilities so this is to prevent a dependency cycle.

I made this change ATM to reduce test churn, but this code will be revisited when I work on actually supporting default capability values in #96736.


pkg/sql/opt_exec_factory.go line 2041 at r4 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Do you expect we'll use separate capabilities to relocate replicas and leases? Why?

They have different endpoints. Isn't there a capability per endpoint or is this an exception?

@ecwall ecwall requested a review from knz March 13, 2023 13:53
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 16 files at r5, 4 of 7 files at r7, 9 of 9 files at r8, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)


pkg/sql/opt_exec_factory.go line 2041 at r4 (raw file):

Previously, ecwall (Evan Wall) wrote…

They have different endpoints. Isn't there a capability per endpoint or is this an exception?

IMHO capabilities should be per "user-facing feature". In this case, ALTER RANGE RELOCATE is just "one feature", wether you move replicas or leases.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed: it may be valuable to group the various Relocate operations under a single capability to simplify operational overhead.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)

Copy link
Copy Markdown
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, and @rafiss)


pkg/sql/opt_exec_factory.go line 2041 at r4 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

IMHO capabilities should be per "user-facing feature". In this case, ALTER RANGE RELOCATE is just "one feature", wether you move replicas or leases.

I changed all of these to use one can_admin_relocate_range capability:

RELOCATE LEASE -> AdminTransferLease
RELOCATE VOTERS -> AdminChangeReplicas
RELOCATE NONVOTERS -> AdminChangeReplicas
EXPERIMENTAL_RELOCATE LEASE -> AdminTransferLease
EXPERIMENTAL_RELOCATE VOTERS -> AdminRelocateRange
EXPERIMENTAL_RELOCATE NONVOTERS -> AdminRelocateRange

@ecwall ecwall changed the title multitenant: add AdminChangeReplicas capability multitenant: add AdminRelocateRange capability Mar 13, 2023
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r9.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 133 at r9 (raw file):

	// TODO(ecwall): The following should also be authorized via specific capabilities.
	kvpb.AdminMerge: noCapCheckNeeded,

NB: it's absolutely essential we either restrict AdminMerge to a capability or change this to onlySystemTenant. Having secondary tenants use AdminMerge directly without verification is dangerous.

(You can do this in this PR or another followup PR)

Copy link
Copy Markdown
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, and @rafiss)


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 133 at r9 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

NB: it's absolutely essential we either restrict AdminMerge to a capability or change this to onlySystemTenant. Having secondary tenants use AdminMerge directly without verification is dangerous.

(You can do this in this PR or another followup PR)

I'll do this in a follow up because I have to familiarize myself with AdminMerge first. I am not sure I have a test case for it in multitenant_admin_function_test.go.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 133 at r9 (raw file):

Previously, ecwall (Evan Wall) wrote…

I'll do this in a follow up because I have to familiarize myself with AdminMerge first. I am not sure I have a test case for it in multitenant_admin_function_test.go.

You cannot trigger it from SQL so that is why you didn't see it before. But it can be used by end-users via the SendKVBatch HTTP API.

@ecwall
Copy link
Copy Markdown
Contributor Author

ecwall commented Mar 13, 2023

bors r=knz

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 13, 2023

Build failed (retrying...):

@yuzefovich
Copy link
Copy Markdown
Member

Random drive-by comment: I see that CI failed on TestRelocateVoters on this PR and seems related, is this an old flake or a new flake being introduced by this PR?

@ecwall
Copy link
Copy Markdown
Contributor Author

ecwall commented Mar 13, 2023

bors r-

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 13, 2023

Canceled.

@ecwall
Copy link
Copy Markdown
Contributor Author

ecwall commented Mar 13, 2023

Random drive-by comment: I see that CI failed on TestRelocateVoters on this PR and seems related, is this an old flake or a new flake being introduced by this PR?

I think it may be some non-deterministic behavior that needs to be accounted for in this PR. I will update the tests to be more lenient.

Copy link
Copy Markdown
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good. Few minor comments that we should address before merging, however.

Reviewed 1 of 1 files at r1, 15 of 15 files at r15, 1 of 1 files at r16, 7 of 7 files at r17, 11 of 12 files at r18.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall, @knz, and @rafiss)


pkg/multitenant/tenantcapabilities/capabilities.go line 169 at r16 (raw file):

	// because excessive KV range scatter can overwhelm the storage
	// cluster.
	CanAdminScatter CapabilityID = iota + MinCapabilityID // can_admin_scatter

This doesn't feel idiomatic. Can we please revert this change?


pkg/multitenant/tenantcapabilities/capabilities.go line 99 at r17 (raw file):

// defaultCaps is the default state of capabilities.
// TODO(ewall): Redesign as part of
//

nit: remove this new line.


pkg/multitenant/tenantcapabilities/capabilities.go line 100 at r17 (raw file):

// TODO(ewall): Redesign as part of
//
//	https://github.com/cockroachdb/cockroach/issues/96736.

Let's create a new issue for this with a proposed redesign. This is mostly for me, and other readers, who would want to follow along with what you may have in mind when you say redesign. If you don't want to create an issue here, you could consider adding more words to the TODO with what you have in mind.


pkg/multitenant/tenantcapabilities/capabilities.go line 109 at r17 (raw file):

// longer match DefaultCapabilities. This is different from
// TenantCapabilities.String which only prints non-zero value fields.
func AlteredCapabilitiesString(capabilities TenantCapabilities) string {

Let's make this a testing only function and put it in the testutils package instead.


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 75 at r17 (raw file):

			// CanAdminSplit and CanAdminScatter have special handling here for tests
			// that rely on them defaulting to true.
			// TODO(ewall): Redesign as part of

Again, add more words about what you mean here by redesign.


pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/testdata/rangefeed_errors line 93 at r17 (raw file):

ten=11 altered-cap={}
ten=12 altered-cap={can_admin_split:true}
ten=50 altered-cap={}

Instead of printing empty capabilities, let's print something like {default}. Look at PrintSpanConfigDiffedAgainstDefaults and how it's used in datadriven tests for inspiration.


pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/watcher_test.go line 228 at r17 (raw file):

				lastUpdateTS = ts.Clock().Now()

			case "get-altered-capabilities":

Let's continue calling the directive get-capabilities. However, the fact that we're diffing against default, should be highlighted in the DSL description in the comment.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks the structure of this PR looks fine to me now. I have a suggestion to clarify the capability ID numbering, see below.

Reviewed 5 of 16 files at r14, 4 of 6 files at r29, 11 of 12 files at r30, 1 of 1 files at r31.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)


pkg/multitenant/tenantcapabilities/capabilities.go line 169 at r16 (raw file):

Previously, ecwall (Evan Wall) wrote…

Previously the hard coded 1 value being passed to StringToEnumValueMap and EnumValue represented the number of _'s at the start of this const block.

Using MinCapabilityID in both places shows this relationship whereas the previous code did not indicate one at all.

Do you have an idiomatic way of doing this?

This looks fine to me now.


pkg/multitenant/tenantcapabilities/capabilities.go line 223 at r31 (raw file):

	// MaxCapabilityID is the value of the maximum CapabilityID.
	MaxCapabilityID CapabilityID = iota + MinCapabilityID - 1

You could call this constant just NumCapabilityIDs (without an = assignment)

Then use NumCapabilityIDs-MinCapabilityID in various places below.

@ecwall ecwall requested a review from a team as a code owner March 15, 2023 01:03
@ecwall ecwall requested review from rhu713 and removed request for a team and rhu713 March 15, 2023 01:03
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Reviewed 16 of 16 files at r32, 2 of 2 files at r33, 6 of 6 files at r34, 12 of 12 files at r35, 1 of 1 files at r36, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)

Copy link
Copy Markdown
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, and @rafiss)


pkg/multitenant/tenantcapabilities/capabilities.go line 223 at r31 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

You could call this constant just NumCapabilityIDs (without an = assignment)

Then use NumCapabilityIDs-MinCapabilityID in various places below.

I think that only works if MinCapabilityID is 0. I was trying to come up with a general purpose way of having MinCapabilityID point to the lowest value and MaxCapabilityID point to the highest without requiring manual changes when capabilities are added or removed.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)


pkg/multitenant/tenantcapabilities/capabilities.go line 223 at r31 (raw file):

Previously, ecwall (Evan Wall) wrote…

I think that only works if MinCapabilityID is 0. I was trying to come up with a general purpose way of having MinCapabilityID point to the lowest value and MaxCapabilityID point to the highest without requiring manual changes when capabilities are added or removed.

With all due respect, this complexity here (and the risks associated with having people thinking about when to use + 1 vs - 1) is a lot of complexity just to enable this stringToCapabilityIDMap below. This is not even particularly "interesting" because the real solution would be to have stringer generate the mapping for us - so we wouldn't even need to implement CapabilityIDFromString ourselves.

I recommend you find a way forward that clarifies this and removes footguns, and see if one of the possible solutions would be to hand-code the full map for now and also make a plan to customize stringer in the future.

Copy link
Copy Markdown
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ecwall I checked out your PR, and seems like this is the source of your dependency cycle:

func (u Entry) String() string {
return fmt.Sprintf("ten=%v cap=%v", u.TenantID, TestingAlteredCapabilitiesString(u.TenantCapabilities))
}

This feels off -- this is production code, but we're calling into a testing only function to format an Entry. I don't see why a commit that's trying to reduce test churn is also changing how the struct is printed elsewhere, such as in logs.

If you wanted to get rid of this dependency, you could write a Print function for the Entry struct in tenantcapabilitiestestutils that formats things using this diffing method like you want just for tests.

All this being said, if @knz is okay with the way this code looks, I'll step aside.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)

@arulajmani arulajmani dismissed their stale review March 15, 2023 13:43

see comment above.

Copy link
Copy Markdown
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this function to tenantcapabilitiestestutils.AlteredCapabilitiesString and modified the test code to customize the output instead of relying on Update.String(), Entry.String().

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, and @rafiss)


pkg/multitenant/tenantcapabilities/capabilities.go line 223 at r31 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

With all due respect, this complexity here (and the risks associated with having people thinking about when to use + 1 vs - 1) is a lot of complexity just to enable this stringToCapabilityIDMap below. This is not even particularly "interesting" because the real solution would be to have stringer generate the mapping for us - so we wouldn't even need to implement CapabilityIDFromString ourselves.

I recommend you find a way forward that clarifies this and removes footguns, and see if one of the possible solutions would be to hand-code the full map for now and also make a plan to customize stringer in the future.

Talked about this offline - I will revert this for now and look at using a code generator to make the map and slice instead of using stringer + these min/max constants.

Though stringer already exists, it seems to be the wrong tool to use here if this calculation of min/max is too convoluted.

@ecwall ecwall requested a review from arulajmani March 15, 2023 15:24
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 14 files at r41, 2 of 12 files at r42, 1 of 1 files at r43.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)

@arulajmani
Copy link
Copy Markdown
Collaborator

I moved this function to tenantcapabilitiestestutils.AlteredCapabilitiesString and modified the test code to customize the output instead of relying on Update.String(), Entry.String().

Thanks for making the change.

ecwall added 4 commits March 15, 2023 12:23
Exclude capabilities from `tenant_builtins` logic test. These tests do not set
or depend on capabilities and they are tested elsewhere.

Release note: None
Change `watcher_test` to only output capabilities that were altered for the
test.

Release note: None
Fixes #98467

1) Add a new `tenantcapabilities.AdminRelocateRange` capability.
2) Check capability in `Authorizer.HasCapabilityForBatch` for
   `AdminChangeReplicas`, `AdminRelocateRange`, `AdminTransferLease`.
3) Remove unused `SkipSQLSystemTentantCheck`.

Release note: None
Ignore ranges that can change between test runs in result verification.

Release note: None
@ecwall
Copy link
Copy Markdown
Contributor Author

ecwall commented Mar 15, 2023

bors r=knz

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 15, 2023

Build failed (retrying...):

@knz
Copy link
Copy Markdown
Contributor

knz commented Mar 15, 2023

the function TestExperimentalRelocateNonVoters flaked in CI. If we're not 100% sure about it, i'd rather have it skipped before it causes CI failures for others.

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 15, 2023

Build succeeded:

@craig craig bot merged commit 0f743af into cockroachdb:master Mar 15, 2023
craig bot pushed a commit that referenced this pull request Mar 17, 2023
98617: sql: fix data race in MemberOfWithAdminOption r=rafiss a=andyyang890

This patch fixes a race condition in MemberOfWithAdminOption
by using a fresh transaction within the singleflight call.

Fixes #95642
Fixes #96539

Release note: None

98801: multitenant: unskip multitenant_admin_function_test r=knz a=ecwall

Fixes #95201

This test was flaking because it was verifying exact range start keys
which are non-deterministic.

This was fixed as part of #98483 which allows any value for the range start key
and can now be unskipped.

Release note: None

Co-authored-by: Andy Yang <yang@cockroachlabs.com>
Co-authored-by: Evan Wall <wall@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

multitenant: add AdminRelocateRange capability

5 participants