rfcs: new RFC on tenant capabilities#85954
Conversation
irfansharif
left a comment
Should the ability to set certain zone config fields vs. not for a tenant's own data also be expressible as a "tenant capability"? (I'm thinking of https://cockroachlabs.atlassian.net/browse/CRDB-16706.)
authoritative for the cluster-wide system tables*.

We will achieve this by introducing a new key in the system config span, `SystemTenantID`. Its initial value will be 1.
How do we control which tenant is allowed to write to this key? Can there only be one?
knz
left a comment
Should the ability to set certain zone config fields vs. not for a tenant's own data also be expressible as a "tenant capability"?
Yes, that might be the right move. Can you remind us why we put this restriction in place to start with?
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @dt, @irfansharif, and @stevendanna)
docs/RFCS/20220810_tenant_capabilities.md line 315 at r1 (raw file):
How do we control which tenant is allowed to write to this key?
A capability :)
Clarified in text.
We disabled secondary tenant zone configs in 22.1 because we're using 22.1 binaries in serverless, where tenants are relatively low-trust compared to other fully managed deployments. You can imagine a tenant declaring gc.ttlseconds to be arbitrarily high and accruing a ton of garbage, or setting a replication factor proportional to the number of nodes in the underlying cluster, all without getting billed for it. #77344 is relevant; there are many modes through which a low-trust tenant could induce KV to do a lot of work (that issue outlines a few), so we defaulted to not allowing access to any of it. It's also not clear if serverless users will ever have access to the full gamut of zone configs (as opposed to some "fixed menu" of things). Nor is it clear what things we'd like to allow in managed environments (part of why I'm asking whether they can be thought of as "capabilities" and entrusted to each cluster's operator).
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @dt, @knz, and @stevendanna)
docs/RFCS/20220810_tenant_capabilities.md line 315 at r1 (raw file):
Previously, knz (kena) wrote…
How do we control which tenant is allowed to write to this key?
A capability :)
Clarified in text.
Could you clarify if there can only be one possible writer to this state at any given point in time? This section of the proposal gives me pause. Not around who gets to write to this key and de facto be responsible for "hosting" cluster state, but what exactly it means for "state to be transferred across tenants". Take `system.span_configurations` for example: if another system tenant (B) were to assume ownership of this state from a previous system tenant (A), and KV points to B's `system.span_configurations`, how does the state get transferred from what A had? The level of indirection to point KV to either A's or B's state for this table is only one part of the puzzle; for such transitions each system needs to be aware of the transition to reason about the invariants they're looking to provide. For span configs specifically, if all the SQL pods observe a change in state they previously knew of (they ask KV for the contents of `system.span_configurations`) due to an ownership transition (A -> B), things will fail -- the SQL pods continually emit "diffs" over recent changes from a view of KV state they last knew of, and SQL state they're trying to move towards (look at `spanconfig.Reconciler`).
Introducing this level of indirection, and possible change in state (unless we guarantee that it doesn't happen -- are we proposing mechanisms for this?), forces subsystems to reason about this. I'm not sure if other systems are affected in this same way.
knz
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @dt, @irfansharif, @knz, and @stevendanna)
docs/RFCS/20220810_tenant_capabilities.md line 315 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Could you clarify if there can only be one possible writer to this state at any given point in time? This section of the proposal gives me pause. Not around who gets to write to this key and de facto be responsible for "hosting" cluster state, but what exactly it means for "state to be transferred across tenants". Take `system.span_configurations` for example: if another system tenant (B) were to assume ownership of this state from a previous system tenant (A), and KV points to B's `system.span_configurations`, how does the state get transferred from what A had? The level of indirection to point KV to either A's or B's state for this table is only one part of the puzzle; for such transitions each system needs to be aware of the transition to reason about the invariants they're looking to provide. For span configs specifically, if all the SQL pods observe a change in state they previously knew of (they ask KV for the contents of `system.span_configurations`) due to an ownership transition (A -> B), things will fail -- the SQL pods continually emit "diffs" over recent changes from a view of KV state they last knew of, and SQL state they're trying to move towards (look at `spanconfig.Reconciler`). Introducing this level of indirection, and possible change in state (unless we guarantee that it doesn't happen -- are we proposing mechanisms for this?), forces subsystems to reason about this. I'm not sure if other systems are affected in this same way.
These are all great questions, which do need an answer -- but not inside the scope of this RFC.
We have a separate project post-v23.1 to organize this transition (jira: CRDB-14551) and this will deserve its own RFC.
For now, I will propagate your questions here and on the jira epic, and say we're going to find answers at the next design round.
I believe we can introduce a capability model regardless of the resolution here, simply by stating that this particular capability will need special treatment later.
irfansharif
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @dt, @knz, and @stevendanna)
docs/RFCS/20220810_tenant_capabilities.md line 315 at r1 (raw file):
Previously, knz (kena) wrote…
These are all great questions, which do need an answer -- but not inside the scope of this RFC.
We have a separate project post-v23.1 to organize this transition (jira: CRDB-14551) and this will deserve its own RFC. For now, I will propagate your questions here and on the jira epic, and say we're going to find answers at the next design round.
I believe we can introduce a capability model regardless of the resolution here, simply by stating that this particular capability will need special treatment later.
That makes sense. In the short term (23.1) it sounds like this is a particular capability ("hosts KV configuration state") that's effectively hardcoded to today's system tenant, even though we're introducing a framework to extend that to other tenants down the line (and before doing, we'd have to answer such questions). Is that a fair summary?
knz
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @dt, @irfansharif, @knz, and @stevendanna)
docs/RFCS/20220810_tenant_capabilities.md line 315 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
That makes sense. In the short term (23.1) it sounds like this is a particular capability ("hosts KV configuration state") that's effectively hardcoded to today's system tenant, even though we're introducing a framework to extend that to other tenants down the line (and before doing, we'd have to answer such questions). Is that a fair summary?
yes, this is a good summary. I'll reuse it in the text :)
andrewbaptist
left a comment
I generally agree with this thinking as a short-term migration to a unified architecture. I worry about the burden of implementing these changes versus the benefit. My understanding is that the do-nothing alternative would already allow current customers to do everything they need through the system tenant. There is likely a lot I still don't understand about how this all works today and what the barriers are, but I would recommend starting with a more limited set of parameters for tenants, ones that we know they will keep long term. It is much easier to expand capabilities than to remove them.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @dt, @irfansharif, @knz, and @stevendanna)
docs/RFCS/20220810_tenant_capabilities.md line 16 at r2 (raw file):
An example capability would be the ability of a tenant to manually relocate ranges to particular nodes. (Currently secondary tenants do
This specific example (relocate range) seems like a bad one long-term. Relocate range can "fight" the allocator if done incorrectly, and has a high chance of causing a lot of internal work at potentially no cost (not sure how this would be billed) especially if a tenant moves ranges and then they later move back. I'm not sure this is a capability that we should ever expose to tenants. In a perfect future world, we would deprecate relocate range completely (for all cockroach users) and instead provide the capability through range policies that are also taken into account by the allocator.
docs/RFCS/20220810_tenant_capabilities.md line 92 at r2 (raw file):
- the ability to query cluster-wide timeseries, such as node-node latencies.
Should this be done through cockroach directly long term? It seems like providing a "monitoring" service (possibly grafana) for tenants would make more sense.
docs/RFCS/20220810_tenant_capabilities.md line 94 at r2 (raw file):
- configuring certain cluster settings that have cluster-wide impact, such as the KV rebalance rate.
This is a setting that we hope to deprecate in the future. Ideally by running "as fast as possible without impacting foreground latency". There likely are other settings that will always matter here, so this is still valid. Short term I fully agree we need this, however I'm not sure that we need to expose this to tenants.
docs/RFCS/20220810_tenant_capabilities.md line 96 at r2 (raw file):
- manually placing replicas/leaseholders on specific nodes via the `ALTER RANGE` statement.
Again something that we would like to deprecate longer term if possible, but short term we need.
docs/RFCS/20220810_tenant_capabilities.md line 195 at r2 (raw file):
| `RANGE_MOVE(RangeSpec...)` | Relocate or scatter replicas or leaseholders for the specified ranges to specific nodes (see above for range spec; a tenant's own ranges are not implicitly included). |
| `RANGE_SPLIT(RangeSpec...)` | Manually split the specified ranges (see above for range spec; a tenant's own ranges are not implicitly included). |
| `SET_SYSTEM_CLUSTER_SETTING(Settings...)` | Set the specified SystemOnly cluster settings. The capability may include a range of allowed values for each setting. |
This one in particular is problematic due to its power. There are settings that can bork a cluster if set incorrectly. However, determining valid ranges for each setting is a Herculean effort that will almost certainly end up being too restrictive. We could take the approach that tenants have restrictions and admins don't, but then we might force the use of the system tenant more than we would like.
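For illustration, a value-range guard for such a capability could look roughly like the sketch below. This is a hypothetical shape, not the RFC's actual encoding; the setting name and bounds are made up.

```go
package main

import "fmt"

// SettingCapability is a hypothetical shape for a capability that permits a
// tenant to set one SystemOnly cluster setting, restricted to a value range.
type SettingCapability struct {
	Name string
	Min  int64 // inclusive lower bound
	Max  int64 // inclusive upper bound
}

// Allows reports whether the tenant may assign val to the named setting.
func (c SettingCapability) Allows(name string, val int64) bool {
	return c.Name == name && val >= c.Min && val <= c.Max
}

func main() {
	// Grant control over a made-up rebalance-rate setting, 1 MiB to 64 MiB.
	c := SettingCapability{Name: "kv.rebalance_rate", Min: 1 << 20, Max: 64 << 20}
	fmt.Println(c.Allows("kv.rebalance_rate", 32<<20)) // in range: allowed
	fmt.Println(c.Allows("kv.rebalance_rate", 1<<30))  // out of range: rejected
}
```

The "Herculean effort" concern above is precisely about populating `Min`/`Max` sensibly for every setting; the check itself is cheap.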
knz
left a comment
Thanks for the initial review.
My understanding is the do-nothing alternative would allow current customers to do everything they need through the system tenant already
- This is indeed the case, and we will recommend they do that as the default situation. However, we also want to bring our self-hosted customers into the multi-tenancy world, and we'd like to avoid dragging them in kicking and screaming, so we also want to give them the tools to get "advanced tenants". There's also another angle to look at this, outlined in the section "Advanced security: smaller-scope SRE access": we can use secondary tenants with one-off capabilities to create a sandbox for SREs to do one operation without giving them capabilities to do other things.
- Don't forget the difference between CC Dedicated and CC Serverless. In Dedicated, we want to let customers utilize their cluster the way they want, without resource constraints (they get to decide). For those, it's pretty important to let them do certain things they may have legitimate use for, such as range pre-splits and eliminating hotspots by scattering leases of pre-split ranges ahead of time.
I would recommend starting with a more limited set of parameters for tenants that we know they will keep long term
The list in the RFC is limited to the capabilities we already know we'll need to get to feature parity on CC Dedicated.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @dhartunian, @dt, @irfansharif, @knz, and @stevendanna)
docs/RFCS/20220810_tenant_capabilities.md line 16 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
This specific example (relocate range) seems like a bad one long-term. Relocate range can "fight" the allocator if done incorrectly, and has a high chance of causing a lot of internal work at potentially no cost (not sure how this would be billed) especially if a tenant moves ranges and then they later move back. I'm not sure this is a capability that we should ever expose to tenants. In a perfect future world, we would deprecate relocate range completely (for all cockroach users) and instead provide the capability through range policies that are also taken into account by the allocator.
See my comment at top.
We can always deprecate features when we offer something better.
docs/RFCS/20220810_tenant_capabilities.md line 92 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Should this be done through cockroach directly long term? It seems like providing a "monitoring" service (possibly grafana) for tenants would make more sense.
Correct. But we are building the multi-tenancy roadmap without dependencies on other projects that have "loose" timelines, so that we can be moderately certain we can do something within a cycle.
While we're in the process of developing a new monitoring service, its delivery (especially to self-hosted customers) is still uncertain, and in the meantime we need a solution for DB Console.
docs/RFCS/20220810_tenant_capabilities.md line 94 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
This is a setting that we hope to deprecate in the future. Ideally by running "as fast as possible without impacting foreground latency". There likely are other settings that will always matter here, so this is still valid. Short term I fully agree we need this, however I'm not sure that we need to expose this to tenants.
Let's have the mechanism, and let deployments decide whether they use it or not. They don't have to.
docs/RFCS/20220810_tenant_capabilities.md line 195 at r2 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
This one in particular is problematic due to its power. There are settings that can bork a cluster if set incorrectly. However, determining valid ranges for each setting is a Herculean effort, that will almost certainly end up being too restrictive. We could take the approach that tenants have restrictions and admins don't, but then we might force the use of the system tenant more than we would like.
This is opt-in. If not used, there's no way to customize the settings.
nvb
left a comment
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @dhartunian, @dt, @irfansharif, @knz, and @stevendanna)
docs/RFCS/20220810_tenant_capabilities.md line 200 at r3 (raw file):
| `GET_SYSTEM_ZONE_CONFIG(RangeSpec...)` | Get the zone configs for non-tenant keyspaces (range type can be "meta", "timeseries", "liveness", etc.; refers to `SHOW ZONE CONFIGURATION FOR RANGE ...`). |
| `SET_SYSTEM_ZONE_CONFIG(RangeSpec...)` | Set the zone configs for non-tenant keyspaces (see above; refers to `ALTER RANGE ... SET ZONE CONFIGURATION`). |
| `GET_TENANT_ZONE_CONFIG(TenantSpec)` | Get zone configs for tenants other than self (can be specific tenant names/IDs or "all"). |
These two capabilities feel different than the others. They're the only ones that deal with visibility into other tenants besides the current tenant. That is different than a capability that either 1) grants privileged access to a tenant's own keyspace (e.g. range splits), or 2) grants privileged access to a shared piece of infrastructure (e.g. timeseries access, liveness access, cluster-wide backup). As a result, they directly address tenancy.
Are these needed for the transparent shift from non-multitenant to single-tenant unified architecture?
docs/RFCS/20220810_tenant_capabilities.md line 321 at r3 (raw file):
in the system config span
The "system config span" refers to the first 10 SQL tables in the system tenant. If we're using this SystemTenantID key to locate the system tenant then we have a bootstrapping problem, so this will need to be stored outside of SQL.
knz
left a comment
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @dhartunian, @dt, @irfansharif, @knz, @nvanbenschoten, and @stevendanna)
docs/RFCS/20220810_tenant_capabilities.md line 200 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
These two capabilities feel different than the others. They're the only ones that deal with visibility into other tenants besides the current tenant. That is different than a capability that either 1) grants privileged access to a tenant's own keyspace (e.g. range splits), or 2) grants privileged access to a shared piece of infrastructure (e.g. timeseries access, liveness access, cluster-wide backup). As a result, they directly address tenancy.
Are these needed for the transparent shift from non-multitenant to single-tenant unified architecture?
At the time of this writing, there was a jira epic from the serverless team requesting the ability to set an arbitrary tenant's zcfgs from the system tenant. However, it's not there any more (apparently there's another way). So we can remove this capability from the proposal.
docs/RFCS/20220810_tenant_capabilities.md line 321 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
in the system config span
The "system config span" refers to the first 10 SQL tables in the system tenant. If we're using this `SystemTenantID` key to locate the system tenant then we have a bootstrapping problem, so this will need to be stored outside of SQL.
Sorry, apparently I don't have the right terminology. What do you call the span that includes `/system/nodeliveness` and `/system/range-idgen`? That's where we need this to go.
irfansharif
left a comment
I've had a re-read and this structure LGTM. It would be worth spelling out exactly how revocation takes effect -- we write some new proto to system.tenants with updated info, but how do stores know about it? How do they learn about updates generally -- rangefeed on that table? What guarantees are we looking to provide? But those details don't need to be fully hashed out now.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @dhartunian, @dt, @irfansharif, @knz, @nvanbenschoten, and @stevendanna)
docs/RFCS/20220810_tenant_capabilities.md line 200 at r3 (raw file):
there was a jira epic from the serverless team requesting the ability to set an arbitrary tenant's zcfgs from the system tenant.
I believe this was #85179, which is done.
We will introduce tenant capabilities via a new field in the `info` column of `system.tenants`. For this we will use richer data structures than simple booleans.
Should we add a new table as opposed to a field?
This configuration variable needs to be a system config key, and not a cluster setting, because we need its value before we can set up the cluster settings subsystem.

What is a "system config key", and what do you mean by "before we can set up the cluster settings subsystem"?
user's ability to write to the *current tenant's* `system.tenants` table. See next section.

### Deciding which tenant is responsible for hosting cluster configuration

It's not clear to me why this RFC is engaging with the problem posed in this section. Can you spell out why we need to couple this work to separate the "administrative tenant configuration" with any of the other work?
capability system is in place, we can have different teams consider the capabilities they need separately from each other.

## Mechanisms
One detail I don't see is how tenant processes observe their own capabilities. It's assumed below but there are no specifics.
knz
left a comment
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @andrewbaptist, @dhartunian, @dt, @irfansharif, @nvanbenschoten, and @stevendanna)
docs/RFCS/20220810_tenant_capabilities.md line 178 at r4 (raw file):
Previously, ajwerner wrote…
Should we add a new table as opposed to a field?
You tell me. What is your thinking here?
docs/RFCS/20220810_tenant_capabilities.md line 212 at r4 (raw file):
Previously, ajwerner wrote…
One detail I don't see is how tenant processes observe their own capabilities. It's assumed below but there are no specifics.
Correct. At the time of this writing I wasn't sure this would be required; we've since found out it would be. It would get a new RPC on the tenant connector.
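A minimal sketch of what that connector surface might look like follows. The interface shape, method name, and capability fields here are assumptions for illustration, not the actual cockroach API.

```go
package main

import (
	"errors"
	"fmt"
)

// TenantCapabilities is a hypothetical response shape for the new RPC by
// which a SQL pod learns its own capabilities from the tenant connector.
type TenantCapabilities struct {
	CanAdminSplit   bool
	CanViewNodeInfo bool
}

// Connector sketches the tenant-side interface; the method name is invented.
type Connector interface {
	TenantCapabilities() (TenantCapabilities, error)
}

// fakeConnector stands in for the KV-backed implementation in this sketch.
type fakeConnector struct {
	caps *TenantCapabilities // nil until first received from KV
}

func (f *fakeConnector) TenantCapabilities() (TenantCapabilities, error) {
	if f.caps == nil {
		return TenantCapabilities{}, errors.New("capabilities not yet received")
	}
	return *f.caps, nil
}

func main() {
	var c Connector = &fakeConnector{caps: &TenantCapabilities{CanAdminSplit: true}}
	caps, err := c.TenantCapabilities()
	fmt.Println(caps.CanAdminSplit, err)
}
```

An interface like this lets the SQL pod consult its capabilities without knowing whether they came over the wire or from a cached snapshot.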
docs/RFCS/20220810_tenant_capabilities.md line 302 at r4 (raw file):
Previously, ajwerner wrote…
It's not clear to me why this RFC is engaging with the problem posed in this section. Can you spell out why we need to couple this work to separate the "administrative tenant configuration" with any of the other work?
I was hoping to outline a future where all traffic goes through the capability system, including the definition of which tenant serves the role of system tenant. In that case, there's a chicken-and-egg problem to solve. But this work is premature and does not need to be covered here. I'll remove it.
docs/RFCS/20220810_tenant_capabilities.md line 328 at r4 (raw file):
Previously, ajwerner wrote…
What is a "system config key" and what do you mean "before we can set up the cluster settings subsytem?
system config key = where we store meta0 and the bootstrap keys etc.
"before we set up cluster settings" - because at this point we don't even know which tenant keyspace to pull the cluster settings from.
But again this consideration belongs to a future where the choice of tenant that plays the role of "system tenant" is configurable, and so maybe it should be a different RFC.
knz
left a comment
I have updated the RFC to reflect the current implementation. In particular I have removed the bit that talks about routing the system administration responsibility to another tenant than the system tenant, as this will be covered by later work.
how do stores know about it?
We've added a cache refreshed via a rangefeed. Mentioned in RFC.
What guarantees are we looking to provide?
Eventual consistency.
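To make the shape of that concrete, here is a hedged sketch of a store-side cache whose contents are replaced as watcher updates arrive. All names are hypothetical; per the RFC discussion, the real cache hangs off a rangefeed on `system.tenants`, which is what gives the eventual-consistency guarantee.

```go
package main

import (
	"fmt"
	"sync"
)

// capCache sketches the store-side capability cache: the read path sees
// whatever snapshot the watcher applied last, i.e. eventual consistency.
type capCache struct {
	mu   sync.RWMutex
	caps map[uint64]map[string]bool // tenant ID -> capability -> granted
}

func newCapCache() *capCache {
	return &capCache{caps: make(map[uint64]map[string]bool)}
}

// apply installs one update, as the rangefeed event handler would.
func (c *capCache) apply(tenantID uint64, name string, granted bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	m, ok := c.caps[tenantID]
	if !ok {
		m = make(map[string]bool)
		c.caps[tenantID] = m
	}
	m[name] = granted
}

// has is the read path consulted when admitting a tenant request.
func (c *capCache) has(tenantID uint64, name string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.caps[tenantID][name]
}

func main() {
	cache := newCapCache()
	cache.apply(2, "can_admin_split", true)
	fmt.Println(cache.has(2, "can_admin_split"))
	cache.apply(2, "can_admin_split", false) // revocation arrives via the feed
	fmt.Println(cache.has(2, "can_admin_split"))
}
```

The window between writing the updated proto and the event reaching each store is exactly where a revoked capability can still be honored briefly, which is the trade-off "eventual consistency" names here.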
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @andrewbaptist, @dhartunian, @dt, @irfansharif, @nvanbenschoten, and @stevendanna)
bors r=arulajmani

Build failed (retrying...)

Build failed (retrying...)

Build succeeded
RFC text here: https://github.com/knz/cockroach/blob/20220810-rfc-tenant-capabilities/docs/RFCS/20220810_tenant_capabilities.md
Release note: None
Epic: CRDB-18503