kvserver: limit the number of kv spans configurable by a tenant #70555
Description
#66348 outlined a scheme to support zone configs for secondary tenants. Since the unit of what we can configure in KV is a Range, tenants being able to set zone configs grants them the ability to induce an arbitrary number of splits in KV -- something we want to put guardrails around (see the RFC for more details).
There's some discussion elsewhere for how we could achieve this, copying over one in particular:
We could have each pod lease out a local quota, counting against the tenant's global limit. For N pods of a tenant and a limit L, we could either lease out partial quotas L/N, or have a single pod lease out L entirely and require all DDLs/zcfgs to go through that one pod. The scheme below describes the general case of each pod getting some fractional share of the total quota.
- When each pod considers a DDL/zcfg txn, before committing it will attempt to acquire a quota from its local leased limit. The quota acquired is an over-estimate of how much the txn will cost (in terms of additional splits induced); we cannot under-estimate as KV could then later reject the committed zcfg.
- If there's insufficient quota, we'll abort the txn and error out accordingly.
- If there's sufficient quota, we'll commit the txn and rely on the eventual reconciliation to reliably succeed. We'll queue up a work item for the reconciliation job, marking alongside it the lease expiration of the quota it was accepted with.
- When the local pool runs out of quota and has to request/lease more, it needs to ensure that all the txns that have been committed have been reconciled. We don't want a situation where a quota is leased and allocated to an operation in the work queue, the operation stays in the work queue past its lease expiration date, and we're then able to lease the "same" quota again and queue up a second work item that will get rejected. We need to make sure that when we lease a limit at time t, there are no unprocessed work items with leased tokens expiring before t.
- Whenever the work item is processed, the amount we over-estimated by could either be returned to the local quota pool (with the same expiration ts), or we could simply drop it entirely and rely on the local quota pool to eventually re-acquire it from the global pool.
The RFC also captures a possible schema we could use on the host tenant to store these limits:
```sql
CREATE TABLE system.spans_configurations_per_tenant (
    tenant_id INT,
    max_span_configurations INT
);
```

The actual RPCs/interfaces are TBD. We probably also want the ability to configure these limits on a per-tenant basis.
Epic CRDB-10563
Jira issue: CRDB-10116