server: allow users to run conformance reports for their schemas #100004
Description
Is your feature request related to a problem? Please describe.
Background
Users can configure various properties, such as quorum size, number of non-voters, data placement etc. on schema objects. They can either do so directly, via zone configurations, or indirectly using multi-region abstractions (which are internally translated to zone configurations). However, the effects of such changes are asynchronous.
The zone configurations table lives in a tenant's keyspace. At a high level, once a zone configuration is committed, it must be converted to a SpanConfig and reconciled to KV (where it lives in the system.span_configurations table). Doing so entails hydrating the zone configuration (by walking up its inheritance chain) and converting it to a SpanConfig. This SpanConfig is then linked with the keyspan associated with the schema object and persisted in KV using an RPC.
All KV nodes maintain an in-memory, incremental view over system.span_configurations. Once a KV node receives a SpanConfig update, ranges that overlap with the update's keyspans are pushed through various queues (e.g. Split, Merge, Replicate). It's these queues that are responsible for taking action to fulfill the user's intention.
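To make "hydrating by walking up the inheritance chain" concrete, here is a minimal sketch. It assumes a heavily simplified `ZoneConfig` with three fields where `None` means "inherit from the parent". The class, the `hydrate` helper, and the field values are all hypothetical and are not the real CockroachDB types, whose inheritance rules are more nuanced.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class ZoneConfig:
    """Hypothetical, simplified zone configuration.

    A field left as None means "inherit this value from the parent".
    """
    num_replicas: Optional[int] = None
    num_voters: Optional[int] = None
    gc_ttl_seconds: Optional[int] = None


def hydrate(chain: list[ZoneConfig]) -> ZoneConfig:
    """Fill in unset fields by walking the inheritance chain.

    `chain` is ordered from the most specific object (e.g. an index)
    to the least specific one (e.g. the default zone). The first
    non-None value found for a field wins.
    """
    hydrated = ZoneConfig()
    for zone in chain:
        for f in fields(ZoneConfig):
            unset = getattr(hydrated, f.name) is None
            inherited = getattr(zone, f.name)
            if unset and inherited is not None:
                hydrated = replace(hydrated, **{f.name: inherited})
    return hydrated


# Illustrative values only.
index_zone = ZoneConfig(num_voters=3)
table_zone = ZoneConfig(num_replicas=5)
default_zone = ZoneConfig(num_replicas=3, gc_ttl_seconds=14400)

full = hydrate([index_zone, table_zone, default_zone])
```

Here the index keeps its own `num_voters`, picks up `num_replicas` from the table, and falls back to the default zone for `gc_ttl_seconds`. Only the fully hydrated result would be converted to a SpanConfig.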
SpanConfigBounds
Until very recently, the async application of user specified configurations was only a matter of time. This changed with the introduction of SpanConfigBounds. SpanConfigBounds were motivated by a desire to deny secondary tenants unfettered access to multi-region features (or zone configurations) in deployments where operators desire such control (read: serverless).
SpanConfigBounds allow operators to declare bounds on (almost) all SpanConfig fields at a per-tenant level. These only work for secondary tenants. Operators can use SpanConfigBounds to override tenant reconciled span configurations by "clamping" any or all fields. For example, operators are able to do things like constrain a tenant to specific region(s) regardless of what the tenant requested. They can also do so retroactively, after a tenant has successfully committed and reconciled such configurations.
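As an illustration of what "clamping" could look like, here is a small sketch. The `SpanConfig` and `Bounds` shapes, the `check`/`clamp` helpers, and the fallback behaviour when no requested region is allowed are all assumptions made for this example. They do not mirror the actual SpanConfigBounds implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpanConfig:
    """Hypothetical two-field span configuration."""
    num_replicas: int
    regions: frozenset[str]


@dataclass(frozen=True)
class Bounds:
    """Hypothetical operator-declared bounds for one tenant."""
    min_replicas: int
    max_replicas: int
    allowed_regions: frozenset[str]


def check(bounds: Bounds, cfg: SpanConfig) -> bool:
    """Report whether cfg already lies within the bounds."""
    replicas_ok = bounds.min_replicas <= cfg.num_replicas <= bounds.max_replicas
    regions_ok = cfg.regions <= bounds.allowed_regions
    return replicas_ok and regions_ok


def clamp(bounds: Bounds, cfg: SpanConfig) -> SpanConfig:
    """Return a copy of cfg with every field forced into the bounds."""
    replicas = min(max(cfg.num_replicas, bounds.min_replicas), bounds.max_replicas)
    regions = cfg.regions & bounds.allowed_regions
    if not regions:
        # Assumption for this sketch: if none of the requested regions
        # are allowed, fall back to the full allowed set.
        regions = bounds.allowed_regions
    return replace(cfg, num_replicas=replicas, regions=regions)


bounds = Bounds(min_replicas=3, max_replicas=5,
                allowed_regions=frozenset({"us-east1"}))
requested = SpanConfig(num_replicas=7,
                       regions=frozenset({"us-east1", "eu-west1"}))
effective = clamp(bounds, requested)
```

In this sketch the tenant's request for 7 replicas across two regions is silently reduced to 5 replicas in `us-east1`. That gap between `requested` and `effective` is exactly what a tenant cannot observe today.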
Describe the solution you'd like
Arguably, users care more about when their data is in conformance, as opposed to a promise that it eventually will be. With the introduction of SpanConfigBounds, tenants no longer have the latter either. This elevates the need to make point-in-time conformance easily observable.
Conveniently, we already have a lot of the pieces needed to provide such conformance reports. We just need to possibly enhance them, stitch things together, and provide a mechanism to consume such information. This issue asks to do exactly that.
Specifically, users should be able to run conformance reports that give them information about which table(s)/index(es) are in violation of their zone configurations. There should also be 2 variations -- one that takes SpanConfigBounds into account and another that doesn't. This will allow users to discriminate between cases where all they need to do is "just wait" and cases that will never be satisfied.
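One way the two report variations might be combined is sketched below. The function name, the string verdicts, and the representation of each report as a set of violating span identifiers are hypothetical. The sketch assumes one report is evaluated against the configuration as requested (ignoring bounds) and the other against the configuration after bounds are applied.

```python
def classify(span: str,
             violating_requested: set[str],
             violating_effective: set[str]) -> str:
    """Combine two hypothetical conformance reports for a single span.

    violating_requested: spans that violate the config as the tenant
        requested it (bounds ignored).
    violating_effective: spans that violate the config after the
        operator's bounds have been applied.
    """
    if span in violating_effective:
        # KV has not yet converged on the config it is actually
        # working towards, so the state may still change.
        return "just wait"
    if span in violating_requested:
        # KV has settled on the bounded config, which differs from
        # what was requested; waiting will not help.
        return "never satisfied"
    return "conforming"


violating_requested = {"orders@primary", "users@primary"}
violating_effective = {"users@primary"}

verdicts = {
    span: classify(span, violating_requested, violating_effective)
    for span in ("orders@primary", "users@primary", "items@primary")
}
```

A span in the "just wait" state may still turn out to be "never satisfied" once KV converges, so this classification is only meaningful at the point in time the reports were taken.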
High level sketch
We already have a conformance reporter. However, it doesn't give the caller a point-in-time snapshot -- this may need to change if we want stronger guarantees when stitching the report back with SQL state.
Note: this Reporter is not to be confused with the other Reporter in kvserver/reports/reporter.go. This latter construct is older and deprecated, and we don't see much of a future for it any more. It should be removed. (#100180)
The Reporter also doesn't know about SpanConfigBounds yet. We should extend it to return a list of SpanConfigs that fail bound checks in its response. This might simply be about giving the reporter a handle to the BoundsReader to give it access to a tenant's Bounds and calling Check() on it.
The tenant (SQL) is the only thing that has access to both:
- What timestamp it's reconciled up to.
- How keyspans map back to schema objects (a reverse translation of sorts)
As such, the tenant would be responsible for taking the contents of a SpanConfigConformanceReport (which only associates raw keys to a conformance status) and mapping it back to which tables/indexes are in violation (if any).
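The reverse translation described above could be sketched as an overlap check between the raw spans in a report and the keyspans owned by each schema object. The `Span` type, the `violating_objects` helper, and the key encodings below are made up for illustration. Real keys are produced by the tenant's codec, and the real report carries more than a list of spans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical half-open key span [start, end)."""
    start: bytes
    end: bytes

    def overlaps(self, other: "Span") -> bool:
        return self.start < other.end and other.start < self.end


def violating_objects(schema_spans: dict[str, Span],
                      violating: list[Span]) -> list[str]:
    """Map violating raw spans back to the schema objects they touch."""
    return sorted(
        name
        for name, span in schema_spans.items()
        if any(span.overlaps(v) for v in violating)
    )


# Made-up key encoding: table 104 has two indexes, table 105 has one.
schema_spans = {
    "db.orders@primary": Span(b"t104i1", b"t104i2"),
    "db.orders@by_customer": Span(b"t104i2", b"t104i3"),
    "db.users@primary": Span(b"t105i1", b"t105i2"),
}

# A range somewhere inside the by_customer index is out of conformance.
report = [Span(b"t104i2a", b"t104i2m")]

in_violation = violating_objects(schema_spans, report)
```

A linear scan is enough for a sketch. With many schema objects, an interval tree or a sorted list with binary search would be the more likely choice.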
I'm not sure what the best way to consume such information is -- maybe a new endpoint users can query? Or, better yet, we could build some sort of DB console page? Alternatively, we could run such a thing periodically and maybe increment some metrics.
Jira issue: CRDB-26176
Epic CRDB-26686
As part of addressing this, we should make sure to delete FIXMEIDONTKNOWWHICHCODECTOUSE usage (introduced as part of #48123).