server: meta-issue to organize part of the work in CRDB-26691 #98431
Description
This issue covers 4 related threads of work in epic CRDB-26691:
- improve the handling of cluster setting overrides
- in particular we want to fix server: avoid starting tenant servers until all overrides have been received #96512
- making tenants aware of their capabilities, so they can advise SQL users for UX improvements
- will be needed to make ALTER CONFIGURE ZONE capability-aware
- making tenants aware of their service mode, and making their servers self-abort when the service mode is cancelled
- issues server: blocked mixed-style deployments [CRDB-14537 followup] #93145 and sql: connecting to an inactive tenant should return an error #83650
- original draft PR in server: honor and validate the service mode for SQL pods #96144 but we want a better approach
- inject SQL into tenants for config purposes and perhaps other use cases
- this is a followup to the fix for issue server: separate config defaults for serverless vs dedicated/SH secondary tenants #94856
Architectural overview / tech strategy
We will use a common, shared architecture for all 4 threads of work; sharing a large chunk of technology between them will reduce the overall amount of work:
- on each KV node, tenant-specific configuration will be loaded up from system tables into a single in-memory data structure
- each tenant ID mapped to its own config
- this will integrate data from "settingswatcher" and "capwatcher" and will be extended with service mode and injected SQL
- wherever we do not yet have a rangefeed-based approach, we will prototype the solution using SQL polling, with the expectation that rangefeeds will be added later (as an optimization)
- we will use a single streaming RPC to propagate changes to the in-memory data structure to the tenant connector on each tenant; we will extend the `TenantSettings` RPC and its result payload for this
- we will modify the connector startup logic to wait for the initial config before reporting that startup has completed (this adds a small amount of extra startup latency to SQL pods, but we have sign-off from the Serverless team on that -- it does not incur cross-region latency since the retrieved data is already in RAM on each KV node)
Work threads
The following list is the logical description of the work; the actual work items are shared across each group of tasks (see the summary of work at the bottom).
A. Group capabilities and setting overrides into a single data structure (this is an infra dependency for the work below)
- gather the rangefeeds from system.tenant_settings and system.tenants into the same place
- link up the notification channel (already defined for setting overrides) to updates in system.tenants
B. Propagate capabilities to tenants
- Define RPC response payload
- Use notif channel (dependency on A.2) in streaming RPC server handler
- Receive data in tenant connector, cache it there
- Define API for use by SQL
- Objective: notify the team of the availability of capabilities tenant-side and educate them on API use
C. Propagate Service mode to tenants
- Decode service mode from system.tenants rangefeed conditionally (the column was added recently)
- Load up the value in the in-memory data structure (dependency on A.1)
- Define RPC response payload
- Use notif channel (dependency on A.2) in streaming RPC server handler
- Receive data in tenant connector, cache it there
- Define API for use by SQL server initialization
- Objective: rebase server: honor and validate the service mode for SQL pods #96144 on top of this thread and get it merged
D. Propagate Injected SQL
- Goal 1: define the basic data model for injected SQL using Go structs; merge the initial "config profiles" PR for use by the rest of the team (this will de-risk server: separate config defaults for serverless vs dedicated/SH secondary tenants #94856)
- Add injected SQL fields to in-memory data structure (not populated yet)
- Define RPC response payload
- Use notif channel (dependency on A.2) in streaming RPC server handler
- Receive data in tenant connector, cache it there
- Define system tables and hook up to data structure via polling loop, leave TODOs and file issue to add rangefeed later
- Objective rebase cli,server: initial cluster configuration via job injection #98380 on top of this thread and get it merged
E. "Exec response" API - eventually needed for multiregion serverless and other purposes
- Add response fields to in-memory data structure
- Define new unary RPC for use by tenants
- Filter injected SQL upon load based on already-complete payloads
- Define a new system table to contain responses
- Write-through from the response cache to the system table
- Also poll from system table to refresh cache (leave TODO and file issue to add rangefeed later)
- Objective: follow up on cli,server: initial cluster configuration via job injection #98380 to add a response callback upon job completion
Actual work (execution)
X1. Extract subset of #94856 that produces initial config for system tenant (D.1)
X2. Define common data structure and notification mechanism (A.1, A.2, C.2, D.2, E.1)
X3. Extend TenantSettings response payload to support more payload types (B.1, C.3, D.3)
X4. Extend TenantSettings RPC handler to propagate more data upon change notifications (B.2, C.4, D.4)
X5. Extend tenant connector to receive more data from TenantSettings (B.3, C.5, D.5)
X6. Advertise availability of caps tenant-side to team (B.5)
X7. Define new system tables and poll from them to refresh in-memory cache (D.6, E.4, E.5, E.6)
X8. Rebase #98380 to define injected SQL, close #94856 and related RFC (D.8)
X9. Rebase #96144, close #93145 and #83650. -- in progress
X10. Implement ExecResponse API, write-through to system table and filter injected SQL (E.2, E.3)
X11. Followup to #98380 to ping back storage cluster via ExecResponse API on job completion (E.7)
Jira issue: CRDB-25263
Epic CRDB-26691