server: meta-issue to organize part of the work in CRDB-26691 #98431

@knz

This issue covers 4 related threads of work in epic CRDB-26691:

Architectural overview / tech strategy

We will choose a common, shared architecture for all 4 threads of work, which will reduce the amount of work overall by sharing a big chunk of technology between them:

  1. on each KV node, tenant-specific configuration will be loaded up from system tables into a single in-memory data structure
    • each tenant ID mapped to its own config
    • this will integrate data from "settingswatcher" and "capwatcher" and will be extended with service mode and injected SQL
    • wherever we do not yet have a rangefeed-based approach, we will prototype the solution using SQL polling, with the expectation that rangefeeds will be added later (as an optimization)
  2. we will use a single streaming RPC to propagate changes in the in-memory data structure to the tenant connector of each tenant. We will extend TenantSettings and its result payload for this.
  3. we will modify the connector startup logic to wait on the initial config before reporting that startup has completed
    (this will add a small amount of extra startup latency to SQL pods, but we have sign-off from the Serverless team on that; it will not incur cross-region latency since the retrieved data is already in RAM on each KV node)
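
To make step 1 more concrete, here is a minimal sketch of what the shared per-tenant structure could look like. Every name in it (`TenantID`, `TenantConfig`, `Registry`, `Set`, `Get`) is a hypothetical placeholder and not the actual CockroachDB type; the field layout and the close-a-channel notification style are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// TenantID is a hypothetical stand-in for the real tenant identifier type.
type TenantID uint64

// TenantConfig groups what a KV node knows about one tenant.
// Field names and types are illustrative assumptions.
type TenantConfig struct {
	SettingOverrides map[string]string // would come from system.tenant_settings
	Capabilities     map[string]bool   // would come from system.tenants
	ServiceMode      string            // would come from system.tenants
	InjectedSQL      []string          // left empty until thread D populates it
}

// Registry is the single in-memory structure: tenant ID -> config, plus one
// notification channel per tenant that is closed on the next change.
type Registry struct {
	mu      sync.Mutex
	configs map[TenantID]TenantConfig
	changed map[TenantID]chan struct{}
}

func NewRegistry() *Registry {
	return &Registry{
		configs: make(map[TenantID]TenantConfig),
		changed: make(map[TenantID]chan struct{}),
	}
}

// watchLocked returns the current notification channel for a tenant,
// creating it on first use. The caller must hold r.mu.
func (r *Registry) watchLocked(id TenantID) chan struct{} {
	ch, ok := r.changed[id]
	if !ok {
		ch = make(chan struct{})
		r.changed[id] = ch
	}
	return ch
}

// Set replaces the config for a tenant and wakes up every current watcher.
func (r *Registry) Set(id TenantID, cfg TenantConfig) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.configs[id] = cfg
	close(r.watchLocked(id))
	r.changed[id] = make(chan struct{})
}

// Get returns a snapshot of the config and a channel that is closed when the
// config changes next. Callers must treat the returned maps as read-only.
func (r *Registry) Get(id TenantID) (TenantConfig, <-chan struct{}) {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.configs[id], r.watchLocked(id)
}

func main() {
	r := NewRegistry()
	_, ch := r.Get(10)
	r.Set(10, TenantConfig{ServiceMode: "shared"}) // value chosen for illustration
	<-ch                                           // closed by Set
	cfg, _ := r.Get(10)
	fmt.Println("service mode:", cfg.ServiceMode)
}
```

A streaming RPC handler (step 2) could loop on `Get`, send the snapshot, then block on the returned channel until the next change.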

Work threads

The following list is the logical description of work, but the work items are actually shared across each group of tasks (see summary of work at the bottom).

A. Group capabilities and setting overrides into a single data structure (this is an infra dependency for the work below)

  1. gather rangefeeds from system.tenant_settings and system.tenants to the same place
  2. link up the notification channel (already defined for setting overrides) to updates in system.tenants

B. Propagate capabilities to tenants

  1. Define RPC response payload
  2. Use notif channel (dependency on A.2) in streaming RPC server handler
  3. Receive data in tenant connector, cache it there
  4. Define API for use by SQL
  5. Objective: notify the team that capabilities are available tenant-side, and educate on API use
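
Steps B.3 and B.4, together with the "wait on the initial config" startup rule from the overview, can be sketched as follows. This is an illustration under assumptions, not the real tenant connector: `ConfigEvent`, `Connector`, `WaitForInitialConfig` and `HasCapability` are hypothetical names, a Go channel stands in for the streaming RPC, and the capability name is only an example.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// ConfigEvent is a hypothetical stand-in for one streaming RPC response message.
type ConfigEvent struct {
	Capabilities map[string]bool
}

// Connector caches the latest payload received from the KV node.
type Connector struct {
	mu      sync.RWMutex
	caps    map[string]bool
	once    sync.Once
	initial chan struct{} // closed once the first event has been applied
}

func NewConnector() *Connector {
	return &Connector{initial: make(chan struct{})}
}

// apply stores one event in the cache and marks the initial config as received.
func (c *Connector) apply(ev ConfigEvent) {
	c.mu.Lock()
	c.caps = ev.Capabilities
	c.mu.Unlock()
	c.once.Do(func() { close(c.initial) })
}

// Run consumes events until the stream is closed or ctx is done.
func (c *Connector) Run(ctx context.Context, stream <-chan ConfigEvent) {
	for {
		select {
		case <-ctx.Done():
			return
		case ev, ok := <-stream:
			if !ok {
				return
			}
			c.apply(ev)
		}
	}
}

// WaitForInitialConfig blocks startup until the first payload has arrived.
func (c *Connector) WaitForInitialConfig(ctx context.Context) error {
	select {
	case <-c.initial:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// HasCapability is the kind of read API that SQL code could call.
func (c *Connector) HasCapability(name string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.caps[name]
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	stream := make(chan ConfigEvent, 1)
	stream <- ConfigEvent{Capabilities: map[string]bool{"can_admin_split": true}}

	c := NewConnector()
	go c.Run(ctx, stream)
	if err := c.WaitForInitialConfig(ctx); err != nil {
		fmt.Println("startup failed:", err)
		return
	}
	fmt.Println("can_admin_split:", c.HasCapability("can_admin_split"))
}
```

Because lookups only touch the local cache, SQL code never pays an RPC round trip per capability check.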

C. Propagate Service mode to tenants

  1. Decode service mode from system.tenants rangefeed conditionally (the column was added recently)
  2. Load up the value in the in-memory data structure (dependency on A.1)
  3. Define RPC response payload
  4. Use notif channel (dependency on A.2) in streaming RPC server handler
  5. Receive data in tenant connector, cache it there
  6. Define API for use by SQL server initialization
  7. Objective: rebase "server: honor and validate the service mode for SQL pods" (#96144) on top of this thread and get it merged

D. Propagate Injected SQL

  1. Goal 1: define a basic data model for injected SQL using Go structs, and merge the initial "config profiles" PR for use by the rest of the team (this will de-risk "server: separate config defaults for serverless vs dedicated/SH secondary tenants", #94856)
  2. Add injected SQL fields to in-memory data structure (not populated yet)
  3. Define RPC response payload
  4. Use notif channel (dependency on A.2) in streaming RPC server handler
  5. Receive data in tenant connector, cache it there
  6. Define system tables and hook up to data structure via polling loop, leave TODOs and file issue to add rangefeed later
  7. Objective: rebase "cli,server: initial cluster configuration via job injection" (#98380) on top of this thread and get it merged
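
The polling loop in D.6 could look roughly like the sketch below. It is only an illustration: `InjectedSQL`, `pollOnce` and `PollLoop` are hypothetical names, and `fetch` stands in for whatever SQL query would read the new system table, whose schema is not defined here.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// InjectedSQL is a hypothetical row shape for the future system table.
type InjectedSQL struct {
	ID   int64
	Stmt string
}

// sameRows reports whether two result sets are identical, in order.
func sameRows(a, b []InjectedSQL) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// pollOnce fetches the current rows and reports whether they differ from last.
func pollOnce(
	fetch func() ([]InjectedSQL, error), last []InjectedSQL,
) (cur []InjectedSQL, changed bool, err error) {
	cur, err = fetch()
	if err != nil {
		return last, false, err
	}
	return cur, !sameRows(cur, last), nil
}

// PollLoop polls immediately and then once per interval until ctx is done,
// calling onChange only when the fetched rows differ from the previous ones.
// TODO: replace polling with a rangefeed once one is available.
func PollLoop(
	ctx context.Context,
	interval time.Duration,
	fetch func() ([]InjectedSQL, error),
	onChange func([]InjectedSQL),
) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	var last []InjectedSQL
	for {
		// Fetch errors are skipped here; the next tick simply retries.
		if cur, changed, err := pollOnce(fetch, last); err == nil && changed {
			last = cur
			onChange(cur)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	fetch := func() ([]InjectedSQL, error) {
		return []InjectedSQL{{ID: 1, Stmt: "SELECT 1"}}, nil
	}
	PollLoop(ctx, 10*time.Millisecond, fetch, func(rows []InjectedSQL) {
		fmt.Println("reloaded", len(rows), "injected statement(s)")
	})
}
```

Keeping the change detection in `pollOnce` means a later rangefeed can replace `PollLoop` without touching the `onChange` consumer.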

E. "Exec response" API - eventually needed for multiregion serverless and other purposes

  1. Add response fields to in-memory data structure
  2. Define new unary RPC for use by tenants
  3. Filter injected SQL upon load based on already-complete payloads
  4. Define a new system table to contain responses
  5. Write-through from the response cache to the system table
  6. Also poll from system table to refresh cache (leave TODO and file issue to add rangefeed later)
  7. Objective: follow up on "cli,server: initial cluster configuration via job injection" (#98380) to add a response callback upon job completion
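
Items E.3 and E.5 (filtering on already-complete payloads, and write-through to the system table) can be sketched together. All names below (`Task`, `ResponseStore`, `ResponseCache`, `Record`, `FilterPending`) are hypothetical, and the in-memory `memStore` merely stands in for the system table that E.4 would define.

```go
package main

import (
	"fmt"
	"sync"
)

// Task is a hypothetical unit of injected SQL awaiting execution by a tenant.
type Task struct {
	ID   int64
	Stmt string
}

// ResponseStore is a hypothetical interface over the future response table.
type ResponseStore interface {
	Put(taskID int64, result string) error
}

// memStore is an in-memory stand-in for the system table.
type memStore struct {
	rows map[int64]string
}

func (m *memStore) Put(taskID int64, result string) error {
	m.rows[taskID] = result
	return nil
}

// ResponseCache remembers which tasks already have a response.
type ResponseCache struct {
	mu    sync.Mutex
	done  map[int64]string
	store ResponseStore
}

func NewResponseCache(store ResponseStore) *ResponseCache {
	return &ResponseCache{done: make(map[int64]string), store: store}
}

// Record is what the unary RPC handler could call. It writes through to the
// store first, so the cache never claims a response that was not persisted.
func (c *ResponseCache) Record(taskID int64, result string) error {
	if err := c.store.Put(taskID, result); err != nil {
		return err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.done[taskID] = result
	return nil
}

// FilterPending drops tasks that already have a recorded response.
func (c *ResponseCache) FilterPending(tasks []Task) []Task {
	c.mu.Lock()
	defer c.mu.Unlock()
	var pending []Task
	for _, t := range tasks {
		if _, ok := c.done[t.ID]; !ok {
			pending = append(pending, t)
		}
	}
	return pending
}

func main() {
	cache := NewResponseCache(&memStore{rows: make(map[int64]string)})
	tasks := []Task{{ID: 1, Stmt: "SELECT 1"}, {ID: 2, Stmt: "SELECT 2"}}
	if err := cache.Record(1, "ok"); err != nil {
		fmt.Println("record failed:", err)
		return
	}
	for _, t := range cache.FilterPending(tasks) {
		fmt.Println("still pending:", t.ID)
	}
}
```

Persisting before caching is one possible ordering; the refresh poll in E.6 would then repopulate the cache after a restart.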

Actual work (execution)

X1. Extract subset of #94856 that produces initial config for system tenant (D.1)
X2. Define common data structure and notification mechanism (A.1, A.2, C.2, D.2, E.1)
X3. Extend TenantSettings response payload to support more payload types (B.1, C.3, D.3)
X4. Extend TenantSettings RPC handler to propagate more data upon change notifications (B.2, C.4, D.4)
X5. Extend tenant connector to receive more data from TenantSettings (B.3, C.5, D.5)
X6. Advertise availability of caps tenant-side to team (B.5)
X7. Define new system tables and poll from them to refresh in-memory cache (D.6, E.4, E.5, E.6)
X8. Rebase #98380 to define injected SQL, close #94856 and related RFC (D.7)
X9. Rebase #96144, close #93145 and #83650 (C.7). -- in progress
X10. Implement ExecResponse API, write-through to system table and filter injected SQL (E.2, E.3)
X11. Followup to #98380 to ping back storage cluster via ExecResponse API on job completion (E.7)

Jira issue: CRDB-25263

Epic CRDB-26691

Metadata

    Labels

    C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
