sql,*: make some or all system tables LOCALITY GLOBAL #63365
Description
Is your feature request related to a problem? Please describe.
This idea has come up a few times recently, most recently in #36160 (comment), and it seems worthwhile to centralize the discussion somewhere. It occurs to me that making these tables global could also mitigate or solve several other big problems toward which we've considered investing considerable engineering effort.
Today's virtual tables are powered by an in-memory cache of all descriptors. The latency requirements make evicting from such a cache infeasible. If the data were local and low-latency, it would be plausible to implement these tables in a streaming fashion. The memory overhead has not been much of a concern so far, given other bottlenecks that generally make creating a schema of a problematic size unlikely (#63206).
Another consideration, which only just occurred to me, is the commit-to-emit latency of CHANGEFEEDs. The dominant source of latency in a CHANGEFEED is waiting to "prove" the schema for a row (#36289). In the past we have explored leasing protocols by which changefeeds might coordinate with, or hold off, schema changes and thus be free to emit rows so long as they hold a lease. This approach was demonstrated to work and is, on some level, viable. However, it's far from trivial and would even further complicate transactional schema changes. If the system.descriptor table were a global table, a resolved timestamp corresponding to the present could be emitted to each node hosting a CHANGEFEED around the time that rows are written. This does reveal another interesting problem that CHANGEFEEDs will need to deal with: rows committed in the future because they are part of a transaction touching a global table are likely to block rows from non-global tables. That can be mitigated with some buffering.
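To make the buffering idea a bit more concrete, here is a minimal sketch in Go. It is not CockroachDB code: `Row`, `FutureBuffer`, `Offer`, and `Release` are hypothetical names, timestamps are plain integers rather than HLC timestamps, and the policy shown (park any row whose commit timestamp is ahead of the local clock, let everything else through) is only one assumed reading of "some buffering".

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical changefeed event: a key plus its commit timestamp.
// Real timestamps would be HLC values; an int64 keeps the sketch small.
type Row struct {
	Key string
	TS  int64
}

// FutureBuffer parks rows whose commit timestamp is ahead of the local
// clock, so that they do not hold up rows committed at present time.
type FutureBuffer struct {
	pending []Row
}

// Offer reports whether r can be emitted right away. Rows committed in the
// future (TS > now) are parked in the buffer and Offer returns false.
func (b *FutureBuffer) Offer(r Row, now int64) bool {
	if r.TS <= now {
		return true
	}
	b.pending = append(b.pending, r)
	return false
}

// Release removes and returns, in timestamp order, every parked row whose
// commit timestamp is no longer in the future relative to now.
func (b *FutureBuffer) Release(now int64) []Row {
	sort.SliceStable(b.pending, func(i, j int) bool {
		return b.pending[i].TS < b.pending[j].TS
	})
	// Index of the first parked row that is still in the future.
	n := sort.Search(len(b.pending), func(i int) bool {
		return b.pending[i].TS > now
	})
	out := append([]Row(nil), b.pending[:n]...)
	b.pending = append([]Row(nil), b.pending[n:]...)
	return out
}

func main() {
	buf := &FutureBuffer{}
	now := int64(120)
	incoming := []Row{{"a", 100}, {"global-txn", 250}, {"b", 110}}
	for _, r := range incoming {
		if buf.Offer(r, now) {
			fmt.Println("emit immediately:", r.Key)
		}
	}
	// Later, once the clock has passed the future timestamp.
	for _, r := range buf.Release(300) {
		fmt.Println("emit after wait:", r.Key)
	}
}
```

The point of the sketch is only that the future-timestamped row is set aside instead of sitting at the head of the emit path; how much memory such a buffer may use, and how it interacts with resolved timestamps, is left open here.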
Describe the solution you'd like
The table I'm most interested in making global is system.descriptor. Today, this would mean making the whole system config span global. It seems plausible to break up that whole concept one day, once we have a new zone configuration architecture.
One thing I haven't thought through is what happens to the system.lease table and its relevant protocols if the leasing transaction needs to interact with writes which carried synthetic timestamps.
Jira issue: CRDB-6547
Epic CRDB-33032