kvflowcontrol,admission: productionize replication admission control #98703
Closed
Labels: A-admission-control, C-enhancement, T-kv
Description
Is your feature request related to a problem? Please describe.
Tracking issue to productionize #95563 and roll it out into the wild (enabled by default, and made safe to opt into for production clusters):
- Merge #98308, which integrates various kvflowcontrol,admission components end-to-end, gated by cluster settings.
- Support a "flow token tracking only" mode, where we do end-to-end flow control token tracking but don't actually block at admit time when the requisite flow tokens are unavailable. This will let us observe production systems and determine whether we are losing performance isolation due to a lack of write flow control.
- Support a "flow tokens only for elastic traffic" mode, which uses flow control only for elastic traffic (index backfills, etc.).
- Backport to the 23.1 release branch, disabled entirely.
- Add randomized/integration testing to verify that we don't leak flow tokens; a leak could collapse write throughput entirely. We want to test all the interactions listed here, which include the raft transport stream breaking, nodes crashing, followers being paused/unpaused, followers caught up via snapshots or post-restart log appends, leaseholder/leadership changes, prolonged leaseholder != leader scenarios, replicas being GC-ed, command reproposals, lossy raft transport, ranges splitting/merging, log truncations, and raft membership changes.
- Add roachtest(s) to quantify the impact of index backfills with and without replication admission control, and make sure we don't regress.
- Enable "flow tokens for {regular,elastic} traffic" on 23.2 master.
- Monitor and address CI fallout for two-ish weeks on master.
- Backport any bug fixes to 23.1 (where it's disabled by default).
- Roll out the "flow tokens only for elastic traffic" mode to test-only/POC 23.1 clusters for actual clients, on CC or otherwise.
- Roll out the "flow token tracking only" mode (described above) to 23.1 CC clusters.
Jira issue: CRDB-25455
Epic CRDB-25348