admission: add integration tests #89208
Open
Labels
A-admission-control, C-enhancement, T-admission-control
Description
This is a tracking issue for roachtests we want to introduce to validate existing/new AC machinery (subsuming #85469 which I forgot existed). They'll typically codify manual experiments that have been useful in developing said machinery. These roachtests should demonstrate performance isolation (throughput, latency) in the face of:
- Snapshots (roachtests: introduce admission-control/snapshot-overload #89191 is a first pass; the general solution space is tracked in admission,kvserver: subject snapshot ingestion to admission control #80607)
- Backups (something admission,kvserver: introduce an elastic cpu limiter #86638 supposedly helps with)
- Rangefeed catch up scans can be CPU intensive and aren't accounted for in AC (roachtest: introduce admission-control/elastic-cdc #89656, integrated in kv,rangefeed: integrate catchup scans with elastic cpu #89709)
- Multiple tenants running multiple workloads (merging multitenant: re-enable admission control fairness tests #89721 to start off)
Next:
- Index backfills (admission: roachtest-ify index/column backfills impacting foreground traffic #83826; admission: investigate TPC-E online index creation problem #85641; roachtest: Index overload automation of TPC-E #90005)
- Equivalents for CREATE INDEX, DROP INDEX.
- ttl,admission: reduce performance impact of large row-level TTL jobs #98722. Large TTL job running (https://github.com/cockroachlabs/support/issues/1961, https://cockroachdb.zendesk.com/agent/tickets/15684, https://github.com/cockroachlabs/support/issues/1628, https://github.com/cockroachlabs/support/issues/2050)
- INSERT INTO newtable SELECT * FROM oldtable, where oldtable is ~1.5TB. In https://github.com/cockroachlabs/support/issues/2102 we saw the large insert cause a lot of open write intents, and intent resolution is not subject to AC. We saw this peg CPU at 100% and cause IO token exhaustion. See admission: make intent resolution subject to admission control #97108. (https://github.com/cockroachlabs/support/issues/2102, https://github.com/cockroachlabs/support/issues/2237, https://github.com/cockroachlabs/support/issues/2249, https://github.com/cockroachlabs/support/issues/2240)
- Large volume MVCC GC work (copied from kvserver: pacing/admission control for mvcc gc #82955). https://github.com/cockroachlabs/support/issues/2263.
- kvflowcontrol,admission: use flow control during raft log catchup post node-restart #98710. Nodes restarting observe a large volume of follower writes as part of raft log catchup, taking IO tokens away from leaseholder writes on the restarted node. https://github.com/cockroachlabs/support/issues/2287, https://github.com/cockroachlabs/support/issues/2304, https://github.com/cockroachlabs/support/issues/1980. Test repro: kv: Test to measure slowdown after a node restart #95161.
Later:
- Large restores which can saturate disk bandwidth entirely, starving out foreground read/write traffic
- Primary key change (https://github.com/cockroachlabs/support/issues/2318)
- Drop a large index (reproducing https://github.com/cockroachlabs/support/issues/1927). Test repro: roachtest: add admission-control/database-drop #104051.
- [DNM] tests: add qos roachtest #80112
- Large volume of follower writes (merging roachtest: add admission/follower-overload #81516, https://cockroachdb.zendesk.com/agent/tickets/16901)
- cdc,roachtest: add test with changefeeds over a large number of ranges #95236 (https://github.com/cockroachlabs/support/issues/2036)
- Investigate changefeed restarts (https://github.com/cockroachlabs/support/issues/2034)
- Workload with very high concurrency, to understand slot adjustment behavior (kvserver,cdc: high rangefeed count leads to scheduler overload #96395 (comment), looking at metrics added in admission: CPU metrics for high concurrency scenarios #96495).
- Large IMPORTs can cause IO token exhaustion + LSM inversion, and delays of 10+s in admission wait queues (https://github.com/cockroachlabs/support/issues/2156). This might be related to admission: lack of intra-tenant prioritization for IO work #95678.
- Integrate the SQL Stats Compaction job with elastic CPU control
- admission: clearrange test induces IO overload starving out foreground traffic #104862
For some of these, we'll want variants that hit CPU and IO saturation separately. We would also like a multi-workload test with varying priorities, or with work originating from different tenants (e.g. NormalPri reads/writes from one tenant and BulkNormalPri work from another). We also want library functions in roachtests to make experimentation and tests easier: #89978.
Jira issue: CRDB-20126