admission: metrics and tests for evaluating improvements #85469
Description
Many past improvements to admission control fixed an obvious gap, e.g. lack of awareness of a bottleneck resource, or incorrect accounting for how many resources are consumed.
As we evolve admission control, some improvements risk causing regressions in peak throughput or in isolation (both throughput and latency). Our current approach of running custom experiments for each PR and posting results (graphs) with reproduction steps is not sufficient going forward.
We need a set of tests that set up particular cluster configurations and run one or more workloads. Most interesting behavior in admission control requires multiple workloads of differing importance (e.g. regular SQL reads/writes and bulk work of various kinds), or different tenants. With the introduction of elastic work, even a single workload can be affected in terms of throughput, since we may leave resources under-utilized to avoid risking higher latency for regular work. We also need summary metrics for each test run, so we don't have to examine graphs: e.g. throughput, throughput variance, and latency percentiles for each workload.
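To make the "summary metrics" idea concrete, here is a minimal sketch of a per-workload summary computed from raw samples. Everything in it is an assumption for illustration: the `WorkloadSummary` type, the `summarize` and `percentile` helpers, the choice of coefficient of variation as the throughput-variance measure, and nearest-rank percentiles. It is not an existing CockroachDB or roachtest API.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class WorkloadSummary:
    """Hypothetical per-workload summary for one test run."""
    name: str
    mean_throughput: float  # ops/sec, averaged over sampling intervals
    throughput_cv: float    # stddev / mean of per-interval throughput
    latency_p50_ms: float
    latency_p99_ms: float


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list; p is in (0, 100]."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(name, ops_per_interval, interval_secs, latencies_ms):
    """Reduce raw samples for one workload to a few comparable numbers."""
    # Convert per-interval operation counts into ops/sec rates.
    rates = [n / interval_secs for n in ops_per_interval]
    mean = statistics.fmean(rates)
    # Population stddev: the intervals are the whole run, not a sample of it.
    stddev = statistics.pstdev(rates)
    cv = stddev / mean if mean > 0 else 0.0
    lat = sorted(latencies_ms)
    return WorkloadSummary(
        name=name,
        mean_throughput=mean,
        throughput_cv=cv,
        latency_p50_ms=percentile(lat, 50),
        latency_p99_ms=percentile(lat, 99),
    )
```

A test could compute one such summary per workload (say, one for regular SQL traffic and one for bulk work) and compare the numbers between a baseline build and a candidate build, instead of comparing graphs by eye.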
This is an umbrella issue that we can close after we've developed an initial set of tests that we think give reasonable coverage for the current admission control implementation.
Jira issue: CRDB-18257