admission: metrics and tests for evaluating improvements

Many past improvements to admission control fixed an obvious gap, e.g. lack of awareness of a bottleneck resource, or incorrect accounting of how many resources are consumed.
As we evolve admission control, some improvements risk causing regressions in peak throughput or in isolation (both throughput and latency). Our current approach of running custom experiments for each PR and posting results (graphs) with reproduction steps is not sufficient going forward.

We need a set of tests that set up particular cluster configurations and run one or more workloads. Most interesting behavior in admission control requires multiple workloads of differing importance (e.g. regular SQL reads/writes and bulk work of various kinds), or different tenants. With the introduction of elastic work, even a single workload can be affected in terms of throughput, since we may leave resources under-utilized to avoid risking higher latency for regular work. We also need summary metrics for each test run, so we don't have to examine graphs: e.g. throughput, throughput variance, and latency percentiles for each workload.
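
To make the "summary metrics" idea concrete, here is a minimal sketch of a per-workload summary. It is illustrative only: `WorkloadSummary`, `summarize`, and the input shapes (per-interval op counts plus raw latency samples in milliseconds) are hypothetical names and assumptions, not part of any existing test harness, and the choice of p50/p99 and nearest-rank percentiles is just one reasonable option.

```python
import math
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass(frozen=True)
class WorkloadSummary:
    """Hypothetical per-workload summary for one test run."""
    name: str
    mean_throughput: float      # ops/sec, averaged over sampling intervals
    throughput_variance: float  # population variance of per-interval ops/sec
    latency_p50_ms: float
    latency_p99_ms: float


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list; p is in (0, 100]."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(name, ops_per_interval, interval_secs, latencies_ms):
    """Reduce raw samples for one workload to a handful of numbers.

    ops_per_interval: completed operations in each sampling interval
    interval_secs:    length of each sampling interval, in seconds
    latencies_ms:     per-operation latencies, in milliseconds
    """
    rates = [n / interval_secs for n in ops_per_interval]
    lat = sorted(latencies_ms)
    return WorkloadSummary(
        name=name,
        mean_throughput=mean(rates),
        throughput_variance=pvariance(rates),
        latency_p50_ms=percentile(lat, 50),
        latency_p99_ms=percentile(lat, 99),
    )
```

A test could compute one such summary per workload (e.g. one for regular SQL traffic and one for bulk work) and compare the numbers across a baseline run and a candidate run, rather than inspecting graphs by hand.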

This is an umbrella issue that we can close after we've developed an initial set of tests that we think give reasonable coverage for the current admission control implementation.

@irfansharif @tbg 

Jira issue: CRDB-18257

admission: metrics and tests for evaluating improvements #85469
