
admission: metrics and tests for evaluating improvements #85469

@sumeerbhola

Description


Many past improvements to admission control have fixed an obvious gap, e.g. lack of awareness of a bottleneck resource, or incorrect accounting of how many resources are consumed.
As we evolve admission control further, some improvements risk causing regressions in peak throughput or in isolation (both throughput and latency). Our current approach of running custom experiments for each PR and posting results (graphs) with reproduction steps is not sufficient going forward.

We need a set of tests that set up particular cluster configurations and run one or more workloads. Most interesting behavior in admission control requires multiple workloads of differing importance (e.g. regular SQL reads/writes and bulk work of various kinds), or different tenants. With the introduction of elastic work, even a single workload can be affected in terms of throughput, since we may leave resources under-utilized to avoid risking higher latency for regular work. We also need summary metrics for each test run, so we don't have to examine graphs: e.g. throughput, throughput variance, and latency percentiles for each workload.

This is an umbrella issue that we can close after we've developed an initial set of tests that we think give reasonable coverage for the current admission control implementation.

@irfansharif @tbg

Jira issue: CRDB-18257

Labels: A-admission-control, C-enhancement
