admission: CPU metrics for high concurrency scenarios #96495

@sumeerbhola

We have encountered scenarios with a large number of goroutines, which often increases the number of runnable goroutines while the mean CPU utilization stays low (sometimes as low as 25%). Since the runnable goroutine count is non-zero, CPU utilization must be 100% at very short time scales of a few ms. Admission control (AC) samples the runnable goroutine count every 1ms precisely so that it can react at such short time scales, and in some of these cases we do see a drop in the slot count together with queueing in the AC queues. The concern raised by such queueing is whether AC is making the situation worse in its attempt to shift some queueing from the goroutine scheduler into the AC queue. Note that since admission.kv_slot_adjuster.overload_threshold is set to 32, AC also allows significant queueing in the goroutine scheduler, in an attempt to be work conserving.
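To make the mechanism above concrete, here is a minimal sketch of a slot adjuster driven by 1ms samples. It is not the actual AC implementation: the `slotAdjuster` type, its fields, and `onSample` are hypothetical names, and treating the threshold of 32 as a per-processor limit on runnable goroutines is an assumption made for illustration.

```go
package main

import "fmt"

// overloadThreshold mirrors the value quoted above for
// admission.kv_slot_adjuster.overload_threshold. Applying it per
// processor is an assumption of this sketch.
const overloadThreshold = 32

// slotAdjuster is a hypothetical, simplified stand-in for AC's slot logic.
type slotAdjuster struct {
	totalSlots int // current max concurrency allowed by AC
	minSlots   int // floor below which totalSlots is never reduced
}

// onSample is meant to be called once per 1ms sample with the runnable
// goroutine count, the number of processors, and the slots in use.
func (s *slotAdjuster) onSample(runnable, procs, usedSlots int) {
	if runnable >= overloadThreshold*procs {
		// The scheduler queue looks overloaded: shrink concurrency so
		// that queueing shifts from the scheduler into the AC queue.
		if s.totalSlots > s.minSlots {
			s.totalSlots--
		}
		return
	}
	// Not overloaded: grow only if the slot limit is what is holding
	// work back, which keeps the mechanism work conserving.
	if usedSlots >= s.totalSlots {
		s.totalSlots++
	}
}

func main() {
	s := &slotAdjuster{totalSlots: 8, minSlots: 1}
	s.onSample(300, 8, 8) // 300 >= 32*8, so one slot is removed
	fmt.Println("after overloaded sample:", s.totalSlots)
	s.onSample(10, 8, 7) // all slots in use and no overload, so one is added
	fmt.Println("after underloaded sample:", s.totalSlots)
}
```

With a high threshold, short bursts of runnable goroutines are tolerated in the scheduler, and slots shrink only when the sampled backlog is large relative to the processor count.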

We try to answer two questions:
Q1. Should such scenarios be considered unreasonable and be fixed outside AC? There are 2 cases we have seen:

Q2. Given that these scenarios are sometimes reasonable, can we add metrics that address the earlier concern about whether AC is making the situation worse?

The slot mechanism imposes a max concurrency. If that max concurrency leaves some CPU idle because enough of the admitted work is blocked (on contention or IO) while work is queued in AC, then the AC queueing is not work conserving. We can try to sample this at 1ms intervals, the way we sample numRunnableGoroutines.
If AC is indeed work conserving, then AC queueing while the CPU "seems underutilized" is not actually happening: the CPU is fully utilized whenever there is queueing in AC.
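One possible shape for such a metric is sketched below. It counts, among the 1ms samples that saw AC queueing, how many also saw idle CPU. All names here (`sample`, `workConservation`, `record`, `fraction`) are hypothetical, and the sketch assumes the number of idle processors can be observed alongside the runnable goroutine count at sample time.

```go
package main

import "fmt"

// sample is one hypothetical 1ms observation.
type sample struct {
	runnable   int // runnable goroutine count at sample time
	idleProcs  int // processors with nothing to run (assumed observable)
	queuedInAC int // requests waiting in the AC queues
}

// workConservation accumulates counters that could back two metrics.
type workConservation struct {
	queuedSamples        int // samples taken while AC had queued work
	nonConservingSamples int // of those, samples that also saw idle CPU
}

// record classifies a single sample.
func (w *workConservation) record(s sample) {
	if s.queuedInAC == 0 {
		return // no AC queueing, so nothing to judge
	}
	w.queuedSamples++
	// AC is holding work back while a processor is idle and no
	// goroutine is waiting to run: this sample is not work conserving.
	if s.idleProcs > 0 && s.runnable == 0 {
		w.nonConservingSamples++
	}
}

// fraction returns the share of queued samples that were not work conserving.
func (w *workConservation) fraction() float64 {
	if w.queuedSamples == 0 {
		return 0
	}
	return float64(w.nonConservingSamples) / float64(w.queuedSamples)
}

func main() {
	var w workConservation
	samples := []sample{
		{runnable: 0, idleProcs: 2, queuedInAC: 5},  // queued in AC with idle CPU
		{runnable: 12, idleProcs: 0, queuedInAC: 5}, // queued in AC, CPU busy
		{runnable: 0, idleProcs: 3, queuedInAC: 0},  // idle CPU, nothing queued
		{runnable: 40, idleProcs: 0, queuedInAC: 9}, // queued in AC, CPU busy
	}
	for _, s := range samples {
		w.record(s)
	}
	fmt.Printf("non-work-conserving samples: %d of %d queued samples\n",
		w.nonConservingSamples, w.queuedSamples)
}
```

A fraction near zero would support the claim that the CPU is fully utilized whenever AC is queueing, while a persistently high fraction would indicate that the slot limit is leaving CPU idle.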

Jira issue: CRDB-24153

Labels: A-admission-control, C-enhancement
