admission: token bucket in kvStoreTokenGranter should be replenished every 1ms #91509
Description
kvStoreTokenGranter keeps track of tokens corresponding to "L0 bandwidth" (a single value computed by ioLoadListener based on flush bandwidth into L0 and compaction bandwidth out of L0), and disk bandwidth (also computed by ioLoadListener and used for elastic work). The ioLoadListener computes tokens for a 15s interval and then doles them out to kvStoreTokenGranter at 250ms intervals. This 250ms interval is problematic in that it can result in high latency and prevents latency isolation (it does not prevent throughput isolation).
As a simple example, consider a scenario where each request needs a 1 byte token, and 1000 tokens are added every 250ms. There is a uniform arrival rate of 2000 high priority requests/s, i.e., 500 requests uniformly distributed over each 250ms interval, and a uniform arrival rate of 10,000 low priority requests/s, i.e., 2500 requests uniformly distributed over each 250ms interval. There are more than enough tokens to fully satisfy the high priority requests (they use only 50% of the tokens), but not enough for the low priority requests. Ignore the fact that the latter will result in indefinite queue growth in the admission control WorkQueue. At a particular 250ms tick, the token bucket will go from 0 tokens to 1000 tokens. Any queued high priority requests will be immediately granted their tokens, until there are no queued high priority requests. Then, since there is always a large number of low priority requests waiting, they will be granted until 0 tokens remain. Now we have a 250ms duration until the next replenishment and 0 tokens, so any high priority requests arriving will have to wait. The maximum wait time is 250ms.
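This arithmetic can be reproduced with a toy discrete-time simulation (a sketch using made-up structure and the example's parameters, not CockroachDB code): 1ms steps, a 1000-token refill every 250ms, and strict priority granting. In steady state, a high priority request arriving just after the bucket empties waits essentially the full 250ms interval.

```go
package main

import "fmt"

// simulate is a toy discrete-time model (1ms steps) of the scenario above:
// 1000 tokens refilled every 250ms, 2 high-pri and 10 low-pri requests
// arriving per ms, each needing 1 token, strict priority granting. It
// returns the maximum observed high priority wait in ms.
func simulate(simMs int) int {
	const (
		tickIntervalMs = 250  // replenishment interval in ms
		tokensPerTick  = 1000 // tokens added (and clamped to) per tick
		highPerMs      = 2    // 2000 high priority requests/s
		lowPerMs       = 10   // 10000 low priority requests/s
	)
	tokens := 0
	var highQ, lowQ []int // arrival times (ms) of queued requests
	maxHighWait := 0
	for t := 0; t < simMs; t++ {
		if t%tickIntervalMs == 0 {
			tokens = tokensPerTick // refill; unused tokens are clamped away
		}
		for i := 0; i < highPerMs; i++ {
			highQ = append(highQ, t)
		}
		for i := 0; i < lowPerMs; i++ {
			lowQ = append(lowQ, t)
		}
		// Strict priority: grant high-pri first, then low-pri, until the
		// bucket is empty.
		for tokens > 0 && len(highQ) > 0 {
			if w := t - highQ[0]; w > maxHighWait {
				maxHighWait = w
			}
			highQ, tokens = highQ[1:], tokens-1
		}
		for tokens > 0 && len(lowQ) > 0 {
			lowQ, tokens = lowQ[1:], tokens-1
		}
	}
	return maxHighWait
}

func main() {
	fmt.Println("max high-pri wait (ms):", simulate(10000))
	// prints: max high-pri wait (ms): 249
}
```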
If replenishment were running at 1ms intervals, the maximum wait time would be 1ms, which is probably good enough for latency isolation even for transactions with many statements and BatchRequests (each of which could see that 1ms latency increase). There are two things to keep in mind when making this change:
- We cannot run 1ms ticks for unloaded systems: we have tried this before for goroutine scheduler runnable monitoring -- see the comment below. A simple solution in our case would be to run at the usual 250ms interval when there are unlimited tokens, and at 1ms otherwise.
cockroach/pkg/util/goschedstats/runnable.go, lines 57 to 68 in 1b4aa43:

```go
// We sample the number of runnable goroutines once per samplePeriodShort or
// samplePeriodLong (if the system is underloaded). Using samplePeriodLong can
// cause sluggish response to a load spike, from the perspective of
// RunnableCountCallback implementers (admission control), so it is not ideal.
// We support this behavior only because we have observed 5-10% of cpu
// utilization on CockroachDB nodes that are doing no other work, even though
// 1ms polling (samplePeriodShort) is extremely cheap. The cause may be a poor
// interaction with processor idle state
// https://github.com/golang/go/issues/30740#issuecomment-471634471. See
// #66881.
const samplePeriodShort = time.Millisecond
const samplePeriodLong = 250 * time.Millisecond
```

- The replenishment logic clamps the tokens at the increment value: see the snippets below.
cockroach/pkg/util/admission/granter.go, lines 546 to 549 in 1b4aa43:

```go
if sg.availableIOTokens > tokens {
	// Clamp to tokens.
	sg.availableIOTokens = tokens
}
```

cockroach/pkg/util/admission/granter.go, lines 558 to 560 in 1b4aa43:

```go
if sg.elasticDiskBWTokensAvailable > tokens {
	sg.elasticDiskBWTokensAvailable = tokens
}
```

This clamping avoids accumulating unused tokens, which would allow a huge burst later (we do not want that). There are two risks in keeping this clamping with token replenishment at 1ms intervals:
- The tokens given for a 1ms interval may not be enough to admit even a single request: this is not an issue in our implementation, since the granter will hand out tokens as long as there are > 0 tokens, and will let the token count go negative.
- Wasted tokens with bursty workloads: if the traffic is bursty at time scales slightly larger than 1ms, say 10ms of no traffic followed by a 1ms burst, then the tokens added during the 10ms of no traffic will be wasted because of the clamping, and we will admit less work. We should simply use the previous replenishment interval as the burst multiplier. That is, when adding t tokens at a 1ms interval, allow the total tokens to accumulate up to 250*t.
Jira issue: CRDB-21298
Epic: CRDB-25469