admission: ioLoadListener compaction token calculation is too abrupt #91519
Description
ioLoadListener calculates multiple types of tokens, one of which is based on compaction bandwidth out of L0. Compaction bandwidth capacity out of L0 is hard to predict.
- Pebble may not be using all the compaction concurrency available. And if Pebble were to use all the compaction concurrency (which itself may be variable in the future), it is hard to know how much more will be given to L0, since there is sophisticated scoring happening in the level compaction decision making. Note, this is unlike flushes where we do have a dedicated concurrency of 1 and do make predictions based on idle time.
- Related to the scoring, the allocation of compaction capacity to L0 can vary.
For these reasons we have used a measurement based approach with exponential smoothing, where the measurements are taken only when we know there is some backlog, so all compactions ought to be running. At a high level I think we can continue with this approach. The problem is that we have abrupt behavior:
Above an unhealthy threshold (actually a predicate defined by a disjunction: sublevel-count > L or file-count > F), we use the measured compaction bandwidth C to allocate C/2 tokens. Below the unhealthy threshold, the token count is infinity.
This results in bursty admission behavior: we go over the threshold, restrict tokens for a few intervals (each interval is 15s long), then drop below the threshold and have unlimited tokens, admitting everything, which puts us back above the threshold. It is typical to see something like 2-3 intervals above the threshold and then 1 interval below. This is bad, but the badness is somewhat limited because (a) the admitted requests have to evaluate, which steals time away from the admitting logic, and (b) our typical workloads don't have huge concurrency, so the waiting requests are limited by this concurrency.
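The measurement-based approach above can be sketched as a simple exponential smoothing update that only incorporates a measurement when there is a known backlog (so all compactions ought to be running). This is a minimal illustration; the function name and the smoothing constant are assumptions for the sketch, not the actual ioLoadListener code.

```go
package main

import "fmt"

// alpha is an illustrative smoothing constant; the actual value used by
// ioLoadListener may differ.
const alpha = 0.5

// smoothBandwidth folds a new compaction bandwidth measurement into the
// smoothed estimate. When there is no backlog, the measurement is not a
// reliable signal of compaction capacity, so the old estimate is kept.
func smoothBandwidth(smoothed, measured float64, backlogged bool) float64 {
	if !backlogged {
		return smoothed
	}
	return alpha*measured + (1-alpha)*smoothed
}

func main() {
	c := 100.0
	c = smoothBandwidth(c, 60, true)  // backlogged: incorporate measurement
	c = smoothBandwidth(c, 0, false)  // idle: measurement ignored
	fmt.Println(c)                    // 80
}
```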
With replication admission control we will make this worse by doing logical admission of all the waiting requests when we switch from above the threshold to below, causing another big fan-in burst (https://docs.google.com/document/d/1iCfSlTO0P6nvoGC6sLGB5YqMOcO047CMpREgG_NSLCw/edit#heading=h.sw7pci2vwkk3).
Instead we should switch to a piece-wise linear function for defining the tokens. Define a sub-level count threshold L and a file-count threshold F that we would like to be roughly stable at under overload, say L=10 and F=500. These are half the current defaults of 20 and 1000 since (a) the current thresholds are higher than what we would like to sustain, and (b) we will keep the current C/2 logic at 2L and 2F. Regardless, L and F are configurable.
Then we define a score = max(sublevel-count/L, file-count/F). The compaction token function is:
- score < 1: unlimited
- score in [1, 2): tokens = -C/2 × score + 3C/2. This means C tokens when score=1, decreasing linearly to C/2 tokens when score=2.
- score >= 2: tokens = C/2
Jira issue: CRDB-21299
Epic: CRDB-25469