admission: byte tokens for store admission #79092

@sumeerbhola

Description

Admission control is not aware of the number of bytes that will be written by each work item. It does compute byte tokens based on the bytes being written to L0 and compacted out of L0, but converts those into a token per work item using the mean bytes written per work item. Such estimates are fine for small writes, but proper accounting for larger writes (like AddSSTable, and in the future range snapshot application) is preferable, since it avoids spikes of over-admission that are later compensated by under-admission.
We should also properly account for how many of the ingested bytes were added to L0.

We plan to:

  • Enhance admission control logic to compute byte tokens for store writes. Requests that provide their byte size (which should be all large requests) will consume these tokens directly. Estimates will be used for two purposes: (a) for requests that don't provide their byte size, the estimate decides how many tokens to consume; (b) computing the fraction of an ingest request that will end up in L0 (to adjust the token consumption for an ingest request). As with the token computation itself, these estimates are continually adjusted based on observed stats, at every 15s interval.
  • After the admitted work is done, each ingest request will also provide information on how many bytes were added to L0 (this will need a small Pebble change: make DB.Ingest return bytes ingested into L0, pebble#1600), so that the token consumption can be corrected and we have data for future estimates.

#75120 contains a WIP PR with this change.

Deficiencies:
The overload threshold at which we want to start constraining writes could differ between user-facing and background operations (like index backfills). By sharing a single queue and a single source of tokens for that queue, we also share the same overload thresholds. This is probably not an immediate problem, since rocksdb.ingest_backpressure.l0_file_count_threshold and admission.l0_sub_level_count_overload_threshold both default to 20. There is a way to address this in the future via a hierarchical token bucket scheme: admission.ioLoadListener would produce high_overload_tokens and low_overload_tokens, where background operations have to consume both, while foreground operations only use the former. (We are trying to do something similar with CPU slots for background activities like concurrent compression threads in Pebble compactions.)

This issue is split off from the broader #75066.

Jira issue: CRDB-14556

Epic CRDB-14607

Labels

  • A-admission-control
  • A-storage: Relating to our storage engine (Pebble) on-disk storage.
  • C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
  • T-storage: Storage Team
