
admission: clearrange test induces IO overload starving out foreground traffic #104862

@irfansharif

Description

Is your feature request related to a problem? Please describe.

I see the following when running:

roachtest run 'clearrange/checks=true/rangeTs=false' --cockroach ~/cockroach-fixed --cluster irfansharif-clearrange-flowcontrol --debug-always --port 8081
[image]

This shouldn't be as impactful as it is. A few notes from internal discussions:
  • [Oleg] When we drop tables/indices, a range tombstone should be written over the full range of keys. This also adds a GC hint to the range state indicating that the range is no more. Subsequently, when the GC time is reached and GC hits the first range with such a hint, it tries to find all ranges with the hint and enqueues them with high priority, meaning an alternative pacing delay applies to them. When doing GC, if a range has the hint and all of its data is covered by the range tombstone, GC issues a Pebble clear range op to remove all the data. The reason for the increased pace (smaller delay) for hinted ranges is to reduce compaction costs. Tests showed that with the usual pacing, GC of deleted tables could take a long time while Pebble compacts constantly because of the cleared ranges. If we issue clear range requests faster, the compaction rate per second stays the same, but the time spent running compactions is significantly reduced.
  • [Sumeer] It's possible that compactions in Pebble are prioritizing range dels, so L0 is not being compacted and we don't have L0 tokens to grant to AC work queues.
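To make the flow in Oleg's note easier to follow, here is a minimal Go sketch of the decision it describes: hinted ranges are enqueued first with a shorter pacing delay, and a clear range op is issued only when the range tombstone covers all of the range's data. Everything here is hypothetical and for illustration only: `rangeState`, `gcAction`, `planGC`, and the two delay values are assumptions and do not correspond to CockroachDB's actual types, queue implementation, or settings.

```go
package main

import (
	"fmt"
	"time"
)

// rangeState is a hypothetical stand-in for per-range state.
type rangeState struct {
	id           int
	hasGCHint    bool // range was deleted via a range tombstone over all keys
	fullyCovered bool // all data in the range lies under that tombstone
}

// Illustrative pacing values only; the issue does not state the real delays.
const (
	normalDelay = 1 * time.Second
	hintedDelay = 100 * time.Millisecond
)

// gcAction is a hypothetical description of what GC would do for one range.
type gcAction struct {
	rangeID    int
	clearRange bool          // issue a clear range op instead of per-key GC
	delay      time.Duration // pacing delay applied before processing
}

// planGC orders hinted ranges ahead of the rest and gives them the shorter
// delay. A clear range op is planned only for hinted, fully covered ranges.
func planGC(ranges []rangeState) []gcAction {
	var hinted, rest []gcAction
	for _, r := range ranges {
		if r.hasGCHint {
			hinted = append(hinted, gcAction{
				rangeID:    r.id,
				clearRange: r.fullyCovered,
				delay:      hintedDelay,
			})
			continue
		}
		rest = append(rest, gcAction{rangeID: r.id, delay: normalDelay})
	}
	return append(hinted, rest...)
}

// totalPacing sums the pacing delays, a rough proxy for how long the GC pass
// (and the compactions it triggers) stays active.
func totalPacing(actions []gcAction) time.Duration {
	var total time.Duration
	for _, a := range actions {
		total += a.delay
	}
	return total
}

func main() {
	ranges := []rangeState{
		{id: 1},
		{id: 2, hasGCHint: true, fullyCovered: true},
		{id: 3, hasGCHint: true, fullyCovered: false},
	}
	actions := planGC(ranges)
	for _, a := range actions {
		fmt.Printf("r%d clearRange=%t delay=%s\n", a.rangeID, a.clearRange, a.delay)
	}
	fmt.Println("total pacing:", totalPacing(actions))
}
```

The sketch models only the ordering and pacing. It does not model Pebble compactions or admission control tokens, so it cannot reproduce the L0 starvation that Sumeer's note hypothesizes.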

Jira issue: CRDB-28757

Metadata

Labels

A-admission-control
A-storage: Relating to our storage engine (Pebble) on-disk storage.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-storage: Storage Team
