storage/engine: surface compaction health in admin UI

TL;DR: hook up time-series recording of the L0 file count and `estimated_pending_compaction_bytes`.

While debugging bulk-ingestion slowdowns, we eventually figured out that many of our ingestions were being backpressured due to the L0 file count. Beyond that, on the node where L0 was filling up, files were going to L0 because compaction was unable to keep up and actually empty the lower levels.

One thing that might have made this easier to diagnose would have been time-series data revealing the climbing compaction debt (`estimated_pending_compaction_bytes`) on the node that eventually became the problem, as well as the L0 file count that actually caused the backpressure. Being able to see these alongside the other metrics might have helped us zero in on the cause sooner.

Interval compaction bytes would also be nice.

storage/engine: surface compaction health in admin UI #41265

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

storage/engine: surface compaction health in admin UI #41265

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions