storage/engine: surface compaction health in admin UI #41265

@dt

TL;DR: hook up time series recording of L0 file count and estimated_pending_compaction_bytes.

While debugging bulk-ingestion slowdowns, we eventually figured out that many of our ingestions were being back-pressured due to L0 file count. Beyond that, on the node where L0 was filling up, files were going to L0 because compaction was unable to keep up and actually empty the lower levels.

One thing that might have made it easier to diagnose would have been time-series data revealing the climbing compaction debt (estimated_pending_compaction_bytes) on the node that eventually became the problem, as well as the L0 file count that actually caused the back pressure. Being able to see these correlated with the other metrics might have helped us zero in on it sooner.
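To make the request concrete, here is a rough sketch of what recording these two values as a time series could look like. Everything named here (`CompactionSample`, `Series`, `DebtClimbing`) is hypothetical and only for illustration; it is not the existing metrics code, and the real change would presumably plug into whatever recorder the admin UI already reads from.

```go
package main

import (
	"fmt"
	"time"
)

// CompactionSample is a hypothetical snapshot of the two values this issue
// asks to record for a node.
type CompactionSample struct {
	At                     time.Time
	L0FileCount            int64
	PendingCompactionBytes int64
}

// Series is a fixed-capacity in-memory time series. It is a stand-in for the
// real metrics store, used only to illustrate the idea.
type Series struct {
	max     int
	samples []CompactionSample
}

// NewSeries returns a series that keeps at most max samples (at least 1).
func NewSeries(max int) *Series {
	if max < 1 {
		max = 1
	}
	return &Series{max: max}
}

// Record appends a sample, dropping the oldest one when the series is full.
func (s *Series) Record(c CompactionSample) {
	if len(s.samples) == s.max {
		copy(s.samples, s.samples[1:])
		s.samples = s.samples[:s.max-1]
	}
	s.samples = append(s.samples, c)
}

// DebtClimbing reports whether pending compaction bytes strictly increased
// across each of the last n samples.
func (s *Series) DebtClimbing(n int) bool {
	if n < 2 || len(s.samples) < n {
		return false
	}
	recent := s.samples[len(s.samples)-n:]
	for i := 1; i < len(recent); i++ {
		if recent[i].PendingCompactionBytes <= recent[i-1].PendingCompactionBytes {
			return false
		}
	}
	return true
}

func main() {
	s := NewSeries(4)
	start := time.Unix(0, 0)
	// Made-up values: compaction debt growing while L0 fills up.
	debts := []int64{1 << 20, 8 << 20, 64 << 20}
	for i, d := range debts {
		s.Record(CompactionSample{
			At:                     start.Add(time.Duration(i) * 10 * time.Second),
			L0FileCount:            int64(2 + i*5),
			PendingCompactionBytes: d,
		})
	}
	fmt.Println("compaction debt climbing:", s.DebtClimbing(3))
}
```

With per-node series like this plotted next to the existing metrics, the node whose debt keeps climbing should stand out well before L0 file count starts triggering back pressure.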

Interval compaction bytes would also be nice.
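One possible way to get an interval figure, assuming the engine exposes a cumulative compacted-bytes counter (an assumption, not something confirmed here), is to take the difference between consecutive samples. `IntervalBytes` below is a hypothetical helper sketching that:

```go
package main

import "fmt"

// IntervalBytes returns the bytes compacted between two readings of a
// cumulative counter. If the counter went backwards (for example after a
// restart), the current reading is treated as the whole interval.
func IntervalBytes(prev, cur uint64) uint64 {
	if cur < prev {
		return cur
	}
	return cur - prev
}

func main() {
	// Made-up cumulative readings taken at a fixed sampling period.
	readings := []uint64{0, 512, 2048, 2048, 300}
	for i := 1; i < len(readings); i++ {
		fmt.Printf("interval %d: %d bytes\n", i, IntervalBytes(readings[i-1], readings[i]))
	}
}
```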
