storage/engine: surface compaction health in admin UI #41265
Description
TL;DR: hook up time series recording of L0 file count and estimated_pending_compaction_bytes.
While debugging bulk-ingestion slowdowns, we eventually figured out that many of our ingestions were being back-pressured due to the L0 file count. Beyond that, on the node where L0 was filling up, files were going to L0 because compaction was unable to keep up and actually empty the lower levels.
One thing that might have made it easier to diagnose would have been time-series data revealing the climbing compaction debt (estimated_pending_compaction_bytes) on the node that eventually became the problem, as well as the L0 file count that actually caused the back pressure. Being able to see these correlated with the other metrics might have helped us zero in on it sooner.
Interval compaction bytes would also be nice.
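As a rough illustration of the request, here is a minimal, self-contained Go sketch of turning periodic engine snapshots into time-series samples. Everything in it is hypothetical: `CompactionStats`, `Sample`, `Recorder`, and the `CompactedBytesTotal` counter are names made up for this example, not existing CockroachDB or RocksDB APIs. It assumes the engine can report the L0 file count, estimated_pending_compaction_bytes, and a cumulative count of bytes compacted, and it derives interval compaction bytes as the difference between consecutive snapshots.

```go
package main

import "fmt"

// CompactionStats is a hypothetical point-in-time snapshot of the storage
// engine's compaction state. The field names are assumptions for this sketch.
type CompactionStats struct {
	L0FileCount            int64 // number of files currently in L0
	PendingCompactionBytes int64 // estimated_pending_compaction_bytes
	CompactedBytesTotal    int64 // cumulative bytes compacted so far (assumed to exist)
}

// Sample is one time-series point that would be exported per node.
type Sample struct {
	L0FileCount             int64
	PendingCompactionBytes  int64
	IntervalCompactionBytes int64 // bytes compacted since the previous sample
}

// Recorder converts successive snapshots into samples, remembering the last
// cumulative counter so it can compute a per-interval delta.
type Recorder struct {
	prevTotal int64
	havePrev  bool
	Samples   []Sample
}

// Record stores and returns a sample for the given snapshot. The first
// snapshot, or one whose counter went backwards (for example after a
// restart), yields an interval of zero rather than a bogus delta.
func (r *Recorder) Record(s CompactionStats) Sample {
	var interval int64
	if r.havePrev && s.CompactedBytesTotal >= r.prevTotal {
		interval = s.CompactedBytesTotal - r.prevTotal
	}
	r.prevTotal = s.CompactedBytesTotal
	r.havePrev = true

	sample := Sample{
		L0FileCount:             s.L0FileCount,
		PendingCompactionBytes:  s.PendingCompactionBytes,
		IntervalCompactionBytes: interval,
	}
	r.Samples = append(r.Samples, sample)
	return sample
}

func main() {
	r := &Recorder{}
	// Made-up snapshots showing compaction debt and L0 growing together.
	snapshots := []CompactionStats{
		{L0FileCount: 4, PendingCompactionBytes: 1 << 20, CompactedBytesTotal: 10 << 20},
		{L0FileCount: 12, PendingCompactionBytes: 64 << 20, CompactedBytesTotal: 12 << 20},
		{L0FileCount: 25, PendingCompactionBytes: 512 << 20, CompactedBytesTotal: 13 << 20},
	}
	for _, s := range snapshots {
		p := r.Record(s)
		fmt.Printf("l0_files=%d pending_bytes=%d interval_compaction_bytes=%d\n",
			p.L0FileCount, p.PendingCompactionBytes, p.IntervalCompactionBytes)
	}
}
```

Plotting these three series next to the existing per-node metrics would make it visible when pending bytes climb while interval compaction bytes stay flat, which is the pattern described above.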