kv,storage,ui: add better Compaction metrics #46389
Description
...and possibly delete the Compactor Queue as it exists today.
tl;dr: The Compaction Queue chart we expose through our UI is not a very useful chart to be looking at, and we could do better.
The Compactor is the mechanism we have in place today that allows us to suggest compactions, on demand, to the underlying storage engine. We typically make use of this when we know we are generating a lot of garbage (for instance, when a store accepts a bunch of new replicas that overlap with existing ones during a decommissioning process). The Compactor periodically goes through the suggestions it has received and instructs RocksDB/Pebble to compact data on disk, as appropriate. Note that this is not strictly necessary: RocksDB/Pebble will carry out compactions over time as needed. The Compactor exists to proactively reclaim space when possible.
The Compaction Queue graph, perhaps confusingly, records the view of the world as seen by the Compactor, not as seen by RocksDB/Pebble. A "suggestion" to the Compactor is recorded in queued bytes (as seen in the UI at the time of writing). Only when the Compactor oversees the processing of what was suggested to it does it decrement the queued bytes metric. It does not periodically poll the underlying storage engine to reflect what RocksDB/Pebble thinks this value should be (say, "estimated reclaimable space"); it only records the state of the suggestions received thus far. This does not seem to be a useful metric to be tracking. The metric is also only updated on demand, when the Compactor receives new compaction suggestions, and it does not react to changes in the cluster settings pertaining to the Compactor (compactor.{max_record_age,threshold_{bytes,{available,used}_fraction}}).
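To make the staleness concrete, here is a minimal toy model of the behavior described above. It is not the actual Compactor code: the `suggestion` and `compactor` types, the `suggest` and `process` methods, and the single byte threshold are all hypothetical names and simplifications introduced for illustration. The point is only that the gauge moves when a suggestion arrives or is acted upon, and never consults the storage engine.

```go
package main

import "fmt"

// suggestion is a hypothetical stand-in for a compaction suggestion.
type suggestion struct {
	bytes int64
}

// compactor is a toy model (an assumption, not the real implementation).
// queuedBytes plays the role of the value shown in the Compaction Queue chart.
type compactor struct {
	pending     []suggestion
	queuedBytes int64
}

// suggest records a new suggestion and bumps the gauge.
func (c *compactor) suggest(s suggestion) {
	c.pending = append(c.pending, s)
	c.queuedBytes += s.bytes
}

// process acts on suggestions of at least threshold bytes and decrements
// the gauge only for those. Smaller suggestions stay queued, and so do
// their bytes, regardless of what the storage engine has compacted on
// its own in the meantime.
func (c *compactor) process(threshold int64) {
	kept := c.pending[:0]
	for _, s := range c.pending {
		if s.bytes >= threshold {
			c.queuedBytes -= s.bytes
			continue
		}
		kept = append(kept, s)
	}
	c.pending = kept
}

func main() {
	c := &compactor{}
	c.suggest(suggestion{bytes: 10})
	c.suggest(suggestion{bytes: 5})
	c.suggest(suggestion{bytes: 100})

	c.process(64)

	// Only the 100-byte suggestion was actionable; the gauge still
	// reports the two small ones.
	fmt.Println(c.queuedBytes) // prints 15
}
```

In this model, nothing short of a new suggestion or a successful processing pass changes `queuedBytes`, which is how a chart like this can appear "wedged" while the engine itself is in good shape.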
In https://github.com/cockroachlabs/support/issues/385 we observed a supposedly "wedged" compaction graph which was in fact simply out of date, and not updating itself as it hadn't received any compaction suggestions for some time. Because all the suggested compactions were fragmented/small, and thus not actionable, the graph persistently displayed a high queued bytes amount.
For the reasons above, I think what we want is closer to #41265 and #43965, possibly exposing rocksdb.estimated-pending-compaction as a first class UI citizen instead (and/or the Pebble equivalent). The Compaction Queue graph, as it stands today, offers no visibility into anything we would be interested in (and is also usually out of date).
As for the removal of the Compactor Queue in its entirety, I think it was introduced as an attempt to reclaim garbage on demand/control RocksDB compaction behavior, but I'm not sure if (a) we need such a thing, and (b) it's effective at doing said thing. Seems to me if we have problems around garbage reclamation, we should be addressing them at the storage layer, not at KV.
We currently persist received suggestions if we're unable to act on them immediately, in the hope that future suggestions over larger intervals can be merged with them. I'm unsure if this happens often, or if it does, when. Suggestions are also deleted after 24 hours (coming back to (b), the effectiveness of it all).
Jira issue: CRDB-5091