storage: benchmark and optimize ComputeStatsForRange #84544

@erikgrinaker

As with other reads, ComputeStatsForRange now has to take MVCC range tombstones into account as well, which comes with a performance penalty. Many of the optimizations in #82559 and #83049 will also apply here, but we should see if there are additional optimizations we can make.

One optimization that comes to mind is to construct the iterator internally, with appropriate bounds and options. Setting precise bounds would avoid a key comparison as described in #41899, yielding a 30% improvement. However, we sometimes need to compute stats for SSTs (using e.g. NewPebbleMemSSTIterator), which might preclude constructing the iterator internally.

We should also get rid of Engine.ComputeStats() while we're at it, and migrate all callers to use ComputeStatsForRange. However, one benefit of Engine.ComputeStats() is that it respects spanset assertions. We should instead consider forcing MVCC iterators to specify both a lower and upper bound, and do spanset assertions on NewMVCCIterator instead.

This needs to be benchmarked against release-22.1.

Jira issue: CRDB-17722

Epic CRDB-2624

Labels

A-storage: Relating to our storage engine (Pebble) on-disk storage.
C-performance: Perf of queries or internals. Solution not expected to change functional behavior.
T-storage: Storage Team
