db: support range keys in delete-only compactions#1750
db: support range keys in delete-only compactions#1750nicktrav merged 2 commits intocockroachdb:masterfrom
Conversation
d2f8d24 to
5cce5f3
Compare
jbowens
left a comment
There was a problem hiding this comment.
looking good, thanks for doing this
Reviewed 1 of 6 files at r1, 8 of 8 files at r3, 4 of 5 files at r4, all commit messages.
Reviewable status: 8 of 9 files reviewed, 6 unresolved discussions (waiting on @itsbilal, @jbowens, and @nicktrav)
table_stats.go line 313 at r3 (raw file):
defer iter.Close() var compactionHints []deleteCompactionHint // We iterate over the defragmented range tombstones, which ensures
nit: s/defragmented range tombstones/defragmented range tombstones and range key deletions/
table_stats.go line 338 at r3 (raw file):
return nil, err } stats.RangeDeletionsBytesEstimate += estimate
here, and above in the level == numLevels - 1 conditional, I think we only want to populate RangeDeletionsBytesEstimate if s contains any keys of kind RANGEDEL.
RangeDeletionsBytesEstimate is the estimate of bytes within [start, end) in point key blocks, or if tables are wholly contained, the entirety of the sstable. We want to avoid totaling this if there's only a RANGEKEYDEL over the span.
We should also leave a comment that if s only contains RANGEDEL and the span wholly contains a table containing range keys, this estimate will be off, mistakenly including the size of the range key block. That's expected to be okay, because range keys are expected to be rare and not contribute significantly to sstable size.
table_stats.go line 584 at r3 (raw file):
// corresponding to the largest and smallest sequence numbers encountered across // the range deletes and range keys deletes that comprised the merged spans. type tableRangedDeletionIter struct {
nit: do we need this type now? or could we return the merging iter as an opaque FragmentIterator?
table_stats.go line 589 at r3 (raw file):
// newTableRangeDeletionIter returns a new tableRangedDeletionIter. func newTableRangeDeletionIter(
nit: can we call this something that makes it a little clear that it's range deletions and range key deletions? newTableRangedDeletionIter is better, maybe newCombinedDeletionKeyspanIter or something is even better? The term 'range deletion' is just so ubiquitous referring specifically to RANGEDEL that anything close to it is likely to be confused.
table_stats.go line 674 at r3 (raw file):
// The separate iters for the range dels and range keys are wrapped in a // merging iter to join the keyspaces into a single keyspace. mIter := &keyspan.MergingIter{}
nit: now that we have MergingIter.AddLevel, we can avoid the allocations of appending to the []FragmentIterator slice by Init-ing the MergingIter higher up, and directly calling AddLevel when we find a non-nil rangedel or range key iterator.
nicktrav
left a comment
There was a problem hiding this comment.
Reviewable status: 5 of 9 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)
table_stats.go line 313 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: s/defragmented range tombstones/defragmented range tombstones and range key deletions/
Done.
table_stats.go line 338 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
here, and above in the
level == numLevels - 1conditional, I think we only want to populateRangeDeletionsBytesEstimateifscontains any keys of kindRANGEDEL.
RangeDeletionsBytesEstimateis the estimate of bytes within[start, end)in point key blocks, or if tables are wholly contained, the entirety of the sstable. We want to avoid totaling this if there's only aRANGEKEYDELover the span.We should also leave a comment that if
sonly containsRANGEDELand the span wholly contains a table containing range keys, this estimate will be off, mistakenly including the size of the range key block. That's expected to be okay, because range keys are expected to be rare and not contribute significantly to sstable size.
Turns out there was a bug here that wasn't calculating stats for tables with just range keys. Fixed and added an additional test case.
table_stats.go line 584 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: do we need this type now? or could we return the merging iter as an opaque
FragmentIterator?
Good call. The new function now returns the interface.
table_stats.go line 589 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: can we call this something that makes it a little clear that it's range deletions and range key deletions?
newTableRangedDeletionIteris better, maybenewCombinedDeletionKeyspanIteror something is even better? The term 'range deletion' is just so ubiquitous referring specifically toRANGEDELthat anything close to it is likely to be confused.
I think I'd just typo'd originally - intention was for newTableRangedDeletionIter. That highlights the problem, so let's go with newCombinedDeletionKeyspanIter.
table_stats.go line 674 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: now that we have
MergingIter.AddLevel, we can avoid the allocations of appending to the[]FragmentIteratorslice byInit-ing theMergingIterhigher up, and directly callingAddLevelwhen we find a non-nil rangedel or range key iterator.
Nice! Updated.
itsbilal
left a comment
There was a problem hiding this comment.
Reviewable status: 5 of 9 files reviewed, 4 unresolved discussions (waiting on @jbowens and @nicktrav)
table_stats.go line 313 at r3 (raw file):
defer iter.Close() var compactionHints []deleteCompactionHint // We iterate over the defragmented range tombstones, which ensures
This would be range key deletions too, right? Not just range tombstones?
testdata/table_stats line 264 at r3 (raw file):
# Ingest a table with range keys. # TODO(travers): Once range keys in batches are flushed to tables, this testcase
This should be the case now, right?
testdata/table_stats_deletion_iter line 57 at r3 (raw file):
build a-z:{(#0,RANGEKEYDEL) (#0,RANGEKEYSET,@1,foo) (#0,RANGEKEYUNSET,@2)}
Maybe we should have a case where the RANGEKEYDEL has a lower sequence number than the others to confirm this is the Filter working and not the deletion shadowing the others?
4c656db to
45aeb2c
Compare
nicktrav
left a comment
There was a problem hiding this comment.
Rebased on master to pick up the recent compaction changes, and pushed a few more commits addressing feedback. I'm also doing a burn of the metamorphic tests.
I'll wait for one more LGTM before squashing, landing, and putting this one to bed ... finally.
Reviewable status: 1 of 9 files reviewed, 4 unresolved discussions (waiting on @itsbilal and @jbowens)
table_stats.go line 313 at r3 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
This would be range key deletions too, right? Not just range tombstones?
Updated.
testdata/table_stats line 264 at r3 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
This should be the case now, right?
Yeah - though I have a bunch of other test cases like this (that use the ingestion hack) that I'd like to clean up. I might address these in a follow-up, to avoid the extra noise.
testdata/table_stats_deletion_iter line 57 at r3 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Maybe we should have a case where the RANGEKEYDEL has a lower sequence number than the others to confirm this is the Filter working and not the deletion shadowing the others?
Good call. Added.
jbowens
left a comment
There was a problem hiding this comment.
Reviewed 1 of 7 files at r10, 2 of 5 files at r11, 1 of 2 files at r14, 2 of 2 files at r16, 1 of 1 files at r17, 1 of 1 files at r18, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @itsbilal and @nicktrav)
compaction.go line 1849 at r18 (raw file):
} // compactionHintFromKeys returns a deleteCompactionHintType give a slice of
nit: s/give/given/
table_stats.go line 477 at r18 (raw file):
// the RangeDeletionsBytesEstimate statistic and can't populate table stats // from just the properties. The table stats collector goroutine will populate // the stats.
nit: mind adjusting this comment to explicitly call out the need for the table stats collector to trigger delete-only compactions? It seems like there's two separate reasons we can't compute stats here:
- If a table has range deletions we can't calculate
RangeDeletionsBytesEstimatewithout scanning the LSM, and we need the table stats collector to do that. - Additionally, if a table has range deletions or range key deletions, we need the table stats collector to run on the file so that it can look for opportunities for delete-only compactions.
Add a new `keyspan.FragmentIterator` implementation that iterates over the range deletion and range key blocks of a single table. The iterator returns merged spans from the table corresponding to "ranged deletions". A "ranged deletion" span contains keys that fall into one of three cases: - the span contains only range deletions, - the span contains only range key deletions, or - the span contains a mixture of both range deletions and range key deletions. Under the hood, the iterator first defragments the range deletion and range key deletion blocks separately, using the table's respective `keyspan.FragmentIterator`s. This defragmentation process also filters out non-deletion key types (i.e. range key SET and UNSETS). The two separate iterators are then combined into a `keyspan.MergingIter`. This new `tableRangedDeletionIter` will be used to construct spans necessary for calculating deletion hints on a table that takes range keys into account. Previously, only range deletion spans were considered. For table that do not contain range key deletes, the iterator returns the same set of spans that the existing logic would have returned.
Currently, when constructing delete-only compactions hints, only range delete spans are considered. With the introduction of range keys, the current hint construction logic is incorrect. Specifically, a hint constructed from a range delete may completely cover a table containing range keys. Currently, this table would be marked as eligible for deletion. However, the table should only be deleted if the range keys are also deleted. Make use of the `tableRangedDeletionIter` for construction of delete-only compaction hints. A new `deleteCompactionHintType` type is used to distinguish between hints generated by a range delete (which may not delete tables containing range keys), hints generated from a range key delete (which may not delete tables containing point keys), and hints generated from a span with both a range delete and range key delete (which may delete any covering table).
45aeb2c to
80efae1
Compare
nicktrav
left a comment
There was a problem hiding this comment.
Squashed all the fixup commits. I'll address the test cases in a follow-up.
TFTRs!
Reviewable status: 4 of 9 files reviewed, 3 unresolved discussions (waiting on @itsbilal and @jbowens)
compaction.go line 1849 at r18 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: s/give/given/
Done.
table_stats.go line 477 at r18 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: mind adjusting this comment to explicitly call out the need for the table stats collector to trigger delete-only compactions? It seems like there's two separate reasons we can't compute stats here:
- If a table has range deletions we can't calculate
RangeDeletionsBytesEstimatewithout scanning the LSM, and we need the table stats collector to do that.- Additionally, if a table has range deletions or range key deletions, we need the table stats collector to run on the file so that it can look for opportunities for delete-only compactions.
Done!
NOTE: This PR depends on #1759.Two commits in this patch - the first defines a new composite iterator, the second puts it to use.
This is a re-do of #1656, rebased on latest master to pick up various interfaces changes, and also squash all of the commits from that PR. All feedback from the original PR has been addressed.
TODOs:
db: add tableRangedDeletionIter for iteration of ranged deletion spans
Add a new
keyspan.FragmentIteratorimplementation that iterates overthe range deletion and range key blocks of a single table. The iterator
returns merged spans from the table corresponding to "ranged deletions".
A "ranged deletion" span contains keys that fall into one of three
cases:
deletions.
Under the hood, the iterator first defragments the range deletion and
range key deletion blocks separately, using the table's respective
keyspan.FragmentIterators. This defragmentation process also filtersout non-deletion key types (i.e. range key SET and UNSETS). The two
separate iterators are then combined into a
keyspan.MergingIter.This new
tableRangedDeletionIterwill be used to construct spansnecessary for calculating deletion hints on a table that takes range
keys into account. Previously, only range deletion spans were
considered.
For table that do not contain range key deletes, the iterator returns
the same set of spans that the existing logic would have returned.
db: consider range keys in delete-only compactions
Currently, when constructing delete-only compactions hints, only range
delete spans are considered. With the introduction of range keys, the
current hint construction logic is incorrect. Specifically, a hint
constructed from a range delete may completely cover a table containing
range keys. Currently, this table would be marked as eligible for
deletion. However, the table should only be deleted if the range keys
are also deleted.
Make use of the
tableRangedDeletionIterfor construction ofdelete-only compaction hints. A new
deleteCompactionHintTypetype isused to distinguish between hints generated by a range delete (which may
not delete tables containing range keys), hints generated from a range
key delete (which may not delete tables containing point keys), and
hints generated from a span with both a range delete and range key
delete (which may delete any covering table).