compaction: use InterleavingIter to interleave rangedels#1659
compaction: use InterleavingIter to interleave rangedels#1659jbowens wants to merge 2 commits intocockroachdb:masterfrom
Conversation
172d060 to
09041fe
Compare
itsbilal
left a comment
There was a problem hiding this comment.
Reviewed 14 of 14 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jbowens)
compaction.go line 366 at r1 (raw file):
// Referenced by `compactionIter` which uses it to check whether keys are deleted. rangeDelFrag keyspan.Fragmenter // The range deletion tombstone interleaving iterator that interleaves range
Nit: probably worth mentioning here that this iterator is the outermost returned iterator if there are any rangedels.
compaction_iter.go line 458 at r1 (raw file):
i.iterKey, i.iterValue = i.iter.Next() // We should never see an exclusive sentinel in the compaction input. if i.iterKey != nil && i.iterKey.IsExclusiveSentinel() {
If I understand this correctly, we had to remove this because every key will appear like an exclusive sentinel, correct?
nicktrav
left a comment
There was a problem hiding this comment.
Reviewed 14 of 14 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @jbowens)
compaction_iter.go line 495 at r1 (raw file):
switch key.Kind() { case InternalKeyKindRangeDelete: // Range tombstones are always interleaved at the infinite sequence
This didn't immediately sink in. Could the first sentence end with something like ", and therefore a range tombstone (if present) will be the first key encountered.". This would underscore the fact that we should never see a range del in the switch statement.
testdata/compaction_iter line 476 at r2 (raw file):
c-d#3 d-e#2 e-f#1
The last fragment is gone? What's the explanation there?
jbowens
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @itsbilal, @jbowens, and @nicktrav)
compaction_iter.go line 458 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
If I understand this correctly, we had to remove this because every key will appear like an exclusive sentinel, correct?
Yeah, that's right.
testdata/compaction_iter line 476 at r2 (raw file):
Previously, nicktrav (Nick Travers) wrote…
The last fragment is gone? What's the explanation there?
We don't get as far through the input tombstones with the same number of nexts, a consequence of the MergingIter fragmenting before the range deletions are interleaved. In the old test file, we encountered all three unfragmented tombstones a-d, b-e and c-f. Now we encounter the fragments a-b, b-c, c-d, d-e. A subsequent next would surface e-f.
nicktrav
left a comment
There was a problem hiding this comment.
Reviewed 4 of 5 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @itsbilal and @jbowens)
testdata/compaction_iter line 476 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
We don't get as far through the input tombstones with the same number of
nexts, a consequence of theMergingIterfragmenting before the range deletions are interleaved. In the old test file, we encountered all three unfragmented tombstonesa-d,b-eandc-f. Now we encounter the fragmentsa-b,b-c,c-d,d-e. A subsequent next would surfacee-f.
Ah. So those fragments aren't in the fragmenter yet as they haven't been interleaved. And given that tombstones flushes the current state of the fragmenter, we don't see it. 👍
What do you think about either having one test case that "nexts" through all fragments for completeness, or updating the test cases where we removed e-f in this PR to add an additional "next" operation so that e-f is emitted? Or just a comment. No strong opinion.
Previously, compactions used a temporary `keyspan.InternalIteratorShim` type to insert range tombstones into the compaction iterator's internal iterator. These range tombstones were inserted at their start user key and sequence number. This change adjusts the compaction iterator to use the InterleavingIter type to interleave range tombstones at the their start keys at the infinite sequence number. The compaction loop continues to be responsible for adding these tombstones to the range deletion fragmenter. Since range deletions are interleaved at the infinite sequence number, this change also resolves some inconsistent behavior at range deletion start keys. For example, previously compactions would not combine a MERGE with a later SET or MERGE at the same sequence number as a range tombstone beginning at the same user key. Previously the compaction would preserve both unmerged keys as if a snapshot had prevented the merge because the interleaved RANGEDELs were treated as beginning a new snapshot stripe. The same oddity occurred when evaluating whether SETs should become SETWITHDELs. Future work may consider removing compaction's use of the fragmenter altogether, since incoming range deletions are already fragmented. The compaction loop would need to consult the InterleavingIter to know when pending range deletions should be flushed, and logic to truncate tombstones to sstable boundaries would still be required.
jbowens
left a comment
There was a problem hiding this comment.
Reviewable status: 9 of 16 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @nicktrav)
testdata/compaction_iter line 476 at r2 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Ah. So those fragments aren't in the fragmenter yet as they haven't been interleaved. And given that
tombstonesflushes the current state of the fragmenter, we don't see it. 👍What do you think about either having one test case that "nexts" through all fragments for completeness, or updating the test cases where we removed
e-fin this PR to add an additional "next" operation so thate-fis emitted? Or just a comment. No strong opinion.
Updated to emit e-f in this instance. I want to preserve a combination of cases that exhaust input tombstones and case that do not.
nicktrav
left a comment
There was a problem hiding this comment.
Did another pass of the latest revs. 👍
Reviewed 7 of 7 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @itsbilal and @sumeerbhola)
|
This was rewritten and completed in #3219. |
Previously, compactions used a temporary
keyspan.InternalIteratorShimtype toinsert range tombstones into the compaction iterator's internal iterator. These
range tombstones were inserted at their start user key and sequence number.
This change adjusts the compaction iterator to use the InterleavingIter type to
interleave range tombstones at the their start keys at the infinite sequence
number. The compaction loop continues to be responsible for adding these
tombstones to the range deletion fragmenter.
Since range deletions are interleaved at the infinite sequence number, this
change also resolves some inconsistent behavior at range deletion start keys.
For example, previously compactions would not combine a MERGE with a later SET
or MERGE at the same sequence number as a range tombstone beginning at the same
user key. Previously the compaction would preserve both unmerged keys as if a
snapshot had prevented the merge because the interleaved RANGEDELs were treated
as beginning a new snapshot stripe. The same oddity occurred when evaluating
whether SETs should become SETWITHDELs.
Future work may consider removing compaction's use of the fragmenter
altogether, since incoming range deletions are already fragmented. The
compaction loop would need to consult the InterleavingIter to know when pending
range deletions should be flushed, and logic to truncate tombstones to sstable
boundaries would still be required.