internal/manifest: Add L0SubLevels methods to pick compactions by itsbilal · Pull Request #615 · cockroachdb/pebble

itsbilal · 2020-04-08T16:57:45Z

This change adds methods to L0SubLevels to help pick, score,
and generate L0 -> LBase, and L0 -> L0 compactions, based on
information captured in the data structure about L0 sublevels.

Also adds associated datadriven unit tests, and a benchmark.

Covers a large part of #563. Thanks Sumeer for his work, most of
this was written by him.

First commit is #614 , which this PR builds on top of.

petermattis · 2020-04-08T16:57:52Z

This change is

itsbilal · 2020-04-15T21:58:35Z

This PR is WIP until #614 is merged.

itsbilal · 2020-04-23T16:13:17Z

This is ready for a review now. It's still meatier than I'd like (just under 1000 lines of non-test code changes), but I've added comments and/or simplified whatever I thought was worth simplifying. Also added tests for all the methods.

Happy to explain anything if necessary. It might be worth reading the test file first, and all the comments there, plus the giant comment before PickBaseCompaction to get the general idea.

petermattis

I'll get started on reviewing this tomorrow.

Reviewable status: 0 of 3 files reviewed, all discussions resolved (waiting on @jbowens, @petermattis, and @sumeerbhola)

petermattis

This is going to take me longer to review than I thought. A small set of initial comments. I'll have more later.

Reviewable status: 0 of 3 files reviewed, 4 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 682 at r2 (raw file):

// sub-levels (older to younger). The intervals can be sparse wrt sub-levels.
// We observe that the system is typically under severe pressure in L0 during
// large imports where most files added to L0 are narrow and non-overlapping.

Technically, Pebble doesn't have imports. I understand this comment, but it might be good to use Pebble terminology such as ingestion.

internal/manifest/l0_sublevels.go, line 683 at r2 (raw file):

// We observe that the system is typically under severe pressure in L0 during
// large imports where most files added to L0 are narrow and non-overlapping.
// In that case we expect the rectangle represented in the above visualization

I initially read this as there being an actual visualization I could look at. Some ascii art may be helpful for understanding. It would be good to show visually good compactions vs bad compactions.

internal/manifest/l0_sublevels.go, line 686 at r2 (raw file):

// to be wide and short, and not too sparse (most intervals will have
// fileCount close to the sub-level count), which would make it amenable to
// concurrent L0 => Lbase compactions.

Nit: sometimes we use L0 -> Lbase, and sometimes L0 => Lbase. Let's use one arrow style for consistency. I believe -> is more common elsewhere in Pebble.

internal/manifest/l0_sublevels.go, line 722 at r2 (raw file):

// tall rectangles starting with the top of the stack (youngest files).
// Like the L0 => Lbase case we order the intervals using a heuristic and
// consider each in turn. The same comment about better heuristics and not

s/better/better L0 => Lbase/g?

petermattis

I made my way through most of the base compaction code. The intra-L0 code remains to be done.

After reading this code more, I wonder if interval is the right term to be using. I think partition would be a term more familiar to people in the context this is being used. That would lead to describing this approach here as "partitioned L0 sublevels". @sumeerbhola do you have strong opinions about this?

I'm also trying to wrap my head around my L0 has a different set of heuristics for picking compactions than L[1-6]. L[1-6] are also partitioned, but we don't explicitly maintain the partitions. Would it be worthwhile to do so? L[1-6] can only compact 2 adjacent levels, while L0 can compact N sublevels. Would it be worthwhile to allow N>2 compactions in L[1-6] (see #136). This isn't a comment on this PR as much as pondering where we see the compaction heuristics going long term. I'm wondering if there is something fundamental about L0 that motivates having separate compaction heuristics, or if the code here is a step towards eventual reunification of the heuristics for all of the levels.

Reviewable status: 0 of 3 files reviewed, 28 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 452 at r2 (raw file):

// TODO(bilal): Simplify away the debugging statements in this method, and make
// this a pure sanity checker.
func (s *L0SubLevels) checkCompaction(c *Level0CompactionFiles, isBase bool) error {

Is isBase == !c.isIntraL0?

internal/manifest/l0_sublevels.go, line 571 at r2 (raw file):

// if false, this is assumed to be an intra-L0 compaction. The specified
// compaction must be involving L0 SSTables.
func (s *L0SubLevels) UpdateStateForStartedCompaction(c *Level0CompactionFiles, isBase bool) error {

See other comment about isBase == !c.isIntraL0.

internal/manifest/l0_sublevels.go, line 581 at r2 (raw file):

		f.Compacting = true
		if !isBase {
			f.IsIntraL0Compacting = true

Rather than the conditional, I think this can be f.IsIntraL0Compacting = !isBase. Or perhaps f.IsIntraL0Compacting = c.isIntraL0.

internal/manifest/l0_sublevels.go, line 586 at r2 (raw file):

			interval := &s.orderedIntervals[i]
			if isBase {
				interval.isBaseCompacting = true

interval.isBaseCompacting = isBase

internal/manifest/l0_sublevels.go, line 630 at r2 (raw file):

	// to bootstrap this compaction candidate.
	seedIntervalStackDepthReduction int
	seedIntervalExtremeLevel        int

This fields could all use commentary. Some of them are obvious, but others such as this one are not.

internal/manifest/l0_sublevels.go, line 744 at r2 (raw file):

	//   to ongoing L0 => Lbase compactions. These are the ones with
	//   !isBaseCompacting && !intervalRangeIsBaseCompacting.
	// - pools[1]: Contains intervals that are !isBaseCompacting && intervalRangeIsBaseCompacting.

The pools structure seems like another level of scoring. Is there a reason not to incorporate this scoring into the per-interval score?

internal/manifest/l0_sublevels.go, line 754 at r2 (raw file):

	// For Intra-L0 compactions, we only use one pool for all intervals.
	pools := [][]intervalAndScore{{}}
	if !isIntraL0 {

There are a lot if isIntraL0 conditions in this method, which makes me think that it would be better as two separate methods: pickBaseCompaction and pickIntraCompaction. Update: looking at this more, I think we definitely want to do this separation. There is actually very little code that is shared by intra-L0 and L0->Lbase compaction picking.

internal/manifest/l0_sublevels.go, line 771 at r2 (raw file):

				continue
			}
			if interval.intervalRangeIsBaseCompacting {

Per my comment above about whether pools are actually needed: we could incorporate interval.intervalRangeIsBaseCompacting into the interval score using a constant that is larger than any depth we expect to encounter. Or we could extend intervalAndScore with additional fields that are used for sorting.

internal/manifest/l0_sublevels.go, line 785 at r2 (raw file):

	// are likely to choose the same seed file. Again this is just
	// to reduce wasted work.
	consideredIntervals := make([]bool, len(s.orderedIntervals))

There are a number of places in this code where a []bool slice is being used as a bit set. I think it might be worthwhile to define a type so that the loops to set bits can become function calls.

type bitSet []bool

func newBitSet(n int) bitSet {
  return make([]bool, n)
}

func (b *bitSet) markBit(i int) bool {
  (*b)[i] = true
}

func (b *bitSet) markBits(start, end int) bool {
  for i := start; i < end; i++ {
    (*b)[i] = true
  }
}
...

internal/manifest/l0_sublevels.go, line 797 at r2 (raw file):

				// in the highest sub-level.
				slIndex := len(interval.interval.subLevelAndFileList) - 1
				adjustedNonCompactingFileCount := interval.interval.topOfStackNonCompactingFileCount

It isn't obvious to me why we need to track topOfStackNonCompactingFileCount. Couldn't the loop below check f.Compacting and exit when it gets to a compacting file rather than returning an error?

internal/manifest/l0_sublevels.go, line 799 at r2 (raw file):

				adjustedNonCompactingFileCount := interval.interval.topOfStackNonCompactingFileCount
				for ; slIndex >= 0; slIndex-- {
					slf := interval.interval.subLevelAndFileList[slIndex]

It is strange to see interval.interval. Perhaps *fileInterval should be embedded in intervalAndScore rather than a named member.

internal/manifest/l0_sublevels.go, line 800 at r2 (raw file):

				for ; slIndex >= 0; slIndex-- {
					slf := interval.interval.subLevelAndFileList[slIndex]
					f = s.filesByAge[slf.fileIndex]

There is a ton of slice indexing in this code. I have a general concern about the correctness. I wonder if there is anything we could do to make this more obviously correct. For example, we could define a type for each different slice and define an index type and provide accessors which only accept that index type. It is possible I'm being paranoid about nothing, so feel free to push back against this suggestion.

internal/manifest/l0_sublevels.go, line 810 at r2 (raw file):

					// numbers than earliestUnflushedSeqNum cannot be in
					// the compaction.
					if f.LargestSeqNum >= earliestUnflushedSeqNum {

For readability, it might be worthwhile computing a FileMetadata.compactable bool when the version is created. This is something that could be pulled out into a separate smaller PR.

internal/manifest/l0_sublevels.go, line 867 at r2 (raw file):

					// in Lbase that have Compacting = true. If that's the case,
					// this compaction cannot be chosen.
					firstBaseIndex := sort.Search(len(baseFiles), func(i int) bool {

Is this equivalent to the overlaps() function? I suspect we could do:

lower, upper := overlaps(baseFiles[i], s.orderedIntervals[c.minIntervalIndex].startKey.key, s.orderedIntervals[c.maxIntervalIndex+1].startKey.key)

internal/manifest/l0_sublevels.go, line 905 at r2 (raw file):

	f *FileMetadata, intervalIndex int, minCompactionDepth int,
) *Level0CompactionFiles {
	cFiles := &Level0CompactionFiles{

Nit: in other methods (e.g. pickCompaction) we use c to refer to this variable. Let's do so here as well for consistency.

internal/manifest/l0_sublevels.go, line 917 at r2 (raw file):

	filesIncluded := make([]bool, len(s.filesByAge))
	filesIncluded[f.l0Index] = true
	cFiles.FilesIncluded = filesIncluded

Nit: You can initialize Level0CompactionFiles.FilesIncluded above:

cFiles := &Level0CompactionFiles{
  Files: ...
  FilesIncluded: make([]bool, len(s.filesByAge)
}
cFiles.FilesIncluded[f.l0Index] = true

internal/manifest/l0_sublevels.go, line 918 at r2 (raw file):

	filesIncluded[f.l0Index] = true
	cFiles.FilesIncluded = filesIncluded
	// The seed file captures all files in the next level that fall

The seed file is the lowest file in the seed interval, but the seed file may overlap adjacent intervals in which it is not the lowest file. If that is correct, I think it would be useful to call this out explicitly in this comment.

internal/manifest/l0_sublevels.go, line 933 at r2 (raw file):

	//                   -----
	// It may be better for performance to have a more rectangular
	// shape since the files being left behind will induce touch the

induce touch?

internal/manifest/l0_sublevels.go, line 938 at r2 (raw file):

	// shape we will be forced to pull in a file that is already
	// compacting. We assume that the performance concern is not a
	// practical issue.

This idea sounds very much like compaction.grow:

// grow grows the number of inputs at c.level without changing the number of
// c.level+1 files in the compaction, and returns whether the inputs grew. sm
// and la are the smallest and largest InternalKeys in all of the inputs.

I suspect we should do this and use the same logic as compaction.grow. It is fine to create a more rectangular shape as long as doing so doesn't increase the base width of the compaction. Let's change this part of the comment to a TODO and reference compaction.grow.

internal/manifest/l0_sublevels.go, line 979 at r2 (raw file):

		// size beyond a hard limit of 500mb, is criteria for rejecting this
		// candidate. This lets us prefer slow growths as we add files, while
		// still having a hard limit.

The logic here also seems to stop expanding if the last candidate is larger than minCompactionDepth and the new candidate contains more than 100mb in data.

internal/manifest/l0_sublevels.go, line 1014 at r2 (raw file):

			break
		}
		if cFiles.FilesIncluded[f.l0Index] || f.LargestSeqNum >= earliestUnflushedSeqNum {

Why is it kosher to skip over the file if f.LargestSeqNum >= earliestUnflushedSeqNum? I suspect this condition doesn't happen in practice, but if we had a file at a newer sublevel and we're looking for overlap in older sublevels and find a non-compactable file, I don't think we can ignore it.

internal/manifest/l0_sublevels.go, line 1017 at r2 (raw file):

			continue
		}
		if f.Compacting {

Nit: I would naturally think this check goes before cFiles.FilesIncluded.

internal/manifest/l0_sublevels.go, line 1145 at r2 (raw file):

// Level0CompactionFiles to cover all L0 files in the specified key interval,
// by calling extendCandidateToRectangle.
func (s *L0SubLevels) ExtendL0ForBaseCompactionTo(

Curious why this method is exported? Perhaps it will be used externally in a future PR.

internal/manifest/l0_sublevels.go, line 1178 at r2 (raw file):

// Best-effort attempt to make the compaction include more files in the
// rectangle defined by [minIntervalIndex, maxIntervalIndex] on the X axis and
// bounded on one side of the Y axis by candidate.seeIntervalExtremeLevel (the

Nit: s/seeInterval/seedInterval/g

itsbilal

Addressed almost all of your comments, except for the few around reducing indexing and slicing. Ready for another look. I'll put up the next PR very soon, with a WIP marker.

Reviewable status: 0 of 3 files reviewed, 23 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 452 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Is isBase == !c.isIntraL0?

Done. Good catch!

internal/manifest/l0_sublevels.go, line 571 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

See other comment about isBase == !c.isIntraL0.

I made some changes in an upcoming PR that required me to change the function signature to UpdateStateForStartedCompaction(inputs [2][]*FileMetadata, isBase bool) . Since we don't pass an LCF struct anymore, it makes sense to thread isBase.

internal/manifest/l0_sublevels.go, line 581 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Rather than the conditional, I think this can be f.IsIntraL0Compacting = !isBase. Or perhaps f.IsIntraL0Compacting = c.isIntraL0.

Noted - will update the next PR with this. I moved this exact code out to compaction.go.

internal/manifest/l0_sublevels.go, line 586 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

interval.isBaseCompacting = isBase

Done.

internal/manifest/l0_sublevels.go, line 630 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This fields could all use commentary. Some of them are obvious, but others such as this one are not.

Done.

internal/manifest/l0_sublevels.go, line 682 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Technically, Pebble doesn't have imports. I understand this comment, but it might be good to use Pebble terminology such as ingestion.

Done.

internal/manifest/l0_sublevels.go, line 683 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I initially read this as there being an actual visualization I could look at. Some ascii art may be helpful for understanding. It would be good to show visually good compactions vs bad compactions.

Done.

internal/manifest/l0_sublevels.go, line 722 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/better/better L0 => Lbase/g?

Done.

internal/manifest/l0_sublevels.go, line 744 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The pools structure seems like another level of scoring. Is there a reason not to incorporate this scoring into the per-interval score?

Done.

internal/manifest/l0_sublevels.go, line 754 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

There are a lot if isIntraL0 conditions in this method, which makes me think that it would be better as two separate methods: pickBaseCompaction and pickIntraCompaction. Update: looking at this more, I think we definitely want to do this separation. There is actually very little code that is shared by intra-L0 and L0->Lbase compaction picking.

Done. Makes sense.

internal/manifest/l0_sublevels.go, line 771 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Per my comment above about whether pools are actually needed: we could incorporate interval.intervalRangeIsBaseCompacting into the interval score using a constant that is larger than any depth we expect to encounter. Or we could extend intervalAndScore with additional fields that are used for sorting.

Done. Incremented the score by the sublevel count.

internal/manifest/l0_sublevels.go, line 785 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

There are a number of places in this code where a []bool slice is being used as a bit set. I think it might be worthwhile to define a type so that the loops to set bits can become function calls.
type bitSet []bool

func newBitSet(n int) bitSet {
  return make([]bool, n)
}

func (b *bitSet) markBit(i int) bool {
  (*b)[i] = true
}

func (b *bitSet) markBits(start, end int) bool {
  for i := start; i < end; i++ {
    (*b)[i] = true
  }
}
...

Done.

internal/manifest/l0_sublevels.go, line 797 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

It isn't obvious to me why we need to track topOfStackNonCompactingFileCount. Couldn't the loop below check f.Compacting and exit when it gets to a compacting file rather than returning an error?

It's useful when adding files to the pool; for an intra-L0 compaction, we can quickly exclude any interval with a lower topOfStackNonCompactingFileCount than minCompactionDepth.

internal/manifest/l0_sublevels.go, line 799 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

It is strange to see interval.interval. Perhaps *fileInterval should be embedded in intervalAndScore rather than a named member.

I agree. I instead changed this by doing a for i := range pool { interval := pool[i].interval } instead.

internal/manifest/l0_sublevels.go, line 800 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

There is a ton of slice indexing in this code. I have a general concern about the correctness. I wonder if there is anything we could do to make this more obviously correct. For example, we could define a type for each different slice and define an index type and provide accessors which only accept that index type. It is possible I'm being paranoid about nothing, so feel free to push back against this suggestion.

I agree with you, there's a lot of slice indexing here. I'll think of a refactor along those lines.

internal/manifest/l0_sublevels.go, line 867 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Is this equivalent to the overlaps() function? I suspect we could do:
lower, upper := overlaps(baseFiles[i], s.orderedIntervals[c.minIntervalIndex].startKey.key, s.orderedIntervals[c.maxIntervalIndex+1].startKey.key)

I thought about it, but the end case exclusivity and isLargest complicates using overlaps as-is. I prefer this way for readability.

internal/manifest/l0_sublevels.go, line 918 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The seed file is the lowest file in the seed interval, but the seed file may overlap adjacent intervals in which it is not the lowest file. If that is correct, I think it would be useful to call this out explicitly in this comment.

Yes, that's correct. Expanded.

internal/manifest/l0_sublevels.go, line 933 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

induce touch?

"overlaps with" is more conventional. Fixed.

internal/manifest/l0_sublevels.go, line 938 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This idea sounds very much like compaction.grow:
// grow grows the number of inputs at c.level without changing the number of
// c.level+1 files in the compaction, and returns whether the inputs grew. sm
// and la are the smallest and largest InternalKeys in all of the inputs.
I suspect we should do this and use the same logic as compaction.grow. It is fine to create a more rectangular shape as long as doing so doesn't increase the base width of the compaction. Let's change this part of the comment to a TODO and reference compaction.grow.

extendCandidateToRectangle does exactly that, actually. The comment is slightly out of date. Updated. Don't think we need to reference compaction.grow as a result now?

internal/manifest/l0_sublevels.go, line 979 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The logic here also seems to stop expanding if the last candidate is larger than minCompactionDepth and the new candidate contains more than 100mb in data.

Done. Clarified about that too.

internal/manifest/l0_sublevels.go, line 1014 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Why is it kosher to skip over the file if f.LargestSeqNum >= earliestUnflushedSeqNum? I suspect this condition doesn't happen in practice, but if we had a file at a newer sublevel and we're looking for overlap in older sublevels and find a non-compactable file, I don't think we can ignore it.

Good point - didn't think of it that way. Added that conditional to the "return false" case.

internal/manifest/l0_sublevels.go, line 1145 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Curious why this method is exported? Perhaps it will be used externally in a future PR.

Yep, that's exactly it. Used in the next PR in pickAuto.

sumeerbhola

Reviewed 1 of 2 files at r3.
Reviewable status: 1 of 3 files reviewed, 27 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 1014 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Good point - didn't think of it that way. Added that conditional to the "return false" case.

I'm not seeing why we can't ignore these files. Note that L0=>Lbase compactions pass MaxUint64 as the value of earliestUnflushedSeqNum so only intra-L0 compactions will encounter such files.

earliestUnflushedSeqNum is monotonically non-decreasing so none of the ignored files could have been intra-L0 compacted before.
The outputs from this compaction will have largest seqnum < earliestUnflushedSeqNum. So we can pretend that this compaction is happening as if the files with largest seqnum >= earliestUnflushedSeqNum do not exist in level 0 yet.

internal/manifest/l0_sublevels.go, line 622 at r3 (raw file):

	// The extreme level is the sublevel to which this compaction was seeded.
	// This means that sublevels [0, seedIntervalExtremeLevel] for base
	// compactions, and sublevels [seedIntervalExtremeLevel, MaxSublevel] for

where is MaxSublevel defined?

nit: "can have files in this compaction..."
(since it isn't necessary for them to have files).

internal/manifest/l0_sublevels.go, line 745 at r3 (raw file):

) (*Level0CompactionFiles, error) {
	// For LBase compactions, we consider intervals in a greedy manner in the following order:
	// - Inntervals that are unlikely to be blocked due

s/Inntervals/Intervals/

internal/manifest/l0_sublevels.go, line 766 at r3 (raw file):

			// Prioritize this interval by incrementing the score by the number
			// of sublevels.
			pool = append(pool, intervalAndScore{interval: interval, score: depth+sublevelCount})

this looks the opposite of what we want.
The intervals with intervalRangeIsBaseCompacting=true should be deprioritized.

internal/manifest/l0_sublevels.go, line 1206 at r3 (raw file):

// still be a correct compaction candidate.
func (s *L0SubLevels) extendCandidateToRectangle(
	minIntervalIndex int, maxIntervalIndex int, candidate *Level0CompactionFiles, isBase bool,

@petermattis since you were asking today, this method was the source of many of the logic bugs I fixed during my experiments, so you should scrutinize for flaws and possibilities for simplification

itsbilal

Reviewable status: 1 of 3 files reviewed, 27 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 622 at r3 (raw file):

Previously, sumeerbhola wrote…

where is MaxSublevel defined?

nit: "can have files in this compaction..."
(since it isn't necessary for them to have files).

Done.

internal/manifest/l0_sublevels.go, line 745 at r3 (raw file):

Previously, sumeerbhola wrote…

s/Inntervals/Intervals/

Done.

internal/manifest/l0_sublevels.go, line 766 at r3 (raw file):

Previously, sumeerbhola wrote…

this looks the opposite of what we want.
The intervals with intervalRangeIsBaseCompacting=true should be deprioritized.

Done. Whoopsie.

petermattis

Another batch of comments. Still need to get to the intra-L0 stuff.

Reviewable status: 0 of 3 files reviewed, 25 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 682 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Done.

Thanks. I believe that the import workloads don't create large ingestions, but large numbers of ingestions.

internal/manifest/l0_sublevels.go, line 797 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

It's useful when adding files to the pool; for an intra-L0 compaction, we can quickly exclude any interval with a lower topOfStackNonCompactingFileCount than minCompactionDepth.

With the current structure, you have to compute topOfStackNonCompactingFileCount for every interval, rather than only computing it when you're looking for an intra-L0 compaction. So I'm not sure that "quickly exclude" is actually true in practice.

internal/manifest/l0_sublevels.go, line 800 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I agree with you, there's a lot of slice indexing here. I'll think of a refactor along those lines.

Perhaps add a TODO to the top of this file.

internal/manifest/l0_sublevels.go, line 867 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I thought about it, but the end case exclusivity and isLargest complicates using overlaps as-is. I prefer this way for readability.

It isn't immediately obvious to me what the isLargest condition is doing. Do you have an example at end which shows what that is necessary?

internal/manifest/l0_sublevels.go, line 1014 at r2 (raw file):

Previously, sumeerbhola wrote…

I'm not seeing why we can't ignore these files. Note that L0=>Lbase compactions pass MaxUint64 as the value of earliestUnflushedSeqNum so only intra-L0 compactions will encounter such files.

earliestUnflushedSeqNum is monotonically non-decreasing so none of the ignored files could have been intra-L0 compacted before.

The outputs from this compaction will have largest seqnum < earliestUnflushedSeqNum. So we can pretend that this compaction is happening as if the files with largest seqnum >= earliestUnflushedSeqNum do not exist in level 0 yet.

I suspect the reason this is working before is because f.LargestSeqNum >= earliestUnflushedSeqNum can only occur in the most recent sublevel. And for the most recent sublevel, I think it is kosher for the input files to skip sstables. Which is odd, but ok. Consider the structure:

L0.3 a--d f---j l---p
L0.2 a--------------p

Now let's imaging that the sstable [f,j] is trigger this earliestUnflushedSeqNum condition. We can skip including [f,j] in the compaction if there are no earlier sstables included in the compaction that overlap [f,j]. Consider the reverse scenario, though:

L0.3 a--------------p
L0.2 a--d f---j l---p

If [a,p] is included in the compaction, then we can't not include [f,j] in the compaction. I suspect this scenario isn't possible, though.

internal/manifest/l0_sublevels.go, line 625 at r3 (raw file):

	// intra-L0 compactions, have files in this compaction and in the seed
	// interval.
	seedIntervalExtremeLevel        int

I wonder if it would be clearer to have separate seedMinLevel and seedMaxLevel fields. For base compactions, seedMinLevel == 0.

internal/manifest/l0_sublevels.go, line 649 at r3 (raw file):

// Adds the specified file to the LCF.
func (l *Level0CompactionFiles) addFile(f *FileMetadata) {
	l.FilesIncluded[f.l0Index] = true

markBit(f.l0Index)

internal/manifest/l0_sublevels.go, line 685 at r3 (raw file):

//
//    L0.1    d...g
//    L0.0  c..e  g..j o..s u..x

Thanks for the ascii diagrams.

Nit: We've used - rather than . for similar diagrams elsewhere (e.g. in range tombstone descriptions). I think it is worthwhile to be consistent here.

internal/manifest/l0_sublevels.go, line 694 at r3 (raw file):

//    L0.0  a................x
//
// In that case we expect the rectangle represented in the above good

I believe above good is referring to the first diagram, but I think you could also interpret this as referring to the diagram immediately above. Can you rephrase so this is clearer?

internal/manifest/l0_sublevels.go, line 755 at r3 (raw file):

	// and pick the best one. If microbenchmarks show that we can afford
	// this cost we can eliminate this heuristic.
	pool := make([]intervalAndScore, 0, len(s.orderedIntervals))

s/pool/scoredIntervals/g

internal/manifest/l0_sublevels.go, line 766 at r3 (raw file):

Previously, sumeerbhola wrote…

this looks the opposite of what we want.
The intervals with intervalRangeIsBaseCompacting=true should be deprioritized.

Ditto.

internal/manifest/l0_sublevels.go, line 865 at r3 (raw file):

	c.FilesIncluded.markBit(f.l0Index)
	// The seed file is in the lowest sublevel in the seed interval, but it may
	// overlap with other files that may be in even lower sublevels. For

Nit: s/files that may be in/files in/g (you've already qualified this phrase with may).

internal/manifest/l0_sublevels.go, line 903 at r3 (raw file):

	for ; slIndex < len(slfList); slIndex++ {
		sl := slfList[slIndex].subLevel
		f2 := s.filesByAge[slfList[slIndex].fileIndex]

Is f2.subLevel == sl? I wonder if we could replace []subLevelAndFile with files []int. That is, fileInterval would have a list of files and from those files we could determine the sublevel.

internal/manifest/l0_sublevels.go, line 920 at r3 (raw file):

		// Observed some compactions using > 1GB from L0 in an import
		// experiment. Very long running compactions
		// are not good, though sometimes unavoidable. There is a tradeoff here in

Can you expand on "not good"? Why are they not good?

internal/manifest/l0_sublevels.go, line 940 at r3 (raw file):

	if lastCandidate.seedIntervalStackDepthReduction >= minCompactionDepth {
		for i := range lastCandidate.FilesIncluded {
			lastCandidate.FilesIncluded[i] = false

Should add a bitSet.clearBit(i) method and use it here and below.

internal/manifest/l0_sublevels.go, line 968 at r3 (raw file):

			return false
		}
		if cFiles.FilesIncluded[f.l0Index] {

Should this check be moved inside addFile? I think there is somewhere else that you have a similar pattern of checking FilesIncluded and then calling addFile.

internal/manifest/l0_sublevels.go, line 983 at r3 (raw file):

	earliestUnflushedSeqNum uint64, minCompactionDepth int,
) (*Level0CompactionFiles, error) {
	pool := []intervalAndScore{}

s/pool/scoredIntervals/g

Also, var scoredIntervals []intervalAndScore is the idiomatic way to declare an empty slice.

internal/manifest/l0_sublevels.go, line 992 at r3 (raw file):

		// TODO(bilal): Is there a way to incorporate
		// topOfStackNonCompactingFileCount into the score?
		pool = append(pool, intervalAndScore{interval: interval, score: depth})

I wonder if intervalAndScore.interval should be an index. Then in the loop below we'd do interval := &s.orderedIntervals[pool[i].index]. This may make it clearer that we're just sorting the intervals.

sumeerbhola

Reviewable status: 0 of 3 files reviewed, 25 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 1014 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I suspect the reason this is working before is because f.LargestSeqNum >= earliestUnflushedSeqNum can only occur in the most recent sublevel. And for the most recent sublevel, I think it is kosher for the input files to skip sstables. Which is odd, but ok. Consider the structure:
L0.3 a--d f---j l---p
L0.2 a--------------p
Now let's imaging that the sstable [f,j] is trigger this earliestUnflushedSeqNum condition. We can skip including [f,j] in the compaction if there are no earlier sstables included in the compaction that overlap [f,j]. Consider the reverse scenario, though:
L0.3 a--------------p
L0.2 a--d f---j l---p
If [a,p] is included in the compaction, then we can't not include [f,j] in the compaction. I suspect this scenario isn't possible, though.

The examples above look like L0 => Lbase compactions, while this parameter comes into play only for intra-L0.
I agree that the parameter will exclude a contiguous sequence of files from the top of an interval (it may be more than 1 file).

petermattis

Reviewable status: 0 of 3 files reviewed, 25 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 1014 at r2 (raw file):
I was drawing up these examples assuming intra-L0 compactions (i.e. imagine there are sstables in L0.1 or lower which are undergoing L0->Lbase compaction).

I agree that the parameter will exclude a contiguous sequence of files from the top of an interval (it may be more than 1 file).

Ack. I now think this is kosher, as long as we haven't included overlapping files in newer sublevels. I'm not seeing anything which prohibits that.

petermattis

Ok, I've gotten through all of the code. The tests are a bit difficult to penetrate. I have a suggestion about making the inputs and outputs more visual. See the comment in the testdata file below.

Right now, the tests seem to utilize the main exported interfaces. I suspect we might want tests for some of the more intricate internal algorithms, such as extendCandidateToRectangle. I think this can be left to a follow-on PR, though perhaps add a TODO above extendCandidateToRectangle.

Reviewable status: 0 of 3 files reviewed, 39 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 1206 at r3 (raw file):

Previously, sumeerbhola wrote…

@petermattis since you were asking today, this method was the source of many of the logic bugs I fixed during my experiments, so you should scrutinize for flaws and possibilities for simplification

Ack.

internal/manifest/l0_sublevels.go, line 492 at r4 (raw file):

	}
	for _, f := range c.Files {
		if f.minIntervalIndex < fileIntervalsByLevel[f.subLevel].min {

I think it is more readable if the order of the operands in a conditional like this matches the order in the assignment:

if fileIntervalsByLevel[f.subLevel].min > f.minIntervalIndex {
	fileIntervalsByLevel[f.subLevel].min = f.minIntervalIndex
}

There are a few cases in this method which could be fixed.

internal/manifest/l0_sublevels.go, line 533 at r4 (raw file):

					f.LargestSeqNum)
			}
			if !includedFiles[f.l0Index] {

For a subsequent PR: this would be a good place to check that f is not required by the compaction. For example, if the compaction contains a newer sstable that overlaps f and siblings of f are included, we need to include f.

internal/manifest/l0_sublevels.go, line 612 at r4 (raw file):

// between candidate compactions (eg. fileBytes and
// seedIntervalStackDepthReduction).
type Level0CompactionFiles struct {

Nit: s/Level0/L0/g in order to be consistency with the naming of L0SubLevels.

internal/manifest/l0_sublevels.go, line 736 at r4 (raw file):

//    L0.2       | f..j|        r.t
//    L0.1   b.d |e...j|
//    L0.0  a..d | f..j| l..o p.....x

These examples are great.

Nit: the box is neat, though it has thrown off the alignment with the previous diagram. Perhaps add some spacing for the box in the previous diagram.

internal/manifest/l0_sublevels.go, line 741 at r4 (raw file):

//
// Note that running this compaction will mark the a..i file in Lbase as
// compacting, which would then block the b..d interval from being chosen for

Won't ExtendL0ForBaseCompactionTo cause the compaction to extend to pick up the sstable b..d? Perhaps the Lbase sstable should be c..i so this won't happen.

internal/manifest/l0_sublevels.go, line 1141 at r4 (raw file):

	}
	sl := f.subLevel
	cFiles.FilesIncluded.markBit(f.l0Index)

Is there a reason we're not using addFile here. The initialization of the seed file is almost identical to what is done with addFile with the exception of not adding to filesAdded. Was that omission intentional?

internal/manifest/l0_sublevels.go, line 1162 at r4 (raw file):

	// shape we will be forced to pull in a file that is already compacting.
	// We assume that the performance concern is not a practical issue.
	for currLevel := sl + 1; currLevel < len(s.Files); currLevel++ {

I think this could be integrated into the loop below. The first time through the loop, lastCandidate would be nil. On successful processing of the loop we allocate lastCandidate. If the loop terminates with lastCandidate == nil then we return nil.

I'm suggesting this because I had to remind myself what the presence of this loop was doing and the above comment is great, but was offscreen when I was looking at the similar loop below. There are other possibilities for code reorg which could make this clearer, this is just one suggestion.

internal/manifest/l0_sublevels.go, line 1274 at r4 (raw file):

) bool {
	candidate.preExtensionMinInterval = candidate.minIntervalIndex
	candidate.preExtensionMaxInterval = candidate.maxIntervalIndex

We don't seem to be modifying candidate.{min,max}IntervalIndex below, so I'm not clear on why we're saving away their values in preExtension{Min,Max}Interval. I could just be missing something.

internal/manifest/l0_sublevels.go, line 1289 at r4 (raw file):

		startLevel = 0
		increment = +1
		limitReached = func(sl int) bool {

Rather than a closure, could we define endLevel = candidate.seedIntervalExtremeLevel? Then the loop below becomes:

for sl := startLevel; sl != endLevel; sl += increment { ... }

internal/manifest/l0_sublevels.go, line 1356 at r4 (raw file):

		// We pick the longest sequence between firstIndex
		// and lastIndex of non-compacting files -- this is represented by
		// [candidateNonCompactingFirst, candidateNonCompactingLast].

This code could use a nice ascii diagram to explain what it is doing. My understanding is that it is trying to add additional files to the compaction, but has to ignore files that are currently compacting. When it encounters a compacting file it gets to decide which direction to shrink the interval span for use on the next level.

internal/manifest/l0_sublevels.go, line 1400 at r4 (raw file):

				candidateNonCompactingFirst = nonCompactingFirst
				candidateNonCompactingLast = last
				candidateHasAlreadyPickedFiles = currentRunHasAlreadyPickedFiles

My editor claims this assignment is not doing anything.

internal/manifest/l0_sublevels.go, line 1418 at r4 (raw file):

			f := files[index]
			if f.Compacting {
				panic(fmt.Sprintf("expected %s to not be compacting", f.FileNum))

We should probably plumb the logger here as it is preferable to Fatalf rather than panic as it results in cleaner error messages. panic can cause unusual errors if something doesn't clean up properly during the unwinding of the stack.

internal/manifest/testdata/l0_sublevels, line 98 at r4 (raw file):

seed interval: f-f

# SSTables 0001 and 0002 are optional additions to the above compaction, as they

Nit: the defacto format for sstable numbers is %06d. Nice to use that in the comments and tests for consistency.

internal/manifest/testdata/l0_sublevels, line 111 at r4 (raw file):

  0006:f.SET.4-g.SET.5
  0009:f.SET.10-i.SET.10
  0010:f.SET.11-g.SET.11

I think it would be worth the effort to define an input and output format for these tests that is more visual. See for example https://github.com/cockroachdb/pebble/blob/master/internal/rangedel/testdata/fragmenter.

I don't think we care about the file numbers here, but in order to get rid of them the output will have to include the boundaries for the sstables. To give you a concrete idea to start building on, what if we represented the input as an x-y grid:

L0.4      99
L0.3      8888
L0.2      777
L0.1     66
L0.0 1122 33
     abcdefghijklmnopqrstuvwxyz

Contiguous cells in a row would indicate an sstable. The number in the cell would indicate the seqnum. The above is intended to capture the setup described here.

We'd want some way to mark sstables as compacting, but that could be done via something like:

000008: intra_l0_compacting

The output rendering would need to indicate which sstables are compacting. I don't have a suggestion for that yet, but something is almost certainly doable.

I realize this would be a somewhat big lift, yet as it is these test cases are very hard to review. I think the effort will be worth it.

itsbilal

Reviewable status: 0 of 3 files reviewed, 36 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 797 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

With the current structure, you have to compute topOfStackNonCompactingFileCount for every interval, rather than only computing it when you're looking for an intra-L0 compaction. So I'm not sure that "quickly exclude" is actually true in practice.

Done. topOfStackNonCompactingFileCount is no longer a thing.

internal/manifest/l0_sublevels.go, line 800 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Perhaps add a TODO to the top of this file.

Done.

internal/manifest/l0_sublevels.go, line 867 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

It isn't immediately obvious to me what the isLargest condition is doing. Do you have an example at end which shows what that is necessary?

The idea is to be able to make perfectly adjacent intervals out of inclusive start/end keys, as well as the interval after f.maxIntervalIndex always being a tight exclusive upper bound for any file.

In even a single sstable example (let's say it's a-b), you'll see that we generate two fileIntervals, one with startKey = {a, false} and another with startKey = {b, true} and the sstable would overlap with the first fileInterval only. Now if you were to add another sstable b-b, you'll have fileIntervals for {a, false}, {b, false}, {b, true}, with both SSTables overlapping the second one and only the first sstable in the first one. Now if a b-bb sstable comes in as well, it would bridge the gap over the {b, true} interval.

Some of the logic relies on there always being an upper bound fileInterval beyond any referenced maxIntervalIndexes. For instance, look at line 1411:

		if candidateNonCompactingFirst > firstIndex {
			minIntervalIndex = files[candidateNonCompactingFirst-1].maxIntervalIndex + 1
		}

internal/manifest/l0_sublevels.go, line 1014 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I was drawing up these examples assuming intra-L0 compactions (i.e. imagine there are sstables in L0.1 or lower which are undergoing L0->Lbase compaction).

I agree that the parameter will exclude a contiguous sequence of files from the top of an interval (it may be more than 1 file).

Ack. I now think this is kosher, as long as we haven't included overlapping files in newer sublevels. I'm not seeing anything which prohibits that.

Done.

internal/manifest/l0_sublevels.go, line 625 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I wonder if it would be clearer to have separate seedMinLevel and seedMaxLevel fields. For base compactions, seedMinLevel == 0.

That does seem cleaner. Done.

internal/manifest/l0_sublevels.go, line 649 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

markBit(f.l0Index)

Done.

internal/manifest/l0_sublevels.go, line 685 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Thanks for the ascii diagrams.

Nit: We've used - rather than . for similar diagrams elsewhere (e.g. in range tombstone descriptions). I think it is worthwhile to be consistent here.

Done.

internal/manifest/l0_sublevels.go, line 694 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I believe above good is referring to the first diagram, but I think you could also interpret this as referring to the diagram immediately above. Can you rephrase so this is clearer?

Done. Sounds good.

internal/manifest/l0_sublevels.go, line 755 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/pool/scoredIntervals/g

Done.

internal/manifest/l0_sublevels.go, line 903 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Is f2.subLevel == sl? I wonder if we could replace []subLevelAndFile with files []int. That is, fileInterval would have a list of files and from those files we could determine the sublevel.

Yes it is. I've made that simplification. Nice suggestion.

internal/manifest/l0_sublevels.go, line 920 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Can you expand on "not good"? Why are they not good?

Done.

internal/manifest/l0_sublevels.go, line 940 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Should add a bitSet.clearBit(i) method and use it here and below.

I added a clearAllBits instead, since that's the only use case we seem to be using.

internal/manifest/l0_sublevels.go, line 968 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Should this check be moved inside addFile? I think there is somewhere else that you have a similar pattern of checking FilesIncluded and then calling addFile.

Done. Added it as more of a guard against double-adds. This is really the only case where we can eliminate this conditional as a result of adding it to addFile; other cases either don't need it (as they're guaranteed to only add new files), or need to run other code in addition to addFile.

internal/manifest/l0_sublevels.go, line 983 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/pool/scoredIntervals/g

Also, var scoredIntervals []intervalAndScore is the idiomatic way to declare an empty slice.

Done.

internal/manifest/l0_sublevels.go, line 992 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I wonder if intervalAndScore.interval should be an index. Then in the loop below we'd do interval := &s.orderedIntervals[pool[i].index]. This may make it clearer that we're just sorting the intervals.

Done.

internal/manifest/l0_sublevels.go, line 492 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think it is more readable if the order of the operands in a conditional like this matches the order in the assignment:
if fileIntervalsByLevel[f.subLevel].min > f.minIntervalIndex {
	fileIntervalsByLevel[f.subLevel].min = f.minIntervalIndex
}
There are a few cases in this method which could be fixed.

Done.

internal/manifest/l0_sublevels.go, line 533 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

For a subsequent PR: this would be a good place to check that f is not required by the compaction. For example, if the compaction contains a newer sstable that overlaps f and siblings of f are included, we need to include f.

Yup. Can roll this into the TODO at the function signature.

internal/manifest/l0_sublevels.go, line 736 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

These examples are great.

Nit: the box is neat, though it has thrown off the alignment with the previous diagram. Perhaps add some spacing for the box in the previous diagram.

Done.

internal/manifest/l0_sublevels.go, line 741 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Won't ExtendL0ForBaseCompactionTo cause the compaction to extend to pick up the sstable b..d? Perhaps the Lbase sstable should be c..i so this won't happen.

Right, forgot about that. Added!

internal/manifest/l0_sublevels.go, line 1141 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Is there a reason we're not using addFile here. The initialization of the seed file is almost identical to what is done with addFile with the exception of not adding to filesAdded. Was that omission intentional?

filesAdded was mostly a debug variable to denote files added after the initial seed. I guess it isn't too useful in that capacity anyway (and will go away when checkCompaction becomes simplified). Moved to using addFile.

internal/manifest/l0_sublevels.go, line 1162 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think this could be integrated into the loop below. The first time through the loop, lastCandidate would be nil. On successful processing of the loop we allocate lastCandidate. If the loop terminates with lastCandidate == nil then we return nil.

I'm suggesting this because I had to remind myself what the presence of this loop was doing and the above comment is great, but was offscreen when I was looking at the similar loop below. There are other possibilities for code reorg which could make this clearer, this is just one suggestion.

Done.

internal/manifest/l0_sublevels.go, line 1274 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

We don't seem to be modifying candidate.{min,max}IntervalIndex below, so I'm not clear on why we're saving away their values in preExtension{Min,Max}Interval. I could just be missing something.

We modify it in addFile.

internal/manifest/l0_sublevels.go, line 1289 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Rather than a closure, could we define endLevel = candidate.seedIntervalExtremeLevel? Then the loop below becomes:
for sl := startLevel; sl != endLevel; sl += increment { ... }

Done.

internal/manifest/l0_sublevels.go, line 1356 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This code could use a nice ascii diagram to explain what it is doing. My understanding is that it is trying to add additional files to the compaction, but has to ignore files that are currently compacting. When it encounters a compacting file it gets to decide which direction to shrink the interval span for use on the next level.

That's basically it. It gets to choose whether it should choose the left or right side of a compacting file at a given sublevel for expansion. The heuristic it uses is the longest run of non-compacting files at that sublevel (within the bounds returned by the previously iterated sublevel). Added some ascii art above the function signature.

internal/manifest/l0_sublevels.go, line 1400 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

My editor claims this assignment is not doing anything.

Right, this is after the loop so it doesn't matter.

internal/manifest/l0_sublevels.go, line 1418 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

We should probably plumb the logger here as it is preferable to Fatalf rather than panic as it results in cleaner error messages. panic can cause unusual errors if something doesn't clean up properly during the unwinding of the stack.

Added a TODO for this for now. I'll tackle this in the next PR, since it'll involve changing a couple function signatures.

internal/manifest/testdata/l0_sublevels, line 111 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think it would be worth the effort to define an input and output format for these tests that is more visual. See for example https://github.com/cockroachdb/pebble/blob/master/internal/rangedel/testdata/fragmenter.

I don't think we care about the file numbers here, but in order to get rid of them the output will have to include the boundaries for the sstables. To give you a concrete idea to start building on, what if we represented the input as an x-y grid:
L0.4      99
L0.3      8888
L0.2      777
L0.1     66
L0.0 1122 33
     abcdefghijklmnopqrstuvwxyz
Contiguous cells in a row would indicate an sstable. The number in the cell would indicate the seqnum. The above is intended to capture the setup described here.

We'd want some way to mark sstables as compacting, but that could be done via something like:
000008: intra_l0_compacting
The output rendering would need to indicate which sstables are compacting. I don't have a suggestion for that yet, but something is almost certainly doable.

I realize this would be a somewhat big lift, yet as it is these test cases are very hard to review. I think the effort will be worth it.

What if, instead of changing the input, we changed the debug string output to print in a more visual format, representing SSTs in a plot like that instead of listing all files in each level? I can see that delivering the same value while requiring less from the perspective of the test writer. I can make that change relatively easily and it wouldn't even require updating any of these tests that are already written. Thoughts?

itsbilal

I've added a TODO for the extendCandidateToRectangle tests. That method isn't completely untested (it's tested in some of the base / intra L0 compaction cases). The comments before each test case should help provide the motivation for each of them, as well as a quick summary of what's being tested. Thanks for the deep review, really appreciated!

Reviewable status: 0 of 3 files reviewed, 36 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

petermattis

Looking good. I think the main bit that remains is improving the testdata output to make it more visually obvious what the sublevel structure looks like and which compaction was picked.

Reviewable status: 0 of 3 files reviewed, 19 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 867 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

The idea is to be able to make perfectly adjacent intervals out of inclusive start/end keys, as well as the interval after f.maxIntervalIndex always being a tight exclusive upper bound for any file.

In even a single sstable example (let's say it's a-b), you'll see that we generate two fileIntervals, one with startKey = {a, false} and another with startKey = {b, true} and the sstable would overlap with the first fileInterval only. Now if you were to add another sstable b-b, you'll have fileIntervals for {a, false}, {b, false}, {b, true}, with both SSTables overlapping the second one and only the first sstable in the first one. Now if a b-bb sstable comes in as well, it would bridge the gap over the {b, true} interval.

Some of the logic relies on there always being an upper bound fileInterval beyond any referenced maxIntervalIndexes. For instance, look at line 1411:
		if candidateNonCompactingFirst > firstIndex {
			minIntervalIndex = files[candidateNonCompactingFirst-1].maxIntervalIndex + 1
		}

Ack. Thanks for the explanation.

internal/manifest/l0_sublevels.go, line 938 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

extendCandidateToRectangle does exactly that, actually. The comment is slightly out of date. Updated. Don't think we need to reference compaction.grow as a result now?

The comment is better now, though I think it is still worthwhile noting that this is similar to what compaction.grow does for non-L0 compactions.

compaction.grow doesn't take it account FileMetadata.Compacting. I wonder if it should. I've filed an issue about that: #644

internal/manifest/l0_sublevels.go, line 1014 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Done.

Let's add some commentary here about why this is kosher. It won't be obvious to a future reader, and probably not even future us. I think you can steal some of the explanation from above.

internal/manifest/l0_sublevels.go, line 1274 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

We modify it in addFile.

Got it. Thanks for the pointer.

internal/manifest/l0_sublevels.go, line 1356 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

That's basically it. It gets to choose whether it should choose the left or right side of a compacting file at a given sublevel for expansion. The heuristic it uses is the longest run of non-compacting files at that sublevel (within the bounds returned by the previously iterated sublevel). Added some ascii art above the function signature.

Nice ascii art and commentary.

internal/manifest/l0_sublevels.go, line 1418 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Added a TODO for this for now. I'll tackle this in the next PR, since it'll involve changing a couple function signatures.

Ack. SGTM.

internal/manifest/l0_sublevels.go, line 148 at r5 (raw file):

	// All files in this interval, in increasing sublevel order. Indexes map
	// to s.filesByAge.

Something to clarify here is that files[i] does not correspond to sublevel i. Instead, we have to examine s.filesByAge[files[i]].subLevel. Let's call that out in the comment.

internal/manifest/l0_sublevels.go, line 738 at r5 (raw file):

//    L0.0 |a--d   f--j| l--o  p-----x
//
//    Lbase a-------i     m---------w

Nit: the endpoint i doesn't extend as far as it did in previous diagrams.

internal/manifest/l0_sublevels.go, line 752 at r5 (raw file):

//    L0.0 |a--d   f--j| l--o |p-----x|
//
//    Lbase a-------i     m---------w

Ditto about the endpoint i.

internal/manifest/l0_sublevels.go, line 796 at r5 (raw file):

//    L0.0 |a--d| | f--j| l--o |p-----x|
//         ------
//    Lbase a--------i     m---------w

Ditto about the endpoint i.

internal/manifest/l0_sublevels.go, line 946 at r5 (raw file):

	// on this compaction if it's chosen, at which point we would iterate
	// backward and choose those files.
	for currLevel := sl - 1; currLevel >= 0; currLevel-- {

This loop could be moved inside the loop below using a similar approach to what was done in intraL0CompactionUsingSeed.

internal/manifest/l0_sublevels.go, line 1128 at r5 (raw file):

	c.addFile(f)

	// The seed file captures all files in the higher level that fall in the

I think this comment should be moved down and attached to the loop which calls extendFiles.

internal/manifest/l0_sublevels.go, line 1279 at r5 (raw file):

//    Lbase a------------------i    m---------w
//
// This method then needs to choose between choosing the left side of L0.3 bb-c

s/between choosing/between/g

internal/manifest/testdata/l0_sublevels, line 111 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

What if, instead of changing the input, we changed the debug string output to print in a more visual format, representing SSTs in a plot like that instead of listing all files in each level? I can see that delivering the same value while requiring less from the perspective of the test writer. I can make that change relatively easily and it wouldn't even require updating any of these tests that are already written. Thoughts?

I had a similar thought after I wrote my comment. I think it is a good intermediate solution, perhaps even good enough for a long term solution. Part of the reason for the visual input was to ease the test writer's job.

itsbilal

Added a visualization below the describe outputs. Thanks!

If you're still finding it difficult to review the tests, try reading the comments in order from top to bottom. The cases at the top are the "base cases", and every change after that point changes the compaction in one subtle way to test some behaviour. Seeing things as a set of differences rather than as individual standalone test cases makes it a lot easier to understand.

Reviewable status: 0 of 3 files reviewed, 18 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 938 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The comment is better now, though I think it is still worthwhile noting that this is similar to what compaction.grow does for non-L0 compactions.

compaction.grow doesn't take it account FileMetadata.Compacting. I wonder if it should. I've filed an issue about that: #644

Done.

internal/manifest/l0_sublevels.go, line 148 at r5 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Something to clarify here is that files[i] does not correspond to sublevel i. Instead, we have to examine s.filesByAge[files[i]].subLevel. Let's call that out in the comment.

Done.

internal/manifest/l0_sublevels.go, line 752 at r5 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Ditto about the endpoint i.

Done.

internal/manifest/l0_sublevels.go, line 796 at r5 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Ditto about the endpoint i.

Done.

internal/manifest/l0_sublevels.go, line 946 at r5 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This loop could be moved inside the loop below using a similar approach to what was done in intraL0CompactionUsingSeed.

Done.

internal/manifest/l0_sublevels.go, line 1128 at r5 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think this comment should be moved down and attached to the loop which calls extendFiles.

Done.

internal/manifest/l0_sublevels.go, line 1279 at r5 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/between choosing/between/g

Done.

internal/manifest/testdata/l0_sublevels, line 111 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I had a similar thought after I wrote my comment. I think it is a good intermediate solution, perhaps even good enough for a long term solution. Part of the reason for the visual input was to ease the test writer's job.

Done.

petermattis

Reviewable status: 0 of 3 files reviewed, 9 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/testdata/l0_sublevels, line 18 at r6 (raw file):

L0.2:  a----b
L0.1:     b-------j
L0.0:           e-j

Something seems off about this visualization. I was expecting the keys to be mapped onto an alphabetic grid. Something like:

L0.2:  ab
L0.1:   b-------j
L0.0:      e----j

Sstables with adjacent alphabetic characters will be a bit hard to read, so we might want to adjust the input. For example, in this case we could use c rather than b:

L0.2:  a-c
L0.1:    c------j
L0.0:      e----j

internal/manifest/testdata/l0_sublevels, line 112 at r6 (raw file):

----
compaction picked with stack depth reduction 5
000006,000003,000005,000009,000010,000001,000002

I'd also like to visualize the picked compaction files. This could be the normal L0SubLevels visualization, but with only the picked files present. I'd start there and see if it is sufficient.

itsbilal

Reviewable status: 0 of 3 files reviewed, 9 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/testdata/l0_sublevels, line 18 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Something seems off about this visualization. I was expecting the keys to be mapped onto an alphabetic grid. Something like:
L0.2:  ab
L0.1:   b-------j
L0.0:      e----j
Sstables with adjacent alphabetic characters will be a bit hard to read, so we might want to adjust the input. For example, in this case we could use c rather than b:
L0.2:  a-c
L0.1:    c------j
L0.0:      e----j

I tried many variations of this, including what you suggested (mapping directly to an alphabet grid), and ultimately settled with this. One side benefit of the current approach is that it more accurately shows the system's perception of things; for end keys, the alphabet is on the 2nd character, and for start keys it's on the 1st character for that interval (every interval gets 3 characters).

A bunch of issues arise with being able to accurately map interval keys with isLargest = false as well as = true. Very narrow sstables have minIntervalIndex == maxIntervalIndex, so you have to ensure the end key of the wide SST here maps with p and not with n:

Visualization:
L0.3:   f---------------------p
L0.2:   fg
L0.1:   f------h
L0.0:   fg    h---j    kl    np

internal/manifest/testdata/l0_sublevels, line 112 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I'd also like to visualize the picked compaction files. This could be the normal L0SubLevels visualization, but with only the picked files present. I'd start there and see if it is sufficient.

Done. Suffixed picked SSTs with a *.

petermattis

Reviewable status: 0 of 3 files reviewed, 11 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 349 at r7 (raw file):

}

func (s *L0SubLevels) visualize(compactionFiles bitSet) string {

Let's move this to l0_sublevels_test.go. I doubt we're going to use this outside of tests.

internal/manifest/l0_sublevels.go, line 351 at r7 (raw file):

func (s *L0SubLevels) visualize(compactionFiles bitSet) string {
	var buf strings.Builder
	fmt.Fprintf(&buf, "Visualization:")

I suspect this help text could be left out without causing any difficulties.

internal/manifest/l0_sublevels.go, line 362 at r7 (raw file):

		fmt.Fprintf(&buf, "L0.%d:", i)
		for j, f := range s.Files[i] {
			for curInterval < f.minIntervalIndex {

This is a big part of why I find the visualization confusing right now. You're drawing sstables based on the intervals while I was wanting to imagine them drawn based on each "key" occupying a cell.

Here is another proposal for how to show the visualization which addresses the isLargest problem: imagine a horizontal axis that that looks like aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz. Position the start key of an sstable at the first character and the end key of the table at the second character.

For one of the examples in the testdata, your visualization currently gives:

L0.2:   a---b
L0.1:      b------j
L0.0:            ej

My proposal would give:

L0.2:  a--b
L0.1:    b----------------j
L0.0:          e----------j

In order to display compacting sstables, I propose we capitalize the keys, and replace - with =:

L0.2:  A==B
L0.1:    B================J
L0.0:          E==========J

Perhaps the capitalization is unnecessary.

petermattis

Reviewable status: 0 of 3 files reviewed, 11 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 362 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This is a big part of why I find the visualization confusing right now. You're drawing sstables based on the intervals while I was wanting to imagine them drawn based on each "key" occupying a cell.

Here is another proposal for how to show the visualization which addresses the isLargest problem: imagine a horizontal axis that that looks like aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz. Position the start key of an sstable at the first character and the end key of the table at the second character.

For one of the examples in the testdata, your visualization currently gives:
L0.2:   a---b
L0.1:      b------j
L0.0:            ej
My proposal would give:
L0.2:  a--b
L0.1:    b----------------j
L0.0:          e----------j
In order to display compacting sstables, I propose we capitalize the keys, and replace - with =:
L0.2:  A==B
L0.1:    B================J
L0.0:          E==========J
Perhaps the capitalization is unnecessary.

Here's another example. Current vis:

L0.4:                  f---g
L0.3:                  f---------i
L0.2:                  f------h
L0.1:               e---f
L0.0:   ab    cd       f---g

My proposal:

L0.4:            f--g
L0.3:            f------i
L0.2:            f----h
L0.1:          e--f
L0.0:  a--bc--d  f--g

While it is a bit disturbing to see a--bc--d, we can always adjust the test data if we want to have "space" between the sstables. Alternately, make the grid slightly wider:

L0.4:                 f---g
L0.3:                 f---------i
L0.2:                 f-----h
L0.1:              e---f
L0.0:  a---b c---d    f---g
       aa bb cc dd ee ff gg hh ii jj kk ll mm nn oo pp qq rr ss tt uu vv ww xx yy zz

I like this last suggestion the best so far and I think it gets everything both of us want. Apologies for the bike shedding on this.

petermattis

Reviewable status: 0 of 3 files reviewed, 11 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 362 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Here's another example. Current vis:
L0.4:                  f---g
L0.3:                  f---------i
L0.2:                  f------h
L0.1:               e---f
L0.0:   ab    cd       f---g
My proposal:
L0.4:            f--g
L0.3:            f------i
L0.2:            f----h
L0.1:          e--f
L0.0:  a--bc--d  f--g
While it is a bit disturbing to see a--bc--d, we can always adjust the test data if we want to have "space" between the sstables. Alternately, make the grid slightly wider:
L0.4:                 f---g
L0.3:                 f---------i
L0.2:                 f-----h
L0.1:              e---f
L0.0:  a---b c---d    f---g
       aa bb cc dd ee ff gg hh ii jj kk ll mm nn oo pp qq rr ss tt uu vv ww xx yy zz
I like this last suggestion the best so far and I think it gets everything both of us want. Apologies for the bike shedding on this.

One final bike shed color, use + to indicate compactions:

L0.4:                 f+++g
L0.3:                 f+++++++++i
L0.2:                 f+++++h
L0.1:              e+++f
L0.0:  a---b c---d    f+++g

itsbilal

Reviewable status: 0 of 3 files reviewed, 11 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels.go, line 349 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Let's move this to l0_sublevels_test.go. I doubt we're going to use this outside of tests.

Done.

internal/manifest/l0_sublevels.go, line 351 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I suspect this help text could be left out without causing any difficulties.

Done.

internal/manifest/l0_sublevels.go, line 362 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

One final bike shed color, use + to indicate compactions:

L0.4:                 f+++g
L0.3:                 f+++++++++i
L0.2:                 f+++++h
L0.1:              e+++f
L0.0:  a---b c---d    f+++g

Done. Appreciate the bikeshedding, as I was able to just implement the final suggestion as-is. Thanks!

petermattis

I'm still working through the tests, but I've got a few more things for you to work through in the meantime.

Reviewable status: 0 of 3 files reviewed, 13 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)

internal/manifest/testdata/l0_sublevels, line 17 at r8 (raw file):

	000003:e#5,1-j#7,1
compacting file count: 0, base compacting intervals:

The blank line here is fouling up the datadriven tests which expect the output to have no blank lines. This shows up as the doubled ---- lines immediately after the input. You might need to do some manual massaging of this file after you fix this up.

internal/manifest/testdata/l0_sublevels, line 21 at r8 (raw file):

L0.1:     b------------------------j
L0.0:              e---------------j
       aa bb cc dd ee ff gg hh ii jj kk ll mm nn oo pp qq rr ss tt uu vv ww xx yy zz

Perhaps cut-off the horizontal "axis" after the largest sstable endpoint key. That is, truncate the above to aa bb cc dd ee ff gg hh ii jj.

internal/manifest/testdata/l0_sublevels, line 71 at r8 (raw file):


L0.1:  a---b
L0.0:     b------------------------j e---j

There is probably something better than could be done here for overlapping sstables, though I don't have a specific idea right now.

internal/manifest/testdata/l0_sublevels, line 172 at r8 (raw file):

L0.2:                 f------h
L0.1:              e---f
L0.0:  a---b c---d    f---g

Can we display c---d as c^^^^d to indicate it is intra-L0 compacting? And c___d to indicate base compacting.

internal/manifest/testdata/l0_sublevels, line 211 at r8 (raw file):

	000002:c#3,1-d#5,1
	000006:f#4,1-g#5,1
compacting file count: 6, base compacting intervals: [4, 9],

Let's output the visualization here so we can see all the compacting files.

internal/manifest/testdata/l0_sublevels, line 421 at r8 (raw file):

1

# Ensure that when 0011 is not base compacting, it's chosen for compactions

Nit: there are still a lot of sstable numbers displayed as %04d in this file, both in comments and in the input.

Are the file numbers important any more? Could you omit them from the input and assign them automatically. I think it might be better to refer to sstables by their level and start and end key. For example, L0.0:[f,g].

internal/manifest/testdata/l0_sublevels, line 490 at r8 (raw file):

L6
  0007:a.SET.0-f.SET.0
  0008:g.SET.0-z.SET.0 compacting

Can we extend the visualization to include non-L0 levels? I was confused as to what was happening when just looking at the visualization.

itsbilal

TFTR!

Reviewable status: 0 of 3 files reviewed, 13 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/testdata/l0_sublevels, line 17 at r8 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The blank line here is fouling up the datadriven tests which expect the output to have no blank lines. This shows up as the doubled ---- lines immediately after the input. You might need to do some manual massaging of this file after you fix this up.

Done.

internal/manifest/testdata/l0_sublevels, line 21 at r8 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Perhaps cut-off the horizontal "axis" after the largest sstable endpoint key. That is, truncate the above to aa bb cc dd ee ff gg hh ii jj.

Done.

internal/manifest/testdata/l0_sublevels, line 71 at r8 (raw file):

Previously, petermattis (Peter Mattis) wrote…

There is probably something better than could be done here for overlapping sstables, though I don't have a specific idea right now.

Given it's a test case that's crafted specifically to test check-ordering, I'd rather not worry about addressing it specifically.

internal/manifest/testdata/l0_sublevels, line 172 at r8 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Can we display c---d as c^^^^d to indicate it is intra-L0 compacting? And c___d to indicate base compacting.

Done.

internal/manifest/testdata/l0_sublevels, line 211 at r8 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Let's output the visualization here so we can see all the compacting files.

Done.

internal/manifest/testdata/l0_sublevels, line 421 at r8 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: there are still a lot of sstable numbers displayed as %04d in this file, both in comments and in the input.

Are the file numbers important any more? Could you omit them from the input and assign them automatically. I think it might be better to refer to sstables by their level and start and end key. For example, L0.0:[f,g].

Fixed. I think the file nums are still a convenient marker to have, one that's more agnostic to slight modifications to test cases (which can completely change the sublevel structure sometimes).

internal/manifest/testdata/l0_sublevels, line 490 at r8 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Can we extend the visualization to include non-L0 levels? I was confused as to what was happening when just looking at the visualization.

Done.

petermattis

Reviewable status: 0 of 3 files reviewed, 16 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)

internal/manifest/testdata/l0_sublevels, line 71 at r8 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Given it's a test case that's crafted specifically to test check-ordering, I'd rather not worry about addressing it specifically.

Ack. Agreed this is fine for now. Perhaps even fine forever.

internal/manifest/testdata/l0_sublevels, line 421 at r8 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Fixed. I think the file nums are still a convenient marker to have, one that's more agnostic to slight modifications to test cases (which can completely change the sublevel structure sometimes).

Personally, I'm tending to ignore the filenums now, but they aren't bothersome. If you find them useful, then leave them in.

internal/manifest/testdata/l0_sublevels, line 201 at r9 (raw file):

	000003:e#5,1-f#7,1
compacting file count: 6, base compacting intervals: [4, 9]
L0.4:                 f___g

I know it was my suggestion, but using _ for base compaction is not very distinguishable from -. Any other ideas?

internal/manifest/testdata/l0_sublevels, line 258 at r9 (raw file):

       aa bb cc dd ee ff gg hh ii jj kk ll mm nn oo pp qq rr ss

# Set SSTable 000011 which is "under" SSTable 000009 to IsBaseCompacting = true.

Nit: s/under/newer/g

internal/manifest/testdata/l0_sublevels, line 259 at r9 (raw file):


# Set SSTable 000011 which is "under" SSTable 000009 to IsBaseCompacting = true.
# This should prevent SSTable 000009 from participating in a base compaction.

Am I misreading the test output. Looks like 000009 still participated in the base compaction.

It should be impossible for [n,p] to be base compacting while an older table that it overlaps with in L0 is not base compacting, right?

internal/manifest/testdata/l0_sublevels, line 360 at r9 (raw file):

4

# Assume the above base compaction is chosen. This should reduce max depth after

The above was an intra-L0 compaction, not a base compaction.

internal/manifest/testdata/l0_sublevels, line 365 at r9 (raw file):

define
L0
  000005:f.SET.6-h.SET.9 base_compacting

Should this be intra-L0 compacting? As it stands, the test input should be impossible (newer sstables participating in a base compaction which overlap with an older sstable that isn't being compacted).

internal/manifest/testdata/l0_sublevels, line 619 at r9 (raw file):


# In L0.0, SST 000007 is marked as base compacting. There are two SSTs to the left
# of it in the sublevel, and one to its right. The ones to its left should be

I only see one sstable to the left of 000007 in L0.1. Am I misreading the comment?

internal/manifest/testdata/l0_sublevels, line 620 at r9 (raw file):

# In L0.0, SST 000007 is marked as base compacting. There are two SSTs to the left
# of it in the sublevel, and one to its right. The ones to its left should be
# chosen by extendCandidateToRectangle.

It isn't clear to me that extendCandidateToRectangle is noticing [k,l] in this scenario as [k,l] doesn't overlap any of the sstables that are part of the intra-L0 seed. Perhaps [k,l] should be [i,l]. Or explain what I'm missing.

internal/manifest/testdata/l0_sublevels, line 673 at r9 (raw file):



# Now shift the base_compacting marker one SST to the left. But since file 6

Ah, I think you're using "left" to mean left in the L0 input, not "left" within the L0 sublevel. That's confusing here and in one of the earlier comments.

This change adds methods to L0SubLevels to help pick, score, and generate L0 -> LBase, and L0 -> L0 compactions, based on information captured in the data structure about L0 sublevels. These functions will be called from in compaction.go and compaction_picker.go in a future change. Also adds associated datadriven unit tests, and a benchmark. Covers a large part of cockroachdb#563. Thanks Sumeer for his work, most of this was written by him.

itsbilal

Done. Sorry for the wasted effort, the one sort slipped things up enough to make the tests not make sense. Fixed now!

Reviewable status: 0 of 3 files reviewed, 15 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/testdata/l0_sublevels, line 201 at r9 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I know it was my suggestion, but using _ for base compaction is not very distinguishable from -. Any other ideas?

How does v seem? Updated.

internal/manifest/testdata/l0_sublevels, line 258 at r9 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: s/under/newer/g

Under would be older.

internal/manifest/testdata/l0_sublevels, line 259 at r9 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Am I misreading the test output. Looks like 000009 still participated in the base compaction.

It should be impossible for [n,p] to be base compacting while an older table that it overlaps with in L0 is not base compacting, right?

Bad initialization. Fixed.

internal/manifest/testdata/l0_sublevels, line 360 at r9 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The above was an intra-L0 compaction, not a base compaction.

There's a base compaction above the intra-L0 compaction, that's the one I chose.

internal/manifest/testdata/l0_sublevels, line 365 at r9 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Should this be intra-L0 compacting? As it stands, the test input should be impossible (newer sstables participating in a base compaction which overlap with an older sstable that isn't being compacted).

The visualization is wrong, my bad. Should be fixed now.

internal/manifest/testdata/l0_sublevels, line 619 at r9 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I only see one sstable to the left of 000007 in L0.1. Am I misreading the comment?

Bad visualization again, my bad. Fixed.

internal/manifest/testdata/l0_sublevels, line 620 at r9 (raw file):

Previously, petermattis (Peter Mattis) wrote…

It isn't clear to me that extendCandidateToRectangle is noticing [k,l] in this scenario as [k,l] doesn't overlap any of the sstables that are part of the intra-L0 seed. Perhaps [k,l] should be [i,l]. Or explain what I'm missing.

Bad visualization again, my bad. Fixed.

internal/manifest/testdata/l0_sublevels, line 673 at r9 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Ah, I think you're using "left" to mean left in the L0 input, not "left" within the L0 sublevel. That's confusing here and in one of the earlier comments.

No, you were right, it's left in the sublevel. Bad initialization trips up again. Fixed.

petermattis

Tests look good!

Reviewable status: 0 of 3 files reviewed, 9 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)

internal/manifest/l0_sublevels_test.go, line 298 at r10 (raw file):

			}
			SortBySeqNum(fileMetas[0])
			for i := 1; i < NumLevels; i++ {

Oops!

internal/manifest/l0_sublevels_test.go, line 299 at r10 (raw file):

			SortBySeqNum(fileMetas[0])
			for i := 1; i < NumLevels; i++ {
				SortBySmallest(fileMetas[i], base.DefaultComparer.Compare)

Double oops!

internal/manifest/testdata/l0_sublevels, line 201 at r9 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

How does v seem? Updated.

Definitely better, though we should avoid ever having an end-key of v.

itsbilal

Thanks for the in-depth reviewing!

Reviewable status: 0 of 3 files reviewed, 9 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

internal/manifest/l0_sublevels_test.go, line 298 at r10 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Oops!

😶

itsbilal requested review from jbowens, petermattis and sumeerbhola April 8, 2020 16:57

itsbilal self-assigned this Apr 8, 2020

itsbilal mentioned this pull request Apr 8, 2020

internal/manifest: Introduce L0SubLevels data structure #614

Merged

itsbilal force-pushed the l0-sublevels-phase-1.5 branch 3 times, most recently from f468826 to fcfd0f4 Compare April 8, 2020 22:23

itsbilal changed the title ~~internal/manifest: Add L0SubLevels methods to pick compactions~~ [WIP] internal/manifest: Add L0SubLevels methods to pick compactions Apr 15, 2020

itsbilal force-pushed the l0-sublevels-phase-1.5 branch 4 times, most recently from 4eb7e2a to ae52503 Compare April 23, 2020 16:06

itsbilal changed the title ~~[WIP] internal/manifest: Add L0SubLevels methods to pick compactions~~ internal/manifest: Add L0SubLevels methods to pick compactions Apr 23, 2020

petermattis reviewed Apr 23, 2020

View reviewed changes

petermattis reviewed Apr 25, 2020

View reviewed changes

petermattis reviewed Apr 26, 2020

View reviewed changes

petermattis mentioned this pull request Apr 27, 2020

internal/manifest: remove unused intervalRangeIsBaseCompacting local #641

Merged

itsbilal force-pushed the l0-sublevels-phase-1.5 branch 2 times, most recently from c1d6a7c to 2335f7c Compare April 27, 2020 20:31

itsbilal commented Apr 27, 2020

View reviewed changes

itsbilal mentioned this pull request Apr 27, 2020

[WIP] *: Wire up compaction picking with L0Sublevels, enable flush splits #642

Closed

sumeerbhola requested changes Apr 27, 2020

View reviewed changes

itsbilal force-pushed the l0-sublevels-phase-1.5 branch from 2335f7c to d85254f Compare April 28, 2020 16:26

itsbilal commented Apr 28, 2020

View reviewed changes

itsbilal force-pushed the l0-sublevels-phase-1.5 branch from d85254f to 615e06b Compare April 28, 2020 19:23

petermattis reviewed Apr 28, 2020

View reviewed changes

sumeerbhola reviewed Apr 29, 2020

View reviewed changes

petermattis reviewed Apr 29, 2020

View reviewed changes

itsbilal force-pushed the l0-sublevels-phase-1.5 branch from 615e06b to b5647f8 Compare April 29, 2020 21:22

itsbilal commented Apr 29, 2020

View reviewed changes

petermattis reviewed Apr 30, 2020

View reviewed changes

itsbilal force-pushed the l0-sublevels-phase-1.5 branch from b5647f8 to 5aedf70 Compare April 30, 2020 20:17

itsbilal commented Apr 30, 2020

View reviewed changes

petermattis reviewed Apr 30, 2020

View reviewed changes

itsbilal commented Apr 30, 2020

View reviewed changes

itsbilal force-pushed the l0-sublevels-phase-1.5 branch from 5aedf70 to cb8b480 Compare April 30, 2020 21:29

petermattis reviewed May 1, 2020

View reviewed changes

itsbilal force-pushed the l0-sublevels-phase-1.5 branch from cb8b480 to f400531 Compare May 1, 2020 16:37

itsbilal commented May 1, 2020

View reviewed changes

itsbilal force-pushed the l0-sublevels-phase-1.5 branch from f400531 to 08625c3 Compare May 1, 2020 16:47

petermattis reviewed May 1, 2020

View reviewed changes

itsbilal force-pushed the l0-sublevels-phase-1.5 branch from 08625c3 to 9999de2 Compare May 4, 2020 16:20

itsbilal commented May 4, 2020

View reviewed changes

itsbilal force-pushed the l0-sublevels-phase-1.5 branch from 9999de2 to aa5fe0e Compare May 4, 2020 17:32

petermattis reviewed May 4, 2020

View reviewed changes

itsbilal force-pushed the l0-sublevels-phase-1.5 branch from aa5fe0e to 181ecd9 Compare May 4, 2020 17:58

itsbilal commented May 4, 2020

View reviewed changes

petermattis approved these changes May 4, 2020

View reviewed changes

itsbilal commented May 5, 2020

View reviewed changes

itsbilal merged commit 00ded19 into cockroachdb:master May 5, 2020

Conversation

itsbilal commented Apr 8, 2020

Uh oh!

petermattis commented Apr 8, 2020

Uh oh!

itsbilal commented Apr 15, 2020

Uh oh!

itsbilal commented Apr 23, 2020

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

sumeerbhola left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

sumeerbhola left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment