
*: Introduce read triggered compaction heuristic #1009

Merged
aadityasondhi merged 1 commit into cockroachdb:master from aadityasondhi:read-triggered-compactions
Dec 21, 2020

Conversation

@aadityasondhi
Contributor

@aadityasondhi aadityasondhi commented Nov 23, 2020

Previously, our compactions were triggered by write-based heuristics.
This change introduces compactions based on high read activity, to improve
read performance in read-heavy workloads. It is inspired by LevelDB's
read-based compaction heuristic, which is outlined in the following issue:
#29.

These compactions are triggered using an AllowedSeeks parameter on
each file's FileMetadata. Reads are sampled based on a rate defined in
iterator.MaybeSampleRead(). If a read is sampled, AllowedSeeks is
decremented on the file in the top-most level containing the key. Once
AllowedSeeks reaches 0, a compaction for the key range is scheduled.
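The mechanism described above can be sketched in a few lines. This is a simplified illustration, not Pebble's actual implementation; `fileMeta` and the `schedule` callback are hypothetical stand-ins for `FileMetadata` and the compaction-scheduling machinery:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fileMeta is a hypothetical stand-in for Pebble's FileMetadata.
type fileMeta struct {
	allowedSeeks int64 // decremented atomically when a read is sampled
	smallest     string
	largest      string
}

// sampleRead decrements allowedSeeks on the top-most file containing the
// sampled key; when the budget reaches zero, a compaction over the file's
// key range is queued via the schedule callback.
func sampleRead(f *fileMeta, schedule func(start, end string)) {
	if atomic.AddInt64(&f.allowedSeeks, -1) == 0 {
		schedule(f.smallest, f.largest)
	}
}

func main() {
	f := &fileMeta{allowedSeeks: 2, smallest: "a", largest: "m"}
	scheduled := 0
	sched := func(start, end string) {
		scheduled++
		fmt.Println("scheduling read compaction over", start, "-", end)
	}
	sampleRead(f, sched) // 2 -> 1, nothing scheduled yet
	sampleRead(f, sched) // 1 -> 0, compaction scheduled
	fmt.Println("compactions scheduled:", scheduled)
}
```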

Read triggered compactions are only considered if no other compaction is
possible at the time. This helps prioritize score based compactions to
maintain a healthy LSM shape.

The benchmark results for this change are outlined in the GitHub
issue: #29.

Fixes #29.

@cockroach-teamcity
Member

This change is Reviewable

Contributor

@itsbilal itsbilal left a comment


Overall some really exciting work! I left some comments after a first pass that you can work on in the meantime. One thing in addition to those: we're going to need tests for this functionality, and the way I see it, there are two moving pieces that need to be individually unit tested: 1) pickReadTriggeredCompaction, and 2) the read sampling in Iterator.

For the compaction picking side, it would be a good idea to spin up a datadriven test that lets you define a couple d.mu.readCompactions scenarios (some with files moved around, some with the simple case) and confirm if pickReadTriggeredCompaction still picks the sensible top file in each case.

The sampling side will be a bit harder, but I can imagine some tests (datadriven or not) that spin up an iterator, reduce AllowedSeeks to 1 for a file at the top, really increase the sampling rate, then confirm that one read puts that file in readCompactions. Feel free to look around the existing iterator tests to see if you can draw some inspiration for something like this, and let me know if you need more help with this part.

Reviewed 15 of 15 files at r1.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @aadityasondhi, @jbowens, @petermattis, and @sumeerbhola)


compaction.go, line 1064 at r1 (raw file):

}

type readCompaction struct {

Would be a good idea to add a comment here on why readCompactions are based off of key ranges and not file handles (as the version could change between the sampling and the scheduling).
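The rationale in this comment can be made concrete with a short sketch (field names are assumptions modeled on the discussion, not necessarily the final code): the queued entry records user-key bounds and a level rather than a file reference, because the version may change between sampling and picking.

```go
package main

import "fmt"

// readCompaction records a key range rather than a file handle: between the
// time a read is sampled and the time the compaction is actually picked, the
// version can change and the sampled file may already have been compacted
// away. Key bounds remain meaningful across version changes.
type readCompaction struct {
	level int
	start []byte
	end   []byte
}

func main() {
	rc := readCompaction{level: 5, start: []byte("a"), end: []byte("m")}
	fmt.Printf("queued read compaction on L%d over [%s, %s]\n",
		rc.level, rc.start, rc.end)
}
```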


compaction.go, line 1210 at r1 (raw file):

}

func (d *DB) checkFlushThreshold() bool {

Nit: this can be passedFlushThreshold to clarify what the return value is.


compaction.go, line 1540 at r1 (raw file):

		return
	}
	for len(d.mu.compact.readCompactions) > 0 && d.mu.compact.compactingCount < d.opts.MaxConcurrentCompactions && !d.mu.compact.flushing {

All of this should be inside the compaction picker, instead of in this db-level function. See the case of pickElisionOnlyCompaction inside compactionPickerByScore.pickAuto; I imagine pickReadTriggeredCompaction will just be another case like pickElisionOnlyCompaction at the end of that method. You can do everything except for the actual c := newCompaction(...); d.addInProgressCompaction(c); go d.compact() stuff in that method.

The advantage of this approach is that it separates compaction picking code with compaction executing code.
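The suggested structure can be sketched as a priority-ordered fallthrough. The picker names below are simplified stand-ins (real signatures in `compactionPickerByScore.pickAuto` take environment arguments); the point is only that read-triggered picking is the last case tried:

```go
package main

import "fmt"

type pickedCompaction struct{ reason string }

// Simplified stand-ins for the real pickers; each returns nil when it has
// nothing to do. Here we pretend the LSM is healthy and nothing needs
// eliding, so only the read-triggered picker fires.
func pickScoreBased() *pickedCompaction   { return nil }
func pickElisionOnly() *pickedCompaction  { return nil }
func pickReadTriggered() *pickedCompaction {
	return &pickedCompaction{reason: "read"}
}

// pickAuto tries pickers in priority order. Read-triggered compactions are
// considered only if no score-based or elision-only compaction is possible,
// keeping LSM-shape maintenance the priority. Execution (newCompaction,
// addInProgressCompaction, go d.compact()) stays at the DB level.
func pickAuto() *pickedCompaction {
	if pc := pickScoreBased(); pc != nil {
		return pc
	}
	if pc := pickElisionOnly(); pc != nil {
		return pc
	}
	return pickReadTriggered()
}

func main() {
	fmt.Println("picked:", pickAuto().reason)
}
```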


compaction_picker.go, line 1421 at r1 (raw file):

) (slice manifest.LevelSlice, shouldCompact bool) {
	numOverlap, topLevel := 0, 0
	var topOverlapping manifest.LevelSlice

Nit: this can be topOverlaps for better connection with the method that generated it.


compaction_picker.go, line 1434 at r1 (raw file):

	}
	if numOverlap >= 2 {
		shouldCompact = true

Nit: unnecessary


db.go, line 313 at r1 (raw file):

			// inProgress is the set of in-progress flushes and compactions.
			inProgress map[*compaction]struct{}
			// readCompactions is a list read triggered compactions. The next compaction

s/list/list of/


iterator.go, line 177 at r1 (raw file):

	}
	// Approximate gap in bytes between samples of data read during iteration.
	readBytesPeriod := 1048576 * i.readState.db.opts.Experimental.ReadSamplingRate

The 1048576 value should be in a const defined at the top. ReadSamplingRate can probably be renamed to ReadSamplingMultiplier.
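The suggested refactor might look like the following sketch (the constant and function names are assumptions for illustration):

```go
package main

import "fmt"

// readBytesPeriod is the base gap, in bytes, between sampled reads.
const readBytesPeriod = 1 << 20 // 1048576

// samplingPeriod scales the base gap by the configured multiplier; a larger
// multiplier means reads are sampled less often.
func samplingPeriod(readSamplingMultiplier int64) int64 {
	return readBytesPeriod * readSamplingMultiplier
}

func main() {
	fmt.Println("bytes between samples:", samplingPeriod(16))
}
```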


iterator.go, line 197 at r1 (raw file):

func (i *Iterator) sampleRead() {
	topFile := i.readSampling.topFile
	topLevel, numOverlappingLevels := numLevels+1, 0

numLevels is already an invalid level number (7), so you could just set it to that here, and check if topLevel >= numLevels below.


iterator.go, line 200 at r1 (raw file):

	if mi, ok := i.iter.(*mergingIter); ok {
		if len(mi.levels) > 1 {
			for _, iter := range mi.levels {

This code block looks like a pyramid. Might be a good idea to add a `forEachLevelIter` method to `mergingIter` that takes in a `func(li *levelIter)`, and you define the stuff in the `ok` block below in a separate closure that you pass into that method.
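The suggestion could look roughly like this (a minimal sketch with simplified types; the real `mergingIter` levels hold more than a level number, and the exact closure signature is an assumption):

```go
package main

import "fmt"

type levelIter struct{ level int }

type mergingIter struct{ levels []*levelIter }

// forEachLevelIter flattens the nested type-assertion/loop "pyramid":
// callers pass a closure that is invoked once per level iterator, top to
// bottom, and can return true to stop early.
func (mi *mergingIter) forEachLevelIter(fn func(li *levelIter) (stop bool)) {
	for _, li := range mi.levels {
		if fn(li) {
			return
		}
	}
}

func main() {
	mi := &mergingIter{levels: []*levelIter{{0}, {5}, {6}}}
	mi.forEachLevelIter(func(li *levelIter) bool {
		fmt.Println("visiting level", li.level)
		return false
	})
}
```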


iterator.go, line 205 at r1 (raw file):

					file := li.files.Current()
					if file != nil {
						numOverlappingLevels++

You can probably stop this loop right here if numOverlappingLevels has exceeded 2. Quick optimization. I believe mi.levels is organized top-to-bottom so you should be able to do this no problem, but you can confirm that.


options.go, line 336 at r1 (raw file):

		// ReadCompactionThreshold controls the frequency of read triggered
		// compactions by adjusting FileMetadata.AllowedSeeks.

You could mention the formula used to set AllowedSeeks here, mostly to give the intuition that a higher threshold = more read compactions happening, as AllowedSeeks would be lower. In that case, it seems less like a read compaction threshold and more like a read compaction rate, as the units are "number of bytes of file size per allowed seek".
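The intuition can be sketched as follows. The specific constants here are illustrative assumptions modeled on LevelDB's heuristic (one seek per fixed number of bytes, floored at a minimum), not Pebble's final values:

```go
package main

import "fmt"

// allowedSeeks derives a file's seek budget from its size: one allowed seek
// per readCompactionRate bytes, with a floor so tiny files aren't compacted
// after a handful of reads. A higher rate means a smaller budget per file,
// and hence more read-triggered compactions. (Values are illustrative.)
func allowedSeeks(fileSize, readCompactionRate int64) int64 {
	s := fileSize / readCompactionRate
	if s < 100 {
		s = 100
	}
	return s
}

func main() {
	// A 64 MiB file with a rate of 16 KiB-per-seek gets 4096 allowed seeks.
	fmt.Println(allowedSeeks(64<<20, 16<<10))
}
```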


internal/manifest/level.go, line 27 at r1 (raw file):

// LevelInt returns the int representation of a Level
func LevelInt(l Level) int {

This sounds like a type in itself. Could probably be renamed to LevelToInt.


internal/manifest/version.go, line 97 at r1 (raw file):

	IsIntraL0Compacting bool
	Atomic              struct {
		AllowedSeeks int64

Exported vars usually have a comment.

@aadityasondhi aadityasondhi force-pushed the read-triggered-compactions branch from 972300e to 3ed701d Compare November 25, 2020 20:17
Contributor Author

@aadityasondhi aadityasondhi left a comment


Reviewable status: 2 of 16 files reviewed, 9 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)


compaction.go, line 1540 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

All of this should be inside the compaction picker, instead of in this db-level function. See the case of pickElisionOnlyCompaction inside compactionPickerByScore.pickAuto; I imagine pickReadTriggeredCompaction will just be another case like pickElisionOnlyCompaction at the end of that method. You can do everything except for the actual c := newCompaction(...); d.addInProgressCompaction(c); go d.compact() stuff in that method.

The advantage of this approach is that it separates compaction picking code with compaction executing code.

Done.


db.go, line 313 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

s/list/list of/

Done.


iterator.go, line 177 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

The 1048576 value should be in a const defined at the top. ReadSamplingRate can probably be renamed to ReadSamplingMultiplier.

Done.


iterator.go, line 197 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

numLevels is already an invalid level number (7), so you could just set it to that here, and check if topLevel >= numLevels below.

Done.


iterator.go, line 200 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This code block looks like a pyramid. Might be a good idea to add a `forEachLevelIter` method to `mergingIter` that takes in a `func(li *levelIter)`, and you define the stuff in the `ok` block below in a separate closure that you pass into that method.

Done.


iterator.go, line 205 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

You can probably stop this loop right here if numOverlappingLevels has exceeded 2. Quick optimization. I believe mi.levels is organized top-to-bottom so you should be able to do this no problem, but you can confirm that.

Done.


options.go, line 336 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

You could mention the formula used to set AllowedSeeks here, mostly to give the intuition that a higher threshold = more read compactions happening, as AllowedSeeks would be lower. In that case, it seems less like a read compaction threshold and more like a read compaction rate, as the units are "number of bytes of file size per allowed seek".

Done.


internal/manifest/level.go, line 27 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This sounds like a type in itself. Could probably be renamed to LevelToInt.

Done.


internal/manifest/version.go, line 97 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Exported vars usually have a comment.

Done.

Contributor

@itsbilal itsbilal left a comment


I like how this is taking shape! Some more, relatively minor comments that you can get to soon enough. The tests are really the only remaining major piece; after that, this should be good.

Reviewed 14 of 14 files at r2.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @aadityasondhi, @jbowens, @petermattis, and @sumeerbhola)


compaction_picker.go, line 1338 at r2 (raw file):

) (pc *pickedCompaction) {
	if env.readTriggeredCompactionEnv.flushing || env.readTriggeredCompactionEnv.readCompactions == nil {
		return nil

Worthwhile to add a comment here on why we don't schedule read compactions if flushing is true.


compaction_picker.go, line 1349 at r2 (raw file):

	if pc != nil {
		if pc.startLevel.level == 0 && pc.outputLevel.level == 0 {
			panic("intra L0 = BAD!")

I assume this was a debug statement, ha. Either way, newPickedCompaction takes care of this (it only creates non-intra-L0 compactions) so you're good.


iterator.go, line 187 at r2 (raw file):

	}
	if i.readSampling.randvar == nil {
		i.readSampling.randvar = randvar.NewUniform(0, 2*samplingPeriod)

Any reason to not roll this 2* multiplier into the sampling period itself, or in the default for ReadSamplingMultiplier?


iterator.go, line 201 at r2 (raw file):

	topLevel, numOverlappingLevels := numLevels, 0
	if mi, ok := i.iter.(*mergingIter); ok {
		if len(mi.levels) > 1 {

This is unnecessary, as your closure would simply not be executed if for any reason there are no levels.


iterator.go, line 208 at r2 (raw file):

						numOverlappingLevels++
						if numOverlappingLevels >= 2 {
							// terminate the loop early if at least 2 overlapping levels are found

Nit: Comments in Pebble are usually capitalized like full sentences and with periods at the end. (Same for the other one below).


version_set.go, line 410 at r2 (raw file):

		var err error
		newVersion, zombies, err = bve.Apply(currentVersion, vs.cmp, vs.opts.Comparer.FormatKey, vs.opts.FlushSplitBytes, vs.opts.Experimental.ReadCompactionRate)

This function signature is getting very long. This doesn't need to be done in this PR, but it would be worthwhile to have an applyOptions struct, defined in the same package as Apply (so internal/manifest), that takes all of these options.


internal/manifest/version.go, line 99 at r2 (raw file):

	Atomic struct {
		// AllowedSeeks is used to determine if a file should be picked for
		// a read triggered compaction. It is decremented when read sampling

Mention when it's decremented - after every positioning operation that returns a user key (eg. Next, Prev, SeekGE, SeekLT, etc).

@aadityasondhi aadityasondhi force-pushed the read-triggered-compactions branch 2 times, most recently from 488a28a to 82e58b4 Compare November 30, 2020 13:58
@aadityasondhi aadityasondhi force-pushed the read-triggered-compactions branch 2 times, most recently from 67f1556 to 19cbfd1 Compare December 10, 2020 15:54
Contributor Author

@aadityasondhi aadityasondhi left a comment


Reviewable status: 12 of 19 files reviewed, 5 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)


compaction_picker.go, line 1349 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I assume this was a debug statement, ha. Either way, newPickedCompaction takes care of this (it only creates non-intra-L0 compactions) so you're good.

Done.


iterator.go, line 187 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Any reason to not roll this 2* multiplier into the sampling period itself, or in the default for ReadSamplingMultiplier?

This is because the random variable is a uniform distribution and we want the expected value to be the constant we set. We get that by multiplying by 2. LevelDB also does something similar here: https://github.com/google/leveldb/blob/master/db/db_iter.cc#L106


iterator.go, line 201 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This is unnecessary, as your closure would simply not be executed if for any reason there are no levels.

This was intended to be a bit of a micro-optimization. I am checking for at least 2 levels here so that there is a chance there is an overlap. If there is only one level then my loop underneath would run but there is no chance I find an overlap. Maybe I am just overthinking. Let me know if this makes sense, otherwise happy to remove.


version_set.go, line 410 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This function signature is getting very long. This doesn't need to be done in this PR, but it would be worthwhile to have an applyOptions struct, defined in the same package as Apply (so internal/manifest), that takes all of these options.

Going to defer this to later in a separate PR.


internal/manifest/version.go, line 99 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Mention when it's decremented - after every positioning operation that returns a user key (eg. Next, Prev, SeekGE, SeekLT, etc).

Done.

Contributor

@itsbilal itsbilal left a comment


Great work on tests! I think the non-test code is in very good shape, save for the one edge case we discussed this morning (and I see you have a TODO for that already).

For some of the tests I'm not entirely convinced if maybe-compact is actually doing read compactions or if it's doing score-based compactions that happen to be what you expect, so I left some suggestions around confirming this and messing with file sizes to quieten down score-based compactions.

Reviewed 8 of 8 files at r4.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @aadityasondhi, @itsbilal, @jbowens, @petermattis, and @sumeerbhola)


compaction_picker.go, line 1338 at r4 (raw file):

) (pc *pickedCompaction) {
	// If a flush is in-progress or expected to happen soon, it means more writes are taking place. We would
	// then be scheduling more write focussed compactions. In this case, skip read compactions as they are

Nit: We would "soon" be scheduling more write-based compactions.


compaction_picker.go, line 1340 at r4 (raw file):

	// then be scheduling more write focussed compactions. In this case, skip read compactions as they are
	// lower priority.
	if env.readTriggeredCompactionEnv.flushing || env.readTriggeredCompactionEnv.readCompactions == nil {

len(env.readTriggeredCompactionEnv.readCompactions) == 0 is slightly more general than the nil check and more performant.

Also readTriggeredCompactionEnv seems like a very long variable name. I'd be open to just readCompactionEnv or rtCompactionEnv.


compaction_test.go, line 1705 at r4 (raw file):

						case "true":
							d.mu.compact.flushing = true
						case "false":

Nit: can remove this, as default would capture it too.


compaction_test.go, line 1718 at r4 (raw file):

					parts := strings.Split(line, " ")
					if len(parts) != 2 {
						panic("malformed data for set-read-compaction")

Might be better to just return an error string here, like "


compaction_test.go, line 1724 at r4 (raw file):

						rc := readCompaction{
							level: l,
							start: []byte(keys[0]),

I think you'd have to run this by ParseInternalKey, and also use the a.SET.5 format instead of a#5,SET.


compaction_test.go, line 1738 at r4 (raw file):

				d.mu.Lock()
				d.opts.private.disableAutomaticCompactions = false
				d.maybeScheduleCompaction()

You might be scheduling non-read compactions here by mistake, if you aren't defining enough files in other levels to have a proper LSM shape. It would be worth setting the size parameter in your tests so you can ensure that isn't happening. You can also add a show-read-compactions command that just shows that queue, and if it's empty after this call, you can confirm that the read compaction you added earlier was taken up.


iterator.go, line 201 at r2 (raw file):

Previously, aadityasondhi (Aaditya Sondhi) wrote…

This was intended to be a bit of a micro-optimization. I am checking for at least 2 levels here so that there is a chance there is an overlap. If there is only one level then my loop underneath would run but there is no chance I find an overlap. Maybe I am just overthinking. Let me know if this makes sense, otherwise happy to remove.

Sounds good.


iterator.go, line 191 at r4 (raw file):

	}
	if i.readSampling.randvar == nil {
		i.readSampling.randvar = randvar.NewUniform(0, 2*samplingPeriod)

Having a lot of conditionals can also be bad for performance. It might be a good idea to merge this with the above conditional, as you should never have any case where rand == nil but randVar != nil or vice versa. Just do this initialization inside the same if statement as above.


iterator.go, line 209 at r4 (raw file):

				l := manifest.LevelToInt(li.level)
				if file := li.files.Current(); file != nil {
					if containsKey := i.cmp(file.Smallest.UserKey, i.key) <= 0 && i.cmp(file.Largest.UserKey, i.key) >= 0; containsKey {

Full-key comparisons can actually be somewhat slow. There's one optimization you can make here; you need to only check i.cmp(file.Smallest.UserKey, i.key) <= 0 if i.dir == iterPosNext || i.dir == iterPosCurForward, and i.cmp(file.Largest.UserKey, i.key) >= 0 if i.dir == iterPosPrev || i.dir == iterPosCurReverse.

Also containsKey var is unnecessary. But if this line gets really long after the suggested change, you can break that var definition out to be before the if statement, and then just do if containsKey.
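The optimization rests on an invariant of the iterator: a forward-positioned iterator cannot be past the current file's largest key, and a reverse-positioned one cannot be before its smallest, so one of the two bound checks is redundant. A standalone sketch (the direction type and names here are simplified assumptions, not Pebble's iterator internals):

```go
package main

import (
	"bytes"
	"fmt"
)

type dir int

const (
	dirForward dir = iota
	dirReverse
)

// fileContainsKey decides whether key falls within [smallest, largest],
// skipping the comparison the iteration direction already guarantees:
// forward iteration guarantees key <= largest, so only the lower bound
// needs checking, and vice versa for reverse iteration.
func fileContainsKey(d dir, smallest, largest, key []byte) bool {
	if d == dirForward {
		return bytes.Compare(smallest, key) <= 0
	}
	return bytes.Compare(largest, key) >= 0
}

func main() {
	fmt.Println(fileContainsKey(dirForward, []byte("a"), []byte("m"), []byte("c")))
}
```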


iterator_test.go, line 536 at r4 (raw file):

				return err.Error()
			}
			d.opts.private.forceReadSampling = true

As mentioned offline, putting this in the iterator may be cleaner. You can do that down in the iter case.


iterator_test.go, line 589 at r4 (raw file):

			return runIterCmd(td, iter)

		case "read-compaction":

Nit: read-compactions, the plural


iterator_test.go, line 595 at r4 (raw file):

			d.mu.Lock()
			s := fmt.Sprintf("%+v", d.mu.compact.readCompactions)

While this works, it's neater to print one read compaction per line, in a more human-friendly format.


iterator_test.go, line 599 at r4 (raw file):

			return s

		case "pending-read-compaction":

Nit: iter-read-compactions may be a bit more intuitive of a name.


options.go, line 342 at r4 (raw file):

		//
		// TODO(aadityasondhi): Experiment with this to find a good value
		ReadSamplingMultiplier float64

This can become a uint64 again now that you don't need to pass in fractions for tests.


testdata/compaction_read_triggered, line 17 at r4 (raw file):

----

maybe-compact

Add a read-compactions call that shows the state of that queue before and after each of these maybe-compact calls. Otherwise tests look good.

@itsbilal
Contributor


compaction_test.go, line 1718 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Might be better to just return an error string here, like "

Oops, incomplete comment. I meant like usage: <level>: start.SET.1-end.SET.2

@aadityasondhi aadityasondhi force-pushed the read-triggered-compactions branch from 19cbfd1 to daa62ea Compare December 13, 2020 22:49
Contributor Author

@aadityasondhi aadityasondhi left a comment


TFTR! I made almost all of the changes you recommended. Had a couple of comments/questions below.

Reviewable status: 11 of 19 files reviewed, 13 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)


compaction_picker.go, line 1340 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

len(env.readTriggeredCompactionEnv.readCompactions) == 0 is slightly more general than the nil check and more performant.

Also readTriggeredCompactionEnv seems like a very long variable name. I'd be open to just readCompactionEnv or rtCompactionEnv.

env.readTriggeredCompactionEnv.readCompactions is a pointer to the list, so I am doing a nil check. I think it would panic if I just checked len() without making sure it's not nil first.

I renamed it though; I agree it is a little too long.


compaction_test.go, line 1718 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Oops, incomplete comment. I meant like usage: <level>: start.SET.1-end.SET.2

Done.


compaction_test.go, line 1724 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I think you'd have to run this by ParseInternalKey, and also use the a.SET.5 format instead of a#5,SET.

Done.


compaction_test.go, line 1738 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

You might be scheduling non-read compactions here by mistake, if you aren't defining enough files in other levels to have a proper LSM shape. It would be worth setting the size parameter in your tests so you can ensure that isn't happening. You can also add a show-read-compactions command that just shows that queue, and if it's empty after this call, you can confirm that the read compaction you added earlier was taken up.

I added the show-read-compactions command. By size, are you referring to FileMetadata.Size?


iterator.go, line 191 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Having a lot of conditionals can also be bad for performance. It might be a good idea to merge this with the above conditional, as you should never have any case where rand == nil but randVar != nil or vice versa. Just do this initialization inside the same if statement as above.

Put it in with a || to still check both. Let me know if we would rather just only check one. I feel a little uneasy not doing the null check for both even if there isn't a case where it could happen.


iterator.go, line 209 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Full-key comparisons can actually be somewhat slow. There's one optimization you can make here; you need to only check i.cmp(file.Smallest.UserKey, i.key) <= 0 if i.dir == iterPosNext || i.dir == iterPosCurForward, and i.cmp(file.Largest.UserKey, i.key) >= 0 if i.dir == iterPosPrev || i.dir == iterPosCurReverse.

Also containsKey var is unnecessary. But if this line gets really long after the suggested change, you can break that var definition out to be before the if statement, and then just do if containsKey.

That's true, thanks for pointing it out.


iterator_test.go, line 536 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

As mentioned offline, putting this in the iterator may be cleaner. You can do that down in the iter case.

Done.


iterator_test.go, line 595 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

While this works, it's neater to print one read compaction per line, in a more human-friendly format.

Done.


options.go, line 342 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This can become a uint64 again now that you don't need to pass in fractions for tests.

Done.


testdata/compaction_read_triggered, line 17 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Add a read-compactions call that shows the state of that queue before and after each of these maybe-compact calls. Otherwise tests look good.

Done.

@aadityasondhi aadityasondhi force-pushed the read-triggered-compactions branch from daa62ea to 047010c Compare December 13, 2020 23:36
Contributor

@itsbilal itsbilal left a comment


Great stuff! I imagine this is the last round of suggested fixes, and then this is good to go. The sampling tests look good. Only major suggestion is to update the way define works in the read compaction picking tests, and to test / highlight two other cases there.

Non-test code looks good to merge.

Reviewed 10 of 10 files at r5.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @aadityasondhi, @jbowens, @petermattis, and @sumeerbhola)


compaction_picker.go, line 1340 at r4 (raw file):

Previously, aadityasondhi (Aaditya Sondhi) wrote…

env.readTriggeredCompactionEnv.readCompactions is a pointer to the list so I am doing a null check. I think it would panic if I just check for len() without making sure its not nil first.

I renamed it though. I agree it is a little too long

If you do len(nil) it returns 0 instead of panicking as nil is a valid 0-length slice. You can try it in Go Playground. That's why it's preferable; it handles both the nil case as well as the empty slice case.
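The distinction that resolves this thread can be shown in a few lines: `len` is safe on a nil slice (Go treats nil as a valid zero-length slice), but dereferencing a nil *pointer* to a slice, as in the code under review, would panic.

```go
package main

import "fmt"

func main() {
	var s []int // nil slice: len(s) is 0, no panic
	fmt.Println(s == nil, len(s))

	var p *[]int // nil *pointer* to a slice: len(*p) would panic,
	// so a nil check on the pointer itself is required first.
	fmt.Println(p == nil)
}
```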


compaction_test.go, line 1738 at r4 (raw file):

Previously, aadityasondhi (Aaditya Sondhi) wrote…

I added the show-read-compactions command. By size, are you referring to FileMetadata.Size?

Yep, I meant FileMetadata.Size. I thought runDBDefineCmd would let you pass in file sizes for each file, but apparently not. It's more work than I guessed at first (but also not significantly so). Basically, runDBDefineCmd lets you pass in individual keys for each level, that it then flushes directly to the level you specify, using some custom flushing/level-setting logic.

If you ctrl+f for "parseMeta" in compaction_test.go or compaction_picker_test.go, you'll see some tests that use a custom parseMeta function that allow you to directly specify files and sometimes file sizes. You may want to copy one of those define cases verbatim, such as the one in TestCompactionFindL0Limit, as it's closer to what you want. Once you have made that change, you can do something like this:

L5
  100: a.SET.5-b.SET.6 size=10
L6
  010: a.SET.0-b.SET.0 size=100

Now L6 and L5 have the right relative shape, so you're guaranteed to not get a score-based compaction in the tests.


compaction_test.go, line 1697 at r5 (raw file):

				return s

			case "set-read-compaction":

Nit: set implies you're clearing it first, and then adding all the read compactions you've specified. Since you're doing an append here, it makes sense to rename it to add-read-compaction.


compaction_test.go, line 1723 at r5 (raw file):

						rc := readCompaction{
							level: l,
							start: base.ParseInternalKey(keys[0]).UserKey,

Ah oops, I didn't realize you were just taking the UserKey anyway. Then you can go back to the old []byte() case, but ensure you don't specify sequence number and kind in the tests. I.e. it'd just be

add-read-compaction
a-b

iterator.go, line 191 at r4 (raw file):

Previously, aadityasondhi (Aaditya Sondhi) wrote…

Put it in with a || to still check both. Let me know if we would rather just only check one. I feel a little uneasy not doing the null check for both even if there isn't a case where it could happen.

We're usually okay with panicking if one is set but not the other, as it'd highlight an issue more quickly. But I'm also okay with a defensive programming solution like you've done, so it's okay.


iterator_test.go, line 533 at r5 (raw file):

			var err error
			if d, err = runDBDefineCmd(td, opts); err != nil {

BTW, this use of runDBDefineCmd is okay as this test isn't influenced by file sizes / bounds too much.


testdata/compaction_read_triggered, line 6 at r5 (raw file):

a.SET.55:a b.SET.5:b
L6
a.SET.55:a b.SET.5:b

I would also add a simple variant of this test case after you've made the define changes, where L5 and L6 have another file beside the a-b one (eg. e-f), that stays unaffected throughout the whole thing. Or you could just update this test case to have that second file in both levels, up to you.


testdata/compaction_read_triggered, line 41 at r5 (raw file):


Nit: too many blank lines (and the same thing below). The large comment is enough of a test case delimiter.


testdata/compaction_read_triggered, line 175 at r5 (raw file):

----
6:
  000006:[a#0,SET-b#0,SET]

I would add one more test case (after you make the define changes): where L0 has a file in the key range, and so does L5 and L6. And you add-read-compaction the range but in L4. Your logic should compact the file in L0. To make things interesting you can also use slightly different ranges; the L0 file can be a-b and size 10, the L5 file a-d and size 100, and L6 a-e and size 1000. And you can add-read-compaction L4 a-c and see what happens.

Otherwise this file looks great.


testdata/iterator_read_sampling, line 169 at r5 (raw file):



# EDGE CASE (TODO(aaditya): Investigate the performance impact of the fix for this)

I'd move this TODO to sampleRead for higher visibility, and if you think you won't be able to get to it by the end of your internship, you would also want to file an issue about this missed case in read compactions. I'd also expand the comment below with a short sentence to explain why sampling skips that file; it's because the iterator has already moved past it.

@itsbilal
Contributor


compaction_picker.go, line 1340 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

If you do len(nil) it returns 0 instead of panicking as nil is a valid 0-length slice. You can try it in Go Playground. That's why it's preferable; it handles both the nil case as well as the empty slice case.

Oops nevermind, I missed the fact that it's a pointer to a slice. This is good then.

@aadityasondhi aadityasondhi force-pushed the read-triggered-compactions branch from 047010c to 9486b4c Compare December 17, 2020 02:11
Contributor Author

@aadityasondhi aadityasondhi left a comment


TFTR! I made most of the recommended changes. I took a slightly different approach to address the missing test cases, which I have outlined in the comments below. I also found a potential bug in the logic that checks whether a file picked for read compaction has since been compacted by another compaction. I will be investigating it.

Reviewable status: 15 of 21 files reviewed, 5 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)


compaction_test.go, line 1738 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Yep, I meant FileMetadata.Size. I thought runDbDefineCmd would let you pass in file sizes for each file but apparently not. It's more work than I guessed at first (but also not significantly so). Basically, runDBDefineCmd lets you pass in individual keys for each level, that it then flushes directly to the level you specify, using some custom flushing/level-setting logic.

If you ctrl+F for "parseMeta" in compaction_test.go or compaction_picker_test.go, you'll see some tests that use a custom parseMeta function that allows you to directly specify files and sometimes file sizes. You may want to copy one of those define cases verbatim, such as the one in TestCompactionFindL0Limit, as it's closer to what you want. Once you have made that change, you can do something like this:

```
L5
  100: a.SET.5-b.SET.6 size=10
L6
  010: a.SET.0-b.SET.0 size=100
```
Now L6 and L5 have the right relative shape, so you're guaranteed to not get a score-based compaction in the tests.

I added the tests but took a slightly different approach that made more sense to me. I added these tests in compaction_picker_test.go instead so that I could preserve testing the db related logic that included setting flush and copying a pointer for db.mu.compact.readCompactions into compactionEnv inside compaction_test.go. Open for suggestions if you know of a better way to split these up.


compaction_test.go, line 1723 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Ah oops, I didn't realize you were just taking the UserKey anyway. Then you can go back to the old []byte() case, but ensure you don't specify sequence number and kind in the tests. I.e. it'd just be

```
add-read-compaction
a-b
```

Done.


testdata/compaction_read_triggered, line 6 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I would also add a simple variant of this test case after you've made the define changes, where L5 and L6 have another file beside the a-b one (eg. e-f), that stays unaffected throughout the whole thing. Or you could just update this test case to have that second file in both levels, up to you.

Same as previous. Added to tests in compaction_picker_test.go.


testdata/compaction_read_triggered, line 175 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I would add one more test case (after you make the define changes): one where L0 has a file in the key range, and so do L5 and L6. Then you add-read-compaction the range, but in L4. Your logic should compact the file in L0. To make things interesting, you can also use slightly different ranges: the L0 file can be a-b and size 10, the L5 file a-d and size 100, and L6 a-e and size 1000. Then you can add-read-compaction L4 a-c and see what happens.

Otherwise this file looks great.

Added the test in compaction_test.go. Interestingly enough, it highlights a bug in the code where base level is being set to 1 even though L1-L4 are empty. I will look into it.


testdata/iterator_read_sampling, line 169 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I'd move this TODO to sampleRead for higher visibility, and if you think you won't be able to get to it by the end of your internship, you would also want to file an issue about this missed case in read compactions. I'd also expand the comment below with a short sentence to explain why sampling skips that file; it's because the iterator has already moved past it.

Moved it to the code. I still plan to get to it but in case I do not, I will file an issue for this.

@aadityasondhi aadityasondhi force-pushed the read-triggered-compactions branch from 9486b4c to d5d9287 Compare December 17, 2020 21:52
Contributor Author

@aadityasondhi aadityasondhi left a comment


Fixed the "bug" in the most recent revision. As discussed offline, it was just a test setup issue.

Reviewable status: 15 of 22 files reviewed, 5 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)

Contributor

@itsbilal itsbilal left a comment


:lgtm: great work! This is good to merge. Just a couple of minor changes and then you can merge as-is without waiting for a re-review from me.

One more thing: Update the commit message and PR description to say "Fixes #29", so that that issue gets closed when you merge this PR.

Reviewed 7 of 7 files at r6.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @aadityasondhi, @jbowens, @petermattis, and @sumeerbhola)


compaction_picker_test.go, line 902 at r6 (raw file):

```go
		parts = strings.Split(fields[0], "-")
		if len(parts) != 2 {
			return nil, errors.Errorf("malformed table spec: %s", s)
```

You might want to add `usage: <file-num>: start.SET.1-end.SET.2`, as that makes this all more readable.


data_test.go, line 398 at r6 (raw file):

```go
	}
	d.mu.Lock()
	if baseLevel < 0 {
```

I'd remove this condition entirely (i.e. do nothing if baseLevel is negative), and leave dynamicBaseLevel to its default of true.


data_test.go, line 401 at r6 (raw file):

```go
		d.mu.versions.dynamicBaseLevel = false
	} else {
		d.mu.versions.dynamicBaseLevel = true
```

This should be false if you're manually setting a baseLevel.


testdata/compaction_picker_read_triggered, line 19 at r6 (raw file):

```
  000011:[g#1,SET-l#2,SET]

add-read-compaction
```

I would add a pick-auto before this that doesn't pick anything, just to make it very clear that this is working as intended.


testdata/compaction_read_triggered, line 84 at r6 (raw file):

```
# Test case where there is a mismatch between the level of the chosen read compaction and the current
# version, but the correct target file is in L0. PickReadTriggeredCompaction should be able to handle
# this and compact the L0 file that overlaps the same key range.
```

I think you should move this test to compaction_picker_read_triggered. Two benefits: 1) it tests this "complex case" in that test file as well, and 2) you don't need to mess with base-level as (I believe) that code path sets it correctly.

@jbowens
Contributor

jbowens commented Dec 18, 2020

Exciting!!! 🎉

Previously, our compactions were triggered by write-based heuristics.
This change introduces compactions based on high read activity to improve
read performance in read-heavy workloads. It is inspired by LevelDB's
read-based compaction heuristic, which is outlined in the following issue:
cockroachdb#29.

These compactions are triggered using an `AllowedSeeks` parameter on
each file's `FileMetadata`. Reads are sampled based on a rate defined in
`iterator.MaybeSampleRead()`. If a read is sampled, `AllowedSeeks` is
decremented on the file in the top-most level containing the key. Once
`AllowedSeeks` reaches 0, a compaction for the key range is scheduled.

Read-triggered compactions are only considered if no other compaction is
possible at the time. This helps prioritize score-based compactions to
maintain a healthy LSM shape.

The results of this change are outlined in benchmarks in the GitHub
issue: cockroachdb#29.

Fixes cockroachdb#29.
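The AllowedSeeks mechanism in the commit message can be sketched in a few lines of Go. This is a minimal illustrative model, not Pebble's actual implementation: the names fileMetadata, readCompaction, and maybeScheduleReadCompaction are assumptions made for the sketch.

```go
package main

import "fmt"

// fileMetadata is a simplified stand-in for Pebble's FileMetadata;
// the field names are illustrative, not the real API.
type fileMetadata struct {
	smallest, largest string // key range covered by the sstable
	allowedSeeks      int    // sampled reads remaining before a compaction fires
}

// readCompaction records a queued read-triggered compaction: the level of
// the sampled file plus the key range to compact.
type readCompaction struct {
	level      int
	start, end string
}

// maybeScheduleReadCompaction models the heuristic: each sampled read
// decrements allowedSeeks, and exactly when the count reaches zero the
// file's key range is queued for a read-triggered compaction. Because the
// check is != 0, later samples that drive the count negative don't re-queue.
func maybeScheduleReadCompaction(f *fileMetadata, level int, queue *[]readCompaction) bool {
	f.allowedSeeks--
	if f.allowedSeeks != 0 {
		return false
	}
	*queue = append(*queue, readCompaction{level: level, start: f.smallest, end: f.largest})
	return true
}

func main() {
	f := &fileMetadata{smallest: "a", largest: "b", allowedSeeks: 2}
	var queue []readCompaction
	fmt.Println(maybeScheduleReadCompaction(f, 5, &queue)) // false: one seek still allowed
	fmt.Println(maybeScheduleReadCompaction(f, 5, &queue)) // true: threshold reached, range queued
	fmt.Println(len(queue))                                // 1
}
```

In the real picker, the queued entry is only acted on when no score-based compaction is available, which matches the prioritization described above.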
@aadityasondhi aadityasondhi force-pushed the read-triggered-compactions branch from d5d9287 to 27b322c Compare December 21, 2020 14:27
Contributor Author

@aadityasondhi aadityasondhi left a comment


Reviewable status: 18 of 21 files reviewed, 4 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)


data_test.go, line 398 at r6 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I'd remove this condition entirely (i.e. do nothing if baseLevel is negative), and leave dynamicBaseLevel to its default of true.

Removed based on the comment below.


data_test.go, line 401 at r6 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This should be false if you're manually setting a baseLevel.

Removed based on the comment below.


testdata/compaction_picker_read_triggered, line 19 at r6 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I would add a pick-auto before this that doesn't pick anything, just to make it very clear that this is working as intended.

Done.


testdata/compaction_read_triggered, line 84 at r6 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I think you should move this test to compaction_picker_read_triggered. Two benefits: 1) it tests this "complex case" in that test file as well, and 2) you don't need to mess with base-level as (I believe) that code path sets it correctly.

Done.

@aadityasondhi aadityasondhi merged commit 4b1164f into cockroachdb:master Dec 21, 2020
@aadityasondhi aadityasondhi deleted the read-triggered-compactions branch December 21, 2020 14:31

Development

Successfully merging this pull request may close these issues.

perf: read-based compaction heuristic
