Skip to content

db: check for in-progress compactions during input expansion#927

Merged
aadityasondhi merged 1 commit intocockroachdb:masterfrom
aadityasondhi:check-concurrent-compactions
Sep 22, 2020
Merged

db: check for in-progress compactions during input expansion#927
aadityasondhi merged 1 commit intocockroachdb:masterfrom
aadityasondhi:check-concurrent-compactions

Conversation

@aadityasondhi
Copy link
Copy Markdown
Contributor

Currently, pickedCompaction.setupInputs does not consider FileMetada.compacting when expanding the input slice. This ignores concurrent in-progress compactions.

This change lets setupInputs return early if there is an in-progress compaction in the files picked for compaction. This avoids wasteful expansion of compaction inputs.

Fix #644.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@aadityasondhi aadityasondhi force-pushed the check-concurrent-compactions branch from 3961a98 to 122080c Compare September 18, 2020 17:31
Copy link
Copy Markdown
Contributor

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks solid so far, great work! I'll review the test very shortly, but until I have a suggestion to get an additional isCompacting check without much additional work.

Reviewed 3 of 6 files at r1.
Reviewable status: 3 of 6 files reviewed, 4 unresolved discussions (waiting on @aadityasondhi, @itsbilal, and @jbowens)


compaction_picker.go, line 261 at r1 (raw file):

	}
	grow0 := pc.version.Overlaps(pc.startLevel.level, pc.cmp, sm.UserKey, la.UserKey)
	grow0, isCompacting := expandToAtomicUnit(pc.cmp, grow0)

I just realized two interesting things here:

  1. We almost always call expandToAtomicUnit after calling Overlaps, which grows the compaction, but doesn't have this check
  2. If one file in an atomic compaction unit is compacting, they all must be compacting, and your change already checks the boundary atomic compaction units for isCompacting but only if we grew the compaction on that side to "complete" the ACU.

If we return isCompacting = true even if we didn't grow the compaction, we are basically able to overlay an isCompacting check on top of Overlaps without significant additional work. The change for that would have to be in expandToAtomicUnits


compaction_picker.go, line 328 at r1 (raw file):

	if inputs.Empty() {
		// Nothing to expand.
		return inputs, isCompacting

For better readability, it might be better to just return inputs, false here.


compaction_picker.go, line 349 at r1 (raw file):

			// include prev in the compaction.

			if cur.Compacting {

Moving this block to the start of this loop would get you the isCompacting = true assignment even if you don't grow the compaction. See earlier comment on the motivation behind this.


compaction_picker.go, line 372 at r1 (raw file):

			if cur.Compacting {
				isCompacting = true

Same point as earlier

@aadityasondhi aadityasondhi force-pushed the check-concurrent-compactions branch from 122080c to 05f94a2 Compare September 21, 2020 15:35
Copy link
Copy Markdown
Contributor Author

@aadityasondhi aadityasondhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 6 files reviewed, 4 unresolved discussions (waiting on @itsbilal and @jbowens)


compaction_picker.go, line 261 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I just realized two interesting things here:

  1. We almost always call expandToAtomicUnit after calling Overlaps, which grows the compaction, but doesn't have this check
  2. If one file in an atomic compaction unit is compacting, they all must be compacting, and your change already checks the boundary atomic compaction units for isCompacting but only if we grew the compaction on that side to "complete" the ACU.

If we return isCompacting = true even if we didn't grow the compaction, we are basically able to overlay an isCompacting check on top of Overlaps without significant additional work. The change for that would have to be in expandToAtomicUnits

Thanks for pointing that out! I will make the change.


compaction_picker.go, line 328 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

For better readability, it might be better to just return inputs, false here.

Done.


compaction_picker.go, line 349 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Moving this block to the start of this loop would get you the isCompacting = true assignment even if you don't grow the compaction. See earlier comment on the motivation behind this.

Done.


compaction_picker.go, line 372 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Same point as earlier

Done.

Copy link
Copy Markdown
Contributor

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just some more minor changes and then this should be good to go.

Reviewed 3 of 6 files at r1, 1 of 1 files at r2.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @aadityasondhi and @jbowens)


compaction_picker.go, line 321 at r2 (raw file):

// tombstones will be truncated at "b" (the largest key in its atomic
// compaction unit). In the scenario here, that could result in b#1 becoming
// visible when it should be deleted.

It would be worthwhile to add a sentence or two (maybe in a new paragraph) to this comment to specify what isCompacting means. Just mention that it would be true if any of the boundary files / atomic compaction units are already compacting.


compaction_picker.go, line 375 at r2 (raw file):

	})

	return inputs, isCompacting

The cur.Compacting check would not happen if there's no next/prev file to start with, so you could also double it up with an isCompacting || inputIter.First().Compacting || inputIter.Last().Compacting here.


compaction_picker_test.go, line 573 at r1 (raw file):

	opts.EnsureDefaults()
	parseMeta := func(s string) *fileMetadata {
		parts := strings.Split(s, ",")

Our convention in other files seems to be to use space to split filemeta-level tags like this, instead of commas.


compaction_picker_test.go, line 645 at r1 (raw file):

					[]byte(d.CmdArgs[0].String()), []byte(d.CmdArgs[1].String()))

				var compactionFlag bool

Nit: s/compactionFlag/isCompacting/ (or isCompactingFlag)


compaction_picker_test.go, line 662 at r1 (raw file):

				}
				if compactionFlag {
					fmt.Fprintf(&buf, "compacting\n")

Nit: s/compacting/is-compacting


testdata/compaction_setup_inputs, line 95 at r1 (raw file):

L2
  a.SET.3-c.SET.4,compacting
  c.SET.3-d.SET.2

Since you're starting the compaction "rooted" at a, it would be a better idea to put the compacting flag on the next file.

And to ensure you return isCompacting = true even if you didn't grow the compaction but the start/end file is compacting, you could add a case similar to this one, with a compacting a-c file, and a d-e file instead of a c-d file. And to account for the case where there's just no next file but you still return isCompacting = true. you could keep the compacting a-c file and have no other files in L2.

Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be outside the scope of this project, but there's a TODO in pickFile that is pretty related:

pebble/compaction_picker.go

Lines 772 to 774 in 255915d

// TODO(peter): For concurrent compactions, we may want to try harder to
// pick a seed file whose resulting compaction bounds do not overlap with
// an in-progress compaction.

nit: we typically use imperative mood for the commit subject line. eg:
db: check for in-progress compactions during input expansion

https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/73072807/Git+Commit+Messages

The changes here look great!

Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @aadityasondhi and @jbowens)

@aadityasondhi aadityasondhi force-pushed the check-concurrent-compactions branch 2 times, most recently from 0942861 to 66464f6 Compare September 21, 2020 18:51
Copy link
Copy Markdown
Contributor Author

@aadityasondhi aadityasondhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 6 files reviewed, 5 unresolved discussions (waiting on @aadityasondhi and @itsbilal)

a discussion (no related file):
@jbowens Thanks for pointing me to the commit message guide, I made the change. Re: the TODO item, I took a quick look and it seems like it is covered here:

if compacting {
. I am not sure if I am missing something. I can remove the TODO if you think it is being satisfied.



compaction_picker.go, line 321 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

It would be worthwhile to add a sentence or two (maybe in a new paragraph) to this comment to specify what isCompacting means. Just mention that it would be true if any of the boundary files / atomic compaction units are already compacting.

Will do.


compaction_picker.go, line 375 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

The cur.Compacting check would not happen if there's no next/prev file to start with, so you could also double it up with an isCompacting || inputIter.First().Compacting || inputIter.Last().Compacting here.

Nice catch, will make the change


compaction_picker_test.go, line 573 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Our convention in other files seems to be to use space to split filemeta-level tags like this, instead of commas.

Thanks, will do.


testdata/compaction_setup_inputs, line 95 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Since you're starting the compaction "rooted" at a, it would be a better idea to put the compacting flag on the next file.

And to ensure you return isCompacting = true even if you didn't grow the compaction but the start/end file is compacting, you could add a case similar to this one, with a compacting a-c file, and a d-e file instead of a c-d file. And to account for the case where there's just no next file but you still return isCompacting = true. you could keep the compacting a-c file and have no other files in L2.

Yeah nice catch, will fix

@aadityasondhi aadityasondhi changed the title db: checking FileMetadata.Compacting for in-progress compactions db: check for in-progress compactions during input expansion Sep 21, 2020
Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 6 files reviewed, 5 unresolved discussions (waiting on @aadityasondhi and @itsbilal)

a discussion (no related file):

Previously, aadityasondhi (Aaditya Sondhi) wrote…

@jbowens Thanks for pointing me to the commit message guide, I made the change. Re: the TODO item, I took a quick look and it seems like it is covered here:

if compacting {
. I am not sure if I am missing something. I can remove the TODO if you think it is being satisfied.

I think the unsatisfied bit is that we might pick a file that's not compacting but has an atomic compaction unit that includes a file that is compacting. We will discard the picked compaction when we recognize that it includes compacting inputs, but it's not ideal that we don't pursue any compaction at all.

To fully address the TODO, I think we'd need to check if the file's atomic compaction unit includes any compacting files. If it does, we'd want to pick the file with the next lowest ratio.

Now that I'm describing this, it does seem like a fair bit or work, so maybe it should be out-of-scope.


Copy link
Copy Markdown
Contributor

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 6 files reviewed, 5 unresolved discussions (waiting on @aadityasondhi and @itsbilal)

a discussion (no related file):

Previously, jbowens (Jackson Owens) wrote…

I think the unsatisfied bit is that we might pick a file that's not compacting but has an atomic compaction unit that includes a file that is compacting. We will discard the picked compaction when we recognize that it includes compacting inputs, but it's not ideal that we don't pursue any compaction at all.

To fully address the TODO, I think we'd need to check if the file's atomic compaction unit includes any compacting files. If it does, we'd want to pick the file with the next lowest ratio.

Now that I'm describing this, it does seem like a fair bit or work, so maybe it should be out-of-scope.

Yeah I think that's out-of-scope for this particular issue, even though that'd be the ideal setup. I think it would be better to file a follow-up issue (as merging this would close #644 ) to be able to go back after an isCompacting check fails, and choose the next best compaction file in the level instead of giving up on that level altogether. For now I think this is good to go.


Copy link
Copy Markdown
Contributor

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, here are some test suggestions, after the last change.

Reviewed 3 of 3 files at r3.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @aadityasondhi and @itsbilal)


compaction_picker_test.go, line 662 at r3 (raw file):

				}
				if isCompacting {
					fmt.Fprintf(&buf, "is-compacting\n")

Don't add a trailing \n, datadriven will take care of it. And reviewable complains about double \n's at the end.


testdata/compaction_setup_inputs, line 109 at r3 (raw file):

  a.SET.5-b.SET.6
L2
  a.SET.3-c.SET.4 compacting

Would also be a good idea to add a negative case:

a.SET.3-c.SET.4
d.SET.3-e.SET.2 compacting

as well as the single-file case:

a.SET.3-c.SET.4 compacting

Currently, `pickedCompaction.setupInputs` does not consider `FileMetada.compacting` when expanding the input slice. This ignores concurrent in-progress compactions.

This change lets `setupInputs` return early if there is an in-progress compaction in the files picked for compaction. This avoids wasteful expansion of compaction inputs.

Fix cockroachdb#644.
@aadityasondhi aadityasondhi force-pushed the check-concurrent-compactions branch from 66464f6 to 6e61e17 Compare September 22, 2020 14:44
Copy link
Copy Markdown
Contributor Author

@aadityasondhi aadityasondhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 6 files reviewed, 7 unresolved discussions (waiting on @aadityasondhi and @itsbilal)


compaction_picker_test.go, line 662 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Don't add a trailing \n, datadriven will take care of it. And reviewable complains about double \n's at the end.

Done.


testdata/compaction_setup_inputs, line 109 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Would also be a good idea to add a negative case:

a.SET.3-c.SET.4
d.SET.3-e.SET.2 compacting

as well as the single-file case:

a.SET.3-c.SET.4 compacting

Done.

Copy link
Copy Markdown
Contributor

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Great work!

We don't use bors in the Pebble repo so feel free to just use the green button to merge this into Pebble. And the next time one of us do a "vendor bump" in Cockroach to pull in the latest Pebble changes, this change will land there.

Reviewed 2 of 2 files at r4.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aadityasondhi)

@aadityasondhi aadityasondhi merged commit 1ea4dba into cockroachdb:master Sep 22, 2020
@aadityasondhi aadityasondhi deleted the check-concurrent-compactions branch September 22, 2020 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

db: should compaction.setupInputs() consider FileMetadata.Compacting?

4 participants