Skip to content

compaction: elision-only compactions for tables with only range keys#1759

Merged
nicktrav merged 1 commit intocockroachdb:masterfrom
nicktrav:nickt.range-key-elision
Jun 14, 2022
Merged

compaction: elision-only compactions for tables with only range keys#1759
nicktrav merged 1 commit intocockroachdb:masterfrom
nicktrav:nickt.range-key-elision

Conversation

@nicktrav
Copy link
Copy Markdown
Contributor

Currently, a table is marked as eligible for elision-only compaction by
the elisionOnlyAnnotator under the following circumstances:

  • the table's range deletion estimate is greater than or equal to 10% of
    the total table size, OR
  • the number of deletions is greater than or equal to 10% of the table's
    total point key entries.

If a table contains only range keys, the second predicate is true (given
that 0 >= 0). If the table contains a range key delete or a range key
unset, an elision-only compaction is possible (i.e. if there are no
spans underneath, or snapshots preventing the elision, etc.).

However, if the table contains just range key sets, the table is always
marked for elision, but an elision-only compaction has no effect. The
same elision-only compaction will continue to be scheduled.

Update the elision-only compaction heuristics to only schedule such
compactions when there is at least one range deletion, range key
deletion, or range key unset. This requires storing an additional uint
on the table stats, which allows determining whether a table contains
only range key sets.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@nicktrav nicktrav marked this pull request as ready for review June 11, 2022 19:42
@nicktrav nicktrav requested review from itsbilal and jbowens June 11, 2022 19:42
@nicktrav
Copy link
Copy Markdown
Contributor Author

Noticed this issue while rebasing #1750 to pick up the compaction support on master. Splitting it out to avoid additional noise on that PR. This would need to land first.

Another approach I played around with was updating TableStats.NumEntries to take into account range keys. That seems to break a bunch of things that rely on this being a point key only number (i.e. disk estimates currently only take into account point keys, and this can result in a div-by-zero panic if the table contains only point keys). In this case we'd still need a way of disambiguating range-key-set-only tables, which shouldn't be elided.

Another idea was to only use range key dels in the heuristic (i.e. ignore tables with range key unsets that could potentially be elided). I think in this case a range key unset could fall all the way down to L6 and never be deleted, under the right circumstances.

Open to other ideas.

Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find!

Reviewed all commit messages.
Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @nicktrav)


compaction_picker.go line 1249 at r1 (raw file):

	// dropped by tombstones and duplicated user keys. See #847.
	if f.Stats.RangeDeletionsBytesEstimate*10 < f.Size &&
		f.Stats.NumDeletions*10 < f.Stats.NumEntries {

wdyt about about making this <= instead, with a comment explaining that NumEntries may be zero if the table only contains range keys?

If we want elision-only compactions to trigger for the purpose of eliding range key unsets/dels (maybe?), I think we should give some thought to the heuristic that we want to trigger with. Range keys are expected to be few and small, since in CockroachDB's case they have no user-provided values. Space amplification isn't much of a concern. The main reason to want to elide their deletions is to remove potentially wide ranges that might force ingests into higher levels. I'm not sure that's much of an issue, because the compaction into a table just containing a few range keys will be very cheap.

Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find!

Reviewed all commit messages.
Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @nicktrav)


compaction_picker.go line 1249 at r1 (raw file):

	// dropped by tombstones and duplicated user keys. See #847.
	if f.Stats.RangeDeletionsBytesEstimate*10 < f.Size &&
		f.Stats.NumDeletions*10 < f.Stats.NumEntries {

wdyt about about making this <= instead, with a comment explaining that NumEntries may be zero if the table only contains range keys?

If we want elision-only compactions to trigger for the purpose of eliding range key unsets/dels (maybe?), I think we should give some thought to the heuristic that we want to trigger with. Range keys are expected to be few and small, since in CockroachDB's case they have no user-provided values. Space amplification isn't much of a concern. The main reason to want to elide their deletions is to remove potentially wide ranges that might force ingests into higher levels. I'm not sure that's much of an issue, because the compaction into a table just containing a few range keys will be very cheap.

Currently, a table is marked as eligible for elision-only compaction by
the `elisionOnlyAnnotator` under the following circumstances:

- the table's range deletion estimate is greater than or equal to 10% of
  the total table size, OR
- the number of deletions is greater than or equal to 10% of the table's
  total point key entries.

If a table contains only range keys, the second predicate is true (given
that `0 >= 0`), scheduling an elision-only compaction. Howeve, if the
table contains only range key-sets, such keys cannot be elided, and the
compaction picker will continue to schedule the table for elision,
without effect. This can tie up compaction slots.

While it is _technically_ possible that a table with containing
exclusively range keys, but no range key sets _could_ be eligible for an
elision-only compaction (i.e. if there are no spans underneath, or
snapshots preventing the elision, etc.), the utility of such a
compaction is minimal, given that a compaction into a table containing a
few range keys would be inexpensive.

Tweak the elision-only compaction heuristics to skip elision-only
compactions of tables that contain exclusively range keys.
@nicktrav nicktrav force-pushed the nickt.range-key-elision branch from 2d7219a to 1022ff8 Compare June 13, 2022 15:38
Copy link
Copy Markdown
Contributor Author

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)


compaction_picker.go line 1249 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

wdyt about about making this <= instead, with a comment explaining that NumEntries may be zero if the table only contains range keys?

If we want elision-only compactions to trigger for the purpose of eliding range key unsets/dels (maybe?), I think we should give some thought to the heuristic that we want to trigger with. Range keys are expected to be few and small, since in CockroachDB's case they have no user-provided values. Space amplification isn't much of a concern. The main reason to want to elide their deletions is to remove potentially wide ranges that might force ingests into higher levels. I'm not sure that's much of an issue, because the compaction into a table just containing a few range keys will be very cheap.

SGTM. Updated the heuristic to <= and dropped in a TODO.

Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Nice catch!

Reviewed 2 of 13 files at r1, 2 of 11 files at r2.
Reviewable status: 4 of 13 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)

@nicktrav
Copy link
Copy Markdown
Contributor Author

TFTR!

@nicktrav nicktrav merged commit ef1ca57 into cockroachdb:master Jun 14, 2022
@nicktrav nicktrav deleted the nickt.range-key-elision branch June 14, 2022 14:45
@nicktrav nicktrav mentioned this pull request Jun 16, 2022
29 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants