Skip to content

release-21.1: improve restore performance consistency#64656

Merged
pbardea merged 4 commits intocockroachdb:release-21.1from
pbardea:backport21.1-63471-63875-64067
May 17, 2021
Merged

release-21.1: improve restore performance consistency#64656
pbardea merged 4 commits intocockroachdb:release-21.1from
pbardea:backport21.1-63471-63875-64067

Conversation

@pbardea
Copy link
Copy Markdown
Contributor

@pbardea pbardea commented May 4, 2021

Backport:

Please see individual PRs for details.

/cc @cockroachdb/release

Added do-not-merge label until release-21.1 opens for 21.1.1 backports once the release is out.

pbardea added 4 commits May 4, 2021 15:39
This commit filters out importSpans that have no associated data.

File spans' start keys are set to be the .Next() of the EndKey of the
previous file span. This was at least the case on the backup that is
currently being run against restore2TB. This lead to a lot of importSpan
entries that contained no files and would cover an empty key range. The
precense of these import spans did not effect the performance of
restores that much, but they did cause unnecessary splits and scatters
which further contributed to the elevated NLHEs that were seen.

This commit generally calms down the leaseholder errors, but do not
eliminate them entirely. Future investigation is still needed. It also
does not seem to have any performance impact on RESTORE performance (in
terms of speed of the job).

Release note: None
This change reworks how RESTORE splits and scatter's in 2 ways:

1.
This commits updates the split and scatter mechanism that restore uses
to split at the key of the next chunk/entry rather than at the current
chunk/entry.

This resolves a long-standing TODO that updates the split and scattering
of RESTORE to perform a split at the key of the next chunk/entry.
Previously, we would split at a the start key of the span, and then
scatter that span.

Consider a chunk with entries that we want to split and scatter. If we
split at the start of each entry and scatter the entry we just split off
(the right hand side of the split), we'll be continuously scattering the
suffix keyspace of the chunk. It's preferrable to split out a prefix
first (by splitting at the key of the next chunk) and scattering the
prefix.

We additionally refactor the split and scatter processor interface and
stop scattering the individual entries, but rather just split them.

2.
Individual entries are no longer scattered. They used to be scattered
with the randomizeLeases option set to false, but there was not any
indication that this was beneficial to performance, so it was removed
for simplicity.

Release note: None
This change fixes a bug in the planning of the restore DistSQL flow. The
ordering of the source router slots did not match the order that the
streams were added to the range router. This meant that the range router
could mis-direct rows in the flow to the incorrect node.

Ever after this change, it appears that a lot of AddSSTables are not
handled locally and so the release note does not indicate any
performance improvement.

Release note: None
This change updates the chunk size to take larger clusters into account.
On larger clusters, having more chunks (that are each smaller) is
beneficial because this results in more even load after scattering these
chunks.

Release note: None
@pbardea pbardea added the do-not-merge bors won't merge a PR with this label. label May 4, 2021
@pbardea pbardea requested review from a team and dt May 4, 2021 15:41
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for 21.1.1)

@pbardea pbardea removed the do-not-merge bors won't merge a PR with this label. label May 17, 2021
@pbardea
Copy link
Copy Markdown
Contributor Author

pbardea commented May 17, 2021

TFTR!
bors r=dt

craig bot pushed a commit that referenced this pull request May 17, 2021
64656: release-21.1: improve restore performance consistency r=dt a=pbardea

Backport:
  * 2/2 commits from "backupccl: rework split and scatter mechanism" (#63471)
  * 1/1 commits from "backupccl: fix a bug in routing scattered ranges" (#63875)
  * 1/1 commits from "backupccl: create more chunks on larger clusters" (#64067)

Please see individual PRs for details.

/cc @cockroachdb/release

Added do-not-merge label until release-21.1 opens for 21.1.1 backports once the release is out.


Co-authored-by: Paul Bardea <pbardea@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 17, 2021

This PR was included in a batch that successfully built, but then failed to merge into release-21.1 (it was a non-fast-forward update). It will be automatically retried.

@pbardea
Copy link
Copy Markdown
Contributor Author

pbardea commented May 17, 2021

bors r-

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 17, 2021

Canceled.

@pbardea pbardea merged commit c60cbde into cockroachdb:release-21.1 May 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants