Skip to content

backupccl: move descriptors and descriptor changes field from manifest to SST#96245

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
rhu713:descriptor-iter
Feb 14, 2023
Merged

backupccl: move descriptors and descriptor changes field from manifest to SST#96245
craig[bot] merged 1 commit intocockroachdb:masterfrom
rhu713:descriptor-iter

Conversation

@rhu713
Copy link
Copy Markdown
Contributor

@rhu713 rhu713 commented Jan 30, 2023

As part of an effort to make backup manifests scale better for larger clusters, this patch moves descriptors and descriptor changes from the manifest to an external SST. This avoids the need to alloc enough memory to hold every descriptor and descriptor revision for every layer of a backup during a backup or restore job. This patch also changes the access pattern for descriptors and descriptor changes to use iterators, so that they can be accessed in a streaming manner from the external SST.

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@rhu713 rhu713 force-pushed the descriptor-iter branch 3 times, most recently from e6f4520 to 90efbb5 Compare January 31, 2023 12:52
@rhu713 rhu713 requested a review from adityamaru January 31, 2023 12:52
@rhu713 rhu713 marked this pull request as ready for review January 31, 2023 12:53
@rhu713 rhu713 requested review from a team as code owners January 31, 2023 12:53
@adityamaru
Copy link
Copy Markdown
Contributor

I'm going to review this today, adding @dt as well in case he wants to take a look.

@adityamaru adityamaru requested a review from dt February 1, 2023 14:24
Copy link
Copy Markdown
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flushing out my first round of comments, overall it looks really good!

// a slice of mutable descriptors.
func BackupManifestDescriptors(
backupManifest *backuppb.BackupManifest,
ctx context.Context, backupManifest *backuppb.BackupManifest, iterFactory *IterFactory,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd prefer if we stopped passing in backupManifest to this method, and only passed in the EndTime. I like the mental model of only using the iter to access descriptors from a manifest, and it was a little confusing seeing us also pass in the manifest to this method.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, changed to pass in the end time.

func (f *IterFactory) NewFileIter(
ctx context.Context,
) (bulk.Iterator[*backuppb.BackupManifest_File], error) {
if f.m.HasExternalFilesList {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we rename the proto field now that it encompasses more than just Files. Maybe HasExternalSSTs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, changed to HasExternalManifestSSTs

return err
}
defer w.Close()
fileSST := storage.MakeBackupSSTWriter(ctx, dest.Settings(), w)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to descsSST

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"github.com/cockroachdb/cockroach/pkg/sql/isql"
"github.com/cockroachdb/cockroach/pkg/sql/stats"
"github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
"github.com/cockroachdb/cockroach/pkg/util/bulk"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason this file isn't in backupinfo? Doesn't have to be done in this PR but just curious if you knew.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was more of a util method not just related to backups so I put it in the utils package.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh sorry I meant why is backup_metadata_test.go in backupccl as compared to backupinfo? Anyways, we can move it in a smaller followup if need be!

t.Fatal(err)
}
if info.Name() == backupbase.BackupManifestName || !strings.HasSuffix(path, ".sst") {
if info.Name() == backupbase.BackupManifestName || !strings.HasSuffix(path, ".sst") || info.Name() == backupinfo.BackupMetadataDescriptorsListPath {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add files list sst to this check as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think it's better to add the files list SST to this exclusion list just so we're not relying on existing behavior for this test to pass.

return nil
}

// We do not need to upgrade descriptors that exist in the external SST
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, and I'll probably delete this soon once we restrict restores to our new policy. In the meantime though can we check backupManifests[0].ClusterVersion instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, used cluster version.

@adityamaru adityamaru self-requested a review February 2, 2023 17:20
@rhu713 rhu713 force-pushed the descriptor-iter branch 9 times, most recently from 06c6435 to 13eb67a Compare February 8, 2023 17:38
"github.com/cockroachdb/cockroach/pkg/sql/isql"
"github.com/cockroachdb/cockroach/pkg/sql/stats"
"github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
"github.com/cockroachdb/cockroach/pkg/util/bulk"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh sorry I meant why is backup_metadata_test.go in backupccl as compared to backupinfo? Anyways, we can move it in a smaller followup if need be!

@@ -1269,100 +1367,116 @@ func (dri *DescriptorRevisionIterator) Err() error {
// revision, and false if there are no more elements or if an error was
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the comment specifies we return true/false but I don't think thats the case anymore.

// descriptors at the start time, so we don't have to look back further
// than the very last backup.
if m.StartTime.IsEmpty() {
if m.StartTime.IsEmpty() || m.MVCCFilter == backuppb.MVCCFilter_Latest {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this previously a bug? if so do we have a test for it now.

continue
}

if rev.ID == prevRevID {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's just me but the sorting comment is a little far from this logic. Can I bug you to add a sentence that since revisions are sorted by ID and the descending order of revisions we can skip over older revisions once we have seen a revision for an ID.

@adityamaru
Copy link
Copy Markdown
Contributor

I'd really like if we added a mixed-version database/table restore roachtest. I'll file an issue for this that we should try to get to before backporting this.

@rhu713 rhu713 force-pushed the descriptor-iter branch 4 times, most recently from 742a2c6 to 0d9bd2a Compare February 14, 2023 17:10
…t to SST

As part of an effort to make backup manifests scale better for larger clusters,
this patch moves descriptors and descriptor changes from the manifest to an
external SST. This avoids the need to alloc enough memory to hold every
descriptor and descriptor revision for every layer of a backup during a backup
or restore job. This patch also changes the access pattern for descriptors and
descriptor changes to use iterators, so that they can be accessed in a
streaming manner from the external SST.

Release note: None
@rhu713
Copy link
Copy Markdown
Contributor Author

rhu713 commented Feb 14, 2023

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 14, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants