Skip to content

backupccl: create pending schema changes job upon restore#47588

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
pbardea:mutation-job
May 14, 2020
Merged

backupccl: create pending schema changes job upon restore#47588
craig[bot] merged 1 commit intocockroachdb:masterfrom
pbardea:mutation-job

Conversation

@pbardea
Copy link
Copy Markdown
Contributor

@pbardea pbardea commented Apr 17, 2020

This commit updates restore to start schema change jobs for any pending
mutations that are left on the table descriptor. These mutations may be
found on the table descriptor if they were backed up while in the middle
of a schema change.

Fixes #44019.

Release note (bug fix): Properly support restoring tables that were
backed up while they were in the middle of a schema change.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@pbardea
Copy link
Copy Markdown
Contributor Author

pbardea commented Apr 21, 2020

Created as proof of concept. Closing for now.

@pbardea pbardea closed this Apr 21, 2020
@pbardea
Copy link
Copy Markdown
Contributor Author

pbardea commented Apr 21, 2020

This seems like the best solution forward. Re-openeing.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 26, 2020

❌ The GitHub CI (Cockroach) build has failed on 165d6bcd.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 28, 2020

❌ The GitHub CI (Cockroach) build has failed on 2e3018d8.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 29, 2020

❌ The GitHub CI (Cockroach) build has failed on c2e311c6.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@pbardea pbardea force-pushed the mutation-job branch 3 times, most recently from 2f7ac59 to b7f90c5 Compare April 30, 2020 01:15
@pbardea pbardea requested a review from thoszhang April 30, 2020 13:54
@pbardea pbardea marked this pull request as ready for review April 30, 2020 13:54
Copy link
Copy Markdown

@thoszhang thoszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 41 of 41 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pbardea)


pkg/ccl/backupccl/restore_job.go, line 1118 at r1 (raw file):

	})
	if err != nil {
		return err

We should also call CleanupOnRollback on all the StartableJobs if we hit an error.


pkg/ccl/backupccl/restore_job.go, line 1292 at r1 (raw file):

		newJobs = append(newJobs, newJob)

		log.Infof(ctx, "queued new schema change job for table %d, mutation %d",

Maybe also add the job ID?


pkg/ccl/backupccl/testdata/restore_mid_schema_change/create.sql, line 5 at r1 (raw file):

This implies that for a given table, a BACKUP was taken while the schema changes
were running. If there are multiple schema changes, they were all running
simultaneously.

It might be useful to clarify that to generate the test backups, it's necessary to modify the binary (or whatever you had to do) to cause the schema changes to block while you took the backup.


pkg/sql/table.go, line 925 at r1 (raw file):

// N.B.: This does not currently attempt to fully reconstruct the schema change
// job being performed, but serves to give an indication of the work being done.
func JobDescriptionFromMutationID(

I think this belongs more in backupccl since it's so specific to RESTORE.


pkg/sql/table.go, line 928 at r1 (raw file):

	tableDesc *sqlbase.TableDescriptor, id sqlbase.MutationID,
) (string, int, error) {
	var jobDescBuilder strings.Builder

Overall I'm kind of hesitant about the approach of building up a description string manually. I know accuracy isn't the point, but it seems like it'd be too easy to produce invalid SQL in edge cases (character escaping being one set of problems), and it's not worth trying to address/test those cases manually. I think we should either generate obviously non-SQL job descriptions, or make some nodes and let the formatter do it (if that turns out to be easy).

@pbardea pbardea changed the title backupccl: create schema change job upon restore backupccl: create pending schema changes job upon restore May 4, 2020
Copy link
Copy Markdown
Contributor Author

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang)


pkg/ccl/backupccl/restore_job.go, line 1118 at r1 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

We should also call CleanupOnRollback on all the StartableJobs if we hit an error.

Done. I noticed in the schema changer when we create child jobs, we just log the error if the cleanup fails. I did the same here, and I think it should be OK since we're returning an error anyway. Let me know if this isn't right though.


pkg/ccl/backupccl/restore_job.go, line 1292 at r1 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

Maybe also add the job ID?

Done.


pkg/ccl/backupccl/testdata/restore_mid_schema_change/create.sql, line 5 at r1 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

It might be useful to clarify that to generate the test backups, it's necessary to modify the binary (or whatever you had to do) to cause the schema changes to block while you took the backup.

Done.


pkg/sql/table.go, line 925 at r1 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

I think this belongs more in backupccl since it's so specific to RESTORE.

Done.


pkg/sql/table.go, line 928 at r1 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

Overall I'm kind of hesitant about the approach of building up a description string manually. I know accuracy isn't the point, but it seems like it'd be too easy to produce invalid SQL in edge cases (character escaping being one set of problems), and it's not worth trying to address/test those cases manually. I think we should either generate obviously non-SQL job descriptions, or make some nodes and let the formatter do it (if that turns out to be easy).

I think this makes sense. I updated this to create obviously non-SQL descriptions. I didn't see an easy way to construct SQL nodes and delegating this to the formatter. If we think that that's important, we can revisit this.

Copy link
Copy Markdown

@thoszhang thoszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/ccl/backupccl/restore_job.go, line 1118 at r1 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Done. I noticed in the schema changer when we create child jobs, we just log the error if the cleanup fails. I did the same here, and I think it should be OK since we're returning an error anyway. Let me know if this isn't right though.

Yeah, that's fine. We may end up leaking job IDs, but the job itself was never committed.

@thoszhang
Copy link
Copy Markdown

I think @mwang1026 mentioned having some backups taken during schema changes, as part of testing. It might be good to try to manually restore a few of those.

This commit updates restore to start schema change jobs for any pending
mutations that are left on the table descriptor. These mutations may be
found on the table descriptor if they were backed up while in the middle
of a schema change.

Release note (bug fix): Properly support restoring tables that were
backed up while they were in the middle of a schema change.
@pbardea
Copy link
Copy Markdown
Contributor Author

pbardea commented May 14, 2020

I restored a few of the backups mentioned above and the schema changes seem to persist. TFTR!
bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 14, 2020

Build succeeded

@craig craig bot merged commit 752c325 into cockroachdb:master May 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bulkio, sql: mutations with job IDs are persisted in backups and cause problems after restoring

3 participants