backupccl: create pending schema changes job upon restore#47588
backupccl: create pending schema changes job upon restore#47588craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
Created as proof of concept. Closing for now. |
|
This seems like the best solution forward. Re-openeing. |
|
❌ The GitHub CI (Cockroach) build has failed on 165d6bcd. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
|
❌ The GitHub CI (Cockroach) build has failed on 2e3018d8. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
|
❌ The GitHub CI (Cockroach) build has failed on c2e311c6. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
2f7ac59 to
b7f90c5
Compare
thoszhang
left a comment
There was a problem hiding this comment.
Reviewed 41 of 41 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @pbardea)
pkg/ccl/backupccl/restore_job.go, line 1118 at r1 (raw file):
}) if err != nil { return err
We should also call CleanupOnRollback on all the StartableJobs if we hit an error.
pkg/ccl/backupccl/restore_job.go, line 1292 at r1 (raw file):
newJobs = append(newJobs, newJob) log.Infof(ctx, "queued new schema change job for table %d, mutation %d",
Maybe also add the job ID?
pkg/ccl/backupccl/testdata/restore_mid_schema_change/create.sql, line 5 at r1 (raw file):
This implies that for a given table, a BACKUP was taken while the schema changes were running. If there are multiple schema changes, they were all running simultaneously.
It might be useful to clarify that to generate the test backups, it's necessary to modify the binary (or whatever you had to do) to cause the schema changes to block while you took the backup.
pkg/sql/table.go, line 925 at r1 (raw file):
// N.B.: This does not currently attempt to fully reconstruct the schema change // job being performed, but serves to give an indication of the work being done. func JobDescriptionFromMutationID(
I think this belongs more in backupccl since it's so specific to RESTORE.
pkg/sql/table.go, line 928 at r1 (raw file):
tableDesc *sqlbase.TableDescriptor, id sqlbase.MutationID, ) (string, int, error) { var jobDescBuilder strings.Builder
Overall I'm kind of hesitant about the approach of building up a description string manually. I know accuracy isn't the point, but it seems like it'd be too easy to produce invalid SQL in edge cases (character escaping being one set of problems), and it's not worth trying to address/test those cases manually. I think we should either generate obviously non-SQL job descriptions, or make some nodes and let the formatter do it (if that turns out to be easy).
pbardea
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang)
pkg/ccl/backupccl/restore_job.go, line 1118 at r1 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
We should also call
CleanupOnRollbackon all theStartableJobs if we hit an error.
Done. I noticed in the schema changer when we create child jobs, we just log the error if the cleanup fails. I did the same here, and I think it should be OK since we're returning an error anyway. Let me know if this isn't right though.
pkg/ccl/backupccl/restore_job.go, line 1292 at r1 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
Maybe also add the job ID?
Done.
pkg/ccl/backupccl/testdata/restore_mid_schema_change/create.sql, line 5 at r1 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
It might be useful to clarify that to generate the test backups, it's necessary to modify the binary (or whatever you had to do) to cause the schema changes to block while you took the backup.
Done.
pkg/sql/table.go, line 925 at r1 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
I think this belongs more in backupccl since it's so specific to RESTORE.
Done.
pkg/sql/table.go, line 928 at r1 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
Overall I'm kind of hesitant about the approach of building up a description string manually. I know accuracy isn't the point, but it seems like it'd be too easy to produce invalid SQL in edge cases (character escaping being one set of problems), and it's not worth trying to address/test those cases manually. I think we should either generate obviously non-SQL job descriptions, or make some nodes and let the formatter do it (if that turns out to be easy).
I think this makes sense. I updated this to create obviously non-SQL descriptions. I didn't see an easy way to construct SQL nodes and delegating this to the formatter. If we think that that's important, we can revisit this.
thoszhang
left a comment
There was a problem hiding this comment.
Reviewed 5 of 5 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained
pkg/ccl/backupccl/restore_job.go, line 1118 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Done. I noticed in the schema changer when we create child jobs, we just log the error if the cleanup fails. I did the same here, and I think it should be OK since we're returning an error anyway. Let me know if this isn't right though.
Yeah, that's fine. We may end up leaking job IDs, but the job itself was never committed.
|
I think @mwang1026 mentioned having some backups taken during schema changes, as part of testing. It might be good to try to manually restore a few of those. |
This commit updates restore to start schema change jobs for any pending mutations that are left on the table descriptor. These mutations may be found on the table descriptor if they were backed up while in the middle of a schema change. Release note (bug fix): Properly support restoring tables that were backed up while they were in the middle of a schema change.
|
I restored a few of the backups mentioned above and the schema changes seem to persist. TFTR! |
Build succeeded |
This commit updates restore to start schema change jobs for any pending
mutations that are left on the table descriptor. These mutations may be
found on the table descriptor if they were backed up while in the middle
of a schema change.
Fixes #44019.
Release note (bug fix): Properly support restoring tables that were
backed up while they were in the middle of a schema change.