backupccl: remap all restored tables#85492
Conversation
d2c0278 to
1e29ba6
Compare
benbardin
left a comment
There was a problem hiding this comment.
LGTM, but consider also waiting for another review or two from the others :)
ajwerner
left a comment
There was a problem hiding this comment.
though the test failure seems plausibly legit. This was satisfying to read.
Reviewed 3 of 3 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 2 of 2 files at r4, 1 of 3 files at r5, 4 of 5 files at r6, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @adityamaru and @dt)
-- commits line 2 at r1:
I'm so curious
pkg/ccl/backupccl/restore_job.go line 1667 at r3 (raw file):
// tempSystemDatabaseID returns the ID of the descriptor for the temporary // system database used in full cluster restores.
Can you note that this returns descpb.InvalidID in the case that there are no system tables in the tables slice and that it relies on the DescriptorRewrites map having already been populated?
pkg/ccl/backupccl/restore_old_versions_test.go line 543 at r4 (raw file):
{"root", "", "{admin}"}, }) sqlDB.CheckQueryResults(t, "SELECT comment FROM system.comments", [][]string{
technically, should this have an order by?
adityamaru
left a comment
There was a problem hiding this comment.
Can I bug you to flesh out the commit message in commit number 5. For anyone who comes along later it would be helpful to read why we made this switch etc. Feel free to ignore.
stevendanna
left a comment
There was a problem hiding this comment.
Overall, it is nice to see this code get deleted.
One question I have is why we don't need the custom restore funcs for comments, session vars, and zone configuration all of which reference descriptor IDs in their table. If we do need them, then perhaps the descriptor-conflicts tests isn't testing exactly what it used to be.
This is why https://github.com/cockroachdb/cockroach/pull/85098/files landed first |
|
@dt Ah, thanks. missed that one. |
17fb6c0 to
d992357
Compare
Added a lengthy commit message. |
|
TFTRs! bors r+ |
|
bors r- gotta fix a merge conflict |
|
Canceled. |
Release note: none.
Release note: none.
Release note: none.
Release note: none.
Previously we updated one old ID to one new ID at a time. However this was incorrect: if a new ID can potentially be higher or lower and conflict with an old ID yet to be remapped, and thus subsequently be remapped again incorrectly as it then looks like the other old ID. Since IDs can be higher or lower there is no order that is sure to be free of such conflicts when doing each remapping separately. Instead, we now do all the remappings in one pass, using a CASE to switch from old ID to new ID as we go. Release note: none.
Previously we kept the IDs of all descriptors being restored unchanged during full-cluster restore. This had the advantage of meaning that all mentions of IDs in anything we restored, for example in system tables or jobs, were still valid as we put the same descriptor back at the same ID when we restored it. However, this approach proved very brittle to changes in what IDs we use for what over time; for example, in 22.1, new system tables were introduced with IDs in the lower 50s, while those IDs had previously been available to user-created tables which could appear in the backup being restored. This posed a problem: we needed to restore that backed up user table to ID 52, but there was a new system table in the way. A workaround in 22.1 is that, mid-restore, we can pick up and move an existing system table in the restoring cluster to a new ID, picked to be greater than any ID we are going to restore into, then carry on with the restore into the now conflict-free ID spans. However this approach is brittle, as the pick-up-and-move could break any number of assumptions or edges between descriptors. Fortunately for 22.1, no such edges exist and only a couple new system descriptors had found their way into the formerly user-owned IDs, however if left unchecked, this problem could easily grow in complexity, with some future descriptors being much harder to move. Instead, this changes how we restore during full cluster restore to match how we restore in other restores: each descriptor being restored is assigned a new descriptor ID, generated as the next available ID in the restoring cluster, and references to the old ID in other data being restored are updated. Descriptor ID rewriting within descriptors themselves has been in place as long as restore has existed, however during full-cluster restore, data in addition to descriptors is restored, particularly in system table rows. These rows can also contain references to descriptor IDs, which is why a prior change started updating these rows as they are restored to contain the new IDs. The 'old' and 'new' IDs were always the same in that change however, as cluster-restore did not actually generate and assign _new_ IDs. That last part is what this change does: instead of assigning the same ID, it assigns a new one, and the rewriting enabled in prior changes is now actually updating IDs. The special-case moving of conflicting descriptors is no longer needed now that all descriptors are assigned new IDs and is removed. Release note: none.
Release note: none.
|
bors r+ |
|
Build succeeded: |
This PR has a few changes, broken down into separate commits:
a) stop restoring tmp tables and remove the special-case code to synthesize their special schemas; These were previously restored only to be dropped so that restored jobs that referenced them would not be broken, but we stopped restoring jobs.
b) synthesize type-change jobs during cluster restore; this goes with not restoring jobs.
c) fix some assumptions in tests/other code about what IDs restored tables have.
d) finally, always assign new IDs to all restored objects, even during cluster restore, removing the need to carefully move conflicting tables or other things around.
Commit-by-commit review recommended.