migrations: add explicit API for migrating ingested data later #58611
Description
The long-running migration system described here lets a migration rewrite all row data or all schemas, or perform some other cluster-wide change, and then bump the cluster version to record that the work is done. The cluster version then serves as evidence that some invariant now holds, for example that there are no longer any interleaved rows or old-format FKs to consider.
However, row data and schema elements can also be ingested into a cluster from a BACKUP using RESTORE. If that backup captured the pre-migration state, restoring it as-is would mean the invariant no longer holds. That could lead to serious bugs wherever other code assumes the invariant based on the cluster version alone.
In the past, this has been handled on a somewhat ad-hoc, case-by-case basis (or missed in some cases). When the schema team migrated the FK representation, they hooked into points in the backup code to specifically check the FK representation used in backed-up tables and modify it as needed. Some other migrations, such as those that updated permissions on all system tables, were written to run once, update every current table, and then never run again, meaning tables restored later would not have the migrated permissions.
To be more robust and less prone to overlooking the potential for pre-migration data to be restored post-migration, we should a) persist the cluster version of the backed-up data in the backup and b) step through each migration from that version to the current cluster version, calling a new method in the migration definition API that asks that migration implementation if and how it wants to migrate the data being restored. Many migrations, such as the one to move intents to the lock table, may opt to do nothing, but should do so explicitly to ensure implementors consider restored data in their migration.
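A rough sketch of what (a) and (b) could look like is below. Every name in it (`Version`, `RestoredData`, `Migration`, `MigrateRestored`, `NoOp`, `MigrateRestoredData`) is hypothetical and invented for illustration; none of it is the existing migration or backup API. It assumes the backup records a single cluster version and that each migration is keyed by the version it introduces.

```go
package main

import (
	"fmt"
	"sort"
)

// Version is a simplified, hypothetical stand-in for a cluster version.
type Version struct{ Major, Minor int }

// Less reports whether v sorts strictly before o.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

// RestoredData is a hypothetical stand-in for the descriptors and rows being
// restored. BackupVersion is the cluster version persisted in the backup (a).
type RestoredData struct {
	BackupVersion Version
	Applied       []string // names of migrations that rewrote this data
}

// Migration is a hypothetical migration definition. MigrateRestored is the
// proposed new hook: every migration must say how it handles restored data.
type Migration interface {
	Name() string
	Version() Version
	MigrateRestored(d *RestoredData) error
}

type funcMigration struct {
	name    string
	version Version
	fn      func(*RestoredData) error
}

func (m funcMigration) Name() string                          { return m.name }
func (m funcMigration) Version() Version                      { return m.version }
func (m funcMigration) MigrateRestored(d *RestoredData) error { return m.fn(d) }

// NoOp is the explicit opt-out for migrations that need nothing on restore.
func NoOp(*RestoredData) error { return nil }

// MigrateRestoredData steps through every migration newer than the backup's
// version and no newer than the current cluster version, in version order (b).
func MigrateRestoredData(d *RestoredData, current Version, all []Migration) error {
	sorted := append([]Migration(nil), all...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Version().Less(sorted[j].Version())
	})
	for _, m := range sorted {
		v := m.Version()
		if !d.BackupVersion.Less(v) {
			continue // the backup was taken at or after this migration
		}
		if current.Less(v) {
			continue // this cluster has not run the migration yet
		}
		if err := m.MigrateRestored(d); err != nil {
			return fmt.Errorf("migrating restored data for %s: %w", m.Name(), err)
		}
	}
	return nil
}

// exampleMigrations returns an illustrative registry; versions are made up.
func exampleMigrations() []Migration {
	return []Migration{
		funcMigration{"move-intents-to-lock-table", Version{21, 1}, NoOp},
		funcMigration{"upgrade-fk-representation", Version{20, 2}, func(d *RestoredData) error {
			d.Applied = append(d.Applied, "upgrade-fk-representation")
			return nil
		}},
		funcMigration{"not-yet-active", Version{21, 2}, func(d *RestoredData) error {
			d.Applied = append(d.Applied, "not-yet-active")
			return nil
		}},
	}
}

func main() {
	d := &RestoredData{BackupVersion: Version{20, 1}}
	if err := MigrateRestoredData(d, Version{21, 1}, exampleMigrations()); err != nil {
		panic(err)
	}
	fmt.Println(d.Applied)
}
```

Because `MigrateRestored` is part of the interface rather than an optional hook, a migration that has nothing to do on restore still has to pass `NoOp`, which is the "opt out explicitly" property the proposal asks for.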
Jira issue: CRDB-3378
Epic CRDB-10338