backupccl: offline table data in revision history backups can leak into restored cluster

Before 22.2, backups were assumed to exclude data from offline (importing) tables. However, backups with revision history _will_ contain offline table data, because `getRelevantDescChanges` includes offline table descriptors contained in the target database(s). Including the offline table in the backup makes sense: it lets the user run a RESTORE AOST=beforeImportStartTime, which restores the importing table to its pre-import state. However, including offline table data in the backup can also lead to corrupt data on restore.

Consider the following sequences:

Sequence 1: with a rolled back import
t0: begin IMPORT on foo
t1: backup foo with revision history - captures foo's pre-import state and some importing data
t2: roll back the IMPORT on foo via a non-MVCC clear range
t3: incremental backup foo with revision history
 - fails to reintroduce foo

t4: restore foo to latest time
 - because the rollback used a non-MVCC clear range, the incremental backup is completely unaware of the rollback, so the importing data gets restored.
 
Sequence 2: with a completed import
t0: begin IMPORT on foo
t1: backup foo with revision history - captures foo's pre-import state and some importing data
t2: complete IMPORT on foo
t3: incremental backup foo with revision history
 - fails to reintroduce foo. If the IMPORT used non-MVCC AddSSTable, this incremental backup could have missed spans from the completed import, leading to data loss.

t4: restore foo to latest time
 - foo could get restored with only _some_ of the imported data
 
Important note: in either scenario, if another incremental backup completed between t0 and t2, the backup/restore would work just fine. That is, if an incremental backup observed the table offline at both the start and the end of its interval, there's no bug.
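Sequence 1 can be replayed with a toy model. This is only a sketch under loud assumptions, not backupccl code: `layer`, `incremental`, `restore`, and `sequence1` are hypothetical names, the timestamps are arbitrary integers, and the model assumes that a reintroduced layer re-captures foo from time zero and that restore takes foo solely from that layer.

```go
package main

import "fmt"

// layer is a hypothetical model of one backup in a chain covering table foo.
type layer struct {
	keys         map[string]bool // keys captured by this layer
	fromTimeZero bool            // true if the layer captured foo from time zero
}

// incremental captures live keys written in (start, end]. If reintroduce is
// set, it instead captures every live key written at or before end.
func incremental(store map[string]int, start, end int, reintroduce bool) layer {
	l := layer{keys: map[string]bool{}, fromTimeZero: reintroduce}
	for k, ts := range store {
		if ts <= end && (reintroduce || ts > start) {
			l.keys[k] = true
		}
	}
	return l
}

// restore applies layers oldest first. Assumption of this model: a layer taken
// from time zero replaces whatever earlier layers contributed for foo.
func restore(chain []layer) map[string]bool {
	out := map[string]bool{}
	for _, l := range chain {
		if l.fromTimeZero {
			out = map[string]bool{}
		}
		for k := range l.keys {
			out[k] = true
		}
	}
	return out
}

// sequence1 replays t0 through t4 and returns the keys restored for foo.
func sequence1(reintroduce bool) map[string]bool {
	store := map[string]int{"pre-import": 1}     // key -> write timestamp
	store["importing"] = 2                       // t0: IMPORT starts writing
	full := incremental(store, 0, 3, true)       // t1: backup sees both keys
	delete(store, "importing")                   // t2: non-MVCC clear, no tombstone
	inc := incremental(store, 3, 5, reintroduce) // t3: incremental backup
	return restore([]layer{full, inc})           // t4: restore to latest time
}

func main() {
	// Without reintroduction the rolled back importing key is restored.
	fmt.Println("no reintroduction:", sequence1(false))
	// With reintroduction only the pre-import key is restored.
	fmt.Println("reintroduction:   ", sequence1(true))
}
```

In this model the incremental backup at t3 captures nothing for foo, because the clear left no tombstone in its interval, so the importing key from t1 survives into the restore unless foo is reintroduced.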
 
This bug is closely related to #87305, which is also apparent in 22.2, except that this one also manifests on earlier releases and only for backups with revision history. Further, this bug is actually worse than #87305: here, the incremental backup at t3 does not reintroduce foo's spans, so the backup chain cannot be restored to a valid state, and the resulting data corruption is currently undetectable.

Here's the root cause: 
 - `foo` only appears in the `revs` input to `getReintroducedSpans`, not in `tables`, because `tables` is created from `prevBackup.Descriptors` and, as described above, `foo` is excluded from this field. This matters because when we look through `revs` to add to `tablesToReinclude` and `reintroducedTables`, we only add a `rev` if it was _already_ in the `offlineInLastBackup` map, which is constructed from the `tables` variable.
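The filter described above can be sketched as follows. This is a simplified model for illustration, not the actual `getReintroducedSpans` implementation: `tableDesc` and `reintroduced` are hypothetical, and the real function works on descriptor and span types that are not reproduced here.

```go
package main

import "fmt"

// tableDesc is a hypothetical, simplified table descriptor.
type tableDesc struct {
	ID      int
	Name    string
	Offline bool
}

// reintroduced models the filter: tables stands for the descriptors derived
// from the previous backup, and revs stands for the descriptor revisions seen
// during the current incremental backup's interval.
func reintroduced(tables, revs []tableDesc) []string {
	offlineInLastBackup := make(map[int]bool)
	for _, t := range tables {
		if t.Offline {
			offlineInLastBackup[t.ID] = true
		}
	}
	var names []string
	seen := make(map[int]bool)
	for _, rev := range revs {
		// A revision only counts if the table was already known to be offline
		// in the last backup and is online in this revision.
		if offlineInLastBackup[rev.ID] && !rev.Offline && !seen[rev.ID] {
			seen[rev.ID] = true
			names = append(names, rev.Name)
		}
	}
	return names
}

func main() {
	online := tableDesc{ID: 1, Name: "foo"}
	offline := tableDesc{ID: 1, Name: "foo", Offline: true}
	revs := []tableDesc{offline, online}

	// foo is missing from tables, so it is never reintroduced.
	fmt.Println(len(reintroduced(nil, revs)))
	// If tables had recorded foo as offline, foo would be reintroduced.
	fmt.Println(reintroduced([]tableDesc{offline}, revs))
}
```

With `tables` empty, `offlineInLastBackup` is empty too, so the online revision of `foo` in `revs` is skipped even though the table was offline when the previous backup ran.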

The implications:
- This backup chain cannot be restored to a valid state because we did not reintroduce foo.
- The next full backup that runs will be fine.
- This bug should not affect a backup chain taken entirely on 22.2, where we back up all offline spans, but it could affect a chain that includes backups taken on earlier versions.

 
Jira issue: CRDB-19657

backupccl: offline table data in revision history backups can leak into restored cluster #88042
