backupccl: incrementally backup in progress imports on existing tables and elide importing data in RESTORE #86054

@msbutler

Currently, a table with an in-progress import cannot be backed up, let alone restored to its pre-import state. Backing up an in-progress import also has the benefit of distributing the work of backing up the import over a series of incremental backups, as opposed to what currently occurs: the first incremental backup to begin after the import finishes has to back up everything.

The main challenge in addressing this is rolling back imported data on the restored cluster. To understand why this is a challenge, consider how rollbacks occur today:

If an IMPORT writing data into an existing, non-empty table fails or is cancelled mid-IMPORT, it is rolled back by scanning the table for rows with a timestamp greater than the time at which the IMPORT started and deleting them. This works because the table is offline to other writes while it is importing, but it relies on the timestamps on rows not changing, which may not hold if the table were backed up and then restored: after the restore, all keys, both existing and imported, would have timestamps based on when the table was restored.
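To make the problem concrete, here is a minimal sketch of the timestamp-based rollback and of how a timestamp-rewriting restore defeats it. The types and functions (`Timestamp`, `KV`, `rollbackImport`, `restoreRewritingTimestamps`) are hypothetical simplifications for illustration, not the actual backupccl or storage APIs.

```go
package main

// Timestamp is a simplified stand-in for an MVCC timestamp.
// Hypothetical type for illustration only.
type Timestamp int64

// KV is a simplified key/value pair carrying its write timestamp.
type KV struct {
	Key   string
	Value string
	TS    Timestamp
}

// rollbackImport models today's rollback: every row with a timestamp
// greater than the IMPORT start time is treated as imported and dropped.
func rollbackImport(table []KV, importStart Timestamp) (kept []KV, deleted int) {
	for _, kv := range table {
		if kv.TS > importStart {
			deleted++
			continue
		}
		kept = append(kept, kv)
	}
	return kept, deleted
}

// restoreRewritingTimestamps models a restore that writes every key,
// pre-existing or imported, at the restore time.
func restoreRewritingTimestamps(backup []KV, restoreTime Timestamp) []KV {
	out := make([]KV, len(backup))
	for i, kv := range backup {
		kv.TS = restoreTime // the original write time is lost here
		out[i] = kv
	}
	return out
}
```

With an import that started at time 10, a rollback on the original table removes only the rows written after 10. Run the same rollback on a copy restored at time 100 and every row looks imported, so the pre-existing rows would be deleted as well.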

The second paragraph in #76722 outlined one strategy, which involved writing additional metadata to each imported key, and indeed several PRs began implementing this approach (#85338, #85692, #85138). However, we realized that binding the ImportStartTime in the backed up table descriptor is sufficient. Specifically, when the restore_data_processor rewrites backed up keys to the restore cluster, it can use the ImportStartTime in the restored table descriptor to filter out the keys written by the backed up, in-progress import, before AddSSTable rewrites the timestamps of all the keys.
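The filtering step can be sketched as follows. This is an illustrative model only: `MVCCTime`, `BackedUpKey`, `RestoredTableDesc`, and `restoreKeys` are hypothetical names, and the sketch assumes a zero `ImportStartTime` means the table had no in-progress import at backup time. The real restore_data_processor and AddSSTable code paths are considerably more involved.

```go
package main

// MVCCTime is a simplified stand-in for an MVCC timestamp (hypothetical).
type MVCCTime int64

// BackedUpKey is a simplified key from the backup with its original
// write timestamp.
type BackedUpKey struct {
	Key string
	TS  MVCCTime
}

// RestoredTableDesc is a hypothetical, pared-down table descriptor.
// Assumption: ImportStartTime == 0 means no import was in progress.
type RestoredTableDesc struct {
	Name            string
	ImportStartTime MVCCTime
}

// restoreKeys drops keys written by the in-progress import while the
// original timestamps are still visible, and only then rewrites the
// surviving keys to the restore time.
func restoreKeys(desc RestoredTableDesc, backup []BackedUpKey, restoreTime MVCCTime) []BackedUpKey {
	var out []BackedUpKey
	for _, k := range backup {
		if desc.ImportStartTime != 0 && k.TS > desc.ImportStartTime {
			continue // written by the in-progress import: elide it
		}
		k.TS = restoreTime // timestamp rewrite happens after filtering
		out = append(out, k)
	}
	return out
}
```

The ordering is the point of the design: the filter must run before the timestamps are rewritten, because afterwards imported and pre-existing keys are indistinguishable.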

Note: the more complicated approach outlined in #76722 would have been necessary if RESTOREs of whole tenants implemented MVCC AddSSTable (i.e. rewrote timestamps in RESTORE), because during the restore, the host tenant cannot access tenant table descriptors and thus cannot filter keys in the restore processor. And indeed, we thought it was necessary to make whole tenant RESTOREs MVCC compatible. But now, we no longer think that whole tenant operations (like tenant streaming) need to be MVCC, since it's relatively easy to ensure that all downstream operations understand that whole tenant operations are non-MVCC. So given that whole tenant restores will continue to preserve timestamps from the backup, the restored tenant can roll back its import using the normal process described in the second paragraph.

Jira issue: CRDB-18546

Labels

A-disaster-recovery
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-disaster-recovery
branch-release-22.2: Used to mark GA and release blockers, technical advisories, and bugs for 22.2
release-blocker: Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
