import: track and rollback imported data without timestamps #76722

@dt

Currently, if an IMPORT writing data into an existing, non-empty cluster fails or is cancelled mid-IMPORT, it is rolled back by scanning the table for rows with a timestamp greater than the time at which the IMPORT started and deleting them. This works because the table is offline to other writes while it is importing, but it relies on the timestamps on rows not changing, which may not hold if the table were backed up and then restored: after a restore, all keys, both existing and imported, would have timestamps based on when the restore ran. Today this is a non-issue because importing tables are not included in backups and are not restored, but we would like to change that, which will require finding and rolling back rows written by a failed IMPORT without relying on their MVCC timestamps having any particular value.
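To make the failure mode concrete, here is a minimal sketch of the timestamp-based rollback and of what a restore does to it. The `Row` type, the integer timestamps, and both function names are hypothetical simplifications for illustration, not the actual IMPORT or MVCC code.

```go
package main

// Row is a hypothetical, simplified stand-in for an MVCC key/value pair.
type Row struct {
	Key       string
	Timestamp int64 // MVCC write timestamp, simplified to an integer
}

// RollbackByTimestamp models the current approach: it drops every row whose
// timestamp is greater than importStart and returns the rows that remain.
func RollbackByTimestamp(rows []Row, importStart int64) []Row {
	kept := make([]Row, 0, len(rows))
	for _, r := range rows {
		if r.Timestamp <= importStart {
			kept = append(kept, r)
		}
	}
	return kept
}

// Restore models a backup/restore cycle in which every key, pre-existing or
// imported, is rewritten with a timestamp based on the restore time.
func Restore(rows []Row, restoreTime int64) []Row {
	out := make([]Row, len(rows))
	for i, r := range rows {
		out[i] = Row{Key: r.Key, Timestamp: restoreTime}
	}
	return out
}
```

In this model, a table holding a pre-existing row at timestamp 5 and an imported row at timestamp 20 rolls back correctly against an import start time of 10. After `Restore(rows, 100)`, both rows carry timestamp 100, so the same rollback would delete the pre-existing row as well.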

Instead, we should add a field to roachpb.Value that can be used to mark values written by a specific IMPORT, so that if we later need to roll back that IMPORT, we can find those values and delete them. This new field could be an arbitrary tag collection. Making it a top-level field will make it easier to push the predicate down to the likes of DeleteRange, which could then be told to delete all rows with this marker value without needing to know how to dig into the packed Value field itself.
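A rough sketch of the marker-based rollback, under stated assumptions: the `Value` type below is a simplified stand-in for the value type the issue proposes to extend, the `ImportJobID` field is a hypothetical shape for the marker (the issue leaves open whether it is a more general tag collection), and `DeleteByImportID` is an illustrative in-memory filter, not the real DeleteRange API.

```go
package main

// Value is a hypothetical, simplified value type. ImportJobID is an assumed
// top-level marker field; zero means the value was not written by an IMPORT.
type Value struct {
	RawBytes    []byte
	ImportJobID int64
}

// KV pairs a key with its value.
type KV struct {
	Key   string
	Value Value
}

// DeleteByImportID models a predicate-based delete: it removes every KV
// whose marker equals jobID and reports how many were removed. It inspects
// only the top-level marker and never decodes RawBytes.
func DeleteByImportID(kvs []KV, jobID int64) (kept []KV, deleted int) {
	kept = make([]KV, 0, len(kvs))
	for _, kv := range kvs {
		if kv.Value.ImportJobID == jobID {
			deleted++
			continue
		}
		kept = append(kept, kv)
	}
	return kept, deleted
}
```

Because the predicate reads only the marker, the result does not depend on when the rows were written or rewritten, so in this model it would survive a backup and restore that preserves the marker.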

Jira issue: CRDB-13248

Epic CRDB-14921

Labels: C-enhancement, T-disaster-recovery
