import: track and rollback imported data without timestamps #76722
Description
Currently, if an IMPORT writing data into an existing, non-empty table fails or is cancelled mid-IMPORT, it is rolled back by scanning the table for rows with a timestamp greater than the time at which the IMPORT started and deleting them. This works because the table is offline to other writes while it is importing, but it relies on the timestamps on rows not changing. That may not hold if the table were backed up and then restored, after which all keys, both existing and imported, would have timestamps based on when the table was restored. Today this is a non-issue, as importing tables are not included in backups and are not restored. We would like to change that, which will require finding and rolling back rows written by a failed IMPORT without relying on their MVCC timestamps having any particular value.
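To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not CockroachDB code: the `kv` type, the integer timestamps, and the `rollbackByTimestamp` and `restore` helpers are all hypothetical and exist only to show why a timestamp predicate stops being safe once a restore rewrites every key.

```go
package main

import "fmt"

// kv is a simplified, hypothetical stand-in for an MVCC key/value pair.
type kv struct {
	key string
	ts  int64 // write timestamp, simplified to an integer
}

// rollbackByTimestamp models the current approach: every row written after
// importStart is assumed to belong to the IMPORT and is dropped.
func rollbackByTimestamp(rows []kv, importStart int64) []kv {
	var kept []kv
	for _, r := range rows {
		if r.ts <= importStart {
			kept = append(kept, r)
		}
	}
	return kept
}

// restore models a backup/restore cycle in which every key, existing or
// imported, is rewritten at the restore timestamp.
func restore(rows []kv, restoreTS int64) []kv {
	out := make([]kv, len(rows))
	for i, r := range rows {
		out[i] = kv{key: r.key, ts: restoreTS}
	}
	return out
}

func main() {
	// One pre-existing row (ts 5) and one imported row (ts 15); IMPORT started at ts 10.
	rows := []kv{{"existing", 5}, {"imported", 15}}

	// Without a restore, only the imported row is removed.
	fmt.Println(len(rollbackByTimestamp(rows, 10))) // prints 1

	// After a restore at ts 20, the pre-existing row is removed as well.
	fmt.Println(len(rollbackByTimestamp(restore(rows, 20), 10))) // prints 0
}
```

In the second case the rollback deletes the pre-existing row too, because its timestamp no longer distinguishes it from imported data.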
Instead, we should add a field to roachpb.Value that can be used to mark values imported by a specific IMPORT, and then later, if we need to roll back that IMPORT, find those values and delete them. This new field could be an arbitrary tag collection. Making it a top-level field will make it easier to do predicate push-down to the likes of DeleteRange, so that it could be told to delete all rows with this marker value without needing to know how to dig into the packed Value field itself.
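A rough sketch of the proposed approach follows. Everything in it is an assumption made for illustration: the `value` struct only mimics a value type with a new top-level marker, the marker is modeled as a single `importID` rather than a tag collection, and `deleteTagged` stands in for a DeleteRange-style operation that accepts a predicate. None of these names come from the actual codebase.

```go
package main

import "fmt"

// value is a hypothetical stand-in for a stored value that carries a
// top-level marker alongside its packed payload.
type value struct {
	rawBytes []byte // packed payload; the predicate never inspects it
	importID uint64 // hypothetical marker; 0 means "not written by an IMPORT"
}

// store is a toy key/value map standing in for a table's key span.
type store map[string]value

// deleteTagged removes every key whose value carries the given import marker
// and returns how many keys were deleted. It reads only the top-level field,
// so it never needs to decode rawBytes.
func deleteTagged(s store, importID uint64) int {
	deleted := 0
	for k, v := range s {
		if v.importID == importID {
			delete(s, k) // deleting during range is safe in Go
			deleted++
		}
	}
	return deleted
}

func main() {
	s := store{
		"a": {rawBytes: []byte("old")},               // pre-existing row
		"b": {rawBytes: []byte("new1"), importID: 7}, // written by import 7
		"c": {rawBytes: []byte("new2"), importID: 7}, // written by import 7
	}
	fmt.Println(deleteTagged(s, 7)) // prints 2
	fmt.Println(len(s))             // prints 1
}
```

Because the predicate depends only on the marker, the rollback gives the same result regardless of what timestamps the keys carry, including after a backup and restore.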
Jira issue: CRDB-13248
Epic CRDB-14921