Skip to content

filetable: fix corruption caused by internal executor behavior leading to missing constraints  #76764

@ajwerner

Description

@ajwerner

Describe the problem

Part of the schema change protocol requires a pre-commit hook where a bunch of stuff happens related to descriptor management and safety. When you use the internal executor API with a non-nil transaction, it bypasses the lifecycle
and state management which normally happens for SQL transactions. In particular, the deferred creation of jobs and
the checking of the two version invariant.

In the case of filetable, we see just that:

addFKQuery := fmt.Sprintf(`ALTER TABLE %s ADD CONSTRAINT file_id_fk FOREIGN KEY (
file_id) REFERENCES %s (file_id)`, f.GetFQPayloadTableName(), f.GetFQFileTableName())
_, err = ie.ExecEx(ctx, "create-payload-table", txn,
sessiondata.InternalExecutorOverride{User: f.username},
addFKQuery)
if err != nil {
return errors.Wrap(err, "failed to add FK constraint to the payload table file_id column")
}

The intention here is good, but the outcome is dramatically bad. The end result is that the table does not have the foreign key that was intended and the descriptor is corrupt in that it references a job which was never created.

To Reproduce


Update as of 2022-09-20: 3c44cad fixes the bug that allowed the corruption to occur, but this issue should remain open to track work to fix any corruption that exists out in the wild.


CREATE DATABASE to_backup;
CREATE DATABASE backups;
BACKUP DATABASE to_backup INTO 'userfile://backups.public.userfiles_$user/data';
SELECT * FROM backups.crdb_internal.invalid_objects;
  id  | database_name | schema_name |            obj_name            |                     error
------+---------------+-------------+--------------------------------+-------------------------------------------------
  118 | backups       | public      | userfiles_$user_upload_payload | mutation job 737535981478510593: job not found
SELECT t->'mutationJobs', t->'mutations'
  FROM (
        SELECT crdb_internal.pb_to_json('desc', descriptor)->'table' AS t
          FROM system.descriptor
         WHERE id = '"userfiles_$user_upload_payload"'::REGCLASS
       );
[{"jobId": "737535981478510593", "mutationId": 1}]
[
  {
    "constraint": {
      "check": {
        "constraintId": 3
      },
      "constraintType": "FOREIGN_KEY",
      "foreignKey": {
        "constraintId": 3,
        "name": "file_id_fk",
        "originColumnIds": [
          1
        ],
        "originTableId": 118,
        "referencedColumnIds": [
          2
        ],
        "referencedTableId": 117,
        "validity": "Validating"
      },
      "name": "file_id_fk",
      "uniqueWithoutIndexConstraint": {
        "constraintId": 3
      }
    },
    "direction": "ADD",
    "mutationId": 1,
    "state": "DELETE_ONLY"
  }
]

Jira issue: CRDB-13265

Metadata

Metadata

Assignees

Labels

A-disaster-recoveryC-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.T-disaster-recovery

Type

No type

Projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions