schemachange: attempting to update succeeded job over and over

A customer cluster got all gunked up because a schema change (or a table truncation?) keeps failing a couple of times a second with the following amusing message:
```
W190603 13:33:12.043023 211 sql/schema_changer.go:1586  [n1] Error executing schema change: failed to update job 456021744522723331: cannot update progress on succeeded job (id 456021744522723331)
```
Why someone is trying to update the progress of a succeeded job, I do not know. Two nodes racing on finishing the schema change maybe?

The schema change in question is:
```
456021744522723331 | SCHEMA CHANGE     | TRUNCATE TABLE <redacted> CASCADE
```
The table has `id: 4191`, `state: DROP`, and `drop_job_id: 456021744522723331`.

These schema change retries kill us because, with every one, we seem to acquire and release the "schema change lease" for this table (I can see this by diffing consecutive versions of the descriptor). Each retry therefore rewrites the descriptor, which eventually leads to the system config range being unable to accept writes because it has gotten too big and it can't be split.
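A back-of-the-envelope sketch of why this adds up, assuming each lease acquire and each release writes one full new version of the descriptor, and that old versions stick around until they are garbage collected. The function name and all the numbers fed into it are made up for illustration; none of them were measured on the customer cluster.

```go
package main

// estimateGrowthBytes returns how many bytes of descriptor versions pile up
// over the given number of seconds, under the assumptions that every retry
// performs writesPerRetry full descriptor writes and nothing is garbage
// collected in the meantime.
func estimateGrowthBytes(retriesPerSec float64, writesPerRetry int, descriptorBytes int, seconds float64) float64 {
	return retriesPerSec * float64(writesPerRetry) * float64(descriptorBytes) * seconds
}
```

For example, 2 retries a second with 2 writes each and a hypothetical 1000-byte descriptor gives `estimateGrowthBytes(2, 2, 1000, 3600)`, which is 14,400,000 bytes an hour, all landing in a range that cannot split.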

[Debug.zip](https://drive.google.com/open?id=131rC76c6RdMfWfCxjWea3EU7Jw6ELaEf) here (internal only)

@dt you want this one?

schemachange: attempting to update succeeded job over and over #38088
