schemachange: attempting to update succeeded job over and over #38088

@andreimatei

A customer cluster got all gunked up because a schema change (or a table truncation?) fails a couple of times a second with the following amusing message:

W190603 13:33:12.043023 211 sql/schema_changer.go:1586  [n1] Error executing schema change: failed to update job 456021744522723331: cannot update progress on succeeded job (id 456021744522723331)

Why someone is trying to update the progress of a succeeded job, I do not know. Two nodes racing on finishing the schema change maybe?
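To make the racing-nodes guess concrete, here is a minimal toy model in Go. It is not CockroachDB's jobs API: `job`, `finishSchemaChange`, and `retryFinish` are hypothetical names invented for illustration, and the two "nodes" are simulated sequentially. It only shows why, if one node has already marked the job succeeded, a second node that retries on every error would keep hitting the same failure.

```go
package main

import (
	"errors"
	"fmt"
)

// jobStatus and job are hypothetical stand-ins, not the real jobs package.
type jobStatus int

const (
	statusRunning jobStatus = iota
	statusSucceeded
)

type job struct {
	id     int64
	status jobStatus
}

var errAlreadySucceeded = errors.New("cannot update progress on succeeded job")

// updateProgress rejects any update once the job has succeeded.
func (j *job) updateProgress() error {
	if j.status == statusSucceeded {
		return fmt.Errorf("%w (id %d)", errAlreadySucceeded, j.id)
	}
	return nil
}

// finishSchemaChange models one node's attempt: update progress, then
// mark the job as succeeded.
func finishSchemaChange(j *job) error {
	if err := j.updateProgress(); err != nil {
		return err
	}
	j.status = statusSucceeded
	return nil
}

// retryFinish retries on every error, as a naive loop would. The attempt
// cap exists only so that this demo terminates.
func retryFinish(j *job, maxAttempts int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = finishSchemaChange(j); err == nil {
			return attempt, nil
		}
	}
	return maxAttempts, err
}

func main() {
	j := &job{id: 456021744522723331}
	// "Node A" wins the race and finishes the job.
	fmt.Println("node A:", finishSchemaChange(j))
	// "Node B" still believes there is work to do and keeps retrying.
	attempts, err := retryFinish(j, 3)
	fmt.Printf("node B: %d attempts, last error: %v\n", attempts, err)
}
```

In this model the error is permanent once the job has succeeded, so a loop that treats it as retryable never makes progress. Whether that matches what the schema changer actually does here is exactly the open question.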

The schema change in question is:

456021744522723331 | SCHEMA CHANGE     | TRUNCATE TABLE <redacted> CASCADE

The table has id: 4191, state: DROP, and drop_job_id: 456021744522723331

These schema change retries kill us because, with every one, we seem to acquire and release the "schema change lease" for this table (I can see this by diffing consecutive versions of the descriptor). This eventually leaves the system config range unable to accept writes, because it has gotten too big and it can't be split.
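A back-of-the-envelope sketch of how quickly this churn could add up, under loudly stated assumptions: exactly two retries per second (one reading of "a couple of times a second") and one descriptor write each for the lease acquire and the lease release. Neither number is measured from the cluster, and `descriptorWrites` is a hypothetical helper.

```go
package main

import "fmt"

// descriptorWrites estimates how many descriptor versions a retry loop
// writes over a time window. All inputs are assumptions, not measurements.
func descriptorWrites(retriesPerSec, writesPerRetry, seconds int) int {
	return retriesPerSec * writesPerRetry * seconds
}

func main() {
	const retriesPerSec = 2  // assumed from "a couple of times a second"
	const writesPerRetry = 2 // assumed: one write on acquire, one on release

	fmt.Println("per hour:", descriptorWrites(retriesPerSec, writesPerRetry, 3600))
	fmt.Println("per day: ", descriptorWrites(retriesPerSec, writesPerRetry, 86400))
}
```

Under these assumptions a single stuck table would produce hundreds of thousands of descriptor versions per day, all landing in a range that cannot be split.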

Debug.zip here (internal only)

@dt you want this one?

Labels

C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

S-1-stability: Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting.
