bulkio, sql: mutations with job IDs are persisted in backups and cause problems after restoring #44019

@thoszhang

Description

When taking backups, we currently preserve the contents of mutationJobs and gc_mutations on the table descriptors, which contain job IDs for the schema change. This causes problems when restoring the backup on a different cluster, where jobs with those IDs don't exist.

For a simple reproduction (with gc_mutations), create a table with an index and drop the index, which leaves a job in the "waiting for GC TTL" state, and then take a backup of the table:

root@:26257/defaultdb> create table t (a int, index (a));
CREATE TABLE

Time: 10.054ms

root@:26257/defaultdb> insert into t values (1), (2), (3);
INSERT 3

Time: 3.78ms

root@:26257/defaultdb> drop index t@t_a_idx;
DROP INDEX

Time: 112.24ms

root@:26257/defaultdb> show jobs;
        job_id       |   job_type    |              description              | statement | user_name | status  |   running_status   |             created              |             started              | finished |             modified             | fraction_completed | error | coordinator_id
+--------------------+---------------+---------------------------------------+-----------+-----------+---------+--------------------+----------------------------------+----------------------------------+----------+----------------------------------+--------------------+-------+----------------+
  521128172017778689 | SCHEMA CHANGE | DROP INDEX defaultdb.public.t@t_a_idx |           | root      | running | waiting for GC TTL | 2020-01-15 16:34:57.023391+00:00 | 2020-01-15 16:34:57.048795+00:00 | NULL     | 2020-01-15 16:34:57.095392+00:00 |                  0 |       |           NULL
(1 row)

Time: 6.022ms

root@:26257/defaultdb> backup t to 'nodelocal:///backup';
        job_id       |  status   | fraction_completed | rows | index_entries | system_records | bytes
+--------------------+-----------+--------------------+------+---------------+----------------+-------+
  521128953384796161 | succeeded |                  1 |    3 |             0 |              0 |    57
(1 row)

Time: 50.286ms

Inspecting the backup with protoc --decode_raw shows that field 33, which is gc_mutations, is preserved:

    33 {
      1: 2
      2: 1579106097092344000
      3: 521128172017778689
    }

Then start a fresh cluster, restore the table, and do something to trigger the async schema changer on that table, such as updating the zone config to change gc.ttlseconds:

root@:26257/defaultdb> restore table t from 'nodelocal:///backup';
        job_id       |  status   | fraction_completed | rows | index_entries | system_records | bytes
+--------------------+-----------+--------------------+------+---------------+----------------+-------+
  521133970934759425 | succeeded |                  1 |    3 |             0 |              0 |    57
(1 row)

Time: 63.545ms

root@:26257/defaultdb> alter table t configure zone using gc.ttlseconds='10';
CONFIGURE ZONE 1

Time: 7.229ms

Eventually, log statements like these appear several times per second (I tested this with both a local 3-node cluster and a 1-node cluster):

I200115 17:39:13.936893 159 sql/sqlbase/structured.go:1516  [n1,scExec] publish: descID=52 (t) version=7 mtime=1970-01-01 00:00:00 +0000 UTC
W200115 17:39:13.938967 159 sql/schema_changer.go:1991  [n1] Error executing schema change: job with ID 521128172017778689 does not exist
I200115 17:39:13.943209 159 sql/sqlbase/structured.go:1516  [n1,scExec] publish: descID=52 (t) version=7 mtime=1970-01-01 00:00:00 +0000 UTC
W200115 17:39:13.945117 159 sql/schema_changer.go:1991  [n1] Error executing schema change: job with ID 521128172017778689 does not exist

I believe this is happening because SchemaChanger.maybeGCMutations() is getting called in a loop (from SchemaChanger.exec()), and it calls Publish() but never succeeds because the callback to update the job status on success can't find the job: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/schema_changer.go#L789

I'm not sure why the attempts occur multiple times per second, or how much of this is caused by retries inside Publish()/PublishMultiple() versus retries in the SchemaChangeManager. The retries seem to happen more frequently on a 3-node cluster.
