
sql: some schema changes to drop columns cannot be rolled back and will block other schema changes #47712

@thoszhang

Description

This issue has been split off from #46541 because it is a more severe variant that leaves the table descriptor in a permanently broken state.

To recap from #46541: we currently treat rolling back a column drop as equivalent to adding the column in the first place. For some combinations of constraints and default or computed values on the column, this means the column drop can never be rolled back successfully.

For example, a default value that is incompatible with a unique constraint causes validation to fail (using 19.2.5 to illustrate):

create table t (a int, b int unique default 1);
insert into t values (1, 1), (1, 2);
begin; alter table t drop column b; create unique index on t(a); commit;
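The rollback failure here is deterministic. If the rollback re-adds column b the way an ADD COLUMN would, every existing row receives b's default value of 1, and the unique constraint on b can never be satisfied. The Go sketch below models only that step. It assumes, as the error on b in the log suggests, that the rollback backfill writes the default value rather than restoring the values b held before the drop; backfillDefault and checkUnique are hypothetical helpers, not CockroachDB code.

```go
package main

import "fmt"

// backfillDefault models re-adding a column by writing its default value into
// every existing row. Hypothetical helper: it assumes the rollback ignores the
// values the column held before the drop started.
func backfillDefault(numRows, def int) []int {
	vals := make([]int, numRows)
	for i := range vals {
		vals[i] = def
	}
	return vals
}

// checkUnique mimics a unique index rejecting duplicate values in the
// backfilled column. Hypothetical helper, not CockroachDB's backfiller.
func checkUnique(col string, vals []int, constraint string) error {
	seen := make(map[int]bool, len(vals))
	for _, v := range vals {
		if seen[v] {
			return fmt.Errorf("duplicate key value (%s)=(%d) violates unique constraint %q", col, v, constraint)
		}
		seen[v] = true
	}
	return nil
}

func main() {
	// Table t from the example has two rows; column b held 1 and 2 before the
	// drop, which satisfied the unique constraint t_b_key.
	if err := checkUnique("b", []int{1, 2}, "t_b_key"); err != nil {
		fmt.Println("unexpected:", err)
	}

	// Rolling back the drop as if b were being added fills both rows with b's
	// DEFAULT 1, so validating t_b_key fails, and it fails the same way on
	// every retry.
	vals := backfillDefault(2, 1)
	if err := checkUnique("b", vals, "t_b_key"); err != nil {
		fmt.Println("rollback cannot complete:", err)
	}
}
```

Because nothing about the data changes between retries, retrying this step can never succeed.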

The error returned to the client is about column a not having unique values, but the rollback also hits an error on column b, which causes the async schema changer to get stuck retrying forever:

W200413 17:46:00.575033 1658 sql/schema_changer.go:1175  [n1,client=[::1]:50514,user=root,scExec] error purging mutation: candidate pg code: 23505
  - error with attached stack trace:
    github.com/cockroachdb/cockroach/pkg/sql/row.NewUniquenessConstraintViolationError
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/errors.go:135
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).wrapDupError
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:135
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).flush
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:113
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).mainLoop
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:225
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).doRun
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:135
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).Run
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:120
    github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:372
    github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:372
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).distBackfill.func2.4
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/backfill.go:873
    github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Txn.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:717
    github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).exec
    	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:700
    github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Txn
    	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:716
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).distBackfill.func2
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/backfill.go:789
    github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:166
    github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
    	/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
    runtime.goexit
    	/usr/local/go/src/runtime/asm_amd64.s:1337
  - error with embedded safe details: duplicate key value (%s)=(%s) violates unique constraint %q
    -- arg 1: <string>
    -- arg 2: <string>
    -- arg 3: <string>
  - duplicate key value (b)=(1) violates unique constraint "t_b_key"
while handling error: candidate pg code: 23505
  - error with attached stack trace:
    github.com/cockroachdb/cockroach/pkg/sql/row.NewUniquenessConstraintViolationError
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/errors.go:135
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).wrapDupError
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:135
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).flush
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:113
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).mainLoop
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:225
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).doRun
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:135
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).Run
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:120
    github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:372
    github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:372
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).distBackfill.func2.4
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/backfill.go:873
    github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Txn.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:717
    github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).exec
    	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:700
    github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Txn
    	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:716
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).distBackfill.func2
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/backfill.go:789
    github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:166
    github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
    	/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
    runtime.goexit
    	/usr/local/go/src/runtime/asm_amd64.s:1337
  - error with embedded safe details: duplicate key value (%s)=(%s) violates unique constraint %q
    -- arg 1: <string>
    -- arg 2: <string>
    -- arg 3: <string>
  - duplicate key value (a)=(1) violates unique constraint "t_a_key"

This means we can never finish processing the rollback mutation on the table descriptor. In 19.2 and earlier versions, the async schema changer would keep retrying and failing indefinitely. In 20.1, the schema changer gives up and the schema change job fails permanently. In either case, the table descriptor is left with a mutation that will never be cleaned up.

A particularly bad consequence is that other schema changes requiring mutations will never make progress, because they are always queued behind the schema change that can never complete. I've confirmed this behavior in both 20.1 and 19.2.

Jira issue: CRDB-4385

Epic CRDB-104
