sql: some schema changes to drop columns cannot be rolled back and will block other schema changes #47712
Description
This issue has been split off from #46541 (comment) because it's a more severe variant that leads to a permanently broken table descriptor state.
To recap from #46541, we currently treat rolling back a column drop as equivalent to adding the column in the first place. For some combinations of constraints and default/computed values in the column, this means we may never correctly roll back the column drop.
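One way to see the equivalence is to issue the corresponding column addition by hand. The sketch below is only an illustration, not the code path the schema changer takes: the table name t2 is hypothetical, and the expectation that the final statement fails with a duplicate-key error is an assumption based on the behavior described in this issue.

```sql
-- Hypothetical scratch table with two rows and no column b yet.
create table t2 (a int);
insert into t2 values (1), (1);
-- Adding b backfills every existing row with the default value 1,
-- so building the unique index on b is expected to hit duplicates.
alter table t2 add column b int unique default 1;
```

If rolling back a drop of b behaves like this addition, it hits the same conflict whenever the table holds more than one row.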
For example, a default value that is incompatible with a unique constraint causes validation to fail (using 19.2.5 to illustrate):
create table t (a int, b int unique default 1);
insert into t values (1, 1), (1, 2);
begin; alter table t drop column b; create unique index on t(a); commit;
The error returned to the client is about column a not having unique values, but there is also an error from column b during the rollback, which causes the async schema changer to get stuck retrying forever:
W200413 17:46:00.575033 1658 sql/schema_changer.go:1175 [n1,client=[::1]:50514,user=root,scExec] error purging mutation: candidate pg code: 23505
- error with attached stack trace:
github.com/cockroachdb/cockroach/pkg/sql/row.NewUniquenessConstraintViolationError
/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/errors.go:135
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).wrapDupError
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:135
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).flush
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:113
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).mainLoop
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:225
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).doRun
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:135
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).Run
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:120
github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run
/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:372
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run
/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:372
github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).distBackfill.func2.4
/go/src/github.com/cockroachdb/cockroach/pkg/sql/backfill.go:873
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Txn.func1
/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:717
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).exec
/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:700
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Txn
/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:716
github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).distBackfill.func2
/go/src/github.com/cockroachdb/cockroach/pkg/sql/backfill.go:789
github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1
/go/src/github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:166
github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1337
- error with embedded safe details: duplicate key value (%s)=(%s) violates unique constraint %q
-- arg 1: <string>
-- arg 2: <string>
-- arg 3: <string>
- duplicate key value (b)=(1) violates unique constraint "t_b_key"
while handling error: candidate pg code: 23505
- error with attached stack trace:
github.com/cockroachdb/cockroach/pkg/sql/row.NewUniquenessConstraintViolationError
/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/errors.go:135
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).wrapDupError
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:135
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).flush
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:113
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).mainLoop
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:225
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).doRun
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:135
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).Run
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:120
github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run
/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:372
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run
/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:372
github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).distBackfill.func2.4
/go/src/github.com/cockroachdb/cockroach/pkg/sql/backfill.go:873
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Txn.func1
/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:717
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).exec
/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:700
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Txn
/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:716
github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).distBackfill.func2
/go/src/github.com/cockroachdb/cockroach/pkg/sql/backfill.go:789
github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1
/go/src/github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:166
github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1337
- error with embedded safe details: duplicate key value (%s)=(%s) violates unique constraint %q
-- arg 1: <string>
-- arg 2: <string>
-- arg 3: <string>
- duplicate key value (a)=(1) violates unique constraint "t_a_key"
This means we can never finish processing the rollback mutation on the table descriptor. In 19.2 and earlier versions, the async schema changer keeps retrying the rollback and failing indefinitely. In 20.1, we now give up and the schema change job fails permanently. In either case, though, the result is that we end up with a mutation on the table descriptor that will never be cleaned up.
One very bad consequence of this is that other schema changes requiring mutations will never make progress, since they will always be queued behind the schema change that will never complete. I've confirmed this behavior in both 20.1 and 19.2.
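A rough way to observe the blocking on a cluster where the statements above have been run is sketched below. It assumes the SHOW JOBS output columns available in these versions. The column name c is hypothetical, and the exact job status reported may differ between 19.2 and 20.1.

```sql
-- List schema change jobs and their current status.
select job_id, status, description
  from [show jobs]
 where job_type = 'SCHEMA CHANGE';
-- Any later schema change that needs a mutation on t, such as this one,
-- is expected to queue behind the stuck mutation and not complete.
alter table t add column c int;
```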
Jira issue: CRDB-4385
Epic CRDB-104