schemachange: speed up slow schema changes #48608
craig[bot] merged 1 commit into cockroachdb:master
| if scErr == nil { | ||
| return nil | ||
| } | ||
| switch { |
probably cleaner as:
switch scErr := sc.exec(ctx); scErr {
case nil:
return nil
...
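A minimal sketch of the suggested shape, for illustration only. The names `exec`, `classify`, and `errNotFirstInLine` are hypothetical stand-ins and not the actual code in `pkg/sql/schema_changer.go`. It uses a tagless `switch` with an init statement so that the remaining cases can still be predicates:

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFirstInLine is a hypothetical sentinel standing in for a retriable error.
var errNotFirstInLine = errors.New("schema change not first in line")

// exec is a hypothetical stand-in for sc.exec(ctx): it fails with a
// retriable error on the first two attempts and then succeeds.
func exec(attempt int) error {
	if attempt < 2 {
		return errNotFirstInLine
	}
	return nil
}

// classify scopes the error to the switch statement via the init clause,
// so there is no separate "if err == nil" check before the switch.
func classify(attempt int) string {
	switch err := exec(attempt); {
	case err == nil:
		return "done"
	case errors.Is(err, errNotFirstInLine):
		return "retry"
	default:
		return "fail"
	}
}

func main() {
	for attempt := 0; attempt < 3; attempt++ {
		fmt.Println(attempt, classify(attempt))
	}
}
```

The init clause keeps the error variable local to the switch. If the error is needed after the surrounding loop exits, it would have to be assigned to an outer variable instead.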
pkg/sql/schema_changer.go
Outdated
| } | ||
| } | ||
| return nil | ||
| return jobs.NewRetryJobError(scErr.Error()) |
If you're here it probably means that your context was canceled. It's reasonably likely that scErr is nil here, which means this will panic.
well if scErr was nil, we would return inside the body of the loop
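To make the concern concrete: calling `Error()` on a nil `error` value panics at run time, so any path that reaches the final `return` without having recorded an error would crash. The sketch below is standalone and does not use the jobs package. `lastErrorMessage`, `panicsOnError`, and `errExample` are hypothetical names introduced only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errExample is a hypothetical non-nil error used for the demonstration.
var errExample = errors.New("schema change failed")

// panicsOnError reports whether calling err.Error() panics.
// It does so for a nil error, because there is no value to dispatch on.
func panicsOnError(err error) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	_ = err.Error()
	return false
}

// lastErrorMessage is a defensive variant that checks for nil before
// calling Error().
func lastErrorMessage(err error) string {
	if err == nil {
		return "no error recorded"
	}
	return err.Error()
}

func main() {
	fmt.Println(panicsOnError(nil), panicsOnError(errExample))
	fmt.Println(lastErrorMessage(nil))
	fmt.Println(lastErrorMessage(errExample))
}
```

Whether the nil case can actually occur depends on whether the retry loop is guaranteed to run at least once, which is the point under discussion in this thread.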
| MaxBackoff: 20 * time.Second, | ||
| Multiplier: 1.5, | ||
| } | ||
| var scErr error |
I'm not sure it makes sense to retain this across iterations of the loop.
No, but we need it after we exit the loop so that we can return the last error from the schema change to the registry.
e14652c to e281a34
❌ The GitHub CI (Cockroach) build has failed on e281a348. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Touches cockroachdb#47790.

Release note (performance improvement):
Before this change, a simple schema change could take 30s+. The reason was that if the schema change was not first in line in the table mutation queue, it would return a retriable error and the jobs framework would re-adopt and run it later. The problem is that the job adoption loop interval is 30s.

To repro, run this for some time:
```
cockroach sql --insecure --watch 1s -e 'drop table if exists users cascade; create table users (id uuid not null, name varchar(255) not null, email varchar(255) not null, password varchar(255) not null, remember_token varchar(100) null, created_at timestamp(0) without time zone null, updated_at timestamp(0) without time zone null, deleted_at timestamp(0) without time zone null); alter table users add primary key (id); alter table users add constraint users_email_unique unique (email);'
```

Instead of returning on retriable errors, we retry with an exponential backoff in the schema change code. This pattern of dealing with retriable errors in client job code is encouraged over relying on the registry, because the latter leads to slowness and, additionally, to more complicated test fixtures that rely on hacking the internals of the job registry.
bors r+
Build failed (retrying...)
bors r+
Already running a review
Build succeeded
Touches #45150.
Fixes #47607.
Touches #47790.
Release note (performance improvement):
Before this change, a simple schema change could take 30s+.
The reason was that if the schema change was not first
in line in the table mutation queue, it would return a
retriable error and the jobs framework would re-adopt and
run it later. The problem is that the job adoption loop
interval is 30s.
To repro, run the `cockroach sql --watch` command from the commit message above for some time.
Instead of returning on retriable errors, we retry with exponential
backoff in the schema change code. This pattern of dealing with
retriable errors in client job code is encouraged over relying on the
registry, because the latter leads to slowness and, additionally, to more
complicated test fixtures that rely on hacking the internals of the
job registry.
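
The retry-in-place approach described above can be sketched with the standard library alone. This is an illustrative sketch, not the code merged in this PR: `backoffOpts`, `retryWithBackoff`, `errRetriable`, and `demo` are hypothetical names, and the real change uses CockroachDB's own retry helper. The `Multiplier: 1.5` value mirrors the diff. The initial and maximum delays here are assumptions scaled down to milliseconds so the example finishes quickly (the diff shows a `MaxBackoff` of 20 seconds).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errRetriable is a hypothetical sentinel marking errors worth retrying.
var errRetriable = errors.New("retriable: not first in mutation queue")

// backoffOpts is a hypothetical options struct for this sketch.
type backoffOpts struct {
	Initial    time.Duration
	Max        time.Duration
	Multiplier float64
}

// retryWithBackoff calls op until it succeeds, returns a non-retriable
// error, or ctx is done. The delay grows by Multiplier after each failed
// attempt and is capped at Max. The last error is kept so it can be
// reported if the context ends first.
func retryWithBackoff(ctx context.Context, opts backoffOpts, op func() error) (int, error) {
	delay := opts.Initial
	attempts := 0
	for {
		attempts++
		lastErr := op()
		if lastErr == nil {
			return attempts, nil
		}
		if !errors.Is(lastErr, errRetriable) {
			return attempts, lastErr
		}
		select {
		case <-ctx.Done():
			return attempts, fmt.Errorf("gave up after %d attempts: %w", attempts, lastErr)
		case <-time.After(delay):
		}
		delay = time.Duration(float64(delay) * opts.Multiplier)
		if delay > opts.Max {
			delay = opts.Max
		}
	}
}

// demo runs an operation that fails twice with a retriable error and
// then succeeds.
func demo() (int, error) {
	calls := 0
	opts := backoffOpts{Initial: time.Millisecond, Max: 20 * time.Millisecond, Multiplier: 1.5}
	return retryWithBackoff(context.Background(), opts, func() error {
		calls++
		if calls < 3 {
			return errRetriable
		}
		return nil
	})
}

func main() {
	attempts, err := demo()
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Because the wait happens inside the job itself, the next attempt starts after a short delay that the job controls, rather than waiting for the next pass of the 30s adoption loop.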