sql: schema change job resumer #44391
Conversation
(force-pushed from 18dca1d to 6a02833)
(force-pushed from 3a9d1d8 to 59a342b)
(force-pushed from 59a342b to 8e77ef1)
spaskob
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt, @lucy-zhang, and @spaskob)
pkg/jobs/registry.go, line 265 at r1 (raw file):
`StatusFailed`
or `StatusCanceled` perhaps?
pkg/jobs/registry.go, line 338 at r1 (raw file):
```go
// Registry.Start has been called will not have any effect.
// TODO (lucy): change this back
var DefaultAdoptInterval = 100 * time.Millisecond
```
You will change this back before merging? This may be too fast, since the code polls the job table.
(force-pushed from a29620f to 8a48e81)
(force-pushed from dddd41b to 4a78615)
(force-pushed from 588204c to b35604e)
RFAL
FYI, I became dissatisfied with my attempt to improve the internal executor API (some progress is in #45617), so I decided to put that aside and stick with the approach of giving the job registry an internal executor factory for now. One advantage of this is that it requires fewer changes to the schema change internals. I think we can live with it for 20.1.
(force-pushed from 1362420 to d064916)
spaskob
left a comment
Thank you for the detailed description of the commits!
It's a big PR so I did a first pass and added some preliminary comments. I plan to do more after seeing comments from the others. I don't have knowledge of most of the components apart from jobs, so I wonder what is an effective way to get this PR through while still making sure it is well reviewed.
Reviewed 56 of 59 files at r2.
pkg/jobs/registry.go, line 890 at r2 (raw file):
```go
if err != nil {
	// TODO (lucy): This needs to distinguish between assertion errors in
	// the job registry and assertion errors elsewhere, and only fail on the
```
What do you mean by elsewhere? The Resume function?
pkg/jobs/registry_test.go, line 188 at r2 (raw file):
```go
t.Skip("")
// TODO (lucy): This test probably shouldn't continue to exist in its current
// form if GCMutations will cease to be used. Refactor or get rid of it.
```
Is it possible to fix the test in this PR? The GC code has been a source of production issues.
pkg/sql/schema_changer.go, line 678 at r2 (raw file):
```go
}
// if dropJobID != 0 {
```
Is this a leftover from testing, or does it need a TODO to document it?
pkg/sql/schema_changer.go, line 769 at r2 (raw file):
```go
if err := sc.initJobRunningStatus(ctx); err != nil {
	if log.V(2) {
		log.Infof(ctx, "Failed to update job %d running status: %+v", *sc.job.ID(), err)
```
Please use format:
"job %d: ..."
We should put some job-specific logging utilities in place at some point...
pkg/sql/schema_changer.go, line 1792 at r2 (raw file):
```go
log.Infof(
	ctx, "descriptor %d not found for schema change job %d processing mutation %d;"+
```
`job %d: ...`
This is the format I have consistently used for jobs.
pkg/sql/schema_changer.go, line 1804 at r2 (raw file):
```go
	return err
} else {
	return jobs.NewRetryJobError(err.Error())
```
Is it possible for a job to be stuck in always returning retriable errors?
pkg/sql/schema_changer.go, line 1848 at r2 (raw file):
```go
}
if details.TableID == sqlbase.InvalidID {
	return errors.AssertionFailedf("job %d has no database ID or table ID", r.job.ID())
```
job %d: ...
pkg/sql/schema_changer.go, line 1879 at r2 (raw file):
```go
log.Infof(
	ctx, "descriptor %d not found for rollback of schema change job %d processing mutation %d;"+
```
ditto
pkg/sql/schema_changer.go, line 1891 at r2 (raw file):
```go
// Always retry when we get an error. Otherwise we risk leaving the in-
// progress schema change state on the table descriptor forever.
return jobs.NewRetryJobError(rollbackErr.Error())
```
Similarly, can we get stuck infinitely retrying a job in this case?
pkg/sql/table.go, line 857 at r2 (raw file):
```go
var spanList []jobspb.ResumeSpanList
if job != nil {
```
nit: since this predicate or its negation is used later in the code, consider a bool: `jobExists := job != nil`
pkg/sql/table.go, line 934 at r2 (raw file):
```go
	}
}
log.Infof(ctx, "updated existing schema change job for table %d, mutation %d",
```
Add the job ID as well; I am trying to keep a consistent format for job-related messages: `"job %d: info about this job"`
pkg/sql/delegate/show_jobs.go, line 51 at r2 (raw file):
```go
// TODO (lucy): Ideally we should retry a few times with a small backoff to
// quickly pick up the schema change jobs that run quickly, before increasing
```
The jobs infrastructure is not a good fit for sub-minute executions. For future releases we should re-think what parts of the job infrastructure can be abstracted away to support small, fast operations that can be queued but don't need support for resuming (i.e., they can be retried if they fail).
ajwerner
left a comment
very preliminary first pass, will need more, just wanted to publish the comments I had.
pkg/jobs/registry.go, line 199 at r2 (raw file):
```go
}

// SetSessionBoundInternalExecutorFactory sets the
```
nit: can you note that this isn't on the constructor to bypass a circular dependency?
pkg/server/server.go, line 836 at r2 (raw file):
```go
// SessionBoundInternalExecutorFactory.
s.distSQLServer.ServerConfig.SessionBoundInternalExecutorFactory = func(
```
these factory functions are the same, can you just make it a variable?
pkg/sql/conn_executor.go, line 2131 at r2 (raw file):
```go
// error is the schema change error.
// TODO (lucy): I'm not sure the above is true. What about DROP TABLE
// with multiple tables?
```
Or DROP DATABASE
pkg/sql/schema_changer.go, line 1492 at r2 (raw file):
```go
	}
}
if err := sc.job.WithTxn(txn).SetDetails(
```
:( every time I see this API used I get so sad. It needs to go with serious urgency.
pkg/sql/schema_changer.go, line 1788 at r2 (raw file):
```go
	},
}
if err := sc.exec(ctx); err != nil {
```
nit: I think this block deserves an NB that gives some discussion of the types of errors and their handling. I also might write this as a switch; I don't think the linter likes returns in `if { ... } else if { ... }` chains.
pkg/sql/schema_changer.go, line 1833 at r2 (raw file):
```go
// OnTerminal is part of the jobs.Resumer interface.
func (r schemaChangeResumer) OnTerminal(
```
Given the whole thing with queued jobs behind this one needing to wait for an adoption and the latency involved there, maybe it'd be a good idea to poke the adoption loop here if there is another job. Of course the complexity there is knowing that there is another job queued up behind this job. There's some opportunities to do that but it's a little awkward.
I hate the arbitrary interface to these callbacks.
(force-pushed from 528ae03 to 95caf3e)
```diff
 optional int64 expiration_time = 2 [(gogoproto.nullable) = false];
 }
-optional SchemaChangeLease lease = 15;
+optional SchemaChangeLease lease = 15 [deprecated = true];
```
can be a followup, but I think we usually also rename the field to DeprecatedFoo since the annotation is only shown in some editors.
```go
// Find the job.
var jobID int64
if len(tableDesc.MutationJobs) > 0 {
	// TODO (lucy): We need to get rid of MutationJobs. This is the only place
```
Ugh, yeah, I'd forgotten this bit of code, and how unfortunate its use of jobs was. I've been wanting to rework this and how we checkpoint/load spans constantly in the backfiller and this is yet another bit of motivation.
```go
		return err
	}
}
if err := sc.ExtendLease(ctx, lease); err != nil {
```
2¢: I think it might be good to include checks of `ctx.Err()` where we used to check for lease loss, since job-lease-loss would be communicated that way. We'll probably find it anyway when we go to do an RPC, but it'd be nice to have some explicit checks at least as often as we used to check the lease, I think.
(force-pushed from 95caf3e to 082b9c2)
(force-pushed from 082b9c2 to 71af92a)
thoszhang
left a comment
This is now temporarily rebased on top of #45765.
pkg/jobs/registry.go, line 199 at r2 (raw file):
Previously, ajwerner wrote…
nit: can you note that this isn't on the constructor to bypass a circular dependency?
Done.
pkg/jobs/registry.go, line 890 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
What do you mean by elsewhere? The Resume function?
Yeah, that's what I meant. I've updated the comment.
pkg/jobs/registry_test.go, line 188 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
Is it possible to fix the test in this PR? The GC code has been a source of production issues.
Probably, I'll work on it.
pkg/server/server.go, line 836 at r2 (raw file):
Previously, ajwerner wrote…
these factory functions are the same, can you just make it a variable?
Done.
pkg/sql/conn_executor.go, line 2131 at r2 (raw file):
Previously, ajwerner wrote…
Or `DROP DATABASE`.
DROP DATABASE is a special case since we only queue a single job, so whatever single error we get from that is what we'll return to the client. DROP TABLE can drop multiple tables each in their own jobs, though, which is what I was concerned about. In Registry.Run() we're returning the first error we get from any queued job, so you'd only be able to see other errors by looking them up in the jobs table.
But, actually, there's more to it. I'd been assuming that a single schema change statement can only produce a single job, with the sole exception of DROP TABLE. But this isn't true, because in order to reproduce the existing SchemaChanger behavior, we're now queuing a job for every updated table, which includes, e.g., the referenced table for a new foreign key, just so that job can wait for old table leases to expire on that table.
The more I think about it, the more I think we should just be waiting for old table leases to expire somewhere else. There's no good reason why it has to happen in the schema changer itself. We could just iterate through all modified tables in the TableCollection in the connExecutor after we finish waiting for all queued jobs and wait until all their old leases expire. Spas points out that these very small jobs that don't need cancel/resume/etc. shouldn't really be jobs in the normal sense at all, which I think is true.
pkg/sql/schema_changer.go, line 678 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
Is this a leftover from testing or needs a TODO to document?
I meant to get rid of the lines commented out. But we actually don't need a job update here at all, since the GC job is what should be setting the status to Status_WAIT_FOR_GC_INTERVAL; after draining names for a dropped table, the job itself is done. I've removed the job update entirely.
pkg/sql/schema_changer.go, line 769 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
Please use format:
"job %d: ..."
We should put some job specific logging utilities in place at some point...
Addressed in #45728. I took out the job %d here.
pkg/sql/schema_changer.go, line 1492 at r2 (raw file):
Previously, ajwerner wrote…
:( every time I see this API used I get so sad. It needs to go with serious urgency.
I agree
pkg/sql/schema_changer.go, line 1788 at r2 (raw file):
Previously, ajwerner wrote…
nit: I think this block deserves an
NB that gives some discussion of the types of errors and their handling. I also might do this as a switch, I don't think the linter likes returns in `if { ... } else if { ... }` chains.
Done.
pkg/sql/schema_changer.go, line 1792 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
job %d: ....
This the format I have consistently used for jobs.
Same as above
pkg/sql/schema_changer.go, line 1804 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
Is it possible for a job to be stuck in always returning retriable errors?
Not normally, but a bug somewhere that causes endless retriable errors could cause this (which was also true before). Overall, I'm trying to make the error handling logic match the old behavior, where any outstanding mutations would cause a schema changer to attempt to run no matter what. The goal is for table descriptors to never get out of sync with jobs and to avoid orphaned mutations, by ensuring that (for schema changes with mutations) there's always a job as long as the mutation is still there. I think this is still an improvement over the previous implementation, since the difference now is that jobs can at least be paused or cancelled if we're looping Resume().
Always retrying in OnFailOrCancel() is the more risky decision, since we can't cancel the job in that state, and we'd be retrying even on permanent errors (which is also the pre-existing behavior). But even in that case we'd be able to at least pause the job.
I think this change constitutes a strict improvement over the previous behavior, in that we can at least always pause the job if it's looping. Arguably, we should give up on retrying after some number of attempts and just leave the bad state to be cleaned up manually, but I'd rather leave that for a later PR (and we may want to make that change in the job registry for all jobs, not in this job resumer).
pkg/sql/schema_changer.go, line 1833 at r2 (raw file):
Previously, ajwerner wrote…
Given the whole thing with queued jobs behind this one needing to wait for an adoption and the latency involved there, maybe it'd be a good idea to poke the adoption loop here if there is another job. Of course the complexity there is knowing that there is another job queued up behind this job. There's some opportunities to do that but it's a little awkward.
I hate the arbitrary interface to these callbacks.
I think that's a good idea, especially for primary key changes where we queue up another job with a mutation after the first one finishes. I'm going to file this away as a potential improvement to follow (and as part of thinking more about how to improve the situation with #45150). (FWIW, OnTerminal is sort of deprecated in favor of just using both Resume and OnFailOrCancel, and I think it's slated to go away after 20.1.)
pkg/sql/schema_changer.go, line 1848 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
job %d: ...
Same as above
pkg/sql/schema_changer.go, line 1879 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
ditto
Same as above
pkg/sql/schema_changer.go, line 1891 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
similarly here can we get stuck in infinitely retrying a job in this case?
See above comment for Resume().
pkg/sql/table.go, line 857 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
nit: since this predicate or it's negation are used later in the code, consider a bool
jobExists := job != nil
Done.
pkg/sql/table.go, line 934 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
add the job id as well, I am trying to have a consistent format of job related messages:
"job %d: info about this job"
Done.
pkg/sql/delegate/show_jobs.go, line 51 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
The jobs infrastructure is not a good fit for sub-minute executions. For future releases we should re-think what parts of the job infrastructure can be abstracted away to support small, fast operations that can be queued but don't need support for resuming (i.e., they can be retried if they fail).
Yeah, I agree with this.
One schema change-specific issue is that many (though not all) of the short jobs that we're queuing are doing nothing except waiting for old table leases to expire, which we used to queue a schema changer for, but probably didn't need to be in the schema changer at all. We should consider waiting for old table leases to expire as a separate step in the connExecutor after all the queued jobs run, which would allow us to stop queuing jobs for schema changes that require nothing beyond a write to the table descriptor.
pkg/sql/rowexec/backfiller.go, line 271 at r5 (raw file):
Previously, dt (David Taylor) wrote…
Ugh, yeah, I'd forgotten this bit of code, and how unfortunate its use of jobs was. I've been wanting to rework this and how we checkpoint/load spans constantly in the backfiller and this is yet another bit of motivation.
I'm hoping that even if we don't change the way this code works, we can at least give the backfiller a job ID or something.
pkg/sql/sqlbase/structured.proto, line 640 at r5 (raw file):
Previously, dt (David Taylor) wrote…
can be a followup, but I think we usually also rename the field to `DeprecatedFoo` since the annotation is only shown in some editors.
Hmm, I saw something about how the JSON protobuf encoding uses field names in the admin UI or something and didn't want to risk changing the name. Maybe that doesn't apply since we're not using the field anymore.
ajwerner
left a comment
pkg/sql/conn_executor.go, line 2131 at r2 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
`DROP DATABASE` is a special case since we only queue a single job, so whatever single error we get from that is what we'll return to the client. `DROP TABLE` can drop multiple tables each in their own jobs, though, which is what I was concerned about. In `Registry.Run()` we're returning the first error we get from any queued job, so you'd only be able to see other errors by looking them up in the jobs table.

But, actually, there's more to it. I'd been assuming that a single schema change statement can only produce a single job, with the sole exception of `DROP TABLE`. But this isn't true, because in order to reproduce the existing `SchemaChanger` behavior, we're now queuing a job for every updated table, which includes, e.g., the referenced table for a new foreign key, just so that job can wait for old table leases to expire on that table.

The more I think about it, the more I think we should just be waiting for old table leases to expire somewhere else. There's no good reason why it has to happen in the schema changer itself. We could just iterate through all modified tables in the TableCollection in the connExecutor after we finish waiting for all queued jobs and wait until all their old leases expire. Spas points out that these very small jobs that don't need cancel/resume/etc. shouldn't really be jobs in the normal sense at all, which I think is true.
I was just having this very conversation with @pbardea. The interactions between waiting for leases to drain, waiting for leases to drain on versions of tables which don't even exist yet, and waiting for jobs is very hard for me to reason about. This waiting is a key component of the "correctness" of our DDLs.
For some jobs it's also critical to wait for multiple steps through multiple versions. For other changes we don't want to do much waiting at all.
For example, when dropping a database, we want to wait for there to be no leases on any of its tables. When dropping an index, we just need to wait for there to be no leases on that version that had that index as public. When we do the drop as a "backfill" we definitely don't need to wait for that backfill to finish (though if that's the behavior, I'm not so upset about it). I see there's a boolean threaded through decisions to control the queueing of jobs which I surmise is related to this waiting.
The fact that this waiting relates to correctness has me worried about how we reason about it. Is there a way we can logically centralize the waiting, or at least clarify which changes lead to which waiting?
pkg/sql/schema_changer.go, line 1804 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
Is it possible for a job to be stuck in always returning retriable errors?
I'm actually okay with that so long as the errors really are retriable. Much better than having to think about the case where we fail out due to retriable errors.
All that being said, should there be some mechanism to do some backoff if we're spinning?
Please add some metrics or at least a TODO about metrics. I could see the metrics existing in the jobs layer.
pkg/sql/schema_changer.go, line 1891 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
similarly here can we get stuck in infinitely retrying a job in this case?
I'd much rather retry forever than do something incorrect. This should potentially be louder.
ajwerner
left a comment
A bunch of people have now had some eyes on this. The commentary has been a mix of theoretical and nit-picky without a lot of highly actionable concrete feedback. The thing with this PR is it's scary for what it changes and deletes much more than for what it adds. We're learning a lot less leaving this in review than having it merged.
The question in my mind is, how much pain is there going to be post merge? Are there known issues with testing w.r.t. either failure or timings? If the answer is no then I'm emphatically in favor of merging this thing first and then spending time understanding the implications once it's merged. Maybe Sunday is better than today.
If there are items you'd like to do before pressing the button, it might be helpful to reviewers if you could list them out. If there's anything I or anybody can do to help, let us know.
I'd rather have this in and building confidence as the rest of the company shifts to testing and staring at the database a bunch than to have it sit in several more rounds of review. I feel that way even with a substantial list of follow up and testing to be added later so long as the merge doesn't disrupt everybody.
to echo what @ajwerner said, I think another point in favor of merging sooner is that we (reviewers) could help take some of the workload in some cases by just doing the suggested nits/cleanups/etc. as followups

Sounds good to me, I will do one look now and will approve unless I find something concrete.
(force-pushed from adaa258 to f9ac5a2)
Release note (general change): TODO
(force-pushed from f9ac5a2 to 497f15e)
thoszhang
left a comment
What I'm concerned about now is that the logic tests appear to be flaky. TC reports that TestParallel/subquery_retry_multinode has timed out twice in the last few iterations of this PR (and a few times before that, though the code has diverged). I don't get any failures when stressing this test in particular on my laptop. I'm not sure if this is a case of all the logic tests just taking longer, or if there's a new deadlock introduced somewhere that happens nondeterministically, or something else, but I'd rather not merge in this state. I'm still looking into it.
pkg/sql/conn_executor.go, line 2131 at r2 (raw file):
I see there's a boolean threaded through decisions to control the queueing of jobs which I surmise is related to this waiting.
`queueJob` is actually controlling whether we're queuing individual jobs to drop tables (which is to say, drain names for them; GC happens separately), or whether we don't queue individual jobs because the tables are being dropped as part of `DROP DATABASE`. I added a comment to clarify in drop_database.go.
As for the issue being discussed, the places where we currently wait for leases to drain (i.e., call `WaitForOneVersion`) are:

1. in `PublishMultiple`/`Publish` (so, internal waiting while advancing the schema change state machine)
2. in `waitToUpdateLeases` in a `defer` at the end of executing every schema change, to ensure that all old leases have expired for the table being operated on before we return results to the client
3. a bunch of somewhat ad-hoc calls to `waitToUpdateLeases` throughout the schema changer, usually with the same goal as in (2) but for other tables (e.g., a referenced table in a FK constraint)
(The job registry only waits for the job to finish, so anything related to waiting for leases to drain currently has to go in the job.)
My original thought was to have the schema change job be responsible for all "internal" waiting within the job and for something else to be responsible for waiting for leases to drain on all updated table descriptors after a job. But most of the complications seem to be with schema changes that update multiple table descriptors (FKs, interleaved indexes, truncate, etc.). I don't really have a good proposal right now.
pkg/sql/schema_changer.go, line 1804 at r2 (raw file):
All that being said, should there be some mechanism to do some backoff if we're spinning?
Probably. I added some TODOs in the job registry.
pkg/sql/schema_changer.go, line 1891 at r2 (raw file):
Previously, ajwerner wrote…
I'd much rather retry forever than do something incorrect. This should potentially be louder.
I elaborated slightly in the comment.
Oh, thanks, I overlooked the newest activity on that issue. I'm going to merge this now. TFTRs!
jobs: allow jobs to create internal executors

This PR adds a `SessionBoundInternalExecutorFactory` to the job registry, and exposes the ability to create session-bound internal executors in jobs via a method `MakeSessionBoundInternalExecutor()` on `Job`. This is intended for use by the schema change job resumer, which stores state on the internal executors it creates so that it can override the tables in the TableCollection for each internal validation query.

Release note: None
jobs: randomize job order in adoption loop
Previously, the job adoption loop would pick a job to adopt by
attempting to adopt jobs in decreasing order by created timestamp. On a
single-node cluster, this could cause a deadlock where a schema change
job is never adopted because it's stuck behind a newer schema change job
with a higher mutation ID, which itself is stuck in a retry loop due to
not being first in line in the mutations list.
This PR randomizes the order in which jobs are up for attempted
adoption when triggered by the timer. In the case where we attempt to
adopt a job upon being notified of a queued job, we continue to try to
adopt the newest job.
Release note (general change): Non-running jobs are now considered for
adoption in randomized order instead of in a deterministic order by
creation time, to avoid potential deadlocks when schema change jobs need
to execute in a specific order. This is a preemptive change, not a bug
fix, but it affects all jobs.
sql, jobs: add schema change job resumer
[Commit message is WIP; what follows for now is a general description for reviewers.]
This commit adds a `SchemaChangeResumer` for the schema change job, which creates and runs a `SchemaChanger` to process outstanding schema changes for a given table ID and mutation ID (with the exception of dropping a database). A schema change job is now queued to run after a transaction that initiates a schema change, before results are returned to the client. Schema changes are now managed like other jobs and can be paused, cancelled, etc.

Specifically:

- `SchemaChanger`s are now created solely within jobs. `SchemaChanger.exec()` has been split into a main schema change phase (for `Resume`) and a rollback phase (for `OnFailOrCancel`), and most of the code that updates job state (including making a new rollback job) has been removed from the schema changer.
- `SchemaChangeManager`, which used to handle async schema changes, and the `SchemaChanger`s queued in `extraTxnState` are all removed.
- `TableDescriptor.Lease` is no longer used.
- Job creation for schema change statements (e.g., `ALTER TABLE`) has been refactored to accommodate this. Creating jobs for dropping tables (including views, etc.) has been unified with creating jobs for other schema changes.

Known issues (which I think we can defer to follow-up PRs):

- I lowered the retry interval to 0.01 for `SHOW JOBS WHEN COMPLETE` to reduce the delay for short schema changes, to get the tests to run as quickly as possible on CI, but this is almost certainly too low. The tests time out when the interval is 0.25s (at some point they didn't, but I think the random job ordering, recently introduced, is also slowing things down). We need a way to retry more frequently in the beginning and increase the backoff over time. Update 3/4/20: This is addressed by jobs: configure more frequent retries for polling in Registry.Run() #45677.