Skip to content

sql, jobs: add schema change job resumer#45870

Merged
thoszhang merged 3 commits intocockroachdb:masterfrom
thoszhang:schema-changer-job
Mar 9, 2020
Merged

sql, jobs: add schema change job resumer#45870
thoszhang merged 3 commits intocockroachdb:masterfrom
thoszhang:schema-changer-job

Conversation

@thoszhang
Copy link
Copy Markdown

@thoszhang thoszhang commented Mar 9, 2020

This PR is the same as #44391, which was originally reverted. This new PR has a few updates:


jobs: allow jobs to create internal executors

This PR adds a SessionBoundInternalExecutorFactory to the job
registry, and exposes the ability to create session bound internal
executors in jobs via a method MakeSessionBoundInternalExecutor() on
Job.

This is intended for use by the schema change job resumer, which stores
state on the internal executors it creates so that it can override the
tables in the TableCollection for each internal validation query.

Release note: None


jobs: randomize job order in adoption loop

Previously, the job adoption loop would pick a job to adopt by
attempting to adopt jobs in decreasing order by created timestamp. On a
single-node cluster, this could cause a deadlock where a schema change
job is never adopted because it's stuck behind a newer schema change job
with a higher mutation ID, which itself is stuck in a retry loop due to
not being first in line in the mutations list.

This PR randomizes the order in which jobs are up for attempted
adoption when triggered by the timer. In the case where we attempt to
adopt a job upon being notified of a queued job, we continue to try to
adopt the newest job.

Release note (general change): Non-running jobs are now considered for
adoption in randomized order instead of in a determistic order by
creation time, to avoid potential deadlocks when schema change jobs need
to execute in a specific order. This is a preemptive change, not a bug
fix, but it affects all jobs.


sql, jobs: add schema change job resumer

This commit adds a SchemaChangeResumer for the schema change job,
which creates and runs a SchemaChanger to process outstanding schema
changes for a given table ID and mutation ID (with the exception of
dropping a database). A schema change job is now queued to run after a
transaction that initiates a schema change before results are returned
to the client. Schema changes are now managed like other jobs and can be
paused, cancelled, etc.

Specifically:

  • SchemaChangers are now created solely within jobs.
    SchemaChanger.exec() has been split into a main schema change phase
    (for Resume) and a rollback phase (for OnFailOrCancel), and most
    of the code that updates job state (including making a new rollback
    job) has been removed from the schema changer.
  • The SchemaChangeManager, which used to handle async schema changes,
    and the SchemaChangers queued in extraTxnState are all removed.
    TableDescriptor.Lease is no longer used.
  • We now queue a job for every schema change, even ones with no
    mutation, for every table affected (including, e.g., referenced tables
    for foreign keys), so we can wait for old leases to expire. The API
    for writing an updated table descriptor (used in, e.g., the execution
    of ALTER TABLE) has been refactored to accommodate this. Creating
    jobs for dropping tables (including views, etc.) has been unified with
    creating jobs for other schema changes.
  • The schema change tests have been refactored and (partially or
    entirely) skipped or removed in a lot of cases. Testing knobs only
    relevant to the old way of running schema changes have been removed.
    Tests are skipped for now either because this PR doesn't include any
    way to GC tables/indexes (see below) or because they were too complex
    and tightly coupled to the previous implementation and there seemed to
    be diminishing returns in getting them to work right now.

There's some follow-up work left for PRs to follow shortly. The most
significant pre-existing functionality not implemented in this PR is GC
for tables and indexes (generally using ClearRange) after
gc.ttlseconds have elapsed. There are some tests commented out or
skipped, some of which are due to the lack of GC functionality and some
of which are due to tests being complex and tightly coupled to the
previous schema change implementation that need to be reconsidered.

Release note (general change): Schema changes are now scheduled and run
fully like other jobs and can take advantage of recent improvements in
stability in the job framework, so they now can be canceled, paused,
resumed after a crash or after being paused, and so on. Some other UI
differences come with this implementation change; notably, all schema
changes now have an associated job, failed schema changes are now rolled
back within the Reverting phase of the same job, and GC for dropped
indexes and tables is deferred to a later job.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@thoszhang thoszhang requested a review from dt March 9, 2020 05:22
@thoszhang thoszhang force-pushed the schema-changer-job branch 2 times, most recently from f43b118 to 74b39b9 Compare March 9, 2020 17:30
This PR adds a `SessionBoundInternalExecutorFactory` to the job
registry, and exposes the ability to create session bound internal
executors in jobs via a method `MakeSessionBoundInternalExecutor()` on
`Job`.

This is intended for use by the schema change job resumer, which stores
state on the internal executors it creates so that it can override the
tables in the TableCollection for each internal validation query.

Release note: None
Previously, the job adoption loop would pick a job to adopt by
attempting to adopt jobs in decreasing order by created timestamp. On a
single-node cluster, this could cause a deadlock where a schema change
job is never adopted because it's stuck behind a newer schema change job
with a higher mutation ID, which itself is stuck in a retry loop due to
not being first in line in the mutations list.

This PR randomizes the order in which jobs are up for attempted
adoption when triggered by the timer. In the case where we attempt to
adopt a job upon being notified of a queued job, we continue to try to
adopt the newest job.

Release note (general change): Non-running jobs are now considered for
adoption in randomized order instead of in a determistic order by
creation time, to avoid potential deadlocks when schema change jobs need
to execute in a specific order. This is a preemptive change, not a bug
fix, but it affects all jobs.
@thoszhang thoszhang force-pushed the schema-changer-job branch from 74b39b9 to bc12f50 Compare March 9, 2020 18:32
@thoszhang
Copy link
Copy Markdown
Author

The comment table migration ended up causing test failures since we're now creating jobs for the REVOKE statements, and sometimes tests assume a certain number of jobs in the job table. Those tests have gotten some quick fixes which should probably be cleaned up later. I also rebased on top of #45873.

The most recent failure is because TestLogic/fakedist-disk/aggregate/string_agg is flaky (see #45905). Trying again....

This commit adds a `SchemaChangeResumer` for the schema change job,
which creates and runs a `SchemaChanger` to process outstanding schema
changes for a given table ID and mutation ID (with the exception of
dropping a database). A schema change job is now queued to run after a
transaction that initiates a schema change before results are returned
to the client. Schema changes are now managed like other jobs and can be
paused, cancelled, etc.

Specifically:

- `SchemaChanger`s are now created solely within jobs.
  `SchemaChanger.exec()` has been split into a main schema change phase
  (for `Resume`) and a rollback phase (for `OnFailOrCancel`), and most
  of the code that updates job state (including making a new rollback
  job) has been removed from the schema changer.
- The `SchemaChangeManager`, which used to handle async schema changes,
  and the `SchemaChanger`s queued in `extraTxnState` are all removed.
  `TableDescriptor.Lease` is no longer used.
- We now queue a job for every schema change, even ones with no
  mutation, for every table affected (including, e.g., referenced tables
  for foreign keys), so we can wait for old leases to expire. The API
  for writing an updated table descriptor (used in, e.g., the execution
  of `ALTER TABLE`) has been refactored to accommodate this. Creating
  jobs for dropping tables (including views, etc.) has been unified with
  creating jobs for other schema changes.
- The schema change tests have been refactored and (partially or
  entirely) skipped or removed in a lot of cases. Testing knobs only
  relevant to the old way of running schema changes have been removed.
  Tests are skipped for now either because this PR doesn't include any
  way to GC tables/indexes (see below) or because they were too complex
  and tightly coupled to the previous implementation and there seemed to
  be diminishing returns in getting them to work right now.

There's some follow-up work left for PRs to follow shortly. The most
significant pre-existing functionality not implemented in this PR is GC
for tables and indexes (generally using `ClearRange`) after
`gc.ttlseconds` have elapsed. There are some tests commented out or
skipped, some of which are due to the lack of GC functionality and some
of which are due to tests being complex and tightly coupled to the
previous schema change implementation that need to be reconsidered.

Release note (general change): Schema changes are now scheduled and run
fully like other jobs and can take advantage of recent improvements in
stability in the job framework, so they now can be canceled, paused,
resumed after a crash or after being paused, and so on. Some other UI
differences come with this implementation change; notably, all schema
changes now have an associated job, failed schema changes are now rolled
back within the Reverting phase of the same job, and GC for dropped
indexes and tables is deferred to a later job.
@thoszhang thoszhang force-pushed the schema-changer-job branch from bc12f50 to e340ca9 Compare March 9, 2020 20:01
@thoszhang
Copy link
Copy Markdown
Author

Now I'm getting No space left on device :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants