sql: schema changes can be very slow

**Describe the problem**

We've had a number of reports of schema changes being slow lately (##38111, #47607,  #47512). This is a meta-issue for a problem which is theorized here and to track reproduction and evaluation. It also includes proposed steps. 

***Theory***

We’ve exposed the job adoption cycle to some schema changes even if they’d happen super fast.

 * Run just sends on the channel N times for N jobs.
 * It might pick up jobs other than the ones we wanted to pick
 * Especially bad is we might pick up GC jobs which jobs now create from other jobs.
 * Also might pick up queued schema changes which will fail immediately due to unmet dependencies.

***Alternative Consideration***

We scan the entire jobs table in the job adoption loop, as this table gets big, that's probably slow.

***Proposals***

* [x] 1. Understand the exact implications with regards to user-visible latency for schema changes

* We should write some tests which are hella slow and fix that
    * N goroutines, each M time creating a table, doing some queued schema changes, drop the table
    * Also randomized syntax tests seem slow

* [ ] 2. Prioritize the jobs we adopt in the adoption loop
 
* We really want to adopt just about anything over top of GC Jobs
* We really don't want to adopt jobs that have unmet mutation dependencies
* We could create a simple ranking mechanism where we prioritize
    1) Type - Do GC jobs last maybe
    2) Key - choose uniformly randomly over the jobs with the same key
    3) Value - choose in value order for jobs with the same key

I'm not exactly sold on the type prioritization but at least it's easy.
For the others, we could inject a function per type that controls its ranking.

It doesn't need to be very tightly coupled.

*  [ ] 3. Start jobs immediately after finishing a job.

* Right now we wait for the adoption loop before picking up another job. If you have a number of queued mutations, it may take a long time for the subsequent jobs to get picked up.
* Really easy

* [ ] 4. Stop creating jobs that just wait.
* Right now we create jobs that make no sense as jobs.
* There is no value if another node adopts the job except for coordinating the waiting.
* These excess jobs probably exacerbate the above problems.

* [ ] 5. Create an index on the status column of jobs.
* Junk piles up and then the query gets very slow.
* Adding an index should be an easy migration.

**Environment:**
 - Applies especially to CockroachDB 20.1



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: schema changes can be very slow #47790

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

sql: schema changes can be very slow #47790

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions