sql: stats collection doesn't get re-tried after a server shutdown #100482
Description
Discovered by @rytaft
Describe the problem
Automatic statistics collection (used by SQL query planning) is triggered lazily when tables are mutated.
This lazy mechanism is based on an in-memory FIFO (a Go channel).
If a server shuts down after a SQL mutation completes but before the corresponding event has been popped from this FIFO (and thus before the job that computes the stats has been created), the stats are never recomputed.
This can result in bad query plans for tables that received many changes just before a node shut down and are never modified again.
To Reproduce
Issue a large number of mutations to a table and shut down the server while these mutations are in progress.
Then restart the server and do not modify the table again. Notice that the stats remain out of date forever.
Expected behavior
- Either persist the in-flight FIFO queue of mutated tables, so that their stats jobs still get created when the server restarts.
- Or, during graceful shutdown, wait until all pending jobs have been created, without running them. (This could delay the drain unacceptably if there are many tables; testing should evaluate this overhead.)
Environment:
crdb v23.1 and master branch
Jira issue: CRDB-26459