sql: stats collection doesn't get re-tried after a server shutdown #100482

@knz

Discovered by @rytaft

Describe the problem

Automatic stats collection (used by SQL query planning) is triggered lazily when a table is mutated.

This lazy mechanism is based on an in-memory FIFO (a Go channel).

If a server is shut down after a SQL mutation completes but before that mutation has been popped from this FIFO (and thus before the job to compute the stats has been created), the stats are never recomputed: the queued entry is lost with the process and nothing retries the collection after restart.

This can result in bad query plans for tables that received many changes just before a node shut down and that never change again afterwards.
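The failure mode can be illustrated with a small, self-contained Go sketch. It is not the actual CockroachDB refresher code: the `refresher` type, `notifyMutation`, `processOne` and `simulateShutdown` are hypothetical names, and the non-blocking send is an assumption made for the illustration. It only shows that anything still sitting in a buffered channel when the process exits never becomes a job.

```go
package main

import "fmt"

// tableID is a hypothetical stand-in for a table identifier.
type tableID int

// refresher is a simplified, hypothetical model of a lazy stats refresher:
// mutations are queued on a buffered channel and turned into jobs later.
type refresher struct {
	mutations chan tableID // in-memory FIFO; its contents vanish on process exit
	jobs      []tableID    // stands in for durably created stats jobs
}

func newRefresher(capacity int) *refresher {
	return &refresher{mutations: make(chan tableID, capacity)}
}

// notifyMutation records that a table changed. In this sketch it never
// blocks: if the queue is full, the notification is dropped (an assumption).
func (r *refresher) notifyMutation(id tableID) {
	select {
	case r.mutations <- id:
	default:
	}
}

// processOne pops at most one queued mutation and "creates" a job for it.
// It reports whether anything was processed.
func (r *refresher) processOne() bool {
	select {
	case id := <-r.mutations:
		r.jobs = append(r.jobs, id)
		return true
	default:
		return false
	}
}

// simulateShutdown queues n mutations, processes only the first `processed`
// of them, and then stops, as a server shutdown would. It returns the number
// of jobs created and the number of notifications still queued, which are lost.
func simulateShutdown(n, processed int) (created, lost int) {
	r := newRefresher(n)
	for i := 0; i < n; i++ {
		r.notifyMutation(tableID(i))
	}
	for i := 0; i < processed; i++ {
		r.processOne()
	}
	return len(r.jobs), len(r.mutations)
}

func main() {
	created, lost := simulateShutdown(5, 2)
	fmt.Printf("jobs created: %d, notifications lost at shutdown: %d\n", created, lost)
}
```

With five mutations queued and only two popped before the shutdown, three tables never get a stats job unless they are mutated again.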

To Reproduce

Issue a large number of mutations to a table and shut the server down while these mutations are in progress.
Then restart the server and do not modify the table again. Notice that the stats remain out of date indefinitely.

Expected behavior

  • Either persist the in-flight FIFO queue of mutated tables, so that their jobs get created when the server starts again.
  • Or wait during graceful shutdown until all the jobs are created, but without running them. (This could delay the drain unacceptably if there are many tables; testing should evaluate this overhead.)
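As a rough sketch of the second option, the drain path could pop whatever is still queued and create one job per distinct table, bounded by a deadline so the drain cannot stall indefinitely. Everything here is hypothetical: `drainPending`, the `createJob` hook, and `demoDrain` are illustrative names, not existing CockroachDB APIs, and the one-second timeout is an arbitrary value for the demo.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// tableID is a hypothetical stand-in for a table identifier.
type tableID int

// createJob is a hypothetical hook that durably records a stats job
// without running it.
type createJob func(ctx context.Context, id tableID) error

// drainPending pops every notification still queued in the channel and
// creates a job for each distinct table. It stops early if ctx expires or
// if job creation fails, and returns how many jobs were created.
func drainPending(ctx context.Context, pending <-chan tableID, create createJob) (int, error) {
	seen := make(map[tableID]struct{})
	created := 0
	for {
		if err := ctx.Err(); err != nil {
			return created, err // deadline hit: give up rather than block the drain
		}
		select {
		case id := <-pending:
			if _, dup := seen[id]; dup {
				continue // one job per table is enough
			}
			seen[id] = struct{}{}
			if err := create(ctx, id); err != nil {
				return created, err
			}
			created++
		default:
			return created, nil // queue is empty
		}
	}
}

// demoDrain queues a few notifications (with duplicates) and drains them
// under a one-second deadline, collecting the created jobs in a slice.
func demoDrain() (int, []tableID, error) {
	pending := make(chan tableID, 8)
	for _, id := range []tableID{7, 7, 9, 12, 9} {
		pending <- id
	}
	var jobs []tableID
	create := func(_ context.Context, id tableID) error {
		jobs = append(jobs, id)
		return nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	n, err := drainPending(ctx, pending, create)
	return n, jobs, err
}

func main() {
	n, jobs, err := demoDrain()
	fmt.Println(n, jobs, err)
}
```

The deadline is what keeps the drain delay bounded; tables left in the queue when it expires would still need the first option (persisting the queue) to be covered.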

Environment:

crdb v23.1 and master branch

Jira issue: CRDB-26459

Metadata

Assignees

No one assigned

    Labels

    A-sql-table-stats: Table statistics (and their automatic refresh).
    C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
    T-sql-queries: SQL Queries Team

    Projects

    Status

    Backlog
