
sql: automatically cancel stats jobs that are running for too long #118584

@rytaft

Description


Is your feature request related to a problem? Please describe.
We have seen cases where stats jobs get stuck for various reasons (e.g., a bug in the jobs system). Because we only allow one stats job at a time across the entire cluster, a stuck job can block stats collection for every table in the cluster until it is canceled.

Describe the solution you'd like
We should have a way to detect when a stats job has been running for too long and automatically cancel it. For example, we might cancel any stats job that has been running for longer than one day (this threshold should be configurable).

Even if we eventually allow multiple stats jobs to run at the same time, we should probably still cancel long-running stats jobs. There is no downside to canceling (other than some wasted work), and it will ensure that no table goes too long without a stats refresh.

Jira issue: CRDB-35799

Metadata

Assignees

No one assigned

    Labels

    A-sql-table-stats: Table statistics (and their automatic refresh).
    C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
    O-support: Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
    P-3: Issues/test failures with no fix SLA
    T-sql-queries: SQL Queries Team

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests