jobs: avoid crdb_internal.system_jobs in gc-jobs by stevendanna · Pull Request #108093 · cockroachdb/cockroach

stevendanna · 2023-08-03T10:29:19Z

crdb_internal.system_jobs is a virtual table that joins information from the jobs table and the job_info table.

For the previous query,

SELECT id, payload, status FROM "".crdb_internal.system_jobs
WHERE (created < $1) AND (id > $2)
ORDER BY id
LIMIT $3

this is a little suboptimal because:

- We don't make use of the progress column, so any read of it is wasted work.
- While the crdb_internal.system_jobs virtual table has a virtual index on job id, and EXPLAIN will even claim that it will be used:
```
• limit
│ count: 100
│
└── • filter
    │ filter: created < '2023-07-20 07:29:01.17001'
    │
    └── • virtual table
          table: system_jobs@system_jobs_id_idx
          spans: [/101 - ]
```
This is actually a lie. A virtual index can only handle single-key spans. As a result the unconstrained query is used:

    WITH
        latestpayload AS (SELECT job_id, value FROM system.job_info AS payload WHERE info_key = 'legacy_payload' ORDER BY written DESC),
        latestprogress AS (SELECT job_id, value FROM system.job_info AS progress WHERE info_key = 'legacy_progress' ORDER BY written DESC)
    SELECT
       DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress,
                created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
    FROM system.jobs AS j
    INNER JOIN latestpayload AS payload ON j.id = payload.job_id
    LEFT JOIN latestprogress AS progress ON j.id = progress.job_id

which has a full scan of the jobs table and 2 full scans of the info table:

  • distinct
  │ distinct on: id, value, value
  │
  └── • merge join
      │ equality: (job_id) = (id)
      │
      ├── • render
      │   │
      │   └── • filter
      │       │ estimated row count: 7,318
      │       │ filter: info_key = 'legacy_payload'
      │       │
      │       └── • scan
      │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
      │             table: job_info@primary
      │             spans: FULL SCAN
      │
      └── • merge join (right outer)
          │ equality: (job_id) = (id)
          │ right cols are key
          │
          ├── • render
          │   │
          │   └── • filter
          │       │ estimated row count: 7,317
          │       │ filter: info_key = 'legacy_progress'
          │       │
          │       └── • scan
          │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
          │             table: job_info@primary
          │             spans: FULL SCAN
          │
          └── • scan
                missing stats
                table: jobs@primary
                spans: FULL SCAN

Because of the limit, I don't think this ends up being as bad as it looks. But it still isn't great.
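
To make the single-key-span limitation concrete, here is a minimal sketch of the decision described above. The `span` type and `canUseVirtualIndex` function are hypothetical simplifications for illustration, not CockroachDB's actual code: they only encode the statement that a virtual index can serve point lookups, so an open-ended span such as `[/101 - ]` falls back to the unconstrained query.

```go
package main

// span is a hypothetical, simplified constraint on job id.
// hasEnd is false for an open-ended span such as [/101 - ].
type span struct {
	start, end int64
	hasEnd     bool
}

// isSingleKey reports whether the span pins exactly one job id.
func (s span) isSingleKey() bool {
	return s.hasEnd && s.start == s.end
}

// canUseVirtualIndex returns true only when every span is a point lookup.
// Anything else (for example id > 100) means the virtual table is populated
// by the unconstrained query and filtered afterwards.
func canUseVirtualIndex(spans []span) bool {
	if len(spans) == 0 {
		return false
	}
	for _, s := range spans {
		if !s.isSingleKey() {
			return false
		}
	}
	return true
}
```

Under this model, `id = 101` yields one single-key span and can use the index, while the gc-jobs predicate `id > $2` cannot.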

In this PR, we replace the use of crdb_internal.system_jobs with a query that removes the join on the unused progress field and also constrains the scan of the job_info table.

  • distinct
  │ distinct on: id, value
  │
  └── • merge join
      │ equality: (job_id) = (id)
      │ right cols are key
      │
      ├── • render
      │   │
      │   └── • filter
      │       │ estimated row count: 7,318
      │       │ filter: info_key = 'legacy_payload'
      │       │
      │       └── • scan
      │             estimated row count: 14,646 (100% of the table; stats collected 45 minutes ago; using stats forecast for 2 hours in the future)
      │             table: job_info@primary
      │             spans: [/101/'legacy_payload' - ]
      │
      └── • render
          │
          └── • limit
              │ count: 100
              │
              └── • filter
                  │ filter: created < '2023-07-20 07:29:01.17001'
                  │
                  └── • scan
                        missing stats
                        table: jobs@primary
                        spans: [/101 - ]

In a local example, this does seem faster:

> SELECT id, payload, status, created
> FROM "".crdb_internal.system_jobs
> WHERE (created < '2023-07-20 07:29:01.17001') AND (id > 100) ORDER BY id LIMIT 100;

id | payload | status | created
-----+---------+--------+----------
(0 rows)

Time: 183ms total (execution 183ms / network 0ms)

> WITH
> latestpayload AS (
>     SELECT job_id, value
>     FROM system.job_info AS payload
>     WHERE job_id > 100 AND info_key = 'legacy_payload'
>     ORDER BY written desc
> ),
> jobpage AS (
>     SELECT id, status, created
>     FROM system.jobs
>     WHERE (created < '2023-07-20 07:29:01.17001') and (id > 100)
>     ORDER BY id
>     LIMIT 100
> )
> SELECT distinct (id), latestpayload.value AS payload, status
> FROM jobpage AS j
> INNER JOIN latestpayload ON j.id = latestpayload.job_id;
  id | payload | status
-----+---------+---------
(0 rows)

Time: 43ms total (execution 42ms / network 0ms)

Release note: None

Epic: none


adityamaru

Nice!

adityamaru · 2023-08-03T13:41:32Z

pkg/jobs/registry.go

+
+const expiredJobsQueryWithJobInfoTable = `
+WITH
+latestpayload AS (


having to unmarshal the payload just to read FinishedMicros is 😢

miretskiy · 2023-08-03T14:46:12Z

Any plans on backporting this?

stevendanna · 2023-08-03T14:49:07Z

@miretskiy Yes, I think we should backport this.

stevendanna · 2023-08-08T20:11:06Z

bors r=adityamaru

craig · 2023-08-08T22:19:39Z

Build succeeded:

Bazel Essential CI (Cockroach)

stevendanna requested review from a team as code owners August 3, 2023 10:29

stevendanna requested review from msbutler and removed request for a team August 3, 2023 10:29

stevendanna requested review from adityamaru and dt and removed request for a team and msbutler August 3, 2023 10:29

adityamaru approved these changes Aug 3, 2023

stevendanna added the backport-23.1.x PAST MAINTENANCE SUPPORT: 23.1 patch releases via ER request only label Aug 3, 2023

craig bot merged commit 0d110cd into cockroachdb:master Aug 8, 2023

blathers-crl bot mentioned this pull request Aug 8, 2023

release-23.1: jobs: avoid crdb_internal.system_jobs in gc-jobs #108390

Closed

stevendanna mentioned this pull request Apr 19, 2024

jobs: crdb_internal.jobs and crdb_internal.system_jobs perform poorly #122687

Open

7 tasks

craig[bot] merged 1 commit into cockroachdb:master from stevendanna:gc-jobs-query
