Skip to content

jobs: avoid crdb_internal.system_jobs in gc-jobs#108093

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
stevendanna:gc-jobs-query
Aug 8, 2023
Merged

jobs: avoid crdb_internal.system_jobs in gc-jobs#108093
craig[bot] merged 1 commit intocockroachdb:masterfrom
stevendanna:gc-jobs-query

Conversation

@stevendanna
Copy link
Copy Markdown
Collaborator

The crdb_internal.system_jobs is a virtual table that joins information from the jobs table and the jobs_info table.

For the previous query,

SELECT id, payload, status FROM "".crdb_internal.system_jobs
WHERE (created < $1) AND (id > $2)
ORDER BY id
LIMIT $3

this is a little suboptimal because:

  • We don't make use of the progress column so any read of that is useless.

  • While the crdb_internal.virtual table has a virtual index on job id, and EXPLAIN will even claim that it will be used:

    • limit
    │ count: 100
    │
    └── • filter
        │ filter: created < '2023-07-20 07:29:01.17001'
        │
        └── • virtual table
              table: system_jobs@system_jobs_id_idx
              spans: [/101 - ]
    

    This is actually a lie. A virtual index can only handle single-key spans. As a result the unconstrained query is used:

    WITH
        latestpayload AS (SELECT job_id, value FROM system.job_info AS payload WHERE info_key = 'legacy_payload' ORDER BY written DESC),
        latestprogress AS (SELECT job_id, value FROM system.job_info AS progress WHERE info_key = 'legacy_progress' ORDER BY written DESC)
    SELECT
       DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress,
                created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
    FROM system.jobs AS j
    INNER JOIN latestpayload AS payload ON j.id = payload.job_id
    LEFT JOIN latestprogress AS progress ON j.id = progress.job_id

which has a full scan of the jobs table and 2 full scans of the info table:

  • distinct
  │ distinct on: id, value, value
  │
  └── • merge join
      │ equality: (job_id) = (id)
      │
      ├── • render
      │   │
      │   └── • filter
      │       │ estimated row count: 7,318
      │       │ filter: info_key = 'legacy_payload'
      │       │
      │       └── • scan
      │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
      │             table: job_info@primary
      │             spans: FULL SCAN
      │
      └── • merge join (right outer)
          │ equality: (job_id) = (id)
          │ right cols are key
          │
          ├── • render
          │   │
          │   └── • filter
          │       │ estimated row count: 7,317
          │       │ filter: info_key = 'legacy_progress'
          │       │
          │       └── • scan
          │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
          │             table: job_info@primary
          │             spans: FULL SCAN
          │
          └── • scan
                missing stats
                table: jobs@primary
                spans: FULL SCAN

Because of the limit, I don't think this ends up being as bad as it looks. But it still isn't great.

In this PR, we replace crdb_internal.jobs with a query that removes the join on the unused progress field and also constrains the query of the job_info table.

  • distinct
  │ distinct on: id, value
  │
  └── • merge join
      │ equality: (job_id) = (id)
      │ right cols are key
      │
      ├── • render
      │   │
      │   └── • filter
      │       │ estimated row count: 7,318
      │       │ filter: info_key = 'legacy_payload'
      │       │
      │       └── • scan
      │             estimated row count: 14,646 (100% of the table; stats collected 45 minutes ago; using stats forecast for 2 hours in the future)
      │             table: job_info@primary
      │             spans: [/101/'legacy_payload' - ]
      │
      └── • render
          │
          └── • limit
              │ count: 100
              │
              └── • filter
                  │ filter: created < '2023-07-20 07:29:01.17001'
                  │
                  └── • scan
                        missing stats
                        table: jobs@primary
                        spans: [/101 - ]

In a local example, this does seem faster:

> SELECT id, payload, status, created
> FROM "".crdb_internal.system_jobs
> WHERE (created < '2023-07-20 07:29:01.17001') AND (id > 100) ORDER BY id LIMIT 100;

id | payload | status | created
-----+---------+--------+----------
(0 rows)

Time: 183ms total (execution 183ms / network 0ms)

> WITH
> latestpayload AS (
>     SELECT job_id, value
>     FROM system.job_info AS payload
>     WHERE job_id > 100 AND info_key = 'legacy_payload'
>     ORDER BY written desc
> ),
> jobpage AS (
>     SELECT id, status, created
>     FROM system.jobs
>     WHERE (created < '2023-07-20 07:29:01.17001') and (id > 100)
>     ORDER BY id
>     LIMIT 100
> )
> SELECT distinct (id), latestpayload.value AS payload, status
> FROM jobpage AS j
> INNER JOIN latestpayload ON j.id = latestpayload.job_id;
  id | payload | status
-----+---------+---------
(0 rows)

Time: 43ms total (execution 42ms / network 0ms)

Release note: None

Epic: none

The crdb_internal.system_jobs is a virtual table that joins
information from the jobs table and the jobs_info table.

For the previous query,

    SELECT id, payload, status FROM "".crdb_internal.system_jobs
    WHERE (created < $1) AND (id > $2)
    ORDER BY id
    LIMIT $3

this is a little suboptimal because:

- We don't make use of the progress column so any read of that is
  useless.

- While the crdb_internal.virtual table has a virtual index on job id,
  and EXPLAIN will even claim that it will be used:

      • limit
      │ count: 100
      │
      └── • filter
          │ filter: created < '2023-07-20 07:29:01.17001'
          │
          └── • virtual table
                table: system_jobs@system_jobs_id_idx
                spans: [/101 - ]

  This is actually a lie. A virtual index can only handle single-key
  spans. As a result the unconstrained query is used:

    WITH
        latestpayload AS (SELECT job_id, value FROM system.job_info AS payload WHERE info_key = 'legacy_payload' ORDER BY written DESC),
        latestprogress AS (SELECT job_id, value FROM system.job_info AS progress WHERE info_key = 'legacy_progress' ORDER BY written DESC)
    SELECT
       DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress,
                created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
    FROM system.jobs AS j
    INNER JOIN latestpayload AS payload ON j.id = payload.job_id
    LEFT JOIN latestprogress AS progress ON j.id = progress.job_id

  which has a full scan of the jobs table and 2 full scans of the info
  table:

      • distinct
      │ distinct on: id, value, value
      │
      └── • merge join
          │ equality: (job_id) = (id)
          │
          ├── • render
          │   │
          │   └── • filter
          │       │ estimated row count: 7,318
          │       │ filter: info_key = 'legacy_payload'
          │       │
          │       └── • scan
          │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
          │             table: job_info@primary
          │             spans: FULL SCAN
          │
          └── • merge join (right outer)
              │ equality: (job_id) = (id)
              │ right cols are key
              │
              ├── • render
              │   │
              │   └── • filter
              │       │ estimated row count: 7,317
              │       │ filter: info_key = 'legacy_progress'
              │       │
              │       └── • scan
              │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
              │             table: job_info@primary
              │             spans: FULL SCAN
              │
              └── • scan
                    missing stats
                    table: jobs@primary
                    spans: FULL SCAN

  Because of the limit, I don't think this ends up being as bad as it
  looks. But it still isn't great.

In this PR, we replace crdb_internal.jobs with a query that removes
the join on the unused progress field and also constrains the query of
the job_info table.

      • distinct
      │ distinct on: id, value
      │
      └── • merge join
          │ equality: (job_id) = (id)
          │ right cols are key
          │
          ├── • render
          │   │
          │   └── • filter
          │       │ estimated row count: 7,318
          │       │ filter: info_key = 'legacy_payload'
          │       │
          │       └── • scan
          │             estimated row count: 14,646 (100% of the table; stats collected 45 minutes ago; using stats forecast for 2 hours in the future)
          │             table: job_info@primary
          │             spans: [/101/'legacy_payload' - ]
          │
          └── • render
              │
              └── • limit
                  │ count: 100
                  │
                  └── • filter
                      │ filter: created < '2023-07-20 07:29:01.17001'
                      │
                      └── • scan
                            missing stats
                            table: jobs@primary
                            spans: [/101 - ]

In a local example, this does seem faster:

    > SELECT id, payload, status, created
    > FROM "".crdb_internal.system_jobs
    > WHERE (created < '2023-07-20 07:29:01.17001') AND (id > 100) ORDER BY id LIMIT 100;

    id | payload | status | created
    -----+---------+--------+----------
    (0 rows)

    Time: 183ms total (execution 183ms / network 0ms)

    > WITH
    > latestpayload AS (
    >     SELECT job_id, value
    >     FROM system.job_info AS payload
    >     WHERE job_id > 100 AND info_key = 'legacy_payload'
    >     ORDER BY written desc
    > ),
    > jobpage AS (
    >     SELECT id, status, created
    >     FROM system.jobs
    >     WHERE (created < '2023-07-20 07:29:01.17001') and (id > 100)
    >     ORDER BY id
    >     LIMIT 100
    > )
    > SELECT distinct (id), latestpayload.value AS payload, status
    > FROM jobpage AS j
    > INNER JOIN latestpayload ON j.id = latestpayload.job_id;
      id | payload | status
    -----+---------+---------
    (0 rows)

    Time: 43ms total (execution 42ms / network 0ms)

Release note: None

Epic: none
@stevendanna stevendanna requested review from a team as code owners August 3, 2023 10:29
@stevendanna stevendanna requested review from msbutler and removed request for a team August 3, 2023 10:29
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@stevendanna stevendanna requested review from adityamaru and dt and removed request for a team and msbutler August 3, 2023 10:29
Copy link
Copy Markdown
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!


const expiredJobsQueryWithJobInfoTable = `
WITH
latestpayload AS (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having to unmarshal the payload just to read FinishedMicros is 😢

@miretskiy
Copy link
Copy Markdown
Contributor

Any plans on backporting this?

@stevendanna
Copy link
Copy Markdown
Collaborator Author

@miretskiy Yes, I think we should backport this.

@stevendanna stevendanna added the backport-23.1.x PAST MAINTENANCE SUPPORT: 23.1 patch releases via ER request only label Aug 3, 2023
@stevendanna
Copy link
Copy Markdown
Collaborator Author

stevendanna commented Aug 8, 2023

bors r=adityamaru

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 8, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-23.1.x PAST MAINTENANCE SUPPORT: 23.1 patch releases via ER request only

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants