Skip to content

opt: add rule to decorrelate EXISTS (SELECT ... FOR UPDATE) subquery into semi join under read committed #114712

@michae2

Description

@michae2

When using read committed isolation (or when optimizer_use_lock_op_for_serializable is enabled) we use the new Lock operator implementation of SELECT FOR UPDATE. This Lock operator acts as an optimization barrier, which we need to prevent the locking side effect from being optimized away.

But there are some basic optimizations that we should bring back for the Lock operator. One such optimization is decorrelation of a correlated EXISTS subquery into a semi join. Consider the following query:

CREATE TABLE ab (a INT PRIMARY KEY, b INT);
CREATE TABLE c (c INT PRIMARY KEY);
SELECT * FROM ab WHERE a = 1 AND EXISTS (SELECT * FROM c WHERE c = b);

For a normal SELECT this query becomes a simple semi-join:

demo@127.0.0.1:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM ab WHERE a = 1 AND EXISTS (SELECT * FROM c WHERE c = b);
           info
---------------------------
  distribution: local
  vectorized: true

  • lookup join (semi)
  │ table: c@c_pkey
  │ equality: (b) = (c)
  │ equality cols are key
  │
  └── • scan
        missing stats
        table: ab@ab_pkey
        spans: [/1 - /1]
(12 rows)

Time: 2ms total (execution 2ms / network 0ms)

And same for the old implementation of SELECT FOR UPDATE, but for the new implementation it stays an apply-join:

demo@127.0.0.1:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM ab WHERE a = 1 AND EXISTS (SELECT * FROM c WHERE c = b FOR UPDATE);
               info
----------------------------------
  distribution: local
  vectorized: true

  • lookup join (semi)
  │ table: c@c_pkey
  │ equality: (b) = (c)
  │ equality cols are key
  │ locking strength: for update
  │
  └── • scan
        missing stats
        table: ab@ab_pkey
        spans: [/1 - /1]
(13 rows)

Time: 2ms total (execution 2ms / network 0ms)

demo@127.0.0.1:26257/demoapp/defaultdb> SET optimizer_use_lock_op_for_serializable = true;
SET

Time: 1ms total (execution 0ms / network 0ms)

demo@127.0.0.1:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM ab WHERE a = 1 AND EXISTS (SELECT * FROM c WHERE c = b FOR UPDATE);
           info
---------------------------
  distribution: local
  vectorized: true

  • apply join (semi)
  │
  └── • scan
        missing stats
        table: ab@ab_pkey
        spans: [/1 - /1]
(9 rows)

Time: 1ms total (execution 1ms / network 0ms)

Jira issue: CRDB-33647
Epic CRDB-38938

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-read-committedRelated to the introduction of Read CommittedA-sql-optimizerSQL logical planning and optimizations.C-performancePerf of queries or internals. Solution not expected to change functional behavior.T-sql-queriesSQL Queries Team

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions