Skip to content

Fix incorrect results with NOT EXISTS and IS DISTINCT FROM operator (#19680)#19811

Merged
Mytherin merged 2 commits intoduckdb:mainfrom
henry8128:fix_16803
Nov 20, 2025
Merged

Fix incorrect results with NOT EXISTS and IS DISTINCT FROM operator (#19680)#19811
Mytherin merged 2 commits intoduckdb:mainfrom
henry8128:fix_16803

Conversation

@henry8128
Copy link
Contributor

Fixes #19680

This fixes a bug where queries using NOT EXISTS with IS DISTINCT FROM returned incorrect results due to improper handling of NULL semantics in the optimizer.

The issue was that the optimizer's deliminator incorrectly treated DISTINCT FROM variants the same as regular equality/inequality comparisons, which have different NULL handling:

  • IS DISTINCT FROM: NULL-aware (NULL IS DISTINCT FROM NULL = FALSE)
  • != or =: NULL-unaware (NULL != NULL = NULL, filters out NULLs)

Incorrect Query Plan

┌───────────────────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             c2            │
│                           │
│          ~0 rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             #5            │
│__internal_decompress_integ│
│     ral_integer(#3, 1)    │
│             #1            │
│                           │
│          ~0 rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      NESTED_LOOP_JOIN     │
│    ────────────────────   │
│      Join Type: ANTI      │
│    Conditions: c2 != c2   ├──────────────┐
│                           │              │
│          ~0 rows          │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        │
│    ────────────────────   ││    ────────────────────   │
│            NULL           ││            NULL           │
│             #2            ││             #2            │
│            NULL           ││            NULL           │
│             #1            ││             #1            │
│            NULL           ││            NULL           │
│             #0            ││             #0            │
│            NULL           ││            NULL           │
│                           ││                           │
│          ~2 rows          ││           ~1 row          │
└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        │
│    ────────────────────   ││    ────────────────────   │
│             #0            ││             #0            │
│__internal_compress_integra││__internal_compress_integra│
│     l_utinyint(#1, 1)     ││     l_utinyint(#1, 1)     │
│             #2            ││             #2            │
│                           ││                           │
│          ~2 rows          ││           ~1 row          │
└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        │
│    ────────────────────   ││    ────────────────────   │
│            NULL           ││            NULL           │
│             #0            ││             #0            │
│            NULL           ││            NULL           │
│                           ││                           │
│          ~2 rows          ││           ~1 row          │
└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││           FILTER          │
│    ────────────────────   ││    ────────────────────   │
│         Table: t0         ││     (col0 IS NOT NULL)    │
│   Type: Sequential Scan   ││                           │
│      Projections: c2      ││                           │
│                           ││                           │
│          ~2 rows          ││           ~1 row          │
└───────────────────────────┘└─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │         SEQ_SCAN          │
                             │    ────────────────────   │
                             │         Table: t0         │
                             │   Type: Sequential Scan   │
                             │      Projections: c2      │
                             │                           │
                             │          ~2 rows          │
                             └───────────────────────────┘

The buggy plan shows two critical issues:

  ┌─────────────┴─────────────┐
  │      NESTED_LOOP_JOIN     │
  │      Join Type: ANTI      │
  │    Conditions: c2 != c2   │  ← ❌ Wrong(the join conditions should be c2 IS DISTINCT FROM c2)
  │          ~0 rows          │
  └─────────────┬─────────────┘
                │
                └─────────────┐
                             ┌┴─────────────┐
                             │   FILTER     │
                             │ (col0 IS NOT │  ← ❌ Wrong(the filter should be removed)
                             │    NULL)     │
                             └──────────────┘

Solution

This PR adds proper support for DISTINCT FROM operators throughout the optimization pipeline:

  1. Preserve DISTINCT FROM semantics in join conversion.(src/optimizer/deliminator.cpp)
// NOTE: We should NOT convert DISTINCT FROM to != in general
// Only convert if the ORIGINAL join had != or = (not DISTINCT FROM variants)
if (delim_join.join_type != JoinType::MARK &&
    original_join_comparison != ExpressionType::COMPARE_DISTINCT_FROM &&
    original_join_comparison != ExpressionType::COMPARE_NOT_DISTINCT_FROM) {
    // Safe to convert
}
  1. Skip NULL filters for DISTINCT FROM variants.(src/optimizer/deliminator.cpp)
// Only add IS NOT NULL filter for regular equality/inequality comparisons
// Do NOT add for DISTINCT FROM variants, as they handle NULL correctly
if (cond.comparison != ExpressionType::COMPARE_NOT_DISTINCT_FROM &&
    cond.comparison != ExpressionType::COMPARE_DISTINCT_FROM) {
    // Add IS NOT NULL filter
}
  1. Added negation support for COMPARE_DISTINCT_FROM and COMPARE_NOT_DISTINCT_FROM
    in expression type handling.(src/common/enums/expression_type.cpp)
  2. Updated parser to properly negate IS DISTINCT FROM expressions when wrapped with NOT. (src/parser/transform/expression/transform_bool_expr.cpp)
  3. Added regression test in test/sql/subquery/exists/test_correlated_exists_with_derived_table.test

…uckdb#19680)

This fixes a bug where queries using NOT EXISTS with IS DISTINCT FROM
returned incorrect results due to improper handling of NULL semantics in
the optimizer.

The issue was that the optimizer's deliminator incorrectly treated
DISTINCT FROM variants the same as regular equality/inequality comparisons,
which have different NULL handling:
- IS DISTINCT FROM: NULL-aware (NULL IS DISTINCT FROM NULL = FALSE)
- != or =: NULL-unaware (NULL != NULL = NULL, filters out NULLs)

Changes:
- Added negation support for COMPARE_DISTINCT_FROM and COMPARE_NOT_DISTINCT_FROM
in expression type handling
- Updated parser to properly negate IS DISTINCT FROM expressions when wrapped with NOT
- Fixed deliminator optimizer to preserve DISTINCT FROM semantics and avoid
incorrectly converting to != or adding unnecessary IS NOT NULL filters
- Added regression test for correlated EXISTS with derived tables
Copy link
Contributor

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Was a tricky one to understand, but I finally understood it.
Just one small nit on the comments, otherwise LGTM.

// DELIM JOIN need to do that for not DELIM_GET side. Easiest way is to change the
// comparison expression type. See duckdb/duckdb#16803
if (delim_join.join_type != JoinType::MARK) {
// NOTE: We should NOT convert DISTINCT FROM to != in general, as they have different NULL semantics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: While this is true, I don't think these comments provide much value here, as there is a specific case where we do want to change it. See #16910 which has almost the same query, expect the original join is a != and not a compare distinct from

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point!

// NOTE: We should NOT convert DISTINCT FROM to != in general, as they have different NULL semantics
// - DISTINCT FROM: NULL IS DISTINCT FROM NULL = FALSE (NULL-aware)
// - !=: NULL != NULL = NULL (filters out NULL)
// Only convert if the ORIGINAL join had != or = (not DISTINCT FROM variants)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏 nice, this is a case I missed.

@henry8128
Copy link
Contributor Author

@Tmonster Thanks a lot for your review. I have updated the PR as you suggested.

Best, Henry

@duckdb-draftbot duckdb-draftbot marked this pull request as draft November 18, 2025 15:02
Copy link
Contributor

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you again!

@hannes hannes marked this pull request as ready for review November 19, 2025 09:27
@Mytherin Mytherin merged commit b9aed20 into duckdb:main Nov 20, 2025
104 checks passed
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Dec 31, 2025
Fix incorrect results with NOT EXISTS and IS DISTINCT FROM operator (duckdb/duckdb#19680) (duckdb/duckdb#19811)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Dec 31, 2025
Fix incorrect results with NOT EXISTS and IS DISTINCT FROM operator (duckdb/duckdb#19680) (duckdb/duckdb#19811)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected Execution Result

4 participants