fix(opencypher): MATCH WHERE ID(n) = <expr> falls back to full scan when expr is dynamic #3865

Merged
lvca merged 2 commits into ArcadeData:main from ExtReMLapin:opt_match_expr
Apr 15, 2026

@ExtReMLapin
Contributor

Fixes #3864

Long story short, in this query:

UNWIND $batch AS BatchEntry
MATCH (b:CHUNK) WHERE ID(b) = BatchEntry.destRID
CREATE (p:CHUNK_EMBEDDING {vector: BatchEntry.vector})
CREATE (p)-[:embb]->(b)

the clause MATCH (b:CHUNK) WHERE ID(b) = BatchEntry.destRID does a full scan on CHUNK.
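
To make the cost of that full scan concrete, here is a minimal sketch that counts how many records are touched per batch under each strategy. It assumes a hypothetical in-memory map as a stand-in for the CHUNK type; the class, record, and method names are illustrative only and are not ArcadeDB APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: an in-memory map stands in for the CHUNK type (RID -> payload).
final class ScanVersusLookup {

  // Number of records touched and number of matches produced.
  record Cost(long visits, long matches) {}

  // Builds `count` fake records keyed by RID-like strings such as "#1:0".
  static Map<String, String> buildChunks(final int count) {
    final Map<String, String> chunks = new LinkedHashMap<>();
    for (int i = 0; i < count; i++)
      chunks.put("#1:" + i, "chunk-" + i);
    return chunks;
  }

  // Full scan: for each batch entry, every record is visited and its ID compared.
  static Cost fullScan(final Map<String, String> chunks, final List<String> batchRids) {
    long visits = 0;
    long matches = 0;
    for (final String wanted : batchRids)
      for (final String rid : chunks.keySet()) {
        visits++;
        if (rid.equals(wanted))
          matches++;
      }
    return new Cost(visits, matches);
  }

  // Direct lookup: for each batch entry, only the record with that ID is touched.
  static Cost directLookup(final Map<String, String> chunks, final List<String> batchRids) {
    long visits = 0;
    long matches = 0;
    for (final String wanted : batchRids) {
      visits++;
      if (chunks.containsKey(wanted))
        matches++;
    }
    return new Cost(visits, matches);
  }
}
```

With 1000 records and a batch of 3 RIDs, fullScan touches 3000 records while directLookup touches 3, and both produce the same 3 matches. The gap grows with both the batch size and the number of CHUNK records.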

@ExtReMLapin
Contributor Author

Gemini wrote that; Claude checked it.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements dynamic ID filtering within MatchNodeStep to optimize OpenCypher queries where the node ID is specified in a WHERE clause. It introduces logic to extract ID filters from boolean expressions and refactors the ExpressionEvaluator into a class member for reuse. Review feedback highlights a potential NullPointerException when a vertex is not found by its RID and suggests optimizing the AST traversal by pre-analyzing the filter expression.

Comment on lines +309 to 312
final RID rid = new RID(context.getDatabase(), effectiveIdFilter);
final Identifiable vertex = context.getDatabase().lookupByRID(rid, true);
// Return single-element iterator for the matched vertex
return List.of(vertex).iterator();

high

The List.of(vertex) call will throw a NullPointerException if the RID is valid but the vertex does not exist in the database (i.e., lookupByRID returns null). Since List.of does not allow null elements, you should check for null and return an empty iterator instead.

        final RID rid = new RID(context.getDatabase(), effectiveIdFilter);
        final Identifiable vertex = context.getDatabase().lookupByRID(rid, true);
        if (vertex == null)
          return Collections.emptyIterator();
        // Return single-element iterator for the matched vertex
        return List.of(vertex).iterator();
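
The behavior behind this comment can be checked in isolation. The sketch below is a standalone illustration, not the PR's code: the class and method names are hypothetical, and it only covers the case the reviewer describes, where the lookup yields null. Whether lookupByRID returns null or throws for a missing record is not established in this thread.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of ArcadeDB.
final class SingleOrEmpty {

  // Wraps a possibly-null lookup result: empty iterator for null, one-element iterator otherwise.
  static <T> Iterator<T> iteratorOf(final T found) {
    if (found == null)
      return Collections.emptyIterator();
    return List.of(found).iterator();
  }

  // Demonstrates the failure mode: List.of rejects null elements with a NullPointerException.
  static boolean listOfRejectsNull() {
    try {
      List.of((Object) null);
      return false;
    } catch (final NullPointerException expected) {
      return true;
    }
  }
}
```

Calling iteratorOf(null) gives an iterator with no elements, so a downstream loop simply produces no rows for a missing vertex instead of failing the whole query.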

// Check for dynamic ID filter from WHERE clause if static idFilter is not present
String effectiveIdFilter = this.idFilter;
if ((effectiveIdFilter == null || effectiveIdFilter.isEmpty()) && whereFilter != null) {
effectiveIdFilter = extractDynamicIdFilter(whereFilter, currentInputResult);

medium

The extractDynamicIdFilter method is called for every input row and performs a recursive traversal of the whereFilter AST. Since the structure of the whereFilter is constant for the duration of this execution step, this traversal is redundant.

It would be more efficient to pre-analyze the whereFilter once (e.g., in the constructor) to identify the Expression that provides the ID value, and then simply evaluate that expression here. Additionally, consider supporting elementId() in addition to id() for broader Cypher compatibility.
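
A rough sketch of the pre-analysis idea follows. It uses a hypothetical miniature AST (And, IdEquals, Other) because the real ArcadeDB expression classes are not shown in this thread, and it models the value expression as a plain function over the current input row. The traversal runs once; afterwards only the stored expression is evaluated per row.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical miniature AST; these types are assumptions, not ArcadeDB classes.
final class IdFilterPreAnalysis {
  interface Node {}

  // left AND right
  record And(Node left, Node right) implements Node {}

  // ID(variable) = valueExpr, where valueExpr is evaluated against the current input row
  record IdEquals(String variable, Function<Map<String, Object>, Object> valueExpr) implements Node {}

  // Any other predicate, irrelevant to the ID lookup
  record Other(String text) implements Node {}

  // Walks the filter once and returns the value expression of the first
  // ID(variable) = <expr> conjunct, if any.
  // Only AND is descended: under OR the ID equality would not be a mandatory condition.
  static Optional<Function<Map<String, Object>, Object>> findIdExpression(final Node filter,
      final String variable) {
    if (filter instanceof IdEquals eq)
      return eq.variable().equals(variable) ? Optional.of(eq.valueExpr()) : Optional.empty();
    if (filter instanceof And and) {
      final Optional<Function<Map<String, Object>, Object>> left = findIdExpression(and.left(), variable);
      return left.isPresent() ? left : findIdExpression(and.right(), variable);
    }
    return Optional.empty();
  }
}
```

In a step class, findIdExpression would be called once (for example in the constructor) and the result kept in a field. Each input row then costs a single apply(row) call instead of a recursive walk over the filter.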

@codacy-production

codacy-production Bot commented Apr 15, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


@ExtReMLapin
Contributor Author

According to Claude (but I don't have any tokens left), there might be other cases where it's not fixed, for example:

Inline properties instead of a WHERE clause:
UNWIND $batch AS BatchEntry
MATCH (b:CHUNK {someKey: BatchEntry.value})

MergeStep.findNode() / findAllNodes(), which always do a full scan:
MERGE (n:CHUNK {name: $value})

@lvca lvca self-requested a review April 15, 2026 17:13
@lvca lvca added this to the 26.4.1 milestone Apr 15, 2026
@lvca lvca merged commit eb0faca into ArcadeData:main Apr 15, 2026
13 of 16 checks passed
@lvca
Member

lvca commented Apr 15, 2026

It actually makes sense! Merged, thanks!! I'm going to write some test cases to avoid future regressions.

@codecov

codecov Bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 38.82353% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.94%. Comparing base (973cb52) to head (801337d).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...edb/query/opencypher/executor/steps/MergeStep.java | 16.66% | 32 Missing and 8 partials ⚠️ |
| ...query/opencypher/executor/steps/MatchNodeStep.java | 67.56% | 5 Missing and 7 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3865      +/-   ##
==========================================
+ Coverage   64.66%   64.94%   +0.28%     
==========================================
  Files        1579     1579              
  Lines      116503   116618     +115     
  Branches    24707    24749      +42     
==========================================
+ Hits        75335    75742     +407     
+ Misses      30871    30504     -367     
- Partials    10297    10372      +75     


mergify Bot added a commit that referenced this pull request May 3, 2026
Bumps the github-actions group with 2 updates: [anthropics/claude-code-action](https://github.com/anthropics/claude-code-action) and [github/codeql-action](https://github.com/github/codeql-action).
Updates `anthropics/claude-code-action` from 1.0.107 to 1.0.111
Updates `github/codeql-action` from 4.35.2 to 4.35.3
Development

Successfully merging this pull request may close these issues.

Cypher Batch creation is slow with vector indexes

2 participants