fix: prevent incorrect filter pushdown for WHERE predicates with nest…#3534
fix: prevent incorrect filter pushdown for WHERE predicates with nest…#3534
Conversation
…ed variable references The filter pushdown optimization in WhereClause was using a text-based heuristic (collectFromText) to detect referenced variables in InExpression and FunctionCallExpression. This heuristic only found variables followed by a dot (e.g., n.name) but missed bare variable references like b in labels(b). This caused WHERE predicates referencing variables bound in later MATCH steps to be incorrectly pushed down to earlier node scans, where those variables were not yet available, resulting in empty results. Fixed by properly traversing sub-expressions of InExpression, FunctionCallExpression, ListComprehensionExpression, ArithmeticExpression, ListExpression, and BooleanWrapperExpression to collect all referenced variables accurately. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue in the query optimizer where WHERE predicates containing nested variable references were sometimes incorrectly pushed down, resulting in empty result sets. The core of the fix involves enhancing the variable collection mechanism within the abstract syntax tree (AST) traversal to ensure all variables are properly identified across various expression types. This prevents premature filtering and guarantees correct query execution, as demonstrated by the addition of a new, targeted test case. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
🧪 CI InsightsHere's what we observed from your CI run for 4a6709f. 🟢 All jobs passed!But CI Insights is watching 👀 |
There was a problem hiding this comment.
Code Review
This pull request effectively addresses a critical bug regarding incorrect filter pushdown by replacing a fragile text-based variable detection heuristic with a robust AST traversal. This is a significant improvement for query correctness. The new test case is also well-designed to validate the fix. I have one suggestion for a further refinement in WhereClause.java to make the variable collection even more accurate for ListComprehensionExpression, which could improve optimization opportunities in the future.
| } else if (expr instanceof ListComprehensionExpression listComp) { | ||
| collectExpressionVariables(listComp.getListExpression(), vars); | ||
| if (listComp.getWhereExpression() != null) | ||
| collectExpressionVariables(listComp.getWhereExpression(), vars); | ||
| if (listComp.getMapExpression() != null) | ||
| collectExpressionVariables(listComp.getMapExpression(), vars); |
There was a problem hiding this comment.
This correctly traverses the sub-expressions of ListComprehensionExpression. However, it doesn't account for the fact that the iteration variable is local to the comprehension. As a result, the local variable is collected as if it were an external dependency.
This is safe because it will prevent pushdown, thus avoiding the original bug of incorrect results. However, it is not fully accurate and can be overly conservative, preventing valid optimizations where a predicate could be pushed down.
For example, in [x IN n.items | x > 10], x is local. The expression only depends on n. The current logic will collect both {n, x}. Since x is not defined in the outer scope, this will prevent pushdown. A more accurate collection would only yield {n}, allowing for pushdown.
To fix this, the variable collection logic would need to be aware of local scopes introduced by constructs like list comprehensions. This might require a larger refactoring (e.g., passing a set of locally-scoped variables to ignore during traversal). While maybe out of scope for this PR, it's a potential future improvement for the optimizer.
Code ReviewThis PR correctly identifies and fixes a class of bugs where filter pushdown in Positive Aspects
Remaining
|
…versal Address code review feedback: replace text-based variable detection in StringMatchExpression, RegexExpression, and LabelCheckExpression with proper accessor-based traversal. Also assert the returned node type in the regression test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code ReviewThis is a well-targeted fix that replaces an unreliable text-based heuristic with proper AST traversal. The root cause (the Strengths
Issues1. List comprehension loop variable leaks into outer variable set In } else if (expr instanceof ListComprehensionExpression listComp) {
collectExpressionVariables(listComp.getListExpression(), vars); // correctly collects `b`
if (listComp.getWhereExpression() != null)
collectExpressionVariables(listComp.getWhereExpression(), vars);
if (listComp.getMapExpression() != null)
collectExpressionVariables(listComp.getMapExpression(), vars); // also collects `x` ← false positive
}The } else if (expr instanceof ListComprehensionExpression listComp) {
collectExpressionVariables(listComp.getListExpression(), vars);
final Set<String> inner = new HashSet<>(vars);
if (listComp.getWhereExpression() != null)
collectExpressionVariables(listComp.getWhereExpression(), inner);
if (listComp.getMapExpression() != null)
collectExpressionVariables(listComp.getMapExpression(), inner);
inner.remove(listComp.getVariable()); // strip loop variable
vars.addAll(inner);
}In practice this is a conservative over-approximation (prevents pushdown when the query happens to have a match variable with the same name as a loop variable), but it's technically wrong semantics. For example, a query like: MATCH (x)-->(b) WHERE x.name IN [x IN labels(b) | x] RETURN b...would incorrectly report that the predicate references 2. The } else if (expr instanceof PatternPredicateExpression) {
// Pattern predicates reference multiple variables, collect from text
collectFromText(expr.getText(), vars);
}This is the same incomplete heuristic that caused the original bug — it only detects 3. Missing expression types There are other Test CoverageThe single test case covers the reported scenario well. Consider adding:
MinorThe Overall: the core fix is correct and necessary. The list comprehension loop variable scoping issue (point 1) is the main thing worth addressing before merge; the others are either pre-existing or minor. |
…ization ExplainStatementExecutionTest: match actual plan output "SCAN WITH FILTER BUCKET" SelectStatementExecutionTest: accept FetchFromTypeWithFilterStep as valid no-index step Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Follow-up Review NotesCorrections and observations not covered in the earlier reviews: Correction to First ReviewMy first review comment incorrectly stated that SQL Test Changes Need ExplanationThe PR description focuses on the Cypher filter pushdown fix, but two SQL execution plan tests are also modified:
assertThat(executionPlanAsString).contains("FILTER ITEMS WHERE");to: assertThat(executionPlanAsString).contains("SCAN WITH FILTER BUCKET");
These changes suggest the SQL query planner is now generating It would help to clarify in the PR description or commit message whether these SQL test changes:
If incidental, it's worth noting since reviewers might otherwise wonder if the SQL planner was accidentally changed. Loop Variable Issue (Clarification)The loop variable over-approximation noted in the previous review (e.g., |
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
…ed variable references
The filter pushdown optimization in WhereClause was using a text-based heuristic (collectFromText) to detect referenced variables in InExpression and FunctionCallExpression. This heuristic only found variables followed by a dot (e.g., n.name) but missed bare variable references like b in labels(b). This caused WHERE predicates referencing variables bound in later MATCH steps to be incorrectly pushed down to earlier node scans, where those variables were not yet available, resulting in empty results.
Fixed by properly traversing sub-expressions of InExpression, FunctionCallExpression, ListComprehensionExpression, ArithmeticExpression, ListExpression, and BooleanWrapperExpression to collect all referenced variables accurately.