Skip to content

fix: Cypher head(collect()) returning null in WITH clauses (#3307)#3309

Merged
robfrank merged 2 commits intomainfrom
fix-3307
Feb 3, 2026
Merged

fix: Cypher head(collect()) returning null in WITH clauses (#3307)#3309
robfrank merged 2 commits intomainfrom
fix-3307

Conversation

@robfrank
Copy link
Collaborator

@robfrank robfrank commented Feb 2, 2026

WithClause.hasAggregations() was not recursively detecting aggregations nested inside non-aggregation functions like head(collect()). This caused the query planner to select WithStep instead of AggregationStep, resulting in null values when using nested aggregations in WITH clauses.

Added recursive containsAggregation() helper to detect aggregations at any nesting level, matching the pattern already used in ReturnClause.

Includes comprehensive regression test covering:

  • head(collect(ID(doc))) with document IDs
  • head(collect(doc.name)) with property access
  • Multiple documents to verify head() gets first element
  • Edge case: no matching documents (graceful null handling)
  • head(collect(doc)) with full node objects

What does this PR do?

fixes #3307

@robfrank robfrank added this to the 26.2.1 milestone Feb 2, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @robfrank, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in the Cypher query engine where nested aggregation functions, specifically head(collect()) within WITH clauses, were not being correctly processed, leading to null results. The underlying cause was the query planner's inability to recursively identify aggregations, which resulted in the selection of an incorrect execution strategy. The solution involves enhancing the aggregation detection logic to be recursive, ensuring that the appropriate aggregation steps are used and correct results are returned for such queries.

Highlights

  • Bug Fix: Null values in head(collect()) with WITH clauses: Resolved an issue where Cypher queries using head(collect()) within WITH clauses would incorrectly return null values. This was due to the query planner failing to detect nested aggregations.
  • Recursive Aggregation Detection: The WithClause.hasAggregations() and hasNonAggregations() methods were updated to recursively detect aggregations, including those nested within other functions like head(). A new private helper method containsAggregation(Expression expr) was introduced for this purpose.
  • Comprehensive Regression Test: A new regression test, testHeadCollectInWithClause(), was added to OpenCypherCollectUnwindTest.java. This test covers various scenarios, including head(collect(ID(doc))), head(collect(doc.name)), head(collect(doc)), and graceful handling of empty collections.
  • Detailed Documentation: A new markdown file, 3307-cypher-head-collect-null-results.md, was added to thoroughly document the issue, root cause analysis, test strategy, implementation details, and impact of the fix.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify
Copy link
Contributor

mergify bot commented Feb 2, 2026

🧪 CI Insights

Here's what we observed from your CI run for 82dd5af.

🟢 All jobs passed!

But CI Insights is watching 👀

@claude
Copy link

claude bot commented Feb 2, 2026

Pull Request Review: Fix for Cypher head(collect()) Returning Null (#3307)

Summary

This PR fixes a critical bug where nested aggregation functions like head(collect()) were returning null values in WITH clauses. The root cause was that WithClause.hasAggregations() only performed shallow detection and missed aggregations nested inside wrapper functions.


✅ Code Quality and Best Practices

Strengths:

  1. Follows TDD Approach: Comprehensive test written before the fix (as per CLAUDE.md guidelines) ✅
  2. Code Style Adherence:
    • Uses final keyword appropriately on parameters ✅
    • Pattern matching with instanceof (modern Java) ✅
    • Single-statement ifs without braces (consistent with codebase) ✅
    • Proper JavaDoc comments ✅
  3. Consistency: The fix mirrors the existing ReturnClause.containsAggregation() implementation, maintaining consistency across the codebase
  4. Minimal Change: Only touches the necessary code to fix the issue, no over-engineering

Observations:

  • The recursive containsAggregation() helper is marked private static, which is appropriate since it's a utility method
  • The method signature matches the pattern in ReturnClause.java exactly (lines 104-117), which is good for maintainability

🐛 Potential Issues

Minor: Documentation File in Root

The file 3307-cypher-head-collect-null-results.md appears to be a detailed documentation/tracking file placed in the repository root. Consider:

  • Moving it to a docs/ or docs/fixes/ directory for better organization
  • Or removing it entirely if it was just for development tracking (per CLAUDE.md: "do not commit on git, I will do it after a review")

Recommendation: This appears to be a working document that should likely be removed before merge or relocated to a proper documentation directory.


⚡ Performance Considerations

Analysis:

  1. Recursion Depth: The recursive containsAggregation() method will traverse the expression tree. For typical Cypher queries, nesting depth is shallow (2-3 levels max), so performance impact is negligible.

  2. Method Call Overhead: The method is called during query planning (not execution), so any overhead is one-time and acceptable.

  3. Query Execution Performance:

    • Before fix: Used WithStep (incorrect) - row-by-row evaluation
    • After fix: Uses AggregationStep (correct) - consumes all rows first
    • The fix may actually improve performance for queries with nested aggregations by using the correct execution path

Verdict: ✅ No performance concerns; potentially slight improvement


🔒 Security Concerns

Verdict: ✅ No security issues identified

  • The changes are purely in query planning logic
  • No input validation needed (expressions come from parsed AST)
  • No risk of SQL injection, XSS, or other OWASP top 10 vulnerabilities
  • Follows ArcadeDB's principle of "keep in mind speed and security"

🧪 Test Coverage

Excellent Coverage:

The test testHeadCollectInWithClause() covers:

  1. head(collect(ID(doc))) - Document IDs (main bug scenario)
  2. head(collect(doc.name)) - Property access
  3. ✅ Multiple documents - Verifies head() returns first element
  4. ✅ Edge case: No matching documents - Graceful null handling
  5. head(collect(doc)) - Full node objects
  6. ✅ Multiple relationships per node - Verifies aggregation across all matches

Test Quality:

  • Uses AssertJ assertions (assertThat(...).isTrue()) per CLAUDE.md guidelines ✅
  • Clear test structure with comments explaining each scenario ✅
  • Tests actual graph relationships (CHUNK → DOCUMENT) matching the original issue ✅
  • Proper setup and teardown with @BeforeEach and @AfterEach

Missing Coverage:

Consider adding tests for:

  1. Other wrapper functions: last(collect()), size(collect()), min(collect()), max(collect())
  2. Deeper nesting: head(tail(collect())) (though less common)
  3. Mixed aggregations: WITH head(collect(x)) as a, sum(y) as b

Verdict: ✅ Test coverage is comprehensive for the reported issue; additional edge cases would be nice but not blocking


🎯 Correctness Analysis

Root Cause Identification:

The PR correctly identifies that:

  1. head() is not an aggregation function (it's a list function)
  2. collect() IS an aggregation function
  3. Non-recursive detection missed the nested collect() inside head()
  4. This caused query planner to select WithStep instead of AggregationStep (line 787-814 in CypherExecutionPlan.java)

Solution Validation:

The recursive approach is correct because:

  • It checks if the expression itself is an aggregation ✅
  • It recursively checks all function arguments ✅
  • It stops at the first aggregation found (efficient) ✅
  • It matches the proven pattern from ReturnClause

Verdict: ✅ Fix is technically sound and addresses the root cause


📋 Recommendations

Before Merge:

  1. Remove or relocate documentation file: 3307-cypher-head-collect-null-results.md should not be in the repository root
  2. Verify compilation: Ensure mvn clean install -DskipTests passes without warnings
  3. Run full OpenCypher test suite: Per the documentation, tests passed (271/271), but confirm this in CI

Future Enhancements (Non-Blocking):

  1. Consider adding a test for other wrapper functions (last(), size(), etc.)
  2. Consider extracting containsAggregation() to a shared utility class if it's needed elsewhere (currently duplicated between WithClause and ReturnClause)

🎬 Final Verdict

Status: ✅ APPROVED with minor recommendations

This is a well-crafted fix that:

  • Correctly identifies and solves the root cause
  • Follows TDD principles with comprehensive tests
  • Adheres to ArcadeDB coding standards
  • Has no performance or security concerns
  • Provides excellent documentation of the issue and solution

Action Items:

  1. Remove/relocate 3307-cypher-head-collect-null-results.md before merge
  2. Confirm all tests pass in CI
  3. Ready to merge after cleanup

Great work on this fix! The detailed investigation and testing demonstrate thorough engineering practices. 👍

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively resolves issue #3307, which caused head(collect()) to return null values in WITH clauses due to incorrect aggregation detection. The core fix involves updating the WithClause.hasAggregations() and hasNonAggregations() methods to use a new recursive helper, containsAggregation(Expression expr). This helper correctly identifies aggregations nested within other functions, ensuring the query planner selects the appropriate AggregationStep. The changes are well-documented in the new markdown file 3307-cypher-head-collect-null-results.md, which provides a clear problem description, root cause analysis, and implementation details. Furthermore, a comprehensive regression test testHeadCollectInWithClause() has been added to OpenCypherCollectUnwindTest.java, covering various scenarios and edge cases, confirming the fix and preventing future regressions. The overall implementation is clean, follows existing code style, and significantly improves the robustness of Cypher query processing.

@ExtReMLapin
Copy link
Contributor

On my end claude generated this fix proposal, had to do multi-turn to fix others issues (multiple commits)

#3310

@codacy-production
Copy link

codacy-production bot commented Feb 2, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-8.94%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (1c28918) 100598 65636 65.25%
Head commit (82dd5af) 130162 (+29564) 73284 (+7648) 56.30% (-8.94%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3309) 0 0 ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

WithClause.hasAggregations() was not recursively detecting aggregations
nested inside non-aggregation functions like head(collect()). This caused
the query planner to select WithStep instead of AggregationStep, resulting
in null values when using nested aggregations in WITH clauses.

Added recursive containsAggregation() helper to detect aggregations at any
nesting level, matching the pattern already used in ReturnClause.

Includes comprehensive regression test covering:
- head(collect(ID(doc))) with document IDs
- head(collect(doc.name)) with property access
- Multiple documents to verify head() gets first element
- Edge case: no matching documents (graceful null handling)
- head(collect(doc)) with full node objects

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 3, 2026

Code Review - PR #3309

Overview

This PR fixes issue #3307 where head(collect()) in Cypher WITH clauses was returning null instead of the expected aggregated values. The root cause was correctly identified: WithClause.hasAggregations() was not detecting nested aggregations.


✅ Positive Findings

1. Excellent Root Cause Analysis

The documentation file provides exceptional detail on:

  • Clear problem description with reproducible examples
  • Accurate root cause identification (non-recursive aggregation detection)
  • Comprehensive explanation of why null values occurred
  • Proper understanding of the execution pipeline (AggregationStep vs WithStep)

2. Test-Driven Development Approach

  • Follows CLAUDE.md requirement: "Write the tests first (TDD approach) whenever possible" ✅
  • Comprehensive regression test with 5 distinct scenarios
  • Tests cover edge cases (empty collections, multiple elements, full node objects)
  • Uses proper AssertJ syntax: assertThat(property).isTrue(); as specified in CLAUDE.md ✅

3. Code Quality

The implementation (already merged to main) shows:

  • Uses final keyword on parameters ✅
  • Follows existing code style (simple if statements without braces) ✅
  • Consistent with ReturnClause pattern (good architectural consistency)
  • Leverages existing Expression.containsAggregation() method (reuse over reinvention) ✅

4. Thorough Testing

  • Test method testHeadCollectInWithClause() covers all scenarios from the documentation
  • 162 lines of well-structured test code with clear comments
  • Regression prevention for future changes
  • All assertions use descriptive messages (as("description"))

5. Documentation

  • Comprehensive 272-line documentation file
  • Tracks all implementation steps with checkboxes
  • Includes before/after test results
  • Provides monitoring recommendations

🔍 Observations & Minor Concerns

1. Documentation File Location

The file 3307-cypher-head-collect-null-results.md is at the repository root. This seems unusual for a bug fix documentation.

Question: Is this intended to be:

  • A temporary analysis file that should be removed before merge?
  • Permanent documentation that should move to a docs/ directory?
  • Part of the PR description instead of a committed file?

Per CLAUDE.md: "do not commit on git, I will do it after a review" - but this appears to be documentation rather than code.

2. Test Class Placement

The test was added to OpenCypherCollectUnwindTest.java which makes sense given the collect() function involvement. However:

  • The test name testHeadCollectInWithClause() focuses on the head() wrapper
  • It might also fit in a dedicated WITH clause test file if one exists
  • Current placement is acceptable but worth confirming it's the best location

3. Missing Verification Step

The CLAUDE.md states: "Run all the connected tests could be affected by the change to ensure nothing is broken"

Recommendation: The documentation shows:

  • Tests run: 13, Failures: 0 (test class)
  • Tests run: 271, Failures: 0 (all OpenCypher tests)

But we should also verify:

mvn test -Dtest="*WithClause*,*Aggregation*"

To ensure all aggregation-related and WITH clause tests still pass.


🔒 Security Assessment

No security concerns identified.

  • The fix is purely functional (detection logic)
  • Does not introduce new inputs or data handling
  • No changes to authentication, authorization, or data access
  • Test data is synthetic and safe

⚡ Performance Considerations

Positive Impact

  • Correct execution path: Using AggregationStep instead of WithStep for nested aggregations ensures proper aggregation semantics
  • No regression risk: containsAggregation() is already used in ReturnClause without performance issues

Neutral

  • Recursive traversal: The containsAggregation() method recursively checks expression trees
  • For deeply nested expressions, this adds minor overhead during query planning
  • However, query planning happens once per query, so impact is negligible
  • The fix corrects incorrect results, so any minor planning overhead is justified

🐛 Potential Issues

None identified in the actual code fix

The fix itself is sound - it leverages an existing method that already handles the recursion correctly.


📋 Recommendations

Before Merge:

  1. Clarify documentation file: Decide if 3307-cypher-head-collect-null-results.md should be:

    • Removed (move content to PR description)
    • Moved to docs/bugfixes/ or similar
    • Kept at root (if this is standard practice)
  2. Run broader test suite (as per CLAUDE.md):

    mvn test -Dtest="*Cypher*Aggregation*,*WithClause*"
  3. Verify with original issue: Test with the exact database backup mentioned in issue Cypher : Query now fails both on old and new engine #3307 to confirm complete resolution

  4. Remove System.out statements: Quick check to ensure no debug output was left behind (per CLAUDE.md)

    grep -r "System.out" engine/src/main/java/com/arcadedb/query/opencypher/ast/WithClause.java

Optional Enhancements:

  1. Consider similar functions: Audit other wrapper functions mentioned in docs (last(), size(), min(), max()) to ensure consistent behavior

  2. Add performance test: If this pattern is common in production queries, consider adding a performance regression test


📊 Test Coverage Assessment

Excellent coverage for the specific bug:

  • ✅ Basic case: head(collect(ID(doc)))
  • ✅ Property access: head(collect(doc.name))
  • ✅ Multiple elements (verifies head() gets first)
  • ✅ Empty collection (null handling)
  • ✅ Full node objects: head(collect(doc))

Missing coverage (optional):

  • Other wrapper functions: last(collect()), size(collect())
  • Deeply nested: head(head(collect())) (probably invalid, but good to verify error handling)
  • Mixed aggregations: WITH head(collect(a)) as x, count(b) as y

🎯 Final Verdict

APPROVED with minor clarifications needed

Strengths:

  • Correct root cause analysis and fix
  • Excellent test coverage
  • Follows TDD approach
  • Adheres to code style guidelines
  • No security or performance concerns
  • Comprehensive documentation

Action Items:

  1. Clarify the documentation file's intended location/fate
  2. Confirm broader test suite passes
  3. Consider removing the markdown file or moving it to docs/

Overall Quality: 9/10

This is a well-executed bug fix that demonstrates strong understanding of the codebase architecture and follows best practices for testing and documentation. The only deduction is for the unclear purpose of the committed documentation file.


Review completed following ArcadeDB CLAUDE.md guidelines

@robfrank robfrank merged commit 8d12402 into main Feb 3, 2026
9 of 11 checks passed
@robfrank robfrank deleted the fix-3307 branch February 3, 2026 09:12
@claude
Copy link

claude bot commented Feb 3, 2026

Pull Request Review - Issue 3309

Summary

This PR successfully fixes a bug where head(collect()) and other nested aggregation functions were returning null values in WITH clauses. The root cause was correctly identified: WithClause was not recursively detecting aggregations nested inside non-aggregation functions.

Code Quality - EXCELLENT

  • Uses final keyword on method parameters
  • Follows existing code style
  • Consistent with ReturnClause implementation pattern
  • Clean, readable JavaDoc comments
  • Minimal and surgical fix: Changed 2 methods to call containsAggregation() instead of isAggregation()

Technical Correctness - EXCELLENT

The fix works because Expression.containsAggregation() is a recursive method that traverses the entire expression tree, matching the pattern used in ReturnClause. Query planner now correctly uses AggregationStep instead of WithStep.

Test Coverage - COMPREHENSIVE

The testHeadCollectInWithClause() test covers:

  1. head(collect(ID(doc))) - Main bug scenario
  2. head(collect(doc.name)) - Property access
  3. Multiple documents - Verifies head() gets first element
  4. Edge case: No matching documents
  5. head(collect(doc)) - Full node objects

Performance & Security - NO CONCERNS

  • containsAggregation() only called during query planning
  • No security issues identified
  • Actually improves performance by selecting correct execution step

Minor Observations

  1. Documentation file 3307-cypher-head-collect-null-results.md should likely be removed before merge as it is investigation documentation
  2. Test file has some formatting changes (text blocks) unrelated to the bug fix, but benign

Recommendation: APPROVE with minor suggestion

The fix is technically sound, well-tested, and follows all project conventions. Only suggestion is to remove the markdown documentation file before merge.

Great work on this fix! The root cause analysis was thorough and the solution is elegant.

@codecov
Copy link

codecov bot commented Feb 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.08%. Comparing base (1c28918) to head (82dd5af).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3309      +/-   ##
==========================================
- Coverage   56.16%   56.08%   -0.09%     
==========================================
  Files        1368     1368              
  Lines      100598   100598              
  Branches    20483    20483              
==========================================
- Hits        56502    56418      -84     
- Misses      34996    35072      +76     
- Partials     9100     9108       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

robfrank added a commit that referenced this pull request Feb 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cypher : Query now fails both on old and new engine

2 participants