Skip to content

fix(#4468): expand primitive-array IN params through indexed path#4469

Merged
robfrank merged 3 commits into
mainfrom
fix/4468-sql-in-param-index-expansion
Jun 3, 2026
Merged

fix(#4468): expand primitive-array IN params through indexed path#4469
robfrank merged 3 commits into
mainfrom
fix/4468-sql-in-param-index-expansion

Conversation

@robfrank

@robfrank robfrank commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Summary

Closes #4468

WHERE field IN :param returned no rows when field had an index and the parameter value was a collection. The non-indexed path (WHERE (field + 0) IN :param) and the literal form (WHERE field IN [1, 2, 3]) worked correctly.

Root cause: PostCommandHandler uses json.toMap(true) to deserialize the HTTP request body. The optimized flag converts JSON numeric arrays ([1, 2, 3]) to long[]/double[] primitive arrays to avoid boxing allocations. Primitive arrays are not Iterable<?> in Java.

FetchFromIndexStep.cartesianProduct() expanded collection parameters into one index lookup per element using:

if (value instanceof Iterable<?> iterable && !(value instanceof Identifiable)) {

A long[] parameter failed this check, so the entire array was passed as a single index key - matching nothing.

Fix: Extended the predicate in cartesianProduct() to also handle arrays, wrapping them with the existing IterableObjectArray utility:

if (!(value instanceof Identifiable) && (value instanceof Iterable<?> || (value != null && value.getClass().isArray()))) {
    final Iterable<?> iterable = value instanceof Iterable<?> iter ? iter : new IterableObjectArray<>(value);
    for (final Object elemInKey : iterable) {

Test plan

  • New regression test InParamIndexExpansionTest (9 test methods) covers:
    • Named List<Long> parameter via IN :codes with an indexed field
    • Named primitive long[] parameter (simulates HTTP JSON deserialization via toMap(true))
    • Named Object[] parameter
    • Positional List<Long> parameter via IN ?
    • Execution plan verification that the index is used
    • Non-indexed control path to confirm no regression there
    • Partial match (subset of inserted values)
    • Single-element list
  • 314 index and SQL query tests pass with no failures

FetchFromIndexStep.cartesianProduct() only expanded values that
implement Iterable<?>. JSON deserialization via toMap(true) in
PostCommandHandler converts numeric JSON arrays to long[]/double[]
(primitive arrays), which are not Iterable. The whole array was
passed as a single index key, returning no rows.

Extended the predicate to also match arrays and wrap them with
IterableObjectArray so every element is looked up individually,
matching the behaviour of the non-indexed IN evaluator.
@codacy-production

codacy-production Bot commented Jun 3, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 0 complexity

Metric Results
Complexity 0

View in Codacy

🟢 Coverage 100.00% diff coverage · -7.36% coverage variation

Metric Results
Coverage variation -7.36% coverage variation
Diff coverage 100.00% diff coverage

View coverage diff in Codacy

Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (716cfa5) 128545 94533 73.54%
Head commit (ac59992) 160233 (+31688) 106038 (+11505) 66.18% (-7.36%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#4469) 2 2 100.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR fixes a real bug where WHERE field IN :param returned no rows when the field had an index and the parameter was a primitive array (e.g. long[]) produced by json.toMap(true) HTTP deserialization. The root cause analysis is accurate and well-documented.


Fix (FetchFromIndexStep.java)

The one-line change to cartesianProduct() is correct and minimal. The null-guard before value.getClass().isArray() is properly placed.

Suggestion - use the existing MultiValue utility instead of IterableObjectArray:

MultiValue.getMultiValueIterable() already exists in this class's imports (used at line 261) and already handles every array type (int[], long[], double[], Object[]) plus Iterable. The fix could be:

if (!(value instanceof Identifiable) && MultiValue.isMultiValue(value)) {
    final List<PCollection> result = new ArrayList<>();
    for (final Object elemInKey : MultiValue.getMultiValueIterable(value)) {

This is more consistent with the surrounding code (which already uses MultiValue.isMultiValue() and MultiValue.getMultiValueIterable() at lines 259-261), avoids introducing a second code path for the same concept, and handles all primitive array types without requiring the IterableObjectArray wrapper. The current approach works, but consolidating around the already-imported MultiValue would reduce duplication.


Tests (InParamIndexExpansionTest.java)

Good coverage overall - 9 test methods hit the indexed path, non-indexed path, partial matches, and single-element lists. A few observations:

  • Missing int[] and double[] cases - toMap(true) also produces int[] and double[] from integer/float JSON arrays. Adding one test each for those variants would complete the regression coverage.
  • Comment style - Per project conventions (CLAUDE.md), avoid comments that reference the issue number or describe what the code does (e.g. // Issue #4468: named List<Long> param — this is the primary failing form.). The test method name already communicates intent. The class-level Javadoc with the root cause explanation is fine.
  • The use of assertThat(...).containsExactly(...) and TestHelper base class matches project conventions.

Documentation file (docs/4468-sql-in-param-index-expansion.md)

This is not a project convention - there is no docs/ directory with per-issue markdown files elsewhere in the repo. The PR description already contains all of this information. This file will become stale as the code evolves and references an issue number that means nothing without GitHub context. Consider removing it; the commit message and PR body are the right place for this narrative.


Summary

Area Status
Root cause analysis Correct
Core fix Works, but MultiValue reuse preferred for consistency
Test coverage Good - consider adding int[]/double[] cases
Test style Minor: remove issue-reference comments from method bodies
Docs file Recommend removing - not a project convention

The fix is sound. The main suggestions are using the pre-existing MultiValue utility for consistency, and dropping the docs file.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses issue #4468, where SQL queries using IN :param with a collection parameter returned no rows when an index was used. This occurred because primitive arrays resulting from JSON deserialization are not instances of Iterable<?> in Java, causing the index lookup to fail. The fix extends FetchFromIndexStep.cartesianProduct() to also match arrays and wrap them using IterableObjectArray. A comprehensive set of regression tests has been added in InParamIndexExpansionTest. The review feedback suggests simplifying the conditional check in FetchFromIndexStep.java by hoisting the null check for value to the beginning of the expression.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

final Expression nextElementInKey = key.getExpressions().getFirst();
final Object value = nextElementInKey.execute(new ResultInternal(context.getDatabase()), context);
if (value instanceof Iterable<?> iterable && !(value instanceof Identifiable)) {
if (!(value instanceof Identifiable) && (value instanceof Iterable<?> || (value != null && value.getClass().isArray()))) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The null check for value can be hoisted to the beginning of the condition. This simplifies the expression by removing the nested value != null check and makes it clearer that the subsequent checks are only performed on non-null values.

Suggested change
if (!(value instanceof Identifiable) && (value instanceof Iterable<?> || (value != null && value.getClass().isArray()))) {
if (value != null && !(value instanceof Identifiable) && (value instanceof Iterable<?> || value.getClass().isArray())) {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 342fd2b. The condition now uses MultiValue.isMultiValue(value) (per the parallel review), which is internally null-safe (isMultiValue(null) returns false), so the explicit value != null guard is no longer needed at all.

- Replace IterableObjectArray with the already-imported MultiValue
  utility (isMultiValue + getMultiValueIterable) in cartesianProduct,
  unifying multi-value detection with the sibling processInCondition
  path. MultiValue.isMultiValue(null) is internally null-safe so no
  explicit null guard is needed.
- Add int[] and double[] primitive-array regression cases (double
  coerces to the INTEGER index key).
- Remove issue-reference comments and em-dashes from test method bodies
  per project conventions.
@robfrank

robfrank commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed in 342fd2b:

  • MultiValue reuse: switched cartesianProduct() to MultiValue.isMultiValue(value) + MultiValue.getMultiValueIterable(value), dropping the IterableObjectArray wrapper. This unifies multi-value detection with the sibling processInCondition() path in the same file. Verified safe against the embedded-list, list-by-item, map, and composite index suites (348 index/SQL tests green).
  • int[]/double[] coverage: added namedPrimitiveIntArrayParamInWithIndexReturnsRows and namedPrimitiveDoubleArrayParamInWithIndexReturnsRows (double coerces correctly to the INTEGER index key). The class is now 11 methods.
  • Test comment style: removed the issue-number references (and em-dashes) from method bodies; method names carry the intent.

On the docs file: keeping it. docs/<issue>-<name>.md is an established, committed convention in this repo - there are 38 such files already (e.g. docs/4337-incondition-subquery-scalar-unwrap.md, docs/4364-grpc-recordnotfoundexception.md), and it is produced by the resolve-issue workflow that opened this PR. The observation that the convention does not exist isn't accurate.

@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR fixes a genuine bug: WHERE field IN :param with a primitive array parameter (e.g. long[]) would silently return no rows when the field had an index, because FetchFromIndexStep.cartesianProduct() only checked instanceof Iterable<?> - which Java primitive arrays do not satisfy. The fix replaces that check with the existing MultiValue utility that already handles every array and collection type.


The Core Fix

Correctness: solid.

// Before
if (value instanceof Iterable<?> iterable && !(value instanceof Identifiable)) {
    for (final Object elemInKey : iterable) {

// After
if (!(value instanceof Identifiable) && MultiValue.isMultiValue(value)) {
    for (final Object elemInKey : MultiValue.getMultiValueIterable(value)) {
  • MultiValue.isMultiValue(null) returns false (line 74: return iObject != null && ...), so no separate null guard is needed.
  • MultiValue.isMultiValue already excludes Identifiable from the Iterable branch (line 62), making the explicit !(value instanceof Identifiable) guard partially redundant - but it is still correct and acts as an extra safety net if Identifiable arrays ever appear.
  • The fix makes cartesianProduct() consistent with processInCondition() at line 258, which already used the same MultiValue utilities. This symmetry is the right call.

One implicit behavior change worth noting: MultiValue.isMultiValue returns true for Map values, whereas a Map is not Iterable<?> in Java and would have fallen through to the else branch before. With the new code a Map parameter is expanded using map.values(). This is consistent with how processInCondition handles maps, but it is an unintentional behavioral extension that is not tested. It is unlikely to matter in practice (passing a Map to IN is unusual), but is worth a comment or a TODO.


Test Coverage

The regression suite is thorough:

  • Covers all four primitive array shapes produced by toMap(true): long[], int[], double[], Object[]
  • Covers the List<Long> path (was already working, acts as baseline)
  • Positional parameter (IN ?) covered
  • EXPLAIN test confirms the index is actually used
  • Control test ((code + 0) IN :codes) confirms the non-indexed path is unaffected
  • Partial-match and single-element edge cases included

One gap: no test for a Map-typed parameter, which the new code now handles differently (expanded by values). Not a blocker since this was broken-or-absent before, but worth adding if the intended behavior for Maps is defined.


Minor Points

  1. Class-level Javadoc - the multi-paragraph Javadoc on InParamIndexExpansionTest goes beyond project style (CLAUDE.md: "never write multi-paragraph docstrings"). The content is useful context, but consider trimming it to a single sentence referencing the issue.

  2. The docs/4468-*.md file - given the established convention in this repo (several similar tracking files exist), keeping it is reasonable. The "Review cycles" subsection that logs past reviewer interactions is not useful long-term documentation and could be removed.


Summary

This is a clean, minimal, well-targeted bug fix. The two-line production change is correct, null-safe, and makes cartesianProduct() consistent with the sibling processInCondition() method in the same file. Test coverage is comprehensive. The only actionable suggestion is to add a test for the newly-enabled Map-typed parameter path to document its behavior.

Approved with minor suggestions.

Add a behavioral comment clarifying that the MultiValue-based key
expansion intentionally covers every parameter shape (collections,
primitive arrays, maps), consistent with processInCondition().
@robfrank

robfrank commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed in ac59992:

  • Map behavioral extension: added a behavioral comment above the condition noting that the MultiValue expansion intentionally covers every parameter shape (collections, primitive arrays, maps), consistent with processInCondition(). I did not add a Map-typed IN test: expanding a Map by its values is an incidental consequence of reusing MultiValue, not a defined feature, and a test would lock that behavior in as a contract. The fix targets the array/collection shapes that JSON parameter deserialization actually produces.

Two suggestions I'm not applying, with reasons:

  • Class Javadoc trim: the cited rule "never write multi-paragraph docstrings" is not present in CLAUDE.md. The Javadoc documents a non-obvious root cause (primitive arrays not being Iterable), which is worth keeping.
  • Remove the "Review cycles" section: this section is an established convention - 15+ committed tracking docs carry it (e.g. docs/4274-cluster-topology-consistency.md, docs/4331-wal-version-gap-recovery.md). Keeping it for consistency.

@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown

Code Review

This is a clean, well-targeted fix. The root cause analysis is accurate, and the implementation is consistent with the existing codebase.

Fix (FetchFromIndexStep.java)

The change is correct. Switching from an instanceof Iterable<?> predicate to MultiValue.isMultiValue() + MultiValue.getMultiValueIterable() is the right call:

  • Null safety confirmed - isMultiValue(null) returns false (line 74 of MultiValue.java), so getMultiValueIterable is never reached with a null value. The guard is sound without an explicit null check.
  • Consistency - both multi-value detection sites in the same method (cartesianProduct and processInCondition) now share the same code path.
  • The behavioral comment is justified - the expansion of Map parameters via map.values() (consistent with processInCondition) is a subtle side-effect worth documenting. The decision not to lock it into a test contract is reasonable.

One minor edge case worth noting (not blocking): empty collection parameter - if codes = [] is passed, isMultiValue returns true, the for-loop body never executes, and result is an empty list returned from cartesianProduct. The outer planner should handle that as "no rows" - but there is no test covering this. Not a blocker since an empty IN list should return no results and the behavior appears consistent, but it could be worth adding.

Tests (InParamIndexExpansionTest.java)

Good regression coverage. The 11 test methods cover the essential shapes (long[], int[], double[], Object[], List), both named and positional parameters, execution plan verification, partial matches, and the non-indexed control path. AssertJ and TestHelper usage matches project conventions.

Docs file

Confirmed - there are 45 docs/<issue>-*.md files in the repo following this same pattern. My earlier suggestion to remove it was incorrect; this is an established convention.


The fix is correct, minimal, and well-tested. No blocking issues.

@codecov

codecov Bot commented Jun 3, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.48%. Comparing base (716cfa5) to head (ac59992).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4469      +/-   ##
============================================
+ Coverage     64.47%   64.48%   +0.01%     
- Complexity      463      466       +3     
============================================
  Files          1650     1650              
  Lines        128545   128545              
  Branches      27571    27571              
============================================
+ Hits          82874    82897      +23     
+ Misses        34045    34026      -19     
+ Partials      11626    11622       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@robfrank robfrank merged commit 69cc67a into main Jun 3, 2026
28 of 30 checks passed
@robfrank robfrank deleted the fix/4468-sql-in-param-index-expansion branch June 3, 2026 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SQL IN :param with a collection parameter returns no rows when an index is used

1 participant