ESQL: Improve field reference tracking in `FORK` command by kanoshiou · Pull Request #137678 · elastic/elasticsearch

kanoshiou · 2025-11-06T13:23:22Z

This change fixes the field resolution inefficiency in FORK commands where queries like

FROM employees | fork (eval x = 1 | keep x) (eval y = 2 | keep y) (eval z = 3 | keep z)

were unnecessarily requesting all fields (*) from field_caps instead of only the minimal required fields.

The fix introduces branch-specific KEEP reference tracking through currentBranchKeepRefs and replaces the overly conservative empty reference check with nuanced logic that properly distinguishes between branches that genuinely need all fields (like filter-only branches) versus those with explicit field constraints, resulting in more efficient field resolution and reduced overhead during the logical plan optimization step.

Closes #137283

- Enhance field reference tracking for FORK command branches - Add branch-specific keep references tracking with `currentBranchKeepRefs` - Refine logic for determining when to project all fields in FORK branches - Update field collection strategy to handle more complex query scenarios - Modify test cases to validate new field reference tracking behavior The changes improve the field reference collection mechanism in ESQL's FORK command, providing more precise field selection and projection logic across different branch scenarios.

kanoshiou · 2025-11-06T13:29:28Z

The following tests in FieldNameUtilsTests are failing. I believe the test expectations are incorrect — they should be requesting all fields in the field-caps call.

testFuseWithStats
testForkBeforeStats
testForkBranchWithKeep
testForkBeforeStatsWithWhere
testForkFieldsWithStatsInOneBranch
testForkFieldsWithEnrichAndLookupJoins

elasticsearchmachine · 2025-11-06T13:51:54Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2025-12-15T13:13:53Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

fang-xing-esql

Thanks for the PR, @kanoshiou! Looking at the changes in the existing tests, I noticed some FORK tests changed to ALL_FIELDS. Is there a specific reason why they need to be ALL_FIELDS? The old(existing) behavior of FORK looks fine to me, but let me know if I'm missing something.

The original issue was created for subqueries, so I didn't expect the behavior changes to FORK unless there is a good reason.

Also a heaps up to @ioanatia, to confirm whether the changes related to FORK are fine.

fang-xing-esql · 2025-12-19T13:47:12Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/session/FieldNameUtilsTests.java

                   (STATS x = count(*), y=min(z))
            | WHERE x > y
-            """, Set.of("_index", "x", "y", "a", "c", "z", "y.*", "x.*", "z.*", "a.*", "c.*"));
+            """, ALL_FIELDS);


A few of the FORK related tests change to ALL_FIELDS, I wonder why they should be ALL_FIELDS?

kanoshiou · 2025-12-20T07:12:33Z

Thanks for the review, @fang-xing-esql!

I believe this is a bug in fork. In the following query:

FROM employees 
| FORK ( WHERE first_name == "Georgi" | KEEP first_name ) 
       ( WHERE first_name != "Georgi" )

it should return all fields, but currently only returns first_name and _fork.

I think this could be a good opportunity to fix the fork bug together with the Subquery issue discussed here.

ioanatia

thank you for looking into this.
while I agree this fixes a bug we currently have, @fang-xing-esql correctly noted that we might be returning more fields than we need
I left a suggestion for a change.
In the long term, I think FieldNameUtils would benefit from a refactor, as I myself find it hard to follow up what is going on.

ioanatia · 2026-01-09T13:20:20Z

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/session/FieldNameUtilsTests.java

            ( EVAL x = "abc" | EVAL y = "aaa" )
            | STATS a = count(*) WHERE _fork == "fork1",
-            b = max(_fork)""", Set.of("_index", "first_name", "emp_no", "last_name", "last_name.*", "first_name.*", "emp_no.*"));
+            b = max(_fork)""", ALL_FIELDS);


because we have a STATS after FORK, we don't need to return all fields.

ioanatia · 2026-01-09T13:30:25Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/FieldNameUtils.java

+                    // - "fork (eval x = 1 | keep x) (eval y = 2 | keep y)" → specific fields only (both branches have KEEP)
+                    if (currentBranchKeepRefs.get().isEmpty()
+                        && (referencesBuilder.get().isEmpty()
+                            || false == forkBranch.anyMatch(forkPlan -> shouldCollectReferencedFields(forkPlan, inlinestatsAggs)))) {


I think what we are missing here is another check to see if there are any plans after FORK that reduce the columns to a known set - such as Project or Aggregate.
If we have from test | fork (eval x = 1 | keep x) (where true) | stats c = count(*), with the approach from this PR we will be requesting all the fields.

we could have something like:

Holder<Boolean> reduceColumnsAfterFork = new Holder<>(false); forEachDownProcessor.set((LogicalPlan p, Holder<Boolean> breakEarly) -> {// go over each plan top-down if (lastSeenFork.get() == null && shouldCollectReferencedFields(p, inlinestatsAggs)) { reduceColumnsAfterFork.set(true); } if (p instanceof Fork fork) { ...

then this check could be:

if (currentBranchKeepRefs.get().isEmpty() && (referencesBuilder.get().isEmpty() || false == forkBranch.anyMatch(forkPlan -> shouldCollectReferencedFields(forkPlan, inlinestatsAggs))) && false == reduceColumnsAfterFork.get()) {

but we would also need to make sure we set lastSeenFork earlier, not after we collected the references from the FORK branches.

Add a `reduceColumnsAfterFork` flag to track if there are plans after FORK that reduce columns to a known set (e.g., Project, Aggregate). Previously, for queries like: from test | fork (eval x = 1 | keep x) (where true) | stats c = count(*) The second branch would trigger projectAll because it has no KEEP command, causing all fields to be requested from field_caps. With this change, we now check if there's a column-reducing plan after FORK before deciding to request all fields. If such a plan exists (like `stats`), we only request the necessary fields.

kanoshiou · 2026-01-10T11:40:26Z

Thank you @ioanatia, for the review and the detailed guidance!

I've implemented the reduceColumnsAfterFork check as you described. Now queries with column-reducing plans after FORK will correctly avoid requesting all fields.

Updated and ready for another look!

ioanatia · 2026-01-12T13:02:12Z

@elasticsearchmachine test this please

ioanatia · 2026-01-12T13:36:29Z

@elasticsearchmachine test this please

kanoshiou · 2026-01-12T14:57:57Z

Hi @ioanatia,

The tests pass locally for me. It might be an issue specific to the CI runner.

kanoshiou · 2026-01-13T09:38:18Z

Thanks a lot for your efforts on this PR, @ioanatia @fang-xing-esql! 😊

)

elasticsearchmachine added v9.3.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Nov 6, 2025

kanoshiou added 2 commits November 6, 2025 21:31

Update tests

b4b76c0

Update docs/changelog/137678.yaml

bba903a

PeteGillinElastic added :Analytics/ES|QL AKA ESQL and removed needs:triage Requires assignment of a team area label labels Nov 6, 2025

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 6, 2025

Merge branch 'main' into fork-branch-keep-refs-tracking

04024c9

alex-spies added the :Search Relevance/ES|QL Search functionality in ES|QL label Dec 15, 2025

elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Dec 15, 2025

elasticsearchmachine added v9.4.0 and removed v9.3.0 labels Dec 17, 2025

alex-spies assigned fang-xing-esql Dec 19, 2025

ncordon assigned ioanatia Dec 19, 2025

ncordon requested review from fang-xing-esql and ioanatia December 19, 2025 11:56

fang-xing-esql reviewed Dec 19, 2025

View reviewed changes

ioanatia reviewed Jan 9, 2026

View reviewed changes

kanoshiou added 2 commits January 10, 2026 15:50

Merge branch 'refs/heads/main' into fork-branch-keep-refs-tracking

092ca7f

ioanatia added the >non-issue label Jan 12, 2026

[CI] Auto commit changes from spotless

c2978d2

ioanatia approved these changes Jan 12, 2026

View reviewed changes

ioanatia merged commit 9938ba7 into elastic:main Jan 13, 2026
36 checks passed

eranweiss-elastic pushed a commit to eranweiss-elastic/elasticsearch that referenced this pull request Jan 15, 2026

ESQL: Improve field reference tracking in FORK command (elastic#137678

466961c

)

spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Jan 21, 2026

ESQL: Improve field reference tracking in FORK command (elastic#137678

5f123e9

)

Conversation

kanoshiou commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kanoshiou commented Nov 6, 2025

Uh oh!

elasticsearchmachine commented Nov 6, 2025

Uh oh!

elasticsearchmachine commented Dec 15, 2025

Uh oh!

fang-xing-esql left a comment

Choose a reason for hiding this comment

Uh oh!

fang-xing-esql Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

kanoshiou commented Dec 20, 2025

Uh oh!

ioanatia left a comment

Choose a reason for hiding this comment

Uh oh!

ioanatia Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

ioanatia Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

kanoshiou commented Jan 10, 2026

Uh oh!

ioanatia commented Jan 12, 2026

Uh oh!

ioanatia commented Jan 12, 2026

Uh oh!

kanoshiou commented Jan 12, 2026

Uh oh!

Uh oh!

kanoshiou commented Jan 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

kanoshiou commented Nov 6, 2025 •

edited

Loading