Skip to content

ESQL: Improve field reference tracking in FORK command#137678

Merged
ioanatia merged 7 commits intoelastic:mainfrom
kanoshiou:fork-branch-keep-refs-tracking
Jan 13, 2026
Merged

ESQL: Improve field reference tracking in FORK command#137678
ioanatia merged 7 commits intoelastic:mainfrom
kanoshiou:fork-branch-keep-refs-tracking

Conversation

@kanoshiou
Copy link
Copy Markdown
Contributor

@kanoshiou kanoshiou commented Nov 6, 2025

This change fixes the field resolution inefficiency in FORK commands where queries like

FROM employees | fork (eval x = 1 | keep x) (eval y = 2 | keep y) (eval z = 3 | keep z) 

were unnecessarily requesting all fields (*) from field_caps instead of only the minimal required fields.

The fix introduces branch-specific KEEP reference tracking through currentBranchKeepRefs and replaces the overly conservative empty reference check with nuanced logic that properly distinguishes between branches that genuinely need all fields (like filter-only branches) versus those with explicit field constraints, resulting in more efficient field resolution and reduced overhead during the logical plan optimization step.

Closes #137283

- Enhance field reference tracking for FORK command branches
- Add branch-specific keep references tracking with `currentBranchKeepRefs`
- Refine logic for determining when to project all fields in FORK branches
- Update field collection strategy to handle more complex query scenarios
- Modify test cases to validate new field reference tracking behavior
The changes improve the field reference collection mechanism in ESQL's FORK command, providing more precise field selection and projection logic across different branch scenarios.
@elasticsearchmachine elasticsearchmachine added v9.3.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Nov 6, 2025
@kanoshiou
Copy link
Copy Markdown
Contributor Author

The following tests in FieldNameUtilsTests are failing. I believe the test expectations are incorrect — they should be requesting all fields in the field-caps call.

  • testFuseWithStats
  • testForkBeforeStats
  • testForkBranchWithKeep
  • testForkBeforeStatsWithWhere
  • testForkFieldsWithStatsInOneBranch
  • testForkFieldsWithEnrichAndLookupJoins

@PeteGillinElastic PeteGillinElastic added :Analytics/ES|QL AKA ESQL and removed needs:triage Requires assignment of a team area label labels Nov 6, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 6, 2025
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@alex-spies alex-spies added the :Search Relevance/ES|QL Search functionality in ES|QL label Dec 15, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Dec 15, 2025
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Copy link
Copy Markdown
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @kanoshiou! Looking at the changes in the existing tests, I noticed some FORK tests changed to ALL_FIELDS. Is there a specific reason why they need to be ALL_FIELDS? The old(existing) behavior of FORK looks fine to me, but let me know if I'm missing something.

The original issue was created for subqueries, so I didn't expect the behavior changes to FORK unless there is a good reason.

Also a heaps up to @ioanatia, to confirm whether the changes related to FORK are fine.

(STATS x = count(*), y=min(z))
| WHERE x > y
""", Set.of("_index", "x", "y", "a", "c", "z", "y.*", "x.*", "z.*", "a.*", "c.*"));
""", ALL_FIELDS);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few of the FORK related tests change to ALL_FIELDS, I wonder why they should be ALL_FIELDS?

@kanoshiou
Copy link
Copy Markdown
Contributor Author

Thanks for the review, @fang-xing-esql!

I believe this is a bug in fork. In the following query:

FROM employees 
| FORK ( WHERE first_name == "Georgi" | KEEP first_name ) 
       ( WHERE first_name != "Georgi" )

it should return all fields, but currently only returns first_name and _fork.

I think this could be a good opportunity to fix the fork bug together with the Subquery issue discussed here.

Copy link
Copy Markdown
Member

@ioanatia ioanatia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for looking into this.
while I agree this fixes a bug we currently have, @fang-xing-esql correctly noted that we might be returning more fields than we need
I left a suggestion for a change.
In the long term, I think FieldNameUtils would benefit from a refactor, as I myself find it hard to follow up what is going on.

( EVAL x = "abc" | EVAL y = "aaa" )
| STATS a = count(*) WHERE _fork == "fork1",
b = max(_fork)""", Set.of("_index", "first_name", "emp_no", "last_name", "last_name.*", "first_name.*", "emp_no.*"));
b = max(_fork)""", ALL_FIELDS);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we have a STATS after FORK, we don't need to return all fields.

// - "fork (eval x = 1 | keep x) (eval y = 2 | keep y)" → specific fields only (both branches have KEEP)
if (currentBranchKeepRefs.get().isEmpty()
&& (referencesBuilder.get().isEmpty()
|| false == forkBranch.anyMatch(forkPlan -> shouldCollectReferencedFields(forkPlan, inlinestatsAggs)))) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what we are missing here is another check to see if there are any plans after FORK that reduce the columns to a known set - such as Project or Aggregate.
If we have from test | fork (eval x = 1 | keep x) (where true) | stats c = count(*), with the approach from this PR we will be requesting all the fields.

we could have something like:

        Holder<Boolean> reduceColumnsAfterFork = new Holder<>(false);

        forEachDownProcessor.set((LogicalPlan p, Holder<Boolean> breakEarly) -> {// go over each plan top-down
            if (lastSeenFork.get() == null && shouldCollectReferencedFields(p, inlinestatsAggs)) {
                reduceColumnsAfterFork.set(true);
            }

            if (p instanceof Fork fork) {
             ...

then this check could be:

                    if (currentBranchKeepRefs.get().isEmpty()
                        && (referencesBuilder.get().isEmpty()
                            || false == forkBranch.anyMatch(forkPlan -> shouldCollectReferencedFields(forkPlan, inlinestatsAggs)))
                        && false == reduceColumnsAfterFork.get()) {

but we would also need to make sure we set lastSeenFork earlier, not after we collected the references from the FORK branches.

Add a `reduceColumnsAfterFork` flag to track if there are plans after
FORK that reduce columns to a known set (e.g., Project, Aggregate).

Previously, for queries like:
  from test | fork (eval x = 1 | keep x) (where true) | stats c = count(*)

The second branch would trigger projectAll because it
has no KEEP command, causing all fields to be requested from field_caps.

With this change, we now check if there's a column-reducing plan after
FORK before deciding to request all fields. If such a plan exists
(like `stats`), we only request the necessary fields.
@kanoshiou
Copy link
Copy Markdown
Contributor Author

Thank you @ioanatia, for the review and the detailed guidance!

I've implemented the reduceColumnsAfterFork check as you described. Now queries with column-reducing plans after FORK will correctly avoid requesting all fields.

Updated and ready for another look!

@ioanatia
Copy link
Copy Markdown
Member

@elasticsearchmachine test this please

@ioanatia
Copy link
Copy Markdown
Member

@elasticsearchmachine test this please

@kanoshiou
Copy link
Copy Markdown
Contributor Author

Hi @ioanatia,

The tests pass locally for me. It might be an issue specific to the CI runner.

@ioanatia ioanatia merged commit 9938ba7 into elastic:main Jan 13, 2026
36 checks passed
@kanoshiou
Copy link
Copy Markdown
Contributor Author

Thanks a lot for your efforts on this PR, @ioanatia @fang-xing-esql! 😊

eranweiss-elastic pushed a commit to eranweiss-elastic/elasticsearch that referenced this pull request Jan 15, 2026
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Jan 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue :Search Relevance/ES|QL Search functionality in ES|QL Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ES|QL] Improve FieldNameUtils.resolveFieldNames to identify subquery field names for field caps call, instead of using all fields *

6 participants