Skip to content

ES|QL: Push down TopN into Fork branches#139605

Merged
ioanatia merged 16 commits intoelastic:mainfrom
ioanatia:fork_order_by_limit_pushdown
Jan 20, 2026
Merged

ES|QL: Push down TopN into Fork branches#139605
ioanatia merged 16 commits intoelastic:mainfrom
ioanatia:fork_order_by_limit_pushdown

Conversation

@ioanatia
Copy link
Copy Markdown
Member

@ioanatia ioanatia commented Dec 16, 2025

Related: #136820

Pushes OrderBy + Limit into FORK branches, when the FORK branch queries an index (EsRelation, not LocalRelation is present) and does not contain a pipeline breaker (and its output is unbounded).
This optimization will help when we remove the implicit LIMIT for FORK.

As a conceptual example:

FROM my-index
| FORK
   (WHERE x > 1000)
   (WHERE y > 1000)
| SORT z
| LIMIT 10              

becomes

FROM my-index
| FORK
          (WHERE x > 1000 | SORT z | LIMIT 10)
          (WHERE y > 1000 | SORT z | LIMIT 10)
| SORT z
| LIMIT 10      

Because right now an implicit limit is always added, this optimization does not have an effect.
In order to properly test it, we introduce a query pragma that tells the analyzer to not add the implicit limit.
Query pragmas are not user facing and are mostly used for internal testing.

Additional work is needed, in the following examples, TopN is not being pushed down:

FROM my-index
| FORK
  (WHERE x > 1000)
  (WHERE y > 1000)
| MV_EXPAND w
| SORT z
| LIMIT 10    


FROM my-index
| FORK
  (WHERE x > 1000)
  (WHERE y > 1000)
| EVAL a = a + 1
| SORT z
| LIMIT 10    

@ioanatia ioanatia added >non-issue Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch :Search Relevance/ES|QL Search functionality in ES|QL labels Dec 16, 2025
@ioanatia ioanatia marked this pull request as ready for review January 15, 2026 16:06
@ioanatia ioanatia requested a review from carlosdelest January 15, 2026 16:07
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Copy link
Copy Markdown
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

A couple of questions, but nothing that prevents merging 👍

return outputMap;
}

private boolean shouldPushDownIntoForkBranch(LogicalPlan plan) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we push down if we SORT on a column that the fork branch does not produce?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't push down in this case, because right now we are only pushing down when SORT + LIMIT immediately follow a FORK. We should only be in this case when we sort on attributes produced by FORK.
What I can do for good measure is too add an assertion to explicitly check that when we push down in maybePushDownLimitAndOrderByToForkBranch

Copy link
Copy Markdown
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from functional perspective, thanks @ioanatia !

There is one performance related uncertainty comes to my mind. sort can be expensive, and pushing down sort into each branch implies we could potentially execute an expensive sort in parallel across multiple branches, and increase the total cost. On the other hand, limit is pushed down as well, each branch may return fewer rows potentially, which could help performance potentially, and offset the extra sorting cost. We will likely need performance benchmarks to validate the trade offs. This is behind a pragma, I think it is good for now.

@ioanatia
Copy link
Copy Markdown
Member Author

On the other hand, limit is pushed down as well, each branch may return fewer rows potentially, which could help performance potentially, and offset the extra sorting cost. We will likely need performance benchmarks to validate the trade offs. This is behind a pragma, I think it is good for now.

We only push down if there's no pipeline breaker in the FORK branch - we will need some performance benchmarks for sure.
There's some follow up improvements we could do here:

  • if we push down to all FORK branches, we know the pages outputted from FORK should always be sorted - so we could use a similar optimization to ES|QL: Sort faster (Optimize TopNOperator in final plans) #131221
  • when we push down to a FORK branch, we could skip the intermediary TopN that gets executed on the coordinator as part of the FORK branch, and just output the pages that are coming from the data nodes from the FORK branch. The main TopN that's present after FORK should already take in sorting those.

again - we would need some benchmarks to see the real benefits

@ioanatia ioanatia requested a review from carlosdelest January 20, 2026 15:00
@ioanatia ioanatia merged commit e91962d into elastic:main Jan 20, 2026
35 checks passed
@ioanatia ioanatia deleted the fork_order_by_limit_pushdown branch January 20, 2026 15:37
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Jan 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :Search Relevance/ES|QL Search functionality in ES|QL Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants