ES|QL: Push down TopN into Fork branches#139605
Conversation
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
carlosdelest
left a comment
There was a problem hiding this comment.
Looks great!
A couple of questions, but nothing that prevents merging 👍
...g/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownLimitAndOrderByIntoForkTests.java
Show resolved
Hide resolved
| return outputMap; | ||
| } | ||
|
|
||
| private boolean shouldPushDownIntoForkBranch(LogicalPlan plan) { |
There was a problem hiding this comment.
Should we push down if we SORT on a column that the fork branch does not produce?
There was a problem hiding this comment.
we don't push down in this case, because right now we are only pushing down when SORT + LIMIT immediately follow a FORK. We should only be in this case when we sort on attributes produced by FORK.
What I can do for good measure is too add an assertion to explicitly check that when we push down in maybePushDownLimitAndOrderByToForkBranch
...g/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownLimitAndOrderByIntoForkTests.java
Show resolved
Hide resolved
fang-xing-esql
left a comment
There was a problem hiding this comment.
LGTM from functional perspective, thanks @ioanatia !
There is one performance related uncertainty comes to my mind. sort can be expensive, and pushing down sort into each branch implies we could potentially execute an expensive sort in parallel across multiple branches, and increase the total cost. On the other hand, limit is pushed down as well, each branch may return fewer rows potentially, which could help performance potentially, and offset the extra sorting cost. We will likely need performance benchmarks to validate the trade offs. This is behind a pragma, I think it is good for now.
We only push down if there's no pipeline breaker in the FORK branch - we will need some performance benchmarks for sure.
again - we would need some benchmarks to see the real benefits |
Related: #136820
Pushes OrderBy + Limit into FORK branches, when the FORK branch queries an index (EsRelation, not LocalRelation is present) and does not contain a pipeline breaker (and its output is unbounded).
This optimization will help when we remove the implicit LIMIT for FORK.
As a conceptual example:
becomes
Because right now an implicit limit is always added, this optimization does not have an effect.
In order to properly test it, we introduce a query pragma that tells the analyzer to not add the implicit limit.
Query pragmas are not user facing and are mostly used for internal testing.
Additional work is needed, in the following examples, TopN is not being pushed down: