ES|QL: Optimizations for when we remove implicit LIMIT for FORK

Tracked in https://github.com/elastic/elasticsearch/issues/121652

We currently add an implicit LIMIT to each FORK branch, which can be surprising.

Consider the following query, which returns a COUNT of at most 2000 because of the implicit limit on each branch:

```
FROM my-index
| FORK (WHERE error_code == 200) (WHERE error_code == 400)
| STATS COUNT(*)

// this is not equivalent to:
FROM my-index | WHERE error_code == 200 OR error_code == 400 | STATS COUNT(*)

```
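
To make the surprise concrete, here is a small Python simulation of the query above. It is a toy model, not ES|QL engine code: `fork_count`, `IMPLICIT_LIMIT`, and the synthetic rows are hypothetical, and the per-branch limit of 1000 is an assumption inferred from the 2000 cap mentioned above.

```python
# Toy model of FORK with an implicit per-branch LIMIT (not Elasticsearch code).
IMPLICIT_LIMIT = 1000  # assumed per-branch default, inferred from the 2000 cap


def fork_count(rows, predicates, branch_limit=None):
    """Run each predicate as a FORK branch, then COUNT(*) over all branch outputs."""
    total = 0
    for pred in predicates:
        branch = [r for r in rows if pred(r)]
        if branch_limit is not None:
            branch = branch[:branch_limit]  # the implicit LIMIT truncates the branch
        total += len(branch)
    return total


# Synthetic index: 5000 rows with 200, 3000 rows with 400, 10 rows with 500.
rows = [{"error_code": 200}] * 5000 + [{"error_code": 400}] * 3000 + [{"error_code": 500}] * 10
branches = [lambda r: r["error_code"] == 200, lambda r: r["error_code"] == 400]

with_limit = fork_count(rows, branches, IMPLICIT_LIMIT)
without_limit = fork_count(rows, branches)
plain_where = sum(1 for r in rows if r["error_code"] in (200, 400))

print(with_limit, without_limit, plain_where)  # prints: 2000 8000 8000
```

The two branches here are disjoint, so without the implicit limit the FORK count matches the plain `WHERE ... OR ...` count. With overlapping branches a row would be counted once per matching branch, so the two queries would differ for a separate reason.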

We could change this behaviour and stop adding an implicit LIMIT to FORK branches.

As a prerequisite, we need to be able to push every pipeline breaker to the FORK branches:
- [ ] STATS: we need to push an intermediary STATS to the FORK branches
  `FROM my-index | FORK (...) (...) | STATS ...` becomes
  `FROM my-index | FORK (... | STATS ...) (... | STATS ...) | STATS ...`
- [x] LIMIT
  `FROM my-index | FORK (...) (...) | LIMIT 100` becomes
  `FROM my-index | FORK (... | LIMIT 100) (... | LIMIT 100) | LIMIT 100`
  - https://github.com/elastic/elasticsearch/pull/139443
- [x] SORT + LIMIT
  `FROM my-index | FORK (...) (...) | SORT ... | LIMIT 100` becomes
  `FROM my-index | FORK (... | SORT ... | LIMIT 100) (... | SORT ... | LIMIT 100) | SORT ... | LIMIT 100`
  - https://github.com/elastic/elasticsearch/pull/139605
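
The three rewrites in the list can be sanity-checked with a toy Python model. This is only a sketch of why the pushed-down form gives the same answer: `fork`, `merge`, and the integer rows are hypothetical stand-ins, and nothing here reflects the actual optimizer rules in the linked PRs.

```python
# Toy model of pushing pipeline breakers into FORK branches (not Elasticsearch code).

def fork(rows, predicates):
    """Return one list of rows per FORK branch."""
    return [[r for r in rows if pred(r)] for pred in predicates]


def merge(branches):
    """Concatenate branch outputs, as the command after FORK would see them."""
    return [r for branch in branches for r in branch]


rows = list(range(100))
branches = fork(rows, [lambda r: r % 2 == 0, lambda r: r % 3 == 0])
n = 5

# STATS COUNT(*): each branch emits a partial count, the final STATS sums them.
count_direct = len(merge(branches))
count_pushed = sum(len(branch) for branch in branches)

# LIMIT n: each branch keeps at most n rows, then the outer LIMIT trims again.
limit_direct = merge(branches)[:n]
limit_pushed = merge([branch[:n] for branch in branches])[:n]

# SORT + LIMIT n (top-N): each branch keeps its own top n, the outer top-N merges them.
topn_direct = sorted(merge(branches))[:n]
topn_pushed = sorted(merge([sorted(branch)[:n] for branch in branches]))[:n]

assert count_direct == count_pushed
assert len(limit_direct) == len(limit_pushed) == n
assert topn_direct == topn_pushed
```

Note that in every case the outer command stays in place: the branch-level copy only reduces how many rows leave each branch, and the outer command still produces the final result.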


There will be cases where we will need to push down more than just pipeline breakers:
```
FROM my-index
| FORK (WHERE error_code == 200) (WHERE error_code == 400)
| WHERE url_agent LIKE "Safari"
| STATS COUNT(*)
```
In this case we need to transform the query into something like the following, where the STATS commands inside the FORK branches are intermediary STATS (the same way STATS is split into an intermediary STATS on the data nodes and a final STATS on the coordinator):
```
FROM my-index
| FORK (WHERE error_code == 200 | WHERE url_agent LIKE "Safari" | STATS COUNT(*))
       (WHERE error_code == 400 | WHERE url_agent LIKE "Safari" | STATS COUNT(*))
| STATS COUNT(*)
```
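
The equivalence behind this rewrite can be illustrated with a short Python sketch. It is a toy model under stated assumptions: the rows are synthetic, `is_safari` models `LIKE "Safari"` as plain equality because the pattern has no wildcards, and the final STATS is modelled as summing the intermediary counts rather than counting rows.

```python
# Toy model of pushing WHERE + STATS into FORK branches (not Elasticsearch code).

# Synthetic index: 3 rows for every (error_code, url_agent) combination.
rows = [
    {"error_code": code, "url_agent": agent}
    for code in (200, 400, 500)
    for agent in ("Safari", "Firefox")
    for _ in range(3)
]


def is_safari(row):
    # LIKE "Safari" has no wildcards, so it is modelled as equality here.
    return row["url_agent"] == "Safari"


branch_predicates = [
    lambda r: r["error_code"] == 200,
    lambda r: r["error_code"] == 400,
]

# Original plan: FORK, concatenate the branches, then WHERE, then STATS COUNT(*).
forked = [r for pred in branch_predicates for r in rows if pred(r)]
original = sum(1 for r in forked if is_safari(r))

# Rewritten plan: WHERE and an intermediary COUNT inside each branch,
# then a final STATS that sums the partial counts.
partials = [
    sum(1 for r in rows if pred(r) and is_safari(r))
    for pred in branch_predicates
]
rewritten = sum(partials)

assert original == rewritten
```

The filter has to move into the branches together with the STATS: pushing only the STATS would count rows before the `url_agent` filter is applied.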

After we ensure that each pipeline breaker is pushed down to the FORK branches, we should be able to *just* remove the implicit LIMIT we add in the Analyzer.





ES|QL: Optimizations for when we remove implicit LIMIT for FORK #136820
