ES|QL: Optimizations for when we remove implicit LIMIT for FORK #136820

@ioanatia

tracked in #121652

We currently add an implicit LIMIT to each FORK branch, which can be surprising.

Consider the following query, which returns a COUNT of at most 2000 because the implicit limit caps each of the two branches (1000 rows per branch):

FROM my-index
| FORK (WHERE error_code == 200) (WHERE error_code == 400)
| STATS COUNT(*)

// this is not equivalent to:
FROM my-index | WHERE error_code == 200 OR error_code == 400 | STATS COUNT(*)
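To make the difference concrete, here is a toy Python model of the two queries. It is only a sketch: `fork_count`, the sample rows and the `IMPLICIT_LIMIT` constant are made up for illustration, and the value 1000 is assumed because it matches the 2000 cap for two branches.

```python
# Toy model of FORK with an implicit per-branch LIMIT (not the real planner).
IMPLICIT_LIMIT = 1000  # assumption: 2 branches * 1000 rows = the 2000 cap above


def fork_count(rows, predicates, limit=IMPLICIT_LIMIT):
    """Model `FORK (WHERE p1) (WHERE p2) | STATS COUNT(*)`.

    Each branch is truncated to `limit` rows; limit=None means no implicit LIMIT.
    """
    total = 0
    for pred in predicates:
        branch = [r for r in rows if pred(r)]
        if limit is not None:
            branch = branch[:limit]
        total += len(branch)
    return total


# Hypothetical index contents: 5000 rows with 200, 3000 with 400, 10 with 500.
rows = (
    [{"error_code": 200}] * 5000
    + [{"error_code": 400}] * 3000
    + [{"error_code": 500}] * 10
)
preds = [
    lambda r: r["error_code"] == 200,
    lambda r: r["error_code"] == 400,
]

capped = fork_count(rows, preds)                  # FORK with implicit LIMIT
uncapped = fork_count(rows, preds, limit=None)    # FORK without implicit LIMIT
where_or = sum(1 for r in rows if r["error_code"] in (200, 400))  # WHERE ... OR ...
```

With this data `capped` is 2000 while `uncapped` and `where_or` are both 8000. The two branch conditions are disjoint here; with overlapping conditions FORK would emit a row once per matching branch, so the FORK and `WHERE ... OR ...` forms would differ for that reason too.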

We could change this behaviour and stop adding an implicit LIMIT to the FORK branches.

As a prerequisite, we need to be able to push every pipeline breaker to the FORK branches:

  • STATS - we need to push an intermediary STATS to the FORK branches
    FROM my-index | FORK (...) (...) | STATS ... becomes
    FROM my-index | FORK (... | STATS ...) (... | STATS ...) | STATS ...
  • LIMIT
    FROM my-index | FORK (...) (...) | LIMIT 100 becomes
    FROM my-index | FORK (... | LIMIT 100) (... | LIMIT 100) | LIMIT 100
  • SORT + LIMIT
    FROM my-index | FORK (...) (...) | SORT ... | LIMIT 100 becomes
    FROM my-index | FORK (... | SORT ... | LIMIT 100) (... | SORT ... | LIMIT 100) | SORT ... | LIMIT 100
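A rough Python sketch of why these three rewrites preserve the result. The branches are modelled as plain lists, all function names are hypothetical, and AVG is used for the STATS case only to show that an intermediate STATS may need to carry partial state (sum and count) rather than the final value.

```python
import heapq
from itertools import chain, islice


def limit_pushdown(branches, n):
    """FORK (... | LIMIT n) (... | LIMIT n) | LIMIT n"""
    per_branch = [list(islice(b, n)) for b in branches]      # branch-level LIMIT
    return list(islice(chain.from_iterable(per_branch), n))  # final LIMIT


def topn_pushdown(branches, n, key):
    """FORK (... | SORT | LIMIT n) (... | SORT | LIMIT n) | SORT | LIMIT n"""
    per_branch = [heapq.nsmallest(n, b, key=key) for b in branches]  # branch top-n
    return heapq.nsmallest(n, chain.from_iterable(per_branch), key=key)  # final top-n


def avg_pushdown(branches, value):
    """Intermediate STATS emits (sum, count) per branch; final STATS merges them."""
    partials = [(sum(value(r) for r in b), len(b)) for b in branches]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Two hypothetical FORK branches and the un-pushed-down stream for comparison.
branches = [[5, 1, 9, 3], [4, 8, 2]]
everything = list(chain.from_iterable(branches))

limited = limit_pushdown(branches, 3)
top3 = topn_pushdown(branches, 3, key=lambda r: r)
average = avg_pushdown(branches, lambda r: r)
```

In each case the pushed-down plan agrees with applying the command once to `everything`: a branch never needs to emit more than n rows for a LIMIT n or top-n above it, and the partial aggregates merge into the same final value.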

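There will be cases where we need to push down more than just pipeline breakers. In the query below the WHERE sits between FORK and STATS, so it has to move into the branches before the STATS can follow it: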

FROM my-index
| FORK (WHERE error_code == 200) (WHERE error_code == 400)
| WHERE url_agent LIKE "Safari"
| STATS COUNT(*)

In this case we need to transform the query into something like the following, where the STATS commands inside the FORK branches are intermediary STATS (the same way STATS is split into intermediary and final STATS between the data nodes and the coordinator):

FROM my-index
| FORK (WHERE error_code == 200 | WHERE url_agent LIKE "Safari" | STATS COUNT(*))
       (WHERE error_code == 400 | WHERE url_agent LIKE "Safari" | STATS COUNT(*))
| STATS COUNT(*)
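A small Python sketch of this rewrite, using made-up rows and hypothetical helper names. It also shows why the outer STATS has to act as a final STATS that merges the per-branch counts: literally counting the rows coming out of the branches would just give the number of branches.

```python
# Hypothetical sample data for the query above.
rows = (
    [{"error_code": 200, "url_agent": "Safari"}] * 3
    + [{"error_code": 200, "url_agent": "Chrome"}] * 2
    + [{"error_code": 400, "url_agent": "Safari"}] * 4
    + [{"error_code": 500, "url_agent": "Safari"}] * 1
)


def is_200(r):
    return r["error_code"] == 200


def is_400(r):
    return r["error_code"] == 400


def is_safari(r):
    # LIKE "Safari" has no wildcard, so it is modelled as an exact match.
    return r["url_agent"] == "Safari"


branch_filters = [is_200, is_400]

# Original plan: FORK, then WHERE, then STATS COUNT(*) over all branch output.
forked = [r for f in branch_filters for r in rows if f(r)]
reference = sum(1 for r in forked if is_safari(r))

# Rewritten plan: WHERE and an intermediate COUNT inside each branch ...
partial_counts = [
    sum(1 for r in rows if f(r) and is_safari(r)) for f in branch_filters
]
# ... and a final STATS that merges the partial counts by summing them.
final = sum(partial_counts)

# Counting the intermediate rows instead would only count the branches.
naive = len(partial_counts)
```

Here `reference` and `final` are both 7 (3 from the first branch, 4 from the second), while `naive` is 2.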

After we ensure that each pipeline breaker is pushed down to the FORK branches, we should be able to just remove the implicit LIMIT we add in the Analyzer.
