tracked in #121652
We currently add an implicit LIMIT to each FORK branch, which can be surprising.
Consider the following query, which will return a COUNT of at most 2000 because of the implicit limit:
FROM my-index
| FORK (WHERE error_code == 200) (WHERE error_code == 400)
| STATS COUNT(*)
// this is not equivalent to:
FROM my-index | WHERE error_code == 200 OR error_code == 400 | STATS COUNT(*)
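To make the surprise concrete, here is a toy Python simulation (not ES|QL) of why the two queries above can disagree. The per-branch limit of 1000 is an assumption for illustration, inferred from the "up to 2000" count with two branches:

```python
# Toy simulation of FORK's implicit per-branch LIMIT (assumed to be 1000 here,
# consistent with a count capped at 2000 for two branches). Not real ES|QL.
IMPLICIT_LIMIT = 1000

rows = [{"error_code": 200}] * 1500 + [{"error_code": 400}] * 1500

# FORK: each branch is truncated to the implicit limit before the union.
branch_200 = [r for r in rows if r["error_code"] == 200][:IMPLICIT_LIMIT]
branch_400 = [r for r in rows if r["error_code"] == 400][:IMPLICIT_LIMIT]
fork_count = len(branch_200) + len(branch_400)

# Single WHERE with OR: no per-branch truncation, so the full count survives.
or_count = len([r for r in rows if r["error_code"] in (200, 400)])

print(fork_count)  # 2000 -- capped by the implicit limit
print(or_count)    # 3000 -- the true count
```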
If we want, we can change this behaviour so that FORK does not add an implicit limit.
As a prerequisite, we need to be able to push every pipeline breaker to the FORK branches:

FROM my-index | FORK (...) (...) | STATS ...
becomes
FROM my-index | FORK (... | STATS ...) (... | STATS ...) | STATS ...

FROM my-index | FORK (...) (...) | LIMIT 100
becomes
FROM my-index | FORK (... | LIMIT 100) (... | LIMIT 100) | LIMIT 100

FROM my-index | FORK (...) (...) | SORT ... | LIMIT 100
becomes
FROM my-index | FORK (... | SORT ... | LIMIT 100) (... | SORT ... | LIMIT 100) | SORT ... | LIMIT 100

There will be cases where we will need to push down more than just pipeline breakers:
FROM my-index
| FORK (WHERE error_code == 200) (WHERE error_code == 400)
| WHERE url_agent LIKE "Safari"
| STATS COUNT(*)
In this case we will need to transform the query into something similar to the following, where the STATS commands in the FORK branches are intermediary STATS (the same way STATS is broken into an intermediary and a final STATS between the data nodes and the coordinator):
FROM my-index
| FORK (WHERE error_code == 200 | WHERE url_agent LIKE "Safari" | STATS COUNT(*))
       (WHERE error_code == 400 | WHERE url_agent LIKE "Safari" | STATS COUNT(*))
| STATS COUNT(*)
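As a sketch of that intermediary/final split (plain Python, not the actual planner; the function and field names are made up for illustration), COUNT(*) can be computed per branch and then summed by the final STATS:

```python
# Sketch of splitting COUNT(*) into intermediary per-branch counts plus a
# final count that sums the partials. Names are illustrative, not real APIs.
def intermediary_count(branch_rows):
    # Each FORK branch emits one partial-count row instead of its raw rows.
    return [{"partial_count": len(branch_rows)}]

def final_count(partial_rows):
    # The STATS after FORK combines the partials into the final COUNT(*).
    return sum(r["partial_count"] for r in partial_rows)

branch_a = intermediary_count([{"error_code": 200}] * 3)
branch_b = intermediary_count([{"error_code": 400}] * 2)
print(final_count(branch_a + branch_b))  # 5, same as counting all rows at once
```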
After we ensure that each pipeline breaker is pushed down to the FORK branches, we should be able to just remove the implicit LIMIT we add in the Analyzer.
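The pushdown itself could be implemented as an optimizer rule over the logical plan. Below is a minimal Python sketch (the AST classes and the set of copyable breakers are assumptions, not the real Analyzer/Optimizer code) that copies trailing LIMIT/SORT commands into every branch while keeping them after the FORK for the final merge:

```python
from dataclasses import dataclass

@dataclass
class Fork:
    branches: list  # each branch is a list of command strings

@dataclass
class Pipeline:
    commands: list  # mix of Fork nodes and command strings

# Breakers that can be copied verbatim into each branch. STATS is excluded:
# as described above, it must be split into intermediary and final STATS.
COPYABLE_BREAKERS = ("LIMIT", "SORT")

def push_down_breakers(pipe: Pipeline) -> Pipeline:
    new_commands = []
    for i, cmd in enumerate(pipe.commands):
        if isinstance(cmd, Fork):
            # Copy the trailing breakers into every branch; the originals
            # stay after FORK to merge the branch results.
            trailing = [c for c in pipe.commands[i + 1:]
                        if isinstance(c, str)
                        and c.split()[0] in COPYABLE_BREAKERS]
            cmd = Fork([branch + trailing for branch in cmd.branches])
        new_commands.append(cmd)
    return Pipeline(new_commands)

query = Pipeline([
    Fork([["WHERE error_code == 200"], ["WHERE error_code == 400"]]),
    "SORT @timestamp", "LIMIT 100",
])
rewritten = push_down_breakers(query)
# Each branch now ends with "SORT @timestamp", "LIMIT 100".
```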