For queries like ... | INLINESTATS total_count = COUNT(*), where the InlineJoin has no BY clause, the INLINEJOIN performed in the second phase of execution is equivalent to an EVAL total_count = <some literal number>. In this case a proper join is unnecessary, and there is no reason to force the inline join onto the coordinator: the EVAL can easily be pushed down to the data nodes, which increases parallelization.
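To illustrate the claimed equivalence, here is a minimal sketch in Python. It is a toy model, not the ES|QL planner: rows are dicts, each inner list stands in for the rows held by one data node, and all function names are hypothetical. It only shows that joining against a one-row aggregate result gives the same rows as appending a literal column on each shard.

```python
# Toy model only: each inner list stands in for the rows on one data node.
shards = [
    [{"id": 1}, {"id": 2}],
    [{"id": 3}],
]

def phase_one_count(shards):
    """Phase 1: compute the ungrouped aggregate COUNT(*) over all shards."""
    return sum(len(rows) for rows in shards)

def join_on_coordinator(shards, total_count):
    """Baseline: gather every row centrally, then join with the one-row result."""
    gathered = [row for rows in shards for row in rows]
    stats = [{"total_count": total_count}]  # right-hand side has exactly one row
    return [{**row, **s} for row in gathered for s in stats]

def eval_on_data_nodes(shards, total_count):
    """Proposed shape: each shard appends the literal locally, like an EVAL."""
    per_shard = [
        [{**row, "total_count": total_count} for row in rows] for rows in shards
    ]
    return [row for rows in per_shard for row in rows]

total = phase_one_count(shards)
# Both strategies produce identical rows; only the place of execution differs.
assert join_on_coordinator(shards, total) == eval_on_data_nodes(shards, total)
```

In the second variant the only value that has to travel to the data nodes is the single literal, which is why no real join is needed.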
The same holds for a more general INLINESTATS ... BY group_field1, group_field2 when it turns out that there are only very few distinct combinations of group_field1, group_field2; in that case it would be better to perform the InlineJoin on the data nodes.
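The grouped case can be sketched the same way, again as a toy Python model rather than the actual implementation. The field names, the `cnt` column, and the `MAX_BROADCAST_GROUPS` threshold are assumptions made up for this example; the issue does not say how "very few combinations" would be decided.

```python
from collections import Counter

# Toy model only: each inner list stands in for the rows on one data node.
shards = [
    [{"host": "a", "env": "prod"}, {"host": "b", "env": "prod"}],
    [{"host": "a", "env": "prod"}, {"host": "a", "env": "dev"}],
]

GROUP_KEYS = ("host", "env")   # stands in for group_field1, group_field2
MAX_BROADCAST_GROUPS = 1000    # hypothetical cut-off, not an existing setting

def group_key(row):
    """Build the tuple of grouping values for one row."""
    return tuple(row[k] for k in GROUP_KEYS)

def phase_one_grouped_count(shards):
    """Phase 1: COUNT(*) BY the grouping keys, yielding a small lookup table."""
    return dict(Counter(group_key(row) for rows in shards for row in rows))

def join_on_shard(rows, lookup):
    """Phase 2 on a data node: enrich local rows from the broadcast table."""
    return [{**row, "cnt": lookup[group_key(row)]} for row in rows]

lookup = phase_one_grouped_count(shards)
# Broadcasting is only attractive while the table stays small.
can_broadcast = len(lookup) <= MAX_BROADCAST_GROUPS
result = [row for rows in shards for row in join_on_shard(rows, lookup)]
```

With three distinct key combinations, the whole right-hand side of the join is a three-entry table that each data node can hold locally.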
Let's attempt this optimization and add optimizer tests that document the correct behavior.