Conversation
This implements `INLINESTATS`. Most of the heavy lifting is done by `LOOKUP`, with this change mostly adding a new abstraction to logical plans, and interface I'm calling `Phased`. Implementing this interface allows a logical plan node to cut the query into phases. `INLINESTATS` implements it by asking for a "first phase" that's the same query, up to `INLINESTATS`, but with `INLINESTATS` replaced with `STATS`. The next phase replaces the `INLINESTATS` with a `LOOKUP` on the results of the first phase. So, this query: ``` FROM foo | EVAL bar = a * b | INLINESTATS m = MAX(bar) BY b | WHERE m = bar | LIMIT 1 ``` gets split into ``` FROM foo | EVAL bar = a * b | STATS m = MAX(bar) BY b ``` followed by ``` FROM foo | EVAL bar = a * b | LOOKUP (results of m = MAX(bar) BY b) ON b | WHERE m = bar | LIMIT 1 ```
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/InlineStats.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java
Outdated
Show resolved
Hide resolved
|
This is riddled with NOCOMMITs, as any good prototype deserves to be. It's also not clear that this is the right way to go about this. I mean, cutting the request into two is the right way to do it, I think. The reworking of the callbacks to make that possible seems like the sane way to do that. But from a planning side, this is an "interesting" choice. We analyze the |
|
I've extracted #110445 out of this one so I can get it and not have to deal with merge conflicts. |
|
There's a fun bug that I'm going to probably leave for a followup: Hits the rule execution limit. It started from a typo - I meant to do |
Pushed a link. |
| shadowingLimit0 | ||
| required_capability: inlinestats | ||
|
|
||
| ROW left = "left", client_ip = "172.21.0.5", env = "env", right = "right" | ||
| | INLINESTATS env=VALUES(right) BY client_ip | ||
| | LIMIT 0 | ||
| ; |
There was a problem hiding this comment.
I don't think we need the limit0 tests added here; the limit0 tests are only present in enrich.csv-spec so we can run at least the logical optimizer against tests that otherwise would need enrich_load.
| ROW city = "Zürich" | ||
| | INLINESTATS x=VALUES(city), x=VALUES(city) |
There was a problem hiding this comment.
Let's also add a shadowingSelf test!
| INLINESTATS city = COUNT(city)
There was a problem hiding this comment.
Oh. That.... Probably isn't going to work. I guess we do want it to work....
There was a problem hiding this comment.
That'd be required to be consistent with enrich, eval, dissect and grok. I think we can add this to the list of follow ups, as this probably requires work on LOOKUP resp. JOIN.
| Object value = BlockUtils.toJavaObject(p.getBlock(i), 0); | ||
| values.add(new Alias(source(), s.name(), null, new Literal(source(), value, s.dataType()), aggregates.get(i).id())); | ||
| } | ||
| return new Eval(source(), child(), values); |
There was a problem hiding this comment.
Yeahp, this should work and is conceptually a bit nicer (IMHO) than doing that down in the physical mapping!
| } | ||
|
|
||
| @Override | ||
| public LogicalPlan nextPhase(List<Attribute> schema, List<Page> firstPhaseResult) { |
There was a problem hiding this comment.
I think using the schema from the first phase is reasonable, but throwing an IllegalArgumentException if the schema doesn't line up with what we expected will make our lives much easier, esp. if this should blow up in production.
| * <p>If there are multiple {@linkplain Phased} nodes in the plan we always | ||
| * operate on the lowest one first, counting from the data source "upwards". |
|
I've put this behind a feature flag and removed There are a bunch of extra follow ups. I had a conversation with @costin about moving around how we trigger the Phased nature. The plan now is to merge this as is and rework some things. I'm going to push a few more tests and see if we can land this. Folks can experiment with it some more while we finish up. |
| } | ||
|
|
||
| @Override | ||
| public LogicalPlan nextPhase(List<Attribute> schema, List<Page> firstPhaseResult) { |
There was a problem hiding this comment.
Thanks Nik, this is much better now!
I have one last nit: the check is slightly insufficient because
firstPhase().output().equals(schema) == false
will not look at the name ids (they are not checked in NamedExpression.equals(), nor in NamedExpression's descendants). The plan may become inconsistent if the name ids do not line up, even if the first phase produces the correct names and data types. I think what we need is a little helper that we should call here.
public static equalsAndSemanticEquals(List<Attribute> left, List<Attribute> right) {
if (left.equals(right) == false) {
return false;
}
for (int i = 0; i < left.size(); i++) {
if (left.get(i).semanticEquals(right.get(i)) == false) {
return false;
}
}
return true;
}
We could put that into the Expressions class (not Expression).
Either this, or we ignore name ids from the first phase: in ungroupedNextPhase we already do this, because we obtain the name ids from the aggregates. In groupedNextPhase, however, we put the schema's attributes directly as the attributes of the local relation.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/EsqlPlugin.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java
Show resolved
Hide resolved
|
Release tests look unrelated. We'll fight with them as fight comes up. |
* main: (39 commits) Update README.asciidoc (elastic#111244) ESQL: INLINESTATS (elastic#109583) ESQL: Document a little of `DataType` (elastic#111250) Relax assertions in segment level field stats (elastic#111243) LogsDB data generator - support nested object field (elastic#111206) Validate `Authorization` header in Azure test fixture (elastic#111242) Fixing HistoryStoreTests.testPut() and testStoreWithHideSecrets() (elastic#111246) [ESQL] Remove Named Expcted Types map from testing infrastructure (elastic#111213) Change visibility of createWriter to allow tests from a different package to override it (elastic#111234) [ES|QL] Remove EsqlDataTypes (elastic#111089) Mute org.elasticsearch.repositories.azure.AzureBlobContainerRetriesTests testReadNonexistentBlobThrowsNoSuchFileException elastic#111233 Abstract codec lookup by name, to make CodecService extensible (elastic#111007) Add HTTPS support to `AzureHttpFixture` (elastic#111228) Unmuting tests related to free_context action being processed in ESSingleNodeTestCase (elastic#111224) Upgrade Azure SDK (elastic#111225) Collapse transport versions for 8.14.0 (elastic#111199) Make sure contender uses logs templates (elastic#111183) unmute HistogramPercentileAggregationTests.testBoxplotHistogram (elastic#111223) Refactor Quality Assurance test infrastructure (elastic#111195) Mute org.elasticsearch.xpack.restart.FullClusterRestartIT testDisableFieldNameField {cluster=UPGRADED} elastic#111222 ... # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
|
FTR, I've raised an meta ticket to track my progress at #112266 |
|
@costin would it be possible to update this blog https://www.elastic.co/search-labs/blog/esql-piped-query-language-goes-ga to note that |
|
I'm removing the |

This implements
INLINESTATS. Most of the heavy lifting is done byLOOKUP, with this change mostly adding a new abstraction to logical plans, an interface I'm callingPhased. Implementing this interface allows a logical plan node to cut the query into phases.INLINESTATSimplements it by asking for a "first phase" that's the same query, up toINLINESTATS, but withINLINESTATSreplaced withSTATS. The next phase replaces theINLINESTATSwith a hash join on the results of the first phase.So, this query:
gets split into
followed by
Here's an example of the syntax:
Closes #107589