You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
All sortable types. Eventually. Good enough to do just longs for a while.
The first parameter to either FIRST or LAST (i.e.: <field>) shall support the following types:
long / int / double
keyword / text
boolean
IP
If <field> is NULL at the highest sort key, do we return that, or the value at the next highest sort key?
Security wants the behavior above. But does everyone? No. Let's allow nulls.
What about ties in the sort key? The prototype presumes it's allowed to return any of the values with the lowest sort key.
Figure out what sort does and probably do the same thing.
If we push to Lucene AND there is one shard THEN we do break ties! We use the doc _id. Which isn't perfect, but it's something.
Otherwise it's not consistent which one you get.
We might be ok if the sort isn't stable.
We could use the VALUE as a tie breaker.
For now, I'm going to return any value.
What about a multivalued sort key?
Should first use the minimum value and last use the maximum value? That's what the prototype does and it feels quite natural. It's as though FIRST(<field>, <@timestamp>) was implied to be FIRST(<field>, MV_MIN(<@timestamp>)).
We should double check what we do for sort and do that.
What if the value at the min sort key is multivalued? The prototype returns one of those values.
It's much less efficient if we have to return multivalued results. And the security folks know for lots of uses that the value is always single valued.
We could keep all of the speed the prototype has by specializing MV_FIRST(FIRST(<field>, <@timestamp>)) or FIRST(MV_FIRST(<field>), <@timestamp>).
We should return multivalued fields.
Open questions - Not Required for Tech Preview
Should @timestamp be implied if not provided? So you could call it with FIRST(<field>).
We think we should make an alias - EARLIEST/LATEST - that defaults to @timestamp.
Is FIRST and LAST the right name? Splunk has EARLIEST and LATEST for this concept. SQL has a FIRST_VALUE and LAST_VALUE but it's very window-function-y. And we're not window-function-y.
Leave the name FIRST/LAST and make a shortcut for EARLIEST/LATEST that operates on just @timestamp.
Tests we'll need
Huge number of multivalues in <field> and <@timestamp>
Parameter for the function name
The regular tests that usually ship with scalar functions
Unit tests for each type of aggregators, for both ungrouped and grouped cases. E.g.: FirstLongByTimestampAggregatorFunctionTests and grouping equivalent.
Things to circle back to after tech preview
Vector-only dispatch.
Special handling for block values that can be vectorized has been disabled in #138390; it was a no-op. At some point, we'd like to enable it back and actually have it implemented for performance reasons as the vector path won't have to check for nulls, or mult-ivalues.
Add support for date / data_nanos / float in the value parameter
Make aliases that default to @timestamp. Call them EARLIEST/LATEST.
Support aggregate functions -
FIRST(<field>, <@timestamp>)andLAST(<field>, <@timestamp>)Open questions - Available in Tech Preview
Do we cover sorting by things other than dates? [ES|QL] Selecting top count for an aggregation doesn't seem to work as expected #128612 + [ES|QL] Support first(), last() functions #108385 (comment)
longs for a while.FIRSTorLAST(i.e.:<field>) shall support the following types:long/int/doublekeyword/textbooleanIPIf
<field>isNULLat the highest sort key, do we return that, or the value at the next highest sort key?What about ties in the sort key? The prototype presumes it's allowed to return any of the values with the lowest sort key.
What about a multivalued sort key?
FIRST(<field>, <@timestamp>)was implied to beFIRST(<field>, MV_MIN(<@timestamp>)).What if the value at the min sort key is multivalued? The prototype returns one of those values.
MV_FIRST(FIRST(<field>, <@timestamp>))orFIRST(MV_FIRST(<field>), <@timestamp>).Open questions - Not Required for Tech Preview
@timestampbe implied if not provided? So you could call it withFIRST(<field>).@timestamp.FIRSTandLASTthe right name? Splunk hasEARLIESTandLATESTfor this concept. SQL has aFIRST_VALUEandLAST_VALUEbut it's very window-function-y. And we're not window-function-y.@timestamp.Tests we'll need
<field>and<@timestamp>FirstLongByTimestampAggregatorFunctionTestsand grouping equivalent.Things to circle back to after tech preview
Special handling for block values that can be vectorized has been disabled in #138390; it was a no-op. At some point, we'd like to enable it back and actually have it implemented for performance reasons as the vector path won't have to check for nulls, or mult-ivalues.
date/data_nanos/floatin the value parameter@timestamp. Call them EARLIEST/LATEST.