Channels issue when an agg is called with the same field #142180

@mouhc1ne

Description

Scope
This only affects agg functions that can take two fields, when both conditions below are met:

  1. The agg is passed the exact same field twice
  2. The field's type passes all type-safety checks, so the actual aggregator starts crunching the blocks and pages.

This is an edge case that only FIRST/LAST can hit at the moment. As far as I can tell, no other agg functions can hit this scenario.

How to reproduce
Running the following query

ROW timestamp = TO_DATETIME("2026-01-22T12:34:53.713Z") | STATS FIRST(timestamp, timestamp)

results in

{
    "error": {
        "root_cause": [
            {
                "type": "array_index_out_of_bounds_exception",
                "reason": "Index 1 out of bounds for length 1"
            }
        ],
        "type": "array_index_out_of_bounds_exception",
        "reason": "Index 1 out of bounds for length 1"
    },
    "status": 500
}

with this stack trace (see the attached file for the full logs):

[WARN ][08:20:11][rest.suppressed][runTask-0] path: /_query, params: {format=txt}, status: 500 java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
        at java.base/java.util.ImmutableCollections$ListN.get(ImmutableCollections.java:725)
        at org.elasticsearch.compute.aggregation.AllFirstLongByTimestampAggregatorFunction.addRawInputNotMasked(AllFirstLongByTimestampAggregatorFunction.java:77)
        at org.elasticsearch.compute.aggregation.AllFirstLongByTimestampAggregatorFunction.addRawInput(AllFirstLongByTimestampAggregatorFunction.java:63)
        at org.elasticsearch.compute.aggregation.Aggregator.processPage(Aggregator.java:43)
        at org.elasticsearch.compute.operator.AggregationOperator.addInput(AggregationOperator.java:110)
        at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:308)
        at org.elasticsearch.compute.operator.Driver.run(Driver.java:193)
        at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:442)
        at org.elasticsearch.server@9.4.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at org.elasticsearch.compute.operator.DriverScheduler$1.doRun(DriverScheduler.java:57)
        at org.elasticsearch.server@9.4.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at org.elasticsearch.server@9.4.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
        at org.elasticsearch.server@9.4.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1114)
        at org.elasticsearch.server@9.4.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
        at java.base/java.lang.Thread.run(Thread.java:1474)

More generally, any query shaped like the two below reproduces the error.

.... | STATS x = FIRST(distance_to_earth, distance_to_earth)
or 
.... | STATS x = LAST(@timestamp, @timestamp)

The actual problem
Under the hood, the aggs framework does things like the snippet below all the time.

  private void addRawInputNotMasked(Page page) {
    // Channel 0 always exists, so this lookup is fine.
    BytesRefBlock valuesBlock = page.getBlock(channels.get(0));
    // Fine until the same field is passed twice: then there is no channel 1.
    LongBlock timestampsBlock = page.getBlock(channels.get(1));
    addRawBlock(valuesBlock, timestampsBlock);
  }

So what's the issue? The root of the problem is the call to channels.get(...). When the same field is passed twice, there is only a single channel, so the call channels.get(1) blows up with an ArrayIndexOutOfBoundsException.
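The failure mode can be sketched outside Elasticsearch. The block below is a minimal illustration, not the real aggs code: `ChannelDedupSketch`, `resolveChannels` and `readBothChannels` are hypothetical names. The deduplication step is an assumption about how the channel list ends up with one entry; the issue only establishes that it does.

```java
import java.util.LinkedHashSet;
import java.util.List;

class ChannelDedupSketch {

    // Hypothetical helper: maps each agg argument to its column index in the page
    // and collapses duplicates. The deduplication is an ASSUMPTION made for this
    // sketch; it is not taken from the Elasticsearch source.
    static List<Integer> resolveChannels(List<String> pageColumns, List<String> aggArgs) {
        LinkedHashSet<Integer> distinct = new LinkedHashSet<>();
        for (String arg : aggArgs) {
            distinct.add(pageColumns.indexOf(arg));
        }
        return List.copyOf(distinct);
    }

    // Mirrors the shape of addRawInputNotMasked: it reads channels 0 and 1
    // without checking how many channels there are.
    static String readBothChannels(List<Integer> channels) {
        try {
            int values = channels.get(0);
            int timestamps = channels.get(1);
            return "ok: values=" + values + ", timestamps=" + timestamps;
        } catch (IndexOutOfBoundsException e) {
            // ArrayIndexOutOfBoundsException is a subclass of IndexOutOfBoundsException.
            return "failed: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        List<String> page = List.of("value", "timestamp");
        // Two different fields: two channels, both lookups succeed.
        System.out.println(readBothChannels(resolveChannels(page, List.of("value", "timestamp"))));
        // Same field twice: one channel, the second lookup fails.
        System.out.println(readBothChannels(resolveChannels(page, List.of("timestamp", "timestamp"))));
    }
}
```

Under that assumption, any code path that indexes the channel list by argument position needs either one channel per argument or an explicit check on the list size before reading index 1.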

Workaround

Converting the type through a function call perfectly camouflages the issue, so the two examples below work. Inline casting (::type) does not.

.... | STATS x = FIRST(TO_LONG(long_field), long_field)
or 
.... | STATS x = LAST(TO_DATE_NANOS(@timestamp), @timestamp)

Surrogates

In this particular case, rewriting FIRST(x, x) as MIN(x) will not work because of the way multivalues are supported. While MIN returns a single minimum value, FIRST returns all the multivalues found, if any.
