Skip to content

ESQL avg() stats leads to arithmetic overflow exception on large data #99575

@craigtaverner

Description

@craigtaverner

In the nyc_taxis dataset used in benchmarking, I’m getting an arithmetic exception when calculating avg of many longs.

The query:

from nyc_taxis
| eval epoch = to_long(dropoff_datetime)
| stats ae=avg(epoch), count(epoch)
| eval at=to_datetime(ae)

And the result:

{
    "error": {
        "root_cause": [
            {
                "type": "arithmetic_exception",
                "reason": "long overflow"
            }
        ],
        "type": "arithmetic_exception",
        "reason": "long overflow"
    },
    "status": 500
}

In ESQL arithmetic operations can lead to numerical overflows, and the principle is to return null and add a warning. We could do that in this case. However, since the avg function will return a double anyway, we could cast the longs to doubles up-front. Right now avg() is implemented as a sum()/count() and the division will return a double, we could instead change this to sum(to_double())/count() and get the conversion to double done one step earlier.

As a test, the following workaround works:

from nyc_taxis
| eval epoch = to_double(dropoff_datetime)
| stats ae=avg(epoch), count(epoch)
| eval at=to_datetime(ae)
{
    "columns": [
        {
            "name": "ae",
            "type": "double"
        },
        {
            "name": "count(epoch)",
            "type": "long"
        },
        {
            "name": "at",
            "type": "date"
        }
    ],
    "values": [
        [
            1431759064681.463,
            8320000,
            "2015-05-16T06:51:04.681Z"
        ]
    ]
}

So, three options here:

  • Implement the conversion to null and warnings for aggregations that have numerical overflow
  • Add an implicit cast to double avg() -> sum(to_double())/count()
  • Support big integers on sum(), requiring support for an additional type (big integer)

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions