ESQL: consider better behavior of NULL type in expressions

Kinda a continuation of https://github.com/elastic/elasticsearch/issues/140577. Relates #138888 .

`SET unmapped_fields="nullify"` introduced more usage for the `NULL` type that was previously reserved for the `null` literal. A field that is missing from the indices used in `FROM` will be "nullified", that is, treated like the `null` literal and be of type `NULL`.

Unfortunately, we sometimes guess the output type of an expression when it uses the `NULL` type. This applies e.g. to `SUM` and to arithmetic operators. It can lead to queries unnecessarily becoming invalid when a field goes missing (e.g. due to small mapping changes becoming effective after rollover). An example is

```
SET unmapped_fields="nullify";
FROM idx
| STATS s = SUM(field)
| EVAL foo = CASE(bar, 10000::long, s)
```

When `field` is mapped as a long, this query works fine. But when `field` becomes unmapped, this query fails because `SUM(field)` is then assumed to be `double`, which breaks the `CASE` expression due to type mismatch in its output branches.

As commented [here](https://github.com/elastic/elasticsearch/issues/140577#issuecomment-3846250551), I think we may have more useful behavior for the `NULL` data type if we postulate two rules that expressions consuming the `NULL` type must follow:

1. Every function and operator accepts the null type for any argument that can be a field or other non-literal column. That is, if I can write valid queries with `... | EVAL y = function(x, ...)` and `x` is a field or reference of some non-`NULL` type, `function` must also work if we shadow `x` with a `NULL` literal: `... | EVAL x = null | EVAL y = function(x, ...)`. This does not cover function parameters that need to be folded before the query is run, e.g. `PERCENTILE(field, percentage)` does have to accept a `NULL` field, but not a `NULL` percentage.
2. If a function can output different types depending on the argument types, it must output the type NULL if one of the input types that determines the output type is NULL.

This behavior implies no guessing of output types when we cannot really determine this from the expression; it would also mean that more queries remain valid when using `SET unmapped_fields="nullify"` and one of the fields using in the query is unmapped or missing.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ESQL: consider better behavior of NULL type in expressions #142537

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

ESQL: consider better behavior of NULL type in expressions #142537

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions