Kinda a continuation of #140577. Relates #138888 .
SET unmapped_fields="nullify" introduced more usage for the NULL type that was previously reserved for the null literal. A field that is missing from the indices used in FROM will be "nullified", that is, treated like the null literal and be of type NULL.
Unfortunately, we sometimes guess the output type of an expression when it uses the NULL type. This applies e.g. to SUM and to arithmetic operators. It can lead to queries unnecessarily becoming invalid when a field goes missing (e.g. due to small mapping changes becoming effective after rollover). An example is
SET unmapped_fields="nullify";
FROM idx
| STATS s = SUM(field)
| EVAL foo = CASE(bar, 10000::long, s)
When field is mapped as a long, this query works fine. But when field becomes unmapped, this query fails because SUM(field) is then assumed to be double, which breaks the CASE expression due to type mismatch in its output branches.
As commented here, I think we may have more useful behavior for the NULL data type if we postulate two rules that expressions consuming the NULL type must follow:
- Every function and operator accepts the null type for any argument that can be a field or other non-literal column. That is, if I can write valid queries with
... | EVAL y = function(x, ...) and x is a field or reference of some non-NULL type, function must also work if we shadow x with a NULL literal: ... | EVAL x = null | EVAL y = function(x, ...). This does not cover function parameters that need to be folded before the query is run, e.g. PERCENTILE(field, percentage) does have to accept a NULL field, but not a NULL percentage.
- If a function can output different types depending on the argument types, it must output the type NULL if one of the input types that determines the output type is NULL.
This behavior implies no guessing of output types when we cannot really determine this from the expression; it would also mean that more queries remain valid when using SET unmapped_fields="nullify" and one of the fields using in the query is unmapped or missing.
Kinda a continuation of #140577. Relates #138888 .
SET unmapped_fields="nullify"introduced more usage for theNULLtype that was previously reserved for thenullliteral. A field that is missing from the indices used inFROMwill be "nullified", that is, treated like thenullliteral and be of typeNULL.Unfortunately, we sometimes guess the output type of an expression when it uses the
NULLtype. This applies e.g. toSUMand to arithmetic operators. It can lead to queries unnecessarily becoming invalid when a field goes missing (e.g. due to small mapping changes becoming effective after rollover). An example isWhen
fieldis mapped as a long, this query works fine. But whenfieldbecomes unmapped, this query fails becauseSUM(field)is then assumed to bedouble, which breaks theCASEexpression due to type mismatch in its output branches.As commented here, I think we may have more useful behavior for the
NULLdata type if we postulate two rules that expressions consuming theNULLtype must follow:... | EVAL y = function(x, ...)andxis a field or reference of some non-NULLtype,functionmust also work if we shadowxwith aNULLliteral:... | EVAL x = null | EVAL y = function(x, ...). This does not cover function parameters that need to be folded before the query is run, e.g.PERCENTILE(field, percentage)does have to accept aNULLfield, but not aNULLpercentage.This behavior implies no guessing of output types when we cannot really determine this from the expression; it would also mean that more queries remain valid when using
SET unmapped_fields="nullify"and one of the fields using in the query is unmapped or missing.