
Large integers from CEL arrive in ES as Doubles #43659

@chrisberkhout


Elasticsearch handles integers like 235549249, and much larger ones, without problems, and can convert them to strings.

An Elasticsearch example:
PUT _ingest/pipeline/test_convert
{
  "description": "Pipeline to test conversion of large integers",
  "processors": [
    {
      "convert": {
        "field": "small",
        "target_field": "small_string",
        "type": "string"
      }
    },
    {
      "convert": {
        "field": "large",
        "target_field": "large_string",
        "type": "string"
      }
    }
  ]
}

POST _ingest/pipeline/test_convert/_simulate
{
  "docs": [
    {
      "_source": {
        "small": 12345,
        "large": 2355492490000000000
      }
    }
  ]
}

Response:
{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_version": "-3",
        "_id": "_id",
        "_source": {
          "small": 12345,
          "large": 2355492490000000000,
          "small_string": "12345",
          "large_string": "2355492490000000000"
        },
        "_ingest": {
          "timestamp": "2025-04-03T08:58:03.968461461Z"
        }
      }
    }
  ]
}

However, when such a number is received from Beats (at least from the CEL input), it arrives in the pipeline as a Double.

For 235549249, the convert processor then produces the string "2.35549249E8". Converting to an integer before converting to a string does not help either: it fails with an error saying it can't convert the string "2.35549249E8" (so the convert processor doesn't natively convert from Double to integer).
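The exponent is only a matter of how the double is printed; the value 235549249.0 itself is exact. Java's Double.toString, which is presumably what produces "2.35549249E8" here, switches to scientific notation for magnitudes of 10^7 and above. A minimal Go sketch makes the same distinction explicit with strconv (Go's 'g' cut-off is 1e6 rather than Java's 10^7, so the threshold is not identical). The helper names are illustrative only:

```go
package main

import (
	"fmt"
	"strconv"
)

// generalForm prints v in the shortest general ('g') format, which uses
// an exponent for magnitudes of 1e6 and above (and for very small ones).
func generalForm(v float64) string { return strconv.FormatFloat(v, 'g', -1, 64) }

// fixedForm prints the same value in fixed-point ('f') format, which
// never uses an exponent and drops no digits.
func fixedForm(v float64) string { return strconv.FormatFloat(v, 'f', -1, 64) }

func main() {
	v := 235549249.0
	fmt.Println(generalForm(v)) // 2.35549249e+08
	fmt.Println(fixedForm(v))   // 235549249
}
```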

A script processor that does a type check and conversion in Painless:
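  - script:
      description: Convert large integer to a string without an exponent
      lang: painless
      source: |
        // Record which Java type the field arrived as, for debugging.
        def value = ctx.json.id;
        if (value instanceof Integer) {
            ctx.json["id_type_name"] = "Integer";
        } else if (value instanceof Long) {
            ctx.json["id_type_name"] = "Long";
        } else if (value instanceof Double) {
            ctx.json["id_type_name"] = "Double";
        } else if (value instanceof String) {
            ctx.json["id_type_name"] = "String";
        } else {
            ctx.json["id_type_name"] = "Unknown";
        }
        // Convert numbers via long, so a Double such as 235549249.0
        // becomes "235549249" rather than "2.35549249E8".
        if (value instanceof Number) {
            ctx.json.id = Long.toString(((Number) value).longValue());
        }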

Result:

json.id_type_name = "Double"
json.id = "235549249"

Handling this requires special processing, such as in the ti_anomali integration. Note that there, the original server response contains the value as an integer in JSON (e.g. "id":235548914 in the system test response), and the CEL program also re-serializes it to JSON as an integer.
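One plausible mechanism, offered as an assumption rather than a diagnosis: the CEL specification maps JSON numbers to double, and Go's standard encoding/json likewise decodes every JSON number into float64 when the target is `interface{}`, then re-encodes integral float64 values with no decimal point or exponent. If the value is held as a double anywhere along the way, the JSON text would still look like an integer while the integer type is lost, independent of the number's size. This sketch uses only the standard library, not the actual CEL input code path; `roundTrip` is a hypothetical helper:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip decodes a JSON object into a generic map, as schema-less Go
// code typically does, and re-encodes it.
func roundTrip(in string) (map[string]interface{}, string, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal([]byte(in), &doc); err != nil {
		return nil, "", err
	}
	out, err := json.Marshal(doc)
	if err != nil {
		return nil, "", err
	}
	return doc, string(out), nil
}

func main() {
	doc, out, err := roundTrip(`{"id":235548914,"small":12345}`)
	if err != nil {
		panic(err)
	}
	// Both numbers are float64 after decoding, regardless of magnitude.
	fmt.Printf("%T %T\n", doc["id"], doc["small"]) // float64 float64
	// Yet the re-encoded JSON still looks like integers.
	fmt.Println(out) // {"id":235548914,"small":12345}
}
```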

This raises some questions:

  • What JSON is being sent to Elasticsearch in such cases?
  • Why does this happen even for relatively small integers (235549249 fits in a 32-bit signed int)?
  • Is there an issue in cel-go or the CEL input or Beats that could be fixed to improve this?
