[BUG] Script size limit exceeded in PPL queries with temporal functions and aggregations on large schemas #4547

@alexey-temnikov

Describe the bug

PPL queries that combine eval with temporal functions (TIMESTAMP, TIMESTAMPDIFF) and aggregations (stats, top) fail with a script size limit error on indices with large schemas (e.g., OCSF 1.1.0 with 618+ fields).

Error:

exceeded max allowed inline script size [65535] with size [202478]

To Reproduce

Query:

source=ocsf-1.1.0-6003 
| eval start_time_ts = TIMESTAMP(start_time_dt), time_ts = TIMESTAMP(time_dt), time_diff = TIMESTAMPDIFF(SECOND, start_time_ts, time_ts) 
| where time_diff > 0 
| stats count() by activity_name 
| sort - count 
| head 5

Schema: OCSF 1.1.0 (618 fields)

Result: The generated script is 202,478 bytes, roughly three times the 65,535-byte limit.

Tentative Root Cause

This is a preliminary analysis and requires further investigation.

Location: RelJsonSerializer.serialize() in opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializer.java

The serializer includes the entire table schema (rowType) in the script even though the RexNode expression references only 2 fields. As a result, the script size grows with the schema rather than with the expression, which causes massive bloat on large schemas.

Code (lines 88-93):

String rexNodeJson = jsonBuilder.toJsonString(relJson.toJson(rexNode));
Object rowTypeJsonObj = relJson.toJson(rowType);  // ← Serializes ALL fields
String rowTypeJson = jsonBuilder.toJsonString(rowTypeJsonObj);

Tentative Proposed Fix

This is a preliminary analysis and requires further investigation.

Design: Only serialize fields actually referenced in the RexNode expression.

Implementation:

  1. Extract referenced field indices from RexNode using the visitor pattern
  2. Build a minimal RelDataType with only those fields
  3. Filter fieldTypes map to referenced fields only
  4. Serialize minimal schema instead of full schema
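
Step 1 can be sketched without Calcite on the classpath. The `Expr`, `FieldRef`, `Call`, and `Literal` types below are hypothetical stand-ins for Calcite's `RexNode`, `RexInputRef`, `RexCall`, and `RexLiteral`, and `ReferencedFieldCollector` is an illustrative name, not something in the plugin. In the real serializer the same traversal would presumably be a `RexVisitorImpl` or `RexShuttle` subclass overriding `visitInputRef`; Calcite's `RelOptUtil.InputFinder` may already cover this and is worth checking first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-ins for Calcite's RexNode hierarchy, for illustration only.
interface Expr {
  void accept(ExprVisitor visitor);
}

interface ExprVisitor {
  void visitFieldRef(FieldRef ref);
  void visitCall(Call call);
  void visitLiteral(Literal literal);
}

// Reference to an input column by its position in the row type (like RexInputRef).
final class FieldRef implements Expr {
  final int index;
  FieldRef(int index) { this.index = index; }
  public void accept(ExprVisitor visitor) { visitor.visitFieldRef(this); }
}

// Function or operator call with operands (like RexCall).
final class Call implements Expr {
  final String op;
  final List<Expr> operands;
  Call(String op, Expr... operands) {
    this.op = op;
    this.operands = Arrays.asList(operands);
  }
  public void accept(ExprVisitor visitor) { visitor.visitCall(this); }
}

// Constant value (like RexLiteral).
final class Literal implements Expr {
  final Object value;
  Literal(Object value) { this.value = value; }
  public void accept(ExprVisitor visitor) { visitor.visitLiteral(this); }
}

// Collects the distinct field indices that an expression tree refers to.
final class ReferencedFieldCollector implements ExprVisitor {
  private final SortedSet<Integer> indices = new TreeSet<>();

  static SortedSet<Integer> collect(Expr expr) {
    ReferencedFieldCollector collector = new ReferencedFieldCollector();
    expr.accept(collector);
    return collector.indices;
  }

  public void visitFieldRef(FieldRef ref) { indices.add(ref.index); }

  public void visitCall(Call call) {
    for (Expr operand : call.operands) {
      operand.accept(this); // recurse into nested calls
    }
  }

  public void visitLiteral(Literal literal) { /* literals reference no fields */ }
}
```

Returning a sorted set keeps the pruned fields in their original schema order, which makes the later index remapping deterministic.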

Pseudocode:

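// Step 1: indices of the input fields the expression actually reads
Set<Integer> referencedFields = extractReferencedFields(rexNode);
// Step 2: row type containing only those fields
RelDataType minimalRowType = createMinimalRowType(rowType, referencedFields);
// Step 3: field-type map restricted to the same fields
Map<String, ExprType> minimalFieldTypes = filterFieldTypes(fieldTypes, referencedFields);
// Step 4: serialize minimalRowType and minimalFieldTypes instead of the full versions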
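
A minimal sketch of the pruning itself, under the assumption that the row type can be treated as an ordered list of field names and the field-type map as name to type. `MinimalSchema` and `prune` are hypothetical names, and plain strings stand in for `RelDataType` fields and `ExprType`. One point the pseudocode leaves implicit: once fields are dropped their positions shift, so any index-based references in the expression would likely need to be rewritten against the pruned row type. The `indexMapping` below records that old-to-new translation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;

// Hypothetical helper: plain collections stand in for RelDataType and ExprType.
final class MinimalSchema {
  final List<String> fieldNames;             // pruned row type, in original order
  final Map<String, String> fieldTypes;      // pruned field name -> type name
  final Map<Integer, Integer> indexMapping;  // original index -> index in pruned row type

  private MinimalSchema(List<String> fieldNames,
                        Map<String, String> fieldTypes,
                        Map<Integer, Integer> indexMapping) {
    this.fieldNames = fieldNames;
    this.fieldTypes = fieldTypes;
    this.indexMapping = indexMapping;
  }

  static MinimalSchema prune(List<String> allFieldNames,
                             Map<String, String> allFieldTypes,
                             SortedSet<Integer> referenced) {
    List<String> names = new ArrayList<>();
    Map<String, String> types = new LinkedHashMap<>();
    Map<Integer, Integer> mapping = new LinkedHashMap<>();
    for (int oldIndex : referenced) {
      if (oldIndex < 0 || oldIndex >= allFieldNames.size()) {
        throw new IllegalArgumentException("field index out of range: " + oldIndex);
      }
      String name = allFieldNames.get(oldIndex);
      mapping.put(oldIndex, names.size()); // new position is the next free slot
      names.add(name);
      if (allFieldTypes.containsKey(name)) {
        types.put(name, allFieldTypes.get(name));
      }
    }
    return new MinimalSchema(names, types, mapping);
  }
}
```

With a four-field schema and references to positions 1 and 3, `prune` keeps two fields and maps 1 to 0 and 3 to 1. The serialized schema then scales with the expression rather than with the index mapping of the whole table.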

Impact

  • Severity: High - blocks legitimate queries on large schemas
  • Workaround: Increase script.max_size_in_bytes (masks the problem and does not scale with schema size)
  • Scope: All PPL queries with eval + temporal functions + aggregations

Metadata

  • Labels: PPL, bug
  • Status: Done