[BUG] Script size limit exceeded in PPL queries with temporal functions and aggregations on large schemas #4547
Description
Describe the bug
PPL queries that combine eval with temporal functions (TIMESTAMP, TIMESTAMPDIFF) and aggregations (stats, top) fail with a script size limit error on indices with large schemas (e.g., OCSF 1.1.0 with 618+ fields).
Error:
exceeded max allowed inline script size [65535] with size [202478]
To Reproduce
Query:
source=ocsf-1.1.0-6003
| eval start_time_ts = TIMESTAMP(start_time_dt), time_ts = TIMESTAMP(time_dt), time_diff = TIMESTAMPDIFF(SECOND, start_time_ts, time_ts)
| where time_diff > 0
| stats count() by activity_name
| sort - count
| head 5
Schema: OCSF 1.1.0 (618 fields)
Result: Script size of 202,478 bytes (exceeds the 65,535-byte limit)
Tentative Root Cause
This is a preliminary analysis and requires further investigation.
Location: RelJsonSerializer.serialize() in opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializer.java
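The serializer includes the entire table schema (rowType) in the script even though the RexNode expression references only 2 fields. The script size therefore grows with the number of fields in the index rather than with the size of the expression, so large schemas cause massive script bloat (202,478 bytes for the 618-field OCSF schema above).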
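A toy illustration of that gap, assuming a simplified JSON shape for the row type. The `Field` record, `serializeRowType`, `createMinimalRowType`, and the JSON layout are hypothetical stand-ins, not the actual output of `relJson.toJson(rowType)`, and the field positions 37 and 512 are made up. The sketch serializes a 618-field schema and a schema pruned to the two referenced fields, then prints both sizes.

```java
import java.util.*;

public class RowTypeSizeDemo {
    // Hypothetical stand-in for one RelDataType field: name plus SQL type name.
    record Field(String name, String sqlType) {}

    // Hypothetical, simplified JSON shape: one object per field.
    // Real relJson.toJson(rowType) output differs in detail.
    static String serializeRowType(List<Field> rowType) {
        StringJoiner json = new StringJoiner(",", "{\"fields\":[", "]}");
        for (Field f : rowType) {
            json.add("{\"type\":\"" + f.sqlType() + "\",\"nullable\":true,\"name\":\"" + f.name() + "\"}");
        }
        return json.toString();
    }

    // Keeps only the referenced positions, in ascending index order.
    static List<Field> createMinimalRowType(List<Field> rowType, SortedSet<Integer> referenced) {
        List<Field> minimal = new ArrayList<>();
        for (int index : referenced) minimal.add(rowType.get(index));
        return minimal;
    }

    public static void main(String[] args) {
        // Hypothetical 618-field schema; positions 37 and 512 are made up.
        List<Field> rowType = new ArrayList<>();
        for (int i = 0; i < 618; i++) rowType.add(new Field("field_" + i, "VARCHAR"));
        rowType.set(37, new Field("start_time_dt", "TIMESTAMP"));
        rowType.set(512, new Field("time_dt", "TIMESTAMP"));

        SortedSet<Integer> referenced = new TreeSet<>(List.of(37, 512));
        String full = serializeRowType(rowType);
        String minimal = serializeRowType(createMinimalRowType(rowType, referenced));
        System.out.println("full row type:    " + full.length() + " chars");
        System.out.println("minimal row type: " + minimal.length() + " chars");
    }
}
```

The absolute numbers are not comparable to the 202,478 bytes in the error, because the real script also carries the expression and the fieldTypes map. The point is the shape: the full row type grows linearly with the field count, while the pruned one depends only on what the expression reads.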
Code (lines 88-93):
String rexNodeJson = jsonBuilder.toJsonString(relJson.toJson(rexNode));
Object rowTypeJsonObj = relJson.toJson(rowType); // ← Serializes ALL fields
String rowTypeJson = jsonBuilder.toJsonString(rowTypeJsonObj);
Tentative Proposed Fix
This is a preliminary analysis and requires further investigation.
Design: Only serialize fields actually referenced in the RexNode expression.
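Implementation:
- Extract referenced field indices from RexNode using the visitor pattern
- Build a minimal RelDataType with only those fields
- Renumber the input references in the RexNode to their positions in the minimal RelDataType (input references are positional)
- Filter fieldTypes map to referenced fields only
- Serialize minimal schema instead of full schema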
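A minimal sketch of the extraction step with the visitor pattern, plus the index renumbering that a pruned row type implies: input references are positional, so once unreferenced fields are dropped, each remaining reference has to point at its new position. `Expr`, `InputRef`, `Literal`, `Call`, and `ExprVisitor` are hypothetical toy types standing in for Calcite's `RexNode`, `RexInputRef`, `RexLiteral`, `RexCall`, and `RexVisitor`, and the positions 37 and 512 are made up. Calcite itself ships helpers for both steps (for example `RelOptUtil.InputFinder` and `RexPermuteInputsShuttle`), which would likely be preferable in the real serializer.

```java
import java.util.*;

public class ReferencedFieldsDemo {
    // Hypothetical toy expression tree standing in for Calcite's RexNode.
    interface Expr { <R> R accept(ExprVisitor<R> v); }
    record InputRef(int index) implements Expr {
        public <R> R accept(ExprVisitor<R> v) { return v.visitInputRef(this); }
    }
    record Literal(Object value) implements Expr {
        public <R> R accept(ExprVisitor<R> v) { return v.visitLiteral(this); }
    }
    record Call(String op, List<Expr> operands) implements Expr {
        public <R> R accept(ExprVisitor<R> v) { return v.visitCall(this); }
    }

    interface ExprVisitor<R> {
        R visitInputRef(InputRef ref);
        R visitLiteral(Literal lit);
        R visitCall(Call call);
    }

    // Collects the position of every field the expression reads.
    static SortedSet<Integer> extractReferencedFields(Expr expr) {
        SortedSet<Integer> refs = new TreeSet<>();
        expr.accept(new ExprVisitor<Void>() {
            public Void visitInputRef(InputRef ref) { refs.add(ref.index()); return null; }
            public Void visitLiteral(Literal lit) { return null; }
            public Void visitCall(Call call) {
                for (Expr operand : call.operands()) operand.accept(this);
                return null;
            }
        });
        return refs;
    }

    // Rewrites each input ref so old position i becomes its rank among the
    // referenced fields, matching a pruned row type that keeps only those fields.
    static Expr remap(Expr expr, SortedSet<Integer> referenced) {
        Map<Integer, Integer> oldToNew = new HashMap<>();
        for (int oldIndex : referenced) oldToNew.put(oldIndex, oldToNew.size());
        return expr.accept(new ExprVisitor<Expr>() {
            public Expr visitInputRef(InputRef ref) { return new InputRef(oldToNew.get(ref.index())); }
            public Expr visitLiteral(Literal lit) { return lit; }
            public Expr visitCall(Call call) {
                List<Expr> ops = new ArrayList<>();
                for (Expr operand : call.operands()) ops.add(operand.accept(this));
                return new Call(call.op(), ops);
            }
        });
    }

    public static void main(String[] args) {
        // time_diff > 0 over start_time_dt (made-up position 37) and time_dt (made-up position 512)
        Expr filter = new Call(">", List.of(
                new Call("TIMESTAMPDIFF", List.of(
                        new Literal("SECOND"),
                        new Call("TIMESTAMP", List.of(new InputRef(37))),
                        new Call("TIMESTAMP", List.of(new InputRef(512))))),
                new Literal(0)));
        SortedSet<Integer> refs = extractReferencedFields(filter);
        System.out.println(refs); // [37, 512]
        System.out.println(remap(filter, refs)); // same tree, now reading positions 0 and 1
    }
}
```

Keeping the referenced set sorted makes the renumbering deterministic and keeps the pruned row type in the same relative field order as the original.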
Pseudocode:
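// extractReferencedFields, createMinimalRowType and filterFieldTypes are proposed helpers, not existing methods
Set<Integer> referencedFields = extractReferencedFields(rexNode); // input-ref positions read by the expression
RelDataType minimalRowType = createMinimalRowType(rowType, referencedFields);
// fieldTypes is keyed by field name, so filter it by the names kept in the minimal row type
Map<String, ExprType> minimalFieldTypes = filterFieldTypes(fieldTypes, minimalRowType);
// Serialize minimal versions instead of full ones
Impact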
- Severity: High - blocks legitimate queries on large schemas
- Workaround: Increase script.max_size_in_bytes (this masks the problem and doesn't scale)
- Scope: All PPL queries with eval + temporal functions + aggregations on indices with large schemas