A case where somebody was trying to import a CSV file with 3000 numeric fields revealed that it's possible to hit a StackOverflowError when executing an ingest pipeline with many processors.
The ingest pipeline was ingest_pipeline.json: a CSV processor to parse the CSV, followed by 3000 convert processors to convert the strings parsed from the CSV into numbers.
Executing this pipeline fails with a stack overflow:
[2022-02-23T10:16:31,678][INFO ][o.e.c.m.MetadataCreateIndexService] [runTask-0] [test] creating index, cause [api], templates [], shards [1]/[1]
[2022-02-23T10:16:32,541][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [runTask-0] fatal error in thread [elasticsearch[runTask-0][write][T#12]], exiting
java.lang.StackOverflowError: null
at java.lang.String.startsWith(String.java:2297) ~[?:?]
at org.elasticsearch.ingest.IngestDocument$FieldPath.<init>(IngestDocument.java:877) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.IngestDocument.getFieldValue(IngestDocument.java:102) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.IngestDocument.getFieldValue(IngestDocument.java:122) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.common.ConvertProcessor.execute(ConvertProcessor.java:185) ~[?:?]
at org.elasticsearch.ingest.Processor.execute(Processor.java:41) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.CompoundProcessor.lambda$innerExecute$1(CompoundProcessor.java:154) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.Processor.execute(Processor.java:46) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.CompoundProcessor.lambda$innerExecute$1(CompoundProcessor.java:154) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.Processor.execute(Processor.java:46) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.CompoundProcessor.lambda$innerExecute$1(CompoundProcessor.java:154) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.Processor.execute(Processor.java:46) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.CompoundProcessor.lambda$innerExecute$1(CompoundProcessor.java:154) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.Processor.execute(Processor.java:46) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.CompoundProcessor.lambda$innerExecute$1(CompoundProcessor.java:154) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.Processor.execute(Processor.java:46) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.CompoundProcessor.lambda$innerExecute$1(CompoundProcessor.java:154) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.ingest.Processor.execute(Processor.java:46) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
...
(The stack trace continues with the same 3 calls over and over again.)
Could CompoundProcessor.innerExecute be changed to use iteration rather than recursion to avoid this?
The sample CSV file that goes with the ingest pipeline is test.csv.
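To illustrate the suggestion above, here is a minimal sketch of replacing recursive chaining with a loop. All names here are hypothetical stand-ins, not the real CompoundProcessor code: each "processor" is modelled as a simple String transform, and the real innerExecute also has to deal with asynchronous completion handlers, so an actual fix would need to loop only while processors complete synchronously. The point is just that a loop keeps the stack depth constant regardless of how many processors the pipeline has, whereas recursing per processor adds frames for each one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical minimal model of a compound processor. Each processor
// transforms a document (simplified here to a String).
public class IterativeCompound {

    // Recursive chaining (the pattern that overflows): completing
    // processor i invokes processor i + 1, so every processor in the
    // pipeline adds stack frames.
    static String executeRecursively(List<UnaryOperator<String>> processors,
                                     int index, String doc) {
        if (index >= processors.size()) {
            return doc;
        }
        return executeRecursively(processors, index + 1,
                                  processors.get(index).apply(doc));
    }

    // Iterative alternative: a plain loop, constant stack depth no
    // matter how many processors there are.
    static String executeIteratively(List<UnaryOperator<String>> processors,
                                     String doc) {
        for (UnaryOperator<String> processor : processors) {
            doc = processor.apply(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        // 100,000 no-op processors: far more than the 3000 converts in
        // the failing pipeline. Only the chain length matters here.
        List<UnaryOperator<String>> processors = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            processors.add(s -> s);
        }
        // The iterative version handles the whole chain without growing
        // the stack; the recursive version would overflow long before this.
        System.out.println(executeIteratively(processors, "ok")); // prints "ok"
    }
}
```

Since the real processor API completes via callbacks rather than return values, a production version of this would presumably loop while each processor completes synchronously and only yield the loop when a processor actually goes asynchronous.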