Fix parsing of partial result when corrupted record field is present #518
srowen merged 1 commit into databricks:master from
Conversation
```scala
var i = 0
while (i < actualSchema.length) {
  val from = actualSchema(i)
  resultRow(schema.fieldIndex(from.name)) = partialResult.map(_.get(i)).orNull
```
@HyukjinKwon this was the bug. I call your attention to it because, as the comment above says, this is copied from Spark's CSV parser. If I'm really right about this, I need to fix a similar problem in Spark.
The bug is basically: actualSchema is the schema without the corrupt-record field. We iterate over its fields (index i) and set each field in the result row at its real fieldIndex in the full schema. But we read from partialResult at i, which is an index into actualSchema, which has one fewer field, while partialResult has the same schema as the resulting Row. So if the corrupt-record field is in the middle of the schema, everything after it is off by one when returned to Catalyst.
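The off-by-one can be reproduced without Spark. The sketch below models both schemas as plain name sequences and rows as arrays; `PartialResultDemo`, `buggy`, and `fixed` are illustrative names, not spark-xml code. The point is that the read index into `partialResult` must be the field's position in the full schema, not its position in `actualSchema`.

```scala
object PartialResultDemo {
  // Full schema includes the corrupt-record column in the middle.
  val schema: Seq[String] = Seq("a", "_corrupt_record", "b")
  // actualSchema: the schema without the corrupt-record field.
  val actualSchema: Seq[String] = schema.filterNot(_ == "_corrupt_record")

  // partialResult is laid out per the FULL schema, same as the output row:
  // "a" at index 0, the raw corrupt text at 1, "b" at 2.
  val partialResult: Array[Any] = Array("1", "<bad xml>", "2")

  // Buggy copy: writes to the full-schema index but reads at i,
  // an index into actualSchema, so fields after the corrupt column shift.
  def buggy(): Array[Any] = {
    val row = new Array[Any](schema.length)
    var i = 0
    while (i < actualSchema.length) {
      val from = actualSchema(i)
      row(schema.indexOf(from)) = partialResult(i) // wrong source index
      i += 1
    }
    row // Array("1", null, "<bad xml>") -- "b" got the corrupt text
  }

  // Fixed copy: resolves the field's index in the FULL schema once and
  // uses it on both sides, so values stay aligned.
  def fixed(): Array[Any] = {
    val row = new Array[Any](schema.length)
    var i = 0
    while (i < actualSchema.length) {
      val to = schema.indexOf(actualSchema(i))
      row(to) = partialResult(to) // same index on both sides
      i += 1
    }
    row // Array("1", null, "2")
  }

  def main(args: Array[String]): Unit = {
    println(buggy().mkString(","))
    println(fixed().mkString(","))
  }
}
```

With the corrupt column first or last, the buggy version happens to produce a smaller or zero shift, which is why the bug only surfaces when the corrupt-record field sits before other fields in the schema.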
I'm tracking a similar fix in Spark in https://issues.apache.org/jira/browse/SPARK-34422
Closes #517