Fix parsing of partial result when corrupted record field is present #518
srowen merged 1 commit into databricks:master from
Conversation
```scala
var i = 0
while (i < actualSchema.length) {
  val from = actualSchema(i)
  resultRow(schema.fieldIndex(from.name)) = partialResult.map(_.get(i)).orNull
```
@HyukjinKwon this was the bug. I call your attention to it because, as the comment above says, this is copied from Spark's CSV parser. If I'm really right about this, I need to fix a similar problem in Spark.
The bug is basically: actualSchema is the schema without the corrupt-record field. We iterate over its fields (index i) and set each field in the result row at its real fieldIndex in the full schema. But we read from partialResult at i, which is an index into actualSchema, which has one fewer field, while partialResult has the same schema as the resulting Row. So if the corrupt-record field is in the middle of the schema, everything after it is off by one when returned to Catalyst.
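The off-by-one can be reproduced without Spark. The sketch below models both schemas as plain name sequences and rows as arrays; `PartialResultDemo`, `buggy`, and `fixed` are illustrative names, not spark-xml code. The point is that the read index into `partialResult` must be the field's position in the full schema, not its position in `actualSchema`.

```scala
object PartialResultDemo {
  // Full schema includes the corrupt-record column in the middle.
  val schema: Seq[String] = Seq("a", "_corrupt_record", "b")
  // actualSchema: the schema without the corrupt-record field.
  val actualSchema: Seq[String] = schema.filterNot(_ == "_corrupt_record")

  // partialResult is laid out per the FULL schema, same as the output row:
  // "a" at index 0, the raw corrupt text at 1, "b" at 2.
  val partialResult: Array[Any] = Array("1", "<bad xml>", "2")

  // Buggy copy: writes to the full-schema index but reads at i,
  // an index into actualSchema, so fields after the corrupt column shift.
  def buggy(): Array[Any] = {
    val row = new Array[Any](schema.length)
    var i = 0
    while (i < actualSchema.length) {
      val from = actualSchema(i)
      row(schema.indexOf(from)) = partialResult(i) // wrong source index
      i += 1
    }
    row // Array("1", null, "<bad xml>") -- "b" got the corrupt text
  }

  // Fixed copy: resolves the field's index in the FULL schema once and
  // uses it on both sides, so values stay aligned.
  def fixed(): Array[Any] = {
    val row = new Array[Any](schema.length)
    var i = 0
    while (i < actualSchema.length) {
      val to = schema.indexOf(actualSchema(i))
      row(to) = partialResult(to) // same index on both sides
      i += 1
    }
    row // Array("1", null, "2")
  }

  def main(args: Array[String]): Unit = {
    println(buggy().mkString(","))
    println(fixed().mkString(","))
  }
}
```

With the corrupt column first or last, the buggy version happens to produce a smaller or zero shift, which is why the bug only surfaces when the corrupt-record field sits before other fields in the schema.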
I'm tracking a similar fix in Spark in https://issues.apache.org/jira/browse/SPARK-34422
Closes #517