This repository was archived by the owner on Mar 24, 2025. It is now read-only.

Data loss when input file partitioned through rowTag element #450

@PeterNmp

Description

Hi,

Thanks for all the effort put into this library!
We still seem to be having this issue related to #399 with 0.9.0 :(
We have large XML files, 10+ GB, with a format like this:

...
<SoundRecording>
...
</SoundRecording>
...
<Release>
...
</Release>
...
<ReleaseTransactions>
...
</ReleaseTransactions>

When I count the number of SoundRecording/Release/ReleaseTransactions elements directly in the files, the counts match (as they should), but processing the files like this:
spark.read.format("com.databricks.spark.xml").....option("rowTag","SoundRecording")
gives me different counts of SoundRecording/Release/ReleaseTransactions for some of the files processed.
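For reference, this is a minimal sketch of how the raw counts can be verified independently of Spark: stream the file once and count opening occurrences of each row tag. The tag names are taken from the layout above; the file path and helper name are hypothetical, and this deliberately avoids a full XML parse since a 10+ GB file may not fit in memory.

```python
import re

def count_row_tags(path, tags=("SoundRecording", "Release", "ReleaseTransactions")):
    """Stream the file line by line and count opening occurrences of each row tag.

    Longest tag names are tried first so that "Release" does not also
    swallow "ReleaseTransactions"; the [\\s>] anchor ensures we match a
    complete tag name, not a prefix, and closing tags (</Tag>) are skipped.
    """
    ordered = sorted(tags, key=len, reverse=True)
    pattern = re.compile(r"<(%s)[\s>]" % "|".join(ordered))
    counts = {tag: 0 for tag in tags}
    with open(path, encoding="utf-8") as f:
        for line in f:
            for match in pattern.finditer(line):
                counts[match.group(1)] += 1
    return counts
```

Comparing these counts against `df.count()` from three separate reads (one per `rowTag`) is one way to pin down which element types lose rows.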
