Reading multiple XML files in parallel results in an invalid schema for one of the files (#581)

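There is an issue with the XML connector: if two XML files are read at the same time, there is a high chance that one of them is parsed with the wrong row tag and ends up with an empty schema. The cause is the code below, which writes the row tag and charset into the `SparkContext`'s shared `hadoopConfiguration`, so concurrent reads can overwrite each other's settings: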

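```scala
// These keys are written into the SparkContext-wide configuration,
// which is shared by every read running on the same context.
context.hadoopConfiguration.set(XmlInputFormat.START_TAG_KEY, s"<$rowTag>")
context.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, s"</$rowTag>")
context.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, charset)
```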
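Because the configuration is shared, these `set` calls behave as last-writer-wins. The sketch below replays one problematic interleaving deterministically. It is an illustration, not spark-xml code: a plain `mutable.Map` stands in for Hadoop's `Configuration`, and the key name `xmlinput.start` and the helpers `configure` and `tagSeenByAgesRead` are hypothetical.

```scala
import scala.collection.mutable

object SharedConfRace {
  // Hypothetical key name standing in for XmlInputFormat.START_TAG_KEY.
  val StartTagKey = "xmlinput.start"

  // The "configure" step of a read: write the row tag into the given config.
  def configure(conf: mutable.Map[String, String], rowTag: String): Unit =
    conf(StartTagKey) = s"<$rowTag>"

  // Replays one bad interleaving on a single shared config (the analogue of
  // context.hadoopConfiguration) and returns the start tag the ages.xml read
  // would pick up afterwards.
  def tagSeenByAgesRead(): String = {
    val shared = mutable.Map.empty[String, String]
    configure(shared, "person") // read 1 (ages.xml) configures first
    configure(shared, "book")   // read 2 (books.xml) overwrites it before read 1 has used it
    shared(StartTagKey)         // read 1 now looks the tag up and gets read 2's value
  }

  def main(args: Array[String]): Unit =
    println(s"ages.xml startTag is ${tagSeenByAgesRead()}") // prints "ages.xml startTag is <book>"
}
```

This matches the shape of the incorrect log further down, where `ages.xml` is read with `<book>` as its start tag.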

**Steps to reproduce**

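```scala
import scala.collection.mutable

// Assumes `spark` (a SparkSession) and `resDir` (the test resources directory)
// are provided by the surrounding test suite.

// Thread ids whose ages.xml read came back with an empty schema.
// mutable.Set is not thread-safe, so every add is wrapped in `synchronized`.
val failedAgesSet = mutable.Set[Long]()
val threads_ages = (1 to 10).map { i =>
  new Thread {
    override def run(): Unit = {
      val df = spark.read.option("rowTag", "person").format("xml")
        .load(resDir + "ages.xml")
      if (df.schema.fields.isEmpty) {
        failedAgesSet.synchronized { failedAgesSet.add(i) }
      }
    }
  }
}

// Thread ids whose books.xml read came back with an empty schema.
val failedBooksSet = mutable.Set[Long]()
val threads_books = (11 to 20).map { i =>
  new Thread {
    override def run(): Unit = {
      val df = spark.read.option("rowTag", "book").format("xml")
        .load(resDir + "books.xml")
      if (df.schema.fields.isEmpty) {
        failedBooksSet.synchronized { failedBooksSet.add(i) }
      }
    }
  }
}

// Start both groups so the two row tags are configured concurrently.
threads_ages.foreach(_.start())
threads_books.foreach(_.start())
// join() makes the threads' writes to the sets visible to this thread.
threads_ages.foreach(_.join())
threads_books.foreach(_.join())
assert(failedBooksSet.isEmpty)
assert(failedAgesSet.isEmpty)
```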

Correct log (`books.xml` is read with the `book` row tag):

 `22/05/31 20:53:12 INFO |Executor task launch worker for task 0.0 in stage 6.0 (TID 6)|  xml.XmlRecordReader: file is file:/Users/sandeep.katta/sourcecode/databricks/spark-xml/spark-xml/src/test/resources/books.xml:0+5542 and startTag is <book> and endTag is </book>`

Incorrect log: `ages.xml` is parsed with the `book` tag, although its row tag should be **person**:

`22/05/31 20:53:12 INFO |Executor task launch worker for task 0.0 in stage 5.0 (TID 5)|  xml.XmlRecordReader: file is file:/Users/sandeep.katta/sourcecode/databricks/spark-xml/spark-xml/src/test/resources/ages.xml:0+265 and startTag is <book> and endTag is </book>`
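One possible direction for a fix, offered only as a sketch, is to give each read its own copy of the configuration and set the tags on that copy instead of on the shared object. In Spark terms this would roughly mean `new Configuration(context.hadoopConfiguration)` passed through the `conf` parameter of `SparkContext.newAPIHadoopFile`, but that wiring is not shown or verified here. The sketch below keeps the same stand-in assumptions as a plain map (the key name and the helpers `configureRead` and `runConcurrently` are hypothetical), runs many reads concurrently, and returns the tag each read observed.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PerReadConfCopy {
  // Hypothetical key name standing in for XmlInputFormat.START_TAG_KEY.
  val StartTagKey = "xmlinput.start"

  // Stand-in for context.hadoopConfiguration; never mutated by reads.
  val shared: Map[String, String] = Map("some.unrelated.key" -> "value")

  // Each read copies the shared settings, then sets its own tag on the copy.
  def configureRead(rowTag: String): mutable.Map[String, String] = {
    val conf = mutable.Map.empty[String, String] ++ shared
    conf(StartTagKey) = s"<$rowTag>"
    conf
  }

  // Runs `n` reads per row tag concurrently; returns (expected tag, observed start tag).
  def runConcurrently(rowTags: Seq[String], n: Int): Seq[(String, String)] = {
    val pool = Executors.newFixedThreadPool(8)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val reads = for (tag <- rowTags; _ <- 1 to n) yield Future {
        val conf = configureRead(tag)
        Thread.sleep(1) // give other reads a chance to run in between
        tag -> conf(StartTagKey)
      }
      Await.result(Future.sequence(reads), 30.seconds)
    } finally {
      pool.shutdown()
      pool.awaitTermination(10, TimeUnit.SECONDS)
    }
  }
}
```

With per-read copies, every pair returned by `PerReadConfCopy.runConcurrently(Seq("person", "book"), 10)` should have an observed tag equal to `<expected>`. Writing into a single shared mutable map instead would reintroduce the overwrite, and such a check may then fail intermittently.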
