This repository was archived by the owner on Mar 24, 2025. It is now read-only.
There is an issue with the XML connector: if two XML files are read at the same time, there is a high probability that the schema of one of them is not parsed, so df.schema comes back with no fields. The issue is caused by the code below.
Steps to reproduce
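import scala.collection.mutable

// Read ages.xml (rowTag "person") from 10 concurrent threads and record
// the index of every thread whose inferred schema is empty.
val failedAgesSet = mutable.Set[Long]()
val threads_ages = (1 to 10).map { i =>
  new Thread {
    override def run(): Unit = {
      val df = spark.read.option("rowTag", "person").format("xml")
        .load(resDir + "ages.xml")
      if (df.schema.fields.isEmpty) {
        // mutable.Set is not thread-safe, so guard the write
        failedAgesSet.synchronized { failedAgesSet.add(i) }
      }
    }
  }
}

// Same check for books.xml (rowTag "book") from 10 more threads.
val failedBooksSet = mutable.Set[Long]()
val threads_books = (11 to 20).map { i =>
  new Thread {
    override def run(): Unit = {
      val df = spark.read.option("rowTag", "book").format("xml")
        .load(resDir + "books.xml")
      if (df.schema.fields.isEmpty) {
        failedBooksSet.synchronized { failedBooksSet.add(i) }
      }
    }
  }
}

// Start all 20 reads so they overlap, then wait for them to finish.
threads_ages.foreach(_.start())
threads_books.foreach(_.start())
threads_ages.foreach(_.join())
threads_books.foreach(_.join())

// Both sets should be empty if every read inferred a schema.
assert(failedBooksSet.isEmpty)
assert(failedAgesSet.isEmpty)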
Correct Log
22/05/31 20:53:12 INFO |Executor task launch worker for task 0.0 in stage 6.0 (TID 6)| xml.XmlRecordReader: file is file:/Users/sandeep.katta/sourcecode/databricks/spark-xml/spark-xml/src/test/resources/books.xml:0+5542 and startTag is <book> and endTag is </book>
Incorrect log: ages.xml is parsed with the tag book, but the tag should be person
22/05/31 20:53:12 INFO |Executor task launch worker for task 0.0 in stage 5.0 (TID 5)| xml.XmlRecordReader: file is file:/Users/sandeep.katta/sourcecode/databricks/spark-xml/spark-xml/src/test/resources/ages.xml:0+265 and startTag is <book> and endTag is </book>
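The two log lines above would be consistent with both reads storing their row tag in one shared, mutable configuration, so that the later read overwrites the tag of the earlier one. That is an assumption, not something this report confirms. The sketch below only illustrates that failure mode in plain Scala, without Spark: `SharedConf`, `StartTagKey`, `planReadShared` and `planReadIsolated` are hypothetical names and are not part of spark-xml or Hadoop. The interleaving is written out sequentially so the result is deterministic.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a configuration object shared by every read.
final class SharedConf {
  private val values = mutable.Map[String, String]()

  def set(key: String, value: String): Unit = synchronized { values(key) = value }
  def get(key: String): String = synchronized { values(key) }

  // Returns an independent copy; later writes to the copy do not affect this object.
  def copy(): SharedConf = synchronized {
    val c = new SharedConf
    values.foreach { case (k, v) => c.set(k, v) }
    c
  }
}

object RowTagRace {
  // Hypothetical key name, used only in this sketch.
  val StartTagKey = "example.start.tag"

  // Unsafe: the tag is written into the shared object and looked up later,
  // so a second read planned in between replaces it.
  def planReadShared(conf: SharedConf, rowTag: String): () => String = {
    conf.set(StartTagKey, s"<$rowTag>")
    () => conf.get(StartTagKey)
  }

  // Safer: each read writes its tag into a private copy of the configuration.
  def planReadIsolated(conf: SharedConf, rowTag: String): () => String = {
    val local = conf.copy()
    local.set(StartTagKey, s"<$rowTag>")
    () => local.get(StartTagKey)
  }

  def main(args: Array[String]): Unit = {
    val shared = new SharedConf

    // Both reads are planned before either one runs, as with overlapping threads.
    val ages = planReadShared(shared, "person")
    val books = planReadShared(shared, "book")
    println(s"shared:   ages=${ages()} books=${books()}")

    val agesIsolated = planReadIsolated(shared, "person")
    val booksIsolated = planReadIsolated(shared, "book")
    println(s"isolated: ages=${agesIsolated()} books=${booksIsolated()}")
  }
}
```

With the shared object, the ages read ends up seeing `<book>`, which matches the incorrect log line; with a per-read copy, each read keeps its own tag. Whether a per-read copy is the appropriate fix for the connector depends on where it actually stores the tag.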