Clone HadoopConf to avoid cross usage of tags while parsing the xml by sandeep-katta0102 · Pull Request #582 · databricks/spark-xml

sandeep-katta0102 · 2022-06-02T07:03:03Z

This PR fixes issue 581.

Added unit tests and also verified the fix manually using the code below:

import scala.collection.mutable

// Job-group IDs of runs that reproduced the bug (empty inferred schema).
// mutable.Set is not thread-safe, so every add is synchronized on the set.
val jobGroupId_ages = mutable.Set[Long]()

val threads_ages = (1001 to 1010).map { i =>
  new Thread {
    override def run(): Unit = {
      sc.setJobGroup(s"$i", s"$i")
      val df = spark.read.option("rowTag", "person").format("xml").load("file:/Users/XXXX/spark-xml/src/test/resources/ages.xml")
      if (df.schema.fields.isEmpty) {
        println(s"found repro for the ages run $i **********************")
        jobGroupId_ages.synchronized { jobGroupId_ages.add(i) }
      }
    }
  }
}

val jobGroupId_books = mutable.Set[Long]()

val threads = (1 to 10).map { i =>
  new Thread {
    override def run(): Unit = {
      sc.setJobGroup(s"$i", s"$i")
      val df = spark.read.option("rowTag", "book").format("xml").load("file:/Users/XXXX/spark-xml/src/test/resources/books.xml")
      if (df.schema.fields.isEmpty) {
        println(s"found repro for the book run $i **********************")
        jobGroupId_books.synchronized { jobGroupId_books.add(i) }
      }
    }
  }
}

// Start both groups so the "person" and "book" reads overlap, then wait for all of them.
threads_ages.foreach(_.start())
threads.foreach(_.start())
threads_ages.foreach(_.join())
threads.foreach(_.join())
println(s" jobGroupId_books is ${jobGroupId_books.size} ")
println(s" jobGroupId_ages is ${jobGroupId_ages.size} ")

Before fix

After fix
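For readers unfamiliar with the failure mode, here is a minimal sketch of why cloning helps. It does not reproduce the library's internals: `FakeConf` is a hypothetical stand-in for a shared, mutable Hadoop `Configuration`, `copyOf` plays the role of Hadoop's copy constructor `new Configuration(other)`, and the key name `example.rowTag` is made up for illustration. The sketch is single-threaded so the outcome is deterministic; with real concurrent reads the same overwrite happens nondeterministically.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a mutable Hadoop Configuration shared by all reads.
final class FakeConf(initial: Map[String, String] = Map.empty) {
  private val entries = mutable.Map[String, String]() ++= initial
  def set(key: String, value: String): Unit = entries.update(key, value)
  def get(key: String): Option[String] = entries.get(key)
  // Plays the role of Hadoop's copy constructor `new Configuration(other)`.
  def copyOf(): FakeConf = new FakeConf(entries.toMap)
}

object CloneConfSketch {
  // Illustrative key name; an assumption, not taken from the PR.
  val RowTagKey = "example.rowTag"

  // Unsafe: every read writes its tag into the same shared object.
  def prepareShared(shared: FakeConf, rowTag: String): FakeConf = {
    shared.set(RowTagKey, rowTag)
    shared
  }

  // Safe: every read writes its tag into a private copy.
  def prepareCloned(shared: FakeConf, rowTag: String): FakeConf = {
    val own = shared.copyOf()
    own.set(RowTagKey, rowTag)
    own
  }

  def main(args: Array[String]): Unit = {
    val shared = new FakeConf()
    val personConf = prepareShared(shared, "person")
    prepareShared(shared, "book")
    // The "person" read now sees the other read's tag.
    println(personConf.get(RowTagKey)) // Some(book)

    val base = new FakeConf()
    val personOwn = prepareCloned(base, "person")
    val bookOwn = prepareCloned(base, "book")
    println(personOwn.get(RowTagKey)) // Some(person)
    println(bookOwn.get(RowTagKey))   // Some(book)
  }
}
```

In the shared case a read configured for `person` can end up scanning for `book` tags, find no rows, and infer an empty schema, which is the symptom the repro script checks for.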

HyukjinKwon

LGTM, thanks @sandeep-katta0102

HyukjinKwon · 2022-06-02T11:35:45Z

cc @srowen FYI if you find some time to take a look 🙏

HyukjinKwon · 2022-06-03T00:27:22Z

@srowen just out of curiosity, when do we roughly plan to have the next release?

srowen · 2022-06-03T00:32:44Z

No particular schedule -- on demand. Is this a sorta important fix? It's easy to roll a new release, and it has been 7 months or so since the last one, so seems OK to me.

HyukjinKwon · 2022-06-03T00:39:27Z

Not super critical, but I think it's good to have one ... could we make a release maybe? I will take a look and try the release around next week if you can't find time to take a look 👍

srowen · 2022-06-03T01:27:50Z

OK I can do it tomorrow I think

HyukjinKwon · 2022-06-03T01:37:16Z

Thank you

srowen · 2022-06-03T17:20:37Z

Done, 0.15.0 is released with this change

HyukjinKwon · 2022-06-04T00:13:21Z

Thanks!!!!

Clone hadoopConf and use

451d823

HyukjinKwon approved these changes Jun 2, 2022


srowen approved these changes Jun 2, 2022


srowen merged commit 1e25d7b into databricks:master Jun 2, 2022

srowen assigned sandeep-katta0102 Jun 2, 2022

srowen added the bug label Jun 2, 2022

srowen added this to the 0.15.0 milestone Jun 2, 2022

srowen merged 1 commit into databricks:master from sandeep-katta0102:master
