This repository was archived by the owner on Mar 24, 2025. It is now read-only.

attributes are ignored when specifying a schema #375

@Jeongmin-Lee

I want to read a nested XML element as a raw String, but its attributes are dropped.
This is the sample XML:

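```xml
<record>
  <title>hihihihi &amp; tttt  </title>
  <info type="text" word="apps">
    <test1>test_</test1>
    <test2>tttt</test2>
    <test3 ab="test" xvf="sample"><aa>sample</aa></test3>
  </info>
</record>
```

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Map <info> to a plain StringType column instead of a nested struct
val schema = StructType(Array(
  StructField("title", StringType),
  StructField("info", StringType)
))

// `spark` is the SparkSession (e.g. the one spark-shell provides)
val sample = spark.read.format("com.databricks.spark.xml").option("rowTag", "record").schema(schema).load("temp/sample.xml")

sample.show(false)
```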

Result:

```text
+-----------------+---------------------------------------------------------------------+
|title            |info                                                                 |
+-----------------+---------------------------------------------------------------------+
|hihihihi & tttt  |<test1>test_</test1><test2>tttt</test2><test3><aa>sample</aa></test3>|
+-----------------+---------------------------------------------------------------------+
```

Expected `info` value (the attributes of `<test3>` are kept):

```xml
<test1>test_</test1><test2>tttt</test2><test3 ab="test" xvf="sample"><aa>sample</aa></test3>
```

How can I get the complete nested XML text, including attributes?
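One possible workaround, not something spark-xml itself provides: take the raw record text and extract the inner XML of `<info>` with the JDK's DOM parser, re-serializing every child element together with all of its attributes. The sketch below is a hypothetical helper (`InnerXml.innerXml` is an illustrative name, not a spark-xml API). It assumes each record is available as a well-formed XML string, skips comments and whitespace-only text, and re-escapes text content rather than reproducing the original bytes exactly.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.{Element, Node}

// Hypothetical helper: returns the inner XML of an element, attributes included.
object InnerXml {
  // Escape characters that must not appear literally in text or attribute values.
  private def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;")

  // Recursively serialize one DOM node, keeping every attribute of elements.
  private def serialize(node: Node, sb: StringBuilder): Unit = node.getNodeType match {
    case Node.ELEMENT_NODE =>
      val el = node.asInstanceOf[Element]
      sb.append('<').append(el.getTagName)
      val attrs = el.getAttributes
      for (i <- 0 until attrs.getLength) {
        val a = attrs.item(i)
        sb.append(' ').append(a.getNodeName).append("=\"").append(escape(a.getNodeValue)).append('"')
      }
      sb.append('>')
      serializeChildren(el, sb)
      sb.append("</").append(el.getTagName).append('>')
    case Node.TEXT_NODE | Node.CDATA_SECTION_NODE =>
      // Drop whitespace-only text (indentation); CDATA is emitted as escaped text.
      if (node.getNodeValue.trim.nonEmpty) sb.append(escape(node.getNodeValue))
    case _ => () // comments, processing instructions, etc. are skipped
  }

  private def serializeChildren(el: Element, sb: StringBuilder): Unit = {
    val children = el.getChildNodes
    for (i <- 0 until children.getLength) serialize(children.item(i), sb)
  }

  /** Inner XML of the first element named `tag`, or None if there is none. */
  def innerXml(xml: String, tag: String): Option[String] = {
    val factory = DocumentBuilderFactory.newInstance()
    // Refuse DOCTYPE declarations so untrusted input cannot pull in external entities.
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
    val doc = factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    doc.getDocumentElement.normalize() // merge adjacent text nodes
    val matches = doc.getElementsByTagName(tag)
    if (matches.getLength == 0) None
    else {
      val sb = new StringBuilder
      serializeChildren(matches.item(0).asInstanceOf[Element], sb)
      Some(sb.toString)
    }
  }
}
```

Called on the sample record above (passed in as a string), `InnerXml.innerXml(xml, "info")` returns `Some` of the expected `info` value, with `ab="test" xvf="sample"` intact. To use it from Spark, the function could be wrapped with `org.apache.spark.sql.functions.udf` and applied to a column holding raw record strings; how to obtain those strings (for example via `sparkContext.wholeTextFiles` when each file holds a single record) depends on the input layout and is an assumption here.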
