This repository was archived by the owner on Mar 24, 2025. It is now read-only.

Add XSD validation to from_xml. Clarify semantics of parse mode for from_xml (#433)

Merged: srowen merged 4 commits into databricks:master from srowen:Issue432 on Jan 31, 2020

@srowen (Collaborator) commented Jan 30, 2020

Add support for XSD validation (rowValidationXSDPath) when using functions like from_xml.

Also, improve and clarify the semantics of parse mode when using from_xml. If the parse mode is PERMISSIVE but the "corrupt record" column is not in the supplied schema, do not raise an error. Instead, fall back to DROPMALFORMED behavior, so a malformed record results in a null value.
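The fallback rule described above can be modeled in a few lines. This is only a standalone sketch of the stated semantics, not the code in this PR: the function names `effective_parse_mode` and `handle_malformed` are hypothetical, and the corrupt-record column name `_corrupt_record` is assumed for illustration.

```python
from typing import Dict, Optional, Sequence

# Assumed name of the "corrupt record" column (illustrative constant for this sketch).
CORRUPT_RECORD_COLUMN = "_corrupt_record"

VALID_MODES = ("PERMISSIVE", "DROPMALFORMED", "FAILFAST")


def effective_parse_mode(
    requested_mode: str,
    schema_fields: Sequence[str],
    corrupt_column: str = CORRUPT_RECORD_COLUMN,
) -> str:
    """Return the mode that actually applies to a malformed record."""
    mode = requested_mode.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown parse mode: {requested_mode!r}")
    # PERMISSIVE needs somewhere to store the raw record. Without the
    # corrupt-record column in the schema, behave like DROPMALFORMED.
    if mode == "PERMISSIVE" and corrupt_column not in schema_fields:
        return "DROPMALFORMED"
    return mode


def handle_malformed(
    raw_xml: str,
    requested_mode: str,
    schema_fields: Sequence[str],
    corrupt_column: str = CORRUPT_RECORD_COLUMN,
) -> Optional[Dict[str, Optional[str]]]:
    """Model the value produced for one malformed record."""
    mode = effective_parse_mode(requested_mode, schema_fields, corrupt_column)
    if mode == "FAILFAST":
        raise ValueError(f"Malformed XML record: {raw_xml!r}")
    if mode == "DROPMALFORMED":
        # The parsed value is null.
        return None
    # PERMISSIVE with the column present: null fields plus the raw text.
    row: Dict[str, Optional[str]] = {name: None for name in schema_fields}
    row[corrupt_column] = raw_xml
    return row
```

For example, `handle_malformed("<a>", "PERMISSIVE", ["a"])` returns `None` in this model, while passing `["a", "_corrupt_record"]` as the schema fields returns a row that keeps the raw text in `_corrupt_record`.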

CC @HyukjinKwon purely FYI. No action needed.

@HyukjinKwon (Member) left a comment

Looks good if tests pass

@codecov-io commented Jan 31, 2020

Codecov Report

Merging #433 into master will increase coverage by 0.15%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #433      +/-   ##
==========================================
+ Coverage    88.3%   88.46%   +0.15%     
==========================================
  Files          17       17              
  Lines         804      806       +2     
  Branches       68       71       +3     
==========================================
+ Hits          710      713       +3     
+ Misses         94       93       -1

| Impacted Files | Coverage Δ |
| --- | --- |
| ...m/databricks/spark/xml/parsers/StaxXmlParser.scala | 97.6% <100%> (+0.79%) ⬆️ |
| ...a/com/databricks/spark/xml/XmlDataToCatalyst.scala | 75% <0%> (+5%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1693208...55a6296.

@srowen srowen merged commit a71f735 into databricks:master Jan 31, 2020
@srowen srowen deleted the Issue432 branch February 29, 2020 14:55