This repository was archived by the owner on Mar 24, 2025. It is now read-only.

XSD -> schema tool with a test #457

Merged
srowen merged 6 commits into databricks:master from srowen:Issue449 on Jun 18, 2020

Conversation

Collaborator

@srowen srowen commented Jun 16, 2020

Relates to #449 and @seddonm1's gist.

Adds a basic XSDToSchema utility that can derive a Spark schema from some XSDs: those that define a table-like schema using simple, complex, and sequence types.

* @param xsdFile XSD file
* @return Spark-compatible schema
*/
@Experimental
Collaborator Author

I added these experimental "API" methods

Constants.XSD_ANYTYPE =>
StructField(baseName, StringType)
}
case _ => StructField(baseName, StringType)
Collaborator Author

This is about the only substantive change, which helps with 'any' type fields with no further content restrictions. Just treat them as strings. Other changes to the code are cosmetic simplifications (IMHO)

Contributor

Agree.
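The fallback agreed on in this thread can be sketched roughly as follows. This is an illustration only, not the PR's code: the local `DataType`, `StringType`, `BooleanType` and `StructField` definitions are stand-ins that mirror names from `org.apache.spark.sql.types` so the sketch compiles without Spark, the object name `XsdFallbackSketch` is hypothetical, and the string keys replace the `Constants.XSD_*` values that the real code matches on.

```scala
object XsdFallbackSketch {
  // Local stand-ins for a few Spark SQL types (assumption: used here only
  // so the sketch runs without Spark on the classpath).
  sealed trait DataType
  case object BooleanType extends DataType
  case object StringType extends DataType
  final case class StructField(name: String, dataType: DataType)

  // Hypothetical helper: maps an XSD type name to a field.
  def toField(baseName: String, xsdType: String): StructField =
    xsdType match {
      case "boolean" => StructField(baseName, BooleanType)
      // An 'any' type with no further content restriction becomes a string.
      case "anyType" => StructField(baseName, StringType)
      // Every type without an explicit case also falls back to a string.
      case _ => StructField(baseName, StringType)
    }
}
```

The explicit `anyType` case is redundant with the default here; it is kept only to make the intent visible, as in the quoted diff.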

@srowen srowen changed the title from "[DO-NOT-MERGE] Prototype of seddonm1's XSD -> schema tool with a test" to "XSD -> schema tool with a test" Jun 17, 2020
@srowen srowen added this to the 0.10.0 milestone Jun 17, 2020
@srowen srowen linked an issue Jun 17, 2020 that may be closed by this pull request
@srowen srowen closed this Jun 17, 2020
@srowen srowen reopened this Jun 17, 2020
matchType match {
case Constants.XSD_BOOLEAN => StructField(baseName, BooleanType)
case Constants.XSD_BYTE => StructField(baseName, BinaryType)
case Constants.XSD_DATE |
Contributor

Looks like you removed Date/DateTime? You could release this with #448

Collaborator Author

I just let them fall into the general string case. But yeah maybe just as well to try them as a date or time type.
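The two options discussed here, letting date types fall through to strings or mapping them to typed columns, might look like the sketch below. Everything in it is an assumption for illustration: the stand-in types mirror names from `org.apache.spark.sql.types`, `XsdDateSketch` and the `typedDates` flag are hypothetical, and pairing `date` with `DateType` and `dateTime` with `TimestampType` is one plausible choice rather than what the PR settled on.

```scala
object XsdDateSketch {
  // Local stand-ins for Spark SQL types so the sketch runs without Spark.
  sealed trait DataType
  case object StringType extends DataType
  case object DateType extends DataType
  case object TimestampType extends DataType
  final case class StructField(name: String, dataType: DataType)

  // typedDates = false reproduces the "fall into the general string case"
  // behaviour; typedDates = true tries the values as date or time types.
  def toField(baseName: String, xsdType: String, typedDates: Boolean): StructField =
    xsdType match {
      case "date" if typedDates     => StructField(baseName, DateType)
      case "dateTime" if typedDates => StructField(baseName, TimestampType)
      case _                        => StructField(baseName, StringType)
    }
}
```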

Collaborator Author

@srowen srowen left a comment

Let's get this in as a start. We can iterate as we get more tests that may exercise it better.

@srowen srowen merged commit 160a092 into databricks:master Jun 18, 2020
@srowen srowen deleted the Issue449 branch June 18, 2020 14:11

vim89 commented Jul 29, 2020

Can I pull this version and compile it myself before 0.10 gets released?

Collaborator Author

srowen commented Jul 29, 2020

Yes, you can just build it with SBT and try it. I should make a 0.10 release soon.


Development

Successfully merging this pull request may close these issues.

PROPOSAL: Spark Schema generation from XSD Schema
