This repository was archived by the owner on Mar 24, 2025. It is now read-only.

PROPOSAL: Spark Schema generation from XSD Schema #449

@seddonm1

Now that spark-xml supports XML Schema definitions (XSD files), it would be good to use the schema information to generate the Spark DataFrame schema, instead of either relying on schema inference or creating the schema manually.

To do this I think we need a few things:

  • agreement that this is an idea worth pursuing.
  • agreement that it is OK to add a dependency on a library such as Apache Web Services XmlSchema to parse the XSD.
  • sufficient tests.

I have already built a proof of concept that correctly parses the XSD to a StructType using the XmlSchema library above (building it surfaced issues with the DateTime and Date types, see #448). I have tested this against a set of ISO20022 messages.
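
To make the idea concrete, here is a minimal sketch of the XSD-to-schema mapping. It is not the proof of concept itself: it uses Python's standard-library XML parser rather than the XmlSchema library, and plain dictionaries stand in for a Spark StructType. The type mapping table, the function names, and the example XSD are all assumptions made for illustration. It only handles named complex types with an `xs:sequence`, so real-world XSDs would need much more.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed mapping from XSD built-in types to Spark SQL type names.
XSD_TO_SPARK = {
    "string": "string",
    "boolean": "boolean",
    "int": "integer",
    "long": "long",
    "decimal": "decimal(38,18)",
    "double": "double",
    "date": "date",
    "dateTime": "timestamp",
}


def local_name(qname):
    """Strip a namespace prefix such as 'xs:' from a type name."""
    return qname.split(":")[-1]


def element_to_field(element, complex_types):
    """Convert one xs:element into a field description (hypothetical helper)."""
    type_name = local_name(element.get("type", "string"))
    if type_name in complex_types:
        data_type = complex_to_struct(complex_types[type_name], complex_types)
    else:
        # Unknown simple types fall back to string in this sketch.
        data_type = XSD_TO_SPARK.get(type_name, "string")
    # Anything other than maxOccurs="1" is treated as a repeated element.
    if element.get("maxOccurs", "1") != "1":
        data_type = {"type": "array", "elementType": data_type}
    nullable = element.get("minOccurs", "1") == "0"
    return {"name": element.get("name"), "type": data_type, "nullable": nullable}


def complex_to_struct(complex_type, complex_types):
    """Convert a named xs:complexType with an xs:sequence into a struct."""
    children = complex_type.findall(f"{XS}sequence/{XS}element")
    fields = [element_to_field(child, complex_types) for child in children]
    return {"type": "struct", "fields": fields}


def xsd_to_schema(xsd_text):
    """Build a schema description from the first top-level xs:element."""
    root = ET.fromstring(xsd_text)
    complex_types = {ct.get("name"): ct for ct in root.findall(f"{XS}complexType")}
    top = root.find(f"{XS}element")
    return element_to_field(top, complex_types)["type"]


# Made-up example schema, not taken from ISO20022.
EXAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="payment" type="Payment"/>
  <xs:complexType name="Payment">
    <xs:sequence>
      <xs:element name="id" type="xs:string"/>
      <xs:element name="created" type="xs:dateTime"/>
      <xs:element name="amount" type="xs:decimal" minOccurs="0"/>
      <xs:element name="note" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

if __name__ == "__main__":
    for field in xsd_to_schema(EXAMPLE_XSD)["fields"]:
        print(field)
```

In this sketch `minOccurs="0"` becomes a nullable field and any `maxOccurs` other than 1 becomes an array. Whether those are the right rules for spark-xml is exactly the kind of question the tests would need to settle.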

My challenge is that I have only limited knowledge of XSD, so correct XSDs, valid XML files to run against, and some help with this would be very useful. I am not sure how representative the ISO20022 XSDs really are.
