This repository was archived by the owner on Mar 24, 2025. It is now read-only.
Now that `spark-xml` supports XML Schema definitions (XSD files), it would be good to use the schema information to generate the Spark DataFrame schema instead of either relying on schema inference or having to create the schema manually.

To do this I think we need a few things:

I have already built a proof of concept which correctly parses the XSD to a StructType using the `XmlSchema` library above (which identified issues with the DateTime and Date types, #448). I have tested this against a set of ISO20022 messages.

My challenge is that I have only limited knowledge of XSD, so having correct XSD and valid XML files to run against, and some help with this, would be very useful. I am not sure how representative the `ISO20022` XSDs really are.
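
To make the idea concrete, here is a minimal sketch of the XSD-to-schema mapping. It is not the proof of concept mentioned above: it swaps the `XmlSchema` library for Python's standard `xml.etree.ElementTree`, and it only handles named complex types built from `xs:sequence`. The function names, the type mapping table, and the sample XSD are all assumptions made for illustration. The output is a plain dict in the JSON layout that Spark uses for schemas, which PySpark's `StructType.fromJson` should accept.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed mapping from XSD built-in types to Spark SQL type names (incomplete).
XSD_TO_SPARK = {
    "string": "string",
    "boolean": "boolean",
    "int": "integer",
    "long": "long",
    "double": "double",
    "decimal": "decimal(38,18)",
    "date": "date",
    "dateTime": "timestamp",
}


def local_name(qname):
    """Drop a namespace prefix such as 'xs:' from a type reference."""
    return qname.split(":")[-1]


def element_to_field(elem, complex_types):
    """Turn one xs:element into a Spark field description (a dict)."""
    type_name = local_name(elem.get("type", "string"))
    if type_name in complex_types:
        data_type = complex_to_struct(complex_types[type_name], complex_types)
    else:
        # Unknown simple types fall back to string in this sketch.
        data_type = XSD_TO_SPARK.get(type_name, "string")
    max_occurs = elem.get("maxOccurs", "1")
    if max_occurs == "unbounded" or int(max_occurs) > 1:
        # Repeated elements become arrays.
        data_type = {"type": "array", "elementType": data_type, "containsNull": True}
    return {
        "name": elem.get("name"),
        "type": data_type,
        "nullable": elem.get("minOccurs", "1") == "0",
        "metadata": {},
    }


def complex_to_struct(ctype, complex_types):
    """Turn a named xs:complexType with an xs:sequence into a struct description."""
    children = ctype.findall(f"{XS}sequence/{XS}element")
    return {"type": "struct", "fields": [element_to_field(c, complex_types) for c in children]}


def xsd_to_spark_schema(xsd_text, root_name):
    """Return the Spark schema dict for the top-level element called root_name."""
    schema = ET.fromstring(xsd_text)
    complex_types = {c.get("name"): c for c in schema.findall(f"{XS}complexType")}
    root = next(e for e in schema.findall(f"{XS}element") if e.get("name") == root_name)
    return element_to_field(root, complex_types)["type"]


# Hypothetical XSD, loosely shaped like a payment message; not taken from ISO20022.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Payment" type="PaymentType"/>
  <xs:complexType name="PaymentType">
    <xs:sequence>
      <xs:element name="Id" type="xs:string"/>
      <xs:element name="Amount" type="xs:decimal"/>
      <xs:element name="Created" type="xs:dateTime" minOccurs="0"/>
      <xs:element name="Note" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

spark_schema = xsd_to_spark_schema(SAMPLE_XSD, "Payment")
```

The sketch ignores attributes, `xs:choice`, anonymous inline types, restrictions, and recursive type definitions (which would recurse forever here), so a real implementation against ISO20022 XSDs would need considerably more than this.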