feat: Add Geometry & Geography Types #2859
Implement support for Iceberg v3 geospatial types as specified in the
Iceberg specification:
- Add GeometryType(crs) and GeographyType(crs, algorithm) to types.py
- Default CRS is "OGC:CRS84", default algorithm is "spherical"
- Types require format version 3 (minimum_format_version() returns 3)
- Values are stored as WKB (Well-Known Binary) bytes at runtime
- Avro schema conversion maps to "bytes"
- PyArrow conversion maps to large_binary()
- Add type string parsing for geometry('CRS') and geography('CRS', 'algo')
- Add visitor pattern support in schema.py and resolver.py
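To make the defaults and the type-string forms above concrete, here is a minimal standalone sketch. It is not the code from this PR: the class names `GeometryTypeSketch` and `GeographyTypeSketch`, the helper `parse_geo_type`, and the choice to render default-parameter types as a bare `geometry` / `geography` are all assumptions made for illustration. Only the default CRS, the default algorithm, and the `geometry('CRS')` / `geography('CRS', 'algo')` forms come from the description above.

```python
import re
from dataclasses import dataclass
from typing import Union

# Defaults as stated in the PR description.
DEFAULT_CRS = "OGC:CRS84"
DEFAULT_ALGORITHM = "spherical"


@dataclass(frozen=True)
class GeometryTypeSketch:
    """Hypothetical stand-in for a geometry type parameterized by CRS."""

    crs: str = DEFAULT_CRS

    def __str__(self) -> str:
        # Assumption: the default CRS is omitted from the string form.
        if self.crs == DEFAULT_CRS:
            return "geometry"
        return f"geometry('{self.crs}')"


@dataclass(frozen=True)
class GeographyTypeSketch:
    """Hypothetical stand-in for a geography type (CRS plus edge algorithm)."""

    crs: str = DEFAULT_CRS
    algorithm: str = DEFAULT_ALGORITHM

    def __str__(self) -> str:
        # Assumption: trailing default parameters are omitted.
        if self.algorithm != DEFAULT_ALGORITHM:
            return f"geography('{self.crs}', '{self.algorithm}')"
        if self.crs != DEFAULT_CRS:
            return f"geography('{self.crs}')"
        return "geography"


# Matches: geometry, geometry('CRS'), geography, geography('CRS'),
# and geography('CRS', 'algo').
_GEO_PATTERN = re.compile(
    r"^(geometry|geography)"
    r"(?:\(\s*'([^']*)'\s*(?:,\s*'([^']*)'\s*)?\))?$"
)


def parse_geo_type(text: str) -> Union[GeometryTypeSketch, GeographyTypeSketch]:
    """Parse a type string into one of the sketch types above."""
    match = _GEO_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Not a geospatial type string: {text!r}")
    kind, crs, algorithm = match.groups()
    crs = DEFAULT_CRS if crs is None else crs
    if kind == "geometry":
        if algorithm is not None:
            raise ValueError("geometry accepts only a CRS argument")
        return GeometryTypeSketch(crs)
    algorithm = DEFAULT_ALGORITHM if algorithm is None else algorithm
    return GeographyTypeSketch(crs, algorithm)
```

With this sketch, `str(parse_geo_type(s)) == s` holds for strings such as `geometry('EPSG:4326')`, which is the kind of round-trip property the tests in this PR are described as checking.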
Note: JSON single-value encoding (WKB<->WKT) raises NotImplementedError
as it requires external libraries (e.g., Shapely) which are not included
to avoid heavy dependencies.
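For readers unfamiliar with WKB, the following sketch shows what the raw bytes for a single 2D point look like, using only the standard library. The helper names `point_to_wkb` and `wkb_to_point` are hypothetical and not part of PyIceberg; the sketch handles only the plain Point geometry, whereas general WKB/WKT conversion is the part that would need a library such as Shapely.

```python
import struct

WKB_BIG_ENDIAN = 0     # byte-order flag: XDR
WKB_LITTLE_ENDIAN = 1  # byte-order flag: NDR
WKB_POINT = 1          # geometry type code for a 2D Point
WKB_POINT_SIZE = 21    # 1 flag byte + 4 type bytes + two 8-byte doubles


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB."""
    # Layout: uint8 byte order, uint32 geometry type, float64 x, float64 y.
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_to_point(data: bytes) -> tuple:
    """Decode a WKB 2D point, honoring the leading byte-order flag."""
    if len(data) != WKB_POINT_SIZE:
        raise ValueError(f"Expected {WKB_POINT_SIZE} bytes, got {len(data)}")
    if data[0] == WKB_LITTLE_ENDIAN:
        prefix = "<"
    elif data[0] == WKB_BIG_ENDIAN:
        prefix = ">"
    else:
        raise ValueError(f"Unknown byte-order flag: {data[0]}")
    geom_type, x, y = struct.unpack(prefix + "Idd", data[1:])
    if geom_type != WKB_POINT:
        raise ValueError(f"Not a WKB Point (type code {geom_type})")
    return (x, y)
```

Bytes produced this way are the sort of value that would be written to a column of these types, since the description above maps them to Avro "bytes" and PyArrow large_binary().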
|
I hope the maintainers will be quicker to integrate this PR than they were with mine: #2224
|
Thanks for the PR! Great to see this. The PR already looks pretty modular, but if there's any way to break it into components, that would be very helpful.
Fokko left a comment
I'm not super familiar with geo support, but from a PyIceberg perspective this looks like a great first start 👍
|
Thank you for the reviews! This PR is the groundwork. The next phase, building on the existing GeometryType/GeographyType foundation, will be:
I'll be sure to gate the dependencies to keep things modular. My interest in this functionality is mostly personal: I've been playing around with exported Google Timeline data (which Google provides as JSON via Google Takeout) and would like to set something up for regular export and storage in a much more performant and stable format. Happy to change, move, or remove the documentation I supplied as well. I felt the RFC was necessary to include for discussion, but I'm not sure whether the team wants to keep those docs in the repo.
Closes #1820
Rationale for this change
Apache Iceberg v3 introduces native geometry and geography primitive types.
This PR adds spec-compliant support for those types in PyIceberg, including:
A full design and scope discussion is available in the accompanying RFC:
📄 RFC: Iceberg v3 Geospatial Primitive Types
The RFC documents scope, non-goals, compatibility constraints, and known limitations.
Are these changes tested?
Yes.
__str__/__repr__ round-tripping
Are there any user-facing changes?
Yes.
geometry and geography columns in Iceberg v3 schemas