-
Notifications
You must be signed in to change notification settings - Fork 3k
spec: add clarification about the Geometry type calculation #13227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@rdblue @szehon-ho please review :-) |
format/spec.md
Outdated
| | | **`fixed(L)`** | Fixed-length byte array of length L | | | ||
| | | **`binary`** | Arbitrary-length byte array | | | ||
| | [v3](#version-3) | **`geometry(C)`** | Geospatial features from [OGC – Simple feature access][1001]. Edge-interpolation is always linear/planar. See [Appendix G](#appendix-g-geospatial-notes). Parameterized by CRS C. If not specified, C is `OGC:CRS84`. | | | ||
| | [v3](#version-3) | **`geometry(C)`** | Geospatial features from [OGC – Simple feature access][1001]. Edge-interpolation is always linear/planar, and geometric calculations are performed using Cartesian mathematics along straight lines. See [Appendix G](#appendix-g-geospatial-notes). Parameterized by CRS C. If not specified, C is `OGC:CRS84`. | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
Edge-interpolation is always linear/planar, ie geometric calculations are performed using Cartesian mathematics along straight lines.
Then its more clear its a clarification rather than another fact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
| | | **`fixed(L)`** | Fixed-length byte array of length L | | | ||
| | | **`binary`** | Arbitrary-length byte array | | | ||
| | [v3](#version-3) | **`geometry(C)`** | Geospatial features from [OGC – Simple feature access][1001]. Edge-interpolation is always linear/planar. See [Appendix G](#appendix-g-geospatial-notes). Parameterized by CRS C. If not specified, C is `OGC:CRS84`. | | | ||
| | [v3](#version-3) | **`geometry(C)`** | Geospatial features from [OGC – Simple feature access][1001]. Edge-interpolation is always linear/planar, i.e., geometric calculations are performed using Cartesian mathematics along straight lines. See [Appendix G](#appendix-g-geospatial-notes). Parameterized by CRS C. If not specified, C is `OGC:CRS84`. | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we mention that the purpose of Cartesian mathematics is to avoid the need to interpret the CRS?
Something like:
The CRS may be used to interpret points as geographic locations, but does not affect calculations, which are always Cartesian.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed. Please review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, but I think @mkaravel suggests some good changes to make to clarify the bounding boxes for geometry.
|
@jiayuasu Thank you for driving this! I am not able to comment directly at appropriate place since there are no changes there, which is why I writing this as a high level comment in the PR. In line 684 of the existing spec (prior to this PR) we state:
I believe that in order to make the spec self-consistent it would make sense to rephrase this and specify that How about the following slight modification?
|
|
@mkaravel I think this is relevant to our discussion in Slack. After giving some thoughts, I think it makes sense to consider that together with the |
|
@jiayuasu, can you help me understand why we wouldn't make the clarifications @mkaravel suggests in this PR? If we don't expect the CRS to an effect on calculations for |
|
@rdblue I think it makes sense to drop On the other hand, I'd like to also add I am fine adding the sentence to disallow wraparound here and create another PR to add Please let me know what you think |
@jiayuasu I am obviously (since I have suggested it in the past) totally on board with this plan. This will make the semantics of the types very clean (geometry always Cartesian, calculations independent of the CRS) and allow geography to have an easy, practical, and meaningful way to deal with the antimeridian geometries (also very clean IMHO).
My thinking when I proposed the changes above in this PR was to make it self-consistent and self-contained regarding the @rdblue Are you okay with this? Thoughts? |
|
That's great there's agreement from @jiayuasu on adding the linear algorithm to Geography . Im also ok either way to add the change here or in another pr. If we have agreement we can do it here to minimize disruptions, as the changes seem related. The only thing to check I had is, is it safe to add a new Edge Interpolation algorithm after V3 is adopted. I am thinking yes, as implementations would just not be able to read if its a CRS or Edge algorithm they dont know. The other thing is, I guess adding the new algorithm seems its large enough to requires a vote. cc @rdblue for thoughts? |
|
I agree that we can move forward here and then discuss the change to add linear separately. I think clarifying how geometry works (that it is always independent of the CRS and cartesian) is independently valuable. Then we can follow up to add linear if it is needed next. |
|
Let's finalize this PR then by completing Cartesian semantics for GEOMETRY and do GEOGRAPHY as a follow up. |
|
Any chance linear interpolation for GEOGRAPHY can make it to Iceberg v3? |
|
Ping geo folks and see if they have comments @jorisvandenbossche |
|
Thank you for the ping! Just a note that I don't think this clarification is needed...a Cartesian bounding box like the one described here is perfectly legal in for the geometry type in GeoArrow's box type, GeoParquet's file metadata, GeoJSON, Parquet GeoStatistics, and the current Iceberg spec (even though all of these allow for the wraparound case as well). I personally found it to be a nice simplification that evaluating a filter expression against a Parquet GeoStatistics is the same whether the type is GEOMETRY or GEOGRAPHY. I'll save my comments regarding the possibility of a "linear" edge algorithm for the GEOGRAPHY type for the PR implementing that change (the short version is: call it something other than GEOGRAPHY because that's not what that type means in any other database, file format, or spec of which I am aware). |
How does a writer know what bounding box to produce? In all of those specs, is it the writer's decision whether to use a purely cartesian bounding box or one that wraps around? What I want to avoid is needing to interpret the CRS to know how to compute the bounding box. If writers are allowed to produce a box that is cartesian and ignore whether longitude is periodic, then maybe that is fine because it is the norm. But my expectation is that this would be determined by the CRS and if we wanted to produce a simpler bounding box it would be incorrect without this change. |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Background
Several engines including Spark/Sedona and Presto explicitly mention the calculations on geometries are done via Cartesian mathematics. It would be good if we add clarification to the Iceberg spec
Proposed change
Added the following sentence