GEOMETRY Rework: Part 7 - More Coordinate Reference System Support #20721
Mytherin merged 16 commits into duckdb:v1.5-variegata
Who canceled my CI?? >:(

@carlopi Do you think this failure is also curl or unrelated?

Failures in What's weird about this failure is the "Expected 9, Actual 12". I suspect there might be some concurrency issues in the JSON reader now that CachingFileSystem kicked in.

@Maxxen: short answer is: not on you.
Mytherin left a comment:
Thanks! The PR looks good to me - could you just re-run the settings generation script?
…o crs-provider
@Mytherin I think we should be good to go now

Thanks!
This is from a CI failure. The slow test was introduced in this PR, and it verifies (1) the correctness of the httpfs extension with logarithmic read buffer growth, and (2) that the actual GET count stays below a certain threshold, which checks the effectiveness of the buffer growth.

@dentiny, thanks for following up on this. There was originally a problem where the patch was not correctly applied, and that caused the 8 vs 9 failures. What's still open is that I have seen, for example at https://github.com/duckdb/duckdb/actions/runs/21678139259/job/62510582938?pr=20810#step:28:168, another failure with 12 vs 8. My question would be: are there reasonable cases where 12 (or 13, which also happened) could be returned, or does that mean there is some concurrency issue where multiple threads are reading the file and the behaviour changes depending on timing? I was also unable to repro locally: the results there are consistent, even when varying the number of threads, but in CI it seems to fail every 20 runs or so.

I'm not 100% sure but I vaguely remember seeing this test fail with 12 or 13 requests when GH actions was having issues. It could also be related to retrying requests? I would say the best way forward is to check that the number of requests is < 20. It was 100+ before the logarithmic growth so that should be fine.
Date: 2026-02-04 22:36:23 +0100 `GEOMETRY` Rework: Part 7 - More Coordinate Reference System Support (duckdb/duckdb#20721) Bump and remove patch httpfs (duckdb/duckdb#20790)
This is a follow-up PR that builds on top of #20143. Please have a look at #19136 for the context behind this PR.
This PR makes additional changes to how coordinate systems are handled for the `GEOMETRY` type.

Shrinking, Expansion, and Identification of coordinate systems
In the initial iteration of parameterizing geometry types with coordinate systems, we basically allowed any string to be stored as the CRS, and then tried to parse and identify the format (projjson, wkt2:2019, auth:code, srid) before extracting a "name" or "identifier" which we stored separately to use when printing the type.
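To make the "parse and identify the format" step concrete, here is a minimal Python sketch of how the four formats might be told apart. It is only an illustration: the function name `classify_crs`, the regular expressions, and the list of WKT2 keywords are assumptions made for this example, not DuckDB's actual detection logic.

```python
import json
import re

# Leading keywords that commonly open a WKT2 CRS definition (assumed, not exhaustive).
_WKT2_KEYWORDS = r"(GEOGCRS|GEOGRAPHICCRS|GEODCRS|GEODETICCRS|PROJCRS|PROJECTEDCRS|COMPOUNDCRS|BOUNDCRS|VERTCRS|ENGCRS)"


def classify_crs(text: str) -> str:
    """Guess which format a CRS string uses (hypothetical heuristic)."""
    s = text.strip()
    if s.startswith("{"):
        # PROJJSON is a JSON document; anything that fails to parse is not PROJJSON.
        try:
            json.loads(s)
            return "projjson"
        except json.JSONDecodeError:
            return "unknown"
    if re.fullmatch(r"\d+", s):
        # A bare integer is treated as an SRID.
        return "srid"
    if re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*:[A-Za-z0-9_]+", s):
        # Authority and code separated by a colon, e.g. EPSG:4326.
        return "auth:code"
    if re.match(_WKT2_KEYWORDS + r"\s*\[", s, re.IGNORECASE):
        return "wkt2:2019"
    return "unknown"


print(classify_crs("EPSG:4326"))
print(classify_crs("4326"))
```

The point of the sketch is that only PROJJSON and WKT2 inputs carry a full definition; an `auth:code` or SRID is just a reference that something else has to resolve.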
This has the major downside that the textual representation of a geometry type (or SQL schema containing geometry types) no longer round-trips. I.e. if you parse it back, you no longer get the same type. This is primarily a problem when doing an `EXPORT DATABASE`, `SUMMARIZE` or calling `.schema` in the shell. However, the alternative of always printing the full definition is also... untenable as it makes the SQL extremely unfriendly to read.

The compromise implemented in this PR is to always print what's actually stored in the type info, but also try to "shrink" the actual CRS definition to e.g. its `auth:code` when parsing a CRS, if the definition is a CRS that we recognize (and should therefore be able to "expand" into a full definition again later).

As an example:
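The shrink and expand steps can be pictured with a small Python sketch. Everything in it is hypothetical: the `KNOWN` table, the abbreviated stand-in definitions, and the choice to recognize a definition by its PROJJSON `id` member are assumptions for illustration, not how DuckDB or `spatial` actually match definitions.

```python
import json

# Hypothetical table of recognized coordinate systems, keyed by auth:code.
# The values are abbreviated stand-ins, not complete PROJJSON documents.
KNOWN = {
    "OGC:CRS84": {
        "type": "GeographicCRS",
        "name": "WGS 84 (CRS84)",
        "id": {"authority": "OGC", "code": "CRS84"},
    },
    "EPSG:4326": {
        "type": "GeographicCRS",
        "name": "WGS 84",
        "id": {"authority": "EPSG", "code": 4326},
    },
}


def shrink(crs: str) -> str:
    """Return the auth:code for a recognized definition, else the input unchanged."""
    s = crs.strip()
    if s in KNOWN:
        return s  # already in its short form
    try:
        doc = json.loads(s)
    except ValueError:
        return crs  # not JSON; this sketch only shrinks PROJJSON-like input
    ident = doc.get("id") if isinstance(doc, dict) else None
    if isinstance(ident, dict):
        key = f"{ident.get('authority')}:{ident.get('code')}"
        if key in KNOWN:
            return key
    return crs  # complete but unrecognized: store it as given


def expand(crs: str) -> str:
    """Return the full definition for a recognized auth:code, else the input unchanged."""
    if crs in KNOWN:
        return json.dumps(KNOWN[crs], sort_keys=True)
    return crs


full = json.dumps(KNOWN["EPSG:4326"])
stored = shrink(full)           # what would be kept in the type info
print(stored)
print(shrink(stored) == stored)  # printing and re-parsing yields the same value
```

Because the stored value is already the short form, printing it and parsing it back is a no-op, which is the round-trip property the paragraph above is after. Unrecognized full definitions pass through `shrink` untouched, so they also round-trip, just less readably.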
We now also throw an error by default if we try to create a geometry type with an incomplete, unrecognized CRS, i.e. an `auth:code` or opaque identifier that we don't know. We always allow PROJJSON or WKT2 definitions even if we don't recognize them, as they are complete in the sense that they can be interpreted on their own, but we don't shrink them if we don't know them. This handling of unrecognized coordinate system identifiers can be controlled with the `ignore_unknown_crs` setting.

This means that you can still just pass around complete PROJJSON or WKT2 definitions and deal with the ugliness if you really want to use your own custom coordinate systems, but in practice 99.9% of coordinate systems will be recognized by `spatial`.

While you can't define your own "known" coordinate systems through SQL, you can do it through your own extension (or application that embeds DuckDB) by providing instances of the new `CoordinateSystemCatalogEntry` in the system catalog.

Coordinate System Catalog Entries
There is now a new type of catalog entry to store coordinate system definitions, the `CoordinateSystemCatalogEntry`. These can be registered by extensions to provide additional coordinate system definitions. For example, the `spatial` extension now registers its list of EPSG and OGC-defined coordinate systems by lazily pulling them from the embedded `PROJ` library.

But this PR also adds "OGC:CRS84" and "OGC:CRS83" definitions in core. This list of built-in definitions may or may not be extended in the future, or we may create a separate dedicated extension that only supplies coordinate system definitions (similar to `icu` and `encodings`).

Support for CRS propagation through (Geo)Arrow import/export
This PR also adds support for propagating the CRS when exporting/importing from (Geo)Arrow. I had to make some changes to drill-down the client context into the arrow extension code, but we always have it available when resolving extension types anyway so the changes only really touch the internals.
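For readers unfamiliar with how a CRS travels with Arrow data, the following Python sketch shows the field-level metadata that, as far as I understand the GeoArrow extension-type specification, carries it: an extension name plus a JSON string with `crs` and `crs_type` members. The helper names are hypothetical, and the choice of `geoarrow.wkb` as the extension name is an assumption about what DuckDB emits, not something stated in this PR.

```python
import json


def geoarrow_field_metadata(crs, crs_type):
    """Build Arrow field metadata for a geometry column (hypothetical helper).

    `crs` may be a string (e.g. an auth:code) or a dict (a PROJJSON object);
    `crs_type` names its format, e.g. "authority_code" or "projjson".
    """
    return {
        "ARROW:extension:name": "geoarrow.wkb",  # assumed extension name
        "ARROW:extension:metadata": json.dumps({"crs": crs, "crs_type": crs_type}),
    }


def crs_from_field_metadata(metadata):
    """Read the CRS back out of field metadata; returns None if absent."""
    raw = metadata.get("ARROW:extension:metadata")
    if not raw:
        return None
    return json.loads(raw).get("crs")


meta = geoarrow_field_metadata("OGC:CRS84", "authority_code")
print(crs_from_field_metadata(meta))
```

On export the type's stored CRS would be written into this metadata, and on import it would be read back and attached to the resulting `GEOMETRY` type; the exact mapping DuckDB uses is not shown here.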
A nice consequence of this is that `spatial`'s `GDAL` integration automatically handles CRS propagation now too, as it's based on Arrow, meaning that `ST_Read()` outputs `GEOMETRY` columns with the CRS specified by the underlying file, and `COPY ... TO (FORMAT GDAL)` also encodes the CRS properly.

Update `spatial` to v1.5 Branch

This PR also adds back and bumps `spatial` to the v1.5 branch.