Skip to content

GEOMETRY Rework: Part 7 - More Coordinate Reference System Support#20721

Merged
Mytherin merged 16 commits intoduckdb:v1.5-variegatafrom
Maxxen:crs-provider
Feb 4, 2026
Merged

GEOMETRY Rework: Part 7 - More Coordinate Reference System Support#20721
Mytherin merged 16 commits intoduckdb:v1.5-variegatafrom
Maxxen:crs-provider

Conversation

@Maxxen
Copy link
Member

@Maxxen Maxxen commented Jan 28, 2026

This is a followup PR that builds on top of #20143, please have a look at #19136 for the context behind this PR.

This PR makes additional changes to how coordinate systems are handled for the GEOMETRY type.

Shrinking, Expansion, and Identification of coordinate systems

In the initial iteration of parameterizing geometry types with coordinate systems, we basically allowed any string to be stored as the CRS, and then tried to parse and identify the format (projjson, wkt2:2019, auth:code, srid) before extracting a "name" or "identifier" which we stored separately to use when printing the type.

This has the major downside that the textual representation of a geometry type (or SQL schema containing geometry types) no longer round-trips. I.e. if you parse it back, you no longer get the same type. This is primarily a problem when doing a EXPORT DATABASE, SUMMARIZE or calling .schema in the shell. However, the alternative of always printing the full definition is also... untenable as it makes the SQL extremely unfriendly to read.

The compromise implemented in this PR is to alway print what's actually stored in the type info, but also try to "shrink" the actual CRS definition to e.g. its auth:code when parsing a CRS, if the definition is a CRS that we recognize (and should therefore be able to "expand" into a full definition again later).

As an example:

  • If we read a GeoParquet file, which has the default coordinate system "CRS84" in projjson format, we only store "OGC:CRS84" in the geometry type, because DuckDB knows how to convert "OGC:CRS84" back into the projjson (or WKT2) later if needed.
  • But if we read a GeoParquet file with some other unrecognized CRS definition (e.g. "SOME_ORG:1337", but in projjson) we store the full projjson, and simply live with the fact that the type will be hideous to display.

We also by-default now throw an error if we try to create a geometry type with an incomplete unrecognized CRS. I.e. a auth:code or opaque identifier. We always allow PROJJSON or WKT2 definitions even if we don't recognize them, as they are complete in the sense that they can be interpreted on their own, but we don't shrink them if we don't know them. This handling of unrecognized coordinate system identifiers can be controlled with the ignore_unknown_crs setting.

This means that you can still just pass around complete projjson or wkt2 definitions and deal with the ugliness if you really want to use your own custom coordinate systems, but in practice 99.9% of coordinate systems will be recognized by spatial.

While you can't define your own "known" coordinate systems through SQL, you can do it through your own extension (or application that embeds DuckDB) by providing instances of the new CoordinateSystemCatalogEntry in the system catalog.

Coordinate System Catalog Entries

There is now a new type of catalog entry to store coordinate system definitions, the CoordinateSystemCatalogEntry. These can be registered by extensions to provide additional coordinate system definitions. For example, the spatial extension now registers its list of EPSG and OGC-defined coordinate systems by lazily pulling them from the embedded PROJ library.

But this PR also adds "OGC:CRS84" and "OGC:CRS83" definitions in core. This list of built-in definitions may or may not be extended in the future. Or we may create a separate dedicated extension that only supplies coordinate system definitions (similar to icu and encodings).

Support for CRS propagation through (Geo)Arrow import/export

This PR also adds support for propagating the CRS when exporting/importing from (Geo)Arrow. I had to make some changes to drill-down the client context into the arrow extension code, but we always have it available when resolving extension types anyway so the changes only really touch the internals.

A nice consequence of this is that spatial:s GDAL integration automatically handles CRS propagation now too as its based on arrow, meaning that ST_Read() outputs GEOMETRY columns with the CRS specified by the underlying file, and COPY ... TO (FORMAT GDAL) also encodes the CRS properly.

Update spatial to v1.5 Branch

This PR also adds back and bumps spatial to the v1.5 branch.

@Maxxen Maxxen changed the title Coordinate Reference System Work GEOMETRY Rework: Part 7 - More Coordinate Reference System Support Jan 28, 2026
@Maxxen Maxxen marked this pull request as ready for review February 2, 2026 18:49
@Maxxen
Copy link
Member Author

Maxxen commented Feb 2, 2026

Who canceled my CI?? >:(

@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 3, 2026 09:32
@Maxxen Maxxen marked this pull request as ready for review February 3, 2026 11:25
@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 3, 2026 21:44
@Maxxen Maxxen marked this pull request as ready for review February 3, 2026 23:44
@Maxxen
Copy link
Member Author

Maxxen commented Feb 4, 2026

@carlopi Do you think this failure is also curl or unrelated?

@carlopi
Copy link
Contributor

carlopi commented Feb 4, 2026

Failures in test/sql/json/table/internal_issue_6807.test_slow like "Expected 9, Actual 8" are known, an will be solved via #20790

What's weird about this failure is the "Expected 9, Actual 12", I suspect there might be some concurrency issues in the JSON reader now that CachingFileSystem kicked in.

1. /duckdb_build_dir/build/release/_deps/httpfs_extension_fc-src/test/sql/json/table/internal_issue_6807.test_slow:15
================================================================================
Wrong result in query! (/duckdb_build_dir/build/release/_deps/httpfs_extension_fc-src/test/sql/json/table/internal_issue_6807.test_slow:15)!
================================================================================
SELECT count(*) FROM duckdb_logs_parsed('HTTP') WHERE request.type = 'GET' GROUP BY request.type;
================================================================================
Mismatch on row 1, column count_star()(index 1)
12 <> 9
================================================================================
Expected result:
================================================================================
9

================================================================================
Actual result:
================================================================================
12

@lnkuiper / @dentiny: it's this failure mode also expected?

@carlopi
Copy link
Contributor

carlopi commented Feb 4, 2026

@Maxxen: short answer is: not on you.

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The PR looks good to me - could you just re-run the settings generation script?

@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 4, 2026 10:07
@Maxxen Maxxen marked this pull request as ready for review February 4, 2026 10:07
@Maxxen
Copy link
Member Author

Maxxen commented Feb 4, 2026

@Mytherin I think we should be good to go now

@Mytherin Mytherin merged commit 8e6b15b into duckdb:v1.5-variegata Feb 4, 2026
104 of 105 checks passed
@Mytherin
Copy link
Collaborator

Mytherin commented Feb 4, 2026

Thanks!

@dentiny
Copy link
Contributor

dentiny commented Feb 5, 2026

  1. /duckdb_build_dir/build/release/_deps/httpfs_extension_fc-src/test/sql/json/table/internal_issue_6807.test_slow:15
    ================================================================================
    Wrong result in query! (/duckdb_build_dir/build/release/_deps/httpfs_extension_fc-src/test/sql/json/table/internal_issue_6807.test_slow:15)!
    ================================================================================
    SELECT count(*) FROM duckdb_logs_parsed('HTTP') WHERE request.type = 'GET' GROUP BY request.type;
    ================================================================================
    Mismatch on row 1, column count_star()(index 1)
    8 <> 9
    ================================================================================
    Expected result:
    ================================================================================
    9
    ================================================================================
    Actual result:
    ================================================================================
    8

This is from CI failure,
I think I should be apply the patch in merged PR, but removed at a recent PR (well I actually ran it a few times locally, never saw request count 12 or 13, embarrassingly).

The slow test is introduced at this PR, which is used to verify (1) the correctness of httpfs extension with logarithmically read buffer growth; and (2) the actual GET count is less than certain threshold, used to check the effectiveness of buffer growth.
Do you think it's better to deflake the test by testing the request count is within certain range, instead of a precise value?
Example: dentiny/duckdb-httpfs#1

@carlopi
Copy link
Contributor

carlopi commented Feb 5, 2026

@dentiny, thanks for following up on this.

There was originally a problem where the patch was not correctly applied, and that caused the 8 vs 9 failures.
That's now solved.

What's still open is that I have seen for example at https://github.com/duckdb/duckdb/actions/runs/21678139259/job/62510582938?pr=20810#step:28:168 another failure with 12 vs 8.

My question would be: are there reasonable cases where 12 (or 13, also that happeend) could be returned, or that means there is some concurrency issue where multiple threads are reading the file, and depending on timing behaviour would change?

I was also unable to repro locally, there results are consistent also varying the number of threads, but in CI it seems to fail every 20 runs or so.
It might be possible to reproduce consistently running that single test in a loop in CI setting, but I have not tried that.

@lnkuiper
Copy link
Collaborator

lnkuiper commented Feb 9, 2026

I'm not 100% sure but I vaguely remember seeing this test fail with 12 or 13 requests when GH actions was having issues. It could also be related to retrying requests? I would say the best way forward is to check that the number of requests is < 20. It was 100+ before the logarithmic growth so that should be fine.

krlmlr added a commit to krlmlr/duckdb-r that referenced this pull request Feb 28, 2026
Date: 2026-02-04 22:36:23 +0100

`GEOMETRY` Rework: Part 7 - More Coordinate Reference System Support (duckdb/duckdb#20721)
Bump and remove patch httpfs (duckdb/duckdb#20790)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Feb 28, 2026
Date: 2026-02-04 22:36:23 +0100

`GEOMETRY` Rework: Part 7 - More Coordinate Reference System Support (duckdb/duckdb#20721)
Bump and remove patch httpfs (duckdb/duckdb#20790)
krlmlr added a commit to krlmlr/duckdb-r that referenced this pull request Feb 28, 2026
Date: 2026-02-04 22:36:23 +0100

`GEOMETRY` Rework: Part 7 - More Coordinate Reference System Support (duckdb/duckdb#20721)
Bump and remove patch httpfs (duckdb/duckdb#20790)
krlmlr added a commit to krlmlr/duckdb-r that referenced this pull request Feb 28, 2026
Date: 2026-02-04 22:36:23 +0100

`GEOMETRY` Rework: Part 7 - More Coordinate Reference System Support (duckdb/duckdb#20721)
Bump and remove patch httpfs (duckdb/duckdb#20790)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants