GEOMETRY Rework: Part 4 - Fixup Parquet Extension + Add Arrow Support #19476

Merged
Mytherin merged 7 commits into duckdb:main from Maxxen:core-geom-step-4-fixup-parquet
Nov 7, 2025

Maxxen (Member) commented Oct 23, 2025

This is a followup PR that builds on top of #19439. Please have a look at #19136 for the context behind this PR.

This PR fixes up the remaining issues in the parquet extension related to geometries. When reading geometry columns, we now push an expression column reader on top of the underlying blob column reader to perform the WKB parsing with `ST_GeomFromWKB`. `ST_GeomFromWKB` now actually checks that the input is valid WKB and also converts from big-endian WKB to little-endian if required. This can be optimized further, but it's good enough for now.
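
To illustrate the validate-and-normalize step described above, here is a minimal standalone sketch. It is not the code from this PR: `NormalizePointWkb` is a hypothetical helper that handles only a 2D WKB `POINT`, checks the byte-order marker, length, and geometry type, and rewrites big-endian input as little-endian.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not DuckDB's implementation): validates a 2D WKB POINT
// and returns an equivalent little-endian encoding.
// Layout: 1 byte order marker, 4 byte geometry type, two 8 byte doubles (x, y).
std::vector<uint8_t> NormalizePointWkb(const std::vector<uint8_t> &wkb) {
    constexpr std::size_t kPointSize = 1 + 4 + 8 + 8;
    if (wkb.size() != kPointSize) {
        throw std::invalid_argument("a 2D WKB POINT must be exactly 21 bytes");
    }
    const uint8_t order = wkb[0]; // 0 = big-endian (XDR), 1 = little-endian (NDR)
    if (order > 1) {
        throw std::invalid_argument("invalid WKB byte-order marker");
    }

    std::vector<uint8_t> out(wkb);
    assert(out.size() == kPointSize);
    if (order == 0) {
        // Flip the marker, then byte-swap each fixed-width field in place.
        out[0] = 1;
        std::reverse(out.begin() + 1, out.begin() + 5);   // geometry type
        std::reverse(out.begin() + 5, out.begin() + 13);  // x
        std::reverse(out.begin() + 13, out.begin() + 21); // y
    }

    // The type field is now little-endian regardless of the input order.
    const uint32_t type = static_cast<uint32_t>(out[1]) |
                          (static_cast<uint32_t>(out[2]) << 8) |
                          (static_cast<uint32_t>(out[3]) << 16) |
                          (static_cast<uint32_t>(out[4]) << 24);
    if (type != 1) {
        throw std::invalid_argument("this sketch only accepts WKB type 1 (POINT)");
    }
    return out;
}
```

A full parser would recurse through the other geometry types and their nested parts; the sketch only shows the shape of the check-then-swap logic for the simplest case.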

I've also added support for converting geometry columns to/from Arrow arrays with GeoArrow extension metadata. This code is basically lifted straight from the spatial extension.
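
For readers unfamiliar with how extension metadata travels with an Arrow schema, the following is a rough sketch rather than the code added in this PR. It assumes the Arrow C data interface layout for the `metadata` buffer (a native-endian int32 pair count, then a length-prefixed key and value for each pair) and the GeoArrow extension name `geoarrow.wkb`. The helper names `EncodeArrowMetadata`, `DecodeArrowMetadata`, and `IsGeoArrowWkb` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>

// Encodes key/value pairs in the Arrow C data interface metadata layout:
// int32 pair count, then per pair: int32 key length, key bytes,
// int32 value length, value bytes (all integers in native byte order).
std::string EncodeArrowMetadata(const std::map<std::string, std::string> &pairs) {
    std::string buf;
    auto put_i32 = [&buf](int32_t v) {
        char raw[sizeof(v)];
        std::memcpy(raw, &v, sizeof(v));
        buf.append(raw, sizeof(v));
    };
    auto put_str = [&](const std::string &s) {
        put_i32(static_cast<int32_t>(s.size()));
        buf.append(s);
    };
    put_i32(static_cast<int32_t>(pairs.size()));
    for (const auto &kv : pairs) {
        put_str(kv.first);
        put_str(kv.second);
    }
    return buf;
}

// Decodes the same layout. A null pointer means "no metadata".
// This sketch trusts the producer and does no bounds checking.
std::map<std::string, std::string> DecodeArrowMetadata(const char *data) {
    std::map<std::string, std::string> result;
    if (!data) {
        return result;
    }
    auto get_i32 = [&data]() {
        int32_t v;
        std::memcpy(&v, data, sizeof(v));
        data += sizeof(v);
        return v;
    };
    auto get_str = [&]() {
        const int32_t len = get_i32();
        assert(len >= 0);
        std::string s(data, static_cast<std::size_t>(len));
        data += len;
        return s;
    };
    const int32_t count = get_i32();
    for (int32_t i = 0; i < count; i++) {
        std::string key = get_str();
        result[key] = get_str();
    }
    return result;
}

// Returns true if the schema metadata marks the column as GeoArrow WKB.
bool IsGeoArrowWkb(const char *metadata) {
    const auto kv = DecodeArrowMetadata(metadata);
    const auto it = kv.find("ARROW:extension:name");
    return it != kv.end() && it->second == "geoarrow.wkb";
}
```

The CRS, when present, would sit in the JSON stored under the `ARROW:extension:metadata` key. The sketch ignores it, which mirrors the review comment further down about the CRS being dropped for now.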

Maxxen force-pushed the core-geom-step-4-fixup-parquet branch from 7d1d0ac to d45280e on October 28, 2025 08:03
Maxxen force-pushed the core-geom-step-4-fixup-parquet branch from 2966146 to 5a669ba on October 31, 2025 14:53
Maxxen changed the title from "GEOMETRY Rework: Part 4 - Fixup Parquet Extension" to "GEOMETRY Rework: Part 4 - Fixup Parquet Extension + Add Arrow Support" on Oct 31, 2025
Maxxen marked this pull request as ready for review on October 31, 2025 15:22
Maxxen force-pushed the core-geom-step-4-fixup-parquet branch from 94f9052 to 92b55ba on November 3, 2025 08:00
duckdb-draftbot marked this pull request as draft on November 3, 2025 08:01
Maxxen marked this pull request as ready for review on November 3, 2025 11:27
Maxxen force-pushed the core-geom-step-4-fixup-parquet branch from 92b55ba to ccce39d on November 3, 2025 16:55
duckdb-draftbot marked this pull request as draft on November 3, 2025 16:55
Maxxen marked this pull request as ready for review on November 3, 2025 17:07
Maxxen force-pushed the core-geom-step-4-fixup-parquet branch from ccce39d to 9a61101 on November 5, 2025 13:20
duckdb-draftbot marked this pull request as draft on November 5, 2025 13:20
Maxxen marked this pull request as ready for review on November 5, 2025 13:24
Maxxen force-pushed the core-geom-step-4-fixup-parquet branch from 9a61101 to c40be99 on November 6, 2025 09:55
duckdb-draftbot marked this pull request as draft on November 6, 2025 10:02
Maxxen marked this pull request as ready for review on November 6, 2025 10:19

// Otherwise, unrecognized encoding
throw NotImplementedException("Unsupported geometry encoding");
// TODO: Pass the actual target type here so we get the CRS information too

🎉

struct ArrowGeometry {
static unique_ptr<ArrowType> GetType(const ArrowSchema &schema, const ArrowSchemaMetadata &schema_metadata) {
// Validate extension metadata. This metadata also contains a CRS, which we drop
// because the GEOMETRY type does not implement a CRS at the type level (yet).

Boo! (Kidding, I know this is hard)

Comment on lines +36 to +38
statement ok
insert into t_all_types values
(1, 'POINT (1 2)'),

A bunch of examples at https://github.com/apache/parquet-testing/blob/master/data/geospatial/geospatial.yaml as well if you ever get burnt out coming up with these (I frequently do 🙂 )

Mytherin merged commit b387478 into duckdb:main on Nov 7, 2025
96 checks passed
Mytherin (Collaborator) commented Nov 7, 2025

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Nov 7, 2025
`GEOMETRY` Rework: Part 4 - Fixup Parquet Extension + Add Arrow Support (duckdb/duckdb#19476)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Nov 7, 2025
`GEOMETRY` Rework: Part 4 - Fixup Parquet Extension + Add Arrow Support (duckdb/duckdb#19476)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Mytherin added a commit that referenced this pull request Nov 20, 2025
…19848)

This is a followup PR that builds on top of
#19476. Please have a look at
#19136 for the context behind this
PR.

I realized the `Geometry::FromBinary`/`Geometry::ToBinary` helper
functions need to be adjusted slightly so that they can be used to
implement the cast functions provided in `duckdb-spatial`. These casts
may move to core eventually, but for now this is required to integrate
the spatial extension with the new geometry type smoothly.