Skip to content

Fix roundtripping of stringified nested types#16304

Merged
Mytherin merged 63 commits intoduckdb:mainfrom
Tishj:escape_stringified_lists
Mar 19, 2025
Merged

Fix roundtripping of stringified nested types#16304
Mytherin merged 63 commits intoduckdb:mainfrom
Tishj:escape_stringified_lists

Conversation

@Tishj
Copy link
Contributor

@Tishj Tishj commented Feb 18, 2025

This PR fixes https://github.com/duckdblabs/duckdb-internal/issues/3945

Follow up to #15944

Single quotes are added around the value when it contains special characters, trailing/leading whitespace or the value is "NULL".
If single quotes or backslashes are present in the value, they are escaped.

Added benchmark results:
New (this PR):

benchmark/micro/cast/cast_varcharlist_string.benchmark  1       0.217045
benchmark/micro/cast/cast_varcharlist_string.benchmark  2       0.215973
benchmark/micro/cast/cast_varcharlist_string.benchmark  3       0.224540
benchmark/micro/cast/cast_varcharlist_string.benchmark  4       0.230012
benchmark/micro/cast/cast_varcharlist_string.benchmark  5       0.226749
benchmark/micro/cast/cast_varcharmap_string.benchmark   1       1.266191
benchmark/micro/cast/cast_varcharmap_string.benchmark   2       1.297789
benchmark/micro/cast/cast_varcharmap_string.benchmark   3       1.297194
benchmark/micro/cast/cast_varcharmap_string.benchmark   4       1.292299
benchmark/micro/cast/cast_varcharmap_string.benchmark   5       1.285754
benchmark/micro/cast/cast_varcharstruct_string.benchmark        1       0.407835
benchmark/micro/cast/cast_varcharstruct_string.benchmark        2       0.415330
benchmark/micro/cast/cast_varcharstruct_string.benchmark        3       0.407894
benchmark/micro/cast/cast_varcharstruct_string.benchmark        4       0.405755
benchmark/micro/cast/cast_varcharstruct_string.benchmark        5       0.405011

Old (main):

benchmark/micro/cast/cast_varcharlist_string.benchmark  1       0.206750
benchmark/micro/cast/cast_varcharlist_string.benchmark  2       0.211269
benchmark/micro/cast/cast_varcharlist_string.benchmark  3       0.206543
benchmark/micro/cast/cast_varcharlist_string.benchmark  4       0.207847
benchmark/micro/cast/cast_varcharlist_string.benchmark  5       0.210660

(casting from map used to be done by appending to a std::string, which explains the speed up)

benchmark/micro/cast/cast_varcharmap_string.benchmark   1       1.904629
benchmark/micro/cast/cast_varcharmap_string.benchmark   2       1.997251
benchmark/micro/cast/cast_varcharmap_string.benchmark   3       1.975662
benchmark/micro/cast/cast_varcharmap_string.benchmark   4       2.078949
benchmark/micro/cast/cast_varcharmap_string.benchmark   5       2.047442
benchmark/micro/cast/cast_varcharstruct_string.benchmark        1       0.244424
benchmark/micro/cast/cast_varcharstruct_string.benchmark        2       0.261174
benchmark/micro/cast/cast_varcharstruct_string.benchmark        3       0.257688
benchmark/micro/cast/cast_varcharstruct_string.benchmark        4       0.246197
benchmark/micro/cast/cast_varcharstruct_string.benchmark        5       0.244702

Misc

generate_extension_entries.py received some changes.
Most notably the deserializing of the generated extension_entries.hpp array entries has been changed from a regex approach to manually parsing the input (to address an unrelated issue, related to entries that span multiple lines).

We used to query the shell with a -csv flag, expecting this to produce unquoted values, which is no longer the case as we need to escape VARCHAR[] among others.
This has been changed to use -json instead, so we can use the json python library to reliably retrieve the results.

In built_in_functions.cpp, BuiltinFunctions::RegisterExtensionOverloads now parses the list of parameter types by casting to VARCHAR[] then iterating over the produced list entries.

Tishj added 30 commits February 10, 2025 17:34
…will already have escaped whatever was necessary to be escaped
… LIST and STRUCT, add escape support as well
…, only escaping singlequotes and backslashes within that string
@Tishj Tishj marked this pull request as ready for review March 7, 2025 09:28
@duckdb-draftbot duckdb-draftbot marked this pull request as draft March 13, 2025 11:38
@Tishj Tishj marked this pull request as ready for review March 13, 2025 11:57
@duckdb-draftbot duckdb-draftbot marked this pull request as draft March 13, 2025 16:06
@Tishj Tishj marked this pull request as ready for review March 14, 2025 08:25
@duckdb-draftbot duckdb-draftbot marked this pull request as draft March 19, 2025 09:32
@Tishj Tishj marked this pull request as ready for review March 19, 2025 09:32
@Mytherin Mytherin merged commit d89d5fc into duckdb:main Mar 19, 2025
50 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

@szarnyasg szarnyasg added the Needs Documentation Use for issues or PRs that require changes in the documentation label Mar 21, 2025
Mytherin added a commit that referenced this pull request Mar 21, 2025
…roundtripping of stringified nested types` PR (#16775)

This PR updates the expected results according to the changes merged
with [this PR](#16304), which
should resolve the issue:
duckdblabs/duckdb-internal#4477
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
Fix roundtripping of stringified nested types (duckdb/duckdb#16304)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
Fix roundtripping of stringified nested types (duckdb/duckdb#16304)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 16, 2025
Fix roundtripping of stringified nested types (duckdb/duckdb#16304)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 16, 2025
Fix roundtripping of stringified nested types (duckdb/duckdb#16304)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
Fix roundtripping of stringified nested types (duckdb/duckdb#16304)
hannes added a commit that referenced this pull request Aug 11, 2025
…8541)

and use the `ExtensionLoader` to register functions instead of the old
API.

PR #16304 fixed some `LIST` casting
issues regarding round-tripping (e.g., adding quotes or escaping them),
which introduced casting issues for JSON. These issues were found by the
JSON tests, but the tests were simply changed to match the new behavior
instead of fixing the issues.

The issue seemed like a rendering issue only, but when returning `JSON`
to our clients, e.g., Python, we cast to `VARCHAR`, which now adds
quotes or escapes them, creating invalid JSON. This PR fixes these
issues by creating special-case casts from `VARCHAR` to `JSON` and vice
versa.

I don't think this is needed for structs, as the default cast (which is
to reinterpret) is used for structs, and we shouldn't get invalid JSON
(e.g., surrounded by single quotes) when getting a struct that contains
JSON into a client like Python.

Fixes:
 * #17647
 * duckdblabs/duckdb-internal#5498
@Tishj Tishj deleted the escape_stringified_lists branch November 7, 2025 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Needs Documentation Use for issues or PRs that require changes in the documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants