Skip to content

docs/data/csv: auto_type_candidates#5459

Merged
szarnyasg merged 2 commits intoduckdb:mainfrom
Djfe:auto_type_candidates
Oct 20, 2025
Merged

docs/data/csv: auto_type_candidates#5459
szarnyasg merged 2 commits intoduckdb:mainfrom
Djfe:auto_type_candidates

Conversation

@Djfe
Copy link
Contributor

@Djfe Djfe commented May 31, 2025

I split both changes so 09901dd can be applied later to docs folder 1.2

Both commits should be applied to the folder stable as well.

Disclaimer: This was verified using the Python API and SQL commands.
I could not find the code that maps NULL to SQLNULL. The DuckDB code base uses SQLNULL and TIMESTAMP_TZ instead of NULL and TIMESTAMPTZ like the external SQL API.

Djfe added 2 commits May 31, 2025 12:38
improve documentation of the csv_reader option auto_type_candidates
- rename SQLNULL to NULL, add NULL to default set of candidate types
- reorder candidate types by their weight of specificity [0]
- add additional explanations

Sources:
[0] https://github.com/duckdb/duckdb/blob/dcf0e1c8936d74be48fd1cc0309638117b43aa47/src/execution/operator/csv_scanner/util/csv_reader_options.cpp#L523-L530
[1] https://github.com/duckdb/duckdb/blob/dcf0e1c8936d74be48fd1cc0309638117b43aa47/src/include/duckdb/execution/operator/csv_scanner/csv_reader_options.hpp#L82-L86

Signed-off-by: Felix Baumann <felix.bau@gmx.de>
add TIMESTAMPTZ to default set of auto_type_candidates

DuckDB 1.3.0 added support for TIMESTAMPTZ in type detection
[0] duckdb/duckdb@a3bc569

Signed-off-by: Felix Baumann <felix.bau@gmx.de>
@Djfe
Copy link
Contributor Author

Djfe commented May 31, 2025

Right, I forgot to mention: NULL doesn't appear to make a difference to auto_type_candidates. Though adding it doesn't hurt either.
Even empty lines are of type VARCHAR for some reason (despite SQLNULL having the heighest weight of specificity).
And I cannot cast VARCHAR columns to type NULL which means I'm not even sure, it makes sense to mention that it belongs to the auto_type_candidates since it might serve no purpose.

EDIT:
It might be worth mentioning, that DECIMAL doesn't detect precision and defaults to DECIMAL(18,6). You need to set (prec, scale) manually.
And it only supports one of these DECIMAL statements (it chooses the last one provided and ignores all others). So listing several and letting DuckDB choose required precision is not possible atm/yet.
duckdb/duckdb#11639 (comment)

Mytherin added a commit to duckdb/duckdb that referenced this pull request Jun 2, 2025
correct explanation about specificity by inverting it

SQLNULL has the highest specificity not the lowest VARCHAR is the
fallback and has therefore the lowest specificity

See
https://github.com/duckdb/duckdb/blob/dcf0e1c8936d74be48fd1cc0309638117b43aa47/src/execution/operator/csv_scanner/util/csv_reader_options.cpp#L523-L530

I was confused by the code comment in the header file while improving
the documentation since it contradicts the actual specifity weights in
the cpp file.
duckdb/duckdb-web#5459
@szarnyasg szarnyasg requested a review from pdet June 3, 2025 13:19
Copy link
Contributor

@pdet pdet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'm also fine with adding the info about the DECIMAL type.

@Djfe
Copy link
Contributor Author

Djfe commented Jun 4, 2025

How does autodetection of type NULL look like? Or does that rather concern single rows and not a full column?
Should we drop it from documentation or is it related to the read_csv parameter nullstr?

@szarnyasg
Copy link
Collaborator

Sorry about dropping this, this should have been merged a long time ago...!

@szarnyasg szarnyasg merged commit 4b94eb6 into duckdb:main Oct 20, 2025
4 checks passed
@szarnyasg
Copy link
Collaborator

Thanks for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants