
Add function to retrieve client context for table functions#18232

Closed
VGSML wants to merge 967 commits into duckdb:main from
VGSML:feat/table-udf-client-context

Conversation

@VGSML
Contributor

@VGSML VGSML commented Jul 14, 2025

This PR adds a new C API function to retrieve the client context during table function bind; it is required to obtain the ID of the connection that binds the function.

@VGSML VGSML marked this pull request as draft July 14, 2025 07:02
@Mytherin Mytherin requested a review from taniabogatsch July 14, 2025 07:39
Contributor

@taniabogatsch taniabogatsch left a comment


Looks good! Could you just add a small test to test/api/capi/capi_table_functions.cpp? Similar to how we test this for duckdb_scalar_function_get_client_context in test/api/capi/capi_scalar_functions.cpp.
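For context, a minimal sketch of what such a bind callback might look like. The table-function entry point shown here (`duckdb_table_function_get_client_context`) and its exact signature are assumptions modeled on the existing `duckdb_scalar_function_get_client_context` API mentioned above — check `duckdb.h` for the final names:

```c
#include "duckdb.h"

/* Hypothetical bind callback: the function name and signature of
 * duckdb_table_function_get_client_context are assumptions based on the
 * existing scalar-function client-context API, not confirmed by this PR. */
static void my_bind(duckdb_bind_info info) {
	duckdb_client_context context = NULL;
	duckdb_table_function_get_client_context(info, &context);

	/* the ID of the connection that bound the function */
	idx_t connection_id = duckdb_client_context_get_connection_id(context);
	(void)connection_id;

	duckdb_destroy_client_context(&context);
}
```

A test in `test/api/capi/capi_table_functions.cpp` would register a table function with such a bind callback and verify that the retrieved connection ID matches the one returned for the connection itself.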

sheldonrobinson and others added 26 commits July 24, 2025 15:55
…uckdb#18395)

Follow-up from duckdb#18390

This PR implements metadata re-use at the row group level, so if we are
e.g. appending to a large table we no longer rewrite the metadata of
unchanged row-groups, and instead refer to the existing metadata
on-disk. In addition, this PR also performs a few fixes where we would
eagerly load columns unnecessarily.



### Performance

Consider a database storing TPC-H SF100. The biggest table (lineitem)
has 600M rows.

```sql
CALL dbgen(sf=100);
```


Now insert a single row into the largest table (`lineitem`) and
checkpoint. We measure two times: the time of the `CHECKPOINT` command,
and the full runtime of opening the database, running the insert +
checkpoint, and closing the database.

```sql
INSERT INTO lineitem VALUES (600000001, 0, 0, 0, 0, 0, 0, 0, '', '', DATE '2000-01-01', DATE '2000-01-01', DATE '2000-01-01', '', '', '');
CHECKPOINT;
```

| Operation  | v1.3.2 |  New  |
|------------|--------|-------|
| Checkpoint | 0.5s   | 0.11s |
| Full       | 0.63s  | 0.13s |
Clean up error messages.

- Use the same error messages in UHUGEINT and HUGEINT
- Fix operation symbols in those error messages
- Fix operation names in those error messages
…s for converting queries back to SQL (duckdb#18394)

For execution, whether or not a cross join is implicit or not does not
matter - but it matters for converting to SQL as implicit and explicit
cross joins are handled in different orders. This fixes an issue where
`ToString()` would emit a query that was not re-executable.
This PR fixes discussion mirroring.

The `gh` CLI tool has limited support to work with discussions
(cli/cli#4212), so the recommendation is
to script using GraphQL. However, we actually do not need to do much
with the discussions to mirror them: we simply need to open an internal
issue. There is no need to adjust the labels of discussions.
…Metadata blocks (duckdb#18398)

Follow-up from duckdb#18395

This PR adds storage for `extra_metadata_blocks` in the row-group
metadata when the latest storage version (v1.4.0) is used. This is a
list of metadata blocks that are referenced by the row group metadata,
but **not** present in the list of data pointers (`data_pointers`).
These blocks can be present when either (1) columns are very wide due to
e.g. being deeply nested, or (2) the final column pointer "crosses" the
metadata block threshold, as `data_pointers` points only to the
beginning of the column metadata.

Usually this list is empty - and therefore storing this does not take up
a lot of extra storage space. The presence of this list allows us to
more efficiently re-use metadata as we know exactly which metadata
blocks a row group points to, without having to do any additional
deserialization.

### Only Flush Dirty Metadata blocks

Previously our metadata manager would flush all metadata blocks,
incurring a lot of unnecessary I/O now that we are doing a lot of
metadata re-use. This PR reworks this so that we keep track of which
blocks are dirty and only flush the dirty blocks.

### Performance

Running the benchmark in duckdb#18395
again, we now get the following timings:

| Operation  | v1.3.2 |  Re-Use  |  New  |
|------------|--------|-------|-------|
| Checkpoint | 0.5s   | 0.11s | 0.04s |
| Full       | 0.63s  | 0.13s | 0.07s |
* implement materialized sorting interface
* add materialization to cdc
* Create hashed sort infrastructure
* Add Finalize call
* Add merged callback (for linear mask computation)
* First passing test!
* Create hash groups during Combine
* Remove redundant (and incorrect) counting
* Correctly scan sort columns that are not at the front of the payload
* Materialise sort expressions
* Handle no distinct values when computing masks
* Move OVER() hash group construction to Finalize
* Optimize single sort keys
* Correctly call DistinctFrom for ORDER BY keys
* Finish conversion of the collection scanner to pure chunks.
* Explicitly ReferenceStructColumns in ComputeMasks instead of trying to
guess.
* Apply BatchedDataCollection patch.
* Disable unrelated GetAllNeighborSets asserts.
* Fix allocation threading in WindowDistinctAggregatorLocalState
* Fix non-deterministic window tests with unstable sorting
* Fix amalgamation build header problem.
aplavin and others added 25 commits August 4, 2025 10:49
We had a bug in the error message construction if too many type
parameters were passed (duckdb#18452). This PR fixes this.
# Conflicts:
#	src/transaction/duck_transaction_manager.cpp
Support these forms:
- `DuckDB.Config(option=value)` and `Config(["option" => "value"])`
instead of requiring a separate `Config()` plus setting options one by one.
- `DuckDB.DB(..., config=(;option=value))`
- special `readonly` argument, `DuckDB.DB(..., readonly=true)`, that
already exists in the python client
Just improving the ToString() for AttachInfo to add a missing space.
Changed enum value UnicodeType::UNICODE to UnicodeType::UTF8 to avoid
conflict with /DUNICODE on MSVC
Fixes duckdblabs/duckdb-internal#5475

This removes the assertion `D_ASSERT(!entry_value.empty())`.

This assertion was based on the assumption that an empty string was an
invalid state at this point. However, I think empty strings are a valid
case and are handled gracefully by downstream code. The assertion was
therefore too aggressive and has been removed.
* CSVBuffer: check if block is loaded in CSVBuffer::Pin
* Prevent rolling back already committed transaction in the
MetaTransaction
* Use AddBlob in DictFSST - it can also be used to store blobs which are
invalid UTF8
* Align RLE count downwards to ensure pointers are aligned
* Only do parsed statement verification for query verification, clean up
verify_fetch_row skipped tests
* Increase depth for correlated subqueries correctly, and keep depth +
root binder stored in the binder itself
* Re-organize oss-fuzz cases to be in sequential order instead of using
the random case numbers
…nto feat/table-udf-client-context

# Conflicts:
#	src/include/duckdb.h
#	src/include/duckdb/main/capi/extension_api.hpp
@taniabogatsch
Contributor

I think something must've gone wrong on the merge here?

@VGSML
Contributor Author

VGSML commented Aug 6, 2025

Yes, I have opened a new PR.

Mytherin added a commit that referenced this pull request Aug 7, 2025
This PR adds a new C API function to retrieve the client context during table function bind; it is required to obtain the ID of the connection that binds the function.

This replaces #18232
