
Add function to retrieve client context for table functions#18232

Closed
VGSML wants to merge 967 commits into duckdb:main from
VGSML:feat/table-udf-client-context

Conversation

@VGSML
Contributor

@VGSML VGSML commented Jul 14, 2025

This PR adds a new C API function to retrieve the client context during table function bind; it is required to obtain the ID of the connection that binds the function.

@VGSML VGSML marked this pull request as draft July 14, 2025 07:02
@Mytherin Mytherin requested a review from taniabogatsch July 14, 2025 07:39
Contributor

@taniabogatsch taniabogatsch left a comment


Looks good! Could you just add a small test to test/api/capi/capi_table_functions.cpp? Similar to how we test this for duckdb_scalar_function_get_client_context in test/api/capi/capi_scalar_functions.cpp.
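For context, a minimal sketch of what such a bind callback might look like. The table-function entry point shown here (`duckdb_table_function_get_client_context`) and its exact signature are assumptions modeled on the existing `duckdb_scalar_function_get_client_context` API mentioned above — check `duckdb.h` for the final names:

```c
#include "duckdb.h"

/* Hypothetical bind callback: the function name and signature of
 * duckdb_table_function_get_client_context are assumptions based on the
 * existing scalar-function client-context API, not confirmed by this PR. */
static void my_bind(duckdb_bind_info info) {
	duckdb_client_context context = NULL;
	duckdb_table_function_get_client_context(info, &context);

	/* the ID of the connection that bound the function */
	idx_t connection_id = duckdb_client_context_get_connection_id(context);
	(void)connection_id;

	duckdb_destroy_client_context(&context);
}
```

A test in `test/api/capi/capi_table_functions.cpp` would register a table function with such a bind callback and verify that the retrieved connection ID matches the one returned for the connection itself.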

sheldonrobinson and others added 26 commits July 24, 2025 15:55
…uckdb#18395)

Follow-up from duckdb#18390

This PR implements metadata re-use at the row group level, so if we are
e.g. appending to a large table we no longer rewrite the metadata of
unchanged row-groups, and instead refer to the existing metadata
on-disk. In addition, this PR also performs a few fixes where we would
eagerly load columns unnecessarily.



### Performance

Consider a database storing TPC-H SF100. The biggest table (lineitem)
has 600M rows.

```sql
CALL dbgen(sf=100);
```


Now insert a single row into the largest table (`lineitem`) and
checkpoint. We measure two times: the time of the `CHECKPOINT` command,
and the full runtime of opening the database, running the insert +
checkpoint, and closing the database.

```sql
INSERT INTO lineitem VALUES (600000001, 0, 0, 0, 0, 0, 0, 0, '', '', DATE '2000-01-01', DATE '2000-01-01', DATE '2000-01-01', '', '', '');
CHECKPOINT;
```

| Operation  | v1.3.2 |  New  |
|------------|--------|-------|
| Checkpoint | 0.5s   | 0.11s |
| Full       | 0.63s  | 0.13s |
Clean up error messages.

- Use the same error messages in UHUGEINT and HUGEINT
- Fix operation symbols in those error messages
- Fix operation names in those error messages
…s for converting queries back to SQL (duckdb#18394)

For execution, whether or not a cross join is implicit or not does not
matter - but it matters for converting to SQL as implicit and explicit
cross joins are handled in different orders. This fixes an issue where
`ToString()` would emit a query that was not re-executable.
This PR fixes discussion mirroring.

The `gh` CLI tool has limited support to work with discussions
(cli/cli#4212), so the recommendation is
to script using GraphQL. However, we actually do not need to do much
with the discussions to mirror them: we simply need to open an internal
issue. There is no need to adjust the labels of discussions.
…Metadata blocks (duckdb#18398)

Follow-up from duckdb#18395

This PR adds storage for `extra_metadata_blocks` in the row-group
metadata when the latest storage version (v1.4.0) is used. This is a
list of metadata blocks that are referenced by the row group metadata,
but **not** present in the list of data pointers (`data_pointers`).
These blocks can be present when either (1) columns are very wide due to
e.g. being deeply nested, or (2) the final column pointer "crosses" the
metadata block threshold, as `data_pointers` points only to the
beginning of the column metadata.

Usually this list is empty - and therefore storing this does not take up
a lot of extra storage space. The presence of this list allows us to
more efficiently re-use metadata as we know exactly which metadata
blocks a row group points to, without having to do any additional
deserialization.

### Only Flush Dirty Metadata blocks

Previously our metadata manager would flush all metadata blocks,
incurring a lot of unnecessary I/O now that we are doing a lot of
metadata re-use. This PR reworks this so that we keep track of which
blocks are dirty and only flush the dirty blocks.

### Performance

Running the benchmark in duckdb#18395
again, we now get the following timings:

| Operation  | v1.3.2 |  Re-Use  |  New  |
|------------|--------|-------|-------|
| Checkpoint | 0.5s   | 0.11s | 0.04s |
| Full       | 0.63s  | 0.13s | 0.07s |
* implement materialized sorting interface
* add materialization to cdc
* Create hashed sort infrastructure
* Add Finalize call
* Add merged callback (for linear mask computation)
* First passing test!
* Create hash groups during Combine
* Remove redundant (and incorrect) counting
* Correctly scan sort columns that are not at the front of the payload
* Materialise sort expressions
* Handle no distinct values when computing masks
* Move OVER() hash group construction to Finalize
* Optimize single sort keys
* Correctly call DistinctFrom for ORDER BY keys
* Finish conversion of the collection scanner to pure chunks.
* Explicitly ReferenceStructColumns in ComputeMasks instead of trying to
guess.
* Apply BatchedDataCollection patch.
* Disable unrelated GetAllNeighborSets asserts.
* Fix allocation threading in WindowDistinctAggregatorLocalState
* Fix non-deterministic window tests with unstable sorting
* Fix amalgamation build header problem.
aplavin and others added 25 commits August 4, 2025 10:49
We had a bug in the error message construction if too many type
parameters were passed (duckdb#18452). This PR fixes this.
# Conflicts:
#	src/transaction/duck_transaction_manager.cpp
Support these forms:
- `DuckDB.Config(option=value)` and `Config(["option" => "value"])`
instead of requiring a separate `Config()` plus setting options one by one.
- `DuckDB.DB(..., config=(;option=value))`
- special `readonly` argument, `DuckDB.DB(..., readonly=true)`, that
already exists in the python client
Just improving the ToString() for AttachInfo to add a missing space.
Changed enum value UnicodeType::UNICODE to UnicodeType::UTF8 to avoid
conflict with /DUNICODE on MSVC
Fixes duckdblabs/duckdb-internal#5475

This removes the assertion `D_ASSERT(!entry_value.empty())`.

This assertion was based on the assumption that an empty string was an
invalid state at this point. However, I think empty strings are a valid
case and are handled gracefully by downstream code. The assertion was
therefore too aggressive and has been removed.
* CSVBuffer: check if block is loaded in CSVBuffer::Pin
* Prevent rolling back already committed transaction in the
MetaTransaction
* Use AddBlob in DictFSST - it can also be used to store blobs which are
invalid UTF8
* Align RLE count downwards to ensure pointers are aligned
* Only do parsed statement verification for query verification, clean up
verify_fetch_row skipped tests
* Increase depth for correlated subqueries correctly, and keep depth +
root binder stored in the binder itself
* Re-organize oss-fuzz cases to be in sequential order instead of using
the random case numbers
…nto feat/table-udf-client-context

# Conflicts:
#	src/include/duckdb.h
#	src/include/duckdb/main/capi/extension_api.hpp
@taniabogatsch
Contributor

I think something must've gone wrong on the merge here?

@VGSML
Contributor Author

VGSML commented Aug 6, 2025

Yes, I have opened a new PR.

Mytherin added a commit that referenced this pull request Aug 7, 2025
This PR adds a new C API function to retrieve the client context during table function bind; it is required to obtain the ID of the connection that binds the function.

This replaces #18232
