Add function to retrieve client context for table functions#18232
Closed
VGSML wants to merge 967 commits into duckdb:main from
Conversation
Contributor
taniabogatsch
left a comment
Looks good! Could you just add a small test to test/api/capi/capi_table_functions.cpp? Similar to how we test this for duckdb_scalar_function_get_client_context in test/api/capi/capi_scalar_functions.cpp.
…uckdb#18395)

Follow-up from duckdb#18390

This PR implements metadata re-use at the row group level, so if we are e.g. appending to a large table we no longer rewrite the metadata of unchanged row groups, and instead refer to the existing metadata on-disk. In addition, this PR also performs a few fixes where we would eagerly load columns unnecessarily.

### Performance

Consider a database storing TPC-H SF100. The biggest table (lineitem) has 600M rows.

```sql
CALL dbgen(sf=100);
```

Now insert a single row into the largest table (`lineitem`) and checkpoint. We measure two times: the time of the `CHECKPOINT` command, and the full runtime of opening the database, running the insert + checkpoint, and closing the database.

```sql
INSERT INTO lineitem VALUES (600000001, 0, 0, 0, 0, 0, 0, 0, '', '', DATE '2000-01-01', DATE '2000-01-01', DATE '2000-01-01', '', '', '');
CHECKPOINT;
```

| Operation  | v1.3.2 | New   |
|------------|--------|-------|
| Checkpoint | 0.5s   | 0.11s |
| Full       | 0.63s  | 0.13s |
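The row-group-level re-use described above can be sketched roughly as follows. This is a minimal, self-contained illustration with hypothetical names (`RowGroupMeta`, `CheckpointMetadata`), not DuckDB's actual checkpointing code: at checkpoint time, only row groups that changed since the last checkpoint (or were never written) get fresh metadata; unchanged row groups keep their existing on-disk pointer, skipping the serialization and I/O entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a row group's metadata state.
struct RowGroupMeta {
	uint64_t on_disk_pointer; // block id of previously written metadata (0 = never written)
	bool changed;             // modified since the last checkpoint?
};

// Returns how many row groups had their metadata rewritten. Unchanged row
// groups simply keep referring to their existing on-disk metadata.
static uint64_t CheckpointMetadata(std::vector<RowGroupMeta> &groups, uint64_t &next_block) {
	uint64_t rewritten = 0;
	for (auto &g : groups) {
		if (g.changed || g.on_disk_pointer == 0) {
			g.on_disk_pointer = next_block++; // serialize fresh metadata here
			g.changed = false;
			rewritten++;
		}
		// else: re-use the existing metadata pointer, no I/O needed
	}
	return rewritten;
}
```

With this shape, an append that touches one row group out of hundreds rewrites exactly one metadata entry, which is what drives the checkpoint time down in the table above.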
Clean up error messages.

- Use the same error messages in UHUGEINT and HUGEINT
- Fix operation symbols in those error messages
- Fix operation names in those error messages
…s for converting queries back to SQL (duckdb#18394)

For execution, it does not matter whether a cross join is implicit or explicit - but it does matter for converting back to SQL, as implicit and explicit cross joins are handled in different orders. This fixes an issue where `ToString()` would emit a query that was not re-executable.
This PR fixes discussion mirroring. The `gh` CLI tool has limited support to work with discussions (cli/cli#4212), so the recommendation is to script using GraphQL. However, we actually do not need to do much with the discussions to mirror them: we simply need to open an internal issue. There is no need to adjust the labels of discussions.
…Metadata blocks (duckdb#18398)

Follow-up from duckdb#18395

This PR adds storage for `extra_metadata_blocks` in the row-group metadata when the latest storage version (v1.4.0) is used. This is a list of metadata blocks that are referenced by the row group metadata, but **not** present in the list of data pointers (`data_pointers`). These blocks can be present when either (1) columns are very wide due to e.g. being deeply nested, or (2) the final column pointer "crosses" the metadata block threshold, as `data_pointers` points only to the beginning of the column metadata. Usually this list is empty - and therefore storing it does not take up a lot of extra space. The presence of this list allows us to re-use metadata more efficiently, as we know exactly which metadata blocks a row group points to, without having to do any additional deserialization.

### Only Flush Dirty Metadata Blocks

Previously our metadata manager would flush all metadata blocks, incurring a lot of unnecessary I/O now that we are doing a lot of metadata re-use. This PR reworks this so that we keep track of which blocks are dirty and only flush the dirty blocks.

### Performance

Running the benchmark in duckdb#18395 again, we now get the following timings:

| Operation  | v1.3.2 | Re-Use | New   |
|------------|--------|--------|-------|
| Checkpoint | 0.5s   | 0.11s  | 0.04s |
| Full       | 0.63s  | 0.13s  | 0.07s |
* implement materialized sorting interface
* add materialization to cdc
* Create hashed sort infrastructure
* Add Finalize call
* Add merged callback (for linear mask computation)
* First passing test!
* Create hash groups during Combine
* Remove redundant (and incorrect) counting
* Correctly scan sort columns that are not at the front of the payload
* Materialise sort expressions
* Handle no distinct values when computing masks
* Move OVER() hash group construction to Finalize
* Optimize single sort keys
* Correctly call DistinctFrom for ORDER BY keys
* Finish conversion of the collection scanner to pure chunks.
* Explicitly ReferenceStructColumns in ComputeMasks instead of trying to guess.
* Apply BatchedDataCollection patch.
* Disable unrelated GetAllNeighborSets asserts.
* Fix allocation threading in WindowDistinctAggregatorLocalState
* Fix non-deterministic window tests with unstable sorting
* Fix amalgamation build header problem.
…p verify_fetch_row skipped tests
We had a bug in the error message construction if too many type parameters were passed (duckdb#18452). This PR fixes this.
…root binder stored in the binder itself
# Conflicts: # src/transaction/duck_transaction_manager.cpp
- Follow-up to duckdb#17754 and duckdb#17875 Related issue: duckdblabs/duckdb-internal#4839.
Support these forms:

- `DuckDB.Config(option=value)` and `Config(["option" => "value"])`, instead of requiring a separate `Config()` followed by setting options one by one.
- `DuckDB.DB(..., config=(;option=value))`
- A special `readonly` argument, `DuckDB.DB(..., readonly=true)`, which already exists in the Python client.
Just improving the ToString() for AttachInfo to add a missing space.
Changed enum value UnicodeType::UNICODE to UnicodeType::UTF8 to avoid conflict with /DUNICODE on MSVC
Fixes duckdblabs/duckdb-internal#5475 This removes the assertion `D_ASSERT(!entry_value.empty())`. This assertion was based on the assumption that an empty string was an invalid state at this point. However, I think empty strings are a valid case and are handled gracefully by downstream code. The assertion was therefore too aggressive and has been removed.
* CSVBuffer: check if block is loaded in CSVBuffer::Pin
* Prevent rolling back already committed transaction in the MetaTransaction
* Use AddBlob in DictFSST - it can also be used to store blobs which are invalid UTF8
* Align RLE count downwards to ensure pointers are aligned
* Only do parsed statement verification for query verification, clean up verify_fetch_row skipped tests
* Increase depth for correlated subqueries correctly, and keep depth + root binder stored in the binder itself
* Re-organize oss-fuzz cases to be in sequential order instead of using the random case numbers
…nto feat/table-udf-client-context # Conflicts: # src/include/duckdb.h # src/include/duckdb/main/capi/extension_api.hpp
Contributor
I think something must've gone wrong on the merge here?
Contributor
Author
Yes, I have made a new PR
Mytherin added a commit that referenced this pull request on Aug 7, 2025
This PR adds a new C API function to retrieve the client context during table function bind; it is required to get the connection ID of the connection that binds the function. This replaces #18232