[Nested] Optimize List Type in list_value#17063
Merged
Mytherin merged 6 commits intoduckdb:mainfrom Apr 12, 2025
Merged
Conversation
Mytherin
reviewed
Apr 10, 2025
Collaborator
Mytherin
left a comment
There was a problem hiding this comment.
Thanks for the PR! LGTM - we can't really get around doing a copy here since we have to merge the data of multiple lists into one list.
Contributor
taniabogatsch
left a comment
There was a problem hiding this comment.
Hi, looks good! Just left two nits. :)
Contributor
Author
|
Thanks for your feedback, I've implemented your suggestions and should be good to go now :) |
Collaborator
|
Thanks! |
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 18, 2025
[Nested] Optimize List Type in `list_value` (duckdb/duckdb#17063) Re-enable Avro on core (duckdb/duckdb#17072) Fix httpfs patches: avoid `git log` since might contain unsanitised `error` word (duckdb/duckdb#17075)
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 18, 2025
[Nested] Optimize List Type in `list_value` (duckdb/duckdb#17063) Re-enable Avro on core (duckdb/duckdb#17072) Fix httpfs patches: avoid `git log` since might contain unsanitised `error` word (duckdb/duckdb#17075)
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 18, 2025
[Nested] Optimize List Type in `list_value` (duckdb/duckdb#17063) Re-enable Avro on core (duckdb/duckdb#17072) Fix httpfs patches: avoid `git log` since might contain unsanitised `error` word (duckdb/duckdb#17075)
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 19, 2025
[Nested] Optimize List Type in `list_value` (duckdb/duckdb#17063) Re-enable Avro on core (duckdb/duckdb#17072) Fix httpfs patches: avoid `git log` since might contain unsanitised `error` word (duckdb/duckdb#17075)
Mytherin
added a commit
that referenced
this pull request
Jun 19, 2025
This is a follow-up to - #12468 - #17063. This update focuses specifically on optimizing performance when constructing `LIST_VALUE` with `STRUCT` elements. ### Speed up examples: #### [Large Struct Table](https://github.com/duckdb/duckdb/compare/main...maiadegraaf:duckdb:list_value-struct-optimize?expand=1#diff-2dca289baae01441b6757184d594ecf8915f4d4a460e98522cb2ffb4c85fc9ac) (Benchmark added in this PR): ```sql CREATE TABLE large_struct_table AS SELECT {'a': i - 5, 'b': i - 4} AS a, {'a': i - 3, 'b': i - 2} AS b, {'a': i - 1,'b': i} AS c FROM range(10000000) tbl(i); SELECT LIST_VALUE(a, b, c) FROM large_struct_table; ``` | v1.2.2 | New | |---|---| |5.83 s|0.133 s| #### [Nested Unpivot](https://github.com/duckdb/duckdb/blob/83544bff291ef8ef32a6dfd9fc4e69626c4dd90a/benchmark/pivot/unpivot_struct_payload.benchmark#L4) ```sql CREATE TABLE structs AS SELECT i AS id, {'id': i} AS "2020", {'id': i + 10} AS "2021", {'id': i + 100} AS "2022", {'id': i + 1000} AS "2023", {'id': i + 10000} AS "2024" FROM range(10000000) t(i) UNPIVOT structs ON "2020", "2021", "2022", "2023", "2024" ``` | v1.2.2 | New | |---|---| |3.24 s| 0.39 s|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In #12468
list_valuewas optimized for primitive types. This PR aims to build on that and optimize list types.For example:
Large Tables
Large Lists
Nested Lists
While these results show improvements, the timings are still slower than desired. Profiling suggests that most of the time is spent in VectorOperations::Copy. Any feedback or suggestions on how to further improve performance would be greatly appreciated!
Some additional tests and benchmarks have also been included.