Skip to content

[Nested] Optimize List Type in list_value#17063

Merged
Mytherin merged 6 commits intoduckdb:mainfrom
maiadegraaf:list_value_optimize_nested
Apr 12, 2025
Merged

[Nested] Optimize List Type in list_value#17063
Mytherin merged 6 commits intoduckdb:mainfrom
maiadegraaf:list_value_optimize_nested

Conversation

@maiadegraaf
Copy link
Contributor

@maiadegraaf maiadegraaf commented Apr 10, 2025

In #12468 list_value was optimized for primitive types. This PR aims to build on that and optimize list types.

For example:

Large Tables
CREATE TABLE large_list_table AS SELECT [i, i, i] AS a, [i + 1, i + 1] AS b, [i + 2] AS c FROM range(100000000) tbl(i);

SELECT LIST_VALUE(a, b, c) FROM large_list_table;
1.2.2 New
28.55s 8.68s
Large Lists
CREATE TABLE large_list AS SELECT list(i) AS a FROM range(1000000) t(i);

SELECT list_value(a, a, a, a, a) FROM large_list;
1.2.2 New
0.487s 0.0234s
Nested Lists
CREATE TABLE nested_lists AS SELECT [[i], [i + 1]] AS a, [[i, i], [i + 1, i + 1]] as b FROM range(10000) t(i);

SELECT list_value(a, b, a, b, a, b, a, b, a, b, a, b, a, b) FROM nested_lists;
1.2.2 New
0.128s 0.0075s

While these results show improvements, the timings are still slower than desired. Profiling suggests that most of the time is spent in VectorOperations::Copy. Any feedback or suggestions on how to further improve performance would be greatly appreciated!

Some additional tests and benchmarks have also been included.

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! LGTM - we can't really get around doing a copy here since we have to merge the data of multiple lists into one list.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft April 11, 2025 08:08
@maiadegraaf maiadegraaf marked this pull request as ready for review April 11, 2025 08:08
@duckdb-draftbot duckdb-draftbot marked this pull request as draft April 11, 2025 11:05
@maiadegraaf maiadegraaf marked this pull request as ready for review April 11, 2025 11:05
Copy link
Contributor

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, looks good! Just left two nits. :)

@maiadegraaf
Copy link
Contributor Author

Thanks for your feedback, I've implemented your suggestions and should be good to go now :)

@duckdb-draftbot duckdb-draftbot marked this pull request as draft April 11, 2025 14:56
@maiadegraaf maiadegraaf marked this pull request as ready for review April 11, 2025 14:59
@Mytherin Mytherin merged commit 94d529e into duckdb:main Apr 12, 2025
52 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
[Nested] Optimize List Type in `list_value` (duckdb/duckdb#17063)
Re-enable Avro on core (duckdb/duckdb#17072)
Fix httpfs patches: avoid `git log` since might contain unsanitised `error` word (duckdb/duckdb#17075)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
[Nested] Optimize List Type in `list_value` (duckdb/duckdb#17063)
Re-enable Avro on core (duckdb/duckdb#17072)
Fix httpfs patches: avoid `git log` since might contain unsanitised `error` word (duckdb/duckdb#17075)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
[Nested] Optimize List Type in `list_value` (duckdb/duckdb#17063)
Re-enable Avro on core (duckdb/duckdb#17072)
Fix httpfs patches: avoid `git log` since might contain unsanitised `error` word (duckdb/duckdb#17075)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 19, 2025
[Nested] Optimize List Type in `list_value` (duckdb/duckdb#17063)
Re-enable Avro on core (duckdb/duckdb#17072)
Fix httpfs patches: avoid `git log` since might contain unsanitised `error` word (duckdb/duckdb#17075)
@maiadegraaf maiadegraaf deleted the list_value_optimize_nested branch May 28, 2025 07:33
Mytherin added a commit that referenced this pull request Jun 19, 2025
This is a follow-up to 
- #12468
- #17063. 
 
This update focuses specifically on optimizing performance when
constructing `LIST_VALUE` with `STRUCT` elements.

### Speed up examples:
#### [Large Struct
Table](https://github.com/duckdb/duckdb/compare/main...maiadegraaf:duckdb:list_value-struct-optimize?expand=1#diff-2dca289baae01441b6757184d594ecf8915f4d4a460e98522cb2ffb4c85fc9ac)
(Benchmark added in this PR):
```sql
CREATE TABLE large_struct_table AS
SELECT {'a': i - 5, 'b': i - 4} AS a, {'a': i - 3, 'b': i - 2} AS b, {'a': i - 1,'b': i} AS c FROM range(10000000) tbl(i);

SELECT LIST_VALUE(a, b, c) FROM large_struct_table;
```

| v1.2.2 | New |
|---|---|
|5.83 s|0.133 s|

#### [Nested
Unpivot](https://github.com/duckdb/duckdb/blob/83544bff291ef8ef32a6dfd9fc4e69626c4dd90a/benchmark/pivot/unpivot_struct_payload.benchmark#L4)

```sql
CREATE TABLE structs AS
SELECT i AS id, {'id': i} AS "2020", {'id': i + 10} AS "2021", {'id': i + 100} AS "2022", {'id': i + 1000} AS "2023", {'id': i + 10000} AS "2024"
FROM range(10000000) t(i)

UNPIVOT structs ON "2020", "2021", "2022", "2023", "2024"
```

| v1.2.2 | New |
|---|---|
|3.24 s|  0.39 s|
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants