LIST aggregate performance improvements by Mytherin · Pull Request #6995 · duckdb/duckdb

Mytherin · 2023-04-06T19:23:44Z

This PR improves the performance of the LIST aggregate in several ways:

Use an ArenaAllocator that is stored in the aggregate/perfect hash table to do the allocation of the list segments
Avoid storing the AllocatedData in a vector in the aggregate state of the list - instead use the provided allocator in the destructor to de-allocate the segments
Avoid allocating the root LinkedList and instead store this as part of the hash table
Avoid storing the LogicalType in the aggregate state of the list - instead use the type from the bind data

This results in many fewer allocations made, particularly when there are many small lists. They also improve data locality as a result of using large heaps for the allocations. Here are some performance numbers for the two included benchmarks:

Benchmark	Old	New
Many Small Lists (150K lists, 1-7 rows per list)	3.5s	0.06s
Few Large Lists (3 lists, 140K - 300K rows per list)	0.03s	0.02s

These changes require a few modifications to the aggregate hash tables and row layout, particularly to pass in the allocator from the hash table to the actual hash function:

The destroy function now also takes an AggregateFunctionInput
The RowOperations methods take a RowOperationsState, which contains an Allocator
The AggregateHashTable and PerfectAggregateHashTable are extended with an ArenaAllocator object

For the scan of the PhysicalHashAggregate we also launch multiple threads sooner - instead of only launching one thread per row group size we launch one thread per vector. Hash tables are scanned per-vector, and multiple threads processing the output of a hash table becomes beneficial already with low row counts when lists are involved.

…nto an intermediate vector and then appending

…LinkedList into references instead of pointers

…d in in destructor to de-allocate

…stead

…te/perfect aggregate HT

Mytherin · 2023-04-06T19:38:07Z

CC @Tmonster this PR makes the following query on the H2OAI benchmark faster as a substitute for the current window function for Advanced Group By Q3:

SELECT id6, unnest(list_sort(list(v3), 'desc')[1:2]) AS v3 FROM (SELECT id6, v3 FROM x_group WHERE v3 IS NOT NULL) AS subq GROUP BY id6;

create table ans as SELECT id6, v3 AS largest2_v3 FROM (SELECT id6, v3, row_number() OVER (PARTITION BY id6 ORDER BY v3 DESC) AS order_v3 FROM x_group WHERE v3 IS NOT NULL) sub_query WHERE order_v3 <= 2;
-- 0.4s
create table ans as SELECT id6, unnest(list_sort(list(v3), 'desc')[1:2]) AS v3 FROM (SELECT id6, v3 FROM x_group WHERE v3 IS NOT NULL) AS subq GROUP BY id6;
-- 0.32s

…implify memory management of aggregate hash tables

lnkuiper

Looks good to me! I definitely will have merge conflicts with this, but nothing too serious. I wrote down some thoughts:

lnkuiper · 2023-04-07T11:54:15Z

src/include/duckdb/execution/aggregate_hashtable.hpp

 	vector<ExpressionType> predicates;

+	//! The arena allocator used by the aggregates for their internal state
+	ArenaAllocator aggregate_allocator;


What if we have many distinct groups and have to use partitioning? Won't the ArenaAllocator of the initial (unpartitioned) HT still be the owner of the allocated data?

Although I assume the receiving aggregate state is responsible for ownership of the data from the giving aggregate state when they are combined by, e.g., copying it. For the list aggregate function we use a linked list, so in theory, we could do a zero-copy combine.

The Combine currently always copies over the data and keeps the original data in-tact, which allows this to work as-is. For a zero-copy combine we would need to move over the allocators as well, but I will leave that for a future PR.

You are only allowed to take ownership of the incoming state during combine if you know you are not being called by the windowing logic (the segment tree code uses combine on elements of the tree). One notable (and relevant) example is the sorted aggregate wrapper, which is only used by regular aggregation. This will probably not change if we add ORDER BY argument clauses to windowed aggregates as I expect we will use Adrian's stuff instead.

taniabogatsch

This is pretty cool! I added some minor comments, mostly related to clarifying things for me.

I am also linking #3788 here since this PR addresses one of the two missing points I raised there.

src/common/types/list_segment.cpp

taniabogatsch · 2023-04-07T11:34:23Z

src/execution/operator/aggregate/aggregate_object.cpp

-    : function(std::move(function)), bind_data(bind_data), child_count(child_count), payload_size(payload_size),
-      aggr_type(aggr_type), return_type(return_type), filter(filter) {
+    : function(std::move(function)),
+      bind_data_wrapper(bind_data ? make_shared<FunctionDataWrapper>(bind_data->Copy()) : nullptr),


Nit: this might be a bit tough to read.

src/execution/perfect_aggregate_hashtable.cpp

src/storage/arena_allocator.cpp

Mytherin added 12 commits April 6, 2023 14:31

LIST aggregate - remove type from aggregate state

13e6c4e

LIST finalize - read directly into result vector instead of reading i…

6e548ca

…nto an intermediate vector and then appending

List - inline first LinkedList entry instead of allocating, and turn …

7bdda63

…LinkedList into references instead of pointers

Split allocation size from allocation in ListSegment allocate

400711e

Clean-up - unify all ListSegment functions into a single class

b450bfe

Push AggregateInputInfo into aggregate destructors as well

63f447b

LIST - remove usage of AllocatedData, and instead use Allocator passe…

45a2f35

…d in in destructor to de-allocate

List aggregate - actually grow list segments

c98b02b

Add RowOperationsState, and push allocator into actual hash tables in…

bd11924

…stead

List aggregate - move to using arena allocators stored in the aggrega…

a5acbfe

…te/perfect aggregate HT

Add list benchmarks

e08062f

Revert GetCapacityForNewSegment

1a354f7

Mytherin requested review from hawkfish, lnkuiper and taniabogatsch April 6, 2023 19:23

Mytherin added 2 commits April 6, 2023 23:10

AggregateObject - hold shared copy of FunctionData of aggregates to s…

531ecc3

…implify memory management of aggregate hash tables

WindowSegmentTree - use AggregateObject as well

3f3304a

lnkuiper approved these changes Apr 7, 2023

View reviewed changes

lnkuiper mentioned this pull request Apr 7, 2023

Tuple Data Collection #6998

Merged

taniabogatsch approved these changes Apr 7, 2023

View reviewed changes

Revert aggregate parallel threshold, and add missing include

9185d31

Mytherin mentioned this pull request Apr 7, 2023

Issue #5920: Ordered Aggregate Sorting #6986

Merged

Mytherin merged commit 30d444e into duckdb:master Apr 7, 2023

Mytherin deleted the listimprovements branch April 24, 2023 13:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LIST aggregate performance improvements#6995

LIST aggregate performance improvements#6995
Mytherin merged 15 commits intoduckdb:masterfrom
Mytherin:listimprovements

Mytherin commented Apr 6, 2023

Uh oh!

Mytherin commented Apr 6, 2023

Uh oh!

lnkuiper left a comment

Uh oh!

lnkuiper Apr 7, 2023

Uh oh!

Mytherin Apr 7, 2023

Uh oh!

hawkfish Apr 8, 2023

Uh oh!

taniabogatsch left a comment

Uh oh!

Uh oh!

taniabogatsch Apr 7, 2023

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Mytherin commented Apr 6, 2023

Uh oh!

Mytherin commented Apr 6, 2023

Uh oh!

lnkuiper left a comment

Choose a reason for hiding this comment

Uh oh!

lnkuiper Apr 7, 2023

Choose a reason for hiding this comment

Uh oh!

Mytherin Apr 7, 2023

Choose a reason for hiding this comment

Uh oh!

hawkfish Apr 8, 2023

Choose a reason for hiding this comment

Uh oh!

taniabogatsch left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

taniabogatsch Apr 7, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants