Improve the performance of serialized aggregation method when involving multiple [nullable] columns#55809
Conversation
|
This is an automated comment for commit a2027db with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
|
I've seen performance gains due to fewer alloca calls. But arena grows linearly, so I didn't think that arena could be modified to achieve the same purpose. It's hard to understand that an additional function call has such a large fixed overhead. 😢 |
It's likely because continuous allocation is heavy. Actually we can do even better by allocating one full row per time. We already have |
|
Performance is nearly on-par with one exception https://s3.amazonaws.com/clickhouse-test-reports/55809/b8a391e96953feaf6fae1b3a8324750196371f52/performance_comparison_[3_4]/report.html . This small portion should be related to virtualization cost. |
Orz,太强了, Too strong. |
|
@amosbird anything we can do to speedup this PR ? |
src/Columns/ColumnLowCardinality.cpp
Outdated
There was a problem hiding this comment.
I'm not sure it is a good idea to allocate memory externally. the callee should know better how much elements the array should have, isn't it?
There was a problem hiding this comment.
Sure. I've updated so that called will do the resize if needed.
src/Columns/ColumnUnique.h
Outdated
There was a problem hiding this comment.
I just hope you advanced memory correctly in all cases )
There was a problem hiding this comment.
It should work, or else tests will complain immediately.
src/Columns/IColumn.cpp
Outdated
There was a problem hiding this comment.
| UInt64 * indexes [[maybe_unused]]; | |
| UInt64 * next_index [[maybe_unused]]; | |
| [[maybe_unused]] UInt64 * indexes = nullptr; | |
| [[maybe_unused]] UInt64 * next_index = nullptr; |
src/Columns/IColumn.cpp
Outdated
There was a problem hiding this comment.
maybe it makes sense to create a separate function for this case. it will add copy-paste, but the code would be easier to read IMO
src/Columns/IColumn.cpp
Outdated
There was a problem hiding this comment.
should be removed I guess in favour of l.165-172
|
@amosbird This looks great, let's merge it! |
|
@amosbird Any update? Thanks. |
|
I'll try to make it mergeable in this month. Hopefully it can land in 24Q1. |
nickitat
left a comment
There was a problem hiding this comment.
apart from minor suggestions looks great
src/Columns/IColumn.h
Outdated
There was a problem hiding this comment.
do you think PODArrayWithStackMemory would make sense here?
There was a problem hiding this comment.
I don't think so. A block of sizes is almost always larger than what we can put on stack.
…ng multiple [nullable] columns.

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Improve the performance of serialized aggregation method when involving multiple [nullable] columns. This is a general version of #51399 that doesn't compromise on abstraction integrity.
This pull request refactors the
IColumnin a manner similar to theIAggregateFunction. It introducesIColumnHelper, which implements methods to devirtualize some calls ofIColumnin its final descendants. It is also used to implement common methods, such asserializeValueIntoArena, and place them in the same translation unit to ensure they are properly inlined.Three variants of serialized aggregation methods have been added:
nullable_serialized,prealloc_serialized, andnullable_prealloc_serialized. With the nullable variant, we have devirtualized calls to write nested column values when they are not null. In the prealloc variant, we first calculate the serialized byte size of each row (involving more than one column) in a vectorized way. Then, we allocate continuous memory as a whole and useserializeValueIntoMemoryto serialize column values.The performance numbers are impressive, showing ~2x improvement in speed.