Optimize performance with lazy projection to avoid reading unused columns #55518
@alexey-milovidov Based on #45868, lazy projection has been implemented to improve performance.
src/Core/Settings.h
    M(Bool, query_plan_aggregation_in_order, true, "Use query plan for aggregation-in-order optimisation", 0) \
    M(Bool, query_plan_remove_redundant_sorting, true, "Remove redundant sorting in query plan. For example, sorting steps related to ORDER BY clauses in subqueries", 0) \
    M(Bool, query_plan_remove_redundant_distinct, true, "Remove redundant Distinct step in query plan", 0) \
    M(Bool, query_plan_optimize_lazy_projection, false, "Use query plan for lazy projection optimisation", 0) \
Better to use query_plan_optimize_lazy_materialization
Thank you again for taking the time to review my work. I have renamed it to query_plan_optimize_lazy_materialization.
Before 0.8 vs. after 1.6: is it slower?
I apologize for my mistake. It has been corrected.
Out of curiosity, Could the |
9303fc9 to ce68887
Both 'ColumnLazy' and 'ColumnFunction' imply delayed execution. While it is possible to achieve the same result using 'ColumnFunction', it may not appear as intuitive, convenient, and concise as using 'ColumnLazy' directly.
It looks like
src/Columns/ColumnLazy.h
private:
    friend class COWHelper<IColumn, ColumnLazy>;

    WrappedPtr part_nums;
I thought it would store a callback that encapsulates all the details, and you initialize the column with this callback. It will be useful for any kind of delayed reading or calculations.
If I understand correctly, I need to encapsulate row_nums and part_nums into an array. Different delayed executions may require different column arrays. It is similar to ColumnFunction.
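The callback idea discussed in this thread can be sketched roughly as follows. This is only an illustration, not the code from the PR: `LazyColumnSketch`, `FullColumn`, and `makeLazyFromParts` are hypothetical names, and a real column would implement ClickHouse's `IColumn` interface rather than wrap a `std::vector`. The point is that the column stores an opaque callback, so part numbers and row numbers become one possible payload captured by that callback rather than fields of the column itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fully materialized column (not ClickHouse's IColumn).
using FullColumn = std::vector<std::string>;

// A lazy column that stores only a row count and an opaque callback.
// The callback hides where the data comes from (parts, row numbers, a function, ...).
class LazyColumnSketch
{
public:
    using Materializer = std::function<FullColumn()>;

    LazyColumnSketch(std::size_t rows_, Materializer materializer_)
        : rows(rows_), materializer(std::move(materializer_))
    {
        assert(materializer && "a lazy column needs a materializer callback");
    }

    std::size_t size() const { return rows; }
    bool isMaterialized() const { return materialized; }

    // Runs the callback on first use and caches the result.
    const FullColumn & materialize()
    {
        if (!materialized)
        {
            cache = materializer();
            assert(cache.size() == rows);
            materialized = true;
        }
        return cache;
    }

private:
    std::size_t rows;
    Materializer materializer;
    FullColumn cache;
    bool materialized = false;
};

// Example factory: captures (part number, row number) pairs and resolves them
// against in-memory "parts" only when materialize() is called.
// Assumption: `parts` outlives the returned column, since it is captured by reference.
inline LazyColumnSketch makeLazyFromParts(
    const std::vector<std::vector<std::string>> & parts,
    std::vector<std::pair<std::size_t, std::size_t>> positions)
{
    const std::size_t rows = positions.size();
    return LazyColumnSketch(rows, [&parts, positions = std::move(positions)]()
    {
        FullColumn out;
        out.reserve(positions.size());
        for (const auto & [part, row] : positions)
            out.push_back(parts.at(part).at(row));
        return out;
    });
}
```

With this shape, a different kind of delayed read or calculation only needs a different lambda; the column class itself does not change.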
src/Processors/QueryPlan/Optimizations/optimizeLazyMaterialization.cpp
tests/queries/0_stateless/02813_optimize_lazy_materialization.sql
This is a very good feature that I dreamed of! But we need to make ColumnLazy more generic: it should contain an opaque callback that transforms it into a full column.
Yes, although currently only the lazy read of MergeTree has been implemented.
Reimplemented ColumnLazy, adding the IColumnLazyHelper callback interface to abstract away related details.
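For readers following along, a helper-interface design of this kind might look roughly like the sketch below. The actual `IColumnLazyHelper` signature is not shown in this thread, so `ILazyHelperSketch`, `InMemoryHelper`, `LazyColumnWithHelper`, and the `transform` method are all hypothetical stand-ins. The lazy column keeps only row identifiers and a pointer to the helper, and the helper alone knows how to produce full columns.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for full columns (not ClickHouse types).
using Column = std::vector<std::string>;
using Columns = std::vector<Column>;

// Hypothetical analogue of a lazy-helper interface: it turns captured
// row ids into full columns, and no storage details leak out of it.
class ILazyHelperSketch
{
public:
    virtual ~ILazyHelperSketch() = default;
    virtual Columns transform(const std::vector<std::size_t> & row_ids) const = 0;
};

// One possible implementation: look the rows up in an in-memory table.
// A MergeTree-backed helper would read from data parts instead.
class InMemoryHelper final : public ILazyHelperSketch
{
public:
    explicit InMemoryHelper(Columns table_) : table(std::move(table_)) {}

    Columns transform(const std::vector<std::size_t> & row_ids) const override
    {
        Columns result(table.size());
        for (std::size_t c = 0; c < table.size(); ++c)
        {
            result[c].reserve(row_ids.size());
            for (std::size_t row : row_ids)
            {
                assert(row < table[c].size());
                result[c].push_back(table[c][row]);
            }
        }
        return result;
    }

private:
    Columns table;
};

// The lazy column itself holds only row ids and a shared helper.
struct LazyColumnWithHelper
{
    std::vector<std::size_t> row_ids;
    std::shared_ptr<const ILazyHelperSketch> helper;

    Columns materialize() const { return helper->transform(row_ids); }
};
```

Compared with a bare callback, an interface makes it easy to share one helper between several lazy columns and to add storage-specific implementations later.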
…ze_lazy_projection
Stateless tests (tsan, s3 storage, 1/3): Server died, fail. Should be fixed by #78205.
* [GLUTEN-1632][CH] Daily Update Clickhouse Version (20250326)
* Fix build due to ClickHouse/ClickHouse#55518
* Fix build due to ClickHouse/ClickHouse#77013
* Fix gtest build due to ClickHouse/ClickHouse#77895
* Disable query_plan_optimize_lazy_materialization due to #9141

Co-authored-by: kyligence-git <gluten@kyligence.io>
Co-authored-by: Chang chen <changchen@apache.org>
See #45868
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Optimize performance with lazy projection to avoid reading unused columns.
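To make the idea concrete, here is a minimal, self-contained sketch of lazy materialization for a query shaped like `SELECT key, payload FROM t ORDER BY key LIMIT n`. It is not the PR's implementation: `TableSketch`, `Row`, and `topNLazy` are hypothetical names, and the table is a pair of in-memory vectors rather than MergeTree parts. It only shows the principle that sorting and limiting can work on the key column plus row ids, so the wide column is read for the surviving rows alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical two-column table stored column-wise.
struct TableSketch
{
    std::vector<int> key;                   // cheap column used by ORDER BY
    std::vector<std::string> payload;       // wide column we want to read as little as possible
    mutable std::size_t payload_reads = 0;  // counts payload cells actually touched

    const std::string & readPayload(std::size_t row) const
    {
        assert(row < payload.size());
        ++payload_reads;
        return payload[row];
    }
};

struct Row
{
    int key;
    std::string payload;
};

// Emulates: SELECT key, payload FROM t ORDER BY key LIMIT limit
std::vector<Row> topNLazy(const TableSketch & table, std::size_t limit)
{
    // Step 1: order row ids using only the key column.
    std::vector<std::size_t> row_ids(table.key.size());
    std::iota(row_ids.begin(), row_ids.end(), std::size_t{0});
    limit = std::min(limit, row_ids.size());
    std::partial_sort(
        row_ids.begin(),
        row_ids.begin() + static_cast<std::ptrdiff_t>(limit),
        row_ids.end(),
        [&table](std::size_t a, std::size_t b) { return table.key[a] < table.key[b]; });
    row_ids.resize(limit);

    // Step 2: read the wide column only for the rows that survived the LIMIT.
    std::vector<Row> result;
    result.reserve(limit);
    for (std::size_t row : row_ids)
        result.push_back(Row{table.key[row], table.readPayload(row)});
    return result;
}
```

For a five-row table and `limit = 2`, `payload_reads` ends at 2 instead of 5, which is the saving the optimization targets.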
Documentation entry for user-facing changes