Minimize memory copy in port headers during pipeline construction #70105
heymind wants to merge 2 commits into ClickHouse:master
Very similar improvement is visible in another PR, so I am not sure if this PR actually helps. Let me figure this out. |
af4a227 to c88d50a
Co-authored-by: János Benjamin Antal <antaljanosbenjamin@users.noreply.github.com>
c88d50a to a82b9fd
|
I can't show you the real query, but I made a similar one for you to see.
-- First, create the table
CREATE TABLE my_table
(
`id` String,
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_1` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_2` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_3` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_4` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_5` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_6` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_7` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_8` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_9` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_10` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_11` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_12` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_13` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_14` Nullable(String),
`long_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_longlong_long_long_long_long_column_name_15` Nullable(String)
)
ENGINE = MergeTree
ORDER BY id;
-- Generate some random data
INSERT INTO my_table SELECT
toString(number),
toString(number),
toString(number),
toString(number),
toString(number),
toString(number),
toString(number),
toString(number),
toString(number),
toString(number),
toString(number),
toString(number),
toString(number),
toString(number),
toString(number),
toString(number)
FROM numbers(10000000);
-- The query
SELECT
*
FROM
my_table AS t1
JOIN
my_table AS t2 ON t1.id = t2.id
JOIN
my_table AS t3 ON t2.id = t3.id
JOIN
my_table AS t4 ON t3.id = t4.id
JOIN
my_table AS t5 ON t4.id = t5.id
JOIN
my_table AS t6 ON t5.id = t6.id
JOIN
my_table AS t7 ON t6.id = t7.id
JOIN
my_table AS t8 ON t7.id = t8.id
JOIN
my_table AS t9 ON t8.id = t9.id
JOIN
my_table AS t10 ON t9.id = t10.id
JOIN
my_table AS t11 ON t10.id = t11.id
JOIN
my_table AS t12 ON t11.id = t12.id
JOIN
my_table AS t13 ON t12.id = t13.id
JOIN
my_table AS t14 ON t13.id = t14.id
JOIN
my_table AS t15 ON t14.id = t15.id
JOIN
my_table AS t16 ON t15.id = t16.id
JOIN
my_table AS t17 ON t16.id = t17.id
JOIN
my_table AS t18 ON t17.id = t18.id
JOIN
my_table AS t19 ON t18.id = t19.id
JOIN
my_table AS t20 ON t19.id = t20.id
JOIN
my_table AS t21 ON t20.id = t21.id
JOIN
my_table AS t22 ON t21.id = t22.id
JOIN
my_table AS t23 ON t22.id = t23.id
JOIN
my_table AS t24 ON t23.id = t24.id
JOIN
my_table AS t25 ON t24.id = t25.id
JOIN
my_table AS t26 ON t25.id = t26.id
JOIN
my_table AS t27 ON t26.id = t27.id
JOIN
my_table AS t28 ON t27.id = t28.id
JOIN
my_table AS t29 ON t28.id = t29.id
JOIN
my_table AS t30 ON t29.id = t30.id
JOIN
my_table AS t31 ON t30.id = t31.id
JOIN
my_table AS t32 ON t31.id = t32.id
JOIN
my_table AS t33 ON t32.id = t33.id
JOIN
my_table AS t34 ON t33.id = t34.id
JOIN
my_table AS t35 ON t34.id = t35.id
JOIN
my_table AS t36 ON t35.id = t36.id
JOIN
my_table AS t37 ON t36.id = t37.id
JOIN
my_table AS t38 ON t37.id = t38.id
JOIN
my_table AS t39 ON t38.id = t39.id
JOIN
my_table AS t40 ON t39.id = t40.id
JOIN
my_table AS t41 ON t40.id = t41.id
JOIN
my_table AS t42 ON t41.id = t42.id
JOIN
my_table AS t43 ON t42.id = t43.id
JOIN
my_table AS t44 ON t43.id = t44.id
JOIN
my_table AS t45 ON t44.id = t45.id
JOIN
my_table AS t46 ON t45.id = t46.id
JOIN
my_table AS t47 ON t46.id = t47.id
JOIN
my_table AS t48 ON t47.id = t48.id
JOIN
my_table AS t49 ON t48.id = t49.id
JOIN
my_table AS t50 ON t49.id = t50.id
settings max_threads=128;
-- The result set is excessively large, and this optimization has no noticeable impact on execution.
-- Use EXPLAIN PIPELINE to see the differences.
Before: After: |
Then why should we do this? Just to make |
|
I don't see any such slowdown on a cloud instance (node with 32 GB, 8 vCPU) 🤔 UPD: without parallel replicas as well |
|
The slowdown depends on |
Such queries, when not using EXPLAIN, will also be faster. This optimization focuses on building a pipeline. In our real case, it reduced the building time from 32 seconds to 16 seconds. The example SQL provided here is just for explanation. |
|
Sorry, this PR went a bit low on my list.
I will try to add a performance test, because we should prove that it makes something faster and make sure we keep that performance in line in the future. Without a performance test this feels like a bugfix without a test to verify it works. |
|
Dear @antaljanosbenjamin, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
|
Continuing in #83381 since pushing to the fork is not enabled |
Continue #70105: Reduce port header memcpy




This PR introduces an optimization to reduce memory copying within the Port header during the construction of the pipeline graph. The core change involves modifying the header type in Port.h from an owning type to a const shared_ptr<const Block>, which will significantly reduce memory copying when building expression on tables with huge columns.

The member header type in Port.h is now a const shared_ptr<const Block>, which allows for shared ownership and immutability. Profiling indicated that Port instances were frequently cloned, and this change will improve the performance.

IProcessor instances contain both input_ports and output_ports. When chaining processors using AddSimpleTransform, the output ports of the previous processor share the same header as the input ports of the next processor. This PR ensures that these headers can share the same reference, reducing the number of unnecessary cloning operations by N * num_streams.

The use of const shared_ptr<const Block> ensures stricter immutability compared to the previous implementation, providing additional safety guarantees enforced by the compiler.

Transitioning all occurrences to the new port header style is a substantial task. To facilitate this, the PR introduces AddSimpleTransform method overloads in Pipe and QueryPipelineBuilder, enabling an incremental adaptation process. An example of this implementation can be seen in ExpressionStep::transformPipeline, which effectively reduces header copying by num_streams times.

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Minimize memory copy in port headers during pipeline construction
CI Settings (Only check the boxes if you know what you are doing):
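
The header sharing described in the PR description can be illustrated with a small standalone sketch. This is not the ClickHouse code: Block, OwningPort, SharedPort, SharedHeader, g_block_copies and the two build functions are hypothetical stand-ins invented for this example, and the model simply assumes one input port and one output port per stage per stream. It only shows why holding a const shared_ptr<const Block> avoids one header copy per port.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Counts how many times any Block has been copy-constructed (illustration only).
std::size_t g_block_copies = 0;

// Hypothetical stand-in for a header block: just a list of column names.
struct Block
{
    std::vector<std::string> column_names;

    explicit Block(std::vector<std::string> names) : column_names(std::move(names)) {}
    Block(const Block & other) : column_names(other.column_names) { ++g_block_copies; }
};

using SharedHeader = std::shared_ptr<const Block>;

// Owning style: every port stores its own copy of the header.
struct OwningPort
{
    Block header;
    explicit OwningPort(const Block & header_) : header(header_) {}
};

// Shared style: every port points at the same immutable header.
struct SharedPort
{
    const SharedHeader header;
    explicit SharedPort(SharedHeader header_) : header(std::move(header_))
    {
        assert(header != nullptr); // a port is expected to always have a header
    }
};

// Creates one input and one output port per stage per stream with owning
// headers and returns the number of Block copies that were made.
std::size_t buildWithOwningPorts(const Block & header, std::size_t stages, std::size_t num_streams)
{
    g_block_copies = 0;
    std::vector<OwningPort> ports;
    ports.reserve(2 * stages * num_streams); // avoid reallocation copies
    for (std::size_t stage = 0; stage < stages; ++stage)
    {
        for (std::size_t stream = 0; stream < num_streams; ++stream)
        {
            ports.emplace_back(header); // input port copies the header
            ports.emplace_back(header); // output port copies it again
        }
    }
    return g_block_copies;
}

// Same port layout, but the header is copied once into shared storage and
// every port only copies the pointer.
std::size_t buildWithSharedPorts(const Block & header, std::size_t stages, std::size_t num_streams)
{
    g_block_copies = 0;
    SharedHeader shared = std::make_shared<Block>(header); // the only Block copy
    std::vector<SharedPort> ports;
    ports.reserve(2 * stages * num_streams);
    for (std::size_t stage = 0; stage < stages; ++stage)
    {
        for (std::size_t stream = 0; stream < num_streams; ++stream)
        {
            ports.emplace_back(shared); // input port
            ports.emplace_back(shared); // output port
        }
    }
    return g_block_copies;
}
```

In this toy model the owning variant copies the header once per port, so the count grows with the number of stages times num_streams, while the shared variant copies it once regardless of pipeline size. With long column names, as in the example table earlier in the conversation, each avoided copy also avoids copying every name string.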