perf(spilling): Support serializing rows to avoid extracting it as vector#15290
perf(spilling): Support serializing rows to avoid extracting it as vector#15290NEUpanning wants to merge 12 commits intofacebookincubator:mainfrom
Conversation
✅ Deploy Preview for meta-velox canceled.
|
|
@jinchengchenghh Would you like to take a look? Thank you. |
f177180 to
dd773ba
Compare
|
Could you summarize the benchmark result in the PR description? And add the benchmark result for primitive data type. |
| queryConfig.spillFileCreateConfig(), | ||
| queryConfig.windowSpillMinReadBatchRows()); | ||
| queryConfig.windowSpillMinReadBatchRows(), | ||
| true); |
There was a problem hiding this comment.
Please add the config
There was a problem hiding this comment.
This change is an optimization with no adverse impact. This parameter is only used for benchmarking. So I'm uncertain whether it is necessary to add a configuration for it.
| initPartitionWriter(id); | ||
|
|
||
| uint64_t bytes = 0; | ||
| for (const auto& row : rows) { |
There was a problem hiding this comment.
Here you have got the exactly row size, so you don't need an estimated initial size 1'000
batch_->createStreamTree(
std::static_pointer_cast<const RowType>(type_),
1'000,
serdeOptions_.get());
| batch_->createStreamTree( | ||
| std::static_pointer_cast<const RowType>(rows->type()), | ||
| 1'000, | ||
| rows->size(), |
There was a problem hiding this comment.
Here is to write part of the RowVector by indices
|
Would you like to take a look? Thanks! @xiaoxmeng |
@jinchengchenghh Thank you for reviewing. I've updated the PR description. |
|
Please make the benchmark spilling stats details readable |
@jinchengchenghh I tried to delete unnecessary stats and reformat it. Now the spilling stats text line is on the line above the corresponding benchmark result. |
|
Overall looks good to me, I ask @xiaoxmeng to help review this PR offline, he can determine this. |
|
Hi @xiaoxmeng , just a gentle reminder to take a look at this PR when you have a moment. Thanks! |
|
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions! |
Currently, Velox has to extract rows in RowContainer as vector for serialization when spilling. This PR supports serializing rows to eliminate the cost of extracting and materializing vectors corresponding to
spillExtractVectorTimemetrics.Below are the benchmark results:
benchmark spilling stats details