ARROW-9156: [C++] Reducing the code size of the tensor module #7539
Conversation
This already removes more than 2 MB of code from libarrow.so on Linux: great. I'll keep an eye on this.
Force-pushed from af20396 to 1e54d34
@wesm Could you please review this?
wesm
left a comment
Thank you for working on this; it is definitely very beneficial both for compilation time and code size. I left some minor comments about reducing logic duplication in a few places by factoring some things out into a helper function, but otherwise this looks good, and the existing test suite gives good evidence that things are working as expected.
I'm sensitive to the possible performance issues that this refactor may introduce (by changing inline functions to C function calls or functions with switch statements in them), but per my comments within, I think the best approach would be to identify the performance problems (if any), write benchmarks to illustrate them, and then pursue optimizations on a case-by-case basis.
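The tradeoff described above can be sketched as follows: instead of instantiating a templated copy loop per value type (which duplicates machine code in the binary for every type), a single non-template function dispatches on the element byte width at runtime. This is a minimal illustration of the technique, not Arrow's actual internals; `CopyElement` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: one type-erased copy routine replaces a family of
// template instantiations.  The switch adds a runtime branch that the
// per-type inlined version did not have -- the performance cost under
// discussion -- but only one copy of the code ends up in the binary.
inline void CopyElement(const std::uint8_t* src, std::uint8_t* dst,
                        int byte_width) {
  switch (byte_width) {
    case 1: *dst = *src; break;
    case 2: std::memcpy(dst, src, 2); break;
    case 4: std::memcpy(dst, src, 4); break;
    case 8: std::memcpy(dst, src, 8); break;
    default: std::memcpy(dst, src, byte_width); break;
  }
}
```

Compilers can often turn the fixed-size `memcpy` calls into single load/store instructions, so the remaining overhead is mostly the switch itself and the lost inlining into the caller's loop.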
You could also use a `std::function` for this; benchmarks would help show which is better. Either solution is going to be slower than the inlined version, but if it is demonstrated to be a performance problem, then we can revisit the template instantiation on a case-by-case basis to decide whether the tradeoff between code size and performance makes sense.
Additionally, in the future I think it could be beneficial to package the tensor support code in a libarrow_tensor library, so that we could also "spin off" the tensor support into a separate pyarrow-tensor Python package; users who do not need tensor support would then not have to carry around compiled code they may never use.
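The `std::function` alternative mentioned above can be sketched like this: a single non-template loop takes a type-erased predicate, so only one instantiation of the loop exists in the binary. The names here (`CountNonZero` and the predicate) are illustrative assumptions, not Arrow's API; the indirect call through `std::function` is the source of the expected slowdown versus the inlined templated version.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical sketch: one loop body shared across all element types.
// The predicate call cannot be inlined through std::function, which is
// exactly the performance tradeoff the comment above is weighing against
// the code-size savings.
inline std::int64_t CountNonZero(
    const std::uint8_t* data, std::int64_t n, int byte_width,
    const std::function<bool(const std::uint8_t*)>& is_nonzero) {
  std::int64_t count = 0;
  for (std::int64_t i = 0; i < n; ++i) {
    if (is_nonzero(data + i * byte_width)) ++count;
  }
  return count;
}
```

A benchmark comparing this against a switch-based dispatch (and against the original template instantiation) would show which indirection is cheaper on the hot path.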
wesm
left a comment
+1, modulo CI. Thank you for using the GetByteWidth helper function in so many places =)
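The helper praised above centralizes the type-to-byte-width mapping so callers no longer need a template parameter per value type. The following is a stand-alone sketch of the idea, assuming a simplified type-id enum; it is not Arrow's actual `GetByteWidth` signature, which operates on Arrow `DataType` objects.

```cpp
#include <cassert>

// Illustrative stand-in for an Arrow-style type id.
enum class TypeId { kInt8, kInt16, kInt32, kInt64, kFloat, kDouble };

// Sketch of a GetByteWidth-style helper: the byte width is computed in one
// place, letting conversion loops be written once against raw byte widths
// instead of being instantiated per value type.
inline int GetByteWidth(TypeId id) {
  switch (id) {
    case TypeId::kInt8: return 1;
    case TypeId::kInt16: return 2;
    case TypeId::kInt32:
    case TypeId::kFloat: return 4;
    case TypeId::kInt64:
    case TypeId::kDouble: return 8;
  }
  return 0;  // unreachable for the ids above
}
```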
@wesm Is it better to work on benchmarking in another pull-request?
@mrkn it's up to you; it's fine with me if you work on performance tuning (or at least measurement) in another PR or this one.
@wesm OK. I'll continue to work on benchmarking in this pull-request. If I need more time for tuning etc., I'll split off the issue.
Force-pushed from ca93bc0 to bcc2ee3
I wrote benchmark code that measures the performance of conversion from Tensor to SparseTensor.
I don't think this result, especially the SparseCOOTensor case, is acceptable. The full result is shown below:
Force-pushed from ef7781b to 59df077
@mrkn to improve the benchmark usefulness I would recommend increasing the size of the data being processed. I ran them locally and the COO benchmarks all run in less than 10 microseconds (some close to 1 microsecond), and at that speed things like destructors show up as non-trivial overhead.
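The sizing advice above can be illustrated with a small sketch: generate a tensor buffer large enough that one conversion takes well over the ~10-microsecond noise floor, with a controlled fraction of non-zero elements so the sparse conversion does meaningful work. The function name, the 512x512 shape, and the 10% density are assumptions for illustration, not values from the actual benchmarks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical benchmark-data generator.  A 512x512 double tensor is about
// 2 MB, large enough that per-iteration fixed costs (allocations,
// destructors) no longer dominate the measurement.
std::vector<double> MakeBenchmarkData(std::int64_t rows, std::int64_t cols) {
  std::vector<double> data(rows * cols, 0.0);
  // Set roughly every tenth element non-zero (~10% density) so the
  // Tensor-to-SparseTensor conversion has real work per iteration.
  for (std::int64_t i = 0; i < rows * cols; i += 10) data[i] = 1.0;
  return data;
}
```

With data of this size, each benchmark iteration would run long enough that the relative differences between conversion strategies reflect the conversion loop rather than fixture setup and teardown.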
What do you think about pursuing the performance optimization work as a follow-up?
@wesm I decided to split off a separate pull-request for the performance optimization because I may need some days to get that work done.
OK, sounds good; let me know when this is ready to be merged.
Force-pushed from 732e087 to 5523588
wesm
left a comment
+1, CI is looking OK now. Thanks @mrkn!
This change improves the conversion speed for all the cases of row-major and column-major tensors. For strided tensors, all the cases are improved except for the combination of int16 value and less-than-32-bit index. The result from the `archery benchmark diff` command is below; the baseline is commit 8f96d1d (before merging apache#7539) and the contender is this commit:
```
   benchmark                                                                        baseline       contender      change %
43 Int16StridedTensorConversionFixture<Int32Type>/ConvertToSparseCOOTensorInt32     141564.498765  182313.374077   28.785
10 Int16StridedTensorConversionFixture<Int16Type>/ConvertToSparseCOOTensorInt16     140420.265077  153715.618715    9.468
42 Int16StridedTensorConversionFixture<Int8Type>/ConvertToSparseCOOTensorInt8       167601.944005  170626.538009    1.805
37 Int16StridedTensorConversionFixture<Int64Type>/ConvertToSparseCOOTensorInt64     143722.048451  141928.779114   -1.248
27 Int8StridedTensorConversionFixture<Int16Type>/ConvertToSparseCOOTensorInt16      169947.630903  164423.055008   -3.251
24 Int8StridedTensorConversionFixture<Int8Type>/ConvertToSparseCOOTensorInt8        170153.324442  163898.373534   -3.676
45 Int8StridedTensorConversionFixture<Int32Type>/ConvertToSparseCOOTensorInt32      170883.542468  164131.618700   -3.951
35 Int8StridedTensorConversionFixture<Int64Type>/ConvertToSparseCOOTensorInt64      171015.028153  163516.191034   -4.385
 9 DoubleStridedTensorConversionFixture<Int8Type>/ConvertToSparseCOOTensorInt8      200974.675587  191956.688874   -4.487
18 FloatStridedTensorConversionFixture<Int8Type>/ConvertToSparseCOOTensorInt8       192320.819787  182941.130595   -4.877
12 DoubleStridedTensorConversionFixture<Int64Type>/ConvertToSparseCOOTensorInt64    175198.892973  166417.452194   -5.012
30 FloatStridedTensorConversionFixture<Int32Type>/ConvertToSparseCOOTensorInt32     167174.764713  151431.022906   -9.418
29 DoubleStridedTensorConversionFixture<Int16Type>/ConvertToSparseCOOTensorInt16    173925.990981  157142.110096   -9.650
16 FloatStridedTensorConversionFixture<Int16Type>/ConvertToSparseCOOTensorInt16     167877.497573  151666.610814   -9.656
26 FloatStridedTensorConversionFixture<Int64Type>/ConvertToSparseCOOTensorInt64     169705.312801  151885.952803  -10.500
 6 DoubleStridedTensorConversionFixture<Int32Type>/ConvertToSparseCOOTensorInt32    177394.661870  156019.301906  -12.050
 5 Int16RowMajorTensorConversionFixture<Int16Type>/ConvertToSparseCOOTensorInt16    107592.839089   66069.770737  -38.593
41 Int16ColumnMajorTensorConversionFixture<Int16Type>/ConvertToSparseCOOTensorInt16 114841.700196   68707.073774  -40.172
47 Int16RowMajorTensorConversionFixture<Int32Type>/ConvertToSparseCOOTensorInt32    107304.436017   63922.898636  -40.428
 4 FloatRowMajorTensorConversionFixture<Int16Type>/ConvertToSparseCOOTensorInt16    112315.965200   66577.854744  -40.723
21 Int16ColumnMajorTensorConversionFixture<Int32Type>/ConvertToSparseCOOTensorInt32 115090.317912   66527.852021  -42.195
17 FloatColumnMajorTensorConversionFixture<Int16Type>/ConvertToSparseCOOTensorInt16 121583.540341   70025.614174  -42.405
 3 DoubleRowMajorTensorConversionFixture<Int16Type>/ConvertToSparseCOOTensorInt16   116946.572632   66411.338694  -43.212
15 FloatRowMajorTensorConversionFixture<Int32Type>/ConvertToSparseCOOTensorInt32    112275.805149   63264.226406  -43.653
13 FloatColumnMajorTensorConversionFixture<Int32Type>/ConvertToSparseCOOTensorInt32 122085.596559   66569.027159  -45.473
34 Int16RowMajorTensorConversionFixture<Int64Type>/ConvertToSparseCOOTensorInt64    109888.801628   58860.826009  -46.436
20 Int16ColumnMajorTensorConversionFixture<Int64Type>/ConvertToSparseCOOTensorInt64 117648.480324   62574.709433  -46.812
19 Int8ColumnMajorTensorConversionFixture<Int8Type>/ConvertToSparseCOOTensorInt8    137444.576787   71969.132261  -47.638
28 DoubleRowMajorTensorConversionFixture<Int64Type>/ConvertToSparseCOOTensorInt64   119527.435615   61405.371141  -48.627
40 FloatRowMajorTensorConversionFixture<Int64Type>/ConvertToSparseCOOTensorInt64    115130.821188   58664.779831  -49.045
39 Int8ColumnMajorTensorConversionFixture<Int16Type>/ConvertToSparseCOOTensorInt16  137053.503574   69755.112894  -49.104
22 Int8RowMajorTensorConversionFixture<Int8Type>/ConvertToSparseCOOTensorInt8       136645.576795   69303.266896  -49.282
23 FloatColumnMajorTensorConversionFixture<Int64Type>/ConvertToSparseCOOTensorInt64 124100.575779   61723.051518  -50.264
31 DoubleColumnMajorTensorConversionFixture<Int16Type>/ConvertToSparseCOOTensorInt16 140278.467902  69584.530347  -50.395
 1 Int16RowMajorTensorConversionFixture<Int8Type>/ConvertToSparseCOOTensorInt8      135770.669563   67151.922438  -50.540
44 Int16ColumnMajorTensorConversionFixture<Int8Type>/ConvertToSparseCOOTensorInt8   142625.928542   70315.759868  -50.699
 2 Int8ColumnMajorTensorConversionFixture<Int32Type>/ConvertToSparseCOOTensorInt32  137443.030096   67752.813535  -50.705
46 Int8RowMajorTensorConversionFixture<Int16Type>/ConvertToSparseCOOTensorInt16     135961.160225   66613.351871  -51.006
11 DoubleColumnMajorTensorConversionFixture<Int32Type>/ConvertToSparseCOOTensorInt32 138857.793332  67315.714410  -51.522
 8 FloatRowMajorTensorConversionFixture<Int8Type>/ConvertToSparseCOOTensorInt8      138992.703542   66847.061004  -51.906
 7 Int8RowMajorTensorConversionFixture<Int32Type>/ConvertToSparseCOOTensorInt32     136298.424804   64520.497064  -52.662
36 FloatColumnMajorTensorConversionFixture<Int8Type>/ConvertToSparseCOOTensorInt8   149706.883716   69805.958679  -53.372
33 DoubleRowMajorTensorConversionFixture<Int8Type>/ConvertToSparseCOOTensorInt8     143460.582904   66870.585026  -53.387
38 DoubleColumnMajorTensorConversionFixture<Int64Type>/ConvertToSparseCOOTensorInt64 138220.367601  64425.776453  -53.389
14 DoubleRowMajorTensorConversionFixture<Int32Type>/ConvertToSparseCOOTensorInt32   136707.421042   63624.050357  -53.460
25 Int8ColumnMajorTensorConversionFixture<Int64Type>/ConvertToSparseCOOTensorInt64  137303.219403   62528.740787  -54.459
32 Int8RowMajorTensorConversionFixture<Int64Type>/ConvertToSparseCOOTensorInt64     136551.052565   58743.141699  -56.981
 0 DoubleColumnMajorTensorConversionFixture<Int8Type>/ConvertToSparseCOOTensorInt8  162895.437265   69676.279783  -57.226
```
…onversion

In this pull-request, the slowdown of the conversion introduced in #7539 is canceled, and the conversion speed is improved over the pre-#7539 state in some cases.

Closes #7643 from mrkn/ARROW-9331

Authored-by: Kenta Murata <mrkn@mrkn.jp>
Signed-off-by: Wes McKinney <wesm@apache.org>
To reduce the size of libarrow.so, I'd like to reduce the size of the tensor and sparse tensor code.
TODO: