-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-5380: [C++] Fix memory alignment UBSan errors. #4757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4757 +/- ##
==========================================
+ Coverage 86.44% 88.99% +2.55%
==========================================
Files 992 717 -275
Lines 138020 99467 -38553
Branches 1418 0 -1418
==========================================
- Hits 119309 88523 -30786
+ Misses 18349 10944 -7405
+ Partials 362 0 -362
Continue to review full report at Codecov.
|
|
@ursabot benchmark |
|
AMD64 Ubuntu 18.04 C++ Benchmark (#28883) builder has been succeeded. Revision: db49fbb ====================================================== ================== ================== ============================
benchmark baseline contender delta
====================================================== ================== ================== ============================
ValidateSmallAscii 12.74 GiB/s 12.61 GiB/s -141.47 MiB/s (-1.08%)
BuildInt32ColumnByChunk/512 11379.83ns 11202.64ns -177.19ns (-1.56%)
BuildInt32ColumnByChunk/5 198.81ns 200.36ns 1.55ns (0.78%)
BufferedOutputStreamSmallWritesToPipe/real_time 860.02 MiB/s 874.24 MiB/s 14.22 MiB/s (1.65%)
FileOutputStreamLargeWritesToPipe/real_time 2.24 GiB/s 2.3 GiB/s 59.07 MiB/s (2.58%)
BufferedOutputStreamLargeWritesToPipe/real_time 2.24 GiB/s 2.27 GiB/s 31.92 MiB/s (1.39%)
TakeString/32768/1/min_time:1.000 1.45 GiB/s 1.47 GiB/s 21.99 MiB/s (1.48%)
TakeInt64/8388608/1/min_time:1.000 343.49 MiB/s 357.31 MiB/s 13.82 MiB/s (4.02%)
TakeInt64/1048576/1/min_time:1.000 453.24 MiB/s 470.46 MiB/s 17.23 MiB/s (3.80%)
TakeInt64/32768/50/min_time:1.000 342.54 MiB/s 335.58 MiB/s -6.96 MiB/s (-2.03%)
TakeString/32768/0/min_time:1.000 1.62 GiB/s 1.65 GiB/s 31.15 MiB/s (1.87%)
TakeString/1048576/1/min_time:1.000 1.27 GiB/s 1.29 GiB/s 22.73 MiB/s (1.74%)
TakeInt64VsFilter/32768/10/min_time:1.000 1.38 GiB/s 1.39 GiB/s 15.37 MiB/s (1.09%)
TakeFixedSizeList1Int64/1048576/1/min_time:1.000 120.13 MiB/s 121.87 MiB/s 1.74 MiB/s (1.45%)
TakeInt64VsFilter/32768/50/min_time:1.000 779.37 MiB/s 787.26 MiB/s 7.9 MiB/s (1.01%)
FilterFixedSizeList1Int64/8388608/1/min_time:1.000 347.87 MiB/s 354.1 MiB/s 6.23 MiB/s (1.79%)
FilterInt64/8388608/1/min_time:1.000 618.86 MiB/s 631.16 MiB/s 12.3 MiB/s (1.99%)
FilterString/32768/0/min_time:1.000 4.03 GiB/s 4.1 GiB/s 78.39 MiB/s (1.90%)
FilterString/32768/1/min_time:1.000 3.92 GiB/s 3.99 GiB/s 70.94 MiB/s (1.77%)
FilterFixedSizeList1Int64/32768/0/min_time:1.000 399.29 MiB/s 403.82 MiB/s 4.54 MiB/s (1.14%)
FilterString/1048576/1/min_time:1.000 3.3 GiB/s 3.23 GiB/s -71.76 MiB/s (-2.12%)
FilterFixedSizeList1Int64/32768/50/min_time:1.000 171.51 MiB/s 172.7 MiB/s 1.19 MiB/s (0.70%)
FilterFixedSizeList1Int64/1048576/1/min_time:1.000 356.37 MiB/s 354.26 MiB/s -2.11 MiB/s (-0.59%)
FilterInt64/32768/1/min_time:1.000 817.28 MiB/s 807.74 MiB/s -9.54 MiB/s (-1.17%)
FilterInt64/32768/10/min_time:1.000 712.6 MiB/s 706.36 MiB/s -6.23 MiB/s (-0.87%)
FilterString/32768/50/min_time:1.000 1.95 GiB/s 1.96 GiB/s 15.05 MiB/s (0.76%)
FilterString/8388608/1/min_time:1.000 3.46 GiB/s 3.43 GiB/s -37.88 MiB/s (-1.07%)
FilterInt64/32768/50/min_time:1.000 365.68 MiB/s 370.37 MiB/s 4.7 MiB/s (1.28%)
FilterFixedSizeList1Int64/32768/10/min_time:1.000 318.63 MiB/s 314.88 MiB/s -3.75 MiB/s (-1.18%)
ReadJSONBlockWithSchemaMultiThread/real_time 222.26 MiB/s 219.02 MiB/s -3.24 MiB/s (-1.46%)
ChunkJSONLineDelimited 104.61ns 100.43ns -4.17ns (-3.99%)
BuildStringDictionary 45.66 MiB/s 47.48 MiB/s 1.82 MiB/s (3.99%)
UniqueInt64WithNulls/4194304/1024 1.9 GiB/s 1.88 GiB/s -16.19 MiB/s (-0.83%)
SumKernel/32768/1 14.49 GiB/s 14.31 GiB/s -174.51 MiB/s (-1.18%)
ThreadPoolSpawn/threads:2/task_cost:100000/real_time 21766.45246150031 22732.475168703626 966.0227072033158 (4.44%)
ThreadedTaskGroup/threads:2/task_cost:10000/real_time 243879.6678804378 242437.13088468352 -1442.5369957542862 (-0.59%)
ThreadPoolSpawn/threads:4/task_cost:100000/real_time 41313.19311685875 40135.79891525845 -1177.3942016002984 (-2.85%)
ThreadedTaskGroup/threads:4/task_cost:100000/real_time 47294.43689321427 47773.651572188304 479.21467897403636 (1.01%)
ThreadPoolSpawn/threads:8/task_cost:1000/real_time 234707.2491696133 236067.04388044862 1359.7947108353255 (0.58%)
ThreadPoolSpawn/threads:2/task_cost:10000/real_time 215989.39769180672 210750.91108234323 -5238.486609463493 (-2.43%)
ThreadedTaskGroup/threads:8/task_cost:1000/real_time 222985.67371614516 233740.65355218883 10754.979836043669 (4.82%)
ThreadedTaskGroup/threads:1/task_cost:1000/real_time 886778.1114667933 878946.971340187 -7831.140126606332 (-0.88%)
ThreadedTaskGroup/threads:4/task_cost:1000/real_time 230454.4592953691 232475.82248897856 2021.3631936094607 (0.88%)
ThreadPoolSpawn/threads:1/task_cost:1000/real_time 982827.9930659528 953462.7785867956 -29365.214479157235 (-2.99%)
CompareArrayScalarKernel/32768/0 12.15 GiB/s 12.05 GiB/s -97.48 MiB/s (-0.78%)
CompareArrayScalarKernel/32768/1 11.93 GiB/s 12.05 GiB/s 125.43 MiB/s (1.03%)
CompareArrayArrayKernel/32768/0 20.59 GiB/s 20.72 GiB/s 133.79 MiB/s (0.63%)
DetectIntWidthNoNulls 18.97 GiB/s 19.22 GiB/s 262.51 MiB/s (1.35%)
+ BuildInt64DictionaryArraySimilar 379.53 MiB/s 412.36 MiB/s 32.84 MiB/s (8.65%)
+ BuildInt64DictionaryArraySequential 646.98 MiB/s 698.71 MiB/s 51.73 MiB/s (8.00%)
BuildChunkedBinaryArray 382.88 MiB/s 378.39 MiB/s -4.49 MiB/s (-1.17%)
ArrayDataConstructDestruct 68708.70ns 69165.68ns 456.98ns (0.67%)
+ BuildBinaryArray 286.93 MiB/s 315.39 MiB/s 28.47 MiB/s (9.92%)
ParallelMemoryCopy/threads:1/real_time 6.76 GiB/s 6.99 GiB/s 236.68 MiB/s (3.42%)
ParallelMemoryCopy/threads:8/real_time 23.21 GiB/s 23.47 GiB/s 261.88 MiB/s (1.10%)
ParallelMemoryCopy/threads:4/real_time 21.53 GiB/s 21.2 GiB/s -332.31 MiB/s (-1.51%)
ParallelMemoryCopy/threads:16/real_time 22.48 GiB/s 22.77 GiB/s 300.61 MiB/s (1.31%)
ParallelMemoryCopy/threads:40/real_time 21.62 GiB/s 21.86 GiB/s 244.3 MiB/s (1.10%)
+ ParallelMemoryCopy/threads:2/real_time 10.32 GiB/s 10.98 GiB/s 679.54 MiB/s (6.43%)
ReadRecordBatch/16/real_time 197.95 GiB/s 196.43 GiB/s -1.52 GiB/s (-0.77%)
WriteRecordBatch/4096/real_time 691.66 MiB/s 700.99 MiB/s 9.33 MiB/s (1.35%)
+ ReadRecordBatch/4096/real_time 489.3 MiB/s 639.14 MiB/s 149.84 MiB/s (30.62%)
WriteRecordBatch/64/real_time 9.87 GiB/s 10.13 GiB/s 263.55 MiB/s (2.61%)
WriteRecordBatch/256/real_time 6.35 GiB/s 6.48 GiB/s 132.12 MiB/s (2.03%)
WriteRecordBatch/1024/real_time 2.58 GiB/s 2.61 GiB/s 32.21 MiB/s (1.22%)
WriteRecordBatch/16/real_time 11.53 GiB/s 11.88 GiB/s 354.96 MiB/s (3.01%)
ReadRecordBatch/256/real_time 10.15 GiB/s 10.28 GiB/s 134.7 MiB/s (1.30%)
ReadRecordBatch/64/real_time 51.14 GiB/s 51.81 GiB/s 681.34 MiB/s (1.30%)
ReadRecordBatch/1024/real_time 2.5 GiB/s 2.47 GiB/s -26.35 MiB/s (-1.03%)
WriteRecordBatch/1/real_time 11.99 GiB/s 12.38 GiB/s 401.45 MiB/s (3.27%)
WriteRecordBatch/4/real_time 12.03 GiB/s 12.39 GiB/s 373.24 MiB/s (3.03%)
WriteRecordBatch/8192/real_time 343.59 MiB/s 347.61 MiB/s 4.01 MiB/s (1.17%)
FirstTimeBitmapWriter/8192 96.73 MiB/s 97.89 MiB/s 1.16 MiB/s (1.20%)
+ GenerateBitsUnrolled/8192 139.15 MiB/s 149.99 MiB/s 10.84 MiB/s (7.79%)
VisitBits/8192 102.32 MiB/s 100.94 MiB/s -1.38 MiB/s (-1.35%)
+ BitmapReader/8192 98.96 MiB/s 109.27 MiB/s 10.31 MiB/s (10.42%)
- BitmapWriter/8192 83.85 MiB/s 77.89 MiB/s -5.96 MiB/s (-7.11%)
- GenerateBits/8192 88.66 MiB/s 80.93 MiB/s -7.73 MiB/s (-8.72%)
====================================================== ================== ================== ============================
124 result(s) not shown due to tiny delta |
|
@ursabot benchmark --help |
|
|
@ursabot benchmark --benchmark-filter=Bit |
|
AMD64 Ubuntu 18.04 C++ Benchmark (#28904) builder has been succeeded. Revision: db49fbb ========================== ============ ============ ====================
benchmark baseline contender delta
========================== ============ ============ ====================
- GenerateBits/8192 88.66 MiB/s 80.77 MiB/s -7.88 MiB/s (-8.89%)
+ BitmapReader/8192 98.91 MiB/s 109.34 MiB/s 10.42 MiB/s (10.54%)
+ GenerateBitsUnrolled/8192 136.98 MiB/s 150.46 MiB/s 13.48 MiB/s (9.84%)
VisitBits/8192 102.24 MiB/s 101.13 MiB/s -1.12 MiB/s (-1.09%)
FirstTimeBitmapWriter/8192 96.33 MiB/s 97.98 MiB/s 1.65 MiB/s (1.71%)
- BitmapWriter/8192 83.81 MiB/s 77.92 MiB/s -5.89 MiB/s (-7.03%)
========================== ============ ============ ====================
4 result(s) not shown due to tiny delta |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I hope debug mode doesn't become significantly slower because of this.
| return reinterpret_cast<T*>(&internal::non_null_filler); | ||
| } | ||
|
|
||
| template <typename T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm no C++ guru, but can't you make a single method for this by templating the input type too, e.g.
template <typename T, typename I = uint8_t>
inline typename std::enable_if<std::is_integral<T>::value, T>::type SafeLoad(
const I* unaligned) {
typename std::remove_const<T>::type ret;
std::memcpy(&ret, unaligned, sizeof(T));
return ret;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct, I think I like how the methods are now though since it is more explicit when casting and a pass-through when not. One method could certainly delegate to the other.
| int64_t* impala_last_day_nanos = reinterpret_cast<int64_t*>(impala_timestamp); | ||
| *impala_last_day_nanos = last_day_units * NanosecondsPerUnit; | ||
| auto last_day_nanos = last_day_units * NanosecondsPerUnit; | ||
| // Strage might be unaligned, so use mempcy instead of reinterpret_cast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All even indexed Int96 in a vector will be unaligned (according to int64_t alignment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point I'll open up a follow-up PR, we must not hav good test data here or UBSan isn't foolproof, or somehow I didn't run this test properly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait were you just commenting on my comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was commenting on your comment on the "strange" part :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you notice the typo? ("Strage")
|
Heh, didn't see Antoine merged, opened link from ML :) |
|
@fsaintjacques thank you for the comments I'll address them in a follow-up PR> |
- Add utility methods for unaligned loads use where errors are discovered. - Upgrade version of flatbuffers to avoid issues with unaligned load in that library - Discover bug in spec that makes zero-copy well defined behavior virtually impossible with flatbuffers (need to discuss on ML). For now I'm not turning on ASAN and will file a follow-up JIRA to track this. Still needed: - [ ] Performance testing - [X] Discuss flatbuffers issues (I sent e-mail to LM) Author: Micah Kornfield <emkornfield@gmail.com> Author: emkornfield <emkornfield@gmail.com> Closes #4757 from emkornfield/ubsan_mem and squashes the following commits: 5528584 <emkornfield> remove TODO db49fbb <Micah Kornfield> Ubsan excluding flatbuffers
- Add utility methods for unaligned loads use where errors are discovered. - Upgrade version of flatbuffers to avoid issues with unaligned load in that library - Discover bug in spec that makes zero-copy well defined behavior virtually impossible with flatbuffers (need to discuss on ML). For now I'm not turning on ASAN and will file a follow-up JIRA to track this. Still needed: - [ ] Performance testing - [X] Discuss flatbuffers issues (I sent e-mail to LM) Author: Micah Kornfield <emkornfield@gmail.com> Author: emkornfield <emkornfield@gmail.com> Closes #4757 from emkornfield/ubsan_mem and squashes the following commits: 5528584 <emkornfield> remove TODO db49fbb <Micah Kornfield> Ubsan excluding flatbuffers
are discovered.
load in that library
virtually impossible with flatbuffers (need to discuss on ML). For now I'm
not turning on ASAN and will file a follow-up JIRA to track this.
Still needed: