ARROW-5380: [C++] Fix memory alignment UBSan errors. #4757

emkornfield · 2019-06-30T07:41:47Z

Add utility methods for unaligned loads use where errors
are discovered.
Upgrade version of flatbuffers to avoid issues with unaligned
load in that library
Discover bug in spec that makes zero-copy well defined behavior
virtually impossible with flatbuffers (need to discuss on ML). For now I'm
not turning on ASAN and will file a follow-up JIRA to track this.

Still needed:

Performance testing
Discuss flatbuffers issues (I sent e-mail to LM)

codecov-io · 2019-06-30T22:33:55Z

Codecov Report

Merging #4757 into master will increase coverage by 2.55%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4757      +/-   ##
==========================================
+ Coverage   86.44%   88.99%   +2.55%     
==========================================
  Files         992      717     -275     
  Lines      138020    99467   -38553     
  Branches     1418        0    -1418     
==========================================
- Hits       119309    88523   -30786     
+ Misses      18349    10944    -7405     
+ Partials      362        0     -362

Impacted Files	Coverage Δ
cpp/src/arrow/util/bpacking.h	`99.83% <ø> (+0.02%)`	⬆️
cpp/src/parquet/arrow/writer.h	`100% <100%> (ø)`	⬆️
cpp/src/parquet/file_reader.cc	`94.3% <100%> (ø)`	⬆️
cpp/src/arrow/util/hashing.h	`99.12% <100%> (ø)`	⬆️
cpp/src/plasma/common.cc	`88.46% <100%> (ø)`	⬆️
cpp/src/parquet/encoding.cc	`93.99% <100%> (ø)`	⬆️
cpp/src/parquet/column_reader.cc	`91.5% <100%> (ø)`	⬆️
cpp/src/arrow/util/ubsan.h	`100% <100%> (ø)`	⬆️
cpp/src/parquet/arrow/reader.cc	`85.71% <100%> (ø)`	⬆️
cpp/src/gandiva/projector.cc	`90.75% <0%> (-5.12%)`	⬇️
... and 284 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 10083ab...5528584. Read the comment docs.

emkornfield · 2019-07-01T00:08:32Z

@ursabot benchmark

ursabot · 2019-07-01T01:18:08Z

AMD64 Ubuntu 18.04 C++ Benchmark (#28883) builder has been succeeded.

Revision: db49fbb

  ======================================================  ==================  ==================  ============================
  benchmark                                               baseline            contender           delta
  ======================================================  ==================  ==================  ============================
  ValidateSmallAscii                                      12.74 GiB/s         12.61 GiB/s         -141.47 MiB/s (-1.08%)
  BuildInt32ColumnByChunk/512                             11379.83ns          11202.64ns          -177.19ns (-1.56%)
  BuildInt32ColumnByChunk/5                               198.81ns            200.36ns            1.55ns (0.78%)
  BufferedOutputStreamSmallWritesToPipe/real_time         860.02 MiB/s        874.24 MiB/s        14.22 MiB/s (1.65%)
  FileOutputStreamLargeWritesToPipe/real_time             2.24 GiB/s          2.3 GiB/s           59.07 MiB/s (2.58%)
  BufferedOutputStreamLargeWritesToPipe/real_time         2.24 GiB/s          2.27 GiB/s          31.92 MiB/s (1.39%)
  TakeString/32768/1/min_time:1.000                       1.45 GiB/s          1.47 GiB/s          21.99 MiB/s (1.48%)
  TakeInt64/8388608/1/min_time:1.000                      343.49 MiB/s        357.31 MiB/s        13.82 MiB/s (4.02%)
  TakeInt64/1048576/1/min_time:1.000                      453.24 MiB/s        470.46 MiB/s        17.23 MiB/s (3.80%)
  TakeInt64/32768/50/min_time:1.000                       342.54 MiB/s        335.58 MiB/s        -6.96 MiB/s (-2.03%)
  TakeString/32768/0/min_time:1.000                       1.62 GiB/s          1.65 GiB/s          31.15 MiB/s (1.87%)
  TakeString/1048576/1/min_time:1.000                     1.27 GiB/s          1.29 GiB/s          22.73 MiB/s (1.74%)
  TakeInt64VsFilter/32768/10/min_time:1.000               1.38 GiB/s          1.39 GiB/s          15.37 MiB/s (1.09%)
  TakeFixedSizeList1Int64/1048576/1/min_time:1.000        120.13 MiB/s        121.87 MiB/s        1.74 MiB/s (1.45%)
  TakeInt64VsFilter/32768/50/min_time:1.000               779.37 MiB/s        787.26 MiB/s        7.9 MiB/s (1.01%)
  FilterFixedSizeList1Int64/8388608/1/min_time:1.000      347.87 MiB/s        354.1 MiB/s         6.23 MiB/s (1.79%)
  FilterInt64/8388608/1/min_time:1.000                    618.86 MiB/s        631.16 MiB/s        12.3 MiB/s (1.99%)
  FilterString/32768/0/min_time:1.000                     4.03 GiB/s          4.1 GiB/s           78.39 MiB/s (1.90%)
  FilterString/32768/1/min_time:1.000                     3.92 GiB/s          3.99 GiB/s          70.94 MiB/s (1.77%)
  FilterFixedSizeList1Int64/32768/0/min_time:1.000        399.29 MiB/s        403.82 MiB/s        4.54 MiB/s (1.14%)
  FilterString/1048576/1/min_time:1.000                   3.3 GiB/s           3.23 GiB/s          -71.76 MiB/s (-2.12%)
  FilterFixedSizeList1Int64/32768/50/min_time:1.000       171.51 MiB/s        172.7 MiB/s         1.19 MiB/s (0.70%)
  FilterFixedSizeList1Int64/1048576/1/min_time:1.000      356.37 MiB/s        354.26 MiB/s        -2.11 MiB/s (-0.59%)
  FilterInt64/32768/1/min_time:1.000                      817.28 MiB/s        807.74 MiB/s        -9.54 MiB/s (-1.17%)
  FilterInt64/32768/10/min_time:1.000                     712.6 MiB/s         706.36 MiB/s        -6.23 MiB/s (-0.87%)
  FilterString/32768/50/min_time:1.000                    1.95 GiB/s          1.96 GiB/s          15.05 MiB/s (0.76%)
  FilterString/8388608/1/min_time:1.000                   3.46 GiB/s          3.43 GiB/s          -37.88 MiB/s (-1.07%)
  FilterInt64/32768/50/min_time:1.000                     365.68 MiB/s        370.37 MiB/s        4.7 MiB/s (1.28%)
  FilterFixedSizeList1Int64/32768/10/min_time:1.000       318.63 MiB/s        314.88 MiB/s        -3.75 MiB/s (-1.18%)
  ReadJSONBlockWithSchemaMultiThread/real_time            222.26 MiB/s        219.02 MiB/s        -3.24 MiB/s (-1.46%)
  ChunkJSONLineDelimited                                  104.61ns            100.43ns            -4.17ns (-3.99%)
  BuildStringDictionary                                   45.66 MiB/s         47.48 MiB/s         1.82 MiB/s (3.99%)
  UniqueInt64WithNulls/4194304/1024                       1.9 GiB/s           1.88 GiB/s          -16.19 MiB/s (-0.83%)
  SumKernel/32768/1                                       14.49 GiB/s         14.31 GiB/s         -174.51 MiB/s (-1.18%)
  ThreadPoolSpawn/threads:2/task_cost:100000/real_time    21766.45246150031   22732.475168703626  966.0227072033158 (4.44%)
  ThreadedTaskGroup/threads:2/task_cost:10000/real_time   243879.6678804378   242437.13088468352  -1442.5369957542862 (-0.59%)
  ThreadPoolSpawn/threads:4/task_cost:100000/real_time    41313.19311685875   40135.79891525845   -1177.3942016002984 (-2.85%)
  ThreadedTaskGroup/threads:4/task_cost:100000/real_time  47294.43689321427   47773.651572188304  479.21467897403636 (1.01%)
  ThreadPoolSpawn/threads:8/task_cost:1000/real_time      234707.2491696133   236067.04388044862  1359.7947108353255 (0.58%)
  ThreadPoolSpawn/threads:2/task_cost:10000/real_time     215989.39769180672  210750.91108234323  -5238.486609463493 (-2.43%)
  ThreadedTaskGroup/threads:8/task_cost:1000/real_time    222985.67371614516  233740.65355218883  10754.979836043669 (4.82%)
  ThreadedTaskGroup/threads:1/task_cost:1000/real_time    886778.1114667933   878946.971340187    -7831.140126606332 (-0.88%)
  ThreadedTaskGroup/threads:4/task_cost:1000/real_time    230454.4592953691   232475.82248897856  2021.3631936094607 (0.88%)
  ThreadPoolSpawn/threads:1/task_cost:1000/real_time      982827.9930659528   953462.7785867956   -29365.214479157235 (-2.99%)
  CompareArrayScalarKernel/32768/0                        12.15 GiB/s         12.05 GiB/s         -97.48 MiB/s (-0.78%)
  CompareArrayScalarKernel/32768/1                        11.93 GiB/s         12.05 GiB/s         125.43 MiB/s (1.03%)
  CompareArrayArrayKernel/32768/0                         20.59 GiB/s         20.72 GiB/s         133.79 MiB/s (0.63%)
  DetectIntWidthNoNulls                                   18.97 GiB/s         19.22 GiB/s         262.51 MiB/s (1.35%)
+ BuildInt64DictionaryArraySimilar                        379.53 MiB/s        412.36 MiB/s        32.84 MiB/s (8.65%)
+ BuildInt64DictionaryArraySequential                     646.98 MiB/s        698.71 MiB/s        51.73 MiB/s (8.00%)
  BuildChunkedBinaryArray                                 382.88 MiB/s        378.39 MiB/s        -4.49 MiB/s (-1.17%)
  ArrayDataConstructDestruct                              68708.70ns          69165.68ns          456.98ns (0.67%)
+ BuildBinaryArray                                        286.93 MiB/s        315.39 MiB/s        28.47 MiB/s (9.92%)
  ParallelMemoryCopy/threads:1/real_time                  6.76 GiB/s          6.99 GiB/s          236.68 MiB/s (3.42%)
  ParallelMemoryCopy/threads:8/real_time                  23.21 GiB/s         23.47 GiB/s         261.88 MiB/s (1.10%)
  ParallelMemoryCopy/threads:4/real_time                  21.53 GiB/s         21.2 GiB/s          -332.31 MiB/s (-1.51%)
  ParallelMemoryCopy/threads:16/real_time                 22.48 GiB/s         22.77 GiB/s         300.61 MiB/s (1.31%)
  ParallelMemoryCopy/threads:40/real_time                 21.62 GiB/s         21.86 GiB/s         244.3 MiB/s (1.10%)
+ ParallelMemoryCopy/threads:2/real_time                  10.32 GiB/s         10.98 GiB/s         679.54 MiB/s (6.43%)
  ReadRecordBatch/16/real_time                            197.95 GiB/s        196.43 GiB/s        -1.52 GiB/s (-0.77%)
  WriteRecordBatch/4096/real_time                         691.66 MiB/s        700.99 MiB/s        9.33 MiB/s (1.35%)
+ ReadRecordBatch/4096/real_time                          489.3 MiB/s         639.14 MiB/s        149.84 MiB/s (30.62%)
  WriteRecordBatch/64/real_time                           9.87 GiB/s          10.13 GiB/s         263.55 MiB/s (2.61%)
  WriteRecordBatch/256/real_time                          6.35 GiB/s          6.48 GiB/s          132.12 MiB/s (2.03%)
  WriteRecordBatch/1024/real_time                         2.58 GiB/s          2.61 GiB/s          32.21 MiB/s (1.22%)
  WriteRecordBatch/16/real_time                           11.53 GiB/s         11.88 GiB/s         354.96 MiB/s (3.01%)
  ReadRecordBatch/256/real_time                           10.15 GiB/s         10.28 GiB/s         134.7 MiB/s (1.30%)
  ReadRecordBatch/64/real_time                            51.14 GiB/s         51.81 GiB/s         681.34 MiB/s (1.30%)
  ReadRecordBatch/1024/real_time                          2.5 GiB/s           2.47 GiB/s          -26.35 MiB/s (-1.03%)
  WriteRecordBatch/1/real_time                            11.99 GiB/s         12.38 GiB/s         401.45 MiB/s (3.27%)
  WriteRecordBatch/4/real_time                            12.03 GiB/s         12.39 GiB/s         373.24 MiB/s (3.03%)
  WriteRecordBatch/8192/real_time                         343.59 MiB/s        347.61 MiB/s        4.01 MiB/s (1.17%)
  FirstTimeBitmapWriter/8192                              96.73 MiB/s         97.89 MiB/s         1.16 MiB/s (1.20%)
+ GenerateBitsUnrolled/8192                               139.15 MiB/s        149.99 MiB/s        10.84 MiB/s (7.79%)
  VisitBits/8192                                          102.32 MiB/s        100.94 MiB/s        -1.38 MiB/s (-1.35%)
+ BitmapReader/8192                                       98.96 MiB/s         109.27 MiB/s        10.31 MiB/s (10.42%)
- BitmapWriter/8192                                       83.85 MiB/s         77.89 MiB/s         -5.96 MiB/s (-7.11%)
- GenerateBits/8192                                       88.66 MiB/s         80.93 MiB/s         -7.73 MiB/s (-8.72%)
  ======================================================  ==================  ==================  ============================

  124 result(s) not shown due to tiny delta

emkornfield · 2019-07-01T03:14:23Z

@ursabot benchmark --help

ursabot · 2019-07-01T03:14:24Z

Usage: @ursabot benchmark [OPTIONS] [<baseline>]

  Run the benchmark suite in comparison mode.

  This command will run the benchmark suite for tip of the branch commit
  against `<baseline>` (or master if not provided).

  Examples:

  # Run the all the benchmarks
  @ursabot benchmark

  # Compare only benchmarks where the name matches the /^Sum/ regex
  @ursabot benchmark --benchmark-filter=^Sum

  # Compare only benchmarks where the suite matches the /compute-/ regex.
  # A suite is the C++ binary.
  @ursabot benchmark --suite-filter=compute-

  # Sometimes a new optimization requires the addition of new benchmarks to
  # quantify the performance increase. When doing this be sure to add the
  # benchmark in a separate commit before introducing the optimization.
  #
  # Note that specifying the baseline is the only way to compare using a new
  # benchmark, since master does not contain the new benchmark and no
  # comparison is possible.
  #
  # The following command compares the results of matching benchmarks,
  # compiling against HEAD and the provided baseline commit, e.g. eaf8302.
  # You can use this to quantify the performance improvement of new
  # optimizations or to check for regressions.
  @ursabot benchmark --benchmark-filter=MyBenchmark eaf8302

Options:
  --suite-filter <regex>      Regex filtering benchmark suites.
  --benchmark-filter <regex>  Regex filtering benchmarks.
  --help                      Show this message and exit.

emkornfield · 2019-07-01T03:15:00Z

@ursabot benchmark --benchmark-filter=Bit

ursabot · 2019-07-01T03:25:42Z

AMD64 Ubuntu 18.04 C++ Benchmark (#28904) builder has been succeeded.

Revision: db49fbb

  ==========================  ============  ============  ====================
  benchmark                   baseline      contender     delta
  ==========================  ============  ============  ====================
- GenerateBits/8192           88.66 MiB/s   80.77 MiB/s   -7.88 MiB/s (-8.89%)
+ BitmapReader/8192           98.91 MiB/s   109.34 MiB/s  10.42 MiB/s (10.54%)
+ GenerateBitsUnrolled/8192   136.98 MiB/s  150.46 MiB/s  13.48 MiB/s (9.84%)
  VisitBits/8192              102.24 MiB/s  101.13 MiB/s  -1.12 MiB/s (-1.09%)
  FirstTimeBitmapWriter/8192  96.33 MiB/s   97.98 MiB/s   1.65 MiB/s (1.71%)
- BitmapWriter/8192           83.81 MiB/s   77.92 MiB/s   -5.89 MiB/s (-7.03%)
  ==========================  ============  ============  ====================

  4 result(s) not shown due to tiny delta

pitrou

LGTM. I hope debug mode doesn't become significantly slower because of this.

cpp/src/arrow/util/hashing.h

fsaintjacques · 2019-07-02T14:38:08Z

cpp/src/arrow/util/ubsan.h

  return reinterpret_cast<T*>(&internal::non_null_filler);
 }

+template <typename T>


I'm no C++ guru, but can't you make a single method for this by templating the input type too, e.g.

template <typename T, typename I = uint8_t> inline typename std::enable_if<std::is_integral<T>::value, T>::type SafeLoad( const I* unaligned) { typename std::remove_const<T>::type ret; std::memcpy(&ret, unaligned, sizeof(T)); return ret; }

This is correct, I think I like how the methods are now though since it is more explicit when casting and a pass-through when not. One method could certainly delegate to the other.

fsaintjacques · 2019-07-02T14:46:23Z

cpp/src/parquet/arrow/writer.h

-  int64_t* impala_last_day_nanos = reinterpret_cast<int64_t*>(impala_timestamp);
-  *impala_last_day_nanos = last_day_units * NanosecondsPerUnit;
+  auto last_day_nanos = last_day_units * NanosecondsPerUnit;
+  // Strage might be unaligned, so use mempcy instead of reinterpret_cast


All even indexed Int96 in a vector will be unaligned (according to int64_t alignment).

Good point I'll open up a follow-up PR, we must not hav good test data here or UBSan isn't foolproof, or somehow I didn't run this test properly

Wait were you just commenting on my comment?

Yes, I was commenting on your comment on the "strange" part :)

Did you notice the typo? ("Strage")

fsaintjacques · 2019-07-02T14:52:43Z

Heh, didn't see Antoine merged, opened link from ML :)

emkornfield · 2019-07-02T15:22:40Z

@fsaintjacques thank you for the comments I'll address them in a follow-up PR>

- Add utility methods for unaligned loads use where errors are discovered. - Upgrade version of flatbuffers to avoid issues with unaligned load in that library - Discover bug in spec that makes zero-copy well defined behavior virtually impossible with flatbuffers (need to discuss on ML). For now I'm not turning on ASAN and will file a follow-up JIRA to track this. Still needed: - [ ] Performance testing - [X] Discuss flatbuffers issues (I sent e-mail to LM) Author: Micah Kornfield <emkornfield@gmail.com> Author: emkornfield <emkornfield@gmail.com> Closes #4757 from emkornfield/ubsan_mem and squashes the following commits: 5528584 <emkornfield> remove TODO db49fbb <Micah Kornfield> Ubsan excluding flatbuffers

Ubsan excluding flatbuffers

db49fbb

emkornfield force-pushed the ubsan_mem branch from 2abe158 to db49fbb Compare June 30, 2019 21:54

emkornfield changed the title ~~ARROW-5380: [C++][WIP] Fix memory UBSan errors.~~ ARROW-5380: [C++] Fix memory alignment UBSan errors. Jun 30, 2019

pitrou approved these changes Jul 1, 2019

View reviewed changes

cpp/src/arrow/util/hashing.h Outdated Show resolved Hide resolved

remove TODO

5528584

pitrou closed this in 1bbeb35 Jul 2, 2019

fsaintjacques reviewed Jul 2, 2019

View reviewed changes

asfimport mentioned this pull request Aug 1, 2019

[C++] Fix and enable UBSan for unaligned accesses. #21839

Closed

ARROW-5380: [C++] Fix memory alignment UBSan errors. #4757

ARROW-5380: [C++] Fix memory alignment UBSan errors. #4757

Uh oh!

Conversation

emkornfield commented Jun 30, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-io commented Jun 30, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

emkornfield commented Jul 1, 2019

Uh oh!

ursabot commented Jul 1, 2019

Uh oh!

emkornfield commented Jul 1, 2019

Uh oh!

ursabot commented Jul 1, 2019

Uh oh!

emkornfield commented Jul 1, 2019

Uh oh!

ursabot commented Jul 1, 2019

Uh oh!

pitrou left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

fsaintjacques Jul 2, 2019

Choose a reason for hiding this comment

Uh oh!

emkornfield Jul 2, 2019

Choose a reason for hiding this comment

Uh oh!

fsaintjacques Jul 2, 2019

Choose a reason for hiding this comment

Uh oh!

emkornfield Jul 2, 2019

Choose a reason for hiding this comment

Uh oh!

emkornfield Jul 3, 2019

Choose a reason for hiding this comment

Uh oh!

fsaintjacques Jul 11, 2019

Choose a reason for hiding this comment

Uh oh!

pitrou Jul 11, 2019

Choose a reason for hiding this comment

Uh oh!

fsaintjacques commented Jul 2, 2019

Uh oh!

emkornfield commented Jul 2, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

emkornfield commented Jun 30, 2019 •

edited

Loading

codecov-io commented Jun 30, 2019 •

edited

Loading