Skip to content

Conversation

@emkornfield
Copy link
Contributor

@emkornfield emkornfield commented Jun 30, 2019

  • Add utility methods for unaligned loads use where errors
    are discovered.
  • Upgrade version of flatbuffers to avoid issues with unaligned
    load in that library
  • Discover bug in spec that makes zero-copy well defined behavior
    virtually impossible with flatbuffers (need to discuss on ML). For now I'm
    not turning on ASAN and will file a follow-up JIRA to track this.

Still needed:

  • Performance testing
  • Discuss flatbuffers issues (I sent e-mail to LM)

@emkornfield emkornfield changed the title ARROW-5380: [C++][WIP] Fix memory UBSan errors. ARROW-5380: [C++] Fix memory alignment UBSan errors. Jun 30, 2019
@codecov-io
Copy link

codecov-io commented Jun 30, 2019

Codecov Report

Merging #4757 into master will increase coverage by 2.55%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4757      +/-   ##
==========================================
+ Coverage   86.44%   88.99%   +2.55%     
==========================================
  Files         992      717     -275     
  Lines      138020    99467   -38553     
  Branches     1418        0    -1418     
==========================================
- Hits       119309    88523   -30786     
+ Misses      18349    10944    -7405     
+ Partials      362        0     -362
Impacted Files Coverage Δ
cpp/src/arrow/util/bpacking.h 99.83% <ø> (+0.02%) ⬆️
cpp/src/parquet/arrow/writer.h 100% <100%> (ø) ⬆️
cpp/src/parquet/file_reader.cc 94.3% <100%> (ø) ⬆️
cpp/src/arrow/util/hashing.h 99.12% <100%> (ø) ⬆️
cpp/src/plasma/common.cc 88.46% <100%> (ø) ⬆️
cpp/src/parquet/encoding.cc 93.99% <100%> (ø) ⬆️
cpp/src/parquet/column_reader.cc 91.5% <100%> (ø) ⬆️
cpp/src/arrow/util/ubsan.h 100% <100%> (ø) ⬆️
cpp/src/parquet/arrow/reader.cc 85.71% <100%> (ø) ⬆️
cpp/src/gandiva/projector.cc 90.75% <0%> (-5.12%) ⬇️
... and 284 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 10083ab...5528584. Read the comment docs.

@emkornfield
Copy link
Contributor Author

@ursabot benchmark

@ursabot
Copy link

ursabot commented Jul 1, 2019

AMD64 Ubuntu 18.04 C++ Benchmark (#28883) builder has been succeeded.

Revision: db49fbb

  ======================================================  ==================  ==================  ============================
  benchmark                                               baseline            contender           delta
  ======================================================  ==================  ==================  ============================
  ValidateSmallAscii                                      12.74 GiB/s         12.61 GiB/s         -141.47 MiB/s (-1.08%)
  BuildInt32ColumnByChunk/512                             11379.83ns          11202.64ns          -177.19ns (-1.56%)
  BuildInt32ColumnByChunk/5                               198.81ns            200.36ns            1.55ns (0.78%)
  BufferedOutputStreamSmallWritesToPipe/real_time         860.02 MiB/s        874.24 MiB/s        14.22 MiB/s (1.65%)
  FileOutputStreamLargeWritesToPipe/real_time             2.24 GiB/s          2.3 GiB/s           59.07 MiB/s (2.58%)
  BufferedOutputStreamLargeWritesToPipe/real_time         2.24 GiB/s          2.27 GiB/s          31.92 MiB/s (1.39%)
  TakeString/32768/1/min_time:1.000                       1.45 GiB/s          1.47 GiB/s          21.99 MiB/s (1.48%)
  TakeInt64/8388608/1/min_time:1.000                      343.49 MiB/s        357.31 MiB/s        13.82 MiB/s (4.02%)
  TakeInt64/1048576/1/min_time:1.000                      453.24 MiB/s        470.46 MiB/s        17.23 MiB/s (3.80%)
  TakeInt64/32768/50/min_time:1.000                       342.54 MiB/s        335.58 MiB/s        -6.96 MiB/s (-2.03%)
  TakeString/32768/0/min_time:1.000                       1.62 GiB/s          1.65 GiB/s          31.15 MiB/s (1.87%)
  TakeString/1048576/1/min_time:1.000                     1.27 GiB/s          1.29 GiB/s          22.73 MiB/s (1.74%)
  TakeInt64VsFilter/32768/10/min_time:1.000               1.38 GiB/s          1.39 GiB/s          15.37 MiB/s (1.09%)
  TakeFixedSizeList1Int64/1048576/1/min_time:1.000        120.13 MiB/s        121.87 MiB/s        1.74 MiB/s (1.45%)
  TakeInt64VsFilter/32768/50/min_time:1.000               779.37 MiB/s        787.26 MiB/s        7.9 MiB/s (1.01%)
  FilterFixedSizeList1Int64/8388608/1/min_time:1.000      347.87 MiB/s        354.1 MiB/s         6.23 MiB/s (1.79%)
  FilterInt64/8388608/1/min_time:1.000                    618.86 MiB/s        631.16 MiB/s        12.3 MiB/s (1.99%)
  FilterString/32768/0/min_time:1.000                     4.03 GiB/s          4.1 GiB/s           78.39 MiB/s (1.90%)
  FilterString/32768/1/min_time:1.000                     3.92 GiB/s          3.99 GiB/s          70.94 MiB/s (1.77%)
  FilterFixedSizeList1Int64/32768/0/min_time:1.000        399.29 MiB/s        403.82 MiB/s        4.54 MiB/s (1.14%)
  FilterString/1048576/1/min_time:1.000                   3.3 GiB/s           3.23 GiB/s          -71.76 MiB/s (-2.12%)
  FilterFixedSizeList1Int64/32768/50/min_time:1.000       171.51 MiB/s        172.7 MiB/s         1.19 MiB/s (0.70%)
  FilterFixedSizeList1Int64/1048576/1/min_time:1.000      356.37 MiB/s        354.26 MiB/s        -2.11 MiB/s (-0.59%)
  FilterInt64/32768/1/min_time:1.000                      817.28 MiB/s        807.74 MiB/s        -9.54 MiB/s (-1.17%)
  FilterInt64/32768/10/min_time:1.000                     712.6 MiB/s         706.36 MiB/s        -6.23 MiB/s (-0.87%)
  FilterString/32768/50/min_time:1.000                    1.95 GiB/s          1.96 GiB/s          15.05 MiB/s (0.76%)
  FilterString/8388608/1/min_time:1.000                   3.46 GiB/s          3.43 GiB/s          -37.88 MiB/s (-1.07%)
  FilterInt64/32768/50/min_time:1.000                     365.68 MiB/s        370.37 MiB/s        4.7 MiB/s (1.28%)
  FilterFixedSizeList1Int64/32768/10/min_time:1.000       318.63 MiB/s        314.88 MiB/s        -3.75 MiB/s (-1.18%)
  ReadJSONBlockWithSchemaMultiThread/real_time            222.26 MiB/s        219.02 MiB/s        -3.24 MiB/s (-1.46%)
  ChunkJSONLineDelimited                                  104.61ns            100.43ns            -4.17ns (-3.99%)
  BuildStringDictionary                                   45.66 MiB/s         47.48 MiB/s         1.82 MiB/s (3.99%)
  UniqueInt64WithNulls/4194304/1024                       1.9 GiB/s           1.88 GiB/s          -16.19 MiB/s (-0.83%)
  SumKernel/32768/1                                       14.49 GiB/s         14.31 GiB/s         -174.51 MiB/s (-1.18%)
  ThreadPoolSpawn/threads:2/task_cost:100000/real_time    21766.45246150031   22732.475168703626  966.0227072033158 (4.44%)
  ThreadedTaskGroup/threads:2/task_cost:10000/real_time   243879.6678804378   242437.13088468352  -1442.5369957542862 (-0.59%)
  ThreadPoolSpawn/threads:4/task_cost:100000/real_time    41313.19311685875   40135.79891525845   -1177.3942016002984 (-2.85%)
  ThreadedTaskGroup/threads:4/task_cost:100000/real_time  47294.43689321427   47773.651572188304  479.21467897403636 (1.01%)
  ThreadPoolSpawn/threads:8/task_cost:1000/real_time      234707.2491696133   236067.04388044862  1359.7947108353255 (0.58%)
  ThreadPoolSpawn/threads:2/task_cost:10000/real_time     215989.39769180672  210750.91108234323  -5238.486609463493 (-2.43%)
  ThreadedTaskGroup/threads:8/task_cost:1000/real_time    222985.67371614516  233740.65355218883  10754.979836043669 (4.82%)
  ThreadedTaskGroup/threads:1/task_cost:1000/real_time    886778.1114667933   878946.971340187    -7831.140126606332 (-0.88%)
  ThreadedTaskGroup/threads:4/task_cost:1000/real_time    230454.4592953691   232475.82248897856  2021.3631936094607 (0.88%)
  ThreadPoolSpawn/threads:1/task_cost:1000/real_time      982827.9930659528   953462.7785867956   -29365.214479157235 (-2.99%)
  CompareArrayScalarKernel/32768/0                        12.15 GiB/s         12.05 GiB/s         -97.48 MiB/s (-0.78%)
  CompareArrayScalarKernel/32768/1                        11.93 GiB/s         12.05 GiB/s         125.43 MiB/s (1.03%)
  CompareArrayArrayKernel/32768/0                         20.59 GiB/s         20.72 GiB/s         133.79 MiB/s (0.63%)
  DetectIntWidthNoNulls                                   18.97 GiB/s         19.22 GiB/s         262.51 MiB/s (1.35%)
+ BuildInt64DictionaryArraySimilar                        379.53 MiB/s        412.36 MiB/s        32.84 MiB/s (8.65%)
+ BuildInt64DictionaryArraySequential                     646.98 MiB/s        698.71 MiB/s        51.73 MiB/s (8.00%)
  BuildChunkedBinaryArray                                 382.88 MiB/s        378.39 MiB/s        -4.49 MiB/s (-1.17%)
  ArrayDataConstructDestruct                              68708.70ns          69165.68ns          456.98ns (0.67%)
+ BuildBinaryArray                                        286.93 MiB/s        315.39 MiB/s        28.47 MiB/s (9.92%)
  ParallelMemoryCopy/threads:1/real_time                  6.76 GiB/s          6.99 GiB/s          236.68 MiB/s (3.42%)
  ParallelMemoryCopy/threads:8/real_time                  23.21 GiB/s         23.47 GiB/s         261.88 MiB/s (1.10%)
  ParallelMemoryCopy/threads:4/real_time                  21.53 GiB/s         21.2 GiB/s          -332.31 MiB/s (-1.51%)
  ParallelMemoryCopy/threads:16/real_time                 22.48 GiB/s         22.77 GiB/s         300.61 MiB/s (1.31%)
  ParallelMemoryCopy/threads:40/real_time                 21.62 GiB/s         21.86 GiB/s         244.3 MiB/s (1.10%)
+ ParallelMemoryCopy/threads:2/real_time                  10.32 GiB/s         10.98 GiB/s         679.54 MiB/s (6.43%)
  ReadRecordBatch/16/real_time                            197.95 GiB/s        196.43 GiB/s        -1.52 GiB/s (-0.77%)
  WriteRecordBatch/4096/real_time                         691.66 MiB/s        700.99 MiB/s        9.33 MiB/s (1.35%)
+ ReadRecordBatch/4096/real_time                          489.3 MiB/s         639.14 MiB/s        149.84 MiB/s (30.62%)
  WriteRecordBatch/64/real_time                           9.87 GiB/s          10.13 GiB/s         263.55 MiB/s (2.61%)
  WriteRecordBatch/256/real_time                          6.35 GiB/s          6.48 GiB/s          132.12 MiB/s (2.03%)
  WriteRecordBatch/1024/real_time                         2.58 GiB/s          2.61 GiB/s          32.21 MiB/s (1.22%)
  WriteRecordBatch/16/real_time                           11.53 GiB/s         11.88 GiB/s         354.96 MiB/s (3.01%)
  ReadRecordBatch/256/real_time                           10.15 GiB/s         10.28 GiB/s         134.7 MiB/s (1.30%)
  ReadRecordBatch/64/real_time                            51.14 GiB/s         51.81 GiB/s         681.34 MiB/s (1.30%)
  ReadRecordBatch/1024/real_time                          2.5 GiB/s           2.47 GiB/s          -26.35 MiB/s (-1.03%)
  WriteRecordBatch/1/real_time                            11.99 GiB/s         12.38 GiB/s         401.45 MiB/s (3.27%)
  WriteRecordBatch/4/real_time                            12.03 GiB/s         12.39 GiB/s         373.24 MiB/s (3.03%)
  WriteRecordBatch/8192/real_time                         343.59 MiB/s        347.61 MiB/s        4.01 MiB/s (1.17%)
  FirstTimeBitmapWriter/8192                              96.73 MiB/s         97.89 MiB/s         1.16 MiB/s (1.20%)
+ GenerateBitsUnrolled/8192                               139.15 MiB/s        149.99 MiB/s        10.84 MiB/s (7.79%)
  VisitBits/8192                                          102.32 MiB/s        100.94 MiB/s        -1.38 MiB/s (-1.35%)
+ BitmapReader/8192                                       98.96 MiB/s         109.27 MiB/s        10.31 MiB/s (10.42%)
- BitmapWriter/8192                                       83.85 MiB/s         77.89 MiB/s         -5.96 MiB/s (-7.11%)
- GenerateBits/8192                                       88.66 MiB/s         80.93 MiB/s         -7.73 MiB/s (-8.72%)
  ======================================================  ==================  ==================  ============================

  124 result(s) not shown due to tiny delta

@emkornfield
Copy link
Contributor Author

@ursabot benchmark --help

@ursabot
Copy link

ursabot commented Jul 1, 2019

Usage: @ursabot benchmark [OPTIONS] [<baseline>]

  Run the benchmark suite in comparison mode.

  This command will run the benchmark suite for tip of the branch commit
  against `<baseline>` (or master if not provided).

  Examples:

  # Run the all the benchmarks
  @ursabot benchmark

  # Compare only benchmarks where the name matches the /^Sum/ regex
  @ursabot benchmark --benchmark-filter=^Sum

  # Compare only benchmarks where the suite matches the /compute-/ regex.
  # A suite is the C++ binary.
  @ursabot benchmark --suite-filter=compute-

  # Sometimes a new optimization requires the addition of new benchmarks to
  # quantify the performance increase. When doing this be sure to add the
  # benchmark in a separate commit before introducing the optimization.
  #
  # Note that specifying the baseline is the only way to compare using a new
  # benchmark, since master does not contain the new benchmark and no
  # comparison is possible.
  #
  # The following command compares the results of matching benchmarks,
  # compiling against HEAD and the provided baseline commit, e.g. eaf8302.
  # You can use this to quantify the performance improvement of new
  # optimizations or to check for regressions.
  @ursabot benchmark --benchmark-filter=MyBenchmark eaf8302

Options:
  --suite-filter <regex>      Regex filtering benchmark suites.
  --benchmark-filter <regex>  Regex filtering benchmarks.
  --help                      Show this message and exit.

@emkornfield
Copy link
Contributor Author

@ursabot benchmark --benchmark-filter=Bit

@ursabot
Copy link

ursabot commented Jul 1, 2019

AMD64 Ubuntu 18.04 C++ Benchmark (#28904) builder has been succeeded.

Revision: db49fbb

  ==========================  ============  ============  ====================
  benchmark                   baseline      contender     delta
  ==========================  ============  ============  ====================
- GenerateBits/8192           88.66 MiB/s   80.77 MiB/s   -7.88 MiB/s (-8.89%)
+ BitmapReader/8192           98.91 MiB/s   109.34 MiB/s  10.42 MiB/s (10.54%)
+ GenerateBitsUnrolled/8192   136.98 MiB/s  150.46 MiB/s  13.48 MiB/s (9.84%)
  VisitBits/8192              102.24 MiB/s  101.13 MiB/s  -1.12 MiB/s (-1.09%)
  FirstTimeBitmapWriter/8192  96.33 MiB/s   97.98 MiB/s   1.65 MiB/s (1.71%)
- BitmapWriter/8192           83.81 MiB/s   77.92 MiB/s   -5.89 MiB/s (-7.03%)
  ==========================  ============  ============  ====================

  4 result(s) not shown due to tiny delta

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I hope debug mode doesn't become significantly slower because of this.

@pitrou pitrou closed this in 1bbeb35 Jul 2, 2019
return reinterpret_cast<T*>(&internal::non_null_filler);
}

template <typename T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm no C++ guru, but can't you make a single method for this by templating the input type too, e.g.

template <typename T, typename I = uint8_t>
inline typename std::enable_if<std::is_integral<T>::value, T>::type SafeLoad(
    const I* unaligned) {
  typename std::remove_const<T>::type ret;
  std::memcpy(&ret, unaligned, sizeof(T));
  return ret;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct, I think I like how the methods are now though since it is more explicit when casting and a pass-through when not. One method could certainly delegate to the other.

int64_t* impala_last_day_nanos = reinterpret_cast<int64_t*>(impala_timestamp);
*impala_last_day_nanos = last_day_units * NanosecondsPerUnit;
auto last_day_nanos = last_day_units * NanosecondsPerUnit;
// Strage might be unaligned, so use mempcy instead of reinterpret_cast
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All even indexed Int96 in a vector will be unaligned (according to int64_t alignment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point I'll open up a follow-up PR, we must not hav good test data here or UBSan isn't foolproof, or somehow I didn't run this test properly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait were you just commenting on my comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was commenting on your comment on the "strange" part :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you notice the typo? ("Strage")

@fsaintjacques
Copy link
Contributor

Heh, didn't see Antoine merged, opened link from ML :)

@emkornfield
Copy link
Contributor Author

@fsaintjacques thank you for the comments I'll address them in a follow-up PR>

kou pushed a commit that referenced this pull request Jul 4, 2019
- Add utility methods for unaligned loads use where errors
  are discovered.
- Upgrade version of flatbuffers to avoid issues with unaligned
  load in that library
- Discover bug in spec that makes zero-copy well defined behavior
  virtually impossible with flatbuffers (need to discuss on ML).  For now I'm
  not turning on ASAN and will file a follow-up JIRA to track this.

Still needed:
 - [ ] Performance testing
 - [X] Discuss flatbuffers issues (I sent e-mail to LM)

Author: Micah Kornfield <emkornfield@gmail.com>
Author: emkornfield <emkornfield@gmail.com>

Closes #4757 from emkornfield/ubsan_mem and squashes the following commits:

5528584 <emkornfield> remove TODO
db49fbb <Micah Kornfield> Ubsan excluding flatbuffers
wesm pushed a commit that referenced this pull request Jul 13, 2019
- Add utility methods for unaligned loads use where errors
  are discovered.
- Upgrade version of flatbuffers to avoid issues with unaligned
  load in that library
- Discover bug in spec that makes zero-copy well defined behavior
  virtually impossible with flatbuffers (need to discuss on ML).  For now I'm
  not turning on ASAN and will file a follow-up JIRA to track this.

Still needed:
 - [ ] Performance testing
 - [X] Discuss flatbuffers issues (I sent e-mail to LM)

Author: Micah Kornfield <emkornfield@gmail.com>
Author: emkornfield <emkornfield@gmail.com>

Closes #4757 from emkornfield/ubsan_mem and squashes the following commits:

5528584 <emkornfield> remove TODO
db49fbb <Micah Kornfield> Ubsan excluding flatbuffers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants