Skip to content

Conversation

@kou
Copy link
Member

@kou kou commented May 5, 2019

This stops exporting template class that has internal
implementation. It's not supported in MinGW. This exposes all template
class implementations to .h and hides internal details to .cc by pimpl
idiom.

See also ARROW-4399.

@codecov-io
Copy link

Codecov Report

Merging #4255 into master will increase coverage by 0.03%.
The diff coverage is 84.78%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4255      +/-   ##
==========================================
+ Coverage   88.27%   88.31%   +0.03%     
==========================================
  Files         763      763              
  Lines       93723    93745      +22     
  Branches     1251     1251              
==========================================
+ Hits        82735    82789      +54     
+ Misses      10875    10843      -32     
  Partials      113      113
Impacted Files Coverage Δ
cpp/src/arrow/util/hashing.h 99.26% <100%> (ø) ⬆️
cpp/src/arrow/array/builder_dict.h 81.48% <76.92%> (-7.41%) ⬇️
cpp/src/arrow/array/builder_dict.cc 87.5% <90.76%> (+23.08%) ⬆️
cpp/src/arrow/util/thread-pool-test.cc 98.59% <0%> (+0.93%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 982f341...d9b1188. Read the comment docs.

1 similar comment
@codecov-io
Copy link

Codecov Report

Merging #4255 into master will increase coverage by 0.03%.
The diff coverage is 84.78%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4255      +/-   ##
==========================================
+ Coverage   88.27%   88.31%   +0.03%     
==========================================
  Files         763      763              
  Lines       93723    93745      +22     
  Branches     1251     1251              
==========================================
+ Hits        82735    82789      +54     
+ Misses      10875    10843      -32     
  Partials      113      113
Impacted Files Coverage Δ
cpp/src/arrow/util/hashing.h 99.26% <100%> (ø) ⬆️
cpp/src/arrow/array/builder_dict.h 81.48% <76.92%> (-7.41%) ⬇️
cpp/src/arrow/array/builder_dict.cc 87.5% <90.76%> (+23.08%) ⬆️
cpp/src/arrow/util/thread-pool-test.cc 98.59% <0%> (+0.93%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 982f341...d9b1188. Read the comment docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems roundabout; for integer values you could use ConcreteMemoTable = typename internal::HashTraits<typename CTypeTraits<T>::ArrowType> and for GetOrInsert(string_view) you could just explicitly name BinaryMemoTable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Done.
I didn't know that we have CTypeTraits!

kou added 2 commits May 8, 2019 06:19
This stops exporting template class that has internal
implementation. It's not supported in MinGW. This exposes all template
class implementations to .h and hides internal details to .cc by pimpl
idiom.

See also ARROW-4399.
@kou kou force-pushed the cpp-dictionary-build-stop-export-template branch from f2bbb0e to e4d7f7d Compare May 7, 2019 21:21
@kou
Copy link
Member Author

kou commented May 8, 2019

No more comments?
I'll merge this tomorrow.

@wesm
Copy link
Member

wesm commented May 9, 2019

@pitrou could you have a look at this if possible? Are there any performance implications?

@wesm
Copy link
Member

wesm commented May 9, 2019

@kou can you or someone run a benchmark comparison (we are working on doing this in an automated fashion in the near future)?

public:
virtual ~MemoTable() = default;

virtual int32_t size() const = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this change uses size() against MemoTable not subclass such as ScalarMemoTable here https://github.com/apache/arrow/pull/4255/files#diff-5562b5446398874441052ef04e7ee2e6R243 .
If this doesn't exist, we need to cast to subclass to call size().

@pitrou
Copy link
Member

pitrou commented May 9, 2019

I don't like this at all. This is adding some complication and exposing implementation details in the details just to appease an exotic platform compiler.

I'd also like to see some benchmark numbers.

@wesm
Copy link
Member

wesm commented May 9, 2019

I've got some refactoring of this file going in my branch for ARROW-3144 -- https://github.com/wesm/arrow/tree/ARROW-3144 (to shift to variable dictionaries). It seems the issue is around symbol visibility of the template exports -- the solution may be to move more of the implementation to the header file. I could look at this in more detail if you'd like, but it won't be until next week

I don't think that exposing implementation in the header is a bad thing so long as the details are in an internal namespace or similar

@kou
Copy link
Member Author

kou commented May 9, 2019

I didn't run make benchmark.

Now, I run it on my Linux desktop. Note that this result may have error because I was doing some light tasks such as writing a comment on the same machine while make benchmark.

Before:

BM_BuildInt64DictionaryArrayRandom/repeats:2                     79 us         79 us       8750   960.194MB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2                     80 us         80 us       8750   956.713MB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_mean                80 us         80 us       8750   958.454MB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_median              80 us         80 us       8750   958.454MB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_stddev               0 us          0 us       8750   2.46103MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2                 74 us         74 us       9448   1033.77MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2                 74 us         74 us       9448   1026.17MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_mean            74 us         74 us       9448   1029.97MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_median          74 us         74 us       9448   1029.97MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_stddev           0 us          0 us       9448   5.37446MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2                    73 us         73 us       9558   1040.16MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2                    74 us         74 us       9558   1033.28MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_mean               74 us         74 us       9558   1036.72MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_median             74 us         74 us       9558   1036.72MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_stddev              0 us          0 us       9558   4.87107MB/s
BM_BuildStringDictionaryArray/repeats:2                         177 us        177 us       3965   565.818MB/s
BM_BuildStringDictionaryArray/repeats:2                         177 us        177 us       3965   564.514MB/s
BM_BuildStringDictionaryArray/repeats:2_mean                    177 us        177 us       3965   565.166MB/s
BM_BuildStringDictionaryArray/repeats:2_median                  177 us        177 us       3965   565.166MB/s
BM_BuildStringDictionaryArray/repeats:2_stddev                    0 us          0 us       3965   944.721kB/s

After:

BM_BuildInt64DictionaryArrayRandom/repeats:2                     63 us         63 us      11060   1.19024GB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2                     63 us         63 us      11060   1.18231GB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_mean                63 us         63 us      11060   1.18627GB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_median              63 us         63 us      11060   1.18627GB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_stddev               0 us          0 us      11060   5.74351MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2                 56 us         56 us      12392   1.32746GB/s
BM_BuildInt64DictionaryArraySequential/repeats:2                 56 us         56 us      12392   1.32213GB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_mean            56 us         56 us      12392    1.3248GB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_median          56 us         56 us      12392    1.3248GB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_stddev           0 us          0 us      12392    3.8592MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2                    57 us         57 us      11946   1.31269GB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2                    56 us         56 us      11946   1.32086GB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_mean               57 us         57 us      11946   1.31677GB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_median             57 us         57 us      11946   1.31677GB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_stddev              0 us          0 us      11946   5.91462MB/s
BM_BuildStringDictionaryArray/repeats:2                         168 us        168 us       4162    593.52MB/s
BM_BuildStringDictionaryArray/repeats:2                         169 us        169 us       4162   592.553MB/s
BM_BuildStringDictionaryArray/repeats:2_mean                    169 us        169 us       4162   593.036MB/s
BM_BuildStringDictionaryArray/repeats:2_median                  169 us        169 us       4162   593.036MB/s
BM_BuildStringDictionaryArray/repeats:2_stddev                    0 us          0 us       4162   700.711kB/s

It seems that there are no significant slowdown. Most of cases are faster a bit but it may be error.

Some other builder benchmarks are slower with this change. But I'm not sure this is related to this change or just error.

Before:

BM_BuildPrimitiveArrayNoNulls/repeats:2                       46565 us      46565 us         15   10.7376GB/s
BM_BuildPrimitiveArrayNoNulls/repeats:2                       46523 us      46524 us         15   10.7471GB/s
BM_BuildPrimitiveArrayNoNulls/repeats:2_mean                  46544 us      46545 us         15   10.7424GB/s
BM_BuildPrimitiveArrayNoNulls/repeats:2_median                46544 us      46545 us         15   10.7424GB/s
BM_BuildPrimitiveArrayNoNulls/repeats:2_stddev                   29 us         29 us         15   6.84988MB/s

After:

BM_BuildPrimitiveArrayNoNulls/repeats:2                       91208 us      91209 us          8    5.4819GB/s
BM_BuildPrimitiveArrayNoNulls/repeats:2                       90612 us      90613 us          8   5.51799GB/s
BM_BuildPrimitiveArrayNoNulls/repeats:2_mean                  90910 us      90911 us          8   5.49994GB/s
BM_BuildPrimitiveArrayNoNulls/repeats:2_median                90910 us      90911 us          8   5.49994GB/s
BM_BuildPrimitiveArrayNoNulls/repeats:2_stddev                  421 us        422 us          8   26.1289MB/s

See the attached full benchmark logs for details:

@wesm
Copy link
Member

wesm commented May 11, 2019

I'll look at the benchmark results also locally.

@kou what is the easiest way to reproduce this issue locally, using Cygwin? I would like to understand the template visibility issue more deeply since I've spent a lot of time on such problems over the last few years

@wesm
Copy link
Member

wesm commented May 11, 2019

Hm this branch seems consistently faster than master (yes, I do see the CPU scaling warning... I haven't figured out how to disable it on the machine that I'm on, using cpufreq-util -g performance doesn't seem to do it as with other machines)

before

 ./release/arrow-builder-benchmark --benchmark_filter=Dict
2019-05-10 21:07:59
Running ./release/arrow-builder-benchmark
Run on (6 X 4400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 12288K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------------------------
Benchmark                                                        Time           CPU Iterations
-----------------------------------------------------------------------------------------------
BM_BuildInt64DictionaryArrayRandom/repeats:2                   121 us        121 us       6184   632.767MB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2                   129 us        129 us       6184   593.177MB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_mean              125 us        125 us       6184   612.972MB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_median            125 us        125 us       6184   612.972MB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_stddev              6 us          6 us       6184   27.9946MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2               101 us        101 us       7143   756.246MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2               102 us        102 us       7143   748.023MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_mean          101 us        101 us       7143   752.134MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_median        101 us        101 us       7143   752.134MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_stddev          1 us          1 us       7143   5.81457MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2                  102 us        102 us       6941    749.24MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2                  101 us        101 us       6941   752.723MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_mean             102 us        102 us       6941   750.981MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_median           102 us        102 us       6941   750.981MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_stddev             0 us          0 us       6941   2.46331MB/s
BM_BuildStringDictionaryArray/repeats:2                        230 us        230 us       2987   434.809MB/s
BM_BuildStringDictionaryArray/repeats:2                        231 us        231 us       2987   432.284MB/s
BM_BuildStringDictionaryArray/repeats:2_mean                   231 us        231 us       2987   433.546MB/s
BM_BuildStringDictionaryArray/repeats:2_median                 231 us        231 us       2987   433.546MB/s
BM_BuildStringDictionaryArray/repeats:2_stddev                   1 us          1 us       2987   1.78519MB/s

after

$ ./release/arrow-builder-benchmark --benchmark_filter=Dict
2019-05-10 21:08:21
Running ./release/arrow-builder-benchmark
Run on (6 X 4400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 12288K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------------------------
Benchmark                                                        Time           CPU Iterations
-----------------------------------------------------------------------------------------------
BM_BuildInt64DictionaryArrayRandom/repeats:2                    84 us         83 us       9113   920.758MB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2                    84 us         83 us       9113   918.057MB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_mean               84 us         83 us       9113   919.407MB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_median             84 us         83 us       9113   919.407MB/s
BM_BuildInt64DictionaryArrayRandom/repeats:2_stddev              0 us          0 us       9113   1.91017MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2                75 us         74 us       9367   1028.33MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2                75 us         74 us       9367   1025.62MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_mean           75 us         74 us       9367   1026.98MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_median         75 us         74 us       9367   1026.98MB/s
BM_BuildInt64DictionaryArraySequential/repeats:2_stddev          0 us          0 us       9367    1.9155MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2                   73 us         73 us      10610   1044.79MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2                   72 us         72 us      10610   1059.06MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_mean              73 us         73 us      10610   1051.92MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_median            73 us         73 us      10610   1051.92MB/s
BM_BuildInt64DictionaryArraySimilar/repeats:2_stddev             1 us          1 us      10610   10.0899MB/s
BM_BuildStringDictionaryArray/repeats:2                        210 us        210 us       3325   475.559MB/s
BM_BuildStringDictionaryArray/repeats:2                        210 us        210 us       3325   476.206MB/s
BM_BuildStringDictionaryArray/repeats:2_mean                   210 us        210 us       3325   475.882MB/s
BM_BuildStringDictionaryArray/repeats:2_median                 210 us        210 us       3325   475.882MB/s
BM_BuildStringDictionaryArray/repeats:2_stddev                   0 us          0 us       3325   468.806kB/s

@pitrou thoughts? I fear that may be due for multiple rounds of refactoring, we may also want to expand the benchmarking to be more comprehensive or understand why this patch makes things a good deal faster

@pitrou
Copy link
Member

pitrou commented May 11, 2019

Ah, perhaps this PR allows inlining more of the code. In this case I'm probably going change my opinion about it :-)

@wesm
Copy link
Member

wesm commented May 11, 2019

Yeah my preference is to merge this since it is also in the way of my work on ARROW-3144 which touches the DictionaryBuilder implementation details

@kou
Copy link
Member Author

kou commented May 12, 2019

@wesm You don't need to use Cygwin. You can use MSYS2.

Here are instruction to reproduce this issue locally:

  1. Install MSYS2 for 64bit: https://www.msys2.org/
  2. Enable arrow-dict-test.cc by https://github.com/apache/arrow/pull/4255/files#diff-c67858a053df184efdd375b4989ce413
  3. Run cmd.exe
  4. Run the followings:
set MINGW_PACKAGE_PREFIX=mingw-w64-x86_64
set MINGW_PREFIX=C:\msys64\mingw64
set MSYSTEM=MINGW64
set PATH=%MINGW_PREFIX%\bin;C:\msys64\usr\bin;%PATH%
pacman --sync --refresh --noconfirm ^
  "%MINGW_PACKAGE_PREFIX%-boost" ^
  "%MINGW_PACKAGE_PREFIX%-brotli" ^
  "%MINGW_PACKAGE_PREFIX%-cmake" ^
  "%MINGW_PACKAGE_PREFIX%-double-conversion" ^
  "%MINGW_PACKAGE_PREFIX%-flatbuffers" ^
  "%MINGW_PACKAGE_PREFIX%-gflags" ^
  "%MINGW_PACKAGE_PREFIX%-lz4" ^
  "%MINGW_PACKAGE_PREFIX%-protobuf" ^
  "%MINGW_PACKAGE_PREFIX%-rapidjson" ^
  "%MINGW_PACKAGE_PREFIX%-snappy" ^
  "%MINGW_PACKAGE_PREFIX%-thrift" ^
  "%MINGW_PACKAGE_PREFIX%-zlib" ^
  "%MINGW_PACKAGE_PREFIX%-zstd"
mkdir cpp\build
cd cpp\build
cmake ^
    -G "MSYS Makefiles" ^
    -DCMAKE_BUILD_TYPE=release ^
    -DARROW_PACKAGE_PREFIX=%MINGW_PREFIX% ^
    -DARROW_JEMALLOC=OFF ^
    -DARROW_USE_GLOG=OFF ^
    -DARROW_BUILD_TESTS=ON ^
    ..
make -j8

If it doesn't work on your environment, please show me the error message.

@wesm
Copy link
Member

wesm commented May 12, 2019

I'm merging this since it's blocking me. @pitrou I can take responsibility for follow-on improvements to this code per your feedback -- I will also install msys2 so I can understand the issue a bit better (having suffered through template visibility issues in many other environments)

@wesm wesm closed this in 828c44b May 12, 2019
@kou kou deleted the cpp-dictionary-build-stop-export-template branch May 12, 2019 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants