-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-4725: [C++] Enable dictionary builder tests with MinGW build #4255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4255 +/- ##
==========================================
+ Coverage 88.27% 88.31% +0.03%
==========================================
Files 763 763
Lines 93723 93745 +22
Branches 1251 1251
==========================================
+ Hits 82735 82789 +54
+ Misses 10875 10843 -32
Partials 113 113
Continue to review full report at Codecov.
|
1 similar comment
Codecov Report
@@ Coverage Diff @@
## master #4255 +/- ##
==========================================
+ Coverage 88.27% 88.31% +0.03%
==========================================
Files 763 763
Lines 93723 93745 +22
Branches 1251 1251
==========================================
+ Hits 82735 82789 +54
+ Misses 10875 10843 -32
Partials 113 113
Continue to review full report at Codecov.
|
cpp/src/arrow/array/builder_dict.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems roundabout; for integer values you could use ConcreteMemoTable = typename internal::HashTraits<typename CTypeTraits<T>::ArrowType> and for GetOrInsert(string_view) you could just explicitly name BinaryMemoTable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
I didn't know that we have CTypeTraits!
This stops exporting template class that has internal implementation. It's not supported in MinGW. This exposes all template class implementations to .h and hides internal details to .cc by pimpl idiom. See also ARROW-4399.
f2bbb0e to
e4d7f7d
Compare
|
No more comments? |
|
@pitrou could you have a look at this if possible? Are there any performance implications? |
|
@kou can you or someone run a benchmark comparison (we are working on doing this in an automated fashion in the near future)? |
| public: | ||
| virtual ~MemoTable() = default; | ||
|
|
||
| virtual int32_t size() const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this change uses size() against MemoTable not subclass such as ScalarMemoTable here https://github.com/apache/arrow/pull/4255/files#diff-5562b5446398874441052ef04e7ee2e6R243 .
If this doesn't exist, we need to cast to subclass to call size().
|
I don't like this at all. This is adding some complication and exposing implementation details in the details just to appease an exotic platform compiler. I'd also like to see some benchmark numbers. |
|
I've got some refactoring of this file going in my branch for ARROW-3144 -- https://github.com/wesm/arrow/tree/ARROW-3144 (to shift to variable dictionaries). It seems the issue is around symbol visibility of the template exports -- the solution may be to move more of the implementation to the header file. I could look at this in more detail if you'd like, but it won't be until next week I don't think that exposing implementation in the header is a bad thing so long as the details are in an |
|
I didn't run Now, I run it on my Linux desktop. Note that this result may have error because I was doing some light tasks such as writing a comment on the same machine while Before: After: It seems that there are no significant slowdown. Most of cases are faster a bit but it may be error. Some other builder benchmarks are slower with this change. But I'm not sure this is related to this change or just error. Before: After: See the attached full benchmark logs for details:
|
|
I'll look at the benchmark results also locally. @kou what is the easiest way to reproduce this issue locally, using Cygwin? I would like to understand the template visibility issue more deeply since I've spent a lot of time on such problems over the last few years |
|
Hm this branch seems consistently faster than master (yes, I do see the CPU scaling warning... I haven't figured out how to disable it on the machine that I'm on, using before after @pitrou thoughts? I fear that may be due for multiple rounds of refactoring, we may also want to expand the benchmarking to be more comprehensive or understand why this patch makes things a good deal faster |
|
Ah, perhaps this PR allows inlining more of the code. In this case I'm probably going change my opinion about it :-) |
|
Yeah my preference is to merge this since it is also in the way of my work on ARROW-3144 which touches the DictionaryBuilder implementation details |
|
@wesm You don't need to use Cygwin. You can use MSYS2. Here are instruction to reproduce this issue locally:
set MINGW_PACKAGE_PREFIX=mingw-w64-x86_64
set MINGW_PREFIX=C:\msys64\mingw64
set MSYSTEM=MINGW64
set PATH=%MINGW_PREFIX%\bin;C:\msys64\usr\bin;%PATH%
pacman --sync --refresh --noconfirm ^
"%MINGW_PACKAGE_PREFIX%-boost" ^
"%MINGW_PACKAGE_PREFIX%-brotli" ^
"%MINGW_PACKAGE_PREFIX%-cmake" ^
"%MINGW_PACKAGE_PREFIX%-double-conversion" ^
"%MINGW_PACKAGE_PREFIX%-flatbuffers" ^
"%MINGW_PACKAGE_PREFIX%-gflags" ^
"%MINGW_PACKAGE_PREFIX%-lz4" ^
"%MINGW_PACKAGE_PREFIX%-protobuf" ^
"%MINGW_PACKAGE_PREFIX%-rapidjson" ^
"%MINGW_PACKAGE_PREFIX%-snappy" ^
"%MINGW_PACKAGE_PREFIX%-thrift" ^
"%MINGW_PACKAGE_PREFIX%-zlib" ^
"%MINGW_PACKAGE_PREFIX%-zstd"
mkdir cpp\build
cd cpp\build
cmake ^
-G "MSYS Makefiles" ^
-DCMAKE_BUILD_TYPE=release ^
-DARROW_PACKAGE_PREFIX=%MINGW_PREFIX% ^
-DARROW_JEMALLOC=OFF ^
-DARROW_USE_GLOG=OFF ^
-DARROW_BUILD_TESTS=ON ^
..
make -j8If it doesn't work on your environment, please show me the error message. |
|
I'm merging this since it's blocking me. @pitrou I can take responsibility for follow-on improvements to this code per your feedback -- I will also install msys2 so I can understand the issue a bit better (having suffered through template visibility issues in many other environments) |
This stops exporting template class that has internal
implementation. It's not supported in MinGW. This exposes all template
class implementations to .h and hides internal details to .cc by pimpl
idiom.
See also ARROW-4399.