-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-12010: [C++][Compute] Improve performance of the hash table used in GroupIdentifier #9768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format? See also: |
765371c to
66c2b88
Compare
cpp/src/arrow/CMakeLists.txt
Outdated
| list(APPEND ARROW_SRCS engine/key_hash_avx2.cc) | ||
| set_source_files_properties(engine/key_hash_avx2.cc PROPERTIES | ||
| SKIP_PRECOMPILE_HEADERS ON) | ||
| set_source_files_properties(engine/key_hash_avx2.cc PROPERTIES | ||
| COMPILE_FLAGS ${ARROW_AVX2_FLAG}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's extract this to a macro,
macro(append_avx2_src SRC)
if(ARROW_HAVE_RUNTIME_AVX2)
list(APPEND ARROW_SRCS ${SRC})
set_source_files_properties(${SRC} PROPERTIES
SKIP_PRECOMPILE_HEADERS ON)
set_source_files_properties(${SRC} PROPERTIES
COMPILE_FLAGS ${ARROW_AVX2_FLAG})
endif()
endmacro()| ExpectConsume(*ExecBatch::Make(key_batch), expected); | ||
| } | ||
|
|
||
| void AssertEquivalentIds(const Datum& expected, const Datum& actual) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include a comment describing what this does compared to AssertDatumsEqual
| max_right_id = right_ids[i]; | ||
| } | ||
| } | ||
| std::vector<bool> right_to_left_present; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector can be sized and initialized on construction
| std::vector<bool> right_to_left_present; | |
| std::vector<bool> right_to_left_present(max_right_id + 1, false); |
cpp/src/arrow/engine/util.h
Outdated
| --num_vectors_; | ||
| } | ||
| static constexpr int64_t padding = 64; | ||
| MemoryPool* pool_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TempVectorStack doesn't seem to use this MemoryPool outside of init(), could you remove it and add a comment describing the use of this class?
cpp/src/arrow/engine/util_avx2.cc
Outdated
| // second | ||
| input = _mm256_shuffle_epi8( | ||
| input, _mm256_setr_epi64x(0x0e0c0a0806040200ULL, 0x0f0d0b0907050301ULL, | ||
| 0x0e0c0a0806040200ULL, 0x0f0d0b0907050301ULL)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please create constexpr variables for any magic numbers
| // ** The order of bytes is reversed - highest byte represents 0th bucket. | ||
| // No other part of data structure uses this reversed order. | ||
| // | ||
| class SwissTable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a higher level doc comment describing this class and it's utility. Specifically: describe how equality comparison and appending/storage of new entries are deferred to callbacks.
Separately, add a comment detailing its implementation and usage of stamps for vectorized probing.
cpp/src/arrow/engine/key_encode.h
Outdated
|
|
||
| class EncoderNulls { | ||
| public: | ||
| static void Encode(KeyRowArray& rows, const std::vector<KeyColumnArray>& cols, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid mutable references. For out arguments, please use a mutable pointer
| static void Encode(KeyRowArray& rows, const std::vector<KeyColumnArray>& cols, | |
| static void Encode(KeyRowArray* rows, const std::vector<KeyColumnArray>& cols, |
| *reinterpret_cast<uint16_t*>(row_base + i * row_size) = | ||
| reinterpret_cast<const uint16_t*>(col_base)[i]; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of unaligned accesses like this one. This is undefined behavior in C++ and it's not supported on all platforms. Could we use SafeLoadAs and SafeStore? If those produce a performance regression, can we optimize them?
cddc9a2 to
f23c506
Compare
caded37 to
ca50d93
Compare
…alursa/arrow into ARROW-12010-GroupIdentifier
…alursa/arrow into ARROW-12010-GroupIdentifier
|
@michalursa looks like tests are hanging on our bigendian CI. Is this quick to address or should we leave this for follow up |
cc @kiszk |
It requires a bit of thinking (or debugging) to make the code work with big endian. But since we still have the other group by implementation, for now I am disabling new one on big endian architectures, and will move this issue to a separate jira. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I'll merge.
Please add follow up JIRAs and link them here, including:
- Extract vectorized bit utilities
- Enable GrouperFastImpl for big endian platforms
- Unit tests should iterate through supported hardware flags (so an AVX2 build should test AVX2 and scalar implementations)
This is the draft version of the code implementing functionality for mapping arbitrary set of input columns considered a key in grouping operation into a vector containing integer group identifiers (same combinations of input key columns get same ids).
I will continue working on it and updating it with:
At this point group ids, row ids, offsets, hash values are 32-bit. The overflow checks are missing in current version and still need to be fixed.
The entry point for id mapping is GroupBy class. It uses three main modules: storage defined in groupby_storage* files, hash defined in groupby_hash* files and hash table defined in groupby_map* files. Key values stored with the hash table are row oriented. Storage part of the code defines functions converting from column oriented storage to row oriented storage and back. It also implements comparison and appending keys to the incremental store.
I plan to add design doc in a form of a readme file later on.
The individual modules and functions present here have been tested with unit tests and are passing them but unit tests are not included in this change yet.