-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Encapsulate HLL logic #1756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encapsulate HLL logic #1756
Conversation
| if (str != "\\N") { | ||
| uint64_t hash = HashUtil::murmur_hash64A(src, len, HashUtil::MURMUR_SEED); | ||
| char buf[HllHashFunctions::HLL_INIT_EXPLICT_SET_SIZE]; | ||
| char buf[10]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use a const or a macro, better not to use a magic number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Use magic number only for simple.
Because I think we could delete this class code in doris 0.12 version.
| auto* src_hll = reinterpret_cast<HyperLogLog*>(src_slice->data); | ||
| dst_hll->merge(*src_hll); | ||
|
|
||
| delete src_hll; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a dangerous to delete input source content.
Better to leave a comment and try to find a better way to handle this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will Fix this issue with object_pool in next PR.
| std::unique_ptr<HyperLogLog> hll {new HyperLogLog()}; | ||
| if (!input.is_null) { | ||
| uint64_t hash_value = HashUtil::murmur_hash64A(input.ptr, input.len, HashUtil::MURMUR_SEED); | ||
| hll.reset(new HyperLogLog(hash_value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this case we create HyperLogLog two times
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will Fix
| } else { | ||
| buf.resize(HLL_EMPTY_SIZE); | ||
| } | ||
| hll->serialize((char*)buf.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not serialize(std::string*) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because slice.data type is char* and many HLL methods use char*.
| char buf[HLL_EMPTY_SET_SIZE]; | ||
| buf[0] = HLL_DATA_EMPTY; | ||
| result = AnyValUtil::from_buffer_temp(ctx, buf, sizeof(buf)); | ||
| StringVal HllHashFunctions::hll_hash(FunctionContext* ctx, const StringVal& input) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have two implementation? here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I have commented. keep HllHashFunctions only for backward compatibility, we will remove the HllHashFunctions class when doris 0.12 release.
be/src/olap/hll.h
Outdated
| } | ||
|
|
||
| void merge(const HyperLogLog& other) { | ||
| if (other._type == HLL_DATA_EMPTY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not switch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 4 * 3 switch case is too big.
Use if block one by one, and directly return in every switch case, which like Guard Clauses and seems more readable.
be/src/olap/hll.h
Outdated
|
|
||
| // _type must change | ||
| if (_type == HLL_DATA_EMPTY) { | ||
| _type = other._type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this->_type is changed, do you want it do following if block? I think there is no need to do following if block, which is a waste of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. So I directly return in every switch case.
be/src/olap/hll.h
Outdated
| update_registers(_registers, hash_value); | ||
| } | ||
|
|
||
| void merge(const HyperLogLog& other) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For large code block, we'd better to write it in cpp/cc files to accelerate our compile time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
| return HLL_COLUMN_DEFAULT_LEN; | ||
| } | ||
|
|
||
| static int serialize_sparse(char *result, const std::map<int, uint8_t>& index_to_value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
imay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
For #1610
Before I do "avoid serialize and deserialize from Scannode to Aggnode" work, I need to unify Bitmap and HyperLogLog behavior and interface firstly. So I encapsulate the HyperLogLog related logic.