[Feature](func) Support function mmh64_v2 #57180
HappenLee
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run buildall
zclllyybb
left a comment
LGTM
FE UT Coverage Report: Increment line coverage
ClickBench: Total hot run time: 28.47 s
BE UT Coverage Report: Increment line coverage
BE Regression && UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage
run buildall
ClickBench: Total hot run time: 30.05 s
PR approved by at least one committer and no changes requested.
HappenLee
left a comment
LGTM
1. Add alias `xxhash3_64` for `xxhash_64`
2. Support function `murmur_hash3_64_v2`
Before:
```text
mysql> SELECT MURMUR_HASH3_64('1000209601_1756808272');
+------------------------------------------+
| MURMUR_HASH3_64('1000209601_1756808272') |
+------------------------------------------+
| 1680830166030441144 |
+------------------------------------------+
```
This result differs from the value returned by calling mmh3.hash64
directly in other languages, because external APIs compute the 128-bit
hash (mmh3_128) and then keep only the first 64 bits. The 64-bit version
inside Doris has only the h1 register and lacks the h2 register of the
128-bit version, so the final mixing of h1 and h2 in the algorithm
cannot take place, and the result diverges from what external callers
get.
After:
```text
mysql> SELECT MURMUR_HASH3_64_V2('1000209601_1756808272');
+---------------------------------------------+
| MURMUR_HASH3_64_V2('1000209601_1756808272') |
+---------------------------------------------+
| 4038800892574899471 |
+---------------------------------------------+
```
The result is now identical to the API call.
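The "128-bit hash, then keep the first 64 bits" behavior described above can be sketched in plain Python. This is a minimal illustration of the public MurmurHash3 x64_128 algorithm, not the Doris implementation; the function names `murmur3_x64_128` and `murmur_hash3_64_v2_sketch` are hypothetical, and the seed of 0 is an assumption (the PR does not state which seed Doris uses).

```python
MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def _rotl64(x, r):
    # Rotate a 64-bit value left by r bits.
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k):
    # Final avalanche step of MurmurHash3.
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def _mix_k1(k1):
    return (_rotl64((k1 * C1) & MASK64, 31) * C2) & MASK64


def _mix_k2(k2):
    return (_rotl64((k2 * C2) & MASK64, 33) * C1) & MASK64


def _to_signed(x):
    # Reinterpret an unsigned 64-bit value as a signed one (like SQL BIGINT).
    return x - (1 << 64) if x >= (1 << 63) else x


def murmur3_x64_128(data, seed=0):
    """Return (h1, h2) of MurmurHash3 x64_128 as signed 64-bit integers."""
    h1 = h2 = seed & MASK64
    n = len(data)
    nblocks = n // 16

    # Body: each 16-byte block updates BOTH registers, and each
    # register is folded into the other.
    for i in range(nblocks):
        k1 = int.from_bytes(data[16 * i:16 * i + 8], "little")
        k2 = int.from_bytes(data[16 * i + 8:16 * i + 16], "little")
        h1 ^= _mix_k1(k1)
        h1 = (_rotl64(h1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        h2 ^= _mix_k2(k2)
        h2 = (_rotl64(h2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64

    # Tail: up to 15 leftover bytes; an empty slice decodes to 0,
    # and mixing 0 leaves the register unchanged.
    tail = data[nblocks * 16:]
    h2 ^= _mix_k2(int.from_bytes(tail[8:], "little"))
    h1 ^= _mix_k1(int.from_bytes(tail[:8], "little"))

    # Finalization: this is the h1/h2 cross-mixing that a
    # single-register 64-bit variant cannot perform.
    h1 ^= n
    h2 ^= n
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = _fmix64(h1)
    h2 = _fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return _to_signed(h1), _to_signed(h2)


def murmur_hash3_64_v2_sketch(data, seed=0):
    # Hypothetical stand-in for MURMUR_HASH3_64_V2: keep only h1.
    return murmur3_x64_128(data, seed)[0]


if __name__ == "__main__":
    print(murmur_hash3_64_v2_sketch(b"1000209601_1756808272"))
```

If the assumptions hold (seed 0, UTF-8 bytes, first 64 bits meaning h1), the printed value should agree with the `MURMUR_HASH3_64_V2` output shown above; treat any mismatch as a sign that one of those assumptions differs from the actual implementation.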
## Proposed changes
pick: apache#57180
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
1. Add alias `xxhash3_64` for `xxhash_64`.
2. Support function `murmur_hash3_64_v2`.
Before: the result of `murmur_hash3_64` differs from the value returned by calling mmh3.hash64 directly in other languages, because external APIs compute the 128-bit hash (mmh3_128) and then keep only the first 64 bits. The 64-bit version inside Doris has only the h1 register and lacks the h2 register of the 128-bit version, so the final mixing of h1 and h2 in the algorithm cannot take place, and the result diverges.
After: the result of `murmur_hash3_64_v2` is identical to the API call.