@linrrzqqq
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

  1. Add `xxhash3_64` as an alias for `xxhash_64`.

  2. Add the function `murmur_hash3_64_v2`.
    Before:

mysql> SELECT MURMUR_HASH3_64('1000209601_1756808272');
+------------------------------------------+
| MURMUR_HASH3_64('1000209601_1756808272') |
+------------------------------------------+
|                      1680830166030441144 |
+------------------------------------------+

This result differs from the value returned by calling `mmh3.hash64` directly in other languages. Those external APIs all compute MurmurHash3 with the 128-bit algorithm (mmh3_128) and then take the first 64 bits of the result. Unlike the 128-bit version, the 64-bit version inside Doris keeps only one state register (h1) and has no h2. Because the algorithm's final step mixes h1 and h2 together, the missing h2 changes the final result.

After this change:

mysql> SELECT MURMUR_HASH3_64_V2('1000209601_1756808272');
+---------------------------------------------+
| MURMUR_HASH3_64_V2('1000209601_1756808272') |
+---------------------------------------------+
|                         4038800892574899471 |
+---------------------------------------------+

The result now exactly matches the value returned by the external API call.
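The behavior described above can be illustrated with a minimal pure-Python sketch. It follows the public MurmurHash3_x64_128 reference algorithm and returns the first 64-bit word (h1) as a signed BIGINT. It assumes that the external APIs hash the UTF-8 bytes of the string with seed 0. This is not the Doris implementation, and the helper names (`murmur3_x64_128`, `murmur3_64_v2_sketch`, `to_signed64`) are hypothetical. The finalization step shows where h1 and h2 are cross-mixed, which is the part a single-register variant cannot reproduce.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def fmix64(k: int) -> int:
    """MurmurHash3 64-bit finalization mix."""
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def murmur3_x64_128(data: bytes, seed: int = 0) -> Tuple[int, int]:
    """MurmurHash3_x64_128; returns (h1, h2) as unsigned 64-bit ints."""
    length = len(data)
    nblocks = length // 16
    h1 = h2 = seed & MASK64

    # Body: each 16-byte block is two little-endian 64-bit words (k1, k2).
    for i in range(nblocks):
        block = data[i * 16:(i + 1) * 16]
        k1 = int.from_bytes(block[:8], "little")
        k2 = int.from_bytes(block[8:], "little")

        k1 = rotl64((k1 * C1) & MASK64, 31)
        h1 ^= (k1 * C2) & MASK64
        h1 = (rotl64(h1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64

        k2 = rotl64((k2 * C2) & MASK64, 33)
        h2 ^= (k2 * C1) & MASK64
        h2 = (rotl64(h2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64

    # Tail: up to 15 leftover bytes; bytes 8..14 feed k2, bytes 0..7 feed k1.
    tail = data[nblocks * 16:]
    if len(tail) > 8:
        k2 = rotl64((int.from_bytes(tail[8:], "little") * C2) & MASK64, 33)
        h2 ^= (k2 * C1) & MASK64
    if len(tail) > 0:
        k1 = rotl64((int.from_bytes(tail[:8], "little") * C1) & MASK64, 31)
        h1 ^= (k1 * C2) & MASK64

    # Finalization: h1 and h2 are mixed into each other. Per the PR text,
    # the old 64-bit variant keeps only h1, so it has no h2 to mix here.
    h1 ^= length
    h2 ^= length
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = fmix64(h1)
    h2 = fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return h1, h2


def to_signed64(x: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed BIGINT."""
    return x - (1 << 64) if x >= (1 << 63) else x


def murmur3_64_v2_sketch(s: str, seed: int = 0) -> int:
    """Hypothetical helper: first 64 bits (h1) of the 128-bit hash, signed."""
    h1, _h2 = murmur3_x64_128(s.encode("utf-8"), seed)
    return to_signed64(h1)


if __name__ == "__main__":
    print(murmur3_64_v2_sketch("1000209601_1756808272"))
```

Comparing the printed value with the `MURMUR_HASH3_64_V2` result above is a quick way to check the stated assumptions (seed 0, UTF-8 input, signed h1).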

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Oct 21, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

HappenLee
HappenLee previously approved these changes Oct 21, 2025
Contributor

@HappenLee HappenLee left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 21, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@linrrzqqq
Contributor Author

run buildall

zclllyybb
zclllyybb previously approved these changes Oct 21, 2025
Contributor

@zclllyybb zclllyybb left a comment

LGTM

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 14.29% (2/14) 🎉
Increment coverage report
Complete coverage report

@doris-robot

ClickBench: Total hot run time: 28.47 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 0fbf525a9bab117457ed7b3fe6c7edeade6c1499, data reload: false

query1	0.06	0.05	0.05
query2	0.09	0.05	0.05
query3	0.27	0.09	0.08
query4	1.61	0.12	0.12
query5	0.28	0.27	0.26
query6	1.16	0.65	0.64
query7	0.04	0.03	0.03
query8	0.05	0.04	0.05
query9	0.64	0.52	0.52
query10	0.59	0.58	0.59
query11	0.16	0.11	0.11
query12	0.16	0.12	0.12
query13	0.65	0.61	0.61
query14	1.04	1.04	1.02
query15	0.87	0.86	0.87
query16	0.39	0.40	0.38
query17	1.05	1.04	1.03
query18	0.22	0.20	0.20
query19	1.91	1.84	1.80
query20	0.02	0.02	0.02
query21	15.45	0.19	0.13
query22	5.13	0.06	0.04
query23	15.75	0.26	0.10
query24	3.01	0.46	0.58
query25	0.07	0.06	0.05
query26	0.15	0.14	0.13
query27	0.06	0.06	0.05
query28	4.52	1.14	0.93
query29	12.58	3.88	3.23
query30	0.27	0.13	0.14
query31	2.83	0.59	0.39
query32	3.26	0.57	0.48
query33	3.04	3.11	3.10
query34	16.08	5.46	4.87
query35	4.90	4.97	5.01
query36	0.70	0.52	0.53
query37	0.10	0.08	0.07
query38	0.07	0.04	0.04
query39	0.03	0.03	0.03
query40	0.18	0.16	0.15
query41	0.09	0.03	0.03
query42	0.04	0.02	0.03
query43	0.05	0.04	0.03
Total cold run time: 99.62 s
Total hot run time: 28.47 s

@doris-robot

BE UT Coverage Report

Increment line coverage 35.14% (13/37) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.60% (17899/34030)
Line Coverage 37.84% (162373/429048)
Region Coverage 32.25% (123756/383769)
Branch Coverage 33.66% (54238/161125)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 59.46% (22/37) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.35% (24138/33363)
Line Coverage 59.14% (253593/428800)
Region Coverage 54.84% (213189/388727)
Branch Coverage 56.33% (91259/162020)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 92.86% (13/14) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 59.46% (22/37) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.43% (23832/33363)
Line Coverage 57.82% (247939/428800)
Region Coverage 52.81% (205272/388727)
Branch Coverage 54.64% (88525/162020)

@linrrzqqq linrrzqqq dismissed stale reviews from zclllyybb and HappenLee via a192e83 October 21, 2025 15:56
@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 92.86% (13/14) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Oct 21, 2025
@linrrzqqq
Contributor Author

run buildall

@doris-robot

ClickBench: Total hot run time: 30.05 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c694a5469c2b67bbff35c3e7c30151a557eefe0a, data reload: false

query1	0.06	0.06	0.05
query2	0.13	0.05	0.06
query3	0.26	0.10	0.10
query4	1.62	0.12	0.13
query5	0.29	0.28	0.28
query6	1.21	0.69	0.67
query7	0.04	0.03	0.03
query8	0.07	0.05	0.05
query9	0.66	0.57	0.58
query10	0.63	0.62	0.64
query11	0.19	0.13	0.13
query12	0.18	0.14	0.14
query13	0.66	0.65	0.64
query14	1.10	1.07	1.06
query15	0.96	0.95	0.94
query16	0.43	0.46	0.45
query17	1.14	1.25	1.07
query18	0.24	0.22	0.23
query19	2.10	1.99	1.94
query20	0.02	0.02	0.01
query21	15.36	0.23	0.15
query22	4.94	0.08	0.06
query23	15.62	0.31	0.12
query24	2.50	0.72	0.33
query25	0.07	0.06	0.07
query26	0.17	0.15	0.16
query27	0.08	0.06	0.06
query28	3.94	1.22	0.97
query29	12.61	4.65	3.82
query30	0.30	0.15	0.13
query31	2.85	0.67	0.42
query32	3.25	0.59	0.50
query33	3.24	3.29	3.21
query34	16.12	5.61	4.97
query35	4.99	5.01	5.03
query36	0.74	0.54	0.54
query37	0.11	0.08	0.08
query38	0.08	0.05	0.04
query39	0.04	0.03	0.04
query40	0.19	0.15	0.16
query41	0.10	0.04	0.04
query42	0.05	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 99.39 s
Total hot run time: 30.05 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 95.65% (44/46) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.61% (17906/34033)
Line Coverage 37.87% (162476/429053)
Region Coverage 32.27% (123840/383774)
Branch Coverage 33.68% (54260/161125)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 92.86% (26/28) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.53% (23865/33362)
Line Coverage 57.92% (248357/428791)
Region Coverage 52.90% (205636/388726)
Branch Coverage 54.71% (88645/162020)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 92.86% (13/14) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 22, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

Contributor

@HappenLee HappenLee left a comment

LGTM

@zclllyybb zclllyybb merged commit 226576b into apache:master Oct 27, 2025
26 of 28 checks passed
github-actions bot pushed a commit that referenced this pull request Oct 27, 2025
yiguolei pushed a commit that referenced this pull request Oct 27, 2025
Cherry-picked from #57180

Co-authored-by: linrrarity <linzhenqi@selectdb.com>
dwdwqfwe pushed a commit to dwdwqfwe/doris that referenced this pull request Oct 31, 2025
@yiguolei yiguolei mentioned this pull request Nov 5, 2025
linrrzqqq added a commit to linrrzqqq/doris that referenced this pull request Nov 10, 2025
linrrzqqq added a commit to linrrzqqq/doris that referenced this pull request Nov 10, 2025
@linrrzqqq linrrzqqq deleted the mmh64-fix branch November 10, 2025 18:39
linrrzqqq added a commit to linrrzqqq/doris that referenced this pull request Nov 11, 2025
linrrzqqq added a commit to linrrzqqq/doris that referenced this pull request Nov 11, 2025
linrrzqqq added a commit to linrrzqqq/doris that referenced this pull request Nov 11, 2025
linrrzqqq added a commit to linrrzqqq/doris that referenced this pull request Nov 11, 2025
linrrzqqq added a commit to linrrzqqq/doris that referenced this pull request Nov 11, 2025
w41ter pushed a commit to w41ter/incubator-doris that referenced this pull request Dec 26, 2025
yiguolei pushed a commit to yiguolei/incubator-doris that referenced this pull request Dec 30, 2025
Labels

approved Indicates a PR has been approved by one committer. dev/4.0.1-merged reviewed

8 participants