Skip to content

Improve performance during rehashing#3073

Merged
zuiderkwast merged 3 commits into
valkey-io:unstablefrom
chzhoo:opt_rehash
Jan 23, 2026
Merged

Improve performance during rehashing#3073
zuiderkwast merged 3 commits into
valkey-io:unstablefrom
chzhoo:opt_rehash

Conversation

@chzhoo

@chzhoo chzhoo commented Jan 18, 2026

Copy link
Copy Markdown
Contributor

Description

As the hashtable grows, rehash is triggered. Rehash consumes a significant amount of CPU time, resulting in noticeable performance degradation. Much of this CPU time is spent on random memory access (when accessing hashtable entries). This CPU usage can be optimized through concurrent memory access.

//  The original rehash process:
For each bucket and its child buckets, all entries are processed as follows:
    1) Access the entry, which involves random memory access.
    2) Compute the hash value of the entry.
    3) Based on the hash value, insert the entries into the new hashtable.

//  The optimized rehash process:
The original steps are transformed into batch processing and phased execution, leveraging
the CPU's multi-issue capability to achieve concurrent memory access. The steps are as follows:
    1) Access multiple entries serially within a for loop. Since each entry access is independent,
       modern CPU architectures can perform concurrent memory access.
    2) Compute the hash values for all entries.
    3) Based on the hash values, insert the entries into the new hashtable.

Benchmark

Benchmark step

Write a batch of data into the server and record the QPS with the following commands.

// Write command
valkey-benchmark  -n 75000000 -r 75000000 --sequential --threads 2 set key:__rand_int__ __data__

// Record QPS
for((x=0;x<100000;x+=1));do valkey-cli info|grep ops;sleep 1;done

Benchmark result

The benchmark results are shown below, with the Y-axis representing the recorded QPS and the X-axis representing time (in seconds). The green line indicates the optimized performance, while the blue line represents the original performance.

7bf815feb63c014872a63636112a79c1

Benchmark env

CPU: Intel(R) Xeon(R) 6983P-C * 8
OS: Ubuntu Server 24.04 LTS 64bit
Memory: 32GB
VM: Tencent Cloud | S9e.2XLARGE32
Server start command: valkey-server --save '' --appendonly no

Signed-off-by: chzhoo <czawyx@163.com>
@codecov

codecov Bot commented Jan 18, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.36066% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 74.35%. Comparing base (4524c23) to head (87717aa).
⚠️ Report is 19 commits behind head on unstable.

Files with missing lines Patch % Lines
src/hashtable.c 98.36% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3073      +/-   ##
============================================
+ Coverage     74.31%   74.35%   +0.03%     
============================================
  Files           129      129              
  Lines         71010    71037      +27     
============================================
+ Hits          52773    52821      +48     
+ Misses        18237    18216      -21     
Files with missing lines Coverage Δ
src/hashtable.c 89.40% <98.36%> (+0.30%) ⬆️

... and 24 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread src/hashtable.c Outdated
Signed-off-by: chzhoo <czawyx@163.com>
@chzhoo chzhoo changed the title Optimize the performance degradation during the rehash Mitigate performance degradation during rehash Jan 19, 2026

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very clever!

With this approach, the CPU (or the compiler?) can be smart enough to do concurrent memory access and we don't need to add any explicit __builtin_prefetch() (valkey_prefetch) at all?

Comment thread src/hashtable.c Outdated
Comment thread src/hashtable.c Outdated
Signed-off-by: chzhoo <czawyx@163.com>
@chzhoo

chzhoo commented Jan 19, 2026

Copy link
Copy Markdown
Contributor Author

With this approach, the CPU (or the compiler?) can be smart enough to do concurrent memory access and we don't need to add any explicit __builtin_prefetch() (valkey_prefetch) at all?

Yes, I have benchmarked the performance of manually calling __builtin_prefetch() against this approach, and they deliver identical performance improvement.

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thank you!

I will merge it but first I will wait some days to let others have a chance to review too.

@zuiderkwast zuiderkwast added release-notes This issue should get a line item in the release notes to-be-merged Almost ready to merge run-benchmark labels Jan 19, 2026
@zuiderkwast zuiderkwast moved this to In Progress in Valkey 9.1 Jan 19, 2026
@github-actions

Copy link
Copy Markdown

Benchmark ran on this commit: 87717aa

Benchmark Comparison: unstable vs 8a8a090 (averaged) - rps metrics

Run Summary:

  • unstable: 80 total runs, 16 configurations (avg 5.00 runs per config)
  • 8a8a090: 80 total runs, 16 configurations (avg 5.00 runs per config)

Statistical Notes:

  • CI99%: 99% Confidence Interval - range where the true population mean is likely to fall
  • PI99%: 99% Prediction Interval - range where a single future observation is likely to fall
  • CV: Coefficient of Variation - relative variability (σ/μ × 100%)

Note: Values with (n=X, σ=Y, CV=Z%, CI99%=±W%, PI99%=±V%) indicate averages from X runs with standard deviation Y, coefficient of variation Z%, 99% confidence interval margin of error ±W% of the mean, and 99% prediction interval margin of error ±V% of the mean. CI bounds [A, B] and PI bounds [C, D] show the actual interval ranges.

Configuration:

  • architecture: aarch64
  • benchmark_mode: duration
  • clients: 1600
  • cluster_mode: False
  • data_size: 16
  • duration: 180
  • tls: False
  • valkey_benchmark_threads: 90
  • warmup: 30
Command Metric Pipeline io_threads unstable 8a8a090 Diff % Change
GET rps 1 1 227331.550 (n=5, σ=1679.731, CV=0.74%, CI99%=±1.521%, PI99%=±3.727%, CI[223872.961, 230790.139], PI[218859.771, 235803.329]) 225675.060 (n=5, σ=2445.363, CV=1.08%, CI99%=±2.231%, PI99%=±5.465%, CI[220640.025, 230710.095], PI[213341.793, 238008.327]) -1656.490 -0.729%
GET rps 1 9 1537483.500 (n=5, σ=11843.678, CV=0.77%, CI99%=±1.586%, PI99%=±3.885%, CI[1513097.203, 1561869.797], PI[1477749.516, 1597217.484]) 1548560.374 (n=5, σ=13544.967, CV=0.87%, CI99%=±1.801%, PI99%=±4.411%, CI[1520671.100, 1576449.648], PI[1480245.884, 1616874.864]) 11076.874 +0.720%
GET rps 10 1 1188247.100 (n=5, σ=2880.378, CV=0.24%, CI99%=±0.499%, PI99%=±1.223%, CI[1182316.361, 1194177.839], PI[1173719.816, 1202774.384]) 1192075.802 (n=5, σ=2149.847, CV=0.18%, CI99%=±0.371%, PI99%=±0.910%, CI[1187649.237, 1196502.367], PI[1181232.977, 1202918.627]) 3828.702 +0.322%
GET rps 10 9 2484702.500 (n=5, σ=19454.777, CV=0.78%, CI99%=±1.612%, PI99%=±3.949%, CI[2444644.844, 2524760.156], PI[2386581.684, 2582823.316]) 2489419.900 (n=5, σ=21994.250, CV=0.88%, CI99%=±1.819%, PI99%=±4.456%, CI[2444133.435, 2534706.365], PI[2378491.168, 2600348.632]) 4717.400 +0.190%
SET rps 1 1 219381.952 (n=5, σ=2182.163, CV=0.99%, CI99%=±2.048%, PI99%=±5.017%, CI[214888.848, 223875.056], PI[208376.139, 230387.765]) 219427.970 (n=5, σ=2600.889, CV=1.19%, CI99%=±2.441%, PI99%=±5.978%, CI[214072.703, 224783.237], PI[206310.298, 232545.642]) 46.018 +0.021%
SET rps 1 9 1458590.524 (n=5, σ=11579.609, CV=0.79%, CI99%=±1.635%, PI99%=±4.004%, CI[1434747.949, 1482433.099], PI[1400188.382, 1516992.666]) 1476453.150 (n=5, σ=15442.674, CV=1.05%, CI99%=±2.154%, PI99%=±5.275%, CI[1444656.472, 1508249.828], PI[1398567.513, 1554338.787]) 17862.626 +1.225%
SET rps 10 1 1015219.774 (n=5, σ=6897.148, CV=0.68%, CI99%=±1.399%, PI99%=±3.426%, CI[1001018.450, 1029421.098], PI[980433.777, 1050005.771]) 1011002.426 (n=5, σ=6741.589, CV=0.67%, CI99%=±1.373%, PI99%=±3.363%, CI[997121.401, 1024883.451], PI[977000.998, 1045003.854]) -4217.348 -0.415%
SET rps 10 9 1863229.624 (n=5, σ=15821.845, CV=0.85%, CI99%=±1.748%, PI99%=±4.283%, CI[1830652.226, 1895807.022], PI[1783431.622, 1943027.626]) 1861866.750 (n=5, σ=10942.478, CV=0.59%, CI99%=±1.210%, PI99%=±2.964%, CI[1839336.036, 1884397.464], PI[1806677.997, 1917055.503]) -1362.874 -0.073%

Configuration:

  • architecture: aarch64
  • benchmark_mode: duration
  • clients: 1600
  • cluster_mode: False
  • data_size: 96
  • duration: 180
  • tls: False
  • valkey_benchmark_threads: 90
  • warmup: 30
Command Metric Pipeline io_threads unstable 8a8a090 Diff % Change
GET rps 1 1 219633.368 (n=5, σ=558.156, CV=0.25%, CI99%=±0.523%, PI99%=±1.282%, CI[218484.117, 220782.619], PI[216818.288, 222448.448]) 218949.982 (n=5, σ=1720.146, CV=0.79%, CI99%=±1.618%, PI99%=±3.962%, CI[215408.177, 222491.787], PI[210274.366, 227625.598]) -683.386 -0.311%
GET rps 1 9 1475438.548 (n=5, σ=9322.927, CV=0.63%, CI99%=±1.301%, PI99%=±3.187%, CI[1456242.512, 1494634.584], PI[1428418.055, 1522459.041]) 1474213.050 (n=5, σ=5102.297, CV=0.35%, CI99%=±0.713%, PI99%=±1.746%, CI[1463707.349, 1484718.751], PI[1448479.444, 1499946.656]) -1225.498 -0.083%
GET rps 10 1 1125427.102 (n=5, σ=6092.618, CV=0.54%, CI99%=±1.115%, PI99%=±2.730%, CI[1112882.317, 1137971.887], PI[1094698.779, 1156155.425]) 1117606.922 (n=5, σ=5580.038, CV=0.50%, CI99%=±1.028%, PI99%=±2.518%, CI[1106117.546, 1129096.298], PI[1089463.814, 1145750.030]) -7820.180 -0.695%
GET rps 10 9 1991558.974 (n=5, σ=24397.528, CV=1.23%, CI99%=±2.522%, PI99%=±6.179%, CI[1941324.126, 2041793.822], PI[1868509.229, 2114608.719]) 1990530.000 (n=5, σ=15990.396, CV=0.80%, CI99%=±1.654%, PI99%=±4.052%, CI[1957605.553, 2023454.447], PI[1909881.905, 2071178.095]) -1028.974 -0.052%
SET rps 1 1 211375.438 (n=5, σ=1606.674, CV=0.76%, CI99%=±1.565%, PI99%=±3.834%, CI[208067.273, 214683.603], PI[203272.122, 219478.754]) 209347.666 (n=5, σ=369.357, CV=0.18%, CI99%=±0.363%, PI99%=±0.890%, CI[208587.155, 210108.177], PI[207484.802, 211210.530]) -2027.772 -0.959%
SET rps 1 9 1476752.000 (n=5, σ=8144.125, CV=0.55%, CI99%=±1.136%, PI99%=±2.781%, CI[1459983.134, 1493520.866], PI[1435676.835, 1517827.165]) 1478347.126 (n=5, σ=11548.269, CV=0.78%, CI99%=±1.608%, PI99%=±3.940%, CI[1454569.080, 1502125.172], PI[1420103.046, 1536591.206]) 1595.126 +0.108%
SET rps 10 1 997713.824 (n=5, σ=5102.691, CV=0.51%, CI99%=±1.053%, PI99%=±2.579%, CI[987207.312, 1008220.336], PI[971978.231, 1023449.417]) 993768.276 (n=5, σ=11079.534, CV=1.11%, CI99%=±2.296%, PI99%=±5.623%, CI[970955.362, 1016581.190], PI[937888.277, 1049648.275]) -3945.548 -0.395%
SET rps 10 9 1783128.126 (n=5, σ=24470.219, CV=1.37%, CI99%=±2.826%, PI99%=±6.921%, CI[1732743.607, 1833512.645], PI[1659711.764, 1906544.488]) 1746892.800 (n=5, σ=25555.703, CV=1.46%, CI99%=±3.012%, PI99%=±7.378%, CI[1694273.255, 1799512.345], PI[1618001.765, 1875783.835]) -36235.326 -2.032%

@dvkashapov

Copy link
Copy Markdown
Member

I don't think this benchmark is representative as rehashing would mainly occur in warmup duration of write load and would go unnoticed in result.

@roshkhatri

Copy link
Copy Markdown
Member

I don't think this benchmark is representative as rehashing would mainly occur in warmup duration of write load and would go unnoticed in result.

Do we want a label for benchmarks without warmups?

@zuiderkwast zuiderkwast changed the title Mitigate performance degradation during rehash Improve performance during rehashing Jan 20, 2026
@zuiderkwast

Copy link
Copy Markdown
Contributor

Do we want a label for benchmarks without warmups?

@roshkhatri It would help for this PR... but I feel that every optimization has some special scenario that we want to target so every optimization requires some specific benchmark to visualize the improvement. I guess whatever we add to the automated benchmarking, it might not be enough for the next optimization PR.

If we could chat with the github actions bot and give it some very specific instructions, like give it a specific valkey-benchmark command line to run, then it might covering more special scenarios. Maybe. I think it's fine to run special benchmarks manually though.

@chzhoo When you run the benchmark, do you see a difference in the latencies in the various percentiles? I would guess we have fewer requests with a high latency.

@chzhoo

chzhoo commented Jan 21, 2026

Copy link
Copy Markdown
Contributor Author

@chzhoo When you run the benchmark, do you see a difference in the latencies in the various percentiles? I would guess we have fewer requests with a high latency.

I re-ran the benchmarks. The new version shows latency improvements at the p95 and p99 percentiles, while p50 latency remains unchanged. The commands and detailed results are below.

// Server start command
valkey-server --save '' --appendonly no

// Benchmark command
valkey-benchmark  -n 75000000 -r 75000000 --sequential --threads 2 set key:__rand_int__ __data__

// Before optimization
====== set key:__rand_int__ __data__ ======
  75000000 requests completed in 401.04 seconds
  50 parallel clients
  45 bytes payload
  keep alive: 1
  host configuration "save":
  host configuration "appendonly": no
  multi-thread: yes
  threads: 2

Latency by percentile distribution:
0.000% <= 0.055 milliseconds (cumulative count 71)
50.000% <= 0.215 milliseconds (cumulative count 40922633)
75.000% <= 0.263 milliseconds (cumulative count 56555392)
87.500% <= 0.319 milliseconds (cumulative count 65903563)
93.750% <= 0.399 milliseconds (cumulative count 70397312)
96.875% <= 0.479 milliseconds (cumulative count 72881842)
98.438% <= 0.519 milliseconds (cumulative count 73863507)
99.219% <= 0.567 milliseconds (cumulative count 74452328)
99.609% <= 0.623 milliseconds (cumulative count 74732718)
99.805% <= 0.679 milliseconds (cumulative count 74860330)
99.902% <= 0.743 milliseconds (cumulative count 74927923)
99.951% <= 1.463 milliseconds (cumulative count 74963383)
99.976% <= 1.679 milliseconds (cumulative count 74982290)
99.988% <= 1.767 milliseconds (cumulative count 74990863)
99.994% <= 1.855 milliseconds (cumulative count 74995514)
99.997% <= 2.095 milliseconds (cumulative count 74997713)
99.998% <= 3.207 milliseconds (cumulative count 74998870)
99.999% <= 4.007 milliseconds (cumulative count 74999445)
100.000% <= 4.167 milliseconds (cumulative count 74999720)
100.000% <= 6.231 milliseconds (cumulative count 74999861)
100.000% <= 12.399 milliseconds (cumulative count 74999929)
100.000% <= 23.487 milliseconds (cumulative count 74999965)
100.000% <= 23.583 milliseconds (cumulative count 74999985)
100.000% <= 23.615 milliseconds (cumulative count 74999992)
100.000% <= 23.647 milliseconds (cumulative count 74999998)
100.000% <= 23.663 milliseconds (cumulative count 74999999)
100.000% <= 23.727 milliseconds (cumulative count 75000000)
100.000% <= 23.727 milliseconds (cumulative count 75000000)

Cumulative distribution of latencies:
1.302% <= 0.103 milliseconds (cumulative count 976737)
47.353% <= 0.207 milliseconds (cumulative count 35515010)
86.106% <= 0.303 milliseconds (cumulative count 64579615)
94.164% <= 0.407 milliseconds (cumulative count 70623085)
98.033% <= 0.503 milliseconds (cumulative count 73524966)
99.566% <= 0.607 milliseconds (cumulative count 74674580)
99.858% <= 0.703 milliseconds (cumulative count 74893337)
99.933% <= 0.807 milliseconds (cumulative count 74949875)
99.940% <= 0.903 milliseconds (cumulative count 74954650)
99.942% <= 1.007 milliseconds (cumulative count 74956149)
99.945% <= 1.103 milliseconds (cumulative count 74959100)
99.948% <= 1.207 milliseconds (cumulative count 74961313)
99.949% <= 1.303 milliseconds (cumulative count 74961666)
99.949% <= 1.407 milliseconds (cumulative count 74961944)
99.954% <= 1.503 milliseconds (cumulative count 74965572)
99.965% <= 1.607 milliseconds (cumulative count 74974115)
99.980% <= 1.703 milliseconds (cumulative count 74984978)
99.991% <= 1.807 milliseconds (cumulative count 74993477)
99.995% <= 1.903 milliseconds (cumulative count 74996586)
99.997% <= 2.007 milliseconds (cumulative count 74997458)
99.997% <= 2.103 milliseconds (cumulative count 74997746)
99.998% <= 3.103 milliseconds (cumulative count 74998598)
100.000% <= 4.103 milliseconds (cumulative count 74999649)
100.000% <= 5.103 milliseconds (cumulative count 74999850)
100.000% <= 7.103 milliseconds (cumulative count 74999900)
100.000% <= 13.103 milliseconds (cumulative count 74999950)
100.000% <= 24.111 milliseconds (cumulative count 75000000)

Summary:
  throughput summary: 187012.36 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.238     0.048     0.215     0.431     0.551    23.727

// After optimization
====== set key:__rand_int__ __data__ ======
  75000000 requests completed in 380.29 seconds
  50 parallel clients
  45 bytes payload
  keep alive: 1
  host configuration "save":
  host configuration "appendonly": no
  multi-thread: yes
  threads: 2

Latency by percentile distribution:
0.000% <= 0.055 milliseconds (cumulative count 187)
50.000% <= 0.215 milliseconds (cumulative count 40193388)
75.000% <= 0.263 milliseconds (cumulative count 58981653)
87.500% <= 0.287 milliseconds (cumulative count 66221892)
93.750% <= 0.351 milliseconds (cumulative count 70899319)
96.875% <= 0.375 milliseconds (cumulative count 72936000)
98.438% <= 0.399 milliseconds (cumulative count 73850462)
99.219% <= 0.447 milliseconds (cumulative count 74449557)
99.609% <= 0.487 milliseconds (cumulative count 74711787)
99.805% <= 0.535 milliseconds (cumulative count 74864717)
99.902% <= 0.583 milliseconds (cumulative count 74928538)
99.951% <= 0.695 milliseconds (cumulative count 74963970)
99.976% <= 1.479 milliseconds (cumulative count 74981951)
99.988% <= 1.583 milliseconds (cumulative count 74991117)
99.994% <= 1.871 milliseconds (cumulative count 74995436)
99.997% <= 2.303 milliseconds (cumulative count 74997743)
99.998% <= 2.663 milliseconds (cumulative count 74998865)
99.999% <= 3.007 milliseconds (cumulative count 74999428)
100.000% <= 5.239 milliseconds (cumulative count 74999715)
100.000% <= 8.255 milliseconds (cumulative count 74999861)
100.000% <= 11.615 milliseconds (cumulative count 74999930)
100.000% <= 22.591 milliseconds (cumulative count 74999965)
100.000% <= 22.671 milliseconds (cumulative count 74999984)
100.000% <= 22.703 milliseconds (cumulative count 74999995)
100.000% <= 22.719 milliseconds (cumulative count 74999996)
100.000% <= 22.735 milliseconds (cumulative count 74999998)
100.000% <= 22.751 milliseconds (cumulative count 74999999)
100.000% <= 22.799 milliseconds (cumulative count 75000000)
100.000% <= 22.799 milliseconds (cumulative count 75000000)

Cumulative distribution of latencies:
0.756% <= 0.103 milliseconds (cumulative count 567372)
45.669% <= 0.207 milliseconds (cumulative count 34251414)
90.677% <= 0.303 milliseconds (cumulative count 68007486)
98.712% <= 0.407 milliseconds (cumulative count 74033626)
99.713% <= 0.503 milliseconds (cumulative count 74784748)
99.926% <= 0.607 milliseconds (cumulative count 74944588)
99.953% <= 0.703 milliseconds (cumulative count 74964512)
99.955% <= 0.807 milliseconds (cumulative count 74966227)
99.955% <= 0.903 milliseconds (cumulative count 74966420)
99.955% <= 1.007 milliseconds (cumulative count 74966549)
99.956% <= 1.103 milliseconds (cumulative count 74966676)
99.956% <= 1.207 milliseconds (cumulative count 74966834)
99.956% <= 1.303 milliseconds (cumulative count 74967179)
99.968% <= 1.407 milliseconds (cumulative count 74975656)
99.980% <= 1.503 milliseconds (cumulative count 74984656)
99.989% <= 1.607 milliseconds (cumulative count 74992018)
99.992% <= 1.703 milliseconds (cumulative count 74994079)
99.993% <= 1.807 milliseconds (cumulative count 74995066)
99.994% <= 1.903 milliseconds (cumulative count 74995589)
99.995% <= 2.007 milliseconds (cumulative count 74996212)
99.996% <= 2.103 milliseconds (cumulative count 74996775)
99.999% <= 3.103 milliseconds (cumulative count 74999477)
100.000% <= 4.103 milliseconds (cumulative count 74999650)
100.000% <= 5.103 milliseconds (cumulative count 74999705)
100.000% <= 6.103 milliseconds (cumulative count 74999775)
100.000% <= 7.103 milliseconds (cumulative count 74999801)
100.000% <= 8.103 milliseconds (cumulative count 74999826)
100.000% <= 9.103 milliseconds (cumulative count 74999900)
100.000% <= 12.103 milliseconds (cumulative count 74999950)
100.000% <= 23.103 milliseconds (cumulative count 75000000)

Summary:
  throughput summary: 197220.50 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.231     0.048     0.215     0.359     0.423    22.799

@zuiderkwast zuiderkwast merged commit d8315cf into valkey-io:unstable Jan 23, 2026
121 of 123 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Valkey 9.1 Jan 23, 2026
@zuiderkwast zuiderkwast removed the to-be-merged Almost ready to merge label Jan 23, 2026
@roshkhatri

Copy link
Copy Markdown
Member

If we could chat with the github actions bot and give it some very specific instructions, like give it a specific valkey-benchmark command line to run, then it might covering more special scenarios. Maybe. I think it's fine to run special benchmarks manually though.

Yes I think I can create wf that you can interact with, mention the PR and give the configs that we want

@zuiderkwast

Copy link
Copy Markdown
Contributor

For some bots, you can post a github comment with some commands it will act on. For example dependabot accepts commands in this way: https://docs.github.com/en/code-security/reference/supply-chain-security/dependabot-pull-request-comment-commands

@chzhoo chzhoo deleted the opt_rehash branch January 24, 2026 04:32
harrylin98 pushed a commit to harrylin98/valkey_forked that referenced this pull request Feb 19, 2026
As the hashtable grows, rehash is triggered. Rehash consumes a
significant amount of CPU time, resulting in noticeable performance
degradation. Much of this CPU time is spent on random memory access
(when accessing hashtable entries). This CPU usage can be optimized
through concurrent memory access.

```
//  The original rehash process:
For each bucket and its child buckets, all entries are processed as follows:
    1) Access the entry, which involves random memory access.
    2) Compute the hash value of the entry.
    3) Based on the hash value, insert the entries into the new hashtable.

//  The optimized rehash process:
The original steps are transformed into batch processing and phased execution, leveraging
the CPU's multi-issue capability to achieve concurrent memory access. The steps are as follows:
    1) Access multiple entries serially within a for loop. Since each entry access is independent,
       modern CPU architectures can perform concurrent memory access.
    2) Compute the hash values for all entries.
    3) Based on the hash values, insert the entries into the new hashtable.
```

---------

Signed-off-by: chzhoo <czawyx@163.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
As the hashtable grows, rehash is triggered. Rehash consumes a
significant amount of CPU time, resulting in noticeable performance
degradation. Much of this CPU time is spent on random memory access
(when accessing hashtable entries). This CPU usage can be optimized
through concurrent memory access.

```
//  The original rehash process:
For each bucket and its child buckets, all entries are processed as follows:
    1) Access the entry, which involves random memory access.
    2) Compute the hash value of the entry.
    3) Based on the hash value, insert the entries into the new hashtable.

//  The optimized rehash process:
The original steps are transformed into batch processing and phased execution, leveraging
the CPU's multi-issue capability to achieve concurrent memory access. The steps are as follows:
    1) Access multiple entries serially within a for loop. Since each entry access is independent,
       modern CPU architectures can perform concurrent memory access.
    2) Compute the hash values for all entries.
    3) Based on the hash values, insert the entries into the new hashtable.
```

---------

Signed-off-by: chzhoo <czawyx@163.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes This issue should get a line item in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants