Improve multithreaded performance with memory prefetching #14017

Merged: ShooterIT merged 16 commits into redis:unstable from ShooterIT:memory-prefetch on Jun 5, 2025

ShooterIT (Member) commented May 7, 2025

This PR is based on: valkey-io/valkey#861

Memory Access Amortization

(Designed and implemented by Dan Touitou)

Memory Access Amortization (MAA) is a technique designed to optimize the performance of dynamic data structures by reducing the impact of memory access latency. It is applicable when multiple operations need to be executed concurrently. The principle behind it is that for certain dynamic data structures, executing operations in a batch is more efficient than executing each one separately.

Rather than executing operations sequentially, this approach interleaves the execution of all operations. This is done in such a way that whenever a memory access is required during an operation, the program prefetches the necessary memory and transitions to another operation. This ensures that when one operation is blocked awaiting memory access, other memory accesses are executed in parallel, thereby reducing the average access latency.

We applied this method in the development of dictPrefetch, which takes as parameters a vector of keys and dictionaries. It ensures that all memory addresses required to execute dictionary operations for these keys are loaded into the L1-L3 caches when executing commands. Essentially, dictPrefetch is an interleaved execution of dictFind for all the keys.
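The idea of running several lookups in lockstep can be sketched on a toy chained hash table. This is only an illustration under stated assumptions: `toy_dict`, `toy_entry`, `toy_batch_find`, and `TOY_PREFETCH` are hypothetical names, the table is not the Redis dict, and `__builtin_prefetch` is the GCC/Clang builtin (a no-op fallback is used elsewhere). Each round advances every unfinished lookup by one pointer hop and issues a prefetch for the next hop, so the misses of different keys can overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy chained hash table, NOT the Redis dict. All names are hypothetical. */
typedef struct toy_entry {
    const char *key;
    int value;
    struct toy_entry *next;
} toy_entry;

typedef struct {
    toy_entry **buckets;
    size_t mask; /* bucket count - 1; bucket count is a power of two */
} toy_dict;

#if defined(__GNUC__) || defined(__clang__)
#define TOY_PREFETCH(p) __builtin_prefetch((const void *)(p))
#else
#define TOY_PREFETCH(p) ((void)(p))
#endif

#define TOY_BATCH_MAX 16

static uint64_t toy_hash(const char *s) {
    uint64_t h = 14695981039346656037ULL; /* FNV-1a offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL; /* FNV-1a prime */
    }
    return h;
}

/* Look up n keys (n <= TOY_BATCH_MAX) in an interleaved fashion.
 * out[i] receives the matching entry, or NULL when the key is absent. */
static void toy_batch_find(const toy_dict *d, const char **keys, size_t n,
                           toy_entry **out) {
    toy_entry **slot[TOY_BATCH_MAX];
    toy_entry *cur[TOY_BATCH_MAX];
    int done[TOY_BATCH_MAX];
    size_t i, remaining = n;

    assert(n <= TOY_BATCH_MAX);

    /* Round 1: compute every bucket slot address and prefetch it. */
    for (i = 0; i < n; i++) {
        slot[i] = &d->buckets[toy_hash(keys[i]) & d->mask];
        TOY_PREFETCH(slot[i]);
        done[i] = 0;
        out[i] = NULL;
    }
    /* Round 2: read each bucket head and prefetch the first entry. */
    for (i = 0; i < n; i++) {
        cur[i] = *slot[i];
        if (cur[i]) TOY_PREFETCH(cur[i]);
    }
    /* Later rounds: advance every unfinished lookup by one chain entry. */
    while (remaining > 0) {
        for (i = 0; i < n; i++) {
            if (done[i]) continue;
            if (cur[i] == NULL) {                      /* end of chain: miss */
                done[i] = 1;
                remaining--;
            } else if (strcmp(cur[i]->key, keys[i]) == 0) {
                out[i] = cur[i];                       /* hit */
                done[i] = 1;
                remaining--;
            } else {
                cur[i] = cur[i]->next;                 /* next hop */
                if (cur[i]) TOY_PREFETCH(cur[i]);
            }
        }
    }
}
```

A caller would fill a `keys` array with up to 16 key strings and pass an `out` array of the same length; the sketch only compares keys and does not prefetch the key strings themselves.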

Implementation in Redis

When the main thread processes clients with ready-to-execute commands (i.e., clients for which the IO thread has parsed the commands), a batch of up to 16 commands is created. First, each command's argv, which was allocated by the IO thread, is prefetched into the main thread's L1 cache. Then all the dict entries and values required by the commands are prefetched from the dictionary before the commands are executed.

Memory prefetching for main hash table

As shown in the picture, after #13806 the key and value are unified into a single kv object and the dict uses the no_value optimization, so memory prefetching has 4 steps:

  1. prefetch the bucket of the hash table
  2. prefetch the entry associated with the given key's hash
  3. prefetch the kv object of the entry
  4. prefetch the value data of the kv object

We also need to handle the case where the dict entry is itself the pointer to the kv object; in that case, step 3 is skipped.
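The four steps, including the shortcut when the bucket slot points straight at the kv object, can be sketched as a small per-key state machine. This is a simplified illustration, not the code from this PR: `toy_kv`, `toy_entry`, `pf_key`, `pf_step`, and `pf_run_batch` are hypothetical names, the low pointer bit used to mark "slot holds a kv object directly" is an assumed encoding, and the sketch assumes the first entry in the bucket is the wanted one (no key comparison or chain walking).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define TOY_PREFETCH(p) __builtin_prefetch((const void *)(p))
#else
#define TOY_PREFETCH(p) ((void)(p))
#endif

/* Hypothetical layouts: a kv object pointing at its value data, and a
 * separate dict entry that points at a kv object. */
typedef struct { const char *key; const char *val; } toy_kv;
typedef struct toy_entry { toy_kv *kv; struct toy_entry *next; } toy_entry;

/* Assumed encoding: low bit set means the slot holds a toy_kv pointer. */
#define SLOT_IS_KV(s) (((s) & 1u) != 0)

typedef enum { ST_BUCKET, ST_ENTRY, ST_KVOBJ, ST_VALDATA, ST_DONE } pf_state;

typedef struct {
    pf_state state;
    const uintptr_t *slot; /* address of the bucket slot for this key */
    toy_entry *entry;
    toy_kv *kv;
    int steps;             /* number of steps executed, for illustration */
} pf_key;

static void pf_init(pf_key *k, const uintptr_t *slot) {
    k->state = ST_BUCKET;
    k->slot = slot;
    k->entry = NULL;
    k->kv = NULL;
    k->steps = 0;
}

/* Run exactly one step for one key, then return so another key can run. */
static void pf_step(pf_key *k) {
    if (k->state == ST_DONE) return;
    k->steps++;
    switch (k->state) {
    case ST_BUCKET:                       /* step 1: prefetch the bucket */
        TOY_PREFETCH(k->slot);
        k->state = ST_ENTRY;
        break;
    case ST_ENTRY: {                      /* step 2: prefetch the entry */
        uintptr_t s = *k->slot;
        if (s == 0) {
            k->state = ST_DONE;           /* empty bucket: nothing to fetch */
        } else if (SLOT_IS_KV(s)) {
            k->kv = (toy_kv *)(s & ~(uintptr_t)1);
            TOY_PREFETCH(k->kv);
            k->state = ST_VALDATA;        /* entry is the kv object: skip step 3 */
        } else {
            k->entry = (toy_entry *)s;
            TOY_PREFETCH(k->entry);
            k->state = ST_KVOBJ;
        }
        break;
    }
    case ST_KVOBJ:                        /* step 3: prefetch the kv object */
        k->kv = k->entry->kv;
        TOY_PREFETCH(k->kv);
        k->state = ST_VALDATA;
        break;
    case ST_VALDATA:                      /* step 4: prefetch the value data */
        TOY_PREFETCH(k->kv->val);
        k->state = ST_DONE;
        break;
    case ST_DONE:
        break;
    }
}

/* Round-robin over the batch until every key has reached ST_DONE. */
static void pf_run_batch(pf_key *keys, size_t n) {
    size_t i, active;
    do {
        active = 0;
        for (i = 0; i < n; i++) {
            if (keys[i].state == ST_DONE) continue;
            pf_step(&keys[i]);
            if (keys[i].state != ST_DONE) active++;
        }
    } while (active > 0);
}
```

In this sketch a key that goes through a separate entry takes four steps, while a key whose slot holds the kv object directly takes three.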

(Figure: memory prefetching)

MAA can improve single-threaded memory access efficiency by interleaving the execution of multiple independent operations, which enables memory-level parallelism and better CPU utilization. Its key point is batch-wise interleaved execution: split a batch of independent operations (such as multiple key lookups) into multiple state machines, and interleave their progress within a single thread to hide the memory access latency of individual requests.

The difference between serial execution and interleaved execution:

Naive serial execution:

key1: step1 → wait → step2 → wait → done
key2: step1 → wait → step2 → wait → done

Interleaved execution:

key1: step1   → step2   → done
key2:   step1 → step2   → done
key3:     step1 → step2 → done
         ↑ While waiting for key1’s memory, progress key2/key3
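The two timelines above can be compared with a back-of-the-envelope cost model. This is an idealized sketch, not a measurement: it assumes every step misses the cache, that outstanding prefetches overlap without limit, and that prefetches are issued for free. The function names and the parameters `compute_ns` and `miss_ns` are hypothetical.

```c
#include <assert.h>

/* Serial: every step of every key pays its compute time plus a full miss. */
static long serial_ns(long n_keys, long steps, long compute_ns, long miss_ns) {
    assert(n_keys >= 1 && steps >= 0);
    return n_keys * steps * (compute_ns + miss_ns);
}

/* Interleaved: one round touches every key once. The next round can start
 * once all keys were touched AND the first key's data has arrived. */
static long interleaved_ns(long n_keys, long steps, long compute_ns,
                           long miss_ns) {
    long round_issue, round_wait, round;
    assert(n_keys >= 1 && steps >= 0);
    round_issue = n_keys * compute_ns;   /* time to touch every key once */
    round_wait = compute_ns + miss_ns;   /* time until key 1's data is ready */
    round = round_issue > round_wait ? round_issue : round_wait;
    return steps * round;
}
```

Under this model, a batch of one key costs the same either way, and the benefit grows with the batch until the per-round compute time covers the miss latency.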

New configuration

This PR introduces a new configuration, prefetch-batch-max-size. Since we consider it a low-level optimization, the config is hidden. Its description:
When multiple commands are parsed by the I/O threads and ready for execution, we take advantage of knowing the next set of commands and prefetch their required dictionary entries in a batch. This reduces memory access costs. The optimal batch size depends on the specific workflow of the user. The default batch size is 16, which can be modified using the 'prefetch-batch-max-size' config.
When the config is set to 0, prefetching is disabled.
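How such a batch-size setting might gate the batching logic can be sketched as follows. This is a minimal illustration under assumptions, not the code in this PR: `toy_batch`, `toy_batch_init`, `toy_batch_add`, and `toy_batch_flush` are hypothetical names, the hard cap `TOY_BATCH_CAP` is an assumed bound, and commands are represented by plain integers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BATCH_CAP 128 /* hypothetical hard upper bound on the batch size */

typedef struct {
    size_t max_size;         /* plays the role of prefetch-batch-max-size */
    size_t count;            /* commands currently queued */
    int cmds[TOY_BATCH_CAP]; /* stand-in for pending commands */
    size_t flushes;          /* how many batches were processed */
    size_t executed;         /* how many commands were processed in total */
} toy_batch;

static void toy_batch_init(toy_batch *b, size_t max_size) {
    b->max_size = max_size > TOY_BATCH_CAP ? TOY_BATCH_CAP : max_size;
    b->count = 0;
    b->flushes = 0;
    b->executed = 0;
}

/* Process the queued commands. Real code would prefetch argv and the dict
 * entries for the whole batch here, then execute the commands. */
static void toy_batch_flush(toy_batch *b) {
    if (b->count == 0) return;
    b->executed += b->count;
    b->count = 0;
    b->flushes++;
}

/* Returns 1 if the command was queued, 0 if batching is disabled and the
 * caller should execute the command immediately without prefetching. */
static int toy_batch_add(toy_batch *b, int cmd) {
    if (b->max_size == 0) return 0;      /* a size of 0 disables prefetching */
    b->cmds[b->count++] = cmd;
    if (b->count == b->max_size) toy_batch_flush(b); /* batch is full */
    return 1;
}
```

With a maximum of 16, queuing 40 commands would process two full batches and leave 8 commands for a final explicit flush.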


Co-authored-by: Uri Yagelnik <uriy@amazon.com>

snyk-io Bot commented May 7, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

license/snyk check is complete. No issues have been found. (View Details)

@fcostaoliveira fcostaoliveira added the action:run-benchmark Triggers the benchmark suite for this Pull Request label May 7, 2025
fcostaoliveira (Collaborator) commented May 7, 2025

CE Performance Automation : step 1 of 2 (build) DONE.

This comment was automatically generated given a benchmark was triggered.
Started building at 2025-05-29 06:56:48.008438 and took 65 seconds.
You can check each build/benchmark progress in grafana:

  • git hash: 581ade8
  • git branch: ShooterIT:memory-prefetch
  • commit date and time: n/a
  • commit summary: n/a
  • test filters:
    • command priority lower limit: 0
    • command priority upper limit: 10000
    • test name regex: .*
    • command group regex: .*

You can check a comparison in detail via the grafana link

fcostaoliveira (Collaborator) commented May 7, 2025

CE Performance Automation : step 2 of 2 (benchmark) FINISHED.

This comment was automatically generated given a benchmark was triggered.

Started benchmark suite at 2025-06-20 22:24:03.546479 and took 11299.83093 seconds to finish.
Status: [################################################################################] 100.0% completed.

In total will run 176 benchmarks.
- 0 pending.
- 176 completed:
- 0 successful.
- 176 failed.
You can check the status in detail via the grafana link

fcostaoliveira (Collaborator) commented May 7, 2025

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

Using platform named: intel64-ubuntu22.04-redis-icx1 to do the comparison.

In summary:

  • Detected a total of 181 stable tests between versions.
  • Detected a total of 1 highly unstable benchmarks.
  • Detected a total of 1 improvements above the improvement water line.

You can check a comparison in detail via the grafana link

Comparison between unstable and ShooterIT:memory-prefetch.

Time Period from 5 months ago. (environment used: oss-standalone)

By GROUP change csv:

command_group,min_change,max_change

By COMMAND change csv:

command,min_change,max_change

Unstable Table

| Test Case | Baseline redis/redis unstable (median obs. +- std.dev) | Comparison redis/redis ShooterIT:memory-prefetch (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| memtier_benchmark-1Mkeys-load-zset-listpack-with-100-elements-double-score | 2930 | 2942 +- 21.6% UNSTABLE (7 datapoints) | 0.4% | UNSTABLE (very high variance) No Change |

Unstable test regexp names: memtier_benchmark-1Mkeys-load-zset-listpack-with-100-elements-double-score

Improvements Table

| Test Case | Baseline redis/redis unstable (median obs. +- std.dev) | Comparison redis/redis ShooterIT:memory-prefetch (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| memtier_benchmark-1key-zset-1M-elements-zremrangebyscore-pipeline-10 | 318449 | 356557 +- 3.3% (9 datapoints) | 12.0% | IMPROVEMENT |

Improvements test regexp names: memtier_benchmark-1key-zset-1M-elements-zremrangebyscore-pipeline-10

Full Results table:

| Test Case | Baseline redis/redis unstable (median obs. +- std.dev) | Comparison redis/redis ShooterIT:memory-prefetch (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| memtier_benchmark-100Kkeys-hash-hgetall-50-fields-100B-values | 169227.0 | 169243 +- 0.6% (10 datapoints) | 0.0% | No Change |
| memtier_benchmark-100Kkeys-load-hash-20-fields-with-1B-values-pipeline-30 | 35875.0 | 36758 +- 0.6% (10 datapoints) | 2.5% | No Change |
| memtier_benchmark-100Kkeys-load-hash-50-fields-with-1000B-values | 18997.0 | 19008 +- 0.7% (10 datapoints) | 0.1% | No Change |
| memtier_benchmark-100Kkeys-load-hash-50-fields-with-100B-values | 47014.0 | 46988 +- 0.7% (10 datapoints) | -0.1% | No Change |
| memtier_benchmark-100Kkeys-load-hash-50-fields-with-10B-values | 35972.0 | 36748 +- 0.5% (10 datapoints) | 2.2% | No Change |
| memtier_benchmark-10Kkeys-load-hash-50-fields-with-10000B-values | 2927.0 | 2895 +- 0.5% (10 datapoints) | -1.1% | No Change |
| memtier_benchmark-10Kkeys-load-list-with-10B-values-pipeline-50 | 953330.0 | 946436 +- 2.7% (9 datapoints) | -0.7% | No Change |
| memtier_benchmark-10Mkeys-load-hash-5-fields-with-100B-values | 106248.0 | 106408 +- 0.5% (10 datapoints) | 0.2% | No Change |
| memtier_benchmark-10Mkeys-load-hash-5-fields-with-100B-values-pipeline-10 | 314861.0 | 327669 +- 1.4% (10 datapoints) | 4.1% | potential IMPROVEMENT |
| memtier_benchmark-10Mkeys-load-hash-5-fields-with-10B-values | 121627.0 | 120906 +- 0.8% (10 datapoints) | -0.6% | No Change |
| memtier_benchmark-10Mkeys-load-hash-5-fields-with-10B-values-pipeline-10 | 404527.0 | 396269 +- 0.7% (10 datapoints) | -2.0% | No Change |
| memtier_benchmark-10Mkeys-string-get-10B-pipeline-100-nokeyprefix | 2542026.0 | 2540140 +- 1.9% (9 datapoints) | -0.1% | No Change |
| memtier_benchmark-1Mkeys-100B-expire-use-case | 176106.0 | 175375 +- 0.5% (9 datapoints) | -0.4% | No Change |
| memtier_benchmark-1Mkeys-10B-expire-use-case | 175672.0 | 174861 +- 0.4% (9 datapoints) | -0.5% | No Change |
| memtier_benchmark-1Mkeys-10B-psetex-expire-use-case | 165302.0 | 164897 +- 0.6% (9 datapoints) | -0.2% | No Change |
| memtier_benchmark-1Mkeys-10B-setex-expire-use-case | 164247.0 | 166140 +- 0.8% (9 datapoints) | 1.2% | No Change |
| memtier_benchmark-1Mkeys-1KiB-expire-use-case | 171398.0 | 171728 +- 0.4% (9 datapoints) | 0.2% | No Change |
| memtier_benchmark-1Mkeys-4KiB-expire-use-case | 165178.0 | 164889 +- 0.5% (9 datapoints) | -0.2% | No Change |
| memtier_benchmark-1Mkeys-bitmap-getbit-pipeline-10 | 974114.0 | 977557 +- 0.7% (7 datapoints) | 0.4% | No Change |
| memtier_benchmark-1Mkeys-generic-exists-pipeline-10 | 1035609.0 | 1027292 +- 0.4% (9 datapoints) | -0.8% | No Change |
| memtier_benchmark-1Mkeys-generic-expire-pipeline-10 | 948075.0 | 943936 +- 0.5% (7 datapoints) | -0.4% | No Change |
| memtier_benchmark-1Mkeys-generic-expireat-pipeline-10 | 934023.0 | 926950 +- 0.7% (9 datapoints) | -0.8% | No Change |
| memtier_benchmark-1Mkeys-generic-pexpire-pipeline-10 | 942750.0 | 941672 +- 0.6% (10 datapoints) | -0.1% | No Change |
| memtier_benchmark-1Mkeys-generic-scan-pipeline-10 | 512366.0 | 474736 +- 3.9% (10 datapoints) | -7.3% | potential REGRESSION |
| memtier_benchmark-1Mkeys-generic-touch-pipeline-10 | 1040573.0 | 1039359 +- 0.5% (10 datapoints) | -0.1% | No Change |
| memtier_benchmark-1Mkeys-generic-ttl-pipeline-10 | 1024572.0 | 1015699 +- 0.5% (9 datapoints) | -0.9% | No Change |
| memtier_benchmark-1Mkeys-hash-hexists | 163692.0 | 162983 +- 0.8% (10 datapoints) | -0.4% | No Change |
| memtier_benchmark-1Mkeys-hash-hget-hgetall-hkeys-hvals-with-100B-values | 183275.0 | 182639 +- 1.2% (10 datapoints) | -0.3% | No Change |
| memtier_benchmark-1Mkeys-hash-hgetall-50-fields-10B-values | 177009.0 | 176397 +- 0.6% (10 datapoints) | -0.3% | No Change |
| memtier_benchmark-1Mkeys-hash-hincrby | 179066.0 | 178969 +- 0.4% (9 datapoints) | -0.1% | No Change |
| memtier_benchmark-1Mkeys-hash-hincrbyfloat | 161351.0 | 159261 +- 0.7% (9 datapoints) | -1.3% | No Change |
| memtier_benchmark-1Mkeys-hash-hmget-5-fields-with-100B-values-pipeline-10 | 701814.0 | 699677 +- 0.7% (10 datapoints) | -0.3% | No Change |
| memtier_benchmark-1Mkeys-hash-transactions-multi-exec-pipeline-20 | 1046498.0 | 1037448 +- 0.4% (10 datapoints) | -0.9% | No Change |
| memtier_benchmark-1Mkeys-list-lpop-rpop-with-100B-values | 180689.0 | 183812 +- 1.2% (10 datapoints) | 1.7% | No Change |
| memtier_benchmark-1Mkeys-list-lpop-rpop-with-10B-values | 181716.0 | 184996 +- 1.4% (10 datapoints) | 1.8% | No Change |
| memtier_benchmark-1Mkeys-list-lpop-rpop-with-1KiB-values | 181169.0 | 182867 +- 1.3% (10 datapoints) | 0.9% | No Change |
| memtier_benchmark-1Mkeys-list-rpoplpush-with-10B-values | 180765.0 | 182532 +- 0.6% (10 datapoints) | 1.0% | No Change |
| memtier_benchmark-1Mkeys-load-hash-5-fields-with-1000B-values | 98183.0 | 97606 +- 0.6% (10 datapoints) | -0.6% | No Change |
| memtier_benchmark-1Mkeys-load-hash-5-fields-with-1000B-values-pipeline-10 | 161575.0 | 161493 +- 1.3% (10 datapoints) | -0.1% | No Change |
| memtier_benchmark-1Mkeys-load-hash-hmset-5-fields-with-1000B-values | 109493.0 | 109756 +- 0.7% (7 datapoints) | 0.2% | No Change |
| memtier_benchmark-1Mkeys-load-list-rpush-with-10B-values | 166656.0 | 166781 +- 0.9% (9 datapoints) | 0.1% | No Change |
| memtier_benchmark-1Mkeys-load-list-with-100B-values | 151844.0 | 150939 +- 1.2% (9 datapoints) | -0.6% | No Change |
| memtier_benchmark-1Mkeys-load-list-with-10B-values | 166790.0 | 166201 +- 1.2% (9 datapoints) | -0.4% | No Change |
| memtier_benchmark-1Mkeys-load-list-with-10B-values-pipeline-10 | 714645.0 | 707643 +- 0.9% (9 datapoints) | -1.0% | No Change |
| memtier_benchmark-1Mkeys-load-list-with-1KiB-values | 114313.0 | 113496 +- 0.6% (9 datapoints) | -0.7% | No Change |
| memtier_benchmark-1Mkeys-load-set-intset-with-100-elements | 67960.0 | 68151 +- 0.9% (7 datapoints) | 0.3% | No Change |
| memtier_benchmark-1Mkeys-load-set-intset-with-100-elements-pipeline-10 | 103573.0 | 105042 +- 1.1% (7 datapoints) | 1.4% | No Change |
| memtier_benchmark-1Mkeys-load-stream-1-fields-with-100B-values | 132199.0 | 131775 +- 1.0% (10 datapoints) | -0.3% | No Change |
| memtier_benchmark-1Mkeys-load-stream-1-fields-with-100B-values-pipeline-10 | 347245.0 | 365341 +- 0.6% (10 datapoints) | 5.2% | potential IMPROVEMENT |
| memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values | 105324.0 | 106063 +- 0.8% (10 datapoints) | 0.7% | No Change |
| memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10 | 214391.0 | 220826 +- 0.5% (10 datapoints) | 3.0% | potential IMPROVEMENT |
| memtier_benchmark-1Mkeys-load-string-with-100B-values | 172635.0 | 173136 +- 0.7% (7 datapoints) | 0.3% | No Change |
| memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10 | 691385.0 | 725529 +- 0.9% (7 datapoints) | 4.9% | potential IMPROVEMENT |
| memtier_benchmark-1Mkeys-load-string-with-10B-values | 176235.0 | 176178 +- 0.5% (7 datapoints) | -0.0% | No Change |
| memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-10 | 834009.0 | 830145 +- 0.8% (7 datapoints) | -0.5% | No Change |
| memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-100 | 1275326.0 | 1273153 +- 1.7% (7 datapoints) | -0.2% | No Change |
| memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-100-nokeyprefix | 1438854.0 | 1427496 +- 2.0% (9 datapoints) | -0.8% | No Change |
| memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-50 | 1170432.0 | 1161303 +- 1.7% (7 datapoints) | -0.8% | No Change |
| memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-500 | 1299665.0 | 1296950 +- 2.0% (7 datapoints) | -0.2% | No Change |
| memtier_benchmark-1Mkeys-load-string-with-1KiB-values | 163601.0 | 162431 +- 0.4% (7 datapoints) | -0.7% | No Change |
| memtier_benchmark-1Mkeys-load-string-with-20KiB-values | 65346.0 | 65384 +- 0.3% (7 datapoints) | 0.1% | No Change |
| memtier_benchmark-1Mkeys-load-zset-listpack-with-100-elements-double-score | 2930.0 | 2942 +- 21.6% UNSTABLE (7 datapoints) | 0.4% | UNSTABLE (very high variance) No Change |
| memtier_benchmark-1Mkeys-load-zset-with-10-elements-double-score | 107825.0 | 107253 +- 7.7% (7 datapoints) | -0.5% | No Change |
| memtier_benchmark-1Mkeys-load-zset-with-10-elements-int-score | 120620.0 | 120015 +- 2.5% (7 datapoints) | -0.5% | No Change |
| memtier_benchmark-1Mkeys-string-append-1-100B | 152110.0 | 153458 +- 0.4% (10 datapoints) | 0.9% | No Change |
| memtier_benchmark-1Mkeys-string-append-1-100B-pipeline-10 | 748854.0 | 745048 +- 0.5% (10 datapoints) | -0.5% | No Change |
| memtier_benchmark-1Mkeys-string-decr | 161051.0 | 163143 +- 0.5% (10 datapoints) | 1.3% | No Change |
| memtier_benchmark-1Mkeys-string-get-100B | 164172.0 | 164367 +- 0.7% (10 datapoints) | 0.1% | No Change |
| memtier_benchmark-1Mkeys-string-get-100B-pipeline-10 | 946317.0 | 945671 +- 0.4% (10 datapoints) | -0.1% | No Change |
| memtier_benchmark-1Mkeys-string-get-10B | 165695.0 | 165982 +- 0.9% (9 datapoints) | 0.2% | No Change |
| memtier_benchmark-1Mkeys-string-get-10B-pipeline-10 | 945310.0 | 932953 +- 0.7% (9 datapoints) | -1.3% | No Change |
| memtier_benchmark-1Mkeys-string-get-10B-pipeline-100 | 1822607.0 | 1800563 +- 0.9% (9 datapoints) | -1.2% | No Change |
| memtier_benchmark-1Mkeys-string-get-10B-pipeline-100-nokeyprefix | 2542980.0 | 2540149 +- 0.1% (9 datapoints) | -0.1% | No Change |
| memtier_benchmark-1Mkeys-string-get-10B-pipeline-50 | 1642165.0 | 1615832 +- 0.9% (9 datapoints) | -1.6% | No Change |
| memtier_benchmark-1Mkeys-string-get-10B-pipeline-500 | 1968828.0 | 1929243 +- 1.4% (9 datapoints) | -2.0% | No Change |
| memtier_benchmark-1Mkeys-string-get-1KiB | 163456.0 | 163049 +- 0.5% (9 datapoints) | -0.2% | No Change |
| memtier_benchmark-1Mkeys-string-get-1KiB-pipeline-10 | 920000.0 | 918139 +- 0.4% (9 datapoints) | -0.2% | No Change |
| memtier_benchmark-1Mkeys-string-incr-pipeline-10 | 992821.0 | 989345 +- 1.1% (9 datapoints) | -0.4% | No Change |
| memtier_benchmark-1Mkeys-string-incrby | 179208.0 | 179434 +- 0.9% (9 datapoints) | 0.1% | No Change |
| memtier_benchmark-1Mkeys-string-incrby-pipeline-10 | 940222.0 | 934314 +- 0.5% (9 datapoints) | -0.6% | No Change |
| memtier_benchmark-1Mkeys-string-incrbyfloat | 154151.0 | 152873 +- 0.8% (9 datapoints) | -0.8% | No Change |
| memtier_benchmark-1Mkeys-string-incrbyfloat-pipeline-10 | 469376.0 | 461742 +- 2.3% (9 datapoints) | -1.6% | No Change |
| memtier_benchmark-1Mkeys-string-int-encoding-strlen-pipeline-10 | 1037654.0 | 1037887 +- 0.8% (10 datapoints) | 0.0% | No Change |
| memtier_benchmark-1Mkeys-string-mget-1KiB | 121996.0 | 121529 +- 1.1% (7 datapoints) | -0.4% | No Change |
| memtier_benchmark-1Mkeys-string-set-with-ex-100B-pipeline-10 | 577464.0 | 562287 +- 2.6% (9 datapoints) | -2.6% | No Change |
| memtier_benchmark-1Mkeys-string-setex-100B-pipeline-10 | 604640.0 | 598850 +- 1.4% (9 datapoints) | -1.0% | No Change |
| memtier_benchmark-1Mkeys-string-setrange-100B | 153864.0 | 154800 +- 0.5% (9 datapoints) | 0.6% | No Change |
| memtier_benchmark-1Mkeys-string-setrange-100B-pipeline-10 | 791265.0 | 782218 +- 1.2% (9 datapoints) | -1.1% | No Change |
| memtier_benchmark-1key-100M-bits-bitmap-bitcount | 20900.0 | 20889 +- 0.1% (7 datapoints) | -0.1% | No Change |
| memtier_benchmark-1key-1Billion-bits-bitmap-bitcount | 1451.0 | 1425 +- 0.9% (7 datapoints) | -1.8% | No Change |
| memtier_benchmark-1key-geo-2-elements-geopos | 159923.0 | 159708 +- 0.7% (10 datapoints) | -0.1% | No Change |
| memtier_benchmark-1key-geo-2-elements-geosearch-fromlonlat-withcoord | 97230.0 | 97048 +- 0.7% (10 datapoints) | -0.2% | No Change |
| memtier_benchmark-1key-geo-60M-elements-geodist | 181942.0 | 184102 +- 1.6% (10 datapoints) | 1.2% | No Change |
| memtier_benchmark-1key-geo-60M-elements-geodist-pipeline-10 | 1101431.0 | 1098160 +- 1.4% (10 datapoints) | -0.3% | No Change |
| memtier_benchmark-1key-geo-60M-elements-geohash | 185277.0 | 186237 +- 1.0% (10 datapoints) | 0.5% | No Change |
| memtier_benchmark-1key-geo-60M-elements-geohash-pipeline-10 | 1174321.0 | 1169698 +- 0.9% (10 datapoints) | -0.4% | No Change |
| memtier_benchmark-1key-geo-60M-elements-geopos | 183185.0 | 185475 +- 1.4% (10 datapoints) | 1.2% | No Change |
| memtier_benchmark-1key-geo-60M-elements-geopos-pipeline-10 | 1176318.0 | 1168696 +- 0.7% (10 datapoints) | -0.6% | No Change |
| memtier_benchmark-1key-geo-60M-elements-geosearch-fromlonlat | 142147.0 | 142500 +- 1.5% (10 datapoints) | 0.2% | No Change |
| memtier_benchmark-1key-geo-60M-elements-geosearch-fromlonlat-bybox | 142958.0 | 139977 +- 2.1% (10 datapoints) | -2.1% | No Change |
| memtier_benchmark-1key-geo-60M-elements-geosearch-fromlonlat-pipeline-10 | 598674.0 | 572806 +- 4.9% (10 datapoints) | -4.3% | potential REGRESSION |
| memtier_benchmark-1key-hash-1K-fields-hgetall | 8672.0 | 8688 +- 0.9% (9 datapoints) | 0.2% | No Change |
| memtier_benchmark-1key-hash-1K-fields-hgetall-pipeline-10 | 8401.0 | 8448 +- 2.4% (9 datapoints) | 0.6% | No Change |
| memtier_benchmark-1key-hash-hscan-50-fields-10B-values | 100605.0 | 99670 +- 0.3% (10 datapoints) | -0.9% | No Change |
| memtier_benchmark-1key-list-10-elements-lrange-all-elements | 170828.0 | 169519 +- 0.5% (9 datapoints) | -0.8% | No Change |
| memtier_benchmark-1key-list-10-elements-lrange-all-elements-pipeline-10 | 687248.0 | 677272 +- 0.5% (9 datapoints) | -1.5% | No Change |
| memtier_benchmark-1key-list-100-elements-int-7bit-uint-lrange-all-elements-pipeline-10 | 174299.0 | 170875 +- 1.2% (9 datapoints) | -2.0% | No Change |
| memtier_benchmark-1key-list-100-elements-int-lrange-all-elements-pipeline-10 | 132261.0 | 131444 +- 0.9% (9 datapoints) | -0.6% | No Change |
| memtier_benchmark-1key-list-100-elements-llen-pipeline-10 | 1132424.0 | 1135016 +- 0.7% (9 datapoints) | 0.2% | No Change |
| memtier_benchmark-1key-list-100-elements-lrange-all-elements | 105604.0 | 103884 +- 0.8% (9 datapoints) | -1.6% | No Change |
| memtier_benchmark-1key-list-100-elements-lrange-all-elements-pipeline-10 | 171477.0 | 167415 +- 1.1% (9 datapoints) | -2.4% | No Change |
| memtier_benchmark-1key-list-10K-elements-lindex-integer | 149803.0 | 149154 +- 0.5% (9 datapoints) | -0.4% | No Change |
| memtier_benchmark-1key-list-10K-elements-lindex-string | 127892.0 | 128713 +- 0.6% (9 datapoints) | 0.6% | No Change |
| memtier_benchmark-1key-list-10K-elements-lindex-string-pipeline-10 | 291428.0 | 291799 +- 0.6% (9 datapoints) | 0.1% | No Change |
| memtier_benchmark-1key-list-10K-elements-linsert-lrem-integer | 6550.0 | 6506 +- 0.4% (7 datapoints) | -0.7% | No Change |
| memtier_benchmark-1key-list-10K-elements-linsert-lrem-string | 8559.0 | 8538 +- 0.5% (7 datapoints) | -0.2% | No Change |
| memtier_benchmark-1key-list-10K-elements-lpos-integer | 6503.0 | 6478 +- 0.8% (7 datapoints) | -0.4% | No Change |
| memtier_benchmark-1key-list-10K-elements-lpos-string | 8145.0 | 8137 +- 0.6% (7 datapoints) | -0.1% | No Change |
| memtier_benchmark-1key-list-1K-elements-lrange-all-elements | 18039.0 | 17813 +- 0.4% (7 datapoints) | -1.3% | No Change |
| memtier_benchmark-1key-list-1K-elements-lrange-all-elements-pipeline-10 | 17498.0 | 17732 +- 1.9% (7 datapoints) | 1.3% | No Change |
| memtier_benchmark-1key-list-2K-elements-quicklist-lrange-all-elements-longs | 7060.0 | 6985 +- 0.8% (7 datapoints) | -1.1% | No Change |
| memtier_benchmark-1key-load-hash-1K-fields-with-5B-values | 4339.0 | 4349 +- 1.4% (9 datapoints) | 0.2% | No Change |
| memtier_benchmark-1key-load-zset-with-5-elements-parsing-float-score | 149016.0 | 150158 +- 4.9% (7 datapoints) | 0.8% | No Change |
| memtier_benchmark-1key-load-zset-with-5-elements-parsing-hexa-score | 127968.0 | 127314 +- 3.4% (7 datapoints) | -0.5% | No Change |
| memtier_benchmark-1key-pfadd-4KB-values-pipeline-10 | 266322.0 | 266786 +- 0.5% (10 datapoints) | 0.2% | No Change |
| memtier_benchmark-1key-set-10-elements-smembers | 171987.0 | 170719 +- 1.1% (7 datapoints) | -0.7% | No Change |
| memtier_benchmark-1key-set-10-elements-smembers-pipeline-10 | 701393.0 | 698772 +- 0.4% (7 datapoints) | -0.4% | No Change |
| memtier_benchmark-1key-set-10-elements-smismember | 180206.0 | 178550 +- 0.8% (10 datapoints) | -0.9% | No Change |
| memtier_benchmark-1key-set-100-elements-sismember-is-a-member | 155310.0 | 155720 +- 0.4% (9 datapoints) | 0.3% | No Change |
| memtier_benchmark-1key-set-100-elements-sismember-not-a-member | 150183.0 | 151483 +- 0.4% (9 datapoints) | 0.9% | No Change |
| memtier_benchmark-1key-set-100-elements-smembers | 99652.0 | 98742 +- 0.1% (7 datapoints) | -0.9% | No Change |
| memtier_benchmark-1key-set-100-elements-smismember | 162973.0 | 163319 +- 0.8% (10 datapoints) | 0.2% | No Change |
| memtier_benchmark-1key-set-100-elements-sscan | 97162.0 | 96544 +- 0.6% (7 datapoints) | -0.6% | No Change |
| memtier_benchmark-1key-set-10M-elements-sismember-50pct-chance | 157025.0 | 157421 +- 0.4% (9 datapoints) | 0.3% | No Change |
| memtier_benchmark-1key-set-10M-elements-srem-50pct-chance | 157744.0 | 157610 +- 0.3% (9 datapoints) | -0.1% | No Change |
| memtier_benchmark-1key-set-1K-elements-smembers | 16134.0 | 15536 +- 1.0% (7 datapoints) | -3.7% | potential REGRESSION |
| memtier_benchmark-1key-set-1M-elements-sismember-50pct-chance | 159187.0 | 158535 +- 0.5% (9 datapoints) | -0.4% | No Change |
| memtier_benchmark-1key-set-200K-elements-sadd-constant | 180989.0 | 179021 +- 0.6% (7 datapoints) | -1.1% | No Change |
| memtier_benchmark-1key-set-2M-elements-sadd-increasing | 163031.0 | 162085 +- 0.5% (7 datapoints) | -0.6% | No Change |
| memtier_benchmark-1key-zincrby-1M-elements-pipeline-1 | 42927.0 | 43280 +- 0.7% (10 datapoints) | 0.8% | No Change |
| memtier_benchmark-1key-zrank-100K-elements-pipeline-1 | 46771.0 | 46858 +- 0.5% (10 datapoints) | 0.2% | No Change |
| memtier_benchmark-1key-zrank-10M-elements-pipeline-1 | 44133.0 | 44035 +- 0.8% (10 datapoints) | -0.2% | No Change |
| memtier_benchmark-1key-zrank-1M-elements-pipeline-1 | 46553.0 | 46020 +- 0.5% (10 datapoints) | -1.1% | No Change |
| memtier_benchmark-1key-zrem-5M-elements-pipeline-1 | 45091.0 | 45651 +- 0.8% (7 datapoints) | 1.2% | No Change |
| memtier_benchmark-1key-zrevrangebyscore-256K-elements-pipeline-1 | 98881.0 | 98934 +- 0.7% (7 datapoints) | 0.1% | No Change |
| memtier_benchmark-1key-zrevrangebyscore-256K-elements-pipeline-10 | 151732.0 | 151100 +- 1.0% (7 datapoints) | -0.4% | No Change |
| memtier_benchmark-1key-zrevrank-1M-elements-pipeline-1 | 45864.0 | 46054 +- 0.7% (10 datapoints) | 0.4% | No Change |
| memtier_benchmark-1key-zset-10-elements-zrange-all-elements | 93904.0 | 90018 +- 4.9% (10 datapoints) | -4.1% | potential REGRESSION |
| memtier_benchmark-1key-zset-10-elements-zrange-all-elements-long-scores | 122666.0 | 122885 +- 0.7% (10 datapoints) | 0.2% | No Change |
| memtier_benchmark-1key-zset-100-elements-zrange-all-elements | 23637.0 | 22033 +- 9.3% (10 datapoints) | -6.8% | potential REGRESSION |
| memtier_benchmark-1key-zset-100-elements-zrangebyscore-all-elements | 23190.0 | 23385 +- 8.5% (9 datapoints) | 0.8% | No Change |
| memtier_benchmark-1key-zset-100-elements-zrangebyscore-all-elements-long-scores | 52539.0 | 52984 +- 1.0% (9 datapoints) | 0.8% | No Change |
| memtier_benchmark-1key-zset-100-elements-zscan | 68886.0 | 68145 +- 0.5% (10 datapoints) | -1.1% | No Change |
| memtier_benchmark-1key-zset-1K-elements-zrange-all-elements | 3727.0 | 3722 +- 0.6% (10 datapoints) | -0.1% | No Change |
| memtier_benchmark-1key-zset-1M-elements-zcard-pipeline-10 | 1036175.0 | 1028839 +- 0.4% (9 datapoints) | -0.7% | No Change |
| memtier_benchmark-1key-zset-1M-elements-zremrangebyscore-pipeline-10 | 318449.0 | 356557 +- 3.3% (9 datapoints) | 12.0% | IMPROVEMENT |
| memtier_benchmark-1key-zset-1M-elements-zrevrange-5-elements | 151234.0 | 151289 +- 1.2% (9 datapoints) | 0.0% | No Change |
| memtier_benchmark-1key-zset-1M-elements-zrevrange-withscores-5-elements-pipeline-10 | 552637.0 | 554746 +- 3.0% (9 datapoints) | 0.4% | No Change |
| memtier_benchmark-1key-zset-1M-elements-zscore-pipeline-10 | 957960.0 | 962362 +- 0.6% (7 datapoints) | 0.5% | No Change |
| memtier_benchmark-1key-zset-600K-elements-zrangestore-1K-elements | 2298.0 | 2279 +- 0.7% (7 datapoints) | -0.8% | No Change |
| memtier_benchmark-1key-zset-600K-elements-zrangestore-300K-elements | 6.5 | 6.5 +- 0.5% (7 datapoints) | -0.6% | No Change |
| memtier_benchmark-2keys-lua-eval-hset-expire | 87907.0 | 87694 +- 0.9% (7 datapoints) | -0.2% | No Change |
| memtier_benchmark-2keys-lua-evalsha-hset-expire | 104150.0 | 103217 +- 0.7% (7 datapoints) | -0.9% | No Change |
| memtier_benchmark-2keys-set-10-100-elements-sdiff | 27488.0 | 27499 +- 1.3% (10 datapoints) | 0.0% | No Change |
| memtier_benchmark-2keys-set-10-100-elements-sinter | 90412.0 | 90133 +- 1.0% (10 datapoints) | -0.3% | No Change |
| memtier_benchmark-2keys-set-10-100-elements-sunion | 36534.0 | 36447 +- 0.8% (10 datapoints) | -0.2% | No Change |
| memtier_benchmark-2keys-stream-5-entries-xread-all-entries | 78148.0 | 78504 +- 1.3% (10 datapoints) | 0.5% | No Change |
| memtier_benchmark-2keys-stream-5-entries-xread-all-entries-pipeline-10 | 125498.0 | 125633 +- 2.1% (10 datapoints) | 0.1% | No Change |
| memtier_benchmark-2keys-zset-300-elements-skiplist-encoded-zunion | 3564.0 | 3565 +- 1.2% (10 datapoints) | 0.0% | No Change |
| memtier_benchmark-2keys-zset-300-elements-skiplist-encoded-zunionstore | 4118.0 | 4164 +- 1.5% (10 datapoints) | 1.1% | No Change |
| memtier_benchmark-3Mkeys-load-string-with-512B-values | 173260.0 | 171297 +- 3.5% (18 datapoints) | -1.1% | No Change |
| memtier_benchmark-3Mkeys-load-string-with-512B-values-pipeline-10 | 677653.0 | 679318 +- 7.8% (18 datapoints) | 0.2% | No Change |
| memtier_benchmark-3Mkeys-string-get-with-1KiB-values-pipeline-10-2000_conns | 146872.0 | 147224 +- 0.9% (7 datapoints) | 0.2% | No Change |
| memtier_benchmark-3Mkeys-string-get-with-1KiB-values-pipeline-10-400_conns | 166356.0 | 165606 +- 2.2% (18 datapoints) | -0.5% | No Change |
| memtier_benchmark-3Mkeys-string-get-with-1KiB-values-pipeline-10-40_conns | 150089.0 | 151121 +- 0.2% (7 datapoints) | 0.7% | No Change |
| memtier_benchmark-3Mkeys-string-mixed-20-80-with-512B-values-pipeline-10-2000_conns | 150063.0 | 152242 +- 0.4% (7 datapoints) | 1.5% | No Change |
| memtier_benchmark-3Mkeys-string-mixed-20-80-with-512B-values-pipeline-10-400_conns | 170797.0 | 169299 +- 2.3% (18 datapoints) | -0.9% | No Change |
| memtier_benchmark-3Mkeys-string-mixed-20-80-with-512B-values-pipeline-10-5200_conns | 98306.0 | 98081 +- 0.4% (7 datapoints) | -0.2% | No Change |
| memtier_benchmark-connection-hello | 143345.0 | 142399 +- 0.5% (9 datapoints) | -0.7% | No Change |
| memtier_benchmark-connection-hello-pipeline-10 | 508935.0 | 503798 +- 0.5% (9 datapoints) | -1.0% | No Change |
| memtier_benchmark-nokeys-connection-ping-pipeline-10 | 1366624.0 | 1375880 +- 0.6% (7 datapoints) | 0.7% | No Change |
| memtier_benchmark-nokeys-pubsub-publish-1K-channels-10B-no-subscribers | 1020298.0 | 1016277 +- 0.4% (7 datapoints) | -0.4% | No Change |
| memtier_benchmark-nokeys-server-time-pipeline-10 | 1191590.0 | 1192801 +- 0.2% (7 datapoints) | 0.1% | No Change |

WARNING: There were 150 benchmarks with NO datapoints for both baseline and comparison.

NO datapoints for both baseline and comparison:

NO DATAPOINTS test regexp names: latency-rate-limited-10000_qps-memtier_benchmark-100Kkeys-hash-hgetall-50-fields-100B-values|latency-rate-limited-10000_qps-memtier_benchmark-100Kkeys-load-hash-50-fields-with-1000B-values|latency-rate-limited-10000_qps-memtier_benchmark-100Kkeys-load-hash-50-fields-with-100B-values|latency-rate-limited-10000_qps-memtier_benchmark-100Kkeys-load-hash-50-fields-with-10B-values|latency-rate-limited-10000_qps-memtier_benchmark-10Mkeys-load-hash-5-fields-with-100B-values|latency-rate-limited-10000_qps-memtier_benchmark-10Mkeys-load-hash-5-fields-with-100B-values-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-10Mkeys-load-hash-5-fields-with-10B-values|latency-rate-limited-10000_qps-memtier_benchmark-10Mkeys-load-hash-5-fields-with-10B-values-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-100B-expire-use-case|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-10B-expire-use-case|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-1KiB-expire-use-case|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-4KiB-expire-use-case|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-bitmap-getbit-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-generic-exists-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-generic-expire-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-generic-expireat-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-generic-pexpire-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-generic-scan-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-generic-touch-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-generic-ttl-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-hash-hexists|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-hash-hget-hgetall-hkeys-hvals-with-100B-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-
hash-hgetall-50-fields-10B-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-hash-hincrby|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-hash-hmget-5-fields-with-100B-values-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-hash-transactions-multi-exec-pipeline-20|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-list-lpop-rpop-with-100B-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-list-lpop-rpop-with-10B-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-list-lpop-rpop-with-1KiB-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-hash-5-fields-with-1000B-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-hash-5-fields-with-1000B-values-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-hash-hmset-5-fields-with-1000B-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-list-with-100B-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-list-with-10B-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-list-with-1KiB-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-set-intset-with-100-elements|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-set-intset-with-100-elements-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-stream-1-fields-with-100B-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-stream-1-fields-with-100B-values-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-string-with-100B-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-string-with-10B-values|latency-rate-limited-10000_qp
s-memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-string-with-1KiB-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-string-with-20KiB-values|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-zset-with-10-elements-double-score|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-load-zset-with-10-elements-int-score|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-append-1-100B|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-append-1-100B-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-decr|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-get-100B|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-get-100B-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-get-10B|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-get-10B-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-get-1KiB|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-get-1KiB-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-get-20KiB|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-incrby|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-incrby-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-incrbyfloat|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-incrbyfloat-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-mget-1KiB|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-setex-100B-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-setrange-100B|latency-rate-limited-10000_qps-memtier_benchmark-1Mkeys-string-setrange-100B-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1key-geo-2-elements-geopos|latency-rate-limited-10000_qps-memtier_benchmark-1key-geo-2-elements-geosearch-fromlo
nlat-withcoord|latency-rate-limited-10000_qps-memtier_benchmark-1key-geo-60M-elements-geodist|latency-rate-limited-10000_qps-memtier_benchmark-1key-geo-60M-elements-geodist-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1key-geo-60M-elements-geohash|latency-rate-limited-10000_qps-memtier_benchmark-1key-geo-60M-elements-geohash-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1key-geo-60M-elements-geopos|latency-rate-limited-10000_qps-memtier_benchmark-1key-geo-60M-elements-geopos-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1key-geo-60M-elements-geosearch-fromlonlat|latency-rate-limited-10000_qps-memtier_benchmark-1key-geo-60M-elements-geosearch-fromlonlat-bybox|latency-rate-limited-10000_qps-memtier_benchmark-1key-geo-60M-elements-geosearch-fromlonlat-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1key-hash-hscan-50-fields-10B-values|latency-rate-limited-10000_qps-memtier_benchmark-1key-list-10-elements-lrange-all-elements|latency-rate-limited-10000_qps-memtier_benchmark-1key-list-10-elements-lrange-all-elements-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1key-list-100-elements-lrange-all-elements|latency-rate-limited-10000_qps-memtier_benchmark-1key-list-100-elements-lrange-all-elements-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1key-list-10K-elements-lindex-integer|latency-rate-limited-10000_qps-memtier_benchmark-1key-list-10K-elements-lindex-string|latency-rate-limited-10000_qps-memtier_benchmark-1key-list-1K-elements-lrange-all-elements|latency-rate-limited-10000_qps-memtier_benchmark-1key-list-1K-elements-lrange-all-elements-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1key-pfadd-4KB-values-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1key-set-10-elements-smembers|latency-rate-limited-10000_qps-memtier_benchmark-1key-set-10-elements-smembers-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-1key-set-10-elements-smismember|latenc
y-rate-limited-10000_qps-memtier_benchmark-1key-set-100-elements-sismember-is-a-member|latency-rate-limited-10000_qps-memtier_benchmark-1key-set-100-elements-sismember-not-a-member|latency-rate-limited-10000_qps-memtier_benchmark-1key-set-100-elements-smembers|latency-rate-limited-10000_qps-memtier_benchmark-1key-set-100-elements-smismember|latency-rate-limited-10000_qps-memtier_benchmark-1key-set-100-elements-sscan|latency-rate-limited-10000_qps-memtier_benchmark-1key-set-10M-elements-sismember-50pct-chance|latency-rate-limited-10000_qps-memtier_benchmark-1key-set-1K-elements-smembers|latency-rate-limited-10000_qps-memtier_benchmark-1key-set-1M-elements-sismember-50pct-chance|latency-rate-limited-10000_qps-memtier_benchmark-1key-set-200K-elements-sadd-constant|latency-rate-limited-10000_qps-memtier_benchmark-1key-set-2M-elements-sadd-increasing|latency-rate-limited-10000_qps-memtier_benchmark-1key-zincrby-1M-elements-pipeline-1|latency-rate-limited-10000_qps-memtier_benchmark-1key-zrank-1M-elements-pipeline-1|latency-rate-limited-10000_qps-memtier_benchmark-1key-zrem-5M-elements-pipeline-1|latency-rate-limited-10000_qps-memtier_benchmark-1key-zrevrangebyscore-256K-elements-pipeline-1|latency-rate-limited-10000_qps-memtier_benchmark-1key-zrevrank-1M-elements-pipeline-1|latency-rate-limited-10000_qps-memtier_benchmark-1key-zset-10-elements-zrange-all-elements|latency-rate-limited-10000_qps-memtier_benchmark-1key-zset-10-elements-zrange-all-elements-long-scores|latency-rate-limited-10000_qps-memtier_benchmark-1key-zset-100-elements-zrange-all-elements|latency-rate-limited-10000_qps-memtier_benchmark-1key-zset-100-elements-zrangebyscore-all-elements|latency-rate-limited-10000_qps-memtier_benchmark-1key-zset-100-elements-zrangebyscore-all-elements-long-scores|latency-rate-limited-10000_qps-memtier_benchmark-1key-zset-100-elements-zscan|latency-rate-limited-10000_qps-memtier_benchmark-1key-zset-1M-elements-zcard-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmar
k-1key-zset-1M-elements-zrevrange-5-elements|latency-rate-limited-10000_qps-memtier_benchmark-1key-zset-1M-elements-zscore-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-2keys-lua-eval-hset-expire|latency-rate-limited-10000_qps-memtier_benchmark-2keys-lua-evalsha-hset-expire|latency-rate-limited-10000_qps-memtier_benchmark-2keys-set-10-100-elements-sdiff|latency-rate-limited-10000_qps-memtier_benchmark-2keys-set-10-100-elements-sinter|latency-rate-limited-10000_qps-memtier_benchmark-2keys-set-10-100-elements-sunion|latency-rate-limited-10000_qps-memtier_benchmark-2keys-stream-5-entries-xread-all-entries|latency-rate-limited-10000_qps-memtier_benchmark-2keys-stream-5-entries-xread-all-entries-pipeline-10|latency-rate-limited-10000_qps-memtier_benchmark-3Mkeys-load-string-with-512B-values|latency-rate-limited-10000_qps-memtier_benchmark-connection-hello|latency-rate-limited-1000_qps-memtier_benchmark-10Kkeys-load-hash-50-fields-with-10000B-values|latency-rate-limited-1000_qps-memtier_benchmark-1Mkeys-load-zset-listpack-with-100-elements-double-score|latency-rate-limited-1000_qps-memtier_benchmark-1key-100M-bits-bitmap-bitcount|latency-rate-limited-1000_qps-memtier_benchmark-1key-list-10K-elements-linsert-lrem-integer|latency-rate-limited-1000_qps-memtier_benchmark-1key-list-10K-elements-linsert-lrem-string|latency-rate-limited-1000_qps-memtier_benchmark-1key-list-10K-elements-lpos-integer|latency-rate-limited-1000_qps-memtier_benchmark-1key-list-10K-elements-lpos-string|latency-rate-limited-1000_qps-memtier_benchmark-1key-list-2K-elements-quicklist-lrange-all-elements-longs|latency-rate-limited-1000_qps-memtier_benchmark-1key-zset-1K-elements-zrange-all-elements|latency-rate-limited-1000_qps-memtier_benchmark-2keys-zset-300-elements-skiplist-encoded-zunion|latency-rate-limited-1000_qps-memtier_benchmark-2keys-zset-300-elements-skiplist-encoded-zunionstore|latency-rate-limited-100_qps-memtier_benchmark-1key-1Billion-bits-bitmap-bitcount|memtier_benchmark-
10Mkeys-load-hash-5-fields-with-1000B-values|memtier_benchmark-10Mkeys-load-hash-5-fields-with-1000B-values-pipeline-10|memtier_benchmark-1Mkeys-lhash-hexists|memtier_benchmark-1Mkeys-lhash-hincbry|memtier_benchmark-1Mkeys-load-string-with-200KiB-values|memtier_benchmark-1Mkeys-load-string-with-2MB-values|memtier_benchmark-1Mkeys-string-get-200KiB|memtier_benchmark-1Mkeys-string-get-20KiB|memtier_benchmark-1Mkeys-string-get-2MB|redis-benchmark-full-suite-1Mkeys-100B|redis-benchmark-full-suite-1Mkeys-100B-pipeline-10|redis-benchmark-full-suite-1Mkeys-1KiB|redis-benchmark-full-suite-1Mkeys-1KiB-pipeline-10|vector_db_benchmark_test

@ShooterIT ShooterIT marked this pull request as draft May 9, 2025 02:29
@ShooterIT ShooterIT marked this pull request as ready for review May 13, 2025 07:59
@ShooterIT ShooterIT added the release-notes indication that this issue needs to be mentioned in the release notes label May 14, 2025
@ShooterIT ShooterIT requested a review from sundb May 15, 2025 02:28
Comment thread src/dict.h
@ShooterIT ShooterIT requested a review from moticless May 16, 2025 03:00
Comment thread src/iothread.c
@moticless

Collaborator

@ShooterIT, do we understand the degradation reported in the "Automated performance analysis summary"? Is it a real issue?

@ShooterIT

Member Author

Hi @moticless, it is not a real issue. The benchmark doesn't run in multi-threaded mode, and for a single thread we don't apply this feature for now.

@ShooterIT

Member Author

To make the optimizations effective, we must merge both #13968 and #14017 at the same time.
Without #13968, arena contention during memory release would mask the performance improvement brought by #14017.
Similarly, without #14017, accessing key size of old objects would cause memory access latency, also diminishing the overall effect.

Here is a record of test results. The machine is m7i.2xlarge, the test command:

memtier_benchmark --ratio 1:0/0:1 -c 20 -t 16 --hide-histogram -d 1000 --key-minimum 1 --key-maximum 3000000 -s <ip> --distinct-client-seed --test-time 60

| scenario | unstable | merge #13968 & #14017 |
|---|---|---|
| write | 386107.06 | 570700.79 |
| read | 425431.19 | 541029.89 |
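
For reference, the relative gain implied by the figures above can be worked out with plain arithmetic. The sketch below is only illustrative: `gain_pct` and `print_gains` are made-up helper names, and the inputs are the two rows as reported.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from 'before' to 'after', in percent. */
static double gain_pct(double before, double after) {
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Print the gain for the two rows reported above. */
static void print_gains(void) {
    printf("write: %+.1f%%\n", gain_pct(386107.06, 570700.79));
    printf("read:  %+.1f%%\n", gain_pct(425431.19, 541029.89));
}
```

With these inputs the write row comes out at roughly +47.8% and the read row at roughly +27.2%.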

Comment thread src/memory_prefetch.c

@oranagra oranagra left a comment

Member

i didn't really review the code, just took a quick glance

Comment thread src/config.c Outdated
Comment thread src/db.c
Comment on lines +334 to +335
getKeysResult result = GETKEYS_RESULT_INIT;
int numkeys = getKeysFromCommand(cmd, argv, argc, &result);

Member

if we do that, maybe we can cache the result so it can serve other usages (ACL, Cluster, ROF).
besides, maybe instead of just computing the slot of the first key, it can already check for cross slot, and then the main thread won't have to.
we did that in lookahead.

Member Author

yes. we can let IO threads do this and cache the result. This PR is based on valkey-io/valkey#861, and valkey hasn't optimized this yet, so I kept the original behavior. I also don't want to make this PR bigger; a new PR may be better. As you said, we can cache the keys result, and even calculate in the IO thread whether all keys are in a single slot.

besides, I have tried caching the keys result but there was no significant performance improvement, so i think it is not urgent.

Member

no significant performance improvement.. did you check with cluster mode or ACL?
seems odd that we invest in offloading argument parsing and command lookup, but claim that offloading key name extraction and slot number / cross slot isn't significant.

in any case, i don't mind taking it in a separate PR, arguing this one is about memory prefetching only. the reason i commented was because this change does add a call to getKeysFromCommand, so essentially adding some work that's now done twice.

anyway, if / when we'll combine the lookahead project with this, we'll get it cached anyway.

Member Author

I tested without cluster mode or ACL, so I only offloaded getting the keys result. Here is a flame graph I saved from a previous test with 8 IO threads (write test). It shows that getting the keys result costs 0.1% of the main thread's CPU.
flame graph
In cluster mode, calculating the slot number may cost more CPU.

Member Author

anyway, if / when we'll combine the lookahead project with this, we'll get it cached anyway.

since the lookahead project did this, we can have this feature (cache keys result) after merging it?

Member

not sure i understand, 0.1% of the main thread CPU in which scenario? if there's no ACL or cluster, it's not used by the main thread.

in any case, we agree it should be cached, and we agree it can be done later. the only question is if we have some small regression because we can now call it twice (e.g. in cluster mode). i guess we don't care as long as the impact of the PR is still positive and we have a plan to improve later.

Member Author

the scenario is under standalone mode with 8 IO threads, 100% SET command stress test.

yes, do agree

Collaborator

I see that main thread calls getKeysFromCommand() inside addCommandToBatch(). Maybe it can be avoided as well once we cache the results.

Member Author

Yes. The keys result can be used for:

  • memory prefetch
  • slot calculating for cluster/cluster-compatibility
  • ACL check

And as oran said, the IO thread should also check whether all keys are in the same slot, instead of only computing the slot of the first key, so the main thread can just report the cross-slot error (MULTI-EXEC requires special treatment). I want to put this work in a separate PR (Valkey also intends to do this, but hasn't implemented it yet). The lookahead project already did this, so maybe we can just reuse that logic when merging. Besides, as I showed above, the regression from calling getKeysFromCommand is not notable.
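
As a rough illustration of the "are all keys in one slot" check discussed in this thread, here is a minimal sketch. It assumes the standard Redis Cluster key hashing (CRC16/XMODEM over the key, or over its `{hash tag}` when one is present, masked to 16384 slots). The names `key_slot` and `keys_single_slot` are hypothetical, and this is not the code from this PR or from the lookahead project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CRC16/XMODEM (poly 0x1021, init 0), computed bit by bit for brevity. */
static uint16_t crc16(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Slot of a key: hash only the text between the first '{' and the next '}'
 * when that text is non-empty, otherwise hash the whole key. */
static unsigned key_slot(const char *key, size_t len) {
    size_t s, e;
    for (s = 0; s < len; s++) if (key[s] == '{') break;
    if (s == len) return crc16(key, len) & 0x3FFF;
    for (e = s + 1; e < len; e++) if (key[e] == '}') break;
    if (e == len || e == s + 1) return crc16(key, len) & 0x3FFF;
    return crc16(key + s + 1, e - s - 1) & 0x3FFF;
}

/* Returns the common slot, or -1 if the keys span more than one slot. */
static int keys_single_slot(const char **keys, size_t nkeys) {
    assert(nkeys > 0);
    int slot = (int)key_slot(keys[0], strlen(keys[0]));
    for (size_t i = 1; i < nkeys; i++)
        if ((int)key_slot(keys[i], strlen(keys[i])) != slot) return -1;
    return slot;
}
```

A caller doing this once per command could hand the main thread either a slot number or a "cross-slot" marker, which is the division of labor suggested above.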

Comment thread src/iothread.c
Comment thread src/iothread.c Outdated
Comment thread src/memory_prefetch.c
Comment thread src/memory_prefetch.c Outdated
Comment thread src/memory_prefetch.c Outdated
Comment thread src/memory_prefetch.c Outdated
Comment thread src/memory_prefetch.c Outdated
Comment thread src/memory_prefetch.c Outdated
ShooterIT and others added 2 commits May 29, 2025 12:58
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Comment thread src/config.h Outdated
@ShooterIT ShooterIT requested review from sundb and tezc June 4, 2025 07:51
@ShooterIT ShooterIT merged commit 70a079d into redis:unstable Jun 5, 2025
24 of 26 checks passed
@ShooterIT ShooterIT deleted the memory-prefetch branch June 5, 2025 00:57
ShooterIT added a commit that referenced this pull request Jun 18, 2025
We need to check the command arity in IO threads, if it is not correct,
we should reset it, as we may do memory prefetching according to the
`iolookedcmd`. Accessing `argv` using the key positions returned by
`getKeysFromCommand` is unsafe and must be avoided for invalid commands.

This bug starts to have an impact after #14017
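
A minimal sketch of the kind of guard this commit message describes, assuming the usual Redis arity convention (a positive arity is an exact argument count, a negative arity is a minimum). The function names are hypothetical; the real check in the IO thread may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Redis arity convention: a positive arity is the exact argument count
 * (command name included); a negative arity -N means "at least N". */
static bool arity_ok(int arity, int argc) {
    assert(argc >= 1);
    if (arity > 0) return argc == arity;
    return argc >= -arity;
}

/* Hypothetical guard: key positions may only be used to index argv for
 * prefetching when the command was found and its arity is valid. */
static bool can_prefetch_keys(bool cmd_found, int arity, int argc) {
    return cmd_found && arity_ok(arity, argc);
}
```

For example, `GET` (arity 2) sent with a single argument would fail the check, so no key position would be read from `argv`.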
@ShooterIT ShooterIT mentioned this pull request Jan 21, 2026
ShooterIT added a commit that referenced this pull request May 11, 2026
…15133)

Reduce MGET / MSET latency by overlapping the dict-lookup memory accesses
across the keys of a single multi-key command. Builds on the cross-command
batched prefetch framework introduced in #14017 and the dict-prefetch state
machine in `memory_prefetch.c`, and lifts the kvobject-aware bits out of the
state machine into two new `dictType` callbacks so the same machinery can
be reused for other dict-encoded types later (hash hashtable, sets, sorted
sets) without paying for `kvobj`-specific code paths in the core loop.

Bundles the work originally proposed in #14899 (MGET prefetch framework,
by @mpozniak95) and #15043 (MSET batch prefetch).

## Design

Two new optional callbacks on `dictType`:

```c
typedef struct dictType {
    ...
    /* Bring the entry's key payload into cache before keyCompare runs.
     * Returns the address to prefetch, or NULL if the entry alone is enough. */
    void *(*prefetchEntryKey)(const dictEntry *de);

    /* Called only after a key match. Returns the value-side payload to
     * prefetch (or NULL). */
    void *(*prefetchEntryValue)(const dictEntry *de);
} dictType;
```

`dbDictType` registers both. The kv-aware logic — the `dictEntryIsKey()`
shortcut for embedded kvobjs, and `kv->ptr` for `OBJ_STRING` /
`OBJ_ENCODING_RAW` values — now lives in two small helpers in
`server.c`:

```c
static void *dbDictPrefetchEntryKey(const dictEntry *de) {
    return dictEntryIsKey(de) ? NULL : dictGetKey(de);
}

static void *dbDictPrefetchEntryValue(const dictEntry *de) {
    kvobj *kv = dictGetKey(de);
    return (kv->type == OBJ_STRING && kv->encoding == OBJ_ENCODING_RAW)
            ? kv->ptr : NULL;
}
```

The `PrefetchGetValueDataFunc` typedef and the per-call `get_val_data`
parameter on `dictPrefetchKeys()` / `dictPrefetch()` are removed — the
dict's own type drives both ends. This also removes the foot-gun where
callers (like `mgetCommand`) had to remember whether to pass
`prefetchGetObjectValuePtr` or `NULL`. `memory_prefetch.c` no longer
references `kvobj`, `kvobjGetKey`, or any specific value layout.
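
To make the "optional callback, NULL means nothing to prefetch" contract concrete, here is a toy sketch. The `toyEntry` / `toyType` types and the `toyPrefetch` helper are invented for illustration and are not the Redis definitions; `__builtin_prefetch` is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for dictEntry / dictType (not the Redis structs). */
typedef struct toyEntry {
    const char *key;
    char *val;
    int key_embedded;   /* 1 if the key lives inside the entry itself */
} toyEntry;

typedef struct toyType {
    void *(*prefetchEntryKey)(const toyEntry *de);    /* may be NULL */
    void *(*prefetchEntryValue)(const toyEntry *de);  /* may be NULL */
} toyType;

/* Issue a prefetch if the (optional) callback yields an address.
 * Returns 1 when a prefetch was issued, 0 otherwise. */
static int toyPrefetch(void *(*cb)(const toyEntry *), const toyEntry *de) {
    void *addr = cb ? cb(de) : NULL;
    if (!addr) return 0;
    __builtin_prefetch(addr);
    return 1;
}

/* Key callback: nothing extra to fetch when the key is embedded. */
static void *toyKeyCb(const toyEntry *de) {
    return de->key_embedded ? NULL : (void *)de->key;
}

/* Value callback: prefetch the value payload if there is one. */
static void *toyValCb(const toyEntry *de) {
    return de->val;
}
```

The return value of `toyPrefetch` is what lets a driver decide whether to yield to another lookup or continue immediately with the same one.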

## State machine

Two file-local types in `memory_prefetch.c`:

| Type | Role |
|---|---|
| `dictPrefetchLookup` | Per-key snapshot of an in-flight,
software-pipelined `dictFind` (mirrors the locals a synchronous
`dictFind` would carry across one bucket walk). |
| `dictPrefetcher` | Driver that advances a batch of
`dictPrefetchLookup`s through the FSM, yielding to the next in-flight
lookup each time a prefetch is issued. |

Five-stage lifecycle for each lookup, driven by the prefetcher:

```text
                                                           │
                                                         start
                                                           │
                                                  ┌────────▼─────────┐
                                       ┌─────────►│  PREFETCH_BUCKET ├────►────────┐
                                       │          └────────┬─────────┘            no more tables
                                       │             bucket│found                  │
                                       │                   │                       │
        entry not found - goto next table         ┌────────▼────────┐              │
                                       └────◄─────┤ PREFETCH_ENTRY  │              ▼
                                    ┌────────────►└────────┬────────┘              │
                                    │                 entry│found                  │
                                    │                      │                       │
                                    │          ┌───────────▼─────────────┐         │
                                    │          │   PREFETCH_ENTRY_KEY    │ ◄── dictType->prefetchEntryKey(de)
                                    │          └───────────┬─────────────┘         │
                                    │                      │                       │
        key mismatch - goto next entry                     │                       │
                                    │          ┌───────────▼─────────────┐         │
                                    └──────◄───│   PREFETCH_ENTRY_VALUE  │ ◄── keyCompare; on match,
                                               └───────────┬─────────────┘     dictType->prefetchEntryValue(de)
                                                           │                       │
                                                 ┌─────────▼─────────────┐         │
                                                 │     PREFETCH_DONE     │◄────────┘
                                                 └───────────────────────┘
```

`PREFETCH_BUCKET` first picks `ht_table[0]`, then flips to `ht_table[1]`
if the dict is mid-rehash, then transitions to `PREFETCH_DONE` if no
more tables remain.
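
The table-selection rule in the previous paragraph can be sketched as a tiny pure function. `next_table` and its `-1` conventions are assumptions made for this example, not names from `memory_prefetch.c`.

```c
#include <assert.h>

/* Which table a lookup should probe next.
 * cur: -1 before the first probe, otherwise the table just probed (0 or 1).
 * Returns 0 or 1, or -1 when no tables remain (the lookup is done). */
static int next_table(int cur, int rehashing) {
    assert(cur >= -1 && cur <= 1);
    if (cur == -1) return 0;              /* always start with table 0 */
    if (cur == 0 && rehashing) return 1;  /* mid-rehash: also probe table 1 */
    return -1;                            /* nothing left to probe */
}
```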

`memory_prefetch.c` exposes a small lifecycle that any caller can drive:

```c
dictPrefetcherInit(p, max_keys);                  /* one-shot heap alloc of lookups[] */
dictPrefetcherReset(p, dicts, keys, nkeys);       /* configure for one batch */
dictPrefetcherRun(p);                             /* drive FSM until all PREFETCH_DONE */
dictPrefetcherFree(p);                            /* release */
```

Each FSM stage is a named static function (`dictPrefetchBucket`,
`dictPrefetchEntry`, `dictPrefetchEntryKey`, `dictPrefetchEntryValue`),
so the `dictPrefetcherRun` driver is a four-line `switch` over the
state.

The state machine is dict-pure: no `kvobj` field on
`dictPrefetchLookup`,
no `kvobjGetKey` reach-through. Round-robin advance semantics — a state
only advances the cursor if a prefetch was actually issued — are
preserved, so the embedded-kvobj fast path
(`dictEntryIsKey(de) == 1` → callback returns NULL) still skips the
extra prefetch and falls straight into the compare on the next loop
iteration.
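
The round-robin rule above (yield only when a prefetch was actually issued) can be illustrated with a toy chained hash table. This is a simplified sketch, not the Redis state machine: it uses a single table, collapses the key and value stages into one compare step, and all names (`table`, `lookup`, `step`, `run`) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 8

typedef struct node { const char *key; int val; struct node *next; } node;
typedef struct table { node *buckets[NBUCKETS]; } table;
typedef enum { ST_BUCKET, ST_ENTRY, ST_KEY, ST_DONE } state;

typedef struct lookup {
    const char *key;
    state st;
    node **bucket;  /* bucket slot being probed */
    node *entry;    /* current chain entry */
    node *found;    /* result, NULL if absent */
} lookup;

static unsigned toy_hash(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

static void toy_add(table *t, node *n) {
    unsigned b = toy_hash(n->key);
    n->next = t->buckets[b];
    t->buckets[b] = n;
}

/* Advance one lookup by one step. Returns 1 if a prefetch was issued,
 * meaning the driver should switch to another lookup while it loads. */
static int step(table *t, lookup *l) {
    switch (l->st) {
    case ST_BUCKET:
        l->bucket = &t->buckets[toy_hash(l->key)];
        __builtin_prefetch(l->bucket);
        l->st = ST_ENTRY;
        return 1;
    case ST_ENTRY:
        l->entry = l->entry ? l->entry->next : *l->bucket;
        if (!l->entry) { l->st = ST_DONE; return 0; }
        __builtin_prefetch(l->entry);
        l->st = ST_KEY;
        return 1;
    case ST_KEY:
        if (strcmp(l->entry->key, l->key) == 0) {
            l->found = l->entry;
            l->st = ST_DONE;
        } else {
            l->st = ST_ENTRY;   /* mismatch: try the next chain entry */
        }
        return 0;
    case ST_DONE:
        return 0;
    }
    return 0;
}

/* Round-robin driver: stay on a lookup until it issues a prefetch or
 * finishes, then move to the next. Returns the number of prefetches. */
static int run(table *t, lookup *ls, size_t n) {
    size_t done = 0, cur = 0;
    int issued = 0;
    for (size_t i = 0; i < n; i++) {
        ls[i].st = ST_BUCKET;
        ls[i].entry = ls[i].found = NULL;
    }
    while (done < n) {
        lookup *l = &ls[cur];
        if (l->st != ST_DONE) {
            int p = step(t, l);
            issued += p;
            if (l->st == ST_DONE) done++;
            else if (!p) continue;  /* no prefetch: keep the same lookup */
        }
        cur = (cur + 1) % n;
    }
    return issued;
}
```

After `run` returns, every lookup is in `ST_DONE` and the data a later synchronous find would touch has been requested from memory, which is the effect the batch path relies on.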

The cross-command path (`prefetchCommands` / `PrefetchCommandsBatch`)
embeds a `dictPrefetcher` initialized once at startup and reset per
batch, so cross-command prefetching no longer allocates per call.

## Intra-command API

```c
void dictPrefetchKeys(dict **dicts, void **keys, size_t nkeys);
```

A single multi-key command (e.g. MGET) can prefetch dict data for a
batch of its own keys, reusing the same state machine that the
cross-command path uses. Single-key calls (`nkeys <= 1`) early-return —
nothing to interleave with. The implementation stack-allocates a
fixed-size lookup array bounded by `DICT_PREFETCH_MAX_SIZE = 64` (no
VLA, predictable stack usage), so the intra-command path doesn't touch
the heap.

## Notes on the call sites

A shared helper picks the next prefetch batch and warms it via
`dictPrefetchKeys`:

```c
/* Pick the next prefetch batch starting at argv[start] and warm it via
 * dictPrefetchKeys. 'stride' is 1 for keys-only args (MGET) or 2 for
 * key/value pairs (MSET). Returns the chosen batch size in items. */
static int prefetchKeysBatch(client *c, int slot, int start, int stride);
```

Adaptive batch sizing inside the helper: if at least two full batches
(`PREFETCH_BATCH_SIZE * 2 = 32` items) remain, take one batch
(`PREFETCH_BATCH_SIZE = 16`); otherwise take all remaining items in one
call. This generalizes the small-request fast path so the trailing
batch of a large request also gets the single-call benefit.
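
The sizing rule just described can be sketched as follows. Only the constant `PREFETCH_BATCH_SIZE = 16` comes from the text above; `next_batch_items` and `count_batches` are hypothetical names, and the `stride` handling of the real helper is left out.

```c
#include <assert.h>

#define PREFETCH_BATCH_SIZE 16   /* value quoted in the text above */

/* Items to take from 'remaining': one regular batch while at least two
 * full batches are left, otherwise everything that remains. */
static int next_batch_items(int remaining) {
    assert(remaining >= 0);
    if (remaining >= PREFETCH_BATCH_SIZE * 2) return PREFETCH_BATCH_SIZE;
    return remaining;
}

/* Number of prefetch calls needed to cover 'total' items. */
static int count_batches(int total) {
    int calls = 0;
    while (total > 0) {
        total -= next_batch_items(total);
        calls++;
    }
    return calls;
}
```

Under this rule a 100-key request is split into batches of 16, 16, 16, 16, 16 and 20, while a 31-key request is handled in a single call.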

- **MGET (`mgetCommand`)** — gated by
`do_prefetch = server.prefetch_batch_max_size && !already_prefetched && numkeys > 1`,
with `already_prefetched = c->current_pending_cmd &&
(c->current_pending_cmd->flags & PENDING_CMD_KEYS_PREFETCHED)`.
  When `do_prefetch` is set, each iteration calls
  `prefetchKeysBatch(c, slot, j, 1)` and then sequentially
  `lookupKeyRead`s + replies the chosen batch. When `do_prefetch` is
  clear (cross-command path already warmed the keys, or batch
  prefetching is off), the loop takes all remaining items in one go
  and skips the prefetch.

- **MSET / MSETNX (`msetGenericCommand`)** — same `do_prefetch` gate as
  MGET with `stride = 2`. For the NX flag the NX-check loop runs
  `lookupKeyWrite` (which already warmed everything via
  `prefetchKeysBatch`); the SET loop then disables further prefetch
  (`do_prefetch &&= !nx`) so we don't re-prefetch on the second pass.
  Going through the full state machine (rather than bucket-only) means
  `dbDictType`'s `prefetchEntryValue` callback runs on a key match —
  warming the old kvobj's payload, which `setKey -> dbReplaceValue ->
  updateKeysizesHist(oldlen, newlen)` then reads to compute the
  histogram delta. The slot dict is re-fetched per batch — in cluster
  mode the slot dict can be freed mid-MSET (`KVSTORE_FREE_EMPTY_DICTS`
  + `expireIfNeeded`), so a cached pointer would otherwise dangle.

- **Cross-command batch path (`addCommandToBatch`)** — sets
  `PENDING_CMD_KEYS_PREFETCHED` on every command added to the batch,
  even on partial-batch overflow (was: only when ALL keys fit). The
  intra-command path then uniformly skips supplemental prefetching for
  any command the batch touched. Rationale: running both paths
  (cross-command warm + intra-command supplement) caused a measured
  −9.6 % regression on x86 with pipeline-10, and the partial cross-
  command warmup is sufficient for the head of the keyset; the cold
  tail goes through normal lookup, which is still cheaper than running
  the FSM a second time on already-warm keys.

- **Future types**: each dict's `dictType` can register its own
  `prefetchEntryKey` / `prefetchEntryValue` (e.g. for the hashtable hash
  encoding, the field-sds and value-sds payloads), without touching
  `memory_prefetch.c`.
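
The `do_prefetch` gate shared by the MGET and MSET call sites can be sketched in isolation. The struct, the flag's bit value, and the function name below are placeholders chosen for the example; only the three conditions are taken from the notes above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_KEYS_PREFETCHED (1 << 0)   /* hypothetical bit value */

/* Toy stand-in for the pending-command record (not the Redis struct). */
typedef struct toyPendingCmd { int flags; } toyPendingCmd;

/* Prefetch inside the command only when batch prefetching is enabled,
 * the cross-command path has not already warmed this command, and there
 * is more than one key to interleave. */
static bool should_prefetch(int prefetch_batch_max_size,
                            const toyPendingCmd *pending, int numkeys) {
    assert(numkeys >= 0);
    bool already = pending && (pending->flags & TOY_KEYS_PREFETCHED);
    return prefetch_batch_max_size != 0 && !already && numkeys > 1;
}
```

Keeping the gate in one predicate mirrors the rationale given for the cross-command path: a command the batch already touched never pays for a second pass through the state machine.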

## Benchmark validation

On x86, performance improvements are significant for larger batch sizes:
  - 5Mkeys-string-mget-10B-100keys-pipeline-10: +89.44%
  - 5Mkeys-string-mget-100B-100keys: +37.33%
  - 5Mkeys-string-mget-100B-30keys: +22.40%

On ARM (Graviton4), the gains are even more pronounced:
  - 5Mkeys-string-mget-10B-100keys-pipeline-10: +128.34%
  - 5Mkeys-string-mget-100B-100keys-pipeline-10: +46.76%

Overall, the improvement scales with batch size, while a few small-batch cases show marginal gains or slight regressions.

---------

Co-authored-by: Marcin Poźniak <marcin.pozniak@intel.com>
Co-authored-by: Yuan Wang <yuan.wang@redis.com>

Labels

action:run-benchmark Triggers the benchmark suite for this Pull Request release-notes indication that this issue needs to be mentioned in the release notes


7 participants