Parallel crc16#3323

Closed
zuiderkwast wants to merge 4 commits into valkey-io:unstable from zuiderkwast:parallel-crc16

@zuiderkwast

Contributor

Make use of memory parallelism by computing the crc16 for multiple keys in parallel.

In prepareCommandQueue, all the keys of all the parsed commands are crc16'ed in parallel using a new function crc16_parallel().
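To illustrate the idea, here is a minimal sketch of batching CRC16 over several keys. It is not the code from this PR: the names `crc16_init`, `crc16_step`, `crc16_one` and `crc16_batch_sketch` are hypothetical, and the real `crc16_parallel()` signature is not shown in this conversation. The sketch assumes the table-driven CRC16-CCITT (XMODEM, polynomial 0x1021) used for cluster key slots, and interleaves the per-byte steps of all keys so that the loads for different keys are independent of each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lookup table for CRC16-CCITT (XMODEM), polynomial 0x1021. */
uint16_t crc16_table[256];

/* Fill the table once before computing any checksum. */
void crc16_init(void) {
    for (int i = 0; i < 256; i++) {
        uint16_t crc = (uint16_t)(i << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
        crc16_table[i] = crc;
    }
}

/* One table lookup per input byte; each step depends on the previous crc. */
uint16_t crc16_step(uint16_t crc, unsigned char byte) {
    return (uint16_t)((crc << 8) ^ crc16_table[((crc >> 8) ^ byte) & 0xFF]);
}

/* Plain sequential CRC16 of a single buffer. */
uint16_t crc16_one(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) crc = crc16_step(crc, (unsigned char)buf[i]);
    return crc;
}

/* Hypothetical batch variant: advance every key by one byte per outer
 * iteration, so the dependency chains of different keys can overlap. */
void crc16_batch_sketch(const char **keys, const size_t *lens,
                        uint16_t *out, size_t n) {
    assert(n == 0 || (keys != NULL && lens != NULL && out != NULL));
    size_t maxlen = 0;
    for (size_t k = 0; k < n; k++) {
        out[k] = 0;
        if (lens[k] > maxlen) maxlen = lens[k];
    }
    for (size_t i = 0; i < maxlen; i++) {
        for (size_t k = 0; k < n; k++) {
            if (i < lens[k]) out[k] = crc16_step(out[k], (unsigned char)keys[k][i]);
        }
    }
}
```

After calling `crc16_init()`, `crc16_one("123456789", 9)` yields the standard XMODEM check value `0x31C3`, and the batch function produces the same result per key as the sequential one. Whether the interleaving pays off depends on how much of the cost is memory latency rather than computation.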

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
@github-actions

github-actions Bot commented Mar 6, 2026

Benchmark ran on this commit: f2e26d9

Benchmark Comparison: HEAD vs HEAD (averaged) - rps metrics

Run Summary:

  • HEAD: 80 total runs, 16 configurations (avg 5.00 runs per config)
  • HEAD: 80 total runs, 16 configurations (avg 5.00 runs per config)

Statistical Notes:

  • CI99%: 99% Confidence Interval - range where the true population mean is likely to fall
  • PI99%: 99% Prediction Interval - range where a single future observation is likely to fall
  • CV: Coefficient of Variation - relative variability (σ/μ × 100%)

Note: Values with (n=X, σ=Y, CV=Z%, CI99%=±W%, PI99%=±V%) indicate averages from X runs with standard deviation Y, coefficient of variation Z%, 99% confidence interval margin of error ±W% of the mean, and 99% prediction interval margin of error ±V% of the mean. CI bounds [A, B] and PI bounds [C, D] show the actual interval ranges.
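As a worked check of these definitions: the reported intervals appear consistent with the usual Student's t formulas for a sample of size $n$. This is inferred from the numbers, not stated by the bot; the critical value $t_{0.995,\,4} \approx 4.604$ is an assumption that follows from $n = 5$ and a 99% two-sided interval.

$$
\mathrm{CI}_{99\%} = \mu \pm t_{0.995,\,n-1}\,\frac{\sigma}{\sqrt{n}},
\qquad
\mathrm{PI}_{99\%} = \mu \pm t_{0.995,\,n-1}\,\sigma\sqrt{1 + \frac{1}{n}},
\qquad
\mathrm{CV} = \frac{\sigma}{\mu} \times 100\%
$$

For a row with $\mu = 227980.954$, $\sigma = 1454.862$ and $n = 5$:

$$
4.604 \times \frac{1454.862}{\sqrt{5}} \approx 2995.6 \approx 1.314\%\ \text{of}\ \mu,
\qquad
4.604 \times 1454.862 \times \sqrt{1.2} \approx 7337.6 \approx 3.219\%\ \text{of}\ \mu
$$

These margins match the CI and PI bounds reported for that row.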

Configuration:

  • architecture: aarch64
  • benchmark_mode: duration
  • clients: 1600
  • cluster_mode: False
  • data_size: 16
  • duration: 180
  • tls: False
  • valkey_benchmark_threads: 90
  • warmup: 30
| Command | Metric | Pipeline | io_threads | HEAD | HEAD | Diff | % Change |
|---|---|---|---|---|---|---|---|
| GET | rps | 1 | 1 | 227980.954 (n=5, σ=1454.862, CV=0.64%, CI99%=±1.314%, PI99%=±3.219%, CI[224985.373, 230976.535], PI[220643.309, 235318.599]) | 226524.126 (n=5, σ=1970.383, CV=0.87%, CI99%=±1.791%, PI99%=±4.387%, CI[222467.081, 230581.171], PI[216586.436, 236461.816]) | -1456.828 | -0.639% |
| GET | rps | 1 | 9 | 1522143.976 (n=5, σ=17565.836, CV=1.15%, CI99%=±2.376%, PI99%=±5.820%, CI[1485975.677, 1558312.275], PI[1433550.098, 1610737.854]) | 1519460.098 (n=5, σ=8218.079, CV=0.54%, CI99%=±1.114%, PI99%=±2.728%, CI[1502538.959, 1536381.237], PI[1478011.941, 1560908.255]) | -2683.878 | -0.176% |
| GET | rps | 10 | 1 | 1226404.548 (n=5, σ=5463.938, CV=0.45%, CI99%=±0.917%, PI99%=±2.247%, CI[1215154.223, 1237654.873], PI[1198846.993, 1253962.103]) | 1223067.450 (n=5, σ=3113.018, CV=0.25%, CI99%=±0.524%, PI99%=±1.284%, CI[1216657.703, 1229477.197], PI[1207366.840, 1238768.060]) | -3337.098 | -0.272% |
| GET | rps | 10 | 9 | 2819603.350 (n=5, σ=22360.357, CV=0.79%, CI99%=±1.633%, PI99%=±4.000%, CI[2773563.066, 2865643.634], PI[2706828.147, 2932378.553]) | 2810958.600 (n=5, σ=22751.198, CV=0.81%, CI99%=±1.667%, PI99%=±4.082%, CI[2764113.570, 2857803.630], PI[2696212.179, 2925705.021]) | -8644.750 | -0.307% |
| SET | rps | 1 | 1 | 219730.182 (n=5, σ=956.208, CV=0.44%, CI99%=±0.896%, PI99%=±2.195%, CI[217761.337, 221699.027], PI[214907.517, 224552.847]) | 218844.354 (n=5, σ=1012.825, CV=0.46%, CI99%=±0.953%, PI99%=±2.334%, CI[216758.933, 220929.775], PI[213736.136, 223952.572]) | -885.828 | -0.403% |
| SET | rps | 1 | 9 | 1476222.524 (n=5, σ=18074.901, CV=1.22%, CI99%=±2.521%, PI99%=±6.175%, CI[1439006.052, 1513438.996], PI[1385061.158, 1567383.890]) | 1471052.676 (n=5, σ=12888.333, CV=0.88%, CI99%=±1.804%, PI99%=±4.419%, CI[1444515.420, 1497589.932], PI[1406049.939, 1536055.413]) | -5169.848 | -0.350% |
| SET | rps | 10 | 1 | 1039277.662 (n=5, σ=5211.708, CV=0.50%, CI99%=±1.033%, PI99%=±2.529%, CI[1028546.682, 1050008.642], PI[1012992.237, 1065563.087]) | 1038797.312 (n=5, σ=3106.401, CV=0.30%, CI99%=±0.616%, PI99%=±1.508%, CI[1032401.190, 1045193.434], PI[1023130.076, 1054464.548]) | -480.350 | -0.046% |
| SET | rps | 10 | 9 | 1938895.750 (n=5, σ=14535.664, CV=0.75%, CI99%=±1.544%, PI99%=±3.781%, CI[1908966.618, 1968824.882], PI[1865584.648, 2012206.852]) | 1933963.122 (n=5, σ=12192.841, CV=0.63%, CI99%=±1.298%, PI99%=±3.180%, CI[1908857.894, 1959068.350], PI[1872468.122, 1995458.122]) | -4932.628 | -0.254% |

Configuration:

  • architecture: aarch64
  • benchmark_mode: duration
  • clients: 1600
  • cluster_mode: False
  • data_size: 96
  • duration: 180
  • tls: False
  • valkey_benchmark_threads: 90
  • warmup: 30
| Command | Metric | Pipeline | io_threads | HEAD | HEAD | Diff | % Change |
|---|---|---|---|---|---|---|---|
| GET | rps | 1 | 1 | 218537.084 (n=5, σ=3291.477, CV=1.51%, CI99%=±3.101%, PI99%=±7.596%, CI[211759.887, 225314.281], PI[201936.408, 235137.760]) | 219379.972 (n=5, σ=886.532, CV=0.40%, CI99%=±0.832%, PI99%=±2.038%, CI[217554.590, 221205.354], PI[214908.716, 223851.228]) | 842.888 | +0.386% |
| GET | rps | 1 | 9 | 1458948.748 (n=5, σ=14478.970, CV=0.99%, CI99%=±2.043%, PI99%=±5.005%, CI[1429136.348, 1488761.148], PI[1385923.581, 1531973.915]) | 1469765.300 (n=5, σ=13633.179, CV=0.93%, CI99%=±1.910%, PI99%=±4.678%, CI[1441694.396, 1497836.204], PI[1401005.908, 1538524.692]) | 10816.552 | +0.741% |
| GET | rps | 10 | 1 | 1152836.548 (n=5, σ=8197.210, CV=0.71%, CI99%=±1.464%, PI99%=±3.586%, CI[1135958.379, 1169714.717], PI[1111493.647, 1194179.449]) | 1160865.798 (n=5, σ=3309.948, CV=0.29%, CI99%=±0.587%, PI99%=±1.438%, CI[1154050.568, 1167681.028], PI[1144171.963, 1177559.633]) | 8029.250 | +0.696% |
| GET | rps | 10 | 9 | 2250544.200 (n=5, σ=16880.942, CV=0.75%, CI99%=±1.544%, PI99%=±3.783%, CI[2215786.107, 2285302.293], PI[2165404.607, 2335683.793]) | 2235375.900 (n=5, σ=12687.546, CV=0.57%, CI99%=±1.169%, PI99%=±2.863%, CI[2209252.067, 2261499.733], PI[2171385.839, 2299365.961]) | -15168.300 | -0.674% |
| SET | rps | 1 | 1 | 210487.304 (n=5, σ=1633.755, CV=0.78%, CI99%=±1.598%, PI99%=±3.915%, CI[207123.379, 213851.229], PI[202247.405, 218727.203]) | 211044.534 (n=5, σ=2701.185, CV=1.28%, CI99%=±2.635%, PI99%=±6.455%, CI[205482.757, 216606.311], PI[197421.018, 224668.050]) | 557.230 | +0.265% |
| SET | rps | 1 | 9 | 1463323.278 (n=5, σ=9969.986, CV=0.68%, CI99%=±1.403%, PI99%=±3.436%, CI[1442794.940, 1483851.616], PI[1413039.324, 1513607.232]) | 1468172.674 (n=5, σ=9675.533, CV=0.66%, CI99%=±1.357%, PI99%=±3.324%, CI[1448250.617, 1488094.731], PI[1419373.801, 1516971.547]) | 4849.396 | +0.331% |
| SET | rps | 10 | 1 | 1037574.152 (n=5, σ=5584.129, CV=0.54%, CI99%=±1.108%, PI99%=±2.714%, CI[1026076.353, 1049071.951], PI[1009410.410, 1065737.894]) | 1038494.202 (n=5, σ=3380.191, CV=0.33%, CI99%=±0.670%, PI99%=±1.642%, CI[1031534.343, 1045454.061], PI[1021446.098, 1055542.306]) | 920.050 | +0.089% |
| SET | rps | 10 | 9 | 1829882.050 (n=5, σ=33999.670, CV=1.86%, CI99%=±3.826%, PI99%=±9.371%, CI[1759876.260, 1899887.840], PI[1658403.585, 2001360.515]) | 1794868.350 (n=5, σ=31585.866, CV=1.76%, CI99%=±3.623%, PI99%=±8.876%, CI[1729832.616, 1859904.084], PI[1635563.988, 1954172.712]) | -35013.700 | -1.913% |

@zuiderkwast

zuiderkwast commented Mar 6, 2026

Contributor Author

The automatic benchmark was useless in this case because it doesn't do any runs in cluster mode.

I've benchmarked it locally with a single-node cluster. With `--cluster`, valkey-benchmark uses hash tags, which makes the crc16 input 3 bytes long. Running without `--cluster` also works against a single-node cluster; the crc16 input is then 16 bytes long. The database is empty, so running only the GET test means every command quickly returns null, which maximizes the relative time spent in crc16.
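For readers unfamiliar with hash tags, the following is a small sketch of why a tagged key hashes only a few bytes. It follows the hash tag rule from the cluster specification (only the bytes between the first `{` and the next `}` are hashed, if that span is non-empty). The function name `hashed_span` and the example keys are hypothetical, not taken from valkey or valkey-benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the part of the key that is fed to crc16 and store
 * its length in *out_len. Hypothetical helper, for illustration only. */
const char *hashed_span(const char *key, size_t keylen, size_t *out_len) {
    assert(key != NULL && out_len != NULL);
    const char *open = memchr(key, '{', keylen);
    if (open != NULL) {
        /* Search for the first '}' to the right of the first '{'. */
        size_t rest = keylen - (size_t)(open - key) - 1;
        const char *close = memchr(open + 1, '}', rest);
        if (close != NULL && close != open + 1) {
            /* Non-empty tag: hash only the bytes between the braces. */
            *out_len = (size_t)(close - open - 1);
            return open + 1;
        }
    }
    /* No tag, or an empty "{}": hash the whole key. */
    *out_len = keylen;
    return key;
}
```

With a key such as `key:{abc}:000001`, only the 3 bytes `abc` are hashed, whereas a 16-byte key without braces is hashed in full.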

The server was started with `valkey-server --save '' --cluster-enabled yes`, followed by `valkey-cli cluster addslotsrange 0 16383`.

| Command | crc16 input (bytes) | unstable (RPS) | this PR (RPS) |
|---|---|---|---|
| `valkey-benchmark --threads 4 -P 10 -t get -q -n 5000000 -r 10000000 --warmup 1 --cluster` | 3 | 1533760.75 | 1519316.38 |
| `valkey-benchmark --threads 4 -P 10 -t get -q -n 5000000 -r 10000000 --warmup 1` | 16 | 1482817.38 | 1464569.38 |

Conclusion: No improvement. Throughput actually drops by about 1% (0.9% with the 3-byte input, 1.2% with the 16-byte input). Closing this draft.
