Inline reduce.c #363

Closed
mkannwischer wants to merge 2 commits into main from inline-reduce

@mkannwischer
Contributor

@mkannwischer mkannwischer commented Jul 15, 2025

This commit inlines reduce.c entirely. The goal is to bring performance without -flto closer to performance with -flto.

Note that some of these functions are a potential side-channel leak. That needs to be addressed urgently, but with -flto it was already present prior to this commit.
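
For readers unfamiliar with the approach, here is a minimal sketch of what moving a reduction routine from a .c file into a header as `static inline` can look like. The constants and the function body follow the public ML-DSA (Dilithium) reference code and are an assumption about this repository's reduce.c, not a copy of it; the names `montgomery_reduce_sketch`, `MLD_SKETCH_Q` and `MLD_SKETCH_QINV` are hypothetical.

```c
#include <stdint.h>

#define MLD_SKETCH_Q 8380417     /* ML-DSA modulus q = 2^23 - 2^13 + 1 */
#define MLD_SKETCH_QINV 58728449 /* q^-1 mod 2^32 */

/* Hypothetical header-only variant: with `static inline`, every translation
 * unit that includes this header sees the body and can inline the call,
 * without relying on link-time optimization (-flto). */
static inline int32_t montgomery_reduce_sketch(int64_t a)
{
  /* t = (a mod 2^32) * q^-1 mod 2^32, reinterpreted as a signed 32-bit value
   * (assumes the usual two's-complement wrap on conversion). */
  int32_t t = (int32_t)((uint32_t)a * (uint32_t)MLD_SKETCH_QINV);
  /* a - t*q is divisible by 2^32, so the shift is exact
   * (assumes arithmetic right shift of negative values). */
  return (int32_t)((a - (int64_t)t * MLD_SKETCH_Q) >> 32);
}
```

The result is congruent to a * 2^-32 modulo q. Whether the compiler actually inlines the call is still its decision; `static inline` only makes the body visible at each call site.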

Here are some benchmarks on Apple M1:

| Parameter Set | Operation | main (w/o lto) | main (w/ lto) | inline reduce (w/o lto) | Improvement w/o lto |
|---|---|---|---|---|---|
| ML-DSA-44 | keypair | 57,501 | 46,698 | 49,867 | 13.3% faster |
| ML-DSA-44 | sign | 269,596 | 183,685 | 201,120 | 25.4% faster |
| ML-DSA-44 | verify | 79,420 | 66,167 | 70,232 | 11.6% faster |
| ML-DSA-65 | keypair | 100,446 | 81,473 | 86,266 | 14.1% faster |
| ML-DSA-65 | sign | 436,016 | 279,764 | 306,257 | 29.8% faster |
| ML-DSA-65 | verify | 126,042 | 102,810 | 109,346 | 13.2% faster |
| ML-DSA-87 | keypair | 157,452 | 130,063 | 137,147 | 12.9% faster |
| ML-DSA-87 | sign | 501,679 | 334,345 | 367,961 | 26.7% faster |
| ML-DSA-87 | verify | 191,143 | 158,452 | 168,890 | 11.6% faster |

This commit inlines reduce.c entirely. The goal is to bring performance without
-flto closer to performance with -flto.
Namespacing will be added in the next commit.

Note that some of these functions are a potential side-channel leak.
That needs to be addressed urgently, but with -flto it was already present
prior to this commit.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46825 cycles | 46826 cycles | 1.00 |
| ML-DSA-44 sign | 177188 cycles | 177037 cycles | 1.00 |
| ML-DSA-44 verify | 66118 cycles | 66121 cycles | 1.00 |
| ML-DSA-65 keypair | 81452 cycles | 81439 cycles | 1.00 |
| ML-DSA-65 sign | 280975 cycles | 281199 cycles | 1.00 |
| ML-DSA-65 verify | 103013 cycles | 103021 cycles | 1.00 |
| ML-DSA-87 keypair | 130389 cycles | 130764 cycles | 1.00 |
| ML-DSA-87 sign | 343277 cycles | 344116 cycles | 1.00 |
| ML-DSA-87 verify | 158698 cycles | 159148 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112404 cycles | 112317 cycles | 1.00 |
| ML-DSA-44 sign | 410011 cycles | 409141 cycles | 1.00 |
| ML-DSA-44 verify | 130173 cycles | 130153 cycles | 1.00 |
| ML-DSA-65 keypair | 193307 cycles | 192224 cycles | 1.01 |
| ML-DSA-65 sign | 658830 cycles | 658665 cycles | 1.00 |
| ML-DSA-65 verify | 207274 cycles | 207237 cycles | 1.00 |
| ML-DSA-87 keypair | 316112 cycles | 315969 cycles | 1.00 |
| ML-DSA-87 sign | 833227 cycles | 832811 cycles | 1.00 |
| ML-DSA-87 verify | 339209 cycles | 339006 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 114813 cycles | 114865 cycles | 1.00 |
| ML-DSA-44 sign | 412000 cycles | 412011 cycles | 1.00 |
| ML-DSA-44 verify | 135193 cycles | 134863 cycles | 1.00 |
| ML-DSA-65 keypair | 198896 cycles | 198812 cycles | 1.00 |
| ML-DSA-65 sign | 680514 cycles | 680280 cycles | 1.00 |
| ML-DSA-65 verify | 217396 cycles | 216872 cycles | 1.00 |
| ML-DSA-87 keypair | 324554 cycles | 324705 cycles | 1.00 |
| ML-DSA-87 sign | 850800 cycles | 850463 cycles | 1.00 |
| ML-DSA-87 verify | 349168 cycles | 348367 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 210124 cycles | 210196 cycles | 1.00 |
| ML-DSA-44 sign | 721783 cycles | 721639 cycles | 1.00 |
| ML-DSA-44 verify | 228911 cycles | 229000 cycles | 1.00 |
| ML-DSA-65 keypair | 374415 cycles | 376228 cycles | 1.00 |
| ML-DSA-65 sign | 1184328 cycles | 1186159 cycles | 1.00 |
| ML-DSA-65 verify | 369737 cycles | 370145 cycles | 1.00 |
| ML-DSA-87 keypair | 596893 cycles | 595991 cycles | 1.00 |
| ML-DSA-87 sign | 1513799 cycles | 1514501 cycles | 1.00 |
| ML-DSA-87 verify | 613913 cycles | 613928 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 36076 cycles | 35472 cycles | 1.02 |
| ML-DSA-44 sign | 136744 cycles | 135995 cycles | 1.01 |
| ML-DSA-44 verify | 45651 cycles | 45539 cycles | 1.00 |
| ML-DSA-65 keypair | 62837 cycles | 61655 cycles | 1.02 |
| ML-DSA-65 sign | 224208 cycles | 224162 cycles | 1.00 |
| ML-DSA-65 verify | 70700 cycles | 70153 cycles | 1.01 |
| ML-DSA-87 keypair | 92901 cycles | 92936 cycles | 1.00 |
| ML-DSA-87 sign | 263178 cycles | 263402 cycles | 1.00 |
| ML-DSA-87 verify | 103468 cycles | 104888 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 58158 cycles | 58075 cycles | 1.00 |
| ML-DSA-44 sign | 208636 cycles | 208086 cycles | 1.00 |
| ML-DSA-44 verify | 73631 cycles | 73487 cycles | 1.00 |
| ML-DSA-65 keypair | 101118 cycles | 102370 cycles | 0.99 |
| ML-DSA-65 sign | 340338 cycles | 343811 cycles | 0.99 |
| ML-DSA-65 verify | 113311 cycles | 114664 cycles | 0.99 |
| ML-DSA-87 keypair | 156158 cycles | 154161 cycles | 1.01 |
| ML-DSA-87 sign | 410297 cycles | 403666 cycles | 1.02 |
| ML-DSA-87 verify | 171967 cycles | 168988 cycles | 1.02 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 96223 cycles | 96133 cycles | 1.00 |
| ML-DSA-44 sign | 322625 cycles | 323398 cycles | 1.00 |
| ML-DSA-44 verify | 103023 cycles | 102521 cycles | 1.00 |
| ML-DSA-65 keypair | 163722 cycles | 164220 cycles | 1.00 |
| ML-DSA-65 sign | 525458 cycles | 528496 cycles | 0.99 |
| ML-DSA-65 verify | 162116 cycles | 163672 cycles | 0.99 |
| ML-DSA-87 keypair | 269420 cycles | 267417 cycles | 1.01 |
| ML-DSA-87 sign | 678943 cycles | 668661 cycles | 1.02 |
| ML-DSA-87 verify | 269766 cycles | 270775 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 71284 cycles | 71452 cycles | 1.00 |
| ML-DSA-44 sign | 220884 cycles | 219809 cycles | 1.00 |
| ML-DSA-44 verify | 83258 cycles | 82905 cycles | 1.00 |
| ML-DSA-65 keypair | 129787 cycles | 122829 cycles | 1.06 |
| ML-DSA-65 sign | 372454 cycles | 355657 cycles | 1.05 |
| ML-DSA-65 verify | 138894 cycles | 131537 cycles | 1.06 |
| ML-DSA-87 keypair | 208939 cycles | 207402 cycles | 1.01 |
| ML-DSA-87 sign | 459755 cycles | 458199 cycles | 1.00 |
| ML-DSA-87 verify | 217649 cycles | 217016 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 3rd gen (c6a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-65 keypair | 129787 cycles | 122829 cycles | 1.06 |
| ML-DSA-65 sign | 372454 cycles | 355657 cycles | 1.05 |
| ML-DSA-65 verify | 138894 cycles | 131537 cycles | 1.06 |
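
The arithmetic behind the alert above can be sketched as follows. This assumes, consistent with the numbers shown, that the Ratio column is the current cycle count divided by the previous one and that a row is flagged when the ratio exceeds the 1.03 threshold. The helper names are hypothetical and this is not github-action-benchmark's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: ratio of current to previous cycle counts.
 * Values above 1.0 mean the current commit needed more cycles. */
static double cycle_ratio(uint64_t current, uint64_t previous)
{
  return (double)current / (double)previous;
}

/* Flags a possible regression when the ratio exceeds the threshold
 * (1.03 in the alert above). */
static bool is_regression(uint64_t current, uint64_t previous,
                          double threshold)
{
  return cycle_ratio(current, previous) > threshold;
}
```

For ML-DSA-65 keypair, 129787 / 122829 is roughly 1.057, which is displayed as 1.06 and exceeds 1.03. ML-DSA-44 keypair in the same run (71284 vs. 71452 cycles) stays just below 1.00 and is not flagged.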

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton4

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69995 cycles | 70041 cycles | 1.00 |
| ML-DSA-44 sign | 234315 cycles | 234176 cycles | 1.00 |
| ML-DSA-44 verify | 80668 cycles | 80552 cycles | 1.00 |
| ML-DSA-65 keypair | 123520 cycles | 123653 cycles | 1.00 |
| ML-DSA-65 sign | 378261 cycles | 378786 cycles | 1.00 |
| ML-DSA-65 verify | 130196 cycles | 130313 cycles | 1.00 |
| ML-DSA-87 keypair | 201256 cycles | 200817 cycles | 1.00 |
| ML-DSA-87 sign | 479149 cycles | 478823 cycles | 1.00 |
| ML-DSA-87 verify | 210028 cycles | 209893 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 153831 cycles | 153777 cycles | 1.00 |
| ML-DSA-44 sign | 517642 cycles | 519021 cycles | 1.00 |
| ML-DSA-44 verify | 165460 cycles | 165744 cycles | 1.00 |
| ML-DSA-65 keypair | 261428 cycles | 260906 cycles | 1.00 |
| ML-DSA-65 sign | 840274 cycles | 835713 cycles | 1.01 |
| ML-DSA-65 verify | 265501 cycles | 264996 cycles | 1.00 |
| ML-DSA-87 keypair | 433542 cycles | 434080 cycles | 1.00 |
| ML-DSA-87 sign | 1076244 cycles | 1072326 cycles | 1.00 |
| ML-DSA-87 verify | 438133 cycles | 438369 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 42534 cycles | 42857 cycles | 0.99 |
| ML-DSA-44 sign | 150029 cycles | 152741 cycles | 0.98 |
| ML-DSA-44 verify | 53935 cycles | 54592 cycles | 0.99 |
| ML-DSA-65 keypair | 71787 cycles | 72055 cycles | 1.00 |
| ML-DSA-65 sign | 242691 cycles | 241346 cycles | 1.01 |
| ML-DSA-65 verify | 81640 cycles | 81481 cycles | 1.00 |
| ML-DSA-87 keypair | 109594 cycles | 116110 cycles | 0.94 |
| ML-DSA-87 sign | 283014 cycles | 293320 cycles | 0.96 |
| ML-DSA-87 verify | 121848 cycles | 129363 cycles | 0.94 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135139 cycles | 133930 cycles | 1.01 |
| ML-DSA-44 sign | 511573 cycles | 508520 cycles | 1.01 |
| ML-DSA-44 verify | 149868 cycles | 148836 cycles | 1.01 |
| ML-DSA-65 keypair | 224921 cycles | 224038 cycles | 1.00 |
| ML-DSA-65 sign | 818202 cycles | 812510 cycles | 1.01 |
| ML-DSA-65 verify | 234375 cycles | 233515 cycles | 1.00 |
| ML-DSA-87 keypair | 369241 cycles | 368528 cycles | 1.00 |
| ML-DSA-87 sign | 1036630 cycles | 1028366 cycles | 1.01 |
| ML-DSA-87 verify | 381473 cycles | 381571 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton3

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 74150 cycles | 74140 cycles | 1.00 |
| ML-DSA-44 sign | 249715 cycles | 249753 cycles | 1.00 |
| ML-DSA-44 verify | 88336 cycles | 88344 cycles | 1.00 |
| ML-DSA-65 keypair | 129625 cycles | 129627 cycles | 1.00 |
| ML-DSA-65 sign | 409116 cycles | 408711 cycles | 1.00 |
| ML-DSA-65 verify | 140568 cycles | 140611 cycles | 1.00 |
| ML-DSA-87 keypair | 210204 cycles | 210254 cycles | 1.00 |
| ML-DSA-87 sign | 511707 cycles | 511786 cycles | 1.00 |
| ML-DSA-87 verify | 224698 cycles | 224850 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 131438 cycles | 131458 cycles | 1.00 |
| ML-DSA-44 sign | 456110 cycles | 455752 cycles | 1.00 |
| ML-DSA-44 verify | 155907 cycles | 142820 cycles | 1.09 |
| ML-DSA-65 keypair | 224146 cycles | 224110 cycles | 1.00 |
| ML-DSA-65 sign | 737256 cycles | 733546 cycles | 1.01 |
| ML-DSA-65 verify | 227448 cycles | 226877 cycles | 1.00 |
| ML-DSA-87 keypair | 370346 cycles | 370702 cycles | 1.00 |
| ML-DSA-87 sign | 937526 cycles | 938185 cycles | 1.00 |
| ML-DSA-87 verify | 377177 cycles | 377210 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Graviton4 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 155907 cycles | 142820 cycles | 1.09 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 279537 cycles | 279878 cycles | 1.00 |
| ML-DSA-44 sign | 999321 cycles | 1000781 cycles | 1.00 |
| ML-DSA-44 verify | 322572 cycles | 322375 cycles | 1.00 |
| ML-DSA-65 keypair | 477900 cycles | 477824 cycles | 1.00 |
| ML-DSA-65 sign | 1631588 cycles | 1632841 cycles | 1.00 |
| ML-DSA-65 verify | 504851 cycles | 506598 cycles | 1.00 |
| ML-DSA-87 keypair | 812675 cycles | 818125 cycles | 0.99 |
| ML-DSA-87 sign | 2153973 cycles | 2169078 cycles | 0.99 |
| ML-DSA-87 verify | 844933 cycles | 849879 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 118720 cycles | 118724 cycles | 1.00 |
| ML-DSA-44 sign | 417742 cycles | 418369 cycles | 1.00 |
| ML-DSA-44 verify | 131396 cycles | 131290 cycles | 1.00 |
| ML-DSA-65 keypair | 201094 cycles | 200076 cycles | 1.01 |
| ML-DSA-65 sign | 673922 cycles | 670991 cycles | 1.00 |
| ML-DSA-65 verify | 205109 cycles | 205403 cycles | 1.00 |
| ML-DSA-87 keypair | 334760 cycles | 333300 cycles | 1.00 |
| ML-DSA-87 sign | 872059 cycles | 867281 cycles | 1.01 |
| ML-DSA-87 verify | 342677 cycles | 341842 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton2

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115446 cycles | 115308 cycles | 1.00 |
| ML-DSA-44 sign | 413174 cycles | 412469 cycles | 1.00 |
| ML-DSA-44 verify | 135735 cycles | 135292 cycles | 1.00 |
| ML-DSA-65 keypair | 199555 cycles | 199065 cycles | 1.00 |
| ML-DSA-65 sign | 681625 cycles | 680832 cycles | 1.00 |
| ML-DSA-65 verify | 217840 cycles | 217260 cycles | 1.00 |
| ML-DSA-87 keypair | 325406 cycles | 325724 cycles | 1.00 |
| ML-DSA-87 sign | 852589 cycles | 852130 cycles | 1.00 |
| ML-DSA-87 verify | 349719 cycles | 349000 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 136245 cycles | 136271 cycles | 1.00 |
| ML-DSA-44 sign | 451613 cycles | 451027 cycles | 1.00 |
| ML-DSA-44 verify | 155350 cycles | 147182 cycles | 1.06 |
| ML-DSA-65 keypair | 238565 cycles | 239177 cycles | 1.00 |
| ML-DSA-65 sign | 736060 cycles | 732754 cycles | 1.00 |
| ML-DSA-65 verify | 237515 cycles | 237861 cycles | 1.00 |
| ML-DSA-87 keypair | 390205 cycles | 390707 cycles | 1.00 |
| ML-DSA-87 sign | 949974 cycles | 947422 cycles | 1.00 |
| ML-DSA-87 verify | 397043 cycles | 396895 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Graviton3 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 155350 cycles | 147182 cycles | 1.06 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 210966 cycles | 211210 cycles | 1.00 |
| ML-DSA-44 sign | 723730 cycles | 725201 cycles | 1.00 |
| ML-DSA-44 verify | 236053 cycles | 240026 cycles | 0.98 |
| ML-DSA-65 keypair | 375314 cycles | 376881 cycles | 1.00 |
| ML-DSA-65 sign | 1187001 cycles | 1187944 cycles | 1.00 |
| ML-DSA-65 verify | 370676 cycles | 370687 cycles | 1.00 |
| ML-DSA-87 keypair | 597937 cycles | 596640 cycles | 1.00 |
| ML-DSA-87 sign | 1517966 cycles | 1517557 cycles | 1.00 |
| ML-DSA-87 verify | 614895 cycles | 614359 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 452237 cycles | 451807 cycles | 1.00 |
| ML-DSA-44 sign | 2009978 cycles | 2010372 cycles | 1.00 |
| ML-DSA-44 verify | 527294 cycles | 527463 cycles | 1.00 |
| ML-DSA-65 keypair | 761279 cycles | 761380 cycles | 1.00 |
| ML-DSA-65 sign | 3319302 cycles | 3334854 cycles | 1.00 |
| ML-DSA-65 verify | 819410 cycles | 818285 cycles | 1.00 |
| ML-DSA-87 keypair | 1226354 cycles | 1226480 cycles | 1.00 |
| ML-DSA-87 sign | 4133631 cycles | 4120162 cycles | 1.00 |
| ML-DSA-87 verify | 1313874 cycles | 1318108 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 828157 cycles | 940147 cycles | 0.88 |
| ML-DSA-44 sign | 3240359 cycles | 4320171 cycles | 0.75 |
| ML-DSA-44 verify | 900061 cycles | 1071283 cycles | 0.84 |
| ML-DSA-65 keypair | 1384661 cycles | 1565069 cycles | 0.88 |
| ML-DSA-65 sign | 5287800 cycles | 7134328 cycles | 0.74 |
| ML-DSA-65 verify | 1438846 cycles | 1690275 cycles | 0.85 |
| ML-DSA-87 keypair | 2301743 cycles | 2524208 cycles | 0.91 |
| ML-DSA-87 sign | 6667972 cycles | 8722966 cycles | 0.76 |
| ML-DSA-87 verify | 2374721 cycles | 2707524 cycles | 0.88 |

This comment was automatically generated by workflow using github-action-benchmark.

@mkannwischer
Contributor Author

Current benchmarks on Apple M1:

| Parameter Set | Operation | main (w/o lto) | main (w/ lto) | inline reduce (w/o lto) | Improvement w/o lto |
|---|---|---|---|---|---|
| ML-DSA-44 | keypair | 57,501 | 46,698 | 49,867 | 13.3% faster |
| ML-DSA-44 | sign | 269,596 | 183,685 | 201,120 | 25.4% faster |
| ML-DSA-44 | verify | 79,420 | 66,167 | 70,232 | 11.6% faster |
| ML-DSA-65 | keypair | 100,446 | 81,473 | 86,266 | 14.1% faster |
| ML-DSA-65 | sign | 436,016 | 279,764 | 306,257 | 29.8% faster |
| ML-DSA-65 | verify | 126,042 | 102,810 | 109,346 | 13.2% faster |
| ML-DSA-87 | keypair | 157,452 | 130,063 | 137,147 | 12.9% faster |
| ML-DSA-87 | sign | 501,679 | 334,345 | 367,961 | 26.7% faster |
| ML-DSA-87 | verify | 191,143 | 158,452 | 168,890 | 11.6% faster |
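
The "Improvement w/o lto" column appears to be the relative reduction from "main (w/o lto)" to "inline reduce (w/o lto)". A small sketch of that arithmetic, with a hypothetical helper name:

```c
#include <stdint.h>

/* Percentage by which `after` is faster than `before`,
 * for measurements where lower is better. */
static double improvement_percent(uint64_t before, uint64_t after)
{
  return 100.0 * ((double)before - (double)after) / (double)before;
}
```

For ML-DSA-44 keypair this gives (57,501 - 49,867) / 57,501, roughly 13.3%, matching the table.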

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 220605 cycles | 227787 cycles | 0.97 |
| ML-DSA-44 sign | 684992 cycles | 714292 cycles | 0.96 |
| ML-DSA-44 verify | 241715 cycles | 243230 cycles | 0.99 |
| ML-DSA-65 keypair | 388861 cycles | 380639 cycles | 1.02 |
| ML-DSA-65 sign | 1153471 cycles | 1120572 cycles | 1.03 |
| ML-DSA-65 verify | 395364 cycles | 389764 cycles | 1.01 |
| ML-DSA-87 keypair | 646922 cycles | 655479 cycles | 0.99 |
| ML-DSA-87 sign | 1533379 cycles | 1525900 cycles | 1.00 |
| ML-DSA-87 verify | 671432 cycles | 668923 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 303445 cycles | 298929 cycles | 1.02 |
| ML-DSA-44 sign | 1115755 cycles | 1099833 cycles | 1.01 |
| ML-DSA-44 verify | 340065 cycles | 325451 cycles | 1.04 |
| ML-DSA-65 keypair | 542423 cycles | 554401 cycles | 0.98 |
| ML-DSA-65 sign | 1798741 cycles | 1810077 cycles | 0.99 |
| ML-DSA-65 verify | 521510 cycles | 534686 cycles | 0.98 |
| ML-DSA-87 keypair | 836193 cycles | 832492 cycles | 1.00 |
| ML-DSA-87 sign | 2337426 cycles | 2294933 cycles | 1.02 |
| ML-DSA-87 verify | 869972 cycles | 875274 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 340065 cycles | 325451 cycles | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.

This renames the functions in reduce.h:
 - montgomery_reduce -> mld_montgomery_reduce
 - reduce32 -> mld_reduce32
 - caddq -> mld_caddq

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
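
To illustrate the rename, here is a minimal sketch of two of the prefixed functions as header-level `static inline` definitions. The bodies follow the public ML-DSA (Dilithium) reference code and are an assumption, not a copy of this repository's reduce.h; the macro `SKETCH_Q` is hypothetical.

```c
#include <stdint.h>

#define SKETCH_Q 8380417 /* ML-DSA modulus q; hypothetical macro name */

/* For inputs up to roughly 2^31 - 2^22, returns r == a (mod q) with r
 * close to zero (assumes arithmetic right shift of negative values). */
static inline int32_t mld_reduce32(int32_t a)
{
  int32_t t = (a + (1 << 22)) >> 23; /* rounded quotient a / 2^23 */
  return a - t * SKETCH_Q;
}

/* Adds q if a is negative, without branching: (a >> 31) is an
 * all-ones mask for negative a and zero otherwise. */
static inline int32_t mld_caddq(int32_t a)
{
  a += (a >> 31) & SKETCH_Q;
  return a;
}
```

With a common `mld_` prefix, the header can be included next to other code that defines identifiers such as `reduce32` or `caddq` without name clashes in the same translation unit.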
@mkannwischer
Contributor Author

Closing in favour of #410

@mkannwischer mkannwischer deleted the inline-reduce branch October 1, 2025 01:33
Development

Successfully merging this pull request may close these issues.

Consider inlining reduce.c