a79571c to a982823
This commit inlines reduce.c entirely. The goal is to get performance without -flto closer to performance with -flto. Namespacing will be added in the next commit.
Note that some of these functions are a potential source of side-channel leakage. That needs to be addressed urgently, but with -flto the issue was already present prior to this commit.
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
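A minimal sketch of what inlining reduce.c could look like: the reduction helpers become `static inline` functions in reduce.h, so every translation unit that includes the header can inline them even without -flto. The arithmetic follows the ML-DSA (Dilithium) reference code; the `MLD_Q`/`MLD_QINV` macro names, the include guard and the layout are assumptions, not the actual contents of this commit.

```c
/* reduce.h: hypothetical header-only sketch; names and layout are assumed */
#ifndef REDUCE_H
#define REDUCE_H

#include <stdint.h>

#define MLD_Q 8380417     /* ML-DSA modulus q = 2^23 - 2^13 + 1 */
#define MLD_QINV 58728449 /* q^(-1) mod 2^32 */

/* Montgomery reduction: for |a| < 2^31 * q, returns r with
 * r == a * 2^(-32) (mod q) and -q < r < q. */
static inline int32_t montgomery_reduce(int64_t a)
{
  /* t = a * q^(-1) mod 2^32, interpreted as a signed 32-bit value */
  const int32_t t = (int32_t)((int64_t)(int32_t)a * MLD_QINV);
  /* a - t*q is a multiple of 2^32, so the shift divides exactly */
  return (int32_t)((a - (int64_t)t * MLD_Q) >> 32);
}

/* For a <= 2^31 - 2^22 - 1, returns r == a (mod q) with |r| < 2^23. */
static inline int32_t reduce32(int32_t a)
{
  const int32_t t = (a + (1 << 22)) >> 23; /* approximately a / q */
  return a - t * MLD_Q;
}

/* Adds q if a is negative: (a >> 31) is all-ones for a < 0, else 0. */
static inline int32_t caddq(int32_t a)
{
  return a + ((a >> 31) & MLD_Q);
}

#endif /* REDUCE_H */
```

Because the definitions live in the header, the compiler sees the function bodies at every call site, which is what -flto otherwise provides across translation units.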
a982823 to f29e99e
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46825 cycles | 46826 cycles | 1.00 |
| ML-DSA-44 sign | 177188 cycles | 177037 cycles | 1.00 |
| ML-DSA-44 verify | 66118 cycles | 66121 cycles | 1.00 |
| ML-DSA-65 keypair | 81452 cycles | 81439 cycles | 1.00 |
| ML-DSA-65 sign | 280975 cycles | 281199 cycles | 1.00 |
| ML-DSA-65 verify | 103013 cycles | 103021 cycles | 1.00 |
| ML-DSA-87 keypair | 130389 cycles | 130764 cycles | 1.00 |
| ML-DSA-87 sign | 343277 cycles | 344116 cycles | 1.00 |
| ML-DSA-87 verify | 158698 cycles | 159148 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112404 cycles | 112317 cycles | 1.00 |
| ML-DSA-44 sign | 410011 cycles | 409141 cycles | 1.00 |
| ML-DSA-44 verify | 130173 cycles | 130153 cycles | 1.00 |
| ML-DSA-65 keypair | 193307 cycles | 192224 cycles | 1.01 |
| ML-DSA-65 sign | 658830 cycles | 658665 cycles | 1.00 |
| ML-DSA-65 verify | 207274 cycles | 207237 cycles | 1.00 |
| ML-DSA-87 keypair | 316112 cycles | 315969 cycles | 1.00 |
| ML-DSA-87 sign | 833227 cycles | 832811 cycles | 1.00 |
| ML-DSA-87 verify | 339209 cycles | 339006 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 114813 cycles | 114865 cycles | 1.00 |
| ML-DSA-44 sign | 412000 cycles | 412011 cycles | 1.00 |
| ML-DSA-44 verify | 135193 cycles | 134863 cycles | 1.00 |
| ML-DSA-65 keypair | 198896 cycles | 198812 cycles | 1.00 |
| ML-DSA-65 sign | 680514 cycles | 680280 cycles | 1.00 |
| ML-DSA-65 verify | 217396 cycles | 216872 cycles | 1.00 |
| ML-DSA-87 keypair | 324554 cycles | 324705 cycles | 1.00 |
| ML-DSA-87 sign | 850800 cycles | 850463 cycles | 1.00 |
| ML-DSA-87 verify | 349168 cycles | 348367 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 210124 cycles | 210196 cycles | 1.00 |
| ML-DSA-44 sign | 721783 cycles | 721639 cycles | 1.00 |
| ML-DSA-44 verify | 228911 cycles | 229000 cycles | 1.00 |
| ML-DSA-65 keypair | 374415 cycles | 376228 cycles | 1.00 |
| ML-DSA-65 sign | 1184328 cycles | 1186159 cycles | 1.00 |
| ML-DSA-65 verify | 369737 cycles | 370145 cycles | 1.00 |
| ML-DSA-87 keypair | 596893 cycles | 595991 cycles | 1.00 |
| ML-DSA-87 sign | 1513799 cycles | 1514501 cycles | 1.00 |
| ML-DSA-87 verify | 613913 cycles | 613928 cycles | 1.00 |
oqs-bot
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 36076 cycles | 35472 cycles | 1.02 |
| ML-DSA-44 sign | 136744 cycles | 135995 cycles | 1.01 |
| ML-DSA-44 verify | 45651 cycles | 45539 cycles | 1.00 |
| ML-DSA-65 keypair | 62837 cycles | 61655 cycles | 1.02 |
| ML-DSA-65 sign | 224208 cycles | 224162 cycles | 1.00 |
| ML-DSA-65 verify | 70700 cycles | 70153 cycles | 1.01 |
| ML-DSA-87 keypair | 92901 cycles | 92936 cycles | 1.00 |
| ML-DSA-87 sign | 263178 cycles | 263402 cycles | 1.00 |
| ML-DSA-87 verify | 103468 cycles | 104888 cycles | 0.99 |
oqs-bot
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 58158 cycles | 58075 cycles | 1.00 |
| ML-DSA-44 sign | 208636 cycles | 208086 cycles | 1.00 |
| ML-DSA-44 verify | 73631 cycles | 73487 cycles | 1.00 |
| ML-DSA-65 keypair | 101118 cycles | 102370 cycles | 0.99 |
| ML-DSA-65 sign | 340338 cycles | 343811 cycles | 0.99 |
| ML-DSA-65 verify | 113311 cycles | 114664 cycles | 0.99 |
| ML-DSA-87 keypair | 156158 cycles | 154161 cycles | 1.01 |
| ML-DSA-87 sign | 410297 cycles | 403666 cycles | 1.02 |
| ML-DSA-87 verify | 171967 cycles | 168988 cycles | 1.02 |
oqs-bot
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 96223 cycles | 96133 cycles | 1.00 |
| ML-DSA-44 sign | 322625 cycles | 323398 cycles | 1.00 |
| ML-DSA-44 verify | 103023 cycles | 102521 cycles | 1.00 |
| ML-DSA-65 keypair | 163722 cycles | 164220 cycles | 1.00 |
| ML-DSA-65 sign | 525458 cycles | 528496 cycles | 0.99 |
| ML-DSA-65 verify | 162116 cycles | 163672 cycles | 0.99 |
| ML-DSA-87 keypair | 269420 cycles | 267417 cycles | 1.01 |
| ML-DSA-87 sign | 678943 cycles | 668661 cycles | 1.02 |
| ML-DSA-87 verify | 269766 cycles | 270775 cycles | 1.00 |
oqs-bot
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 71284 cycles | 71452 cycles | 1.00 |
| ML-DSA-44 sign | 220884 cycles | 219809 cycles | 1.00 |
| ML-DSA-44 verify | 83258 cycles | 82905 cycles | 1.00 |
| ML-DSA-65 keypair | 129787 cycles | 122829 cycles | 1.06 |
| ML-DSA-65 sign | 372454 cycles | 355657 cycles | 1.05 |
| ML-DSA-65 verify | 138894 cycles | 131537 cycles | 1.06 |
| ML-DSA-87 keypair | 208939 cycles | 207402 cycles | 1.01 |
| ML-DSA-87 sign | 459755 cycles | 458199 cycles | 1.00 |
| ML-DSA-87 verify | 217649 cycles | 217016 cycles | 1.00 |
oqs-bot
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 3rd gen (c6a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-65 keypair | 129787 cycles | 122829 cycles | 1.06 |
| ML-DSA-65 sign | 372454 cycles | 355657 cycles | 1.05 |
| ML-DSA-65 verify | 138894 cycles | 131537 cycles | 1.06 |
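The Ratio column appears to be the current cycle count divided by the previous one (for example 129787 / 122829 ≈ 1.057, displayed as 1.06), and the alert fires once that ratio exceeds 1.03. A small sketch of that check, assuming a strict `>` comparison; the function names are made up for illustration and are not github-action-benchmark's own code.

```c
#include <stdbool.h>

/* Ratio as shown in the tables: current / previous cycles.
 * Fewer cycles is better, so a ratio above 1 means the new commit is slower. */
static double bench_ratio(double current_cycles, double previous_cycles)
{
  return current_cycles / previous_cycles;
}

/* Assumed alert rule: flag a regression when the ratio is strictly above
 * the threshold (1.03 in these alerts). */
static bool is_regression(double current_cycles, double previous_cycles,
                          double threshold)
{
  return bench_ratio(current_cycles, previous_cycles) > threshold;
}
```

Under this rule, ML-DSA-65 keypair on c6a (129787 vs. 122829) is flagged, while ML-DSA-65 sign in the Cortex-A72 (opt) run (1153471 vs. 1120572, ratio ≈ 1.029) stays just below the threshold even though it is displayed as 1.03.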
oqs-bot
Graviton4
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69995 cycles | 70041 cycles | 1.00 |
| ML-DSA-44 sign | 234315 cycles | 234176 cycles | 1.00 |
| ML-DSA-44 verify | 80668 cycles | 80552 cycles | 1.00 |
| ML-DSA-65 keypair | 123520 cycles | 123653 cycles | 1.00 |
| ML-DSA-65 sign | 378261 cycles | 378786 cycles | 1.00 |
| ML-DSA-65 verify | 130196 cycles | 130313 cycles | 1.00 |
| ML-DSA-87 keypair | 201256 cycles | 200817 cycles | 1.00 |
| ML-DSA-87 sign | 479149 cycles | 478823 cycles | 1.00 |
| ML-DSA-87 verify | 210028 cycles | 209893 cycles | 1.00 |
oqs-bot
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 153831 cycles | 153777 cycles | 1.00 |
| ML-DSA-44 sign | 517642 cycles | 519021 cycles | 1.00 |
| ML-DSA-44 verify | 165460 cycles | 165744 cycles | 1.00 |
| ML-DSA-65 keypair | 261428 cycles | 260906 cycles | 1.00 |
| ML-DSA-65 sign | 840274 cycles | 835713 cycles | 1.01 |
| ML-DSA-65 verify | 265501 cycles | 264996 cycles | 1.00 |
| ML-DSA-87 keypair | 433542 cycles | 434080 cycles | 1.00 |
| ML-DSA-87 sign | 1076244 cycles | 1072326 cycles | 1.00 |
| ML-DSA-87 verify | 438133 cycles | 438369 cycles | 1.00 |
oqs-bot
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 42534 cycles | 42857 cycles | 0.99 |
| ML-DSA-44 sign | 150029 cycles | 152741 cycles | 0.98 |
| ML-DSA-44 verify | 53935 cycles | 54592 cycles | 0.99 |
| ML-DSA-65 keypair | 71787 cycles | 72055 cycles | 1.00 |
| ML-DSA-65 sign | 242691 cycles | 241346 cycles | 1.01 |
| ML-DSA-65 verify | 81640 cycles | 81481 cycles | 1.00 |
| ML-DSA-87 keypair | 109594 cycles | 116110 cycles | 0.94 |
| ML-DSA-87 sign | 283014 cycles | 293320 cycles | 0.96 |
| ML-DSA-87 verify | 121848 cycles | 129363 cycles | 0.94 |
oqs-bot
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135139 cycles | 133930 cycles | 1.01 |
| ML-DSA-44 sign | 511573 cycles | 508520 cycles | 1.01 |
| ML-DSA-44 verify | 149868 cycles | 148836 cycles | 1.01 |
| ML-DSA-65 keypair | 224921 cycles | 224038 cycles | 1.00 |
| ML-DSA-65 sign | 818202 cycles | 812510 cycles | 1.01 |
| ML-DSA-65 verify | 234375 cycles | 233515 cycles | 1.00 |
| ML-DSA-87 keypair | 369241 cycles | 368528 cycles | 1.00 |
| ML-DSA-87 sign | 1036630 cycles | 1028366 cycles | 1.01 |
| ML-DSA-87 verify | 381473 cycles | 381571 cycles | 1.00 |
oqs-bot
Graviton3
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 74150 cycles | 74140 cycles | 1.00 |
| ML-DSA-44 sign | 249715 cycles | 249753 cycles | 1.00 |
| ML-DSA-44 verify | 88336 cycles | 88344 cycles | 1.00 |
| ML-DSA-65 keypair | 129625 cycles | 129627 cycles | 1.00 |
| ML-DSA-65 sign | 409116 cycles | 408711 cycles | 1.00 |
| ML-DSA-65 verify | 140568 cycles | 140611 cycles | 1.00 |
| ML-DSA-87 keypair | 210204 cycles | 210254 cycles | 1.00 |
| ML-DSA-87 sign | 511707 cycles | 511786 cycles | 1.00 |
| ML-DSA-87 verify | 224698 cycles | 224850 cycles | 1.00 |
oqs-bot
Graviton4 (no-opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 131438 cycles | 131458 cycles | 1.00 |
| ML-DSA-44 sign | 456110 cycles | 455752 cycles | 1.00 |
| ML-DSA-44 verify | 155907 cycles | 142820 cycles | 1.09 |
| ML-DSA-65 keypair | 224146 cycles | 224110 cycles | 1.00 |
| ML-DSA-65 sign | 737256 cycles | 733546 cycles | 1.01 |
| ML-DSA-65 verify | 227448 cycles | 226877 cycles | 1.00 |
| ML-DSA-87 keypair | 370346 cycles | 370702 cycles | 1.00 |
| ML-DSA-87 sign | 937526 cycles | 938185 cycles | 1.00 |
| ML-DSA-87 verify | 377177 cycles | 377210 cycles | 1.00 |
oqs-bot
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Graviton4 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 155907 cycles | 142820 cycles | 1.09 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 279537 cycles | 279878 cycles | 1.00 |
| ML-DSA-44 sign | 999321 cycles | 1000781 cycles | 1.00 |
| ML-DSA-44 verify | 322572 cycles | 322375 cycles | 1.00 |
| ML-DSA-65 keypair | 477900 cycles | 477824 cycles | 1.00 |
| ML-DSA-65 sign | 1631588 cycles | 1632841 cycles | 1.00 |
| ML-DSA-65 verify | 504851 cycles | 506598 cycles | 1.00 |
| ML-DSA-87 keypair | 812675 cycles | 818125 cycles | 0.99 |
| ML-DSA-87 sign | 2153973 cycles | 2169078 cycles | 0.99 |
| ML-DSA-87 verify | 844933 cycles | 849879 cycles | 0.99 |
oqs-bot
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 118720 cycles | 118724 cycles | 1.00 |
| ML-DSA-44 sign | 417742 cycles | 418369 cycles | 1.00 |
| ML-DSA-44 verify | 131396 cycles | 131290 cycles | 1.00 |
| ML-DSA-65 keypair | 201094 cycles | 200076 cycles | 1.01 |
| ML-DSA-65 sign | 673922 cycles | 670991 cycles | 1.00 |
| ML-DSA-65 verify | 205109 cycles | 205403 cycles | 1.00 |
| ML-DSA-87 keypair | 334760 cycles | 333300 cycles | 1.00 |
| ML-DSA-87 sign | 872059 cycles | 867281 cycles | 1.01 |
| ML-DSA-87 verify | 342677 cycles | 341842 cycles | 1.00 |
oqs-bot
Graviton2
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115446 cycles | 115308 cycles | 1.00 |
| ML-DSA-44 sign | 413174 cycles | 412469 cycles | 1.00 |
| ML-DSA-44 verify | 135735 cycles | 135292 cycles | 1.00 |
| ML-DSA-65 keypair | 199555 cycles | 199065 cycles | 1.00 |
| ML-DSA-65 sign | 681625 cycles | 680832 cycles | 1.00 |
| ML-DSA-65 verify | 217840 cycles | 217260 cycles | 1.00 |
| ML-DSA-87 keypair | 325406 cycles | 325724 cycles | 1.00 |
| ML-DSA-87 sign | 852589 cycles | 852130 cycles | 1.00 |
| ML-DSA-87 verify | 349719 cycles | 349000 cycles | 1.00 |
oqs-bot
Graviton3 (no-opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 136245 cycles | 136271 cycles | 1.00 |
| ML-DSA-44 sign | 451613 cycles | 451027 cycles | 1.00 |
| ML-DSA-44 verify | 155350 cycles | 147182 cycles | 1.06 |
| ML-DSA-65 keypair | 238565 cycles | 239177 cycles | 1.00 |
| ML-DSA-65 sign | 736060 cycles | 732754 cycles | 1.00 |
| ML-DSA-65 verify | 237515 cycles | 237861 cycles | 1.00 |
| ML-DSA-87 keypair | 390205 cycles | 390707 cycles | 1.00 |
| ML-DSA-87 sign | 949974 cycles | 947422 cycles | 1.00 |
| ML-DSA-87 verify | 397043 cycles | 396895 cycles | 1.00 |
oqs-bot
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Graviton3 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 155350 cycles | 147182 cycles | 1.06 |
oqs-bot
Graviton2 (no-opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 210966 cycles | 211210 cycles | 1.00 |
| ML-DSA-44 sign | 723730 cycles | 725201 cycles | 1.00 |
| ML-DSA-44 verify | 236053 cycles | 240026 cycles | 0.98 |
| ML-DSA-65 keypair | 375314 cycles | 376881 cycles | 1.00 |
| ML-DSA-65 sign | 1187001 cycles | 1187944 cycles | 1.00 |
| ML-DSA-65 verify | 370676 cycles | 370687 cycles | 1.00 |
| ML-DSA-87 keypair | 597937 cycles | 596640 cycles | 1.00 |
| ML-DSA-87 sign | 1517966 cycles | 1517557 cycles | 1.00 |
| ML-DSA-87 verify | 614895 cycles | 614359 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 452237 cycles | 451807 cycles | 1.00 |
| ML-DSA-44 sign | 2009978 cycles | 2010372 cycles | 1.00 |
| ML-DSA-44 verify | 527294 cycles | 527463 cycles | 1.00 |
| ML-DSA-65 keypair | 761279 cycles | 761380 cycles | 1.00 |
| ML-DSA-65 sign | 3319302 cycles | 3334854 cycles | 1.00 |
| ML-DSA-65 verify | 819410 cycles | 818285 cycles | 1.00 |
| ML-DSA-87 keypair | 1226354 cycles | 1226480 cycles | 1.00 |
| ML-DSA-87 sign | 4133631 cycles | 4120162 cycles | 1.00 |
| ML-DSA-87 verify | 1313874 cycles | 1318108 cycles | 1.00 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 828157 cycles | 940147 cycles | 0.88 |
| ML-DSA-44 sign | 3240359 cycles | 4320171 cycles | 0.75 |
| ML-DSA-44 verify | 900061 cycles | 1071283 cycles | 0.84 |
| ML-DSA-65 keypair | 1384661 cycles | 1565069 cycles | 0.88 |
| ML-DSA-65 sign | 5287800 cycles | 7134328 cycles | 0.74 |
| ML-DSA-65 verify | 1438846 cycles | 1690275 cycles | 0.85 |
| ML-DSA-87 keypair | 2301743 cycles | 2524208 cycles | 0.91 |
| ML-DSA-87 sign | 6667972 cycles | 8722966 cycles | 0.76 |
| ML-DSA-87 verify | 2374721 cycles | 2707524 cycles | 0.88 |
Current benchmarks on Apple M1:
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 220605 cycles | 227787 cycles | 0.97 |
| ML-DSA-44 sign | 684992 cycles | 714292 cycles | 0.96 |
| ML-DSA-44 verify | 241715 cycles | 243230 cycles | 0.99 |
| ML-DSA-65 keypair | 388861 cycles | 380639 cycles | 1.02 |
| ML-DSA-65 sign | 1153471 cycles | 1120572 cycles | 1.03 |
| ML-DSA-65 verify | 395364 cycles | 389764 cycles | 1.01 |
| ML-DSA-87 keypair | 646922 cycles | 655479 cycles | 0.99 |
| ML-DSA-87 sign | 1533379 cycles | 1525900 cycles | 1.00 |
| ML-DSA-87 verify | 671432 cycles | 668923 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 303445 cycles | 298929 cycles | 1.02 |
| ML-DSA-44 sign | 1115755 cycles | 1099833 cycles | 1.01 |
| ML-DSA-44 verify | 340065 cycles | 325451 cycles | 1.04 |
| ML-DSA-65 keypair | 542423 cycles | 554401 cycles | 0.98 |
| ML-DSA-65 sign | 1798741 cycles | 1810077 cycles | 0.99 |
| ML-DSA-65 verify | 521510 cycles | 534686 cycles | 0.98 |
| ML-DSA-87 keypair | 836193 cycles | 832492 cycles | 1.00 |
| ML-DSA-87 sign | 2337426 cycles | 2294933 cycles | 1.02 |
| ML-DSA-87 verify | 869972 cycles | 875274 cycles | 0.99 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f29e99e | Previous: f2d8abd | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 340065 cycles | 325451 cycles | 1.04 |
This renames the functions in reduce.h:
- montgomery_reduce -> mld_montgomery_reduce
- reduce32 -> mld_reduce32
- caddq -> mld_caddq
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
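For illustration, a sketch of how call sites might read after the rename, using polynomial-wide helpers in the style of the Dilithium reference code. The `mld_poly` type, `MLD_N`, and the `mld_poly_reduce`/`mld_poly_caddq` names are hypothetical; the two scalar helpers are repeated so that the snippet compiles on its own.

```c
#include <stdint.h>

#define MLD_N 256     /* coefficients per ML-DSA polynomial */
#define MLD_Q 8380417 /* ML-DSA modulus */

/* Hypothetical polynomial type */
typedef struct
{
  int32_t coeffs[MLD_N];
} mld_poly;

/* Renamed helpers (formerly reduce32 and caddq) */
static inline int32_t mld_reduce32(int32_t a)
{
  const int32_t t = (a + (1 << 22)) >> 23; /* approximately a / q */
  return a - t * MLD_Q;
}

static inline int32_t mld_caddq(int32_t a)
{
  return a + ((a >> 31) & MLD_Q); /* add q only if a < 0 */
}

/* Hypothetical caller: reduce every coefficient into (-q, q). */
static void mld_poly_reduce(mld_poly *p)
{
  for (int i = 0; i < MLD_N; i++)
  {
    p->coeffs[i] = mld_reduce32(p->coeffs[i]);
  }
}

/* Hypothetical caller: add q to negative coefficients so they land in [0, q). */
static void mld_poly_caddq(mld_poly *p)
{
  for (int i = 0; i < MLD_N; i++)
  {
    p->coeffs[i] = mld_caddq(p->coeffs[i]);
  }
}
```

The `mld_` prefix keeps these symbols from clashing with other code that uses the generic reference names.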
Closing in favour of #410.
This commit inlines reduce.c entirely. The goal is to get performance without -flto closer to performance with -flto.
Note that some of these functions are a potential source of side-channel leakage. That needs to be addressed urgently, but with -flto the issue was already present prior to this commit.
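One commonly used mitigation, sketched here only under an assumption: if the concern is that, once inlined, a compiler may rewrite the mask arithmetic in a function such as `caddq` into a secret-dependent branch, an optimization barrier can hide the mask's value from the optimizer. The `value_barrier_i32` and `caddq_ct` names are hypothetical, the empty inline-asm barrier only has an effect with GCC/Clang, and none of this is taken from the PR.

```c
#include <stdint.h>

#define MLD_Q 8380417 /* ML-DSA modulus */

/* Hypothetical optimization barrier: with GCC/Clang the empty asm statement
 * forces the compiler to treat b as an unknown value afterwards, so it cannot
 * exploit the fact that the mask is always 0 or -1. On other compilers this is
 * a no-op and gives no guarantee. */
static inline int32_t value_barrier_i32(int32_t b)
{
#if defined(__GNUC__) || defined(__clang__)
  __asm__("" : "+r"(b));
#endif
  return b;
}

/* Hypothetical constant-time variant of caddq: adds q if a < 0. */
static inline int32_t caddq_ct(int32_t a)
{
  /* mask is 0 for a >= 0 and all-ones (-1) for a < 0 */
  const int32_t mask = value_barrier_i32(a >> 31);
  return a + (mask & MLD_Q);
}
```

Whether a given compiler actually emits a branch still has to be checked in the generated assembly or with a constant-time testing tool; the barrier only removes the optimizer's knowledge of the mask.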
Here are some benchmarks on Apple M1: