ARM64 implementation for poly.PackLe16 #563
// length N/2.
func (p *Poly) PackLe16(buf []byte) {
	p.packLe16Generic(buf)
	// early bounds so we don't have to in assembly code
I don't mind the check that much, but I dislike that you write that we have to check it. I think it's optional. Most internal functions have a bunch of prerequisites, which aren't always easy to check. One prerequisite you don't check here is that the coefficients of p are indeed less than 16. That's fine: inspecting the call sites shows that it always holds. The same goes for the length of the buffer passed.
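To make the two prerequisites concrete, here is a minimal sketch of a pure-Go packer with an optional early bounds check. It is not the code under review: the nibble order (even coefficient in the low four bits), the `Poly` stand-in type and the helper name `packLe16` are assumptions made for illustration.

```go
package main

import "fmt"

// N is the number of coefficients per polynomial (256 for Dilithium).
const N = 256

// Poly is a hypothetical stand-in for the package's polynomial type.
type Poly [N]uint32

// packLe16 packs two coefficients into each output byte.
// Unchecked prerequisite: every coefficient of p is less than 16.
// Assumed layout: even coefficient in the low nibble, odd one in the high nibble.
func packLe16(p *Poly, buf []byte) {
	// Optional early bounds check: panics here if len(buf) < N/2,
	// so an assembly routine called afterwards would not need its own check.
	_ = buf[N/2-1]
	for i := 0; i < N/2; i++ {
		buf[i] = byte(p[2*i]) | byte(p[2*i+1]<<4)
	}
}

func main() {
	var p Poly
	p[0], p[1] = 3, 10
	buf := make([]byte, N/2)
	packLe16(&p, buf)
	fmt.Printf("%#x\n", buf[0]) // 3 | 10<<4 = 0xa3
}
```

The coefficient prerequisite stays unchecked in this sketch as well; like the buffer length, it is something the call sites are expected to guarantee.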
I wanted to throw in one thing. I'm coming from C etc., where the ABI defines which registers are caller-saved and which are callee-saved. For Go I am not 100% sure what must be guaranteed. I've read somewhere that Go has no callee-saved registers. Is that still true, or are there any caveats? In my personal projects this was always the assumption, and there were no issues.
There are caveats. They're documented here.
To test the performance difference on arm64 chips, run: "go test -benchmem -run=^$ ./sign/internal/dilithium -bench=Le16"
On my machine (Apple M1 Max) on average:
Also keep in mind that these are microbenchmarks!
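For readers unfamiliar with the flags in the command above: `-run=^$` matches no unit tests, so only benchmarks run; `-bench=Le16` is a regular expression selecting benchmarks whose names contain `Le16`; `-benchmem` adds allocation statistics. The following is a hedged sketch of what such a benchmark might look like, written as a standalone program using `testing.Benchmark`. The `Poly` type, the `packLe16` helper and the benchmark body are illustrative assumptions, not the repository's actual benchmark.

```go
package main

import (
	"fmt"
	"testing"
)

// N is the number of coefficients per polynomial (256 for Dilithium).
const N = 256

// Poly is a hypothetical stand-in for the package's polynomial type.
type Poly [N]uint32

// packLe16 is an assumed pure-Go packer: two coefficients (< 16) per byte.
func packLe16(p *Poly, buf []byte) {
	_ = buf[N/2-1] // early bounds check
	for i := 0; i < N/2; i++ {
		buf[i] = byte(p[2*i]) | byte(p[2*i+1]<<4)
	}
}

// benchPackLe16 has the shape of a BenchmarkXxx function. In a _test.go
// file it would be named e.g. BenchmarkPackLe16 so that -bench=Le16 matches.
func benchPackLe16(b *testing.B) {
	var p Poly
	for i := range p {
		p[i] = uint32(i) & 15 // keep every coefficient below 16
	}
	buf := make([]byte, N/2)
	b.ReportAllocs() // same effect as -benchmem for this benchmark
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		packLe16(&p, buf)
	}
}

func main() {
	res := testing.Benchmark(benchPackLe16)
	fmt.Println(res.String(), res.MemString())
}
```

Because each iteration takes only a short time, results from a loop like this are sensitive to CPU frequency scaling and caching, which is one reason to treat the numbers as microbenchmarks.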