ARM64 implementation for poly.PackLe16 #563

Merged
bwesterb merged 12 commits into cloudflare:main from elementrics:polyPackLe16 on Aug 15, 2025

@elementrics
Contributor

To test the performance difference on arm64 chips, run: "go test -benchmem -run=^$ ./sign/internal/dilithium -bench=Le16"

On my machine (Apple M1 Max) on average:

BenchmarkPackLe16-10            69454038                17.62 ns/op            0 B/op          0 allocs/op
BenchmarkPackLe16Generic-10     18178948                66.44 ns/op            0 B/op          0 allocs/op

Also keep in mind that these are microbenchmarks!
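For readers who want to reproduce this kind of measurement outside the repository, here is a minimal sketch using `testing.Benchmark` from the standard library. The names `poly`, `packStandIn`, and `benchPack` are hypothetical stand-ins, and the size 256 is an assumption for illustration; this is not the benchmark code from the PR.

```go
package main

import (
	"fmt"
	"testing"
)

// n is an assumed polynomial length for this sketch.
const n = 256

// poly is a hypothetical stand-in for the package's polynomial type.
type poly [n]uint32

// packStandIn is a placeholder workload: it packs two 4-bit values per byte.
func packStandIn(p *poly, buf []byte) {
	for i := 0; i < n/2; i++ {
		buf[i] = byte(p[2*i]) | byte(p[2*i+1]<<4)
	}
}

// benchPack times one packing function with the standard benchmark driver.
func benchPack(pack func(*poly, []byte)) testing.BenchmarkResult {
	var p poly
	buf := make([]byte, n/2) // allocated once, outside the timed loop
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			pack(&p, buf)
		}
	})
}

func main() {
	res := benchPack(packStandIn)
	// Prints iterations and ns/op, followed by B/op and allocs/op.
	fmt.Println(res.String(), res.MemString())
}
```

Passing the function under test as a parameter lets the same harness time both an optimized and a generic variant. The figures remain microbenchmark results and say little about end-to-end signing speed.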

Member

@bwesterb bwesterb left a comment

Some small nits.

// length N/2.
func (p *Poly) PackLe16(buf []byte) {
p.packLe16Generic(buf)
// early bounds so we don't have to in assembly code
Member

I don't mind the check that much, but I dislike that you write that we have to check it. I think it's optional. Most internal functions have a bunch of prerequisites, which aren't always easy to check. One prerequisite you don't check here is that the coefficients of p are indeed less than 16. That's fine: inspecting the call sites, we see that it holds. The same goes for the length of the buffer passed.
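To make the two prerequisites concrete, here is a minimal sketch of a packing routine with an optional early bounds check. `N = 256`, the `Poly` layout, and the method name `packLe16Sketch` are assumptions for illustration, not the code under review.

```go
package main

import "fmt"

// N is an assumed polynomial length for this sketch.
const N = 256

// Poly is an assumed layout: one coefficient per uint32.
type Poly [N]uint32

// packLe16Sketch writes two 4-bit coefficients per byte into buf.
//
// Prerequisites:
//   - every coefficient of p is less than 16 (not checked here);
//   - len(buf) >= N/2 (checked once by the line below, which is optional).
func (p *Poly) packLe16Sketch(buf []byte) {
	_ = buf[N/2-1] // early bounds check: panics here if buf is too short
	for i := 0; i < N/2; i++ {
		// Low nibble from the even coefficient, high nibble from the odd one.
		buf[i] = byte(p[2*i]) | byte(p[2*i+1]<<4)
	}
}

func main() {
	var p Poly
	for i := range p {
		p[i] = uint32(i % 16)
	}
	buf := make([]byte, N/2)
	p.packLe16Sketch(buf)
	fmt.Printf("%02x %02x\n", buf[0], buf[1]) // prints "10 32"
}
```

The single indexing expression at the top fails before any byte is written if the buffer is too short. The coefficient range remains a documented prerequisite that callers are trusted to satisfy.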

@elementrics
Contributor Author

I wanted to throw in one thing. I'm coming from C and similar languages, where the ABI specifies which registers are caller-saved and which are callee-saved.

Regarding Go, I am not 100% sure what must be guaranteed. I've read somewhere that Go has no callee-saved registers. Is that still true, or are there any caveats?

In my personal projects I have always assumed this, and there were no issues.

@bwesterb
Member

> Regarding Go, I am not 100% sure what must be guaranteed. I've read somewhere that Go has no callee-saved registers. Is that still true, or are there any caveats?

There are caveats. They're documented here.

@bwesterb bwesterb merged commit 12bafce into cloudflare:main Aug 15, 2025
11 checks passed
@elementrics elementrics deleted the polyPackLe16 branch August 16, 2025 04:57
arthurzam pushed a commit to gentoo-golang-dist/forgejo-runner that referenced this pull request Mar 12, 2026
…1418)

This PR contains the following updates:

| Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [github.com/cloudflare/circl](https://github.com/cloudflare/circl) | `v1.6.1` -> `v1.6.3` | ![age](https://developer.mend.io/api/mc/badges/age/go/github.com%2fcloudflare%2fcircl/v1.6.3?slim=true) | ![confidence](https://developer.mend.io/api/mc/badges/confidence/go/github.com%2fcloudflare%2fcircl/v1.6.1/v1.6.3?slim=true) |

---

### CIRCL has an incorrect calculation in secp384r1 CombinedMult
[CVE-2026-1229](https://nvd.nist.gov/vuln/detail/CVE-2026-1229) / [GHSA-q9hv-hpm4-hj6x](https://github.com/cloudflare/circl/security/advisories/GHSA-q9hv-hpm4-hj6x)

<details>
<summary>More information</summary>

#### Details
The CombinedMult function in the CIRCL ecc/p384 package (secp384r1 curve) produces an incorrect value for specific inputs. The issue is fixed by using complete addition formulas.
ECDH and ECDSA signing relying on this curve are not affected.

The bug was fixed in **[v1.6.3](https://github.com/cloudflare/circl/releases/tag/v1.6.3)**.

#### Severity
- CVSS Score: 2.9 / 10 (Low)
- Vector String: `CVSS:4.0/AV:N/AC:H/AT:N/PR:N/UI:N/VC:L/VI:L/VA:L/SC:L/SI:L/SA:L/E:P/S:N/AU:Y/U:Amber`

#### References
- [https://github.com/cloudflare/circl/security/advisories/GHSA-q9hv-hpm4-hj6x](https://github.com/cloudflare/circl/security/advisories/GHSA-q9hv-hpm4-hj6x)
- [https://nvd.nist.gov/vuln/detail/CVE-2026-1229](https://nvd.nist.gov/vuln/detail/CVE-2026-1229)
- [https://github.com/cloudflare/circl/pull/583](https://github.com/cloudflare/circl/pull/583)
- [https://github.com/cloudflare/circl](https://github.com/cloudflare/circl)
- [https://github.com/cloudflare/circl/releases/tag/v1.6.3](https://github.com/cloudflare/circl/releases/tag/v1.6.3)

This data is provided by [OSV](https://osv.dev/vulnerability/GHSA-q9hv-hpm4-hj6x) and the [GitHub Advisory Database](https://github.com/github/advisory-database) ([CC-BY 4.0](https://github.com/github/advisory-database/blob/main/LICENSE.md)).
</details>
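
The advisory attributes the bug to point addition that did not cover every input case. As a rough illustration of that class of bug only, here is a sketch on a toy curve `y^2 = x^3 + 2x + 3` over `F_97` in affine coordinates. The curve, the type `point`, and the functions `add` and `chordOnly` are hypothetical; this is not CIRCL's P-384 code, and it uses case analysis rather than the complete addition formulas mentioned above.

```go
package main

import "fmt"

// Toy parameters, chosen only for illustration: y^2 = x^3 + a*x + b over F_p.
const (
	p = 97
	a = 2
	b = 3
)

type point struct {
	x, y int
	inf  bool // true for the point at infinity
}

func mod(v int) int { return ((v % p) + p) % p }

// inv returns v^(p-2) mod p, which is the inverse of v when v != 0 mod p.
func inv(v int) int {
	r, base := 1, mod(v)
	for e := p - 2; e > 0; e >>= 1 {
		if e&1 == 1 {
			r = mod(r * base)
		}
		base = mod(base * base)
	}
	return r
}

func onCurve(q point) bool {
	return q.inf || mod(q.y*q.y) == mod(q.x*q.x*q.x+a*q.x+b)
}

// chordOnly applies the chord formula unconditionally; it is wrong when q == r.
func chordOnly(q, r point) point {
	l := mod((r.y - q.y) * inv(r.x-q.x))
	x3 := mod(l*l - q.x - r.x)
	return point{x: x3, y: mod(l*(q.x-x3) - q.y)}
}

// add covers infinity, inverse points, and doubling before using the chord formula.
func add(q, r point) point {
	switch {
	case q.inf:
		return r
	case r.inf:
		return q
	case q.x == r.x && mod(q.y+r.y) == 0:
		return point{inf: true}
	case q.x == r.x: // same point: use the tangent slope instead of the chord
		l := mod((3*q.x*q.x + a) * inv(2*q.y))
		x3 := mod(l*l - 2*q.x)
		return point{x: x3, y: mod(l*(q.x-x3) - q.y)}
	}
	return chordOnly(q, r)
}

func main() {
	g := point{x: 3, y: 6}
	fmt.Println(onCurve(add(g, g)), onCurve(chordOnly(g, g))) // prints "true false"
}
```

With equal inputs the chord slope has a zero denominator, so `chordOnly` returns a point that is not on the curve, while `add` takes the tangent branch.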

---

### Release Notes

<details>
<summary>cloudflare/circl (github.com/cloudflare/circl)</summary>

### [`v1.6.3`](https://github.com/cloudflare/circl/releases/tag/v1.6.3): CIRCL v1.6.3

[Compare Source](cloudflare/circl@v1.6.2...v1.6.3)

#### CIRCL v1.6.3

Fix a bug on ecc/p384 scalar multiplication.

##### What's Changed

- sign/mldsa: Check opts for nil value by [@armfazh](https://github.com/armfazh) in [#582](cloudflare/circl#582)
- ecc/p384: Point addition must handle point doubling case. by [@armfazh](https://github.com/armfazh) in [#583](cloudflare/circl#583)
- Release CIRCL v1.6.3 by [@armfazh](https://github.com/armfazh) in [#584](cloudflare/circl#584)

**Full Changelog**: <cloudflare/circl@v1.6.2...v1.6.3>

### [`v1.6.2`](https://github.com/cloudflare/circl/releases/tag/v1.6.2): CIRCL v1.6.2

[Compare Source](cloudflare/circl@v1.6.1...v1.6.2)

#### CIRCL v1.6.2

- New SLH-DSA, improvements in ML-DSA for arm64.
- Tested compilation on WASM.

#### What's Changed

- Optimize pairing product computation by moving exponentiations to G1. by [@dfaranha](https://github.com/dfaranha) in [#547](cloudflare/circl#547)
- sign: Adding SLH-DSA signature by [@armfazh](https://github.com/armfazh) in [#512](cloudflare/circl#512)
- Update code generators to CIRCL v1.6.1. by [@armfazh](https://github.com/armfazh) in [#548](cloudflare/circl#548)
- ML-DSA: Add preliminary Wycheproof test vectors by [@bwesterb](https://github.com/bwesterb) in [#552](cloudflare/circl#552)
- go fmt by [@bwesterb](https://github.com/bwesterb) in [#554](cloudflare/circl#554)
- gz-compressing test vectors, use of HexBytes and ReadGzip functions. by [@armfazh](https://github.com/armfazh) in [#555](cloudflare/circl#555)
- group: Removes use of elliptic Marshal and Unmarshal functions. by [@armfazh](https://github.com/armfazh) in [#556](cloudflare/circl#556)
- Support encoding/decoding ML-DSA private keys (as long as they contain seeds) by [@bwesterb](https://github.com/bwesterb) in [#559](cloudflare/circl#559)
- Update to golangci-lint v2 by [@bwesterb](https://github.com/bwesterb) in [#560](cloudflare/circl#560)
- Preparation for ARM64 Implementation of poly operations for dilithium package. by [@elementrics](https://github.com/elementrics) in [#562](cloudflare/circl#562)
- prepare power2Round for custom implementations in assembly by [@elementrics](https://github.com/elementrics) in [#564](cloudflare/circl#564)
- ARM64 implementation for poly.PackLe16 by [@elementrics](https://github.com/elementrics) in [#563](cloudflare/circl#563)
- add arm64 version of polyMulBy2toD by [@elementrics](https://github.com/elementrics) in [#565](cloudflare/circl#565)
- add arm64 version of polySub by [@elementrics](https://github.com/elementrics) in [#566](cloudflare/circl#566)
- group: add byteLen method for short groups and RandomScalar uses rand.Int by [@armfazh](https://github.com/armfazh) in [#568](cloudflare/circl#568)
- add arm64 version of poly.Add/Sub by [@elementrics](https://github.com/elementrics) in [#572](cloudflare/circl#572)
- group: Adding cryptobyte marshaling to scalars by [@armfazh](https://github.com/armfazh) in [#569](cloudflare/circl#569)
- Bumping up to Go1.25 by [@armfazh](https://github.com/armfazh) in [#574](cloudflare/circl#574)
- ci: Including WASM compilation. by [@armfazh](https://github.com/armfazh) in [#577](cloudflare/circl#577)
- Revert to using package-declared HPKE errors for shortkem instead of standard library errors by [@harshiniwho](https://github.com/harshiniwho) in [#578](cloudflare/circl#578)
- Release v1.6.2 by [@armfazh](https://github.com/armfazh) in [#579](cloudflare/circl#579)

#### New Contributors

- [@dfaranha](https://github.com/dfaranha) made their first contribution in [#547](cloudflare/circl#547)
- [@elementrics](https://github.com/elementrics) made their first contribution in [#562](cloudflare/circl#562)
- [@harshiniwho](https://github.com/harshiniwho) made their first contribution in [#578](cloudflare/circl#578)

**Full Changelog**: <cloudflare/circl@v1.6.1...v1.6.2>

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "" (UTC), Automerge - Between 12:00 AM and 03:59 AM ( * 0-3 * * * ) (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://code.forgejo.org/forgejo/runner/pulls/1418
Reviewed-by: Michael Kriese <michael.kriese@gmx.de>
Co-authored-by: Renovate Bot <bot@kriese.eu>
Co-committed-by: Renovate Bot <bot@kriese.eu>