
Preparation for ARM64 Implementation of poly operations for dilithium package. #562

Merged
armfazh merged 5 commits into cloudflare:main from elementrics:arm64prep
Aug 14, 2025

@elementrics
Contributor

elementrics commented Aug 14, 2025

To measure the performance difference on arm64 chips, run: "go test -benchmem -run=^$ ./sign/internal/dilithium -bench=Add"

On my machine (Apple M1 Max) on average:

BenchmarkAddGeneric-10          12393860                95.46 ns/op            0 B/op          0 allocs/op
BenchmarkAdd-10                 68264402                17.40 ns/op            0 B/op          0 allocs/op

Keep in mind that these are microbenchmarks!
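
The benchmark names above suggest a generic Go path measured next to the assembly path. A minimal sketch of how the generic side could be timed, assuming a hypothetical Poly type of 256 uint32 coefficients and a hypothetical addGeneric method (neither is copied from the PR). It uses testing.Benchmark so it runs as a plain program rather than through "go test":

```go
package main

import (
	"fmt"
	"testing"
)

// N is the assumed number of coefficients per polynomial.
const N = 256

// Poly is a hypothetical stand-in for the package's polynomial type.
type Poly [N]uint32

// addGeneric sets p = a + b coefficient-wise, without modular reduction.
// This is an illustrative pure-Go loop, not the code under review.
func (p *Poly) addGeneric(a, b *Poly) {
	for i := 0; i < N; i++ {
		p[i] = a[i] + b[i]
	}
}

func main() {
	var a, b, p Poly
	for i := range a {
		a[i] = uint32(i)
		b[i] = uint32(2 * i)
	}

	// testing.Benchmark picks the iteration count itself and reports ns/op.
	res := testing.Benchmark(func(bm *testing.B) {
		for i := 0; i < bm.N; i++ {
			p.addGeneric(&a, &b)
		}
	})
	fmt.Println("addGeneric:", res.String(), res.MemString())
}
```

The numbers from such a loop are only comparable to the ones quoted above when run on the same machine and Go version.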

@elementrics
Contributor Author

This is part of a bigger PR: #561

@elementrics
Contributor Author

elementrics commented Aug 14, 2025

Once this PR is approved, I will submit the other PRs, since they need the base files (arm64.s and arm64.go).

@elementrics
Contributor Author

elementrics commented Aug 14, 2025

One thing to consider is the alignment of the poly array: the difference between unaligned and aligned loads and stores still needs to be tested.
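
As a starting point for that test, here is a minimal sketch for checking how a polynomial happens to be aligned at run time. The Poly type and the isAligned helper are hypothetical names for illustration. Go only guarantees 4-byte alignment for an array of uint32, so the wider results can vary from run to run:

```go
package main

import (
	"fmt"
	"unsafe"
)

// N is the assumed number of coefficients per polynomial.
const N = 256

// Poly is a hypothetical stand-in for the package's polynomial type.
type Poly [N]uint32

// isAligned reports whether p starts at an address that is a multiple of align.
func isAligned(p *Poly, align uintptr) bool {
	return uintptr(unsafe.Pointer(p))%align == 0
}

func main() {
	var p Poly
	// An array of uint32 is always 4-byte aligned; 16 and 64 are not guaranteed.
	fmt.Println("4-byte aligned: ", isAligned(&p, 4))
	fmt.Println("16-byte aligned:", isAligned(&p, 16))
	fmt.Println("64-byte aligned:", isAligned(&p, 64))
}
```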

@bwesterb bwesterb self-requested a review August 14, 2025 20:19
// manually unrolling could also be done, for now skipped
MOVW $16, R3

add:
Member


nit: you can just call this loop

Contributor Author


Okay, I will take that into consideration in the next PRs!

VLD1.P (64)(R1), [V0.S4, V1.S4, V2.S4, V3.S4]
VLD1.P (64)(R2), [V4.S4, V5.S4, V6.S4, V7.S4]

VADD V4.S4, V0.S4, V8.S4
Member

@bwesterb bwesterb Aug 14, 2025


Note: it's not necessary here, but you can reuse V0 or V4 as the target register.

Contributor Author


you are absolutely right!
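
For readers less used to Go assembly, the loop shape discussed in this thread can be modelled in scalar Go. This is only a sketch under stated assumptions: a loop counter of 16 (as in MOVW $16, R3), 64 bytes per load (four vector registers of four uint32 lanes each), and a hypothetical Poly type of 256 coefficients. The name addBlocked is made up for illustration:

```go
package main

import "fmt"

const (
	N         = 256 // assumed coefficients per polynomial
	blockSize = 16  // 64 bytes = 4 vector registers x 4 uint32 lanes
)

// Poly is a hypothetical stand-in for the package's polynomial type.
type Poly [N]uint32

// addBlocked models the vector loop: each outer iteration handles one
// 64-byte block from a and b, as one pair of post-incrementing loads would.
func addBlocked(p, a, b *Poly) {
	for blk := 0; blk < N/blockSize; blk++ { // 16 iterations, like the counter
		off := blk * blockSize
		for lane := 0; lane < blockSize; lane++ {
			p[off+lane] = a[off+lane] + b[off+lane]
		}
	}
}

func main() {
	var a, b, p Poly
	for i := range a {
		a[i] = uint32(i)
		b[i] = 1000
	}
	addBlocked(&p, &a, &b)
	fmt.Println(p[0], p[N-1])
}
```

The choice of destination register has no counterpart in this model; it only affects how many vector registers the assembly version keeps live.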

@bwesterb
Member

This is ready to be merged @armfazh

@armfazh armfazh merged commit e5f5529 into cloudflare:main Aug 14, 2025
11 checks passed
@bwesterb
Member

Thank you @elementrics, keep 'em coming!

@elementrics elementrics deleted the arm64prep branch August 15, 2025 07:25
arthurzam pushed a commit to gentoo-golang-dist/forgejo-runner that referenced this pull request Mar 12, 2026
…1418)

This PR contains the following updates:

| Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [github.com/cloudflare/circl](https://github.com/cloudflare/circl) | `v1.6.1` -> `v1.6.3` | ![age](https://developer.mend.io/api/mc/badges/age/go/github.com%2fcloudflare%2fcircl/v1.6.3?slim=true) | ![confidence](https://developer.mend.io/api/mc/badges/confidence/go/github.com%2fcloudflare%2fcircl/v1.6.1/v1.6.3?slim=true) |

---

### CIRCL has an incorrect calculation in secp384r1 CombinedMult
[CVE-2026-1229](https://nvd.nist.gov/vuln/detail/CVE-2026-1229) / [GHSA-q9hv-hpm4-hj6x](https://github.com/cloudflare/circl/security/advisories/GHSA-q9hv-hpm4-hj6x)

<details>
<summary>More information</summary>

#### Details
The CombinedMult function in the CIRCL ecc/p384 package (secp384r1 curve) produces an incorrect value for specific inputs. The issue is fixed by using complete addition formulas.
ECDH and ECDSA signing relying on this curve are not affected.

The bug was fixed in **[v1.6.3](https://github.com/cloudflare/circl/releases/tag/v1.6.3)**.

#### Severity
- CVSS Score: 2.9 / 10 (Low)
- Vector String: `CVSS:4.0/AV:N/AC:H/AT:N/PR:N/UI:N/VC:L/VI:L/VA:L/SC:L/SI:L/SA:L/E:P/S:N/AU:Y/U:Amber`

#### References
- [https://github.com/cloudflare/circl/security/advisories/GHSA-q9hv-hpm4-hj6x](https://github.com/cloudflare/circl/security/advisories/GHSA-q9hv-hpm4-hj6x)
- [https://nvd.nist.gov/vuln/detail/CVE-2026-1229](https://nvd.nist.gov/vuln/detail/CVE-2026-1229)
- [https://github.com/cloudflare/circl/pull/583](https://github.com/cloudflare/circl/pull/583)
- [https://github.com/cloudflare/circl](https://github.com/cloudflare/circl)
- [https://github.com/cloudflare/circl/releases/tag/v1.6.3](https://github.com/cloudflare/circl/releases/tag/v1.6.3)

This data is provided by [OSV](https://osv.dev/vulnerability/GHSA-q9hv-hpm4-hj6x) and the [GitHub Advisory Database](https://github.com/github/advisory-database) ([CC-BY 4.0](https://github.com/github/advisory-database/blob/main/LICENSE.md)).
</details>

---

### Release Notes

<details>
<summary>cloudflare/circl (github.com/cloudflare/circl)</summary>

### [`v1.6.3`](https://github.com/cloudflare/circl/releases/tag/v1.6.3): CIRCL v1.6.3

[Compare Source](cloudflare/circl@v1.6.2...v1.6.3)

#### CIRCL v1.6.3

Fix a bug on ecc/p384 scalar multiplication.

##### What's Changed

- sign/mldsa: Check opts for nil value by [@&#8203;armfazh](https://github.com/armfazh) in [#&#8203;582](cloudflare/circl#582)
- ecc/p384: Point addition must handle point doubling case. by [@&#8203;armfazh](https://github.com/armfazh) in [#&#8203;583](cloudflare/circl#583)
- Release CIRCL v1.6.3 by [@&#8203;armfazh](https://github.com/armfazh) in [#&#8203;584](cloudflare/circl#584)

**Full Changelog**: <cloudflare/circl@v1.6.2...v1.6.3>

### [`v1.6.2`](https://github.com/cloudflare/circl/releases/tag/v1.6.2): CIRCL v1.6.2

[Compare Source](cloudflare/circl@v1.6.1...v1.6.2)

#### CIRCL v1.6.2

- New SLH-DSA, improvements in ML-DSA for arm64.
- Tested compilation on WASM.

#### What's Changed

- Optimize pairing product computation by moving exponentiations to G1. by [@&#8203;dfaranha](https://github.com/dfaranha) in [#&#8203;547](cloudflare/circl#547)
- sign: Adding SLH-DSA signature by [@&#8203;armfazh](https://github.com/armfazh) in [#&#8203;512](cloudflare/circl#512)
- Update code generators to CIRCL v1.6.1. by [@&#8203;armfazh](https://github.com/armfazh) in [#&#8203;548](cloudflare/circl#548)
- ML-DSA: Add preliminary Wycheproof test vectors by [@&#8203;bwesterb](https://github.com/bwesterb) in [#&#8203;552](cloudflare/circl#552)
- go fmt by [@&#8203;bwesterb](https://github.com/bwesterb) in [#&#8203;554](cloudflare/circl#554)
- gz-compressing test vectors, use of HexBytes and ReadGzip functions. by [@&#8203;armfazh](https://github.com/armfazh) in [#&#8203;555](cloudflare/circl#555)
- group: Removes use of elliptic Marshal and Unmarshal functions. by [@&#8203;armfazh](https://github.com/armfazh) in [#&#8203;556](cloudflare/circl#556)
- Support encoding/decoding ML-DSA private keys (as long as they contain seeds) by [@&#8203;bwesterb](https://github.com/bwesterb) in [#&#8203;559](cloudflare/circl#559)
- Update to golangci-lint v2 by [@&#8203;bwesterb](https://github.com/bwesterb) in [#&#8203;560](cloudflare/circl#560)
- Preparation for ARM64 Implementation of poly operations for dilithium package. by [@&#8203;elementrics](https://github.com/elementrics) in [#&#8203;562](cloudflare/circl#562)
- prepare power2Round for custom implementations in assembly by [@&#8203;elementrics](https://github.com/elementrics) in [#&#8203;564](cloudflare/circl#564)
- ARM64 implementation for poly.PackLe16 by [@&#8203;elementrics](https://github.com/elementrics) in [#&#8203;563](cloudflare/circl#563)
- add arm64 version of polyMulBy2toD by [@&#8203;elementrics](https://github.com/elementrics) in [#&#8203;565](cloudflare/circl#565)
- add arm64 version of polySub by [@&#8203;elementrics](https://github.com/elementrics) in [#&#8203;566](cloudflare/circl#566)
- group: add byteLen method for short groups and RandomScalar uses rand.Int by [@&#8203;armfazh](https://github.com/armfazh) in [#&#8203;568](cloudflare/circl#568)
- add arm64 version of poly.Add/Sub by [@&#8203;elementrics](https://github.com/elementrics) in [#&#8203;572](cloudflare/circl#572)
- group: Adding cryptobyte marshaling to scalars by [@&#8203;armfazh](https://github.com/armfazh) in [#&#8203;569](cloudflare/circl#569)
- Bumping up to Go1.25 by [@&#8203;armfazh](https://github.com/armfazh) in [#&#8203;574](cloudflare/circl#574)
- ci: Including WASM compilation. by [@&#8203;armfazh](https://github.com/armfazh) in [#&#8203;577](cloudflare/circl#577)
- Revert to using package-declared HPKE errors for shortkem instead of standard library errors by [@&#8203;harshiniwho](https://github.com/harshiniwho) in [#&#8203;578](cloudflare/circl#578)
- Release v1.6.2 by [@&#8203;armfazh](https://github.com/armfazh) in [#&#8203;579](cloudflare/circl#579)

#### New Contributors

- [@&#8203;dfaranha](https://github.com/dfaranha) made their first contribution in [#&#8203;547](cloudflare/circl#547)
- [@&#8203;elementrics](https://github.com/elementrics) made their first contribution in [#&#8203;562](cloudflare/circl#562)
- [@&#8203;harshiniwho](https://github.com/harshiniwho) made their first contribution in [#&#8203;578](cloudflare/circl#578)

**Full Changelog**: <cloudflare/circl@v1.6.1...v1.6.2>

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "" (UTC), Automerge - Between 12:00 AM and 03:59 AM ( * 0-3 * * * ) (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My41Mi4wIiwidXBkYXRlZEluVmVyIjoiNDMuNTIuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiS2luZC9EZXBlbmRlbmN5VXBkYXRlIiwicnVuLWVuZC10by1lbmQtdGVzdHMiXX0=-->

Reviewed-on: https://code.forgejo.org/forgejo/runner/pulls/1418
Reviewed-by: Michael Kriese <michael.kriese@gmx.de>
Co-authored-by: Renovate Bot <bot@kriese.eu>
Co-committed-by: Renovate Bot <bot@kriese.eu>

3 participants