
polyval: implement R/F algorithm #294

Merged
tarcieri merged 1 commit into master from polyval/rf-algorithm on Feb 8, 2026

Conversation


@tarcieri tarcieri commented Feb 8, 2026

Adapts the Apache 2.0+MIT licensed implementation of the Reduction/Field algorithm (#290, see https://eprint.iacr.org/2025/2171) from hpcrypt-mac: https://github.com/seceq/hpcrypt/

Sources were taken as of upstream commit 6c22585, then modified to fit the structure of this crate (e.g. removed buffering).

Benchmarks on ARMv8 show that with 4 powers-of-H, this algorithm is ~66% faster for short messages and ~20% faster for longer ones. It requires double the memory footprint, however; at the same overall key-schedule size it is faster than our previous Karatsuba + 8 x powers-of-H implementation for short messages, but slower for longer ones.

This removes the generic parameter, since the implementation we're adapting isn't currently generic; however, it could potentially be reintroduced if there's a desire for higher performance on long messages.

Additionally, support for using a custom init block has been removed, though it could be added back. I noticed some oddities with it, e.g. Reset always resets the state to 0 rather than to the custom init block. Rather than shoehorning that functionality into Polyval/GHash and adding undue complexity there, I think we should instead expose building blocks through polyval::hazmat (like the core R/F algorithm) that make it possible to build your own customized, high-performance GHASH/POLYVAL-like UHF that can be tweaked in any way desired.


tarcieri commented Feb 8, 2026

After getting all of the backends to share a common representation when using Karatsuba + powers-of-H, this feels like an annoying step backward, as it goes back to using a union across backend-specific state representations.

I was a bit surprised by the benchmark results, as at least on my M1 Max here it seems to actually be slower than Karatsuba + powers-of-H for longer messages:

polyval/update_padded/10
                        time:   [7.5722 ns 7.6091 ns 7.6540 ns]
                        thrpt:  [1.2168 GiB/s 1.2240 GiB/s 1.2299 GiB/s]
                 change:
                        time:   [−39.217% −38.757% −38.378%] (p = 0.00 < 0.05)
                        thrpt:  [+62.281% +63.285% +64.519%]
                        Performance has improved.

polyval/update_padded/100
                        time:   [32.548 ns 32.694 ns 32.847 ns]
                        thrpt:  [2.8354 GiB/s 2.8486 GiB/s 2.8614 GiB/s]
                 change:
                        time:   [−62.433% −62.232% −62.054%] (p = 0.00 < 0.05)
                        thrpt:  [+163.53% +164.78% +166.19%]
                        Performance has improved.

polyval/update_padded/1000
                        time:   [164.98 ns 165.57 ns 166.20 ns]
                        thrpt:  [5.6036 GiB/s 5.6248 GiB/s 5.6451 GiB/s]
                 change:
                        time:   [−0.2885% +0.1648% +0.6194%] (p = 0.50 > 0.05)
                        thrpt:  [−0.6156% −0.1645% +0.2893%]
                        No change in performance detected.

polyval/update_padded/10000
                        time:   [1.4803 µs 1.4860 µs 1.4938 µs]
                        thrpt:  [6.2348 GiB/s 6.2673 GiB/s 6.2915 GiB/s]
                 change:
                        time:   [+70.640% +71.959% +73.859%] (p = 0.00 < 0.05)
                        thrpt:  [−42.482% −41.847% −41.397%]
                        Performance has regressed.

I tried quickly modifying it to operate over 8 blocks, which improved performance slightly, but not enough to make up for the regression on larger messages. I suppose Karatsuba + powers-of-H only performs a (more expensive) reduction once every 8 blocks, whereas R/F performs two simpler reductions each iteration.

I'm curious whether, if the algorithm were properly adapted to 8 blocks so it deferred reductions across all 8 blocks the way Karatsuba + powers-of-H does, it would be faster in that case. I haven't tried that yet.

This algorithm requires a key expansion twice the size of powers-of-H (it's effectively powers-of-H + a D value computed for each power-of-H), so it seems like there's also a compute/memory tradeoff.


tarcieri commented Feb 8, 2026

Sidebar: I ended up liking how I reorganized the crate in this PR. It's effectively a hybrid of how it used to be and the FieldElement refactor: it uses an entirely portable implementation of FieldElement and builds the soft backend on top of that, rather than coupling the soft backend to FieldElement itself.


tarcieri commented Feb 8, 2026

I tried turning DEFAULT_PARALLELISM for Karatsuba + powers-of-H down from 8 to 4 and benchmarked that against R/F. The results below show the improvement of R/F over Karatsuba:

polyval/update_padded/10
                        time:   [7.5540 ns 7.5604 ns 7.5679 ns]
                        thrpt:  [1.2306 GiB/s 1.2318 GiB/s 1.2329 GiB/s]
                 change:
                        time:   [−40.267% −40.020% −39.769%] (p = 0.00 < 0.05)
                        thrpt:  [+66.029% +66.723% +67.412%]
                        Performance has improved.

polyval/update_padded/100
                        time:   [32.941 ns 33.074 ns 33.213 ns]
                        thrpt:  [2.8041 GiB/s 2.8159 GiB/s 2.8272 GiB/s]
                 change:
                        time:   [−31.774% −31.406% −31.036%] (p = 0.00 < 0.05)
                        thrpt:  [+45.003% +45.784% +46.572%]
                        Performance has improved.

polyval/update_padded/1000
                        time:   [167.07 ns 168.29 ns 169.70 ns]
                        thrpt:  [5.4881 GiB/s 5.5342 GiB/s 5.5743 GiB/s]
                 change:
                        time:   [−17.896% −16.950% −15.831%] (p = 0.00 < 0.05)
                        thrpt:  [+18.809% +20.410% +21.797%]
                        Performance has improved.

polyval/update_padded/10000
                        time:   [1.4745 µs 1.4758 µs 1.4772 µs]
                        thrpt:  [6.3048 GiB/s 6.3108 GiB/s 6.3160 GiB/s]
                 change:
                        time:   [−16.302% −15.891% −15.536%] (p = 0.00 < 0.05)
                        thrpt:  [+18.394% +18.893% +19.477%]
                        Performance has improved.

So yes, R/F wins, but at a cost of twice the memory usage.

I think we should probably adopt it. The question then becomes: do we leave the implementation like this, nice and simple with a one-size-fits-most powers-of-H, or do we make it generic again for people who want to optimize for long messages?

@tarcieri tarcieri force-pushed the polyval/rf-algorithm branch 4 times, most recently from 0a4fc63 to 3d7e37d on February 8, 2026 at 21:35
@tarcieri tarcieri changed the title [WIP] polyval: implement R/F algorithm polyval: implement R/F algorithm Feb 8, 2026
@tarcieri tarcieri force-pushed the polyval/rf-algorithm branch from 3d7e37d to 1b706b9 on February 8, 2026 at 22:24
@tarcieri tarcieri marked this pull request as ready for review February 8, 2026 22:24
@tarcieri tarcieri force-pushed the polyval/rf-algorithm branch from 1b706b9 to 2c394f5 on February 8, 2026 at 22:29
@tarcieri tarcieri merged commit a327d42 into master Feb 8, 2026
32 checks passed
@tarcieri tarcieri deleted the polyval/rf-algorithm branch February 8, 2026 22:32

tarcieri commented Feb 8, 2026

Note: I also double-checked this against aes-gcm / aes-gcm-siv, at least on ARMv8.

@tarcieri tarcieri mentioned this pull request Feb 27, 2026
tarcieri added a commit that referenced this pull request Feb 27, 2026
## Added
- `hazmat` feature that exposes `FieldElement` type (#279)

## Changed
- Edition changed to 2024 and MSRV bumped to 1.85 (#228)
- Relax MSRV policy and allow MSRV bumps in patch releases
- Replace `polyval_force_soft` with `polyval_backend="soft"` (#259)
- Use `cpubits` crate for `soft` backend selection (#261)
- Bump `cpufeatures` to v0.3 (#292)
- Use [Reduction/Field algorithm] for parallel block processing on
  `aarch64`/`x86(_64)` (#294)
- `polyval::mulx` moved to `FieldElement::mulx` (#296)
- Update to `universal-hash` v0.6 (#310)

## Removed
- `cfg(polyval_armv8)` - now enabled by default (#214)