sha256-arm: fix ABEF_SAVE/CDGH_SAVE var names #16
Open
jrakibi wants to merge 1 commit into noloader:master
apoelstra added a commit to rust-bitcoin/rust-bitcoin that referenced this pull request Jan 29, 2026
baaab03 add aarch64 cross testing (needed for ARM SHA acceleration) (jrakibi)
2299350 hashes: add SHA256 ARM hardware acceleration (jrakibi)

Pull request description:

#1962 adds SIMD SHA256 intrinsics for x86 machines. However, for ARM machines we’re still falling back to `software_process_block()`, which is ~4x slower according to benchmarks I ran on my system.

The code is inspired by https://github.com/noloader/SHA-Intrinsics/tree/4e754bec921a9f281b69bd681ca0065763aa911c. Variable names are intentionally kept the same for easier review and comparison, although I fixed some incorrect variable names in the original implementation (more details in noloader/SHA-Intrinsics#16).

These are some benchmarks I ran on an AWS EC2 instance (t4g.small) with a Neoverse-N1 CPU:

without ARM acceleration
```
sha256/engine_input/10     time:  [49.947 ns 49.955 ns 49.965 ns]
                           thrpt: [190.87 MiB/s 190.91 MiB/s 190.94 MiB/s]
sha256/engine_input/1024   time:  [4.1740 µs 4.1744 µs 4.1747 µs]
                           thrpt: [233.92 MiB/s 233.94 MiB/s 233.96 MiB/s]
sha256/engine_input/65536  time:  [266.68 µs 266.71 µs 266.75 µs]
                           thrpt: [234.31 MiB/s 234.34 MiB/s 234.36 MiB/s]
```

with ARM acceleration
```
sha256/engine_input/10     time:  [16.928 ns 16.930 ns 16.931 ns]
                           thrpt: [563.26 MiB/s 563.31 MiB/s 563.36 MiB/s]
sha256/engine_input/1024   time:  [875.00 ns 875.07 ns 875.14 ns]
                           thrpt: [1.0897 GiB/s 1.0898 GiB/s 1.0899 GiB/s]
sha256/engine_input/65536  time:  [55.939 µs 55.956 µs 55.979 µs]
                           thrpt: [1.0903 GiB/s 1.0908 GiB/s 1.0911 GiB/s]
```

That’s almost 5x faster for larger blocks.

ACKs for top commit:

apoelstra: ACK baaab03; successfully ran local tests; though I do not have an aarch64 machine. I reviewed the code to the extent of checking that it looks like a hash function implementation

tcharding: code review ACK baaab03 - looks ok when compared to other code in the file. The tests passing speaks for the correctness AFAIU. No further understanding implied and no local testing done by me.

Tree-SHA512: ec5e54dfa92991727ebae80b42e4e9e96be55db17c1288587e548352c3b4e01016f2102accf5b766bcf5b088d4d85621d9d53f19d678b9c477c4ac72e9bc8249
PR #14 renamed `ABEF_SAVE`/`CDGH_SAVE` to `ABCD_SAVE`/`EFGH_SAVE` but missed the declaration.

As noted in the original commit of #14: ARM keeps state in natural order `[A,B,C,D]` and `[E,F,G,H]`, unlike x86 SHA-NI which uses `[A,B,E,F]` and `[C,D,G,H]`. Doc
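
To make the naming point concrete, here is a minimal plain-C sketch of the two groupings described above. It uses no intrinsics, and the function names `pack_arm` and `pack_x86` are hypothetical, not taken from SHA-Intrinsics. It only models which state words share a register. It does not model the lane order inside each register, which real SHA-NI code also has to arrange.

```c
#include <stdint.h>

/* Illustrative model only: each 4-element array stands in for one
 * 128-bit vector register holding four 32-bit SHA-256 state words.
 * The input s[0..7] holds the state words A..H in order. */

/* ARM grouping as described in the PR: natural order,
 * {A,B,C,D} in one register and {E,F,G,H} in the other. */
void pack_arm(const uint32_t s[8], uint32_t abcd[4], uint32_t efgh[4])
{
    for (int i = 0; i < 4; i++) {
        abcd[i] = s[i];      /* A, B, C, D */
        efgh[i] = s[i + 4];  /* E, F, G, H */
    }
}

/* x86 SHA-NI grouping as described in the PR:
 * {A,B,E,F} in one register and {C,D,G,H} in the other.
 * Lane order within each register is NOT modeled here. */
void pack_x86(const uint32_t s[8], uint32_t abef[4], uint32_t cdgh[4])
{
    abef[0] = s[0]; abef[1] = s[1];  /* A, B */
    abef[2] = s[4]; abef[3] = s[5];  /* E, F */
    cdgh[0] = s[2]; cdgh[1] = s[3];  /* C, D */
    cdgh[2] = s[6]; cdgh[3] = s[7];  /* G, H */
}
```

Under this model, a save variable on the ARM path holds what `pack_arm` produces, so the names `ABCD_SAVE`/`EFGH_SAVE` describe its contents, while `ABEF_SAVE`/`CDGH_SAVE` describe the x86 grouping instead.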