Skip to content

hashes: aarch64 acceleration support for sha256#4045

Closed
JeremiahR wants to merge 1 commit intorust-bitcoin:masterfrom
JeremiahR:jr_aarch64_sha256
Closed

hashes: aarch64 acceleration support for sha256#4045
JeremiahR wants to merge 1 commit intorust-bitcoin:masterfrom
JeremiahR:jr_aarch64_sha256

Conversation

@JeremiahR
Copy link
Copy Markdown
Contributor

@JeremiahR JeremiahR commented Feb 13, 2025

Hash sha-256 uses hardware acceleration on x86 platforms, but does not on aarch64. This has been noted in this issue.

The existing accelerated x86 implementation is based the sha-intrinsics code examples by Jeffery Walton.

I was able to port the code example for arm64 to Rust and achieve a ~6x speedup (from 352 MB/s to 2169 MB/s) in sha256 benchmarks on rust-bitcoin.

Notes

Arm/Aarch sha256 acceleration is still unstable and generates an error during cargo test. It is not showing up in CI.

error[E0658]: use of unstable library feature 'stdsimd'
   --> hashes/src/sha256/crypto.rs:705:22
    |
705 |             state1 = vsha256h2q_u32(state1, tmp2, tmp0);
    |                      ^^^^^^^^^^^^^^
    |
    = note: see issue #48556 <https://github.com/rust-lang/rust/issues/48556> for more information

Benchmarks

rust-bitcoin git:(master) ✗ RUSTFLAGS='--cfg=bench' cargo +nightly bench sha256::benches::sha256
...
running 3 tests
test sha256::benches::sha256_10                   ... bench:          30.27 ns/iter (+/- 0.86) = 333 MB/s
test sha256::benches::sha256_1k                   ... bench:       2,902.97 ns/iter (+/- 88.68) = 352 MB/s
test sha256::benches::sha256_64k                  ... bench:     183,857.29 ns/iter (+/- 2,460.84) = 356 MB/s
...
rust-bitcoin git:(jr_aarch64_sha256) ✗ RUSTFLAGS='--cfg=bench' cargo +nightly bench sha256::benches::sha256
...
running 3 tests
test sha256::benches::sha256_10                   ... bench:           8.53 ns/iter (+/- 0.61) = 1250 MB/s
test sha256::benches::sha256_1k                   ... bench:         472.11 ns/iter (+/- 10.47) = 2169 MB/s
test sha256::benches::sha256_64k                  ... bench:      30,171.56 ns/iter (+/- 811.96) = 2172 MB/s

@github-actions github-actions bot added the C-hashes PRs modifying the hashes crate label Feb 13, 2025
@coveralls
Copy link
Copy Markdown

coveralls commented Feb 13, 2025

Pull Request Test Coverage Report for Build 13502616415

Details

  • 0 of 11 (0.0%) changed or added relevant lines in 1 file are covered.
  • 4 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.04%) to 82.696%

Changes Missing Coverage Covered Lines Changed/Added Lines %
hashes/src/sha256/crypto.rs 0 11 0.0%
Files with Coverage Reduction New Missed Lines %
hashes/src/sha256/crypto.rs 4 48.27%
Totals Coverage Status
Change from base Build 13501648209: -0.04%
Covered Lines: 21181
Relevant Lines: 25613

💛 - Coveralls

@JeremiahR JeremiahR changed the title [noship] aarch64 acceleration support for sha256 hashes: aarch64 acceleration support for sha256 Feb 13, 2025
Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks! It'll take me some time to review this. In the meantime, if you're in the mood for checking if miri supports this (probably not since it didn't support x86_64 either) and possibly implementing it that'd be really cool. It's completely optional but it'd improve our confidence in the code. You can check my contribution to miri (search me as author in PRs) to get pointers on how to do that.

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Feb 13, 2025

sha2456 acceleration is still unstable and generates an error during cargo test. It is not showing up in CI.

It's stable since 1.72

We do use conditional compilation for these kinds of things so please make it conditional on compiler version. (We could also in principle support the bootstrap hack if we find stuff didn't change since our MSRV but I'd rather not until someone says they need it. I also think our MSRV will get higher in ~half a year.)

@JeremiahR
Copy link
Copy Markdown
Contributor Author

Thank you! I believe this is the pr you are referring to @Kixunil. I will look at adding it for arm, but also admitting this looks over my head.

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Feb 14, 2025

Yes, it is.

also admitting this looks over my head.

When I made that PR I didn't know anything about miri or compiler internals, only very little about SIMD or SHA256 internals and it wasn't too hard to pull it off - I just looked at other SIMD code and copied the basic concepts adapting them to what I needed. It wasn't too crazy and the only difficult/annoying part was getting the endianess right. (I ended up with a bunch of debug prints to know which values are reversed :D)

I think you can pretty much copy my PR and adapt it to ARM by looking at how other ARM intrinsics are implemented. But I also think the API will be pretty much the same, just with different names. Also IIRC ARM is also little endian, so you won't need to mess with that. And you can reuse the test vectors in my test.

@apoelstra
Copy link
Copy Markdown
Member

I'm not sure what to do with this. The code looks good to me -- it has no branches where somebody could slip in a back door and is roughly the shape of a hash function, meaning that if it's correct on our existing tests it's pretty-much guaranteed to be correct.

Of course, it's worthwhile checking with Miri, if possible, regarding soundness. But "if possible" is carrying a lot of weight.

Now, my local CI and Github CI aren't ever going to test this because we don't have aarch64 boxes. If we did, we'd find that this doesn't compile with our MSRV. As @Kixunil says, most likely we're not going to bump our MSRV in the next six months -- though if we did, I think 1.72 would be a candidate (it'd be 2 years old then, Aug 2023, and our MSRV tracking issue shows there's some stuff we want there #3339).

Normally I"d say that it's just unacceptable to have a MSRV break on a particular architecture, but the speedup here is really big. I wonder if we should feature-gate it or what. Maybe we should cfg-gate it and then hopefully we can remove the cfg-gate when we bump MSRV?

This affects our 1.0 plans.

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Feb 15, 2025

I wonder if we should feature-gate it or what. Maybe we should cfg-gate it and then hopefully we can remove the cfg-gate when we bump MSRV?

Yes, this is exactly what I had in mind. Make the code faster on 1.72+ but as slow as it's now below that. When we bump MSRV we can delete the condition.

Also miri allows checking different targets (since it's an interpreter) which is quite handy.

@apoelstra
Copy link
Copy Markdown
Member

I also wonder if we can build (though not run) this code for aarch64 in extra_tests.sh. cc @tcharding does that seem reasonable/possible?

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Feb 15, 2025

We actually have cross test for s390x-unknown-linux-gnu, IIUC adding a single line should be enough to also test aarch64. And I believe we should, since that platform is quite important these days (both Apple devices and RPi nodes).

@JeremiahR
Copy link
Copy Markdown
Contributor Author

Thank you all for the input and vetting.

I'm planning to do conditional compilation next week and then take a look at Miri. Also can try adding CI/test adjustments mentioned.

I'm not actually familiar if @Kixunil for example can push changes into this PR or not with how GitHub is setup. But you are welcome to if you see anything

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Feb 16, 2025

Yes, I'm allowed but I have a bunch of other stuff I want to do this week so I don't expect to be able.

@JeremiahR
Copy link
Copy Markdown
Contributor Author

JeremiahR commented Feb 17, 2025

Tests failing on cargo.lock issues. It might be because of this line added to hashes/Cargo.toml:

internals = { package = "bitcoin-internals", version = "0.4.0" }

I'm not sure what to do here.

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Feb 17, 2025

IIRC there's a script in contrib called update lock files which you need to run. The code looks good from very quick glance.

Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I really don't get why CI is failing now. @tcharding were there any changes while I was away?

const fn Sigma1(x: u32) -> u32 { x.rotate_left(26) ^ x.rotate_left(21) ^ x.rotate_left(7) }
#[rustfmt::skip]
const fn sigma0(x: u32) -> u32 { x.rotate_left(25) ^ x.rotate_left(14) ^ (x >> 3) }
#[rustfmt::skip]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Copy Markdown
Contributor Author

@JeremiahR JeremiahR Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these because my editor was autoformatting those functions, and then removed them because it was causing some other autoformat errors (I think in cargo.toml) and I decided to just turn format-on-save off. I can add them back if you'd like.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI we run the formatter weekly in a cron job so theses are needed if the code will get munged. On other thing, we use nightly to run the formatter.

Copy link
Copy Markdown
Contributor Author

@JeremiahR JeremiahR Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I (edit: added) these #[rustfmt::skip] back, but they will actually be an addition in this PR because they not present in the current codebase. As far as formatting, curious why not a pre-commit hook?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we can't format without nightly and it's a PITA for guix (and anyone wary of downloading binaries) users to obtain a nightly compiler.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a pre-commit hook.

@tcharding
Copy link
Copy Markdown
Member

tcharding commented Feb 18, 2025

Hmm, I really don't get why CI is failing now. @tcharding were there any changes while I was away?

Looks like you just need to run just update-lock-files to update the lock files.

EDIT: Did you create the last two patches manually? You can just remove both of them and run the script to get the required changes to the lock files. This will need to be done in the same patch that modifies the manifest. If it was me I'd squash this whole PR into a single patch.

Comment on lines +546 to +548
rust_version! {
if >= 1.72 {
#[cfg(all(feature = "std", target_arch = "aarch64"))]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to tell me to shoosh and get back in my box; why the usage of 2 characters of indentation? Was that just a copy paste mistake?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's totally valid. I'm used to autoformat on either commit or save. I can change to 4.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks mate.

Copy link
Copy Markdown
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I scanned it but this is out of my domain. My review means very little.

apoelstra
apoelstra previously approved these changes Feb 22, 2025
Copy link
Copy Markdown
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d58e1f2; successfully ran local tests

@apoelstra
Copy link
Copy Markdown
Member

@Kixunil can you ack this? Or at least ack the compiler/cfg stuff?

@JeremiahR
Copy link
Copy Markdown
Contributor Author

apologies for duplicate work @apoelstra , I added compiler feature checks in 75487c3. This should help in case a cpu is aarch64 but doesn't have sha intrinsics.

Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the commits need to be squashed since IIUC the first one without the second has UB on some architectures which breaks our rule to have each commit compilable and correctly working. But also removing allow would be preferred.

Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So structurally this looks good (except one question below) but I'd really want to see some kind of test to make sure we don't break it if anyone touches it. Having broken sha256 on some architectures would be terrible.

rust_version! {
if >= 1.72 {
#[cfg(all(feature = "std", target_arch = "aarch64"))]
#[target_feature(enable = "sha2")]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this also enable asimd since you're checking it above?

@JeremiahR
Copy link
Copy Markdown
Contributor Author

Thank you @Kixunil. I agree on deeper checks, and can make time to look at adding support for this to miri. I am busy with other things at the moment but it is on my list.

Making notes here on instructions, because asimd was incorrect:
vld1q_u32, vreinterpretq_u32_u8, vaddq_u32, vst1q_u32 -> neon
vsha256su0q_u32, vsha256su1q_u32 -> sha2

Pushing up a squashed branch with the correction momentarily.

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Feb 24, 2025

Do you think you could look into adding the cross line in CI? I think it should be only one or very few. Pretty much just copy it from the existing one and somehow make sure that the SIMD code path is tested.

@JeremiahR
Copy link
Copy Markdown
Contributor Author

Do you think you could look into adding the cross line in CI? I think it should be only one or very few. Pretty much just copy it from the existing one and somehow make sure that the SIMD code path is tested.

@Kixunil (my apologies for the very long delay here). Are you talking about editing a github action? Or editing the miri script?

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Jul 13, 2025

I was talking about GH action but also adding it to miri would be great.

@jrakibi
Copy link
Copy Markdown
Contributor

jrakibi commented Jan 18, 2026

Ooh looks like I duplicated this PR with #5493 :) 



@JeremiahR are you interested in continuing to address the remaining feedback?
If so I can close my PR in favor of this one.
Otherwise we can close this one and I’ll read through the discussion here to update my PR with whatever still needs to be addressed

@JeremiahR
Copy link
Copy Markdown
Contributor Author

I am no longer working on this. Closing this so you can take over @jrakibi

@JeremiahR JeremiahR closed this Jan 21, 2026
@JeremiahR JeremiahR deleted the jr_aarch64_sha256 branch January 21, 2026 03:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-hashes PRs modifying the hashes crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants