hashes: aarch64 acceleration support for sha256#4045
hashes: aarch64 acceleration support for sha256#4045JeremiahR wants to merge 1 commit intorust-bitcoin:masterfrom
Conversation
Pull Request Test Coverage Report for Build 13502616415Details
💛 - Coveralls |
Kixunil
left a comment
There was a problem hiding this comment.
Awesome, thanks! It'll take me some time to review this. In the meantime, if you're in the mood for checking if miri supports this (probably not since it didn't support x86_64 either) and possibly implementing it that'd be really cool. It's completely optional but it'd improve our confidence in the code. You can check my contribution to miri (search me as author in PRs) to get pointers on how to do that.
It's stable since 1.72 We do use conditional compilation for these kinds of things so please make it conditional on compiler version. (We could also in principle support the bootstrap hack if we find stuff didn't change since our MSRV but I'd rather not until someone says they need it. I also think our MSRV will get higher in ~half a year.) |
|
Thank you! I believe this is the pr you are referring to @Kixunil. I will look at adding it for arm, but also admitting this looks over my head. |
|
Yes, it is.
When I made that PR I didn't know anything about I think you can pretty much copy my PR and adapt it to ARM by looking at how other ARM intrinsics are implemented. But I also think the API will be pretty much the same, just with different names. Also IIRC ARM is also little endian, so you won't need to mess with that. And you can reuse the test vectors in my test. |
|
I'm not sure what to do with this. The code looks good to me -- it has no branches where somebody could slip in a back door and is roughly the shape of a hash function, meaning that if it's correct on our existing tests it's pretty-much guaranteed to be correct. Of course, it's worthwhile checking with Miri, if possible, regarding soundness. But "if possible" is carrying a lot of weight. Now, my local CI and Github CI aren't ever going to test this because we don't have aarch64 boxes. If we did, we'd find that this doesn't compile with our MSRV. As @Kixunil says, most likely we're not going to bump our MSRV in the next six months -- though if we did, I think 1.72 would be a candidate (it'd be 2 years old then, Aug 2023, and our MSRV tracking issue shows there's some stuff we want there #3339). Normally I"d say that it's just unacceptable to have a MSRV break on a particular architecture, but the speedup here is really big. I wonder if we should feature-gate it or what. Maybe we should This affects our 1.0 plans. |
Yes, this is exactly what I had in mind. Make the code faster on 1.72+ but as slow as it's now below that. When we bump MSRV we can delete the condition. Also |
|
I also wonder if we can build (though not run) this code for aarch64 in extra_tests.sh. cc @tcharding does that seem reasonable/possible? |
|
We actually have cross test for |
|
Thank you all for the input and vetting. I'm planning to do conditional compilation next week and then take a look at Miri. Also can try adding CI/test adjustments mentioned. I'm not actually familiar if @Kixunil for example can push changes into this PR or not with how GitHub is setup. But you are welcome to if you see anything |
|
Yes, I'm allowed but I have a bunch of other stuff I want to do this week so I don't expect to be able. |
|
Tests failing on cargo.lock issues. It might be because of this line added to
I'm not sure what to do here. |
|
IIRC there's a script in |
Kixunil
left a comment
There was a problem hiding this comment.
Hmm, I really don't get why CI is failing now. @tcharding were there any changes while I was away?
hashes/src/sha256/crypto.rs
Outdated
| const fn Sigma1(x: u32) -> u32 { x.rotate_left(26) ^ x.rotate_left(21) ^ x.rotate_left(7) } | ||
| #[rustfmt::skip] | ||
| const fn sigma0(x: u32) -> u32 { x.rotate_left(25) ^ x.rotate_left(14) ^ (x >> 3) } | ||
| #[rustfmt::skip] |
There was a problem hiding this comment.
I added these because my editor was autoformatting those functions, and then removed them because it was causing some other autoformat errors (I think in cargo.toml) and I decided to just turn format-on-save off. I can add them back if you'd like.
There was a problem hiding this comment.
FYI we run the formatter weekly in a cron job so theses are needed if the code will get munged. On other thing, we use nightly to run the formatter.
There was a problem hiding this comment.
I (edit: added) these #[rustfmt::skip] back, but they will actually be an addition in this PR because they not present in the current codebase. As far as formatting, curious why not a pre-commit hook?
There was a problem hiding this comment.
Because we can't format without nightly and it's a PITA for guix (and anyone wary of downloading binaries) users to obtain a nightly compiler.
Looks like you just need to run EDIT: Did you create the last two patches manually? You can just remove both of them and run the script to get the required changes to the lock files. This will need to be done in the same patch that modifies the manifest. If it was me I'd squash this whole PR into a single patch. |
4fbd01c to
b5dc6c4
Compare
hashes/src/sha256/crypto.rs
Outdated
| rust_version! { | ||
| if >= 1.72 { | ||
| #[cfg(all(feature = "std", target_arch = "aarch64"))] |
There was a problem hiding this comment.
Feel free to tell me to shoosh and get back in my box; why the usage of 2 characters of indentation? Was that just a copy paste mistake?
There was a problem hiding this comment.
No it's totally valid. I'm used to autoformat on either commit or save. I can change to 4.
tcharding
left a comment
There was a problem hiding this comment.
I scanned it but this is out of my domain. My review means very little.
b5dc6c4 to
d58e1f2
Compare
|
@Kixunil can you ack this? Or at least ack the compiler/cfg stuff? |
|
apologies for duplicate work @apoelstra , I added compiler feature checks in 75487c3. This should help in case a cpu is aarch64 but doesn't have sha intrinsics. |
Kixunil
left a comment
There was a problem hiding this comment.
I believe the commits need to be squashed since IIUC the first one without the second has UB on some architectures which breaks our rule to have each commit compilable and correctly working. But also removing allow would be preferred.
75487c3 to
f2ab9c2
Compare
Kixunil
left a comment
There was a problem hiding this comment.
So structurally this looks good (except one question below) but I'd really want to see some kind of test to make sure we don't break it if anyone touches it. Having broken sha256 on some architectures would be terrible.
hashes/src/sha256/crypto.rs
Outdated
| rust_version! { | ||
| if >= 1.72 { | ||
| #[cfg(all(feature = "std", target_arch = "aarch64"))] | ||
| #[target_feature(enable = "sha2")] |
There was a problem hiding this comment.
Shouldn't this also enable asimd since you're checking it above?
|
Thank you @Kixunil. I agree on deeper checks, and can make time to look at adding support for this to Making notes here on instructions, because asimd was incorrect: Pushing up a squashed branch with the correction momentarily. |
f2ab9c2 to
0a81990
Compare
|
Do you think you could look into adding the |
@Kixunil (my apologies for the very long delay here). Are you talking about editing a github action? Or editing the miri script? |
|
I was talking about GH action but also adding it to |
|
Ooh looks like I duplicated this PR with #5493 :) @JeremiahR are you interested in continuing to address the remaining feedback? |
|
I am no longer working on this. Closing this so you can take over @jrakibi |
Hash sha-256 uses hardware acceleration on x86 platforms, but does not on aarch64. This has been noted in this issue.
The existing accelerated x86 implementation is based the sha-intrinsics code examples by Jeffery Walton.
I was able to port the code example for arm64 to Rust and achieve a ~6x speedup (from 352 MB/s to 2169 MB/s) in sha256 benchmarks on rust-bitcoin.
Notes
Arm/Aarch sha256 acceleration is still unstable and generates an error during
cargo test. It is not showing up in CI.Benchmarks