Tracking issue: SIMD optimizations for double-sha256 on 64 byte inputs #5540
Description
Creating this issue to track the remaining work to optimize sha256 in general and double-sha256 for 64-byte inputs in particular, and to get approval/feedback before I start working on these improvements.
Bitcoin Core reports a ~2.5×–6.5× speedup in merkle root computation for 9001 leaves when using 2/4/8-way SIMD optimizations: bitcoin/bitcoin#13191
Basically what we need is to:
Add SIMD optimizations for architectures without SHA acceleration:
- Add 4-way SSE4.1 implementation for double-sha256 on 64-byte inputs
- Add 8-way AVX2 implementation for double-sha256 on 64-byte inputs
Support ARM and optimize both SHA-NI and SHA2 for 2-way:
- Add sha256 ARM hardware acceleration (ref: #4045 "hashes: aarch64 acceleration support for sha256", if Jeremiah wants to finish addressing the remaining feedback; otherwise I'll continue working on #5493 "hashes: add SHA256 ARM hardware acceleration")
- Add 2-way SHA-NI implementation for double-sha256 for 64-byte inputs by interleaving hash operations
- Add 2-way ARM SHA2 implementation for double-sha256 for 64-byte inputs by interleaving hash operations (#5888 "hashes: Add optimized ARM SHA256d for 64-byte inputs")
- (Low impact) Add a specialized non-SIMD Rust impl for 64-byte double-SHA256. I experimented here: https://github.com/jrakibi/rust-bitcoin/tree/18-01-double-sha256-optimized but didn't see much improvement; the main win was the use of a constant padding block
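To make the "constant padding block" point concrete, here is a minimal scalar sketch of double-SHA256 specialized for 64-byte inputs. It is not the code from the branch linked above, and the names `compress`, `PAD_64` and `sha256d_64` are hypothetical, chosen only for illustration. The idea it shows: a 64-byte message always needs the same second block (0x80, zeros, bit length 512), and the second hash always consumes a 32-byte digest followed by the same padding (0x80, zeros, bit length 256), so no generic padding logic or buffering is needed, just three compression calls.

```rust
// SHA-256 round constants and initial state (FIPS 180-4).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];
const IV: [u32; 8] = [
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
];

// Hypothetical name: the fixed second block of any 64-byte message
// (0x80 marker, zeros, then the message length of 512 bits).
const PAD_64: [u32; 16] = [0x8000_0000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x200];

/// One SHA-256 compression of a 16-word (big-endian) block into `state`.
fn compress(state: &mut [u32; 8], block: &[u32; 16]) {
    let mut w = [0u32; 64];
    w[..16].copy_from_slice(block);
    for i in 16..64 {
        let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
        let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
    }
    let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut h] = *state;
    for i in 0..64 {
        let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
        let ch = (e & f) ^ (!e & g);
        let t1 = h.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
        let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
        let maj = (a & b) ^ (a & c) ^ (b & c);
        let t2 = s0.wrapping_add(maj);
        h = g; g = f; f = e; e = d.wrapping_add(t1);
        d = c; c = b; b = a; a = t1.wrapping_add(t2);
    }
    let out = [a, b, c, d, e, f, g, h];
    for i in 0..8 {
        state[i] = state[i].wrapping_add(out[i]);
    }
}

/// Double-SHA256 of exactly 64 bytes: three compressions, no generic padding.
fn sha256d_64(input: &[u8; 64]) -> [u8; 32] {
    // First hash, block 1: the input itself, read as big-endian words.
    let mut block = [0u32; 16];
    for (word, bytes) in block.iter_mut().zip(input.chunks_exact(4)) {
        *word = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    }
    let mut state = IV;
    compress(&mut state, &block);
    // First hash, block 2: the constant padding block.
    compress(&mut state, &PAD_64);

    // Second hash: 8 digest words, then constant padding for a 32-byte message
    // (0x80 marker, zeros, then the message length of 256 bits).
    let mut second = [0u32; 16];
    second[..8].copy_from_slice(&state);
    second[8] = 0x8000_0000;
    second[15] = 0x100;
    let mut state = IV;
    compress(&mut state, &second);

    let mut digest = [0u8; 32];
    for (out, word) in digest.chunks_exact_mut(4).zip(state.iter()) {
        out.copy_from_slice(&word.to_be_bytes());
    }
    digest
}
```

A merkle inner node would then be `sha256d_64(&pair)`, where `pair` is the two 32-byte child hashes concatenated. A further step, not shown here, would be to precompute the message schedule for `PAD_64`, since it does not depend on the input.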
(Side note: I built this tool last week. It may help anyone who wants to understand the internals of SHA-256 in order to review these PRs, both the one already opened and the upcoming ones if we decide to go forward with this: https://github.com/bitcoin-dev-project/hashes-visualizer )