Improve multi-byte access performance when UNALIGNED availability is unknown (#16207) #16319
Merged
normanmaurer merged 1 commit into netty:5.0 on Feb 21, 2026
…unknown (netty#16207)

Motivation:

When the JVM cannot report UNALIGNED support (e.g., on some ARM platforms or in restricted environments), Netty falls back to an architecture-based guess. If that guess yields UNALIGNED = false, multi-byte reads (getLong, getInt, getShort) degrade to byte-by-byte access (8× getByte) even though the JIT could still optimize at runtime. There is currently no distinction between "the platform is known to not support unaligned access" and "we simply don't know."

Modification:

Introduce UNALIGNED_AVAILABLE flag to distinguish whether UNALIGNED info came from the runtime (true) or from an architecture-based guess (false). When unaligned info is unavailable, use VarHandle instead of 8× getByte so HotSpot can emit the optimal access strategy at runtime.

- PlatformDependent0: add UNALIGNED_AVAILABLE flag; add io.netty.unalignedAccess=true|false|unavailable system property.
- PlatformDependent: expose isUnalignedAvailable(); allow VarHandle initialization even when Unsafe is available.
- UnsafeByteBufUtil: use `VarHandle` for multi-byte `byte[]` operations (`getShort/getInt/getLong`, `setShort/setInt/setLong` and their LE variants) when unaligned info is unavailable.
- UnpooledUnsafeDirectByteBuf: use `VarHandle` for multi-byte `ByteBuffer` operations (`_getShort/_getInt/_getLong`, `_setShort/_setInt/_setLong` and their LE variants) when unaligned info is unavailable.
- ByteBufUtil: enable SWAR indexOf/lastIndexOf via VarHandle when unaligned info is unavailable.

Three-tier access strategy:

| Condition | Strategy | Path |
|---------|---------|------|
| UNALIGNED = true (known supported) | Unsafe direct access | single `Unsafe.getLong()` |
| UNALIGNED info unavailable (unknown) | VarHandle | JIT chooses optimal access |
| UNALIGNED = false (known unsupported) | 8× getByte | safe byte-wise assembly |

---

Result:

Fixes netty#15781.
Benchmark (JDK 24.0.1, Apple Silicon, JMH 1.36):

### Benchmark Summary

| Benchmark | false (8× getByte) | unavailable (VarHandle) | Improvement |
|----------|--------------------|--------------------------|-------------|
| getLongDirect | 2.144 ns/op | 1.764 ns/op | 17.7% faster |
| getLongHeap | 2.152 ns/op | 1.119 ns/op | 48.0% faster |

Sorry for the delay in submitting this PR. If anything looks incorrect or needs adjustment, I will update it promptly. Thanks for the review.

---------

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Co-authored-by: Chris Vest <christianvest_hansen@apple.com>

(cherry picked from commit 0ee9723)
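For readers who want a concrete picture of the three-tier access strategy from the commit message, here is a minimal, self-contained sketch for a big-endian `getLong` on a `byte[]`. It is not the code from this PR: the class `ThreeTierGetLong`, the `UnalignedInfo` enum, and `fromProperty` are hypothetical names for illustration, and the "known supported" tier reuses the `VarHandle` path because the sketch avoids `sun.misc.Unsafe`. Only the accepted values `true|false|unavailable` come from the description; how Netty handles an unset or invalid property is not covered here.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class ThreeTierGetLong {
    // Hypothetical tri-state for this sketch; the PR itself describes two
    // flags, UNALIGNED and UNALIGNED_AVAILABLE.
    enum UnalignedInfo { KNOWN_SUPPORTED, UNAVAILABLE, KNOWN_UNSUPPORTED }

    // Big-endian long view over a byte[]. Plain get/set on such a view is
    // permitted at misaligned indices, so the JIT decides how to load.
    private static final VarHandle LONG_BE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

    // Assumed mapping of the io.netty.unalignedAccess values to the tri-state.
    static UnalignedInfo fromProperty(String value) {
        if ("true".equals(value)) {
            return UnalignedInfo.KNOWN_SUPPORTED;
        }
        if ("false".equals(value)) {
            return UnalignedInfo.KNOWN_UNSUPPORTED;
        }
        if ("unavailable".equals(value)) {
            return UnalignedInfo.UNAVAILABLE;
        }
        throw new IllegalArgumentException("unexpected value: " + value);
    }

    // Tier for "known unsupported": assemble the long from 8 single-byte reads.
    static long getLongByteWise(byte[] a, int i) {
        return (a[i] & 0xffL) << 56
                | (a[i + 1] & 0xffL) << 48
                | (a[i + 2] & 0xffL) << 40
                | (a[i + 3] & 0xffL) << 32
                | (a[i + 4] & 0xffL) << 24
                | (a[i + 5] & 0xffL) << 16
                | (a[i + 6] & 0xffL) << 8
                | (a[i + 7] & 0xffL);
    }

    // Tier for "unavailable": one VarHandle read.
    static long getLongVarHandle(byte[] a, int i) {
        return (long) LONG_BE.get(a, i);
    }

    static long getLong(UnalignedInfo info, byte[] a, int i) {
        switch (info) {
            case KNOWN_UNSUPPORTED:
                return getLongByteWise(a, i);
            case UNAVAILABLE:
                return getLongVarHandle(a, i);
            default:
                // KNOWN_SUPPORTED: the PR uses a single Unsafe.getLong() here;
                // this sketch substitutes the VarHandle read instead.
                return getLongVarHandle(a, i);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        // Index 1 is deliberately not 8-byte aligned within the array.
        for (UnalignedInfo info : UnalignedInfo.values()) {
            System.out.println(info + " -> " + Long.toHexString(getLong(info, data, 1)));
        }
    }
}
```

All three tiers return the same value for the same input; they differ only in how the read is expressed to the JIT, which is where the measured difference between the `false` and `unavailable` columns comes from.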