Improve multi-byte access performance when UNALIGNED availability is unknown (#16207) #16319

Merged
normanmaurer merged 1 commit into netty:5.0 from chrisvest:5.0-perf-unaligned
Feb 21, 2026

Conversation

@chrisvest
Member

Motivation:
When the JVM cannot report UNALIGNED support (e.g., on some ARM platforms or in restricted environments), Netty falls back to an architecture-based guess. If that guess yields UNALIGNED = false, multi-byte reads (getLong, getInt, getShort) degrade to byte-by-byte access (8× getByte) even though the JIT could still optimize at runtime. There is currently no distinction between "the platform is known to not support unaligned access" and "we simply don't know."
Modification:
Introduce an UNALIGNED_AVAILABLE flag to distinguish whether UNALIGNED info came from the runtime (true) or from an architecture-based guess (false). When unaligned info is unavailable, use VarHandle instead of 8× getByte so HotSpot can emit the optimal access strategy at runtime.

  • PlatformDependent0: add UNALIGNED_AVAILABLE flag; add io.netty.unalignedAccess=true|false|unavailable system property.
  • PlatformDependent: expose isUnalignedAvailable(); allow VarHandle initialization even when Unsafe is available.
  • UnsafeByteBufUtil: use VarHandle for multi-byte byte[] operations (getShort/getInt/getLong, setShort/setInt/setLong and their LE variants) when unaligned info is unavailable.
  • UnpooledUnsafeDirectByteBuf: use VarHandle for multi-byte ByteBuffer operations (_getShort/_getInt/_getLong, _setShort/_setInt/_setLong and their LE variants) when unaligned info is unavailable.
  • ByteBufUtil: enable SWAR indexOf/lastIndexOf via VarHandle when unaligned info is unavailable.
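
For readers unfamiliar with the SWAR (SIMD within a register) search mentioned in the ByteBufUtil item, the following is a minimal sketch of the general idea, not the code in this PR. It assumes a plain `byte[]` and uses the classic zero-byte detection trick; the class name `SwarIndexOfSketch` and its method are hypothetical and exist only for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of Netty.
public final class SwarIndexOfSketch {
    // Views a byte[] as little-endian longs; plain get() is allowed at any index.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private static final long ONES = 0x0101010101010101L;
    private static final long HIGH_BITS = 0x8080808080808080L;

    public static int indexOf(byte[] haystack, byte needle) {
        // Broadcast the needle into all eight byte lanes.
        long pattern = (needle & 0xFFL) * ONES;
        int i = 0;
        for (; i + Long.BYTES <= haystack.length; i += Long.BYTES) {
            long word = (long) LONG_LE.get(haystack, i);
            long x = word ^ pattern; // lanes equal to the needle become 0x00
            // Sets the high bit of each zero lane; lanes above the first
            // zero lane may be false positives, so only the lowest bit is used.
            long hit = (x - ONES) & ~x & HIGH_BITS;
            if (hit != 0) {
                return i + (Long.numberOfTrailingZeros(hit) >>> 3);
            }
        }
        // Handle the tail (fewer than 8 bytes) one byte at a time.
        for (; i < haystack.length; i++) {
            if (haystack[i] == needle) {
                return i;
            }
        }
        return -1;
    }
}
```

The little-endian view puts the lowest-addressed byte in the least significant lane, which is why counting trailing zeros yields the first match.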

Three-tier access strategy:

| Condition | Strategy | Path |
|-----------|----------|------|
| UNALIGNED = true (known supported) | Unsafe direct access | single Unsafe.getLong() |
| UNALIGNED info unavailable (unknown) | VarHandle | JIT chooses optimal access |
| UNALIGNED = false (known unsupported) | 8× getByte | safe byte-wise assembly |
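
The three tiers above can be illustrated with a small, self-contained sketch. This is not the PR's code: the `Unaligned` enum, `parse`, and `getLong` below are hypothetical names, and the real implementation tracks the state with the UNALIGNED and UNALIGNED_AVAILABLE flags. To stay runnable without `sun.misc.Unsafe`, the "known supported" tier here also goes through the VarHandle, whereas Netty uses Unsafe on that path.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of Netty.
public final class UnalignedAccessSketch {
    // Assumed tri-state mirroring io.netty.unalignedAccess=true|false|unavailable.
    public enum Unaligned { SUPPORTED, UNKNOWN, UNSUPPORTED }

    // Views a byte[] as big-endian longs; plain get() works at any index.
    private static final VarHandle LONG_BE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

    // Maps the three documented property values to the tri-state.
    public static Unaligned parse(String value) {
        switch (value) {
            case "true": return Unaligned.SUPPORTED;
            case "false": return Unaligned.UNSUPPORTED;
            case "unavailable": return Unaligned.UNKNOWN;
            default: throw new IllegalArgumentException("unexpected value: " + value);
        }
    }

    // Tier 3: assemble the long from eight single-byte reads.
    public static long getLongByteWise(byte[] a, int i) {
        return ((long) a[i] & 0xff) << 56
                | ((long) a[i + 1] & 0xff) << 48
                | ((long) a[i + 2] & 0xff) << 40
                | ((long) a[i + 3] & 0xff) << 32
                | ((long) a[i + 4] & 0xff) << 24
                | ((long) a[i + 5] & 0xff) << 16
                | ((long) a[i + 6] & 0xff) << 8
                | ((long) a[i + 7] & 0xff);
    }

    public static long getLong(Unaligned mode, byte[] a, int i) {
        if (mode == Unaligned.UNSUPPORTED) {
            return getLongByteWise(a, i);
        }
        // Tier 2 (UNKNOWN): let the JIT pick the access strategy.
        // Tier 1 (SUPPORTED) would use Unsafe in Netty; the VarHandle
        // stands in for it in this sketch.
        return (long) LONG_BE.get(a, i);
    }
}
```

All three modes return the same value for the same input; only the access path differs, which is what the benchmark below compares.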

Result:
Fixes #15781.
Benchmark (JDK 24.0.1, Apple Silicon, JMH 1.36):

Benchmark Summary

| Benchmark | false (8× getByte) | unavailable (VarHandle) | Improvement |
|-----------|--------------------|-------------------------|-------------|
| getLongDirect | 2.144 ns/op | 1.764 ns/op | 17.7% faster |
| getLongHeap | 2.152 ns/op | 1.119 ns/op | 48.0% faster |

Sorry for the delay in submitting this PR.
If anything looks incorrect or needs adjustment, I will update it promptly.
Thanks for the review.


Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Co-authored-by: Chris Vest <christianvest_hansen@apple.com>

(cherry picked from commit 0ee9723)

@chrisvest chrisvest added this to the 5.0.0.Final milestone Feb 20, 2026
@chrisvest chrisvest enabled auto-merge (squash) February 20, 2026 23:35
@normanmaurer normanmaurer merged commit 6d2d89a into netty:5.0 Feb 21, 2026
20 of 23 checks passed
@chrisvest chrisvest deleted the 5.0-perf-unaligned branch February 21, 2026 17:34
3 participants