Skip to content

Add support for creating direct ByteBuffers from MemorySegments#15231

Merged
chrisvest merged 20 commits intonetty:4.2from
chrisvest:4.2-alloc-direct
May 30, 2025
Merged

Add support for creating direct ByteBuffers from MemorySegments#15231
chrisvest merged 20 commits intonetty:4.2from
chrisvest:4.2-alloc-direct

Conversation

@chrisvest
Copy link
Copy Markdown
Member

@chrisvest chrisvest commented May 21, 2025

See https://github.com/netty/netty/wiki/Java-24-and-sun.misc.Unsafe for more information.

Motivation:
Netty's performance, particularly in very busy systems, relies heavily on the ability to immediately free native memory when it's no longer used.

Since time immemorial, we have been relying on sun.misc.Unsafe, and various hacks based upon it, for quickly releasing/cleaning direct ByteBuffers.

From Java 24 onwards, any use of Unsafe has started to produce warnings, and future versions will deny access entirely. For that reason, we no longer rely on Unsafe by default (it can still be explicitly enabled) when running on Java 24 or greater.

This creates a large performance gap that needs to be closed. Fortunately, Java 22 introduced the MemorySegment and related (FFM) APIs, and it is possible to immediately release native memory using these APIs.

As Netty 4.2 is now also based on Java 8, we have access to MethodHandles, which allows to craft efficient mechanisms for calling the FFM APIs, without requiring them to be available at compile time.

Modification:

The lifetime of a memory segment is tracked by the associated Arena instance, but it is not possible to obtain the Arena from a MemorySegment, nor from any ByteBuffer created from the MemorySegment. Therefor, we need to track the lifetime of both, together. The CleanableDirectBuffer interface is introduced for this purpose, and encapsulates both a ByteBuffer and its preferred mechanism for freeing it.

The Netty Cleaner API is then extended with a method for allocating direct buffers in the form of the CleanableDirectBuffer. Putting this in the cleaner allow us to switch the buffer allocation and releasing behaviour depending on the Java version, as we already have version specific implementations.

A new CleanerJava24 implementation is added, which – as mentioned – uses method handles to create and arena, allocate a memory segment and a byte buffer from it, and then encapsulate both in a CleanableDirectBuffer instance.

A new PlatformDependent.allocateDirect(int) method is added, which returns CleanableDirectBuffer instances depending on what Cleaner implementation we end up using.

A so-called Cleaner implementation is also added for the cases where we previously used PlatformDependent.allocateDirectNoCleaner(int). This means that the optimal mechanism for allocating and releasing buffers are now abstracted behind a single common API. This also allows us to remove all the hacks and conditional code-paths that relied on PlatformDependent.hasDirectBufferNoCleanerConstructor().

Replace all instances where we allocate direct buffers, including in our native transports, with the new CleanableDirectBuffer mechanism.

Finally, deprecate all the APIs that bypass the CleanableDirectBuffer API. We cannot remove these APIs, however, as many of them are public, and it would break compatibility.

Result:
We can once again have explicit control over direct buffer lifetimes on Java 24 and greater.

Furthermore, since the FFM APIs are stable, and aligned with the long-term direction of the Java platform, we likely won't have to worry about this functionality ever breaking again.

@chrisvest chrisvest requested a review from normanmaurer May 21, 2025 23:41
@chrisvest
Copy link
Copy Markdown
Member Author

This resolves the performance regression that #14943 causes, but only for Netty 4.2.

@chrisvest
Copy link
Copy Markdown
Member Author

We need to back port the memory segment/compression fix from main as well.

@chrisvest
Copy link
Copy Markdown
Member Author

The sun crypto provider is facing a similar problem to the compression algorithms, with getting the address off of closable shared memory segments:

java.lang.UnsupportedOperationException: ByteBuffer derived from closeable shared sessions not supported
	at java.base/java.nio.DirectByteBuffer.address(DirectByteBuffer.java:319)
	at java.base/com.sun.crypto.provider.GaloisCounterMode$GCMEngine.overlapDetection(GaloisCounterMode.java:919)
	at java.base/com.sun.crypto.provider.GaloisCounterMode$GCMDecrypt.doFinal(GaloisCounterMode.java:1575)
	at java.base/com.sun.crypto.provider.GaloisCounterMode.engineDoFinal(GaloisCounterMode.java:442)
	at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2558)
	at java.base/sun.security.ssl.SSLCipher$T12GcmReadCipherGenerator$GcmReadCipher.decrypt(SSLCipher.java:1638)
	at java.base/sun.security.ssl.SSLEngineInputRecord.decodeInputRecord(SSLEngineInputRecord.java:239)
	at java.base/sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:196)
	at java.base/sun.security.ssl.SSLEngineInputRecord.decode(SSLEngineInputRecord.java:159)
	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)

/cc @AlanBateman

@franz1981
Copy link
Copy Markdown
Contributor

Do you have data points for the performance regression (which is expected eh, just curious)?

@chrisvest
Copy link
Copy Markdown
Member Author

@franz1981 It impacts GC behaviour. Not doing anything to deallocate works fine until it suddenly doesn't, and your application starts to have a bad time. I don't have concrete numbers to share, but it was quite significant for some services.

@normanmaurer
Copy link
Copy Markdown
Member

Also its important to call out that it affects if we allocate direct or heap buffers by default (when you call buffer(...). Without unsafe we use heap buffers when then will affect the needed heap size and also will need copy for heap to direct memory.

@normanmaurer normanmaurer added this to the 4.2.2.Final milestone May 22, 2025
@chrisvest
Copy link
Copy Markdown
Member Author

Yeah, I should put an article on the wiki about this, so people have some guidance.

@normanmaurer
Copy link
Copy Markdown
Member

Yeah, I should put an article on the wiki about this, so people have some guidance.

We may also want to reconsider this in general... like if you use the pooled allocator we might still want to allocate direct by default even with no unsafe as hopefully the deallocation etc should be not as frequent

@AlanBateman
Copy link
Copy Markdown

The sun crypto provider is facing a similar problem to the compression algorithms, with getting the address off of closable shared memory segments:

There is work in progress in JDK-8357268 to fix this, and a few others that were missed.

@normanmaurer
Copy link
Copy Markdown
Member

Yeah, I should put an article on the wiki about this, so people have some guidance.

We may also want to reconsider this in general... like if you use the pooled allocator we might still want to allocate direct by default even with no unsafe as hopefully the deallocation etc should be not as frequent

See #15232

Copy link
Copy Markdown
Member

@normanmaurer normanmaurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of review

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this ? can't we access it via the cleanable ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, but it adds an indirection (and makes the change bigger)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just a bit concerned that we add just another field and add more overhead (memory).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the common case yeah, this does increase the object size by 8 bytes… but we don't always have a cleanable, so then we'd have to wrap it in a noop implementation

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this interface is package private we can also just directly remove it

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still gets used via Unpooled.wrapped(ByteBuffer). Maybe it shouldn't be deprecated.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this be used ? It is package-private so can't be called directly ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you create a ByteBuf with Unpooled.wrapped(ByteBuffer), then we don't have a CleanableDirectBuffer instance. In that case, when releasing the ByteBuf, we go through this old code path.

@chrisvest
Copy link
Copy Markdown
Member Author

chrisvest commented May 23, 2025

@minborg We spotted another case for https://bugs.openjdk.org/browse/JDK-8357268 in sun.nio.ch.sctp.SctpChannelImpl:

java.lang.UnsupportedOperationException: ByteBuffer derived from closeable shared sessions not supported
	at java.base/java.nio.DirectByteBuffer.address(DirectByteBuffer.java:319)
	at jdk.sctp/sun.nio.ch.sctp.SctpChannelImpl.receiveIntoNativeBuffer(SctpChannelImpl.java:832)
	at jdk.sctp/sun.nio.ch.sctp.SctpChannelImpl.receive(SctpChannelImpl.java:806)
	at jdk.sctp/sun.nio.ch.sctp.SctpChannelImpl.receive(SctpChannelImpl.java:749)
	at jdk.sctp/sun.nio.ch.sctp.SctpChannelImpl.receive(SctpChannelImpl.java:711)
	at io.netty.channel.sctp.nio.NioSctpChannel.doReadMessages(NioSctpChannel.java:274)

Edit: Ah, I see you found it already: https://github.com/openjdk/jdk/pull/25324/files#diff-9daf1d5a1c92dc1df65dc8b63be53ca9379e7fb2fe5a4d31c2e469e05947d320

normanmaurer added a commit that referenced this pull request May 28, 2025
Motivation:

In our testsuite we often just wrapped a byte[] and passed it to the write method. While this works it also might end up not testing often with direct buffers. We should make this a bit more random and so cover direct and heap buffers. The idea popped up when reviewing #15231

Modifications:

- Add new static helper method that will either return a wrapped buffer or a direct buffers in a random fashion
- Use this method to allocate buffers

Result:

More random testing of heap / direct buffers
chrisvest pushed a commit that referenced this pull request May 28, 2025
…5281)

Motivation:

In our testsuite we often just wrapped a byte[] and passed it to the
write method. While this works it also might end up not testing often
with direct buffers. We should make this a bit more random and so cover
direct and heap buffers. The idea popped up when reviewing
#15231

Modifications:

- Add new static helper method that will either return a wrapped buffer
or a direct buffers in a random fashion
- Use this method to allocate buffers

Result:

More random testing of heap / direct buffers
chrisvest added 4 commits May 28, 2025 11:00
Motivation:
Netty's performance, particularly in very busy systems, relies heavily on the ability to immediately free native memory when it's no longer used.

Since time immemorial, we have been relying on `sun.misc.Unsafe`, and various hacks based upon it, for quickly releasing/cleaning direct ByteBuffers.

From Java 24 onwards, any use of `Unsafe` has started to produce warnings, and future versions will deny access entirely.
For that reason, we no longer rely on `Unsafe` by default (it can still be explicitly enabled) when running on Java 24 or greater.

This creates a large performance gap that needs to be closed.
Fortunately, Java 22 introduced the `MemorySegment` and related (FFM) APIs, and it is possible to immediately release native memory using these APIs.

As Netty 4.2 is now also based on Java 8, we have access to `MethodHandles`, which allows to craft efficient mechanisms for calling the FFM APIs, without requiring them to be available at compile time.

Modification:

1.
The lifetime of a memory segment is tracked by the associated `Arena` instance, but it is not possible to obtain the `Arena` from a `MemorySegment`, nor from any `ByteBuffer` created from the `MemorySegment`.
Therefor, we need to track the lifetime of both, together.
The `CleanableDirectBuffer` interface is introduced for this purpose, and encapsulates both a `ByteBuffer` and its preferred mechanism for freeing it.

2.
The _Netty_ `Cleaner` API is then extended with a method for allocating direct buffers in the form of the `CleanableDirectBuffer`.
Putting this in the cleaner allow us to switch the buffer allocation and releasing behaviour depending on the Java version, as we already have version specific implementations.

3.
A new `CleanerJava24` implementation is added, which – as mentioned – uses method handles to create and arena, allocate a memory segment and a byte buffer from it, and then encapsulate both in a `CleanableDirectBuffer` instance.

4.
A new `PlatformDependent.allocateDirect(int)` method is added, which returns `CleanableDirectBuffer` instances depending on what `Cleaner` implementation we end up using.

5.
A so-called `Cleaner` implementation is also added for the cases where we previously used `PlatformDependent.allocateDirectNoCleaner(int)`.
This means that the optimal mechanism for allocating and releasing buffers are now abstracted behind a single common API.
This also allows us to remove all the hacks and conditional code-paths that relied on `PlatformDependent.hasDirectBufferNoCleanerConstructor()`.

6.
Replace all instances where we allocate direct buffers, including in our native transports, with the new `CleanableDirectBuffer` mechanism.

7.
Finally, deprecate all the APIs that bypass the `CleanableDirectBuffer` API.
We cannot remove these APIs, however, as many of them are public, and it would break compatibility.

Result:
We can once again have explicit control over direct buffer lifetimes on Java 24 and greater.

Furthermore, since the FFM APIs are stable, and aligned with the long-term direction of the Java platform, we likely won't have to worry about this functionality ever breaking again.
Copy link
Copy Markdown
Member

@normanmaurer normanmaurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there

chrisvest added a commit to chrisvest/netty that referenced this pull request May 30, 2025
Motivation:
In netty#14943 we disabled Unsafe by default on Java 24+, to avoid producing warnings from the JVM.
However, disabling the use of Unsafe also disables the hacks we use to access `ByteBuffer` cleaners to eagerly deallocate them.
This can have dire performance consequences, particularly for busy applications.

Disabling the warnings from Unsafe also did nothing to stem the warnings produced by our JNI code, so many systems still see warnings anyway.

We should instead revert the changed default, and just tolerate the warning in Netty 4.1.

In Netty 4.2 we have netty#15231 as a long-term fix, where we avoid using Unsafe but will have a way to eagerly release the memory of our native ByteBuffer instances.

Modification:
Change the default for our Unsafe usage to no longer be disabled on Java 24+.

Result:
More consistent performance across Java versions.
Copy link
Copy Markdown
Member

@normanmaurer normanmaurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it! Great work

@chrisvest chrisvest merged commit 4dc0b3a into netty:4.2 May 30, 2025
14 of 15 checks passed
@chrisvest chrisvest deleted the 4.2-alloc-direct branch May 30, 2025 20:35
chrisvest added a commit that referenced this pull request May 30, 2025
See https://github.com/netty/netty/wiki/Java-24-and-sun.misc.Unsafe for
more information.

Motivation:
In #14943 we disabled Unsafe by
default on Java 24+, to avoid producing warnings from the JVM. However,
disabling the use of Unsafe also disables the hacks we use to access
`ByteBuffer` cleaners to eagerly deallocate them. This can have dire
performance consequences, particularly for busy applications.

Disabling the warnings from Unsafe also did nothing to stem the warnings
produced by our JNI code, so many systems still see warnings anyway.

We should instead revert the changed default, and just tolerate the
warning in Netty 4.1.

In Netty 4.2 we have #15231 as a
long-term fix, where we avoid using Unsafe but will have a way to
eagerly release the memory of our native ByteBuffer instances.

Modification:
Change the default for our Unsafe usage to no longer be disabled on Java
24+.

Result:
More consistent performance across Java versions.
@franz1981 franz1981 self-requested a review May 30, 2025 20:51
Copy link
Copy Markdown
Contributor

@franz1981 franz1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

chrisvest pushed a commit that referenced this pull request Jun 20, 2025
Motivation:
Currently, the preference if a direct buffer should be allocated by, for
example, `ByteBufAllocator.buffer()` or `ByteBufAllocator.ioBuffer()`
depends on the fact if unsafe is available. It seems like this was
originally introduced as only with unsafe it was possible to get the
cleaner of a ByteBuffer to clean it. However, with the changes
introduced in #15231, it doesn't make sense anymore to depend on this as
unsafe could be unavailable but MemorySegments will still be freeable.

Modification:
Switch the check for unsafe to a check that validates if the current
cleaner implementation is able to free the allocated buffers (each
implementation can do this except the no-op one). This ensures that
direct buffers are actually preferred when they're backed by something
cleanable and still unpreferred when unsafe is unavailable and buffers
cannot be freed in a controlled way.

Result:
Direct buffers will be preferred when they're cleanable in a controlled
way unrelated to the fact if unsafe is available.
normanmaurer added a commit that referenced this pull request Jun 23, 2025
…5281)

Motivation:

In our testsuite we often just wrapped a byte[] and passed it to the
write method. While this works it also might end up not testing often
with direct buffers. We should make this a bit more random and so cover
direct and heap buffers. The idea popped up when reviewing
#15231

Modifications:

- Add new static helper method that will either return a wrapped buffer
or a direct buffers in a random fashion
- Use this method to allocate buffers

Result:

More random testing of heap / direct buffers
normanmaurer added a commit that referenced this pull request Jun 24, 2025
#15395)

…5281)

Motivation:

In our testsuite we often just wrapped a byte[] and passed it to the
write method. While this works it also might end up not testing often
with direct buffers. We should make this a bit more random and so cover
direct and heap buffers. The idea popped up when reviewing
#15231

Modifications:

- Add new static helper method that will either return a wrapped buffer
or a direct buffers in a random fashion
- Use this method to allocate buffers

Result:

More random testing of heap / direct buffers
dongjoon-hyun added a commit to apache/spark-kubernetes-operator that referenced this pull request Mar 26, 2026
…ing to Netty change

### What changes were proposed in this pull request?

This PR removes `-Dio.netty.noUnsafe=true` from the default JVM args in `values.yaml`.

### Why are the changes needed?

Netty 4.2.2+ have a correct implementation instead of `Unsafe` usage for Java 25+ like the following instead of the legacy `Unsafe` operation.

- netty/netty#15231
- netty/netty#15338

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with the existing tests.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-4-opus)

Closes #593 from dongjoon-hyun/SPARK-56229.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants