Skip to content

Conversation

@lhotari
Copy link
Contributor

@lhotari lhotari commented Jun 3, 2024

Motivation:

The issue #12919 is titled "ByteBuf slice duplicates are not slices". This behavior is very surprising. It explains why the SslHandler has been causing issues in Apache Pulsar. The incoming messages come from the network and use the LengthFieldBasedFrameDecoder, which slices the buffer into frame buffers. Pulsar later calls retainedDuplicate. This removes the effect of slicing (as described in issue #12919) and makes the buffer writable. The impact of this is that the original buffer used by the LengthFieldBasedFrameDecoder gets mutated by the SslHandler later when the sliced frame buffer (which is made writable with .retainedDuplicate) is passed to the Apache Bookkeeper client, which also uses Netty and contains an SslHandler. The SslHandler's cumulation in the attemptCopyToCumulation method thinks that it's fine to mutate the input buffer since it's writable and that's what causes data corruption.

Issue #12919 references the vert.x issue eclipse-vertx/vert.x#4509. It seems vert.x applied a workaround in eclipse-vertx/vert.x#4517.

I also found a workaround for a data corruption issue within Netty's http codecs, #11802. That was to address #11792.

Such workarounds won't be needed when we eliminate the surprising behaviour. The PR will address it for Sslhandler and also optimise the cumulation. I think #12919 should also be addressed so that a duplicating a slice doesn't provide write access to the parent buffer outside of the slice.

Modification:

For sliced buffers, make the duplicate() method duplicate the slice and not the underlying buffer.
When I was working on the changes, I noticed that io.netty.buffer.AdaptivePoolingAllocator.PooledNonRetainedSlicedByteBuf had already implemented duplicate so that it duplicates the slice and not the underlying buffer:

public ByteBuf duplicate() {
ensureAccessible();
return new PooledNonRetainedSlicedByteBuf(referenceCountDelegate, unwrap(), idx(0), capacity())
.setIndex(readerIndex(), writerIndex());
}

Result:

Fixes #12919.

If there is no issue then describe the changes introduced by this PR.

Copy link
Member

@normanmaurer normanmaurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!

@lhotari
Copy link
Contributor Author

lhotari commented Jun 3, 2024

Reported a flaky test #14097 for the current build failure. It took some time to find out that finding the error or failure can be achieved by downloading the zip file, extracting it and then running find -name "*.xml" |xargs egrep '<error|<failure'.

@franz1981
Copy link
Contributor

@vietj wdyt? it looks good to me

Copy link
Contributor

@franz1981 franz1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me

@chrisvest chrisvest added this to the 4.1.111.Final milestone Jun 3, 2024
@chrisvest chrisvest merged commit e9caa97 into netty:4.1 Jun 3, 2024
chrisvest pushed a commit that referenced this pull request Jun 3, 2024
…e so that it's not writable (#14093)

Motivation:

The issue #12919 is titled "ByteBuf
slice duplicates are not slices". This behavior is very surprising. It
explains why the SslHandler has been causing issues in Apache Pulsar.
The incoming messages come from the network and use the
LengthFieldBasedFrameDecoder, which slices the buffer into frame
buffers. Pulsar later calls retainedDuplicate. This removes the effect
of slicing (as described in issue
#12919) and makes the buffer
writable. The impact of this is that the original buffer used by the
LengthFieldBasedFrameDecoder gets mutated by the SslHandler later when
the sliced frame buffer (which is made writable with .retainedDuplicate)
is passed to the Apache Bookkeeper client, which also uses Netty and
contains an SslHandler. The SslHandler's cumulation in the
attemptCopyToCumulation method thinks that it's fine to mutate the input
buffer since it's writable and that's what causes data corruption.

Issue #12919 references the vert.x
issue eclipse-vertx/vert.x#4509. It seems
vert.x applied a workaround in
eclipse-vertx/vert.x#4517.

I also found a workaround for a data corruption issue within Netty's
http codecs, #11802. That was to
address #11792.

Such workarounds won't be needed when we eliminate the surprising
behaviour. The PR will address it for Sslhandler and also optimise the
cumulation. I think #12919 should
also be addressed so that a duplicating a slice doesn't provide write
access to the parent buffer outside of the slice.

Modification:

For sliced buffers, make the `duplicate()` method duplicate the slice
and not the underlying buffer.
When I was working on the changes, I noticed that
`io.netty.buffer.AdaptivePoolingAllocator.PooledNonRetainedSlicedByteBuf`
had already implemented `duplicate` so that it duplicates the slice and
not the underlying buffer:

https://github.com/netty/netty/blob/ecebe1c102cf9a9e2faaccbfc12f594963af7326/buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java#L1238-L1242

Result:

Fixes  #12919.

If there is no issue then describe the changes introduced by this PR.
hezhangjian pushed a commit to apache/bookkeeper that referenced this pull request Jun 13, 2024
…4426)

### Motivation

[Netty 4.1.111.Final](https://netty.io/news/2024/06/11/4-1-111-Final.html) contains important fixes:
- netty/netty#14086
- netty/netty#14093

On the Pulsar side these address Broker stability when TLS with Bookkeeper V2 protocol is used between Broker and Bookies. The Bookkeeper client didn't contain a workaround for the Netty SslHandler "feature" like Pulsar did. That's why Bookkeeper client was impacted. Netty 4.1.111.Final will address those stability issues with the Bookkeeper client when using V2 protocol over TLS.

### Changes

- Upgrade Netty to 4.1.111.Final
- Switch to use grpc-netty-shaded instead of grpc-netty
  - grpc-java is not compatible with Netty 4.1.111.Final. [grpc-java supports only specific Netty versions](https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty). The solution is to use grpc-netty-shaded instead of grpc-netty dependency.
- Add module to shade jetcd-core partially so that it can work with grpc-netty-shaded. 
  - jetcd-core library and it's transient dependency vertx-grpc has a direct dependency on grpc-netty and it doesn't work with grpc-netty-shaded without this solution.
  - this solution avoids the need to shade grpc libraries completely which would cause a lot of duplication and increase build times.
lhotari added a commit to apache/bookkeeper that referenced this pull request Jun 13, 2024
…4426)

[Netty 4.1.111.Final](https://netty.io/news/2024/06/11/4-1-111-Final.html) contains important fixes:
- netty/netty#14086
- netty/netty#14093

On the Pulsar side these address Broker stability when TLS with Bookkeeper V2 protocol is used between Broker and Bookies. The Bookkeeper client didn't contain a workaround for the Netty SslHandler "feature" like Pulsar did. That's why Bookkeeper client was impacted. Netty 4.1.111.Final will address those stability issues with the Bookkeeper client when using V2 protocol over TLS.

- Upgrade Netty to 4.1.111.Final
- Switch to use grpc-netty-shaded instead of grpc-netty
  - grpc-java is not compatible with Netty 4.1.111.Final. [grpc-java supports only specific Netty versions](https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty). The solution is to use grpc-netty-shaded instead of grpc-netty dependency.
- Add module to shade jetcd-core partially so that it can work with grpc-netty-shaded.
  - jetcd-core library and it's transient dependency vertx-grpc has a direct dependency on grpc-netty and it doesn't work with grpc-netty-shaded without this solution.
  - this solution avoids the need to shade grpc libraries completely which would cause a lot of duplication and increase build times.

(cherry picked from commit a95f698)
lhotari added a commit to apache/bookkeeper that referenced this pull request Jun 13, 2024
…4426)

[Netty 4.1.111.Final](https://netty.io/news/2024/06/11/4-1-111-Final.html) contains important fixes:
- netty/netty#14086
- netty/netty#14093

On the Pulsar side these address Broker stability when TLS with Bookkeeper V2 protocol is used between Broker and Bookies. The Bookkeeper client didn't contain a workaround for the Netty SslHandler "feature" like Pulsar did. That's why Bookkeeper client was impacted. Netty 4.1.111.Final will address those stability issues with the Bookkeeper client when using V2 protocol over TLS.

- Upgrade Netty to 4.1.111.Final
- Switch to use grpc-netty-shaded instead of grpc-netty
  - grpc-java is not compatible with Netty 4.1.111.Final. [grpc-java supports only specific Netty versions](https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty). The solution is to use grpc-netty-shaded instead of grpc-netty dependency.
- Add module to shade jetcd-core partially so that it can work with grpc-netty-shaded.
  - jetcd-core library and it's transient dependency vertx-grpc has a direct dependency on grpc-netty and it doesn't work with grpc-netty-shaded without this solution.
  - this solution avoids the need to shade grpc libraries completely which would cause a lot of duplication and increase build times.

(cherry picked from commit a95f698)
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
…pache#4426)

### Motivation

[Netty 4.1.111.Final](https://netty.io/news/2024/06/11/4-1-111-Final.html) contains important fixes:
- netty/netty#14086
- netty/netty#14093

On the Pulsar side these address Broker stability when TLS with Bookkeeper V2 protocol is used between Broker and Bookies. The Bookkeeper client didn't contain a workaround for the Netty SslHandler "feature" like Pulsar did. That's why Bookkeeper client was impacted. Netty 4.1.111.Final will address those stability issues with the Bookkeeper client when using V2 protocol over TLS.

### Changes

- Upgrade Netty to 4.1.111.Final
- Switch to use grpc-netty-shaded instead of grpc-netty
  - grpc-java is not compatible with Netty 4.1.111.Final. [grpc-java supports only specific Netty versions](https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty). The solution is to use grpc-netty-shaded instead of grpc-netty dependency.
- Add module to shade jetcd-core partially so that it can work with grpc-netty-shaded. 
  - jetcd-core library and it's transient dependency vertx-grpc has a direct dependency on grpc-netty and it doesn't work with grpc-netty-shaded without this solution.
  - this solution avoids the need to shade grpc libraries completely which would cause a lot of duplication and increase build times.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ByteBuf slice duplicates are not slices

4 participants