Skip to content

Do not mandate direct bytes in SslHandler queue#9656

Merged
normanmaurer merged 2 commits intonetty:4.1from
Tim-Brooks:do_not_use_direct_buffer
Oct 12, 2019
Merged

Do not mandate direct bytes in SslHandler queue#9656
normanmaurer merged 2 commits intonetty:4.1from
Tim-Brooks:do_not_use_direct_buffer

Conversation

@Tim-Brooks
Copy link
Copy Markdown
Contributor

Motivation:

Currently when the SslHandler coalesces outbound bytes it always
allocates a direct byte buffer. This does not make sense if the JDK
engine is being used as the bytes will have to be copied back to heap
bytes for the engine to operate on them.

Modifications:

Inspect engine type when coalescing outbound bytes and allocate heap
buffer if heap bytes are preferred by the engine.

Result:

Improved performance for JDK engine. Better performance in environments
without direct buffer pooling.

Motivation:

Currently when the SslHandler coalesces outbound bytes it always
allocates a direct byte buffer. This does not make sense if the JDK
engine is being used as the bytes will have to be copied back to heap
bytes for the engine to operate on them.

Modifications:

Inspect engine type when coalescing outbound bytes and allocate heap
buffer if heap bytes are preferred by the engine.

Result:

Improved performance for JDK engine. Better performance in environments
without direct buffer pooling.
@netty-bot
Copy link
Copy Markdown

Can one of the admins verify this patch?

@normanmaurer
Copy link
Copy Markdown
Member

@netty-bot test this please

@johnou
Copy link
Copy Markdown
Contributor

johnou commented Oct 12, 2019

@tbrooks8 out of interest, do you have any benchmark results you can share?

@Tim-Brooks
Copy link
Copy Markdown
Contributor Author

@johnou unfortunately not. The reason this came to my attention was not because of an obvious performance regression, but because I was seeing direct buffers being allocated in an environment without direct buffer pooling enabled. Netty normal does its best to ensure pooled direct buffers are enabled before using them.

As I investigated more I saw this was only happening in the SslHandler. The reasoning behind my decision to always use a heap buffer for the JDK engine, is that if you pass the JDK engine a direct buffer it will internally allocate a byte array and copy the data to that for wrap. I just presume that is worse than using a heap buffer in the first place.

@normanmaurer normanmaurer self-assigned this Oct 12, 2019
@normanmaurer normanmaurer added this to the 4.1.43.Final milestone Oct 12, 2019
@normanmaurer normanmaurer merged commit ec8d43c into netty:4.1 Oct 12, 2019
@normanmaurer
Copy link
Copy Markdown
Member

@tbrooks8 great catch!

normanmaurer pushed a commit that referenced this pull request Oct 12, 2019
Motivation:

Currently when the SslHandler coalesces outbound bytes it always
allocates a direct byte buffer. This does not make sense if the JDK
engine is being used as the bytes will have to be copied back to heap
bytes for the engine to operate on them.

Modifications:

Inspect engine type when coalescing outbound bytes and allocate heap
buffer if heap bytes are preferred by the engine.

Result:

Improved performance for JDK engine. Better performance in environments
without direct buffer pooling.
@normanmaurer
Copy link
Copy Markdown
Member

@tbrooks8 btw are you sure the copy to heap byte array is still a thing in java11 and later ?

@Tim-Brooks
Copy link
Copy Markdown
Contributor Author

@normanmaurer Obviously it is pretty complicated to read through the JDK code with all the indirection. But here is what I currently see (JDK 12):

The SSLEngine will eventually call to Cipher#doFinal(ByteBuffer input, ByteBuffer output) or Ciper#update(ByteBuffer input, ByteBuffer output).

    public final int doFinal(ByteBuffer input, ByteBuffer output)
            throws ShortBufferException, IllegalBlockSizeException,
            BadPaddingException {
        checkCipherState();

        if ((input == null) || (output == null)) {
            throw new IllegalArgumentException("Buffers must not be null");
        }
        if (input == output) {
            throw new IllegalArgumentException("Input and output buffers must "
                + "not be the same object, consider using buffer.duplicate()");
        }
        if (output.isReadOnly()) {
            throw new ReadOnlyBufferException();
        }

        chooseFirstProvider();
        return spi.engineDoFinal(input, output);
    }

This calls The CipherSpi:

protected int engineDoFinal(ByteBuffer input, ByteBuffer output)
            throws ShortBufferException, IllegalBlockSizeException,
            BadPaddingException {
        return bufferCrypt(input, output, false);
    }

bufferCrypt does the copying.

    /**
     * Implementation for encryption using ByteBuffers. Used for both
     * engineUpdate() and engineDoFinal().
     */
    private int bufferCrypt(ByteBuffer input, ByteBuffer output,
            boolean isUpdate) throws ShortBufferException,
            IllegalBlockSizeException, BadPaddingException {
        if ((input == null) || (output == null)) {
            throw new NullPointerException
                ("Input and output buffers must not be null");
        }
        int inPos = input.position();
        int inLimit = input.limit();
        int inLen = inLimit - inPos;
        if (isUpdate && (inLen == 0)) {
            return 0;
        }
        int outLenNeeded = engineGetOutputSize(inLen);

        if (output.remaining() < outLenNeeded) {
            throw new ShortBufferException("Need at least " + outLenNeeded
                + " bytes of space in output buffer");
        }

        boolean a1 = input.hasArray();
        boolean a2 = output.hasArray();
        int total = 0;
        byte[] inArray, outArray;
        if (a2) { // output has an accessible byte[]
            outArray = output.array();
            int outPos = output.position();
            int outOfs = output.arrayOffset() + outPos;

            if (a1) { // input also has an accessible byte[]
                inArray = input.array();
                int inOfs = input.arrayOffset() + inPos;
                if (isUpdate) {
                    total = engineUpdate(inArray, inOfs, inLen, outArray, outOfs);
                } else {
                    total = engineDoFinal(inArray, inOfs, inLen, outArray, outOfs);
                }
                input.position(inLimit);
            } else { // input does not have accessible byte[]
                inArray = new byte[getTempArraySize(inLen)];
                do {
                    int chunk = Math.min(inLen, inArray.length);
                    if (chunk > 0) {
                        input.get(inArray, 0, chunk);
                    }
                    int n;
                    if (isUpdate || (inLen > chunk)) {
                        n = engineUpdate(inArray, 0, chunk, outArray, outOfs);
                    } else {
                        n = engineDoFinal(inArray, 0, chunk, outArray, outOfs);
                    }
                    total += n;
                    outOfs += n;
                    inLen -= chunk;
                } while (inLen > 0);
            }
            output.position(outPos + total);
        } else { // output does not have an accessible byte[]
            if (a1) { // but input has an accessible byte[]
                inArray = input.array();
                int inOfs = input.arrayOffset() + inPos;
                if (isUpdate) {
                    outArray = engineUpdate(inArray, inOfs, inLen);
                } else {
                    outArray = engineDoFinal(inArray, inOfs, inLen);
                }
                input.position(inLimit);
                if (outArray != null && outArray.length != 0) {
                    output.put(outArray);
                    total = outArray.length;
                }
            } else { // input also does not have an accessible byte[]
                inArray = new byte[getTempArraySize(inLen)];
                do {
                    int chunk = Math.min(inLen, inArray.length);
                    if (chunk > 0) {
                        input.get(inArray, 0, chunk);
                    }
                    int n;
                    if (isUpdate || (inLen > chunk)) {
                        outArray = engineUpdate(inArray, 0, chunk);
                    } else {
                        outArray = engineDoFinal(inArray, 0, chunk);
                    }
                    if (outArray != null && outArray.length != 0) {
                        output.put(outArray);
                        total += outArray.length;
                    }
                    inLen -= chunk;
                } while (inLen > 0);
            }
        }
        return total;
    }

It looks to me that the P11Cipher overrides engineDoFinal and engineUpdate with implementations that do support direct byte buffers. But that is the only Cipher. For example ChaCha20Cipher and AESCipher appear to only operate on byte arrays to me.

So I think my understanding is correct. And I have seen in debugging allocations happening when working with AES. But I can investigate more this week if you think there is something I am missing.

@carl-mastrangelo
Copy link
Copy Markdown
Member

Slightly related, but Conscrypt had a very similar issue a few years back, which never ended up getting resolved: google/conscrypt#317

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants