Describe the bug, including details regarding any error messages, version, and platform.
ArrowBufRetainingCompositeByteBuf isn't supposed to copy data into new Netty buffers. To achieve this, it extends CompositeByteBuf and passes the existing Arrow buffers as components.
However, the CompositeByteBuf constructor accepts two parameters: the maximum number of components (maxNumComponents) and the list of components (buffers). If the number of buffers exceeds maxNumComponents, it performs consolidation and merges some buffers into a new buffer.
ArrowBufRetainingCompositeByteBuf passes maxNumComponents = backingBuffers.size() + 1 rather than buffers.size() + 1. When padding is used, buffers contains additional byte buffers for the padding, so buffers.size() > backingBuffers.size() + 1.
As a result, zero-copy doesn't work: a new copy of the data is created by CompositeByteBuf.consolidateIfNeeded().
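The arithmetic behind the bug can be sketched with a simplified model of the consolidation check. ConsolidationSketch, wouldConsolidate, and the concrete counts below are illustrative assumptions, not actual Netty or Arrow code:

```java
// Hypothetical simplified model of Netty's CompositeByteBuf consolidation
// rule (not the real implementation): once the component count exceeds
// maxNumComponents, the components are merged (copied) into one new buffer.
public class ConsolidationSketch {
    static boolean wouldConsolidate(int componentCount, int maxNumComponents) {
        return componentCount > maxNumComponents;
    }

    public static void main(String[] args) {
        int backingBuffers = 3;                    // Arrow data buffers (example count)
        int maxNumComponents = backingBuffers + 1; // the buggy limit: backingBuffers.size() + 1

        // Without padding the component count stays within the limit: no copy.
        System.out.println(wouldConsolidate(backingBuffers, maxNumComponents));        // false

        // With padding, data buffers are accompanied by extra padding buffers,
        // so the component list grows past the limit and consolidation kicks in,
        // copying everything into a single new buffer.
        int componentsWithPadding = backingBuffers * 2; // e.g. one padding buffer per data buffer
        System.out.println(wouldConsolidate(componentsWithPadding, maxNumComponents)); // true
    }
}
```

Using buffers.size() (the actual component count) as the basis for maxNumComponents would keep the check from ever triggering, preserving zero-copy.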
Fun fact: I found this while trying to debug why a simple client-server benchmark runs exactly 2x faster when the result has 199999 rows than when it has 200000 rows. The number of columns didn't matter, only the number of rows.
Fun fact 2: It is the zero-copy version that runs slower, not the version that does the additional memory copy. If I remove listener.setUseZeroCopy(true); from the producer implementation, both versions show the same results.
Component(s)
FlightRPC, Java