GH-40039: [Java][FlightRPC] Improve performance by removing unnecessary memory copies by tolmalev · Pull Request #40042 · apache/arrow

tolmalev · 2024-02-12T07:47:05Z

Rationale for this change

Described in detail in the issue: #40039

Summary: the ArrowMessage class uses a CompositeByteBuf to avoid memory copies, but its maxNumComponents is calculated incorrectly. As a result, memory copies are still performed, which significantly degrades server performance.

What changes are included in this PR?

This PR changes maxNumComponents to Integer.MAX_VALUE, because we never want to silently merge large buffers into one.
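To illustrate why the limit matters, here is a simplified, JDK-only model of the behavior described above: once the number of components exceeds maxNumComponents, the composite buffer silently merges all components into one newly allocated buffer by copying them. CompositeModel and its methods are hypothetical stand-ins written for this sketch, not Netty's or Arrow's API, and the buffer sizes are arbitrary.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a composite buffer with a component limit.
// It mimics the "merge when over the limit" policy; it is not Netty code.
class CompositeModel {
    private final int maxNumComponents;
    private final List<ByteBuffer> components = new ArrayList<>();
    private long bytesCopied = 0;

    CompositeModel(int maxNumComponents) {
        this.maxNumComponents = maxNumComponents;
    }

    void addComponent(ByteBuffer buf) {
        // Zero-copy path: only a reference to the caller's buffer is kept.
        components.add(buf);
        if (components.size() > maxNumComponents) {
            consolidate();
        }
    }

    // Expensive path: copy every component into one new buffer.
    private void consolidate() {
        int total = 0;
        for (ByteBuffer b : components) {
            total += b.remaining();
        }
        ByteBuffer merged = ByteBuffer.allocate(total);
        for (ByteBuffer b : components) {
            merged.put(b.duplicate()); // duplicate() leaves the source position untouched
        }
        merged.flip();
        bytesCopied += total;
        components.clear();
        components.add(merged);
    }

    int numComponents() { return components.size(); }
    long bytesCopied() { return bytesCopied; }

    public static void main(String[] args) {
        int n = 3; // hypothetical message with three 1 MiB body buffers
        CompositeModel tooSmall = new CompositeModel(n - 1); // limit undercounts the components
        CompositeModel unbounded = new CompositeModel(Integer.MAX_VALUE);
        for (int i = 0; i < n; i++) {
            ByteBuffer body = ByteBuffer.allocate(1 << 20);
            tooSmall.addComponent(body);
            unbounded.addComponent(body);
        }
        System.out.println("limit=" + (n - 1) + ": components=" + tooSmall.numComponents()
                + ", bytesCopied=" + tooSmall.bytesCopied());
        System.out.println("limit=MAX_VALUE: components=" + unbounded.numComponents()
                + ", bytesCopied=" + unbounded.bytesCopied());
    }
}
```

With the limit set one below the real component count, the third addComponent call triggers a merge and all 3 MiB are copied; with Integer.MAX_VALUE the three buffers are only referenced and nothing is copied.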

Users can still set useZeroCopy=false (the default), in which case the library copies the data into a new buffer before handing it to Netty for writing.
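A minimal sketch of the trade-off behind a useZeroCopy-style switch, using plain java.nio buffers. prepareForWrite is a hypothetical helper, not an Arrow Flight method, and the "write" is only simulated. What it shows is a general property of zero-copy writes: the outgoing data aliases the source memory, so a change to the source made before the write completes is also sent, whereas a copy is isolated from it.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of zero-copy vs. copy-before-write.
class ZeroCopyTradeoff {
    // With zero-copy, the outgoing buffer shares the source's memory;
    // otherwise the bytes are first copied into a fresh buffer.
    static ByteBuffer prepareForWrite(ByteBuffer source, boolean useZeroCopy) {
        if (useZeroCopy) {
            return source.duplicate(); // new view, same content, no copy
        }
        ByteBuffer copy = ByteBuffer.allocate(source.remaining());
        copy.put(source.duplicate());
        copy.flip();
        return copy;
    }

    public static void main(String[] args) {
        ByteBuffer source = ByteBuffer.allocate(4);
        source.putInt(0, 42);
        ByteBuffer zeroCopy = prepareForWrite(source, true);
        ByteBuffer copied = prepareForWrite(source, false);

        // Mutate the source before the (simulated) write has happened.
        source.putInt(0, 7);

        System.out.println("zero-copy sees " + zeroCopy.getInt(0)); // 7
        System.out.println("copy sees " + copied.getInt(0));        // 42
    }
}
```

The copying path costs an extra allocation and memcpy per message but cannot observe later changes to the source, which is a general reason a copying default is the more conservative choice.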

Are these changes tested?

TestPerf: 30% throughput boost

BEFORE
Transferred 100000000 records totaling 3200000000 bytes at 877.812629 MiB/s. 28764164.218015 record/s. 7024.784185 batch/s.

AFTER
Transferred 100000000 records totaling 3200000000 bytes at 1145.333893 MiB/s. 37530301.022096 record/s. 9165.650116 batch/s.
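A quick arithmetic check of the 30% figure, computed from the BEFORE/AFTER lines above (TestPerfRatios is just a throwaway class name for this sketch). Both the MiB/s and record/s ratios come out to about 1.305, and since 3,200,000,000 bytes over 100,000,000 records is 32 bytes per record, converting the MiB/s figure to bytes/s and dividing by 32 reproduces the reported record/s.

```java
// Worked arithmetic on the TestPerf output quoted in the PR.
class TestPerfRatios {
    // Ratio of AFTER to BEFORE throughput; 1.30 means a 30% increase.
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // Figures copied verbatim from the BEFORE/AFTER TestPerf lines.
        double beforeMiBs = 877.812629, afterMiBs = 1145.333893;
        double beforeRecs = 28764164.218015, afterRecs = 37530301.022096;
        long records = 100_000_000L, totalBytes = 3_200_000_000L;
        long bytesPerRecord = totalBytes / records; // 32

        System.out.printf("MiB/s speedup:    %.3f%n", speedup(beforeMiBs, afterMiBs));
        System.out.printf("record/s speedup: %.3f%n", speedup(beforeRecs, afterRecs));
        // MiB/s -> bytes/s, divided by bytes/record, should match the reported record/s.
        System.out.printf("derived BEFORE record/s: %.0f%n",
                beforeMiBs * 1024 * 1024 / bytesPerRecord);
    }
}
```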

I also tested with a simple client-server application and saw an even more significant performance boost when padding isn't needed.

Two tests with zero-copy set to true:
50 batches, 30 columns (Int32), 199999 rows in each batch

before change: throughput ~25Gbit/s (memory copy happens in grpc-nio-worker-ELG-*)
after change: throughput ~32Gbit/s (~28% boost)

50 batches, 30 columns (Int32), 200k rows in each batch

before change: throughput ~15Gbit/s (much slower than with 199999 rows, because the memory copy happens in the flight-server-default-executor-* thread and blocks the server from writing the next batch)
after change: throughput ~32Gbit/s (115% boost)
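The two row counts differ in whether the Int32 data buffers need padding. A small arithmetic sketch, assuming Arrow's IPC rule that body buffers are aligned to 8-byte boundaries and, hypothetically, that each unaligned buffer contributes one extra padding component to the outgoing composite buffer (validity buffers are ignored). At 199,999 rows each data buffer is 799,996 bytes and needs 4 padding bytes; at 200,000 rows it is 800,000 bytes and needs none.

```java
import java.util.Arrays;

// Hypothetical padding/component arithmetic for the two benchmark shapes.
class PaddingMath {
    // Bytes needed to round a buffer length up to a multiple of 8.
    static long paddingFor(long length) {
        return (8 - (length % 8)) % 8;
    }

    // Assumed model: one component per buffer, plus one padding
    // component for each buffer whose length is not 8-byte aligned.
    static int componentCount(long[] bufferLengths) {
        int count = 0;
        for (long len : bufferLengths) {
            count++;
            if (paddingFor(len) > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int columns = 30;
        for (int rows : new int[] {199_999, 200_000}) {
            long dataBytes = 4L * rows; // Int32 values only
            long[] lengths = new long[columns];
            Arrays.fill(lengths, dataBytes);
            System.out.println(rows + " rows: " + dataBytes + " data bytes, padding "
                    + paddingFor(dataBytes) + ", components " + componentCount(lengths));
        }
    }
}
```

Under these assumptions, a limit sized from the buffer count alone would be exceeded whenever padding is needed, which is consistent with the failure mode named in #40039.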

Closes: [Java][FlightRpc] server zero-copy doesn't work if padding buffers are needed to serialise response #40039


github-actions · 2024-02-12T07:47:28Z

⚠️ GitHub issue #40039 has been automatically assigned in GitHub to PR creator.

lidavidm

Great find, thank you for the fix & the explanation!

conbench-apache-arrow · 2024-02-12T19:08:20Z

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 66351e3.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

Authored-by: Lev Tolmachev <lev.tolmachev@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>

[Java][FlightRpc][server] Improve performance by removing unnecessary memory copies (d56b5d4)

tolmalev requested a review from lidavidm as a code owner February 12, 2024 07:47

github-actions bot added the Component: Java and awaiting review labels Feb 12, 2024

lidavidm approved these changes Feb 12, 2024


lidavidm changed the title ~~GH-40039: [Java][FlightRpc] Improve performance by removing unnecessary memory copies~~ GH-40039: [Java][FlightRPC] Improve performance by removing unnecessary memory copies Feb 12, 2024

github-actions bot added the awaiting merge label and removed the awaiting review label Feb 12, 2024

lidavidm merged commit 66351e3 into apache:main Feb 12, 2024

lidavidm removed the awaiting merge label Feb 12, 2024

tolmalev mentioned this pull request Feb 12, 2024

[C++][Java] Arrow Flight C++/Java performance comparison #13980 (Open)

lidavidm merged 1 commit into apache:main from tolmalev:main
