GH-37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader by vibhatha · Pull Request #38371 · apache/arrow

vibhatha · 2023-10-20T04:43:20Z

Rationale for this change

This PR addresses #37841.

What changes are included in this PR?

Adding compression-based write and read for Dictionary data.
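As a rough illustration of the problem named in the issue title, the sketch below models a reader that decodes dictionary payloads with the same codec it uses for record payloads, next to one that falls back to a no-op codec. Everything in it is an assumption made for illustration: `SketchCodec`, `SketchReader`, and the other names are hypothetical and are not Arrow's classes, and DEFLATE from `java.util.zip` merely stands in for the LZ4/ZSTD codecs Arrow actually ships.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical stand-in for a compression codec; not Arrow's CompressionCodec. */
interface SketchCodec {
  byte[] compress(byte[] raw);
  byte[] decompress(byte[] compressed);
}

/** No-op codec, modelling a decode path that ignores the configured compression. */
final class NoOpSketchCodec implements SketchCodec {
  @Override public byte[] compress(byte[] raw) { return raw.clone(); }
  @Override public byte[] decompress(byte[] compressed) { return compressed.clone(); }
}

/** DEFLATE codec from java.util.zip, used here only as a stand-in for LZ4/ZSTD. */
final class DeflateSketchCodec implements SketchCodec {
  @Override public byte[] compress(byte[] raw) {
    Deflater deflater = new Deflater();
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[256];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  @Override public byte[] decompress(byte[] compressed) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[256];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        // Guard against spinning forever on truncated input.
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalStateException("truncated or invalid input");
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException("not DEFLATE data", e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }
}

/** Hypothetical reader: one codec decodes dictionary and record payloads alike. */
final class SketchReader {
  private final SketchCodec codec;
  SketchReader(SketchCodec codec) { this.codec = codec; }
  byte[] readDictionary(byte[] payload) { return codec.decompress(payload); }
  byte[] readRecords(byte[] payload) { return codec.decompress(payload); }
}

final class DictionaryCodecDemo {
  public static void main(String[] args) {
    SketchCodec codec = new DeflateSketchCodec();
    byte[] dictionary = "red,green,blue".getBytes(StandardCharsets.UTF_8);
    byte[] onWire = codec.compress(dictionary);

    // A reader sharing the writer's codec recovers the dictionary bytes.
    byte[] decoded = new SketchReader(codec).readDictionary(onWire);
    System.out.println(Arrays.equals(dictionary, decoded));

    // A reader stuck on the no-op codec hands back the still-compressed bytes.
    byte[] raw = new SketchReader(new NoOpSketchCodec()).readDictionary(onWire);
    System.out.println(Arrays.equals(dictionary, raw));
  }
}
```

The point of the sketch is only that the dictionary path and the record path should resolve their codec from the same place; how Arrow wires this internally is not shown here.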

Are these changes tested?

Yes.

Are there any user-facing changes?

No

Closes: [Java] Dictionary decoding not using the compression factory from the ArrowReader #37841
GitHub Issue: [Java] Dictionary decoding not using the compression factory from the ArrowReader #37841

github-actions · 2023-10-20T04:43:49Z

⚠️ GitHub issue #37841 has been automatically assigned in GitHub to PR creator.

danepitkin

Great work, @vibhatha !

danepitkin · 2023-11-14T15:25:23Z

java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowWriter.java

Should these be marked private final and grouped with the other private final vars?

+1, way neater that way.

danepitkin · 2023-11-14T17:46:34Z

java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowWriter.java

Maybe add a helper function for creating a codec?

Can use getCodec() here, too!

danepitkin · 2023-11-14T18:15:25Z

...ression/src/test/java/org/apache/arrow/compression/TestArrowReaderWriterWithCompression.java

Optional: Somewhat unrelated to the issue, but should we parameterize the tests to use all the different types of compression? See TestCompressionCodec.java as an example.

danepitkin · 2023-11-14T18:23:19Z

...ression/src/test/java/org/apache/arrow/compression/TestArrowReaderWriterWithCompression.java

I think you can get rid of these in favor of this.allocator if you keep the @Before/@After functions

Yes, I agree. I refactored the code to use this approach.

danepitkin · 2023-11-14T18:25:33Z

...ression/src/test/java/org/apache/arrow/compression/TestArrowReaderWriterWithCompression.java

If the @Before/@After functionality isn't used for all @Tests, it might be better to move this to a helper function instead.

Good point. I will update.

Should we use @BeforeEach so that the allocator and root are reset for each test? It might make the tests slower, but I'm not sure whether it's better to have a fresh allocator for each test.

Good point. I think that is better and safe.

This is unaddressed.

Btw @lidavidm

I updated the JUnit annotations for consistency and compatibility. The @BeforeEach annotation from JUnit 5 was being used in conjunction with @Test from JUnit 4, causing the setup method not to run as expected before each test.

Furthermore, do we need to make the usage of JUnit consistent across tests?

Is this change okay?

danepitkin · 2023-11-14T18:36:31Z

...ression/src/test/java/org/apache/arrow/compression/TestArrowReaderWriterWithCompression.java

Do you think we still need the original test function testArrowFileZstdRoundTrip? Is this new test case possibly testing the same code path + dictionaries?

It is the same path + dictionaries. I refactored and reorganized the test cases; does that make sense, and is it useful? I think my previous code had duplication.

vibhatha · 2023-11-15T05:10:11Z

@danepitkin Thanks a lot for the review comments, I will address them.

danepitkin

Great work! The tests look very clean now. Left a few small additional comments.

danepitkin · 2023-11-21T19:26:12Z

java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowWriter.java

Can use getCodec() here, too!

...ression/src/test/java/org/apache/arrow/compression/TestArrowReaderWriterWithCompression.java

danepitkin · 2023-11-21T19:29:09Z

...ression/src/test/java/org/apache/arrow/compression/TestArrowReaderWriterWithCompression.java

Should we use @BeforeEach so that the allocator and root are reset for each test? It might make the tests slower, but I'm not sure whether it's better to have a fresh allocator for each test.

danepitkin · 2023-11-21T20:13:47Z

...ression/src/test/java/org/apache/arrow/compression/TestArrowReaderWriterWithCompression.java

optional nit: I think the readArrowFile() and readArrowStream() functions are not needed. While they reduce code duplication, I think they also decrease code readability. Personally, I like to quickly see what is asserted in the test functions themselves. I do like how you've separated out other functionality into their own functions like createAndWriteArrowFile().

I had doubts after the cleanup 😄
Let me update the PR.

vibhatha · 2023-11-22T08:09:55Z

@danepitkin I am not sure if this one is practical though 🤔

danepitkin · 2023-11-22T15:58:51Z

@danepitkin I am not sure if this one is practical though 🤔

I'm okay with leaving it as-is if there are issues with using @BeforeEach instead of @Before.

Overall, LGTM! I think it's ready for final review from a committer. Excellent job.

vibhatha · 2023-11-22T23:39:21Z

@lidavidm appreciate your feedback.

...ression/src/test/java/org/apache/arrow/compression/TestArrowReaderWriterWithCompression.java

lidavidm · 2023-11-27T18:26:00Z

java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowWriter.java

It seems we should be able to initialize the codec once and reuse it in the constructor, rather than add all the new fields?

@lidavidm an issue was filed here: https://github.com/apache/arrow/issues/39222
Also, I can work on this.

Again: why did we pull this out? It's called only once. Why are we adding a bunch of new fields? We don't use them.
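One way to read the suggestion above is sketched below: the constructor resolves the codec once from its arguments and keeps only the result, so the factory, type, and level never become fields. This is a minimal sketch under stated assumptions; `DemoCodec`, `DemoCodecFactory`, `DemoWriter`, and the `(type, level)` factory signature are hypothetical names chosen for illustration and do not reproduce the ArrowWriter code under review.

```java
import java.util.Optional;

/** Hypothetical codec abstraction; not Arrow's CompressionCodec. */
interface DemoCodec {
  String name();
}

/** Hypothetical factory; the (type, level) signature is an assumption. */
interface DemoCodecFactory {
  DemoCodec create(String type, Optional<Integer> level);
}

/** Hypothetical writer that resolves its codec once, in the constructor. */
final class DemoWriter {
  // The resolved codec is the only compression-related state that is kept.
  private final DemoCodec codec;

  DemoWriter(DemoCodecFactory factory, String type, Optional<Integer> level) {
    // factory, type and level stay constructor-local; none becomes a field.
    this.codec = factory.create(type, level);
  }

  /** Dictionary payloads go through the shared codec. */
  String writeDictionary(String payload) {
    return codec.name() + ":" + payload;
  }

  /** Record batch payloads go through the same codec instance. */
  String writeRecordBatch(String payload) {
    return codec.name() + ":" + payload;
  }
}

final class CodecReuseDemo {
  public static void main(String[] args) {
    // Toy factory: the codec's name encodes the type and, if present, the level.
    DemoCodecFactory factory =
        (type, level) -> () -> type + level.map(l -> "@" + l).orElse("");
    DemoWriter writer = new DemoWriter(factory, "zstd", Optional.of(3));
    System.out.println(writer.writeDictionary("dict"));
    System.out.println(writer.writeRecordBatch("batch"));
  }
}
```

Holding a single `final` codec keeps the class's state small and makes it obvious that dictionaries and record batches cannot drift onto different codecs.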

vibhatha · 2023-12-13T11:57:00Z

@lidavidm I updated the PR. Appreciate another round of reviews.

lidavidm

@vibhatha can you file a followup to add a new integration test to cover this scenario?

lidavidm · 2023-12-13T14:17:20Z

java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowFileWriter.java

Isn't this removing a public constructor?

@lidavidm for my clarification.

Maybe I have misunderstood your comment here.

I thought it would be cleaner to pass the CompressionCodec itself rather than all the other parameters used to construct it.

Did you mean something else?

You can delegate constructors without removing public ones (which breaks API).

@lidavidm does the updated change make sense?
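The constructor-delegation point made in this thread can be sketched as follows: the existing public constructor stays in place and forwards to a new overload, so callers compiled against the old signature keep working. `DemoFileWriter` and its parameters are hypothetical and only illustrate the pattern; they are not the actual ArrowFileWriter signatures.

```java
import java.util.Objects;

/** Hypothetical writer; class and parameter names are illustrative, not Arrow's. */
final class DemoFileWriter {
  private final String target;
  private final String codecName;

  /** Pre-existing public constructor, kept so that current callers still compile. */
  public DemoFileWriter(String target) {
    this(target, "none"); // delegate with the old default: no compression
  }

  /** New overload that carries the extra compression argument. */
  public DemoFileWriter(String target, String codecName) {
    this.target = Objects.requireNonNull(target, "target");
    this.codecName = Objects.requireNonNull(codecName, "codecName");
  }

  String target() {
    return target;
  }

  String codecName() {
    return codecName;
  }
}

final class ConstructorDelegationDemo {
  public static void main(String[] args) {
    // Old call site: unchanged, and it still gets the uncompressed default.
    DemoFileWriter legacy = new DemoFileWriter("out.arrow");
    // New call site: opts in to compression through the added overload.
    DemoFileWriter compressed = new DemoFileWriter("out.arrow", "zstd");
    System.out.println(legacy.codecName());
    System.out.println(compressed.codecName());
  }
}
```

All initialization lives in the widest constructor, so adding a parameter later means adding one more overload rather than changing or removing an existing one.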

java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowStreamWriter.java

java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowWriter.java

lidavidm · 2023-12-13T14:19:10Z

...ression/src/test/java/org/apache/arrow/compression/TestArrowReaderWriterWithCompression.java

This is unaddressed.

vibhatha · 2024-02-01T15:42:33Z

@github-actions crossbow submit java

github-actions · 2024-02-01T15:45:12Z

Revision: 907195a

Submitted crossbow builds: ursacomputing/crossbow @ actions-27dab1a4d2

Task	Status
java-jars
verify-rc-source-java-linux-almalinux-8-amd64
verify-rc-source-java-linux-conda-latest-amd64
verify-rc-source-java-linux-ubuntu-20.04-amd64
verify-rc-source-java-linux-ubuntu-22.04-amd64
verify-rc-source-java-macos-amd64

conbench-apache-arrow · 2024-02-02T10:34:13Z

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit f9b7ac2.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

…factory from the ArrowReader (apache#38371)

### Rationale for this change

This PR addresses apache#37841.

### What changes are included in this PR?

Adding compression-based write and read for Dictionary data.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No

* Closes: apache#37841

Lead-authored-by: Vibhatha Lakmal Abeykoon <vibhatha@gmail.com>
Co-authored-by: vibhatha <vibhatha@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>

github-actions bot added the Component: Java and awaiting review labels Oct 20, 2023

vibhatha marked this pull request as ready for review November 1, 2023 08:16

vibhatha requested a review from lidavidm as a code owner November 1, 2023 08:16

vibhatha marked this pull request as draft November 1, 2023 08:16

vibhatha marked this pull request as ready for review November 14, 2023 16:58

danepitkin reviewed Nov 14, 2023

github-actions bot added the awaiting committer review label and removed the awaiting review label Nov 14, 2023

danepitkin reviewed Nov 21, 2023

lidavidm reviewed Nov 27, 2023

github-actions bot added the awaiting changes label and removed the awaiting committer review label Nov 27, 2023

vibhatha force-pushed the gh-37841 branch from 82484f4 to f62277c Compare December 13, 2023 09:41

github-actions bot added the awaiting change review label and removed the awaiting changes label Dec 13, 2023

vibhatha requested a review from lidavidm December 13, 2023 11:16

lidavidm reviewed Dec 13, 2023

github-actions bot added the awaiting review, awaiting changes, and awaiting committer review labels and removed the awaiting change review, awaiting review, and awaiting changes labels Dec 13, 2023

vibhatha added 7 commits February 1, 2024 17:00

fix: adding a method to get codecs

796f62d

fix: address reviews

90c01aa

fix: addressing reviews

253daef

Revert "fix: addressing reviews"

f4615e1

This reverts commit 4eb1836ab3bb5a5170fb1e1804cd3cbd71c81a20.

fix: reviews v2

3bce128

fix: reviews v2

d09533c

feat: minor change

184d827

vibhatha force-pushed the gh-37841 branch from 81ec17b to 184d827 Compare February 1, 2024 11:31

github-actions bot added the awaiting changes label and removed the awaiting merge label Feb 1, 2024

fix: address reviews

907195a

github-actions bot added the awaiting change review label and removed the awaiting changes label Feb 1, 2024

github-actions bot added the awaiting changes label and removed the awaiting change review label Feb 1, 2024

fix: address reviews v2

8f482da

github-actions bot added the awaiting change review label and removed the awaiting changes label Feb 1, 2024

lidavidm approved these changes Feb 1, 2024

github-actions bot added the awaiting merge label and removed the awaiting change review label Feb 1, 2024

lidavidm merged commit f9b7ac2 into apache:main Feb 1, 2024

lidavidm removed the awaiting merge label Feb 1, 2024
