GH-37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader #38371

lidavidm merged 12 commits into apache:main
Conversation
Should these be marked private final and grouped with the other private final vars?
+1, way neater that way.
Maybe add a helper function for creating a codec?
Can use getCodec() here, too!
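A shared helper along these lines might look like the following minimal sketch, so every call site builds the codec the same way instead of repeating factory lookups. The names here (CodecHolder, CodecFactory, createCodec) are illustrative stand-ins, not the actual Arrow classes:

```java
// Hypothetical sketch of a getCodec() helper that creates the codec once
// and reuses it. Type names are illustrative, not the real Arrow API.
public class CodecHolder {
    public interface Codec { String name(); }
    public interface CodecFactory { Codec createCodec(String codecType); }

    private final CodecFactory factory;
    private final String codecType;
    private Codec codec; // created lazily, then reused

    public CodecHolder(CodecFactory factory, String codecType) {
        this.factory = factory;
        this.codecType = codecType;
    }

    // Single place that knows how to build the codec.
    public Codec getCodec() {
        if (codec == null) {
            codec = factory.createCodec(codecType);
        }
        return codec;
    }
}
```

With a helper like this, every place that previously constructed a codec inline would just call getCodec().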
Optional: Somewhat unrelated to the issue, but should we parameterize the tests to use all the different types of compression? See TestCompressionCodec.java as an example.
I think you can get rid of these in favor of this.allocator if you keep the @Before/@After functions
Yes, I agree. I refactored the code to use this approach.
If the @Before/@After functionality isn't used for all @Tests, it might be better to move this to a helper function instead.
Good point. I will update.
Should we use @BeforeEach so that the allocator and root is reset for each test? It might make the tests slower, but not sure if it's better to have a fresh allocator for each test.
Good point. I think that is better and safe.
Btw @lidavidm
I updated the JUnit annotations for consistency and compatibility. The @BeforeEach annotation from JUnit 5 was being used together with @Test from JUnit 4, causing the setup method not to run before each test as expected.
Also, should we make the usage of JUnit consistent across the tests?
Is this change okay?
Do you think we still need the original test function testArrowFileZstdRoundTrip? Is this new test case possibly testing the same code path + dictionaries?
It is the same code path plus dictionaries. I refactored and reorganized the test cases; does that seem useful? I think my previous code had some duplication.
@danepitkin Thanks a lot for the review comments, I will address them.
danepitkin
left a comment
Great work! The tests look very clean now. Left a few small additional comments.
optional nit: I think the readArrowFile() and readArrowStream() functions are not needed. While they reduce code duplication, I think they also decrease readability. Personally, I like to quickly see what is asserted in the test functions themselves. I do like how you've separated out other functionality into their own functions like createAndWriteArrowFile().
I had doubts after the cleanup 😄
Let me update the PR.
@danepitkin I am not sure if this one is practical though 🤔
I'm okay with leaving it as-is if there are issues with using it. Overall, LGTM! I think it's ready for final review from a committer. Excellent job.
@lidavidm appreciate your feedback.
It seems we should be able to initialize the codec once in the constructor and reuse it, rather than adding all the new fields?
@lidavidm an issue was filed here: https://github.com/apache/arrow/issues/39222
Also, I can work on this.
Again: why did we pull this out? It's called only once. Why are we adding a bunch of new fields? We don't use them.
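The single-field approach suggested above could be sketched roughly like this: resolve the codec once in the constructor and keep one final field, instead of storing the factory, the codec type, and other intermediate state separately. Class and method names are hypothetical, not the actual Arrow signatures:

```java
// Hypothetical sketch: one final codec field, initialized in the constructor.
// Names are illustrative, not the real Arrow classes.
public class DictionaryWriter {
    public interface Codec { byte[] compress(byte[] data); }
    public interface CodecFactory { Codec createCodec(String codecType); }

    private final Codec codec; // the only compression-related field kept

    public DictionaryWriter(CodecFactory factory, String codecType) {
        // Resolve the codec once, here; callers never see the factory again.
        this.codec = factory.createCodec(codecType);
    }

    public byte[] writeDictionary(byte[] data) {
        return codec.compress(data);
    }
}
```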
@lidavidm I updated the PR. Appreciate another round of reviews.
Isn't this removing a public constructor?
You can delegate constructors without removing the public ones (removing them breaks the API).
@lidavidm does the updated change make sense?
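Constructor delegation as described above keeps the old public signature intact while routing it through a new, more general constructor. A minimal, self-contained sketch; the class, fields, and parameters are illustrative, not the real ArrowStreamWriter signatures:

```java
// Hypothetical sketch of delegating an existing public constructor to a new
// one, so no public API is removed. Names are illustrative.
public class StreamWriter {
    public enum Option { DEFAULT, COMPRESSED }

    private final String sink;
    private final Option option;

    // Existing public constructor: kept, so current callers do not break.
    public StreamWriter(String sink) {
        this(sink, Option.DEFAULT); // delegate instead of duplicating logic
    }

    // New constructor carrying the extra parameter.
    public StreamWriter(String sink, Option option) {
        this.sink = sink;
        this.option = option;
    }

    public Option option() { return option; }
}
```

Old call sites keep compiling against the one-argument constructor, and new code opts into the extra parameter explicitly.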
This reverts commit 4eb1836ab3bb5a5170fb1e1804cd3cbd71c81a20.
@github-actions crossbow submit java
Revision: 907195a

Submitted crossbow builds: ursacomputing/crossbow @ actions-27dab1a4d2
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit f9b7ac2. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.
GH-37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader (apache#38371)

* Closes: apache#37841

Lead-authored-by: Vibhatha Lakmal Abeykoon <vibhatha@gmail.com>
Co-authored-by: vibhatha <vibhatha@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Rationale for this change
This PR addresses #37841.
What changes are included in this PR?
Adding compression-based write and read for Dictionary data.
Are these changes tested?
Yes.
Are there any user-facing changes?
No