-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-5861: [Java] Initial implement to convert Avro record with primitive types #4812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## master #4812 +/- ##
==========================================
+ Coverage 87.42% 89.58% +2.16%
==========================================
Files 994 661 -333
Lines 140102 96617 -43485
Branches 1418 0 -1418
==========================================
- Hits 122481 86557 -35924
+ Misses 17259 10060 -7199
+ Partials 362 0 -362Continue to review full report at Codecov.
|
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrow.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/consumers/AvroStringConsumer.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/test/java/org/apache/arrow/AvroToArrowTest.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrow.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
|
|
||
| allocateVectors(root, DEFAULT_BUFFER_SIZE); | ||
|
|
||
| List<Consumer> consumers = createAvroConsumers(root); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like we might need to understand more about decoders and the order they are expected to [decode(https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/io/ValidatingDecoder.html) in. This can also be for a follow-up PR
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public void consume(Decoder decoder) throws IOException { | ||
| VarBinaryHolder holder = new VarBinaryHolder(); | ||
| ByteBuffer byteBuffer = decoder.readBytes(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably keep a cached byte buffer for values below a certain size.
Even better would be if we could create a ByteBuffer that wraps the Vector and puts bytes directly where they need to go. This can be done in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we address the first part of my comment (caching a ByteBuffer instead of creating a new one each time)?
Otherwise LGTM as start. @tianchen92
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will create a new JIRA to address this, thanks!
emkornfield
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this looks a lot closer to what I was expecting. It still seems like there might be some additional details we need to learn about the AVRO API but I think this can be done as a follow-up
Many thanks for your feedback :), revise would be provided later. |
|
@emkornfield Hi, Micah, could we get this PR merged? thanks :) And in the follow PR, I resolve most the comments and still have few questions, would you mind explaining a lot? thanks a lot! |
| import org.apache.avro.Schema.Type; | ||
| import org.apache.avro.io.Decoder; | ||
|
|
||
| import sun.reflect.generics.reflectiveObjects.NotImplementedException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry just noticed this, we should be using https://docs.oracle.com/javase/6/docs/api/java/lang/UnsupportedOperationException.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks! hope could merge soon then I could start follow-up works
|
An adapter for CSV would probably be nice as well. I don't know if there is an efficient CSV parser in java that we could potentially already leverage (or if we want bindings to the C++ version). might be worth an e-mail to the mailing list to get opinions. |
Maybe https://commons.apache.org/proper/commons-csv/ is a good choice. |
Seems reasonable to me. Please ask on the mailing list in case anyone else has input. |
Thanks for your effort, I will start discuss later :) |
|
@emkornfield Build passed :) |
|
+1, thank you. |
…itive types Related to [ARROW-5861](https://issues.apache.org/jira/browse/ARROW-5861). Initial implement to support convert Avro record with primitive types to Arrow objects. Author: tianchen <niki.lj@alibaba-inc.com> Closes #4812 from tianchen92/ARROW-5861 and squashes the following commits: 2439478 <tianchen> use UnsupportedOperationException fa3f39a <tianchen> resolve comments 7c3a730 <tianchen> add consumers and use GenericDatumReader 61d2dac <tianchen> fix style 54479c8 <tianchen> Initial implement to convert Avro record with primitive types
…itive types Related to [ARROW-5861](https://issues.apache.org/jira/browse/ARROW-5861). Initial implement to support convert Avro record with primitive types to Arrow objects. Author: tianchen <niki.lj@alibaba-inc.com> Closes apache#4812 from tianchen92/ARROW-5861 and squashes the following commits: 2439478 <tianchen> use UnsupportedOperationException fa3f39a <tianchen> resolve comments 7c3a730 <tianchen> add consumers and use GenericDatumReader 61d2dac <tianchen> fix style 54479c8 <tianchen> Initial implement to convert Avro record with primitive types
Related to ARROW-5861.
Initial implement to support convert Avro record with primitive types to Arrow objects.