-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-1497: [Java] Fix JsonReader to initialize count correctly #1067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| try (VectorSchemaRoot rootFromJson = reader.read();) { | ||
| validateUnionData(count, rootFromJson); | ||
| Validator.compareVectorSchemaRoot(root, rootFromJson); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would fail without the JsonFileReader change
| } | ||
|
|
||
| /* | ||
| * TODO: This method doesn't load some vectors correctly. For instance, it doesn't initialize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can open a follow up Jira for this.
| throw new IllegalArgumentException("Expected field " + field.getName() + " but got " + name); | ||
| } | ||
| int count = readNextField("count", Integer.class); | ||
| vector.allocateNew(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not very familiar with when to call the allocateNew() method. Why is it necessary to set the value count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allocateNew allocates the buffer for the vector.
setValueCount sets the number of values in the vector.
We need to call setValueCount to correctly create the vector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should set lastSet before setting the value count else setValueCount() will corrupt the vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately it's hard to do it here unless we do some crazy instance matching here, I don't want to do that because it's too hard to maintain.
The correct way I think is to use the proper loadFieldBuffers in this class which initialize the vectors correctly and set things like lastSet( The TODO above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, sounds good to me. I think we need lastSet only for VarChar, VarBinary and ListVector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not too sure, but at least, + their Nullable version.
|
|
||
| VectorSchemaRoot root = new VectorSchemaRoot(parent.getChild("root")); | ||
| validateUnionData(count, root); | ||
| try (VectorSchemaRoot root = new VectorSchemaRoot(parent.getChild("root"))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why changing to a nested try blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I need both VectorSchemaRoot to compare equality.
|
@siddharthteotia can you take a look at this? @icexelloss can you change the PR title to start with "ARROW-1497:" (remove the brackets). thanks! |
|
+1 |
| } | ||
| int count = readNextField("count", Integer.class); | ||
| vector.allocateNew(); | ||
| vector.getMutator().setValueCount(count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it work to do the following?
- vector.setInitialCapacity(count)
- vector.allocateNew()
- after all inner vectors are read, then vector.getMutator().setValueCount(count)
Doing this, you should be able to clean up all those similar calls in the inner vector loop too. I'm also used to setValueCount being called after values are populated, not sure if that makes a difference though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, sorry I commented too late after the merge. I can look into this at another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we create a follow up JIRA about this? thanks!
Author: Li Jin <ice.xelloss@gmail.com> Closes apache#1067 from icexelloss/json-reader-ARROW-1497 and squashes the following commits: 6d4e1df [Li Jin] Fix JsonReader to read union vectors correctly
No description provided.