Skip to content

Conversation

@icexelloss
Copy link
Contributor

No description provided.

@icexelloss icexelloss changed the title [ARROW-1497][Java] Fix JsonReader to initialize counts correctly [ARROW-1497][Java] Fix JsonReader to initialize count correctly Sep 8, 2017

try (VectorSchemaRoot rootFromJson = reader.read();) {
validateUnionData(count, rootFromJson);
Validator.compareVectorSchemaRoot(root, rootFromJson);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would fail without the JsonFileReader change

}

/*
* TODO: This method doesn't load some vectors correctly. For instance, it doesn't initialize
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can open a follow up Jira for this.

@icexelloss
Copy link
Contributor Author

cc @siddharthteotia @alphalfalfa

throw new IllegalArgumentException("Expected field " + field.getName() + " but got " + name);
}
int count = readNextField("count", Integer.class);
vector.allocateNew();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not very familiar with when to call the allocateNew() method. Why is it necessary to set the value count?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allocateNew allocates the buffer for the vector.
setValueCount sets the number of values in the vector.

We need to call setValueCount to correctly create the vector

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should set lastSet before setting the value count else setValueCount() will corrupt the vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it's hard to do it here unless we do some crazy instance matching here, I don't want to do that because it's too hard to maintain.

The correct way I think is to use the proper loadFieldBuffers in this class which initialize the vectors correctly and set things like lastSet( The TODO above)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, sounds good to me. I think we need lastSet only for VarChar, VarBinary and ListVector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too sure, but at least, + their Nullable version.


VectorSchemaRoot root = new VectorSchemaRoot(parent.getChild("root"));
validateUnionData(count, root);
try (VectorSchemaRoot root = new VectorSchemaRoot(parent.getChild("root"))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why changing to a nested try blocks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I need both VectorSchemaRoot to compare equality.

@wesm
Copy link
Member

wesm commented Sep 20, 2017

@siddharthteotia can you take a look at this?

@icexelloss can you change the PR title to start with "ARROW-1497:" (remove the brackets). thanks!

@icexelloss icexelloss changed the title [ARROW-1497][Java] Fix JsonReader to initialize count correctly ARROW-1497: [Java] Fix JsonReader to initialize count correctly Sep 20, 2017
@siddharthteotia
Copy link
Contributor

+1

@asfgit asfgit closed this in 975f32b Sep 20, 2017
@wesm wesm deleted the json-reader-ARROW-1497 branch September 20, 2017 17:40
}
int count = readNextField("count", Integer.class);
vector.allocateNew();
vector.getMutator().setValueCount(count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to do the following?

  1. vector.setInitialCapacity(count)
  2. vector.allocateNew()
  3. after all inner vectors are read, then vector.getMutator().setValueCount(count)

Doing this, you should be able to clean up all those similar calls in the inner vector loop too. I'm also used to setValueCount being called after values are populated, not sure if that makes a difference though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry I commented too late after the merge. I can look into this at another time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create a follow up JIRA about this? thanks!

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
Author: Li Jin <ice.xelloss@gmail.com>

Closes apache#1067 from icexelloss/json-reader-ARROW-1497 and squashes the following commits:

6d4e1df [Li Jin] Fix JsonReader to read union vectors correctly
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants