ARROW-15705: [JavaScript] Allowing appending null on children in a StructBuilder#12451
ARROW-15705: [JavaScript] Allowing appending null on children in a StructBuilder#12451Alfred-Mountfield wants to merge 6 commits intoapache:masterfrom
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename pull request title in the following format? or See also: |
|
|
|
CC @domoritz |
trxcllnt
left a comment
There was a problem hiding this comment.
Left a small suggestion for how to do this cleaner.
The spec is ambiguous here, but I agree the spec should be updated to enforce this. We'll likely need to update the Map and SparseUnion builders too.
|
@Alfred-Mountfield looks good to me. Thanks! |
|
@Alfred-Mountfield I think we have the same issue in the UnionBuilder (but only when the type is |
@trxcllnt If it's the same fix then I'm happy to add the changes at some point tomorrow, otherwise we could just do it in the follow-up as you say |
|
With regards to changing it in the It's probably worth making another Jira issue for that and maybe removing the ambiguity in the specification as well. |
domoritz
left a comment
There was a problem hiding this comment.
Thank you for the pull request. Since you have a local test case, please add a unit test so we can ensure not to repeat this issue.
@domoritz The problem goes a bit deeper, as I've raised in a different Jira ticket, and I'm afraid I'm nowhere near familiar enough with the implementation within this repository to be able to contribute effectively at this point. I have found it difficult to understand the correct way of doing things with this package, as the docs are less beginner-friendly than the other language implementations, and the lack of examples exacerbates this. I'm open to trying to contribute further, but I do not know how much time I can allocate to learning the specifics of this implementation (hence me missing the purposes of the |
|
I totally understand. Thank you for sending the pull request and explaining your background. I can help with adding the test code if you can send a code snippet and describe the expectation before and after the fix. Would that work? |
|
That'd be great, thank you :) I believe the code snippet in that last Jira I sent should probably be about right. I believe the problem is that when a struct vector is made (either through the let vec = makeVector({
data: [null],
type: struct_field.type,
nullable: true,
});or let builder = new Builder({
type: struct_field.type,
nullValues: [null, undefined],
});
builder.append(null);
let vec = builder.toVector();Then the Where the underlying memory layout will have a null-bitmap buffer of As the code-snippet (and StackBlitz) shows, all three of those methods seem to have differing values and I'm not really sure why. It's possible that those three values are not adequate for testing that it is correct though, we might need to check that the underlying buffers are initialised for more complex field types on the struct, etc. |
|
I'm having a hard time finding the time right now. If someone else can write the test case, that would be great. Otherwise I'll do it eventually. |
|
I agree that the behavior right now is odd. My replication is at https://stackblitz.com/edit/node-sep8dg?file=index.js. I seem to be getting the same results in this branch, though. Can you show me a code snippet that produces different results @Alfred-Mountfield? |
I apologise, I seem to have conflated two issues and accidentally implied that this fixes both. That's what I meant by:
The fix I've pushed in this branch stops an error when reading the Array back in Rust. I believe there's an assert within the rust-rs code that checks the children's lengths are equal to the Struct node itself. I hadn't checked to see if it actually changed the values of the latter snippet, but I'm sad to hear it doesn't :( Seems like there's another underlying issue. I happened to notice the discrepancies with the Builder and makeVector methods when trying to debug and check if we were building things incorrectly in the JavaScript side of our app. |
|
Okay. We can't use the rust code in the unit tests so we need an example that changes behaves differently in this branch. Can you write one for this pull request so we can make it into a unit test? |
Jira is a good place. Maybe https://issues.apache.org/jira/browse/ARROW-16039 already covers what you describe. And yes, we need more examples and would love help on that. @Alfred-Mountfield Can you make some time to wrap up this pull request so we can merge it before the next release? |
0a90ba2 to
5f7e2ac
Compare
|
Apologies @domoritz on such a late reply, this dropped off my radar as I completely moved away from Arrow related stuff for a while. I've tried recreating it with the following: describe ('StructBuilder', () => {
test('Appending Null', () => {
let list_field = new Field('n1', new FixedSizeList(2, new Field('item', new Float64())), true);
let struct_field = new Field('foo', new Struct([list_field]), true);
let builder = makeBuilder({
type: struct_field.type,
nullValues: [null, undefined],
});
builder.append(null);
console.log('Struct numChildren: ' + builder.numChildren);
console.log('Struct nullCount: ' + builder.nullCount);
console.log('Struct length: ' + builder.length);
console.log('Child numChildren: ' + builder.children[0].numChildren);
console.log('Child nullCount: ' + builder.children[0].nullCount);
console.log('Child length: ' + builder.children[0].length);
}
}When ran with the change made in this PR: and when ran without the change: As you can see, the null-count and length of the child is 0, whereas the spec seems to suggest the Children arrays should be the same length as the Struct array. I tried extending my testing with As we discussed above, if these changes need to be reflected in the Hopefully the simple test I gave above helps with testing those, specifically we're interested in checking that children of the builders correctly have their |
|
I added a test in #12859. Feel free to merge it into this pull request so we can merge this instead of mine. |
|
Apologies for yet another late reply, have been away from my computer for a week on holiday. I've merged your branch in @domoritz. |
|
Benchmark runs are scheduled for baseline = e0668e2 and contender = fae66cb. fae66cb is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |



When trying to add a null value to a StructBuilder, the code previously only modified the null-bitmap.
I believe this breaks the spec as it results in the child arrays being a different length to the Struct array (and the child arrays should have the null value appended to their values).