Skip to content

Conversation

@aucahuasi
Copy link
Contributor

@github-actions
Copy link

@aucahuasi
Copy link
Contributor Author

Tomorrow I'll work on the python and R bindings for this new function
cc @nealrichardson @edponce @ianmcook

@aucahuasi aucahuasi changed the title ARROW-12669:[C++][Python][R] Implement a new vector function: list_element ARROW-12669:[C++][R] Implement a new vector function: list_element Sep 14, 2021
@aucahuasi aucahuasi force-pushed the return-array-for-each-list-in-array branch from 91f1638 to b38ee7e Compare September 15, 2021 01:14
@aucahuasi aucahuasi marked this pull request as ready for review September 15, 2021 01:19
@aucahuasi aucahuasi changed the title ARROW-12669:[C++][R] Implement a new vector function: list_element ARROW-12669:[C++][Python] Implement a new vector function: list_element Sep 15, 2021
@lidavidm
Copy link
Member

Just a quick thought, why is this a vector kernel instead of a scalar one? (Though with one input it's maybe not a big difference.)

@lidavidm
Copy link
Member

Oh wait nevermind, the index here is a parameter and not an options.

@nealrichardson
Copy link
Member

why is this a vector kernel instead of a scalar one?

I think it has to be a scalar kernel if we want to be able to call it in a projection node, which is why we're doing this now--right?

@lidavidm
Copy link
Member

why is this a vector kernel instead of a scalar one?

I think it has to be a scalar kernel if we want to be able to call it in a projection node, which is why we're doing this now--right?

Ah, right - plus expressions aren't evaluatable unless they're scalar:

if (!expr.IsScalarExpression()) {
return Status::Invalid(
"ExecuteScalarExpression cannot Execute non-scalar expression ", expr.ToString());
}

I think most kernels like this pass the second argument via options, not as an argument. I was also debating whether to comment about this, but might as well: there's no need to generate kernels for different index types, especially if we just make the index an option instead of a parameter. (It'll also limit the number of special cases to account for in type coverage testing.)

@jorisvandenbossche
Copy link
Member

I think most kernels like this pass the second argument via options, not as an argument.

Why can't this be a scalar kernel while still having index as a second argument?
Having the index as argument keeps the flexibility to be able to get a different index of each row (I don't know if this would be an important functionality though, but there are probably use cases)

@lidavidm
Copy link
Member

I think most kernels like this pass the second argument via options, not as an argument.

Why can't this be a scalar kernel while still having index as a second argument?
Having the index as argument keeps the flexibility to be able to get a different index of each row (I don't know if this would be an important functionality though, but there are probably use cases)

Ah, you're right. So this should just be a scalar kernel, then.

@aucahuasi aucahuasi force-pushed the return-array-for-each-list-in-array branch 2 times, most recently from ea560fa to 954ca3e Compare September 16, 2021 01:55
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks good to me, I just left one note - the scalar kernel doesn't need to be templated on the list type, so you can save yourself some generated code there.

@aucahuasi aucahuasi force-pushed the return-array-for-each-list-in-array branch from 6201ebf to 2a207d5 Compare September 18, 2021 14:57
@ianmcook ianmcook changed the title ARROW-12669:[C++][Python] Implement a new vector function: list_element ARROW-12669:[C++][Python] Implement a new scalar function: list_element Sep 18, 2021
@aucahuasi aucahuasi force-pushed the return-array-for-each-list-in-array branch from 2a207d5 to 5b899d6 Compare September 20, 2021 17:01
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

add python test and C++ docs for the new function: list_element

convert list_element into a scalar function

minor changes

format

support scalar inputs for list_element, improve tests

minor changes

less generated code thanks to some template tricks

less generated code, again using template tricks
@aucahuasi aucahuasi force-pushed the return-array-for-each-list-in-array branch from 5b899d6 to 2369457 Compare September 20, 2021 18:10
Copy link
Contributor

@edponce edponce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for resolving this issue!

@lidavidm lidavidm closed this in bdb2f74 Sep 20, 2021
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
https://issues.apache.org/jira/browse/ARROW-12669

Closes apache#11159 from aucahuasi/return-array-for-each-list-in-array

Authored-by: Percy Camilo Triveño Aucahuasi <percy.camilo.ta@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants