-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-2515 [Python] Add DictionaryValue class, fixing bugs with nested dictionaries #1954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b48dad9 to
5f460a1
Compare
python/pyarrow/scalar.pxi
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a single index, so how about calling this index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was to try to be consistent with the naming of the indices and dictionary properties of the DictionaryArray class, but I agree it's unwieldy and should be changed to simpler names if possible. My first thought was to go with just index and value as well, but is there a potential clash with the index member of the parent class ArrayValue? I guess right now the inherited index is not visible (since it was declared in the Cython and not exposed as a property), but any chance this will change? It seems like all three of these could potentially have some use: index being the index into the indices array, indices_value the value in the indices array (which means it is also the index into the dictionary array), and then dictionary_value being the value in the dictionary array.
But then again I guess it's possible to think of the ArrayValue index field as an implementation detail which maybe should never be exposed in the API, since it doesn't really provide information about the value itself but rather about its origin. If it is ever exposed, I suppose it could be through a different name, maybe something deliberately unappealing, such as _index, and in that case there's no name clash to be worried about in introducing index and value as DictionaryValue properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes, that's a good point. Then I would suggest:
- keep the current
indexfor the index into the indices array - use
index_valuefor the value in the indices array - keep
dictionary_valuefor the value in the dictionary array
python/pyarrow/scalar.pxi
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps simply value?
Edit: Actually, no, see above :-)
|
Thanks for your contribution @blkerby ! In addition to the naming nits I posted, it would be nice to add a test for the new |
|
Excellent suggestion, I'd be happy to add tests for those! |
5f460a1 to
ddab1e0
Compare
|
Done! |
392be41 to
1e06963
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now.
|
Thanks for your contribution @blkerby ! |
This introduces a scalar value class DictionaryValue, which fixes a couple bugs involving dictionaries nested inside of ListArrays or inside of other DictionaryArrays. This also includes a new test, which failed previous to this commit but now passes. See https://issues.apache.org/jira/browse/ARROW-2515.
This is my first time contributing, so feedback would be most welcome.