Skip to content

Conversation

@blkerby
Copy link
Contributor

@blkerby blkerby commented Apr 26, 2018

This introduces a scalar value class DictionaryValue, which fixes a couple bugs involving dictionaries nested inside of ListArrays or inside of other DictionaryArrays. This also includes a new test, which failed previous to this commit but now passes. See https://issues.apache.org/jira/browse/ARROW-2515.

This is my first time contributing, so feedback would be most welcome.

@blkerby blkerby force-pushed the DictionaryValue branch 2 times, most recently from b48dad9 to 5f460a1 Compare April 26, 2018 23:13
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a single index, so how about calling this index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was to try to be consistent with the naming of the indices and dictionary properties of the DictionaryArray class, but I agree it's unwieldy and should be changed to simpler names if possible. My first thought was to go with just index and value as well, but is there a potential clash with the index member of the parent class ArrayValue? I guess right now the inherited index is not visible (since it was declared in the Cython and not exposed as a property), but any chance this will change? It seems like all three of these could potentially have some use: index being the index into the indices array, indices_value the value in the indices array (which means it is also the index into the dictionary array), and then dictionary_value being the value in the dictionary array.

But then again I guess it's possible to think of the ArrayValue index field as an implementation detail which maybe should never be exposed in the API, since it doesn't really provide information about the value itself but rather about its origin. If it is ever exposed, I suppose it could be through a different name, maybe something deliberately unappealing, such as _index, and in that case there's no name clash to be worried about in introducing index and value as DictionaryValue properties.

Copy link
Member

@pitrou pitrou Apr 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, that's a good point. Then I would suggest:

  • keep the current index for the index into the indices array
  • use index_value for the value in the indices array
  • keep dictionary_value for the value in the dictionary array

Copy link
Member

@pitrou pitrou Apr 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps simply value?

Edit: Actually, no, see above :-)

@pitrou
Copy link
Member

pitrou commented Apr 27, 2018

Thanks for your contribution @blkerby ! In addition to the naming nits I posted, it would be nice to add a test for the new DictionaryValue properties in test_scalars.py. Could you do that?

@blkerby
Copy link
Contributor Author

blkerby commented Apr 27, 2018

Excellent suggestion, I'd be happy to add tests for those!

@blkerby
Copy link
Contributor Author

blkerby commented Apr 29, 2018

Done!

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now.

@pitrou pitrou closed this in e8d45eb Apr 30, 2018
@pitrou
Copy link
Member

pitrou commented Apr 30, 2018

Thanks for your contribution @blkerby !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants