-
-
Notifications
You must be signed in to change notification settings - Fork 2k
Add DataInfo get_sortable_arrays to allow mixins as key columns in join #9340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I like the idea, but first a general comment: could we not more simply rely on the presence of |
|
The problem is if you have (e.g. as in the test) both |
|
Ah, yes, I see how it helps in that sense. I'll try to have a more detailed look later. |
66f5ed4 to
8e44e4a
Compare
astropy/table/operations.py
Outdated
| raise tme | ||
|
|
||
|
|
||
| def _get_join_sort_idxs(out_descrs, keys, len_left, len_right, left, right): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking again I think that out_descrs isn't needed, and len_left and len_right could be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
astropy/table/operations.py
Outdated
| sort_right = {} # sortable ndarray from right table | ||
|
|
||
| for key in keys: | ||
| # TODO: shape here seems out of place. Can a key column have a non-trivial shape? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shape is not actually used now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered about this - it seems it should be impossible to sort on anything but 1-dimensional columns.
bsipocz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It all looks and very clever to me. Take my approval with a pinch of salt, as I'm not sure how serious something need to be in here so I would spot it.
| # a new structured array that represents the lexical ordering of those | ||
| # key columns. This structured array is then argsort'ed. The trick here | ||
| # is that some columns (e.g. Time) may need to be expanded into multiple | ||
| # columns for ordering here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for these detailed comments, it really helps a lot now and will in the future, too
astropy/time/core.py
Outdated
| Return a list of arrays which can be lexically sorted to represent | ||
| the order of the parent column. | ||
| :return: list of ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is super nit picky, and don't need to be changed right now, but longer term it would be nice to use numpydoc format. If you still use pycharm, the docstring format can be changed in preferences-->tools-->python integrated tools
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, done. Thanks for the hint in setting this preference. I had this set before but I think I accidentally blew away my .idea directory at some point with git clean -fxd, so the docstring format got lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got bitten by that, too, now I only have git clean -dfx -e .idea -e notes in my history and stick to use it from there rather than typing it in :)
mhvk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very nice! Only small comments - the main one is your own one that the new sorting function's signature can be simplified.
It does seem to me that columns with non-trivial shape cannot be sorted without further information - in which case your new get_sortable_columns is just the right place to put that information. So, it would seem to make sense to insist on that. (From quick experiments, it seems argsort just takes only the first element)
astropy/table/column.py
Outdated
| For Column this is just the column itself. | ||
| :return: list of Column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like @bsipocz noted, should be something like
Returns
-------
arrays : list of Column
Though I think it is fine to just delete it too, since the title already states what it is.
astropy/table/operations.py
Outdated
| sort_right = {} # sortable ndarray from right table | ||
|
|
||
| for key in keys: | ||
| # TODO: shape here seems out of place. Can a key column have a non-trivial shape? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered about this - it seems it should be impossible to sort on anything but 1-dimensional columns.
astropy/table/operations.py
Outdated
|
|
||
| if len(left_sort_cols) != len(right_sort_cols): | ||
| # Should never happen because cols are screened beforehand for compatibility | ||
| raise ValueError('unexpected mismatch in sort cols lengths') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use RuntimeError for things that should never happen (still good to check).
Alternatively, I think it makes sense to use assert ..., "<reason>" here, so that it does not get executed with python -O.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with RuntimeError. I don't actually understand why you would want to not execute this test via python -O. I must just be missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matter much, since the test is far cheaper than the rest, but I think assert is reasonable since you write right about that this "Should never happen", i.e., you have already tested in earlier code and thus here you simply 'assert' that the earlier checks were done. (I like this use of assert - it asserts assumptions made in the code that should hold, but that, given that bugs do happen, may be wrong any way - I think when you optimize for speed, it is fine to go with the assumptions...)
astropy/table/operations.py
Outdated
| raise tme | ||
|
|
||
|
|
||
| def _get_join_sort_idxs(out_descrs, keys, len_left, len_right, left, right): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
taldcroft
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have addressed all comments.
mhvk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks OK now, nice! merging....
This commit removes documentation that has been made obsolete by pull requests astropy#9340 and astropy#11127.
Closes #7728
Should be used as the base for #9287.