-
-
Notifications
You must be signed in to change notification settings - Fork 2k
Allow for mixins in table grouping and unique #7712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi there @taldcroft 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃. Everything looks good from my point of view! 👍 If there are any issues with this message, please report them here. |
eteq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw a super-minor thing in the changelog that was easier to just fix myself (hence the commit I pushed up). Other than that LGTM!
astropy/table/groups.py
Outdated
| # Cannot generically use non-quantity mixins for key columns because of the current | ||
| # sorting / index implementation. | ||
| for col in table_keys.itercols(): | ||
| if has_info_class(col, MixinInfo) and not has_info_class(col, QuantityInfo): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the grouping should work for Time mixins, too now that sorting is working? At least in the example below sorting and thus grouping works nicely.
astropy/table/tests/test_groups.py
Outdated
| assert np.all(qt[name][[2]] == qtg.groups[1][name]) | ||
| assert np.all(qt[name][[0]] == qtg.groups[2][name]) | ||
|
|
||
| for key in ['tm', 'aw', 'nd', 'sc']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this exact same test actually seems to be working for 'tm' and 'aw' and 'nd', too, but I'm not confident for the latter two whether I see all the use cases and where/when they should really fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if this should fail, could you add another example where shorting is failing for these columns?
| assert c2.groups[2].pformat() == [' c ', '---', '3.0', '2.0', '1.0'] | ||
|
|
||
|
|
||
| def test_group_mixins(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also have a test for unique test for mixins in test_operations.py?
4a072ab to
3f20b7b
Compare
|
Thank you @taldcroft! Changelog and docs will need an update if @mhvk gives the green light :) |
| def get_sortable_cols(self): | ||
| """Get sortable/groupable cols for col which is the parent of this Info object. In the | ||
| case of SkyCoord subclasses we need to break it into columns for each | ||
| representation component along with any other array attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, the "along with any other array attributes" is no longer true. It just raises warnings now.
| """ | ||
| # Local import, not typically needed for SkyCoord | ||
| from ..table import Column | ||
| # from ..table.serialize import _represent_mixin_as_column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix.
| else: | ||
| # For now ignore non-component data attributes (e.g. array obstime | ||
| # or location since it gets tricky) with a warning. | ||
| warnings.warn('{} attribute being ignored in get_sortable_cols' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to test for this warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| raise ValueError('Input keys array length {0} does not match column length {1}' | ||
| .format(len(keys), len(column))) | ||
|
|
||
| # take advantage of table or column indices, if possible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that at this point keys had to be an ndarray, so all the code below dealing with a pre-made index was never being run. In fact get_index requires 2 args, not 1, so this code was just orphaned.
| Subset of the columns in the table argument | ||
| ''' | ||
| cols = set(table_copy.columns) | ||
| indices = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover code? indices not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I bumped into this the other day, too.
|
Travis failures look to be unrelated. |
astropy/table/groups.py
Outdated
|
|
||
| # If table_keys is a Table then ensure that all cols (including mixins) are | ||
| # in a form that can sorted with the code below. | ||
| if isinstance(table_keys, Table): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that the logic here will stomp on a Time column which is indexed, and prevent the optimization below of using the already-existing index to sort. So that needs to be fixed. Testing is slightly tricky because you get the same answer either way, just one is faster.
|
The pytest issues should now be fixed, so I've restarted travis. |
mhvk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@taldcroft - I'm on holidays so only had a brief look. But I like using info here very much! One small comment in-line.
A larger one is whether we can really not do with the represent_as_dict columns; for Time in particular, I think the current argsort was written (by me...) in a time where we couldn't be absolutely sure that day_frac had been run - but with day_frac applied, I do not see how jd1, jd2 could not be good enough for sorting. If that means that we could in fact use represent_as_dict, then perhaps this should be changed?
On the other hand, I guess that the sheer existence of this question suggests one should have a separate method, so that other mixin classes can define themselves how they are sorted. If so, the question is more whether it might make sense to sort by default on the standard columns (for SkyCoord, the concept of sorting is itself somewhat ill-defined; does one really want to sort by 'ra', 'dec', 'distance'?
But I think it is also fine to leave these worries for later - most important is that there now is a general scheme!
astropy/units/quantity.py
Outdated
|
|
||
| return out | ||
|
|
||
| def get_sortable_cols(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed, given that you define it below on the base info class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, this is a leftover from the original where the base method raised NotImplementedError and forced all classes to implement something (i.e. explicit opt-in). In the end I decided that just returning the parent column is an OK default and classes that don't work have to implement the right thing or else opt-out with NotImplementedError.
|
@mhvk - I was being a little conservative and wasn't sure we can always count on day_frac being run. I seem to recall from earlier versions that the ERFA routines could return jd2 in -0.5 to 0.5, so that's no good for sort. But if you can assure me that jd1 is always an int and jd2 is always the fractional remainder, than that's great and I'll update With that, then BTW, about "sorting" SkyCoord, the real need is just any arbitrary sort order that can then be grouped. Maybe |
|
@taldcroft - plan sounds good. After |
- Make a new info functionality to get_sortable_cols and use in grouping - Add support for Time as a group index key column - Add support for the rest of mixins; make support opt-out not opt-in - Minor cleanup and improve testing - Allow mixins in column group_by keys (and clean orphaned code) - More cleanup after reviewing diffs - Fix issue of ignoring available sort index - Simplify down to just _represent_mixins_as_columns
c7c7772 to
5c99f6a
Compare
|
It's been a roundabout journey, but this now uses The downside is that this is hardwired to have the grouping sort columns be the same as what comes out of serialization. For instance, if you have an ICRS SkyCoord with an obstime column, where several of the coordinates are the same RA, Dec but different obstime, these will all be unique. That might be annoying if you wanted to group them by object. (This is really a pathological case because whether obstime matters depends on the frame). If problems like these ever show up in the real world, I have left the original long-form version of this PR at the link below for reference. https://github.com/astropy/astropy/compare/master...taldcroft:table-group-mixins-customize?expand=1 |
mhvk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the simplicity of this!
| grouped_table : Table object with groups attr set accordingly | ||
| """ | ||
| from .table import Table | ||
| from .serialize import _represent_mixins_as_columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may consider exposing the function (but not in this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Pylint has started complaining to me about using a private method from another module...
|
Thank you @taldcroft! |
Closes #7060.