Skip to content

Conversation

@taldcroft
Copy link
Member

Closes #7060.

@astropy-bot
Copy link

astropy-bot bot commented Aug 6, 2018

Hi there @taldcroft 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃.

Everything looks good from my point of view! 👍

If there are any issues with this message, please report them here.

@taldcroft taldcroft added this to the v3.1 milestone Aug 6, 2018
@taldcroft taldcroft requested review from bsipocz and mhvk August 6, 2018 21:28
Copy link
Member

@eteq eteq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw a super-minor thing in the changelog that was easier to just fix myself (hence the commit I pushed up). Other than that LGTM!

# Cannot generically use non-quantity mixins for key columns because of the current
# sorting / index implementation.
for col in table_keys.itercols():
if has_info_class(col, MixinInfo) and not has_info_class(col, QuantityInfo):
Copy link
Member

@bsipocz bsipocz Aug 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the grouping should work for Time mixins, too now that sorting is working? At least in the example below sorting and thus grouping works nicely.

assert np.all(qt[name][[2]] == qtg.groups[1][name])
assert np.all(qt[name][[0]] == qtg.groups[2][name])

for key in ['tm', 'aw', 'nd', 'sc']:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this exact same test actually seems to be working for 'tm' and 'aw' and 'nd', too, but I'm not confident for the latter two whether I see all the use cases and where/when they should really fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if this should fail, could you add another example where shorting is failing for these columns?

assert c2.groups[2].pformat() == [' c ', '---', '3.0', '2.0', '1.0']


def test_group_mixins():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also have a test for unique test for mixins in test_operations.py?

@taldcroft taldcroft force-pushed the table-group-mixins branch from 4a072ab to 3f20b7b Compare August 7, 2018 19:11
@taldcroft
Copy link
Member Author

OK, pushed by comments from @bsipocz, I realized it is possible to implement general support for all mixins as key columns. This is basically a complete rewrite and now uses the Info class to provide a method that tells the grouping function what to do. Hopefully @mhvk will like it! 😄

@bsipocz
Copy link
Member

bsipocz commented Aug 7, 2018

Thank you @taldcroft!

Changelog and docs will need an update if @mhvk gives the green light :)

def get_sortable_cols(self):
"""Get sortable/groupable cols for col which is the parent of this Info object. In the
case of SkyCoord subclasses we need to break it into columns for each
representation component along with any other array attributes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, the "along with any other array attributes" is no longer true. It just raises warnings now.

"""
# Local import, not typically needed for SkyCoord
from ..table import Column
# from ..table.serialize import _represent_mixin_as_column
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix.

else:
# For now ignore non-component data attributes (e.g. array obstime
# or location since it gets tricky) with a warning.
warnings.warn('{} attribute being ignored in get_sortable_cols'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to test for this warning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

raise ValueError('Input keys array length {0} does not match column length {1}'
.format(len(keys), len(column)))

# take advantage of table or column indices, if possible
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that at this point keys had to be an ndarray, so all the code below dealing with a pre-made index was never being run. In fact get_index requires 2 args, not 1, so this code was just orphaned.

Subset of the columns in the table argument
'''
cols = set(table_copy.columns)
indices = set()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover code? indices not used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I bumped into this the other day, too.

@taldcroft
Copy link
Member Author

Travis failures look to be unrelated.


# If table_keys is a Table then ensure that all cols (including mixins) are
# in a form that can sorted with the code below.
if isinstance(table_keys, Table):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that the logic here will stomp on a Time column which is indexed, and prevent the optimization below of using the already-existing index to sort. So that needs to be fixed. Testing is slightly tricky because you get the same answer either way, just one is faster.

@bsipocz
Copy link
Member

bsipocz commented Aug 9, 2018

The pytest issues should now be fixed, so I've restarted travis.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taldcroft - I'm on holidays so only had a brief look. But I like using info here very much! One small comment in-line.

A larger one is whether we can really not do with the represent_as_dict columns; for Time in particular, I think the current argsort was written (by me...) in a time where we couldn't be absolutely sure that day_frac had been run - but with day_frac applied, I do not see how jd1, jd2 could not be good enough for sorting. If that means that we could in fact use represent_as_dict, then perhaps this should be changed?

On the other hand, I guess that the sheer existence of this question suggests one should have a separate method, so that other mixin classes can define themselves how they are sorted. If so, the question is more whether it might make sense to sort by default on the standard columns (for SkyCoord, the concept of sorting is itself somewhat ill-defined; does one really want to sort by 'ra', 'dec', 'distance'?

But I think it is also fine to leave these worries for later - most important is that there now is a general scheme!


return out

def get_sortable_cols(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed, given that you define it below on the base info class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this is a leftover from the original where the base method raised NotImplementedError and forced all classes to implement something (i.e. explicit opt-in). In the end I decided that just returning the parent column is an OK default and classes that don't work have to implement the right thing or else opt-out with NotImplementedError.

@taldcroft
Copy link
Member Author

@mhvk - I was being a little conservative and wasn't sure we can always count on day_frac being run. I seem to recall from earlier versions that the ERFA routines could return jd2 in -0.5 to 0.5, so that's no good for sort. But if you can assure me that jd1 is always an int and jd2 is always the fractional remainder, than that's great and I'll update get_sortable_cols for Time accordingly.

With that, then represent_as_dict might work as a default and tidy the code considerably. That depends on the context being jd1_jd2 for this operation. I'll give this a go and see if it works out. One needs to worry about the possibility that a mixin has some array-like meta data that will get into the serialization data but which is not something that should be used for grouping/sorting. That's where having a base get_sortable_cols() as a hook for that possibility might make sense. (But I generally dislike designing interfaces for possibilities where we don't actually have a concrete need in hand).

BTW, about "sorting" SkyCoord, the real need is just any arbitrary sort order that can then be grouped. Maybe get_groupable_cols is a better than? OTOH it occurs to me this technology could be applicable to Table.sort, allowing arbitrary mixins to be the keys there as well.

@mhvk
Copy link
Contributor

mhvk commented Aug 10, 2018

@taldcroft - plan sounds good. After day_frac, which I think we now really always run, jd2 is guaranteed to be between -0.5 and 0.5, so sorting should work. A test might be to use them directly in argsort and check that no tests fail.

- Make a new info functionality to get_sortable_cols and use in grouping
- Add support for Time as a group index key column
- Add support for the rest of mixins; make support opt-out not opt-in
- Minor cleanup and improve testing
- Allow mixins in column group_by keys (and clean orphaned code)
- More cleanup after reviewing diffs
- Fix issue of ignoring available sort index
- Simplify down to just _represent_mixins_as_columns
@taldcroft taldcroft changed the title Allow for mixins in table grouping Allow for mixins in table grouping and unique Aug 12, 2018
@taldcroft
Copy link
Member Author

It's been a roundabout journey, but this now uses _represent_mixins_as_columns to do all the heavy lifting, so now most of the PR is in new tests and minor logic improvements in grouping. The new Info method is not needed, simpler is better! I did rebase down to one commit to remove the journey.

The downside is that this is hardwired to have the grouping sort columns be the same as what comes out of serialization. For instance, if you have an ICRS SkyCoord with an obstime column, where several of the coordinates are the same RA, Dec but different obstime, these will all be unique. That might be annoying if you wanted to group them by object. (This is really a pathological case because whether obstime matters depends on the frame). If problems like these ever show up in the real world, I have left the original long-form version of this PR at the link below for reference.

https://github.com/astropy/astropy/compare/master...taldcroft:table-group-mixins-customize?expand=1

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the simplicity of this!

grouped_table : Table object with groups attr set accordingly
"""
from .table import Table
from .serialize import _represent_mixins_as_columns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may consider exposing the function (but not in this PR).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Pylint has started complaining to me about using a private method from another module...

@taldcroft taldcroft merged commit 932b0e6 into astropy:master Aug 12, 2018
@taldcroft taldcroft deleted the table-group-mixins branch August 12, 2018 17:00
@bsipocz
Copy link
Member

bsipocz commented Aug 12, 2018

Thank you @taldcroft!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants