Skip to content

Implement __eq__ and __ne__ for coordinate classes#10154

Merged
taldcroft merged 14 commits intoastropy:masterfrom
taldcroft:coord-equality
May 2, 2020
Merged

Implement __eq__ and __ne__ for coordinate classes#10154
taldcroft merged 14 commits intoastropy:masterfrom
taldcroft:coord-equality

Conversation

@taldcroft
Copy link
Member

@taldcroft taldcroft commented Apr 15, 2020

Description

This pull request implements a strict equality operator for coordinate classes SkyCoord, BaseFrame, BaseRepresentationOrDifferential, and BaseRepresentation.

Strict equality means that if any of the attributes don't match then an exception is raised.

The logic is inspired by #9857, having realized that this PR went most of the way to defining when these coordinate objects can be compared meaningfully using the representation data.

This will help in testing and bring consistency with the general numpy array interface. In 4.0 one gets silly things like this:

>>> sc = SkyCoord([1,2], [3,4], unit='deg')
>>> sc2 = SkyCoord([1,2], [3,4], unit='deg')
>>> sc == sc2
False  # Should be [True, True]
>>> sc2 = SkyCoord(1, 3, unit='deg')
>>> sc = SkyCoord(1, 3, unit='deg')
>>> sc == sc2
False  # Should be True

Fixes #3182

@taldcroft taldcroft marked this pull request as draft April 15, 2020 22:29
@taldcroft taldcroft requested a review from mhvk April 15, 2020 22:29
@mhvk
Copy link
Contributor

mhvk commented Apr 16, 2020

I do think we should have comparisons, but I wonder about just returning False if a comparison is not directly possible (this especially since numpy has been moving very much away from that). An alternative would be to go the Time route and define them as sc1.separation_3d(sc2) == 0 and (rep1 - rep2).norm() == 0. It might also make sense to at the same time define .isclose methods which do the same thing but with a more sensible default.

if self._data is None and value._data is None:
return True

if self.shape != value.shape:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independently of my larger comment, I would allow different shapes (but catch the possible broadcast error and return False), i.e., sc[0] == sc should be possible.

if isinstance(eq, bool):
return not eq
else:
return ~eq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, never appreciated that ~True equals -2!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That surprised me too!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what? how? why? (I could see it it were -1, but -2?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @jdavies-st explained it to me, ~ flips the bits and somehow True is treated as int when ~ is applied.

>>> np.binary_repr(1, width=8)                                                                                                      
'00000001'
>>> np.binary_repr(-2, width=8)                                                                                                     
'11111110'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't __ne__ automagically defined if __eq__ is defined in python3? Does __eq__ as defined above ever return something other than True or False?

Btw, I'm really happy to see __eq__ for strict equality implemented for SkyCoord. Will make testing much cleaner. 🎉

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the automagic __ne__ would be numpy-aware, which is necessary since it can return a numpy array of bool type. Also, SkyCoord already has a __ne__ method that comes from somewhere but I honestly don't know where.

In [7]: sc.__ne__??                                                                                                                          
Signature:      sc.__ne__(value, /)
Call signature: sc.__ne__(*args, **kwargs)
Type:           method-wrapper
String form:    <method-wrapper '__ne__' of SkyCoord object at 0x7f7f29204850>
Docstring:      Return self!=value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That makes sense.

In [13]: ~np.array([True, True, False])
Out[13]: array([False, False,  True])

Thanks. I'd forgot that ~ acting on numpy boolean arrays is the same as np.logical_not. 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder on np.logical_not. That works on both scalar and np.array values so I got rid of that ugly code.

@taldcroft
Copy link
Member Author

taldcroft commented Apr 16, 2020

I do think we should have comparisons, but I wonder about just returning False if a comparison is not directly possible (this especially since numpy has been moving very much away from that).

Agreed. I'll make that change. I wasn't entirely happy with False as well.

An alternative would be to go the Time route and define them as sc1.separation_3d(sc2) == 0 and (rep1 - rep2).norm() == 0.

I'm not sure how that helps in this case since we are stipulating that no transforms are allowed. So at best you can get a false positive where the representation values are extremely close but not equal, and after doing the separation math going down to a single float you wind up with exact equality. Also separations are just spatial and don't have visibility into velocities AFAIK.

define .isclose methods which do the same thing but with a more sensible default.

Good idea. Along with the first point (raise, not False), this might point to factoring out all or most of the validation checks, maybe to private _is_comparable methods?

@taldcroft
Copy link
Member Author

BTW, about isclose for coordinates, I started thinking about this and realized that it's not going to be so easy. Since a coordinate can have a distance and/or a velocity, which may be 2 or 3-d, it all gets a bit fuzzy to define a single metric for the "distance" between two coordinates. Someone would need to come up with a requirements spec for handling all the cases.

@taldcroft taldcroft changed the title Implement __eq__ and __ne__ for coordinate classes [skip ci] Implement __eq__ and __ne__ for coordinate classes Apr 16, 2020
@mhvk
Copy link
Contributor

mhvk commented Apr 17, 2020

Indeed, and people might want different definitions of closeness in distance than in angular coordinates. It is also a bit similar to Time in that relative tolerances are mostly irrelevant (well, maybe not for parallax). Maybe we should take this out of here to a new issue? Might help consistency with Time too.

@taldcroft taldcroft marked this pull request as ready for review April 17, 2020 17:08
@taldcroft taldcroft added this to the v4.1 milestone Apr 17, 2020
with pytest.raises(ValueError, match='unable to compare column c'):
t1.values_equal([1, 2])

with pytest.raises(TypeError, match='comparison for column sk'):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to fail, no more!

@taldcroft
Copy link
Member Author

Fingers crossed on CI tests (passes locally) but this should be mostly done and ready for review if anyone has the inclination.

BTW I'm not planning to work on isclose() for 4.1, other fish to fry!

@taldcroft
Copy link
Member Author

taldcroft commented Apr 17, 2020

So now I have some higher-level questions. It turns out that this change caused real test failures in WCS. At some level I think that the failures are good because they revealed code which could be improved. But also point to ways that this PR as it stands might cause breakage in the real world. The comparisons in WCS are:

elif frame == 'pixel':

if self.output_system == self.input_system:

Boiling down the issue:

  • Comparing a Frame or SkyCoord to something completely not comparable like a string. Return False or raise TypeError? I vote for raise.
  • Comparing a Frame instance with no data to a Frame class. Likewise I would say to raise an exception. I think this is an actual bug in the original code because if you supply a frame name as string it gets converted to a Frame class, while the rest of the code is insisting on a Frame instance. Also the original code is not working as intended because comparing Frame instances currently returns True only if the are the same object.

Other considerations:

  • Should FK4() == ICRS() raise an exception or return False? In this case with no data it gets a little trickier. Somehow in the case of no data then comparing makes sense.
  • That leads down a slippery slope of wondering if FK4([1]*u.deg,[2]*u.deg) == ICRS([1]*u.deg,[2]*u.deg) should likewise just return [False] (i.e. the broadcasted version of [False]). [EDIT, definitely won't do this]

@mhvk
Copy link
Contributor

mhvk commented Apr 17, 2020

Part of the issue is that python and numpy have different ideas about what __eq__ should do. Python insists it never fails, while numpy is moving to it being comparable to any ufunc and thus failing if, e.g., shapes do not match, etc.

Right now, in Time at least we're following numpy...

cc @eteq, @adrn - would be really good to get your input!

@mhvk
Copy link
Contributor

mhvk commented Apr 17, 2020

p.s. If we're not raising on unequal frames, I think in any case where we returning False just because frames do not match, it should always just be a plain False - any array form would suggest an actual transformation was done and coordinates did not match.

@taldcroft
Copy link
Member Author

@mhvk - the current version stopped short of the slippery slope and only will allow the comparison of unequal frames if they both have no data. I do feel like FK4() == ICRS() should be possible and might reduce some breakage in the wild. Note that with no data, the frame objects are not really analogous to numpy arrays.

In the current version one does still get a TypeError for FK4() == 'a string'. That seems good and stops code bugs.

And for SkyCoord you cannot create that without data, so it is pretty-much right in line with numpy and Time.

BTW, I have been seeing the various numpy deprecations for what seems like years. When is that going to finally happen?

@taldcroft
Copy link
Member Author

@astrofrog - this PR required making some small changes in visualizations/wcsaxes. Git blame says this code is yours so it would be nice if you can review them.

@taldcroft
Copy link
Member Author

Humm, Travis seems to be unhappy, twice in a row I got the "oldest dependencies" job running over an hour and finally quitting. The corresponding tox test runs locally just fine.

https://travis-ci.org/github/astropy/astropy/jobs/676417592

@pllim
Copy link
Member

pllim commented Apr 20, 2020

@taldcroft , the old deps job should be fixed once #10164 is merged. Just need approval...

@bsipocz
Copy link
Member

bsipocz commented Apr 20, 2020

The CI failure workaround has been merged, so a rebase should fix travis here.

Copy link
Member

@eteq eteq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I'm happy with this. While I think I originally was opposed to equality like this, time has shown the value of this strict definition requiring exact equality of the data. Some caveats/concerns I have (all of which I think are not major deals/blockers, except possibly the last one which requires agreement).

I think we need to show the user how to do a "tolerance" comparison (I made an inline suggestion you can just accept as-is if you agree, @taldcroft) as early as possible. That was the original reason for disallowing this, to force them to encounter that, so I'm thinking we want to show it as prominently as practical.

Re: the discussion of False's vs the more numpy-like result of failing when comparisons are invalid: I think I like the behavior as it stands right now - I do see it as reasonably consistent with Time (which I think is what @mhvk was saying?) while still having some of the more practical comparisons like the FK4()==ICRS() case work is indeed a compromise but one I think that makes sense for the actual likely things people will do with this operator. So 👍 to that behavior as-is.

However, I noticed the following behavior, which seems rather counter-intuitive:

>>> sc1=coordinates.SkyCoord(ra=0*u.deg, dec=0*u.deg, distance=1*u.kpc)
>>> sc2=coordinates.SkyCoord(sc1.data.to_cartesian(), frame='icrs')
>>> sc1==sc2
True
>>> i1=coordinates.ICRS(ra=0*u.deg, dec=0*u.deg, distance=1*u.kpc)
>>> i2=coordinates.ICRS(i1.data.to_cartesian())
>>> i1==i2
TypeError: cannot compare: objects must have same class: SphericalRepresentation vs. CartesianRepresentation

This comes down to the fact that something in the SkyCoord initializer I think is coercing the representation, which is fine but it hides the confusing behavior in the representations. This behavior becomes even more obvious as a confusion in the following:

>>> i1.data+i2.data-i2.data == i1.data 
True
>>> i2.data+i1.data-i2.data == i1.data 
TypeError: cannot compare: objects must have same class: CartesianRepresentation vs. SphericalRepresentation

Basically, for arithmetic we make the assumption that representations can swap types (basically they all convert to cartesian and then back if need be), but we don't allow that behavior in equality.

So I'd vote for modifying the representation equality behavior to either do direct comparison if the representations are the same (i.e., the behavior in this PR), but if they are not the same, to convert both to cartesian and also try to do the equality test there. Then the above cases I think should work (modulo possible floating-point... but at least they won't raise errors)

Copy link
Member

@adrn adrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this, apart from some minor documentation suggestions.

Do we have an issue to keep track of the idea of "close" coordinate matching? I do think it's something worth considering (despite API challenges related to the different units), so we may want to make an issue to keep track. We may even want to add a link to the issue into the documentation added here, suggesting that anyone wanting that functionality chime in to the discussion. Thanks @taldcroft for this - I do think this is an improvement over the existing behavior!

@taldcroft taldcroft force-pushed the coord-equality branch 2 times, most recently from 9b4bc49 to e7cacd0 Compare May 1, 2020 21:28
@taldcroft
Copy link
Member Author

OK all thanks for the review and feedback, this got to a much better place. As @adrn has approved I will merge if this can get through CI before another changelog merge conflict.

@pllim
Copy link
Member

pllim commented May 4, 2020

I think this broke a test... Please see #10297

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve frame comparison

8 participants