
Allow longdouble values through input validation #9368

Merged
mhvk merged 2 commits into astropy:master from aarchiba:flexible-time-input
Oct 23, 2019

Conversation

@aarchiba
Contributor

Description

This pull request is to make it possible for third-party code to take higher-precision floating-point values as input, in order to more easily work with the extended precision available from astropy Time objects. This would be relevant to #9361 for example.

Fixes #9341

@astropy-bot astropy-bot bot added the time label Oct 15, 2019
@pllim pllim requested review from mhvk and taldcroft October 15, 2019 15:19
Contributor

@mhvk mhvk left a comment


Thanks! Some smaller comments, but also a general one: ideally, one could just put in higher-precision floats and it would "just work". I think this may be possible by changing day_frac to special-case higher precision, e.g. by adjusting two_sum with some calls that would be no-ops for regular float (NOTE: NOT TESTED!)

def two_sum(a, b):
    x = np.asanyarray(a + b, float)
    eb = x - a  # bvirtual in Shewchuk
    ea = x - eb  # avirtual in Shewchuk
    eb = b - eb  # broundoff in Shewchuk
    ea = a - ea  # aroundoff in Shewchuk
    return x, (ea + eb).astype(float, copy=False)

It may be safer to just pre-condition the arguments, a bit like done in quantity_day_frac, doing a careful divmod(val[12], 1) before adding up the values and residuals.
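The preconditioning idea could look something like the following minimal sketch (my own illustration, not astropy code; `precondition` is a hypothetical name). Splitting off the integer part with divmod before downcasting means the integer part survives the float64 cast exactly (up to 2**53), and only the sub-unit fraction is exposed to rounding:

```python
import numpy as np

def precondition(val):
    # Hypothetical helper, not astropy code: split a (possibly
    # higher-precision) value into an exact integer part and a small
    # fractional remainder before downcasting to float64.  The integer
    # part is exactly representable; only the |frac| < 1 remainder
    # suffers float64 rounding.
    day, frac = np.divmod(val, 1)
    return np.float64(day), np.float64(frac)

day, frac = precondition(np.longdouble("58000.12345678901234567"))
```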

@pytest.mark.parametrize("f", ["mjd", "unix", "cxcsec"])
def test_existing_types_refuse_longdoubles(f):
    with pytest.raises(ValueError):
        Time(np.longdouble(58000), format=f)
Contributor

Not sure we'd want these tests to fail!

Contributor Author

Ah. Well. Hmm. I want precision loss to require explicit conversion, and none of these types preserve long double precision (yet?).

Contributor

I think all of them start with day_frac (in TimeFromEpoch.set_jds) so they might all support it if that routine is adjusted.

Contributor Author

The tests now all permit either outright rejection of long doubles or acceptance provided that the precision is preserved.

Member

I'm not sure about precision loss requiring explicit conversion (else raising an unavoidable exception). This will outright break existing code that happens to be using long doubles, and I'm almost always against breaking code without a strong reason.

In general I prefer to be pragmatic about things like this, letting users be responsible for decisions about data types and, when it matters, expecting them to have some cognizance of the implementation details. The second paragraph of the Time documentation starts with "All time manipulations and arithmetic operations are done internally using two 64-bit floats to represent time." That is a pretty good starting point for someone to infer that providing a higher-precision input might not be helping.

But as a compromise it might be fine to issue a warning in the case of potential precision loss.

Contributor Author

Our code currently emits a blizzard of warnings even for perfectly normal operation, so in fact people grow used to ignoring warnings. This is seen with LaTeX as well - too many warnings and they are ignored.

What is the use case where users pass long doubles in to these functions and would prefer lost precision to exceptions? Recall that manual downconversion is as easy as applying float.

Contributor

If we are not fixing the problem here, then raising an exception seems reasonable.

Member

I think of exceptions as being for things that are definitely wrong or erroneous, and warnings for things that might lead to unexpected results. I don't think "some code or package issues a blizzard of warnings" is a good argument for ignoring the entire Python warning system.

If I do Time(np.longdouble(1), format='cxcsec'), there is no reason that should fail to execute. It is perfectly legal and gives the result that I expect. But I do think it is reasonable to issue a warning about precision, which an informed user can take into consideration.

It may be that somebody is already using long-double for some reason that we don't know about, and they understand what is happening, and they don't want their code to suddenly break for no good reason with astropy 4.0. I sometimes do things that might seem irrational or downright stupid to someone else, but we all have our reasons.

Contributor

Ah, convinced the other way again. Let's just agree to fix it!

Contributor Author

I agree on this - let's get something like #9361 merged and then there won't be much question left what to do with longdoubles because they will work.

@aarchiba
Contributor Author

> Thanks! Some smaller comments, but also a general one: ideally, one could just put in higher precision floats and it would "just work". I think it may be possible to do so by changing day_frac to special-case higher-precision - it may be possible to do so by adjusting two_sum with some calls that would be no-op for regular float (NOTE: NOT TESTED!)
>
> def two_sum(a, b):
>     x = np.asanyarray(a + b, float)
>     eb = x - a  # bvirtual in Shewchuk
>     ea = x - eb  # avirtual in Shewchuk
>     eb = b - eb  # broundoff in Shewchuk
>     ea = a - ea  # aroundoff in Shewchuk
>     return x, (ea + eb).astype(float, copy=False)
>
> It may be safer to just pre-condition the arguments, a bit like done in quantity_day_frac, doing a careful divmod(val[12], 1) before adding up the values and residuals.

This would absolutely call for testing with hypothesis, once we figured out exactly what the criterion was for successful computation.

Well, okay, I think two_sum should do exactly what it does regardless of the type it is passed: compute the sum to the maximum precision available with those types. But making day_frac operate appropriately would take some doing, especially if we want it to behave well when also passed Quantities. And definitely testing.

@mhvk
Contributor

mhvk commented Oct 15, 2019

Yes, makes sense to have two_sum be type-agnostic. There is already a quantity_day_frac so there is precedent... But, yes, substantial testing needed...

@aarchiba
Contributor Author

> Yes, makes sense to have two_sum be type-agnostic. There is already a quantity_day_frac so there is precedent... But, yes, substantial testing needed...

Sorry for the repetition, but my experience with hypothesis in getting the kinks out of PINT's numerical conversion code was extremely positive. It nailed down some difficult-to-spot bugs - including finding leap-second-day problems before I had even told it about leap second days - and programming it was easy (and had that delightful pieces-clicking-together feeling I get with Haskell).
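As a concrete illustration of the kind of property test being discussed (my own sketch, not PINT's or astropy's actual test suite), hypothesis can check two_sum's defining invariant, that the returned pair represents a + b exactly, using exact Fraction arithmetic as the oracle:

```python
from fractions import Fraction

from hypothesis import given
from hypothesis.strategies import floats

def two_sum(a, b):
    # Shewchuk's error-free transformation, float64-only in this sketch.
    x = a + b
    eb = x - a
    ea = x - eb
    eb = b - eb
    ea = a - ea
    return x, ea + eb

# Bound the magnitudes so a + b cannot overflow to infinity, where the
# error term would degenerate to nan.
finite = floats(min_value=-1e300, max_value=1e300)

@given(finite, finite)
def test_two_sum_is_exact(a, b):
    x, err = two_sum(a, b)
    # Fraction arithmetic is exact, so this checks x + err == a + b
    # with no rounding at all.
    assert Fraction(x) + Fraction(err) == Fraction(a) + Fraction(b)

test_two_sum_is_exact()  # calling the decorated function runs hypothesis
```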

@mhvk
Contributor

mhvk commented Oct 15, 2019

I'm all for trying hypothesis on Time - I think it is a much easier target than units.

@aarchiba
Contributor Author

Maybe this is off-topic for this PR, but I also found myself poking around to confirm that there is no data validation on the jd1 and jd2 values set in Time objects, and no validation of the types returned by .value - yet there are strict and hard-to-explain requirements on both. As a result, implementing custom time formats involves a lot of impenetrable error messages. I'd like to add validation on both ends: ensure that a Time object's jd1 and jd2 are always float64 arrays of the same shape no matter what a custom format tries to set them to, and ensure that .value always returns an array of that same shape (or a scalar, as appropriate); also that implementors of custom time formats get helpful error messages when their .value methods don't do this correctly. (It would be nice to be able to provide element-by-element conversion functions that would automatically be broadcast; these would be too slow for genuinely numeric arrays, but for object arrays they're no slower, and handling arrays of arbitrary shape by hand is awkward; see https://github.com/nanograv/PINT/blob/a4b33eae5fce21a6010e8665bfeecf9caec8cf96/src/pint/pulsar_mjd.py#L501 for the mess.)
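As a rough sketch of what such validation might look like (entirely hypothetical, names and all; not proposed astropy API), the setter-side check is just a dtype and shape comparison that fails with a readable message:

```python
import numpy as np

def validate_jd_pair(jd1, jd2):
    # Hypothetical validator, not astropy code: enforce the documented
    # but currently unchecked contract that jd1 and jd2 are float64
    # arrays of the same shape, with readable errors on violation.
    jd1 = np.asarray(jd1)
    jd2 = np.asarray(jd2)
    if jd1.dtype != np.float64 or jd2.dtype != np.float64:
        raise TypeError(
            f"jd1 and jd2 must be float64, got {jd1.dtype} and {jd2.dtype}")
    if jd1.shape != jd2.shape:
        raise ValueError(
            f"jd1 shape {jd1.shape} does not match jd2 shape {jd2.shape}")
    return jd1, jd2
```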

@aarchiba
Contributor Author

So uh.

Right now we quietly ignore it if someone passes a val2 when creating a Time object for a string format. That is, the user provides some information in val2, and astropy throws it away and doesn't tell them. I do not approve.

I added some ValueErrors in this case, only to find that astropy has a test for explicitly this behaviour! Astropy's tests demand that we silently throw away information explicitly provided by the user. What? Can I change that? Please?

@mhvk
Contributor

mhvk commented Oct 15, 2019

@aarchiba - I agree ignoring val2 rather than checking it is None as appropriate is bad, and have meant to address that myself. So, happy to change, but let's do that in a separate PR!

@taldcroft and I also discussed that somewhat of a design mistake is to have jd1 and jd2 on the TimeFormat class - it would make more sense to just let it do the transformations to and from input.

p.s. Your mjds_to_str would seem rather like Time._shaped_like_input...

@aarchiba
Contributor Author

> @aarchiba - I agree ignoring val2 rather than checking it is None as appropriate is bad, and have meant to address that myself. So, happy to change, but let's do that in a separate PR!

I'll pull it out. I have a bad habit of fixing minor problems as they fly past, which gets them fixed but makes my PRs messy.

> @taldcroft and I also discussed that somewhat of a design mistake is to have jd1 and jd2 on the TimeFormat class - it would make more sense to just let it do the transformations to and from input.

This does seem awkward. We are sort of stuck with it, though properties allow for some deviousness.

> p.s. Your mjds_to_str would seem rather like Time._shaped_like_input...

Ah, yes, probably. I've had to write similar code too many times now to get functions to behave like ufuncs in terms of correctly maintaining array-ness (scalar->scalar, array->array including zero-dimensional) and shape. It's worse when there are multiple input arrays that must be broadcast together. I thought numpy.vectorize might get me this for free, but at least for strings it didn't quite. Modern numpy.vectorize can at least let a vector-capable function act on an entire one-dimensional array, which broadens its applicability.

@mhvk
Contributor

mhvk commented Oct 15, 2019

On ufuncs, see numpy/numpy#13105 - would definitely be good to have a solution...

@aarchiba
Contributor Author

> On ufuncs, see numpy/numpy#13105 - would definitely be good to have a solution...

I'm not too hopeful, I remember frustration and argument about numpy zero-dimensional arrays versus scalars on the numpy mailing lists more than a decade ago. But it sure would be nice.

@aarchiba
Contributor Author

Validation moved to #9373

@aarchiba
Contributor Author

> On ufuncs, see numpy/numpy#13105 - would definitely be good to have a solution...
>
> I'm not too hopeful, I remember frustration and argument about numpy zero-dimensional arrays versus scalars on the numpy mailing lists more than a decade ago. But it sure would be nice.

One of my ancient-times fixes was to record the input array shape (or the fact that it wasn't an array), reshape the input into a 1-d array, do the vectorized calculation, then reshape the result back to the original shape (unpacking if the input wasn't an array). We could do that, though since we allow Time objects to be indexed and reshaped it would be more complicated. We could at least record whether the input was an array or not, so as to know whether to strip off the array boxing on output.
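The shape round-trip described above can be sketched like this (a hypothetical helper of my own, not astropy's _shaped_like_input): record array-ness, flatten, compute, reshape, and unbox scalars on the way out.

```python
import numpy as np

def apply_elementwise(func, val):
    # Hypothetical helper: apply a scalar -> scalar ``func`` while
    # preserving array-ness and shape: plain scalar in, plain scalar
    # out; array in (including 0-d), array of the same shape out.
    was_array = isinstance(val, np.ndarray)
    arr = np.asarray(val)
    flat = arr.ravel()
    out = np.array([func(x) for x in flat]).reshape(arr.shape)
    if not was_array:
        return out[()]  # unbox the 0-d result back to a scalar
    return out
```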

@mhvk
Contributor

mhvk commented Oct 15, 2019

I think the shaping, etc., and possibly returning to np.float64 instead of python float, could be done as part of _shaped_like_input. Though I'm not sure in the Time case it matters all that much whether it is a np.float64 or a float - I can only see arguments for never returning a python type, just to ensure output always has a shape, etc.

@mhvk
Contributor

mhvk commented Oct 16, 2019

@aarchiba - on day_frac: like two_sum, it currently works fine on any type of float - it may make sense to do the conversion at the end of it - in particular, just before where the second rounding is done:

day = day.astype(np.float64, copy=False)
frac = frac.astype(np.float64, copy=False)
excess = np.round(frac)
...

@aarchiba
Contributor Author

> @aarchiba - on day_frac: like two_sum, it currently works fine on any type of float - it may make sense to do the conversion at the end of it - in particular, just before where the second rounding is done:

The day_frac with multiplication/division is not type-agnostic, because two_prod is not - it has the "split" function with a magic constant that would need to be adapted for each type (which I think is not too hard).
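For reference, the "magic constant" in Dekker's split is 2**ceil(p/2) + 1, where p is the significand precision in bits, so a type-agnostic version could derive it from np.finfo rather than hard-coding the float64 value. This is a sketch of the idea only, not astropy's actual two_sum/two_prod code:

```python
import numpy as np

def splitter(ftype):
    # Dekker's split constant 2**ceil(p/2) + 1 for a float type with
    # p significand bits (np.finfo(...).nmant counts the stored bits,
    # so p = nmant + 1).  For float64 this gives 2**27 + 1 = 134217729.
    p = np.finfo(ftype).nmant + 1
    return ftype(2 ** ((p + 1) // 2) + 1)

def split(a):
    # Dekker's algorithm: express a exactly as hi + lo, each half
    # carrying roughly half of the significand bits.
    ftype = type(a)
    c = splitter(ftype) * a
    hi = c - (c - a)
    lo = a - hi
    return hi, lo
```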

More locally, I discovered in writing #9375 that you can't run jd1,jd2 through day_frac upon storage because some tests check for (and presumably some users rely upon?) their not being copied.

@mhvk
Contributor

mhvk commented Oct 16, 2019

Ah, good point about two_prod.

assert getattr(t, custom_format_name) != getattr(t2, custom_format_name)


def test_mjd_unit_validation():
Member

These two unit conversion tests are unrelated to the code change, but presuming they are just there to improve coverage, it's fine to leave them.

Contributor Author

It was not at all obvious to me that the code did the right thing in these cases so I wrote test cases, and I concluded that it did but that the added test cases were harmless. I'm not actually sure how they impact coverage numbers.

@aarchiba aarchiba force-pushed the flexible-time-input branch from b300306 to a044ee4 Compare October 18, 2019 17:46
Contributor

@mhvk mhvk left a comment


Getting longdouble to work generically in #9361 needs this, so since the only remaining comment is about a possibly superfluous test, I'm going to merge this and rebase #9361 on top.

Thanks, @aarchiba!

@mhvk mhvk merged commit 99f4f59 into astropy:master Oct 23, 2019
@aarchiba aarchiba deleted the flexible-time-input branch October 23, 2019 20:34


Development

Successfully merging this pull request may close these issues.

Impossible to input np.longdouble into custom time formats

5 participants