Allow longdouble values through input validation #9368
Conversation
mhvk left a comment
Thanks! Some smaller comments, but also a general one: ideally, one could just put in higher-precision floats and it would "just work". I think that may be possible by changing day_frac to special-case higher precision, e.g. by adjusting two_sum with some calls that would be no-ops for regular float (NOTE: NOT TESTED!):
```python
def two_sum(a, b):
    x = np.asanyarray(a + b, float)
    eb = x - a   # bvirtual in Shewchuk
    ea = x - eb  # avirtual in Shewchuk
    eb = b - eb  # broundoff in Shewchuk
    ea = a - ea  # aroundoff in Shewchuk
    return x, (ea + eb).astype(float, copy=False)
```
It may be safer to just pre-condition the arguments, a bit like is done in quantity_day_frac, doing a careful divmod(val1, 1) (and likewise for val2) before adding up the values and residuals.
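To make the idea concrete, here is the sketch above as runnable code with a quick numerical check of my own (NOT tested within astropy; this `two_sum` is standalone, not the `astropy.time.utils` version): a longdouble input comes out as a float64 pair whose sum recovers the extra precision.

```python
import numpy as np

def two_sum(a, b):
    # Sketch from the comment above: the casts are no-ops for float64
    # input, but for longdouble they fold the extra bits into the error term.
    x = np.asanyarray(a + b, float)
    eb = x - a   # bvirtual in Shewchuk
    ea = x - eb  # avirtual in Shewchuk
    eb = b - eb  # broundoff in Shewchuk
    ea = a - ea  # aroundoff in Shewchuk
    return x, (ea + eb).astype(float, copy=False)

# 2/3 is not exactly representable in binary, so the longdouble sum
# carries more bits than a single float64 can hold.
a = np.longdouble(2) / np.longdouble(3)
b = np.longdouble(1)
x, err = two_sum(a, b)
recovered = np.longdouble(x) + np.longdouble(err)
```

On platforms where np.longdouble is just float64 (e.g. Windows), err is simply the usual two_sum round-off and nothing is lost.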
```python
@pytest.mark.parametrize("f", ["mjd", "unix", "cxcsec"])
def test_existing_types_refuse_longdoubles(f):
    with pytest.raises(ValueError):
        Time(np.longdouble(58000), format=f)
```
Not sure we'd want these tests to fail!
Ah. Well. Hmm. I want precision loss to require explicit conversion, and none of these types preserve long double precision (yet?).
I think all of them start with day_frac (in TimeFromEpoch.set_jds) so they might all support it if that routine is adjusted.
The tests now all permit either outright rejection of long doubles or acceptance provided that the precision is preserved.
I'm not sure about precision loss requiring explicit conversion (i.e., raising an unavoidable exception otherwise). This will outright break existing code that happens to be using long doubles, and I'm almost always against breaking code without a strong reason.
In general I prefer to be a bit pragmatic about things like this, letting users be responsible for making decisions about data types, and when it matters then requiring that they have some cognizance of the implementation details. The second paragraph of the Time documentation starts with "All time manipulations and arithmetic operations are done internally using two 64-bit floats to represent time." I think that is a pretty good starting point for someone to infer that providing a higher-precision input might not be helping.
But as a compromise it might be fine to issue a warning in the case of potential precision loss.
Our code currently emits a blizzard of warnings even for perfectly normal operation, so in fact people grow used to ignoring warnings. This is seen with LaTeX as well - too many warnings and they are ignored.
What is the use case where users pass long doubles in to these functions and would prefer lost precision to exceptions? Recall that manual downconversion is as easy as applying float.
If we are not fixing the problem here, then raising an exception seems reasonable.
I think of exceptions for issues that are definitely wrong or an error, and warnings for things that might lead to unexpected results. I don't think that saying that some code or package issues a blizzard of warnings is a good argument for ignoring the entire Python warning system.
If I do Time(np.longdouble(1), format='cxcsec'), there is no reason that should fail to execute. It is perfectly legal and gives the result that I expect. But I do think it is reasonable to issue a warning about precision which an informed user can take into consideration.
It may be that somebody is already using long-double for some reason that we don't know about, and they understand what is happening, and they don't want their code to suddenly break for no good reason with astropy 4.0. I sometimes do things that might seem irrational or downright stupid to someone else, but we all have our reasons.
Ah, convinced the other way again. Let's just agree to fix it!
I agree on this - let's get something like #9361 merged, and then there won't be much question left about what to do with longdoubles because they will just work.
This would absolutely call for testing with hypothesis, once we figured out exactly what the criterion was for successful computation. Well, okay, I think two_sum should do exactly what it does regardless of the type it is passed: compute the sum to the maximum precision available with the types. But making day_frac operate appropriately would take some doing, especially if we want it to behave well when also passed Quantities. And definitely testing.
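As a sketch of the property such a test would pin down (my own example; plain random sampling stands in where hypothesis would generate the inputs, and this is a standalone float64-only two_sum, not astropy's): for same-type IEEE inputs, x + err should equal a + b exactly, which Fraction can verify in arbitrary precision.

```python
import random
from fractions import Fraction

def two_sum(a, b):
    # Knuth's branchless two_sum for plain Python floats (IEEE doubles)
    x = a + b
    eb = x - a
    ea = x - eb
    eb = b - eb
    ea = a - ea
    return x, ea + eb

# The property: the (x, err) pair represents a + b exactly,
# checked in arbitrary precision via Fraction.
random.seed(0)
for _ in range(1000):
    a = random.uniform(-1e6, 1e6)
    b = random.uniform(-1e-6, 1e-6)
    x, err = two_sum(a, b)
    assert Fraction(x) + Fraction(err) == Fraction(a) + Fraction(b)
```

hypothesis would additionally shrink any failing pair to a minimal counterexample, which is where it earns its keep.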
Yes, makes sense to have
Sorry for the repetition, but my experience with hypothesis in getting the kinks out of PINT's numerical conversion code was extremely positive. It nailed down some difficult-to-spot bugs - including finding leap-second-day problems before I had even told it about leap second days - and programming it was easy (and had that delightful pieces-clicking-together feeling I get with Haskell).
I'm all for trying hypothesis on
Maybe they're off-topic for this PR, but I also found myself poking around to confirm that there is no data validation on the jd1 and jd2 values set in Time objects, and there is no validation of types returned by .value - but there are strict and hard-to-explain requirements on them. This means that implementing custom time formats involves a lot of impenetrable error messages. I'd like to add some validation on both ends: ensure that a Time object's jd1 and jd2 are always arrays of float64s of the same shape no matter what a custom format tries to set them to, and ensure that .value always returns an array of that same shape of whatevers, or a singleton whatever, as appropriate; also that implementors of custom time formats get helpful error messages if their .value methods don't do this correctly. (It'd sort of be nice to be able to provide element-by-element conversion functions that would automatically be broadcast; these would be too slow for actual numeric arrays but for object arrays they're not any slower, and handling arrays of arbitrary shape is awkward; see https://github.com/nanograv/PINT/blob/a4b33eae5fce21a6010e8665bfeecf9caec8cf96/src/pint/pulsar_mjd.py#L501 for the mess.)
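As a sketch of what the jd1/jd2 side of that validation might look like (hypothetical helper, not astropy API; the name validate_jd_pair is my own):

```python
import numpy as np

def validate_jd_pair(jd1, jd2):
    """Hypothetical sketch of the proposed validation: coerce a custom
    format's jd1/jd2 into float64 arrays of a common shape, raising a
    comprehensible error instead of failing mysteriously later."""
    try:
        jd1 = np.asarray(jd1, dtype=np.float64)
        jd2 = np.asarray(jd2, dtype=np.float64)
    except (TypeError, ValueError) as exc:
        raise ValueError(
            "jd1 and jd2 must be convertible to float64 arrays"
        ) from exc
    if jd1.shape != jd2.shape:
        raise ValueError(
            f"jd1 and jd2 must have the same shape; "
            f"got {jd1.shape} and {jd2.shape}"
        )
    return jd1, jd2
```

A Time setter calling this would give custom-format authors one clear error at the point of assignment rather than an impenetrable one downstream.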
So uh. Right now we quietly ignore it if someone passes a val2 when creating a Time object for a string format. That is, the user provides some information in val2, and astropy throws it away and doesn't tell them. I do not approve. I added some ValueErrors in this case, only to find that astropy has a test for explicitly this behaviour! Astropy's tests demand that we silently throw away information explicitly provided by the user. What? Can I change that? Please?
@aarchiba - I agree ignoring [...]. @taldcroft and I also discussed that somewhat of a design mistake is to have [...]. p.s. Your [...]
I'll pull it out. I have a bad habit of fixing minor problems as they fly past, which gets them fixed but makes my PRs messy.
This does seem awkward. We are sort of stuck with it, though properties allow for some deviousness.
Ah, yes, probably. I've had to write similar code too many times now, to get functions to behave like ufuncs in terms of correctly maintaining array-ness (scalar->scalar, array->array including zero-dimensional) and shape. It's worse when you need multiple input arrays that are broadcast together. I thought numpy.vectorize might get me this for free but at least for strings it didn't quite. Modern numpy.vectorize can let a vector-capable function act on an entire one-dimensional array, at least, which broadens its applicability.
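The scalar-in/scalar-out, array-in/array-out pattern could be captured once in a decorator along these lines (my own sketch; preserve_arrayness and halve are hypothetical names, and it handles only single-input functions, not the harder broadcast-several-inputs case):

```python
import functools
import numpy as np

def preserve_arrayness(func):
    """Wrap an array-in/array-out function so that plain scalars come
    back as scalars while arrays (including 0-d) stay arrays."""
    @functools.wraps(func)
    def wrapper(val):
        arr = np.asarray(val)
        # Compute on an at-least-1-d view, then restore the input shape.
        out = func(np.atleast_1d(arr)).reshape(arr.shape)
        if not isinstance(val, np.ndarray) and arr.ndim == 0:
            return out[()]  # unpack the 0-d array back to a scalar
        return out
    return wrapper

@preserve_arrayness
def halve(a):
    return a / 2
```

halve(4.0) returns a scalar, halve(np.array(4.0)) a 0-d array, and halve([2.0, 4.0]) a shape-(2,) array, which is the ufunc-like contract described above.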
On ufuncs, see numpy/numpy#13105 - would definitely be good to have a solution... |
I'm not too hopeful, I remember frustration and argument about numpy zero-dimensional arrays versus scalars on the numpy mailing lists more than a decade ago. But it sure would be nice.
Validation moved to #9373 |
One of my ancient-times fixes was to record the input array shape (or that it wasn't an array), reshape it into a 1d array, do the vectorized calculation, then reshape it into the same shape (and unpack if it wasn't an array) on output. We could do that, though as we allow Time objects to be indexed and reshaped it would be more complicated. We could at least record whether the input was an array or not so as to know whether to strip off the array boxing on output.
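A minimal sketch of that record-shape / ravel / compute / reshape / unpack dance (hypothetical helper; an object-dtype array stands in for the slow element-by-element case mentioned earlier):

```python
import numpy as np

def apply_elementwise(func, val):
    """Run an element-by-element function over input of any shape,
    restoring the original shape (or scalar-ness) on output."""
    was_array = isinstance(val, np.ndarray)
    arr = np.asarray(val, dtype=object)
    # Flatten, convert element by element, then rebox as an object array.
    flat = [func(item) for item in arr.ravel()]
    out = np.empty(len(flat), dtype=object)
    out[:] = flat
    out = out.reshape(arr.shape)
    if not was_array and out.ndim == 0:
        return out[()]  # input wasn't an array: strip the array boxing
    return out
```

This is the simple version; making it cooperate with Time's own indexing and reshaping would be the complicated part.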
I think the shaping, etc., and possibly returning to
@aarchiba - on
The day_frac with multiplication/division is not type-agnostic, because two_prod is not - it has the "split" function with a magic constant that would need to be adapted for each type (which I think is not too hard). More locally, I discovered in writing #9375 that you can't run jd1, jd2 through day_frac upon storage because some tests check for (and presumably some users rely upon?) their not being copied.
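For reference, the magic constant in question is Veltkamp's splitting constant, 2**27 + 1 for the 53-bit float64 mantissa; a 64-bit-mantissa longdouble would need 2**32 + 1 instead (my reading, untested). A float64-only sketch of split and two_prod using plain Python floats, with exactness checked via Fraction:

```python
from fractions import Fraction

_SPLIT = 134217729.0  # 2**27 + 1, Veltkamp's constant for float64

def split(a):
    # Split a into hi + lo, each carrying at most 26 significant bits
    c = _SPLIT * a
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    # Dekker's product: p + err equals a * b exactly (barring over/underflow)
    p = a * b
    ahi, alo = split(a)
    bhi, blo = split(b)
    err = ((ahi * bhi - p) + ahi * blo + alo * bhi) + alo * blo
    return p, err
```

Adapting this to another type means swapping _SPLIT for the constant matching that type's mantissa width, which is why two_prod as written is not type-agnostic.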
Ah, good point about |
```python
assert getattr(t, custom_format_name) != getattr(t2, custom_format_name)
# ...
def test_mjd_unit_validation():
```
These two unit conversion tests are unrelated to the code change, but presuming they are just there to improve coverage, it's fine to leave them.
It was not at all obvious to me that the code did the right thing in these cases so I wrote test cases, and I concluded that it did but that the added test cases were harmless. I'm not actually sure how they impact coverage numbers.
Description
This pull request is to make it possible for third-party code to take higher-precision floating-point values as input, in order to more easily work with the extended precision available from astropy Time objects. This would be relevant to #9361 for example.
Fixes #9341