Skip to content

Conversation

@brianmaissy
Copy link
Contributor

@brianmaissy brianmaissy commented Feb 9, 2019

This is a preliminary implementation of the feature suggested in #4736.

I wanted to get something down in code to further our ability to discuss API and implementation options. Even if we decide not to change any of the main points, I will go over the implementation again for style and documentation, improve the tests, update the docs, and write a changelog entry.

I implemented approximate comparison for both datetimes and timedeltas. In the current implementation at least, the same implementation pretty elegantly works for both.

Here are some issues that I though of while implementing it:

  1. I implemented it as a top-level subclass of ApproxBase, so it does not support approximate comparison of tuples, lists, or dicts of datetimes and timedeltas. On the one hand, users might be surprised that something they expect to work doesn't. On the other, this is by far the simpler way, both in terms of implementation and API. In order to support a mixed list of numbers and datetimes, we would need a way of passing different tolerances for different elements, and I can't think of a way to do that cleanly. We could support iterables containing only datetimes and timedeltas, which might be a good compromise. In either case, maybe we should add an explicit type check to the constructor of ApproxScalar explaining that it is not supported.
  2. I went with disallowing relative tolerances both for datetimes and timedeltas, since their interpretation is less clear (and less useful) than for numbers. A case could be made for allowing relative tolerance for timedeltas, but it's not clear what a reasonable default should be, and it breaks some of the symmetry between the datetime and timedelta cases.
  3. In the current implementation, the absolute tolerance parameter must be a timedelta. I considered the idea of allowing numbers and interpreting them as a number of seconds, or adding other kwargs so we could do something like approx(expected, seconds=5), similar to the constructor of timedelta. But I think it obscures clarity and simplicity for a very marginal gain in conciseness.
  4. Since there is no relative tolerance, the only parameter which is relevant to pass to approx() is abs. The sort of annoying thing is that abs is the third parameter to approx(), so we can't pass the tolerance as a second parameter without the keyword abs=. I thought of playing games with the arguments in order to switch the order, but I think it would create more problems than it's worth.
  5. I didn't do a short-circuit check that the actual value is exactly equal to the expected, because the cost of the relative calculation isn't much more than the cost of the exactly equals calculation, so it will be a waste in the common case without really saving much in the worst case.
  6. Should we validate the type of the actual value at comparison time? I noticed that for numeric types we don't, and the user will get the TypeError whenever it happens to arise, for example when subtracting the actual from the expected value. Should we stick with this behavior, or do explicit type checking when comparing, or at least catch the TypeError and re-raise a more indicative one? Explicit checking might be a good idea because subtracting a timedelta from a datetime will work, so comparing a timedelta to the approx of a datetime and comparing a datetime to the approx of a timedelta will raise different exceptions.

@brianmaissy
Copy link
Contributor Author

brianmaissy commented Feb 10, 2019

TODO: fix timezone compatibility with python 2.7

@kalekundert
Copy link
Contributor

Thanks for taking a first pass at this. Here are a few comments off the top of my head:

  1. I was expecting this functionality to be implemented as part of the ApproxScalar class, so that all the various container types would be supported*. I don't think mixed-type containers would be a problem as long as ApproxScalar raises a ValueError when two incompatible types (e.g. datetime and int) are compared.

  2. The short-circuit check if only necessary (if I remember right) to correctly handle nan and inf. So I don't think you would need to worry about that here.

  • That said, I forgot that ApproxDecimal is it's own subclass. Presumably it doesn't work with containers either? If so, that's a problem. Maybe the best thing would be to have the container classes somehow invoke approx() rather than ApproxScalar. That would make it easier to implement support for new types like this, and it might also give support for nested containers, which is definitely something we get asked for sometimes. I've got a feeling this wouldn't be a trivial change, though.

@brianmaissy
Copy link
Contributor Author

How would you propose to define the tolerance for mixed-type containers?

@RonnyPfannschmidt
Copy link
Member

im under the impression we need to take a look at this more structurally - until we have a structure that brings together numeric and time data we should perhaps take a look at having the approximations for date/time under a different name as they require different behaviour

@brianmaissy
Copy link
Contributor Author

Why does time require behavior that is more different than Decimal? I think they are similar in that the semantics of approx apply to them, and in that they require a tolerance specified in different units.

The whole premise of my proposal is that time data can be compared numerically, under the same approx framework. If not, I'm not sure it's worthwhile.

If we are ok with the current implementation of ApproxDecimal, what's wrong with a parallel implementation of ApproxDateTime?

@RonnyPfannschmidt
Copy link
Member

@brianmaissy the main issue i see that the behaviours wrt values of rel/abs are structurally different and im not seeing a unification there off the bat, if there was i'd happily integrate it

@brianmaissy
Copy link
Contributor Author

Ok, never mind then

@RonnyPfannschmidt
Copy link
Member

@brianmaissy actually with a night of sleep i do see a possible unification

the key idea is that unlike number, where we know sane defaults, a datetime in a nested structure must be turned into a approx explicitly, so a nested object would be declared like

expected = approx({'a': 1, 'b': approx(datetime.now(), second=5)})

time related approximations would require a explicit cast in all cases and thus ambiguity of the nested case goes away

@RonnyPfannschmidt
Copy link
Member

i just noticed that i had gone with a display example of the extended api,
for now we really should just use the abs parameter and use a explicit time-delta to avoid confusion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants