Skip to content

ENH,DEP: Add DTypePromotionError and finalize the == and != FutureWarning/Deprecation#22707

Merged
seberg merged 22 commits intonumpy:mainfrom
seberg:invalid-promotion-exc
Dec 5, 2022
Merged

ENH,DEP: Add DTypePromotionError and finalize the == and != FutureWarning/Deprecation#22707
seberg merged 22 commits intonumpy:mainfrom
seberg:invalid-promotion-exc

Conversation

@seberg
Copy link
Copy Markdown
Member

@seberg seberg commented Dec 1, 2022

Opening as a draft, because I think I should probably split it out the pieces (although it isn't as huge as one might think).

The PR effectively does a few things:

  1. Add a new DTypePromotionError exception and use it for all promotion errors (DType or common instance where promotion shouldn't work).
  2. Reorganize ufunc dispatching very mildly to chain this error when it occurs with a UFuncNoLoopError, also make the two-operand version inherit from the first.
  3. Prohibit np.equal(timedelta, datetime), etc. to work. This is a very small break, because np.equal(timedelta, datetime, casting="unsafe") actually used to work.
  4. Finalize the derprecation/futurewarning :). When dtypes are fundamentally not comparable, an array of all False or all True is returned.

There are some tiny things to (possibly consider):

  • Object arrays could cause a nested error. That could probably be mitigated, but since == calls == in the ufunc loop NumPy arrays can't fall into that trap easily.
  • The timedelta/datetime change is a "breaking change", I think its fine, it is just too absurd and ridiculously niche (with the need for casting=).

The main changes to think about:

  • Should we push this into the ufunc? This could be done. It would require either some hacks for void, or properly moving void comparisons into the ufunc code (which is a larger project probably). I think this could also be done at a later time.
  • What errors do we want? It might actually be sufficient to have a single reliable DTypeTypeError for everything. I wouldn't mind introducing one as a baseclass for both DTypePromotionError and the UFunc* errors (we can still delay exposing the UFunc* errors. We could probably also get away without DTypePromotionError , but I think I like it, it is pretty nicely and narrowly defined.

If promotion fails internally, this effectively means that there is
no loop available for the given ufunc, so chain that error.

We only do this for `InvalidPromotion` since we assume other errors
may well be critical.
This finalize the deprecation warning, there are a couple of corner
cases that we don't get as nicely as we could.  E.g. we (for now?)
manually create the all-false or all-true result array with a bit
clunky subclass support (subclasses can always override `==` and
`!=` though).
We also keep some (existing) behavior for 0-D objects where we
just return `NotImplemented`, which means the result should end
up with Python booleans (which is probably just as well).
This allows correctly cathing the error, also adjust the the NoLoop
error is used for the the "binary no loop error".
For now, I think I may want to keep the casting error distinct.
@seberg seberg marked this pull request as draft December 1, 2022 16:56
@seberg seberg changed the title ENH,DEP: Add InvalidPromotion error and *finally* finalize the == and != FutureWarning/Deprecation ENH,DEP: Add InvalidPromotion error and finalize the == and != FutureWarning/Deprecation Dec 1, 2022
@seberg seberg force-pushed the invalid-promotion-exc branch from 404ecb8 to 4929777 Compare December 1, 2022 16:58
Copy link
Copy Markdown
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! It seems my main comment is about the name of the new exception...

import sys
import warnings

from ..exceptions import InvalidPromotion
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Names are always hard and easy to argue about, but might PromotionError or DtypePromotionError be better?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I once heard the opinion that having "error" or "exception" inside an Exception is really redundant, but no strong opions.
I do agree that there is a point in including DType in the name though.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle agreed, but given that essentially all CPython exceptions have "Error" in the name... I think I also prefer having DType in the name. So, maybe DTypePromotionError?

Py_RETURN_NOTIMPLEMENTED;
}
/*
* Unfortunately, this path doesn't do the full __array_ufunc__
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this matter? Didn't subclasses already get a chance when entering the ufunc?

Copy link
Copy Markdown
Member Author

@seberg seberg Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I hadn't thought of that, but I think it still does because of nested ufunc calls. I.e. if Quantities would not implemented == but their __array_ufunc__ calls np.equal() on base-class arrays the error (correctly) is floated up, but the result (incorrectly) would not be wrapped.

EDIT: Should probably add a test that does implement __array_ufunc__ like that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see, you mean something like np.array([MyClass(...), MyClass(...)], dtype=object)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of something like:

class MyArr(np.ndarray):
    def __array_ufunc__(self, ufunc, ops, ...):
        array_ops = cast_self_to_numpy(ops)
        np_result = ufunc(array_ops, ...)
        return MyArr.from_numpy(np_result)

    def __eq__(self, other):
        return super().__eq__(self, other)  # gets called, but does nothing

where == does nothing and relies on __array_ufunc__. In that case ufunc(array_ops, ...) raises, so MyArr.from_numpy is never called (and thus the __array_ufunc__ call is effectively meaningless).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. The class would then also have to implement __array_wrap__ or write __eq__ better. I guess that just has to be the case, since np.equal and ndarray.__eq__ work differently.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment will just end up being confusing. How about

The following means that subclasses have to override __eq__ in order to
override the output of comparisons of incompatible types; overriding
__array_ufunc__ is itself not enough as that only helps to determine the
output of np.equal, not what happens to errors raised by it (though those
could be caught).

* so simply fill it with 0 or 1.
*/
result = _failed_comparison_workaround(self, other, cmp_op);
memset(PyArray_BYTES(res), cmp_op == Py_EQ ? 0 : 1, PyArray_NBYTES(res));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd tend to just use a standard fill function since this is not a critical path at all, but obviously not a big deal.

* should promote (two datetimes should have a common datetime), but cannot.
* The promotion is thus valid, but fails.
* This is important, because `arr == arr` for these cannot reasonably return
* all ``False`` values.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it fail instead? Maybe extend sentence "... values, but should fail instead."?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, should expand on this... The cast normally fails because it is considered "unsafe". The old hack is unnecessary because we now find the loop directly in either case.

# Make sure they broadcast to test result shape, use random values, since
# the actual value should be ignored
arr1 = np.random.randint(5, size=100).astype(dt1)
arr2 = np.random.randint(5, size=99)[:, None].astype(dt2)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: np.newaxis is clearer.


Notes
-----
``InvalidPromotion`` derives from ``TypeError`` and if it occurs within
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace "if it occurs within" with "in"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rephrasing a bit, the use of parentheses wasn't nice anyway.

Datetimes and complex numbers are incompatible classes and cannot be
promoted:

>>> np.result_type(np.dtype("M8", "D")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example is scrambled; needs two np.dtype?

@mhvk
Copy link
Copy Markdown
Contributor

mhvk commented Dec 1, 2022

p.s. This fails one astropy test, but it was one that tested the status quo; having this would be an improvement!

@seberg seberg force-pushed the invalid-promotion-exc branch from 3952d1e to a6dc750 Compare December 1, 2022 18:18
@seberg seberg force-pushed the invalid-promotion-exc branch from a6dc750 to 0a638a9 Compare December 1, 2022 19:59
Python does even crash when you get it wrong, so this is unnecessar
@seberg
Copy link
Copy Markdown
Member Author

seberg commented Dec 2, 2022

@mroeschke I am curious about pandas. I am seeing a lot of failures in this test:

Details
    @pytest.mark.filterwarnings("ignore:elementwise comp:DeprecationWarning")
    def test_get_indexer_non_unique_np_nats(self, np_nat_fixture, np_nat_fixture2):
        expected_missing = np.array([], dtype=np.intp)
        # matching-but-not-identical nats
        if is_matching_na(np_nat_fixture, np_nat_fixture2):
            # ensure nats are different objects
            index = Index(
                np.array(
                    ["2021-10-02", np_nat_fixture.copy(), np_nat_fixture2.copy()],
                    dtype=object,
                ),
                dtype=object,
            )
            # pass as index to prevent target from being casted to DatetimeIndex
            indexer, missing = index.get_indexer_non_unique(
                Index([np_nat_fixture], dtype=object)
            )
            expected_indexer = np.array([1, 2], dtype=np.intp)
            tm.assert_numpy_array_equal(indexer, expected_indexer)
            tm.assert_numpy_array_equal(missing, expected_missing)
        # dt64nat vs td64nat
        else:
            index = Index(
                np.array(
                    [
                        "2021-10-02",
                        np_nat_fixture,
                        np_nat_fixture2,
                        np_nat_fixture,
                        np_nat_fixture2,
                    ],
                    dtype=object,
                ),
                dtype=object,
            )
            # pass as index to prevent target from being casted to DatetimeIndex
>           indexer, missing = index.get_indexer_non_unique(
                Index([np_nat_fixture], dtype=object)
            )


>   ???
E   OverflowError: Integer overflow getting a common metadata divisor for NumPy datetime metadata [as] and [m].

Mostly failing due to incompatible time units (NumPy cannot convert between them and now raises an error somewhere deeper in index.get_indexer_non_unique()).

Failing for incompatible time units like that seems rather minor on first sight. We could make whatever comparison is failing here pass if we argue that the values cannot possibly be equal. (The DeprecationWarning didn't warn of that possibility, but if we are sure of the result it makes sense to change, IMO...)

@seberg seberg force-pushed the invalid-promotion-exc branch from 256d2ae to c8be159 Compare December 2, 2022 11:19
@seberg seberg changed the title ENH,DEP: Add InvalidPromotion error and finalize the == and != FutureWarning/Deprecation ENH,DEP: Add DTypePromotionError and finalize the == and != FutureWarning/Deprecation Dec 2, 2022
@seberg
Copy link
Copy Markdown
Member Author

seberg commented Dec 2, 2022

OK, decided to change to DTypePromotionError. I agree that DType should be in there and "InvalidPromotion" reads no or barely better than "PromotionError", so there is no reason not to follow the typical patter.

@seberg seberg marked this pull request as ready for review December 2, 2022 12:31
Copy link
Copy Markdown
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Only trivial comments.

------------------------------------
The ``==`` and ``!=`` operators on arrays now always:

* raise errors that occur during comparisons
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe good to give at least one example "such as arrays that have incompatible shapes"

Py_RETURN_NOTIMPLEMENTED;
}
/*
* Unfortunately, this path doesn't do the full __array_ufunc__
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment will just end up being confusing. How about

The following means that subclasses have to override __eq__ in order to
override the output of comparisons of incompatible types; overriding
__array_ufunc__ is itself not enough as that only helps to determine the
output of np.equal, not what happens to errors raised by it (though those
could be caught).

* The following errors do not currently use "DTypePromotionError". This
* could be replaced (changing the OverflowError, though) with an explicit
* promotion error.
* This makes sense if we `arr_dt1 == arr_dt2` can reasonably return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"if we" -> "since"?

I have a bit of trouble following the comment. Maybe easier to write it as

It might make sense to replace the following errors with "DTypePromotionError",
since we are here unable to convert one dtype to another.

But one would have to think of whether one then wants `arr_dt1 == arr_dt2` to return
an array of all `False`.

Aside: I don't think it should return an array of all False for either case, since it is just that
the conversion is difficult, not that it is not possible in principle. And some of the times
might well actually be equal (e.g., if they are 0?). So, I think an error is actually better.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point about 0, I think that is a killer argument (until we wish to change the == implementation, but even then I think the argument makes sense).

@mroeschke
Copy link
Copy Markdown
Contributor

@mroeschke I am curious about pandas. I am seeing a lot of failures in this test:

Mostly failing due to incompatible time units (NumPy cannot convert between them and now raises an error somewhere deeper in index.get_indexer_non_unique()).

From a glance given the exception, integer overflow due to time unit conversion, that exception reason generally matches what pandas will do for supporting other resolutions other than nanoseconds (e.g. datetime64[us]) so I want to say it's reasonable to this to fail.

Copy link
Copy Markdown
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions and questions. Martin already did most of the review.

np.result_type("M8[s]", np.complex128)

Errors that occur while trying to find the correct promotion may raise a
different error.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is incomplete. Either remove it, or give more detail: what other error class can be emitted?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is mainly the weirder datetime unit conversions that fail. But maybe that is all obscure enough that removing the sentence is best, since I don't want to give the example.
(Will remove it)


from ._globals import _NoValue, _CopyMode
from . import exceptions
# Note that the following names are imported explicitly for backcompat:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really have to export Python errors/warnings in the NumPy top-level namespace? If so those should be deprecated.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should follow-up with deprecating or at least hiding them from the main namespace. We could move the "namespace", I held off with that in the first PR, but the only thing that might notice is unpickling in an older version, and that is probably not worth the trouble for exceptions.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref #22735

Comment on lines +977 to +978
* At this point we are locked into comparing (both are arrays) or
* something we would have tried to convert.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the comment above: is the closing parenthesis in the wrong place? If we tried to convert it, did we fail? I would prefer it said this, but maybe it is not true.

Suggested change
* At this point we are locked into comparing (both are arrays) or
* something we would have tried to convert.
* At this point we comparing two ndarrays.

seberg and others added 2 commits December 5, 2022 10:39
Co-authored-by: Matti Picus <matti.picus@gmail.com>
It is a bit too obscure, but explaining that it happens for datetime
unit conversions seems more detailed then useful.

Updated to include a last non-indent line, which the release notes
machinery doesn't like otherwise...
@seberg seberg force-pushed the invalid-promotion-exc branch from b601e8a to 37d95c8 Compare December 5, 2022 12:01
Copy link
Copy Markdown
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just one typo. I will push a fix, feel free to merge once CI passes


from ._globals import _NoValue, _CopyMode
from . import exceptions
# Note that the following names are imported explicitly for backcompat:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref #22735

@seberg
Copy link
Copy Markdown
Member Author

seberg commented Dec 5, 2022

Wow, that was a quick turnaround on this, thanks Matti and Marten! Fingers crossed that downstream doesn't mind the FutureWarning's suddenly being fixed :).

If anyone has an opinion about the error naming or anything else, I am happy to follow up of course!

@h-vetinari
Copy link
Copy Markdown
Contributor

h-vetinari commented Dec 6, 2022

I realise it's probably too late but given that this fixes a pretty long-standing issue (or at least annoyance), should this be backported to 1.24 (since it hasn't actually been released yet)?

@charris
Copy link
Copy Markdown
Member

charris commented Dec 6, 2022

this fixes a pretty long-standing issue

That is an argument for not backporting :) If it only fixed one simple issue, maybe, but I think this is more expansive and should probably sit in the development branch for a while.

@neutrinoceros
Copy link
Copy Markdown
Contributor

I'd like to add another argument for not backporting (as is, at least): this patch caused a regression downstream (see #22744)

@mhvk
Copy link
Copy Markdown
Contributor

mhvk commented Dec 6, 2022

Indeed, it also fails one astropy test (see #22707 (comment)), though it is one that we are very happy to fix. But it means one definitely should not backport - much better if numpy version checks can be by minor number only, not micro.

@h-vetinari
Copy link
Copy Markdown
Contributor

much better if numpy version checks can be by minor number only, not micro.

1.24.0 has not been released yet, so there'd be no need to drop to micro level.

I see the argument that it should sit in main for a while, though as pointed out the "breakage" often a positive thing due to removing the workarounds that were necessary for this issue.

I still find it a pity that we'll have to wait another ~6month for this. Had this PR been merged 2 weeks ago, its inclusion in 1.24.0 wouldn't be controversial.

But oh well, those 6 months will pass as well 🤷

@mhvk
Copy link
Copy Markdown
Contributor

mhvk commented Dec 6, 2022

Hmm, I had not realized numpy 1.24 was not out. Still, probably best to give people time to fix bugs; astropy and others test against -dev just for that purpose, and a bit of extra time is good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants