
MAINT: Add error return to all casting functionality and NpyIter#17029

Merged
mattip merged 4 commits intonumpy:masterfrom
seberg:cast-error-return
Aug 17, 2020

Conversation

@seberg
Member

@seberg seberg commented Aug 7, 2020

NpyIter cannot properly indicate errors right away, deferring them
in some cases until the actual cleanup occurs. To make that slightly
easier, this adds a new API function with a (non-standard for
NpyIter) error return of -1.
A second function only checks whether an error
occurred (i.e. `NpyIter_ErrOccurred()`), which is necessary in the
(rarer) case that the iterator is not deallocated promptly after
the iteration is finished.


Well, this was mostly surprisingly straightforward. I am sure there is some stranger fallout in error-case corner cases, but those were always somewhat buggy, so more likely a few got fixed.

What we will have to discuss is the NpyIter API. Since the iteration functions cannot indicate an error, the first place where we can set it for sure is the deallocation (although many places forget to check for error returns). None of that actually makes the situation worse, but it made me want to add a slight modification of NpyIter_Deallocate, as well as a function to check for such an error earlier if desired (needed for the Python exposure).

At some point, we may want to (or need to) pass the PyThreadState to the casting functions as a next step. To do that, we would require new API. I frankly still have no idea whether the PyGILState functions are considered bad, or if they are OK. If they are OK, there is not too much to worry about for now. (I sent an email to the python c-sig about it.)

In any case, on this topic: I think we have to do this retrofitting; the main discussion needed is about the NpyIter modification. There are currently two tests failing with regard to structured array fields. So I will owe a beer or two to whoever notices the (hopefully stupid) mistake leading to that :/.

Marking this as draft, because the NpyIter changes at least should probably get a test.

@seberg seberg added the Priority: high, 01 - Enhancement, 03 - Maintenance, and component: numpy._core labels Aug 7, 2020
@seberg seberg force-pushed the cast-error-return branch from 01f8b51 to 8a4b362 Compare August 7, 2020 22:56
@seberg seberg marked this pull request as draft August 7, 2020 22:57
@seberg seberg force-pushed the cast-error-return branch 2 times, most recently from af7fc01 to 2b677e7 Compare August 8, 2020 01:22
Member


I understand we can do without NpyIter_SafeDealloc. Do we need to export NpyIter_ErrorOccurred or can it remain an internal API?

Member Author


I am not sure what is best to do. We can definitely do without the first, yes. For the second one, we need to figure out what to do with the error which is set on the fly (possibly while the API is not held).

The current solution is to stash the error and raise it later, when we (probably) have the API and can return error values, because iternext does not report error returns. I think we will need to stash the error quite often to clean up correctly, but that does not have to be done by NpyIter itself in principle. It may be possible to just flag an error return; I think this was my first reaction, because that can't be disrupted, e.g. by a random PyErr_Clear() call.

The issue is that the iternext functions use 0 to indicate stop-iteration, so we can't distinguish an aborted iteration (due to an error) from a finished one. That is why I needed to add NpyIter_ErrOccurred(): there needs to be a way to know that the iteration was aborted when dealloc is not called immediately. That is mostly important for np.nditer (the Python API).
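The ambiguity described here can be sketched with a small Python mock (the class and its names are hypothetical, purely illustrative; the real functions are NpyIter C API):

```python
class MockIter:
    """Toy model of NpyIter's reporting constraint: iternext() returns
    0 both on normal exhaustion and after an internal error."""

    def __init__(self, data, fail_at=None):
        self._data = list(data)
        self._pos = 0
        self._fail_at = fail_at   # index at which a "cast error" happens
        self._error = None

    def iternext(self):
        # A 0 return alone cannot distinguish "finished" from "aborted".
        if self._error is not None or self._pos >= len(self._data):
            return 0
        if self._pos == self._fail_at:
            self._error = "cast failed"
            return 0
        self._pos += 1
        return 1

    def err_occurred(self):
        # Analogue of NpyIter_ErrOccurred(): the extra query that
        # disambiguates the 0 return before deallocation.
        return self._error is not None
```

A caller that only sees `iternext()` returning 0 must consult `err_occurred()` to learn whether the iteration was aborted.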

As to NpyIter_SafeDealloc: no, I am not sure it's the best design. If it seems nicer than the alternative, we could use it privately, I suppose. NpyIter_Deallocate may need the Python API, so as I wrote it here, it either has to check PyErr_Occurred() or the caller has to stash the error.

Right now, of course, we ignore that this type of thing can happen. In practice, NpyIter_Dealloc is simply called with an error already set, and we basically hope it will all work out and the correct error is raised (possibly much belated).

Member


maybe @eric-wieser has some thoughts

Member Author


Hmmm, there is probably a remaining hole here. When the iteration is never finished, either due to an error or because np.nditer is just not iterated to the end, we need to clean up buffers in that case as well. Technically, we should copy the buffer back if (but I guess only if) there was no error. Also an existing issue.

I guess right now we simply assume that NpyIter_Deallocate can handle such things (it's called by np.nditer.__exit__). Probably the Reset functions are in a similar spot.
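The copy-back-only-on-success idea can be sketched like this (a hypothetical `finalize_buffers` helper in Python, not NumPy's actual C function):

```python
def finalize_buffers(buffer, operand, error_occurred):
    """Toy model: write buffered results back to the operand only when
    no error occurred, but release the buffer either way."""
    if not error_occurred:
        # Copy-back is only meaningful for a successful iteration.
        operand[:len(buffer)] = buffer
    # Cleanup must happen regardless of the error state.
    buffer.clear()
    return operand
```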

@mattip
Member

mattip commented Aug 9, 2020

CI failure due to S390x, not related to this PR.

@seberg seberg force-pushed the cast-error-return branch from 262cb92 to 82fbf87 Compare August 10, 2020 23:57
@seberg
Member Author

seberg commented Aug 11, 2020

OK, I did not get to new tests today, but this now adds only a single function, NpyIter_FinalizeBuffers(iterator, error_state), with some wonkier logic in places where the API is technically not held (the iterator reset functions).

This fixes the following for example:

```python
def leak():
    arr = np.arange(10000)
    for i in range(1000):
        it = np.nditer(arr.astype(np.int32), op_dtypes=[np.object_], flags=["buffered", "external_loop"])
        next(it)
```

This is a no-error example; examples with errors should behave identically.

Alternatively, the same issue shows up with structured dtypes:

```python
def leak():
    arr = np.arange(10000)
    for i in range(1000):
        it = np.nditer(arr.astype("i,i"), op_dtypes=[np.dtype("O,i")], flags=["buffered", "external_loop"])
        next(it)
```

All of these examples leak the numbers from 1 to 8192 (or so), since that is the size of the first buffer chunk.
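Why a partially iterated, buffered object iterator leaks can be modelled in pure Python (illustrative only; the real buffers hold PyObject pointers that need an explicit Py_DECREF):

```python
import weakref

class _Boxed:
    """Stands in for an object-dtype value that a cast allocates."""

_leaked = []  # stands in for buffer references nothing ever releases

def partial_iteration(finalize_buffers):
    buffer = [_Boxed() for _ in range(4)]         # first buffer fill
    probes = [weakref.ref(obj) for obj in buffer]
    # Iteration stops early here, as with a single next(it) call.
    if finalize_buffers:
        buffer.clear()          # analogue of releasing buffered objects
    else:
        _leaked.append(buffer)  # the references survive deallocation
    # Count buffered objects still alive after "deallocation".
    return sum(ref() is not None for ref in probes)
```

With cleanup, no buffered object survives; without it, all of them do, matching the leak described above.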

@seberg
Member Author

seberg commented Aug 11, 2020

Not adding the new dealloc function basically means that we call PyErr_Occurred() a few times for no particularly good reason, other than that it is fine. I think the new function itself is good; I am not quite sure about the return value, since I dislike NPY_SUCCEED, but it's what NpyIter uses throughout :(.

@seberg
Member Author

seberg commented Aug 11, 2020

Sorry for the back and forth. I don't particularly like that the NpyIter user (and we) have to call `PyErr_Occurred()`, but here is the solution relying on no new API, but rather on that.

In particular, I think this means that there is no way to even find out whether a casting error occurred without PyErr_Occurred() (and thus the GIL). Usually that should not matter, of course, since the user will iterate exactly once (ufunc.at is different, but it's terrible).

Now, this is reasonable, as long as we are happy with leaving the error set (users can stash it), and with the fact that users must use PyErr_Occurred() (or its implicit call in NpyIter_Dealloc()).

I guess practically, it changes almost nothing, except that iteration may get aborted, which was not possible before.

@seberg
Member Author

seberg commented Aug 11, 2020

OK, a bit of proactive thinking would help me here. We don't have to figure out the full picture yet, but we do have two issues:

  1. There are a few places which technically should propagate the error (the Reset* functions), but don't. Technically, these are called without the GIL, so it would possibly be nice to know whether an error occurred without re-grabbing the GIL. I admit, that is a super corner-case API.

  2. Technically, we actually have to check floating point error flags as well (see bullet point 1 below). I honestly think the only way to do this is to add a single new function, privately for now if I must, which checks floating point flags as well as whether iteration was aborted due to a cast error (bullet point 2 below). This can be retrofitted to only require (and grab) the GIL when an error occurred; otherwise an error should already be set.

So who is supposed to check for errors, especially floating point errors? Either the iterator does it, or the iterator may need to be able to signal that FPEs should be checked. Does that require more API than the existing ufunc API?

And do we live with the fact that there may be no way to know whether an error occurred until calling NpyIter_Dealloc? It's probably fine, but dunno...


The full picture:

  1. We currently have broken behaviour, which also proves that, in a correct world, we sometimes can only report errors after the fact:

```python
np.positive(np.array([3e300]*10000), dtype=np.float32)  # OverflowWarning, because ufuncs always check
np.array([3e300]*10000).astype(dtype=np.float32)        # No warning, because NpyIter does not check
```
    

    I.e. the iterator may have to check for floating point error flags to behave correctly.

    • It is tempting to say that users should be allowed to do similar "after the fact" error checking. That seems like a can of worms to me, so I am tempted to disregard it. Users would have to grab the GIL and abort from the inner-loop. I.e. if we add cleanup, errors will still not be possible during cleanup! Only floating point errors are special (or any other new builtin error support).
  2. When we have the API (the GIL cannot be released), a Python error can be set, and we should abort iteration immediately. We currently never do that (we use faith-based error handling, after all). The only way the user (and NpyIter itself, but that could be fixed) can currently find out about this is PyErr_Occurred().

    • We do sometimes (and should generally) set errors in random ufuncs by temporarily grabbing the GIL (maybe only integer powers), and especially for warnings that is (or will be) useful, IMO. It seems just a lot more convenient to handle errors right away normally (yes, this requires the PyGILState API, and we may need to reserve passing in more info at some point, but that seems viable), and it makes just as much sense for casting. Honestly, the only arguments against this are floating point flags (where it would be costly) and some strange sense that ufunc inner loops should work without Python, which in my opinion would be misguided.
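The ufunc-style "check flags after the loop" behaviour from point 1 can be sketched in pure Python (the `_overflow_flag` global stands in for the hardware floating point status word; all names here are made up):

```python
import warnings

_overflow_flag = False  # stands in for the hardware FPE status flag

def _cast_loop(values, out):
    # Inner loop: silently sets the flag, never raises mid-iteration.
    global _overflow_flag
    for i, x in enumerate(values):
        if abs(x) > 3.4e38:        # pretend this is float32 overflow
            _overflow_flag = True
            out[i] = float("inf")
        else:
            out[i] = float(x)

def cast_checked(values):
    # Ufunc-style: inspect the accumulated flag after the loop and warn,
    # i.e. the "after the fact" reporting described above.
    global _overflow_flag
    _overflow_flag = False
    out = [0.0] * len(values)
    _cast_loop(values, out)
    if _overflow_flag:
        warnings.warn("overflow encountered in cast", RuntimeWarning)
    return out
```

An iterator that never inspects the flag (like the plain `.astype` path above) produces the same values but swallows the warning.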

seberg added 2 commits August 11, 2020 18:54
NpyIter cannot properly indicate errors right away, deferring them
in some cases until the actual cleanup occurs. To make that slightly
easier, this adds a new API function with a (non-standard for
NpyIter) error return of -1.
A second function only checks whether an error
occurred (i.e. `NpyIter_ErrOccurred()`), which is necessary in the
(rarer) case that the iterator is not deallocated promptly after
the iteration is finished.
@seberg seberg force-pushed the cast-error-return branch from 6a2cb7f to 101192c Compare August 11, 2020 23:55
@seberg seberg marked this pull request as ready for review August 11, 2020 23:57
@seberg
Member Author

seberg commented Aug 12, 2020

OK, marking ready for review. It's messed up, but there is nothing to do about it. I think the tests should cover things fairly well.

I would be happy to think about what new API can help fix the issue, and we may very much have to soon enough, but there is no obvious choice, so I can't just make a call on my own. And if we are unwilling to add new API, this is probably about as good as it gets.

The NpyIter changes should probably be looked at carefully, to ensure I did not accidentally modify things seriously. Generally, it seems the whole API might be a bit dubious about partial iterations; I guess it's plausible nobody ever used that...

@seberg
Member Author

seberg commented Aug 12, 2020

I don't think the codecov failure has too much relevance. Quite a few things simply can't be changed, or are obvious error-path corrections.

Member

@mattip mattip left a comment


LGTM. I would like to put this in soon to enable further refactoring.

@mattip mattip merged commit 67c7066 into numpy:master Aug 17, 2020
@mattip
Member

mattip commented Aug 17, 2020

Thanks @seberg

@seberg seberg deleted the cast-error-return branch August 17, 2020 14:09
DimitriPapadopoulos added a commit to DimitriPapadopoulos/numpy that referenced this pull request Sep 24, 2021
Comparison result is always the same
Comparison is always false because res <= 0.

Indeed the for(;;) iteration may stop on 3 different breaks:

            res = -1;
            break;

            if (!res) {
                break;

            if (!res) {
                break;

Therefore, after the loop, the only possible values of res are -1 and 0,
and (res > 0) is always false.

This originates in 37dc69c (numpy#17029).
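The control flow in the quoted commit message can be mirrored in a short Python sketch (illustrative only; the real loop is C inside NpyIter):

```python
def run_loop(iternext_results):
    """Mirror the for(;;) loop above: res leaves the loop only as
    -1 (the error break) or 0 (one of the two `if (!res)` breaks)."""
    results = iter(iternext_results)
    res = 0
    while True:
        step = next(results)
        if step == "error":
            res = -1
            break
        res = step        # 0 or 1, as iternext would return
        if not res:
            break
    return res
```

Since every exit path sets res to -1 or 0, a trailing `(res > 0)` check is indeed dead code; `res < 0` is the meaningful test.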
@mattip mattip mentioned this pull request Nov 21, 2022