
MAINT: Add error return to all casting functionality and NpyIter#17029

Merged
mattip merged 4 commits intonumpy:masterfrom
seberg:cast-error-return
Aug 17, 2020

Conversation

@seberg
Member

@seberg seberg commented Aug 7, 2020

NpyIter cannot properly indicate errors right away, deferring them
in some cases until the actual cleanup occurs. To make that slightly
easier, this adds a new API function with a (non-standard for
NpyIter) error return of -1.
A second function only checks whether an error
occurred (i.e. `NpyIter_ErrOccurred()`), which is necessary in the
(rarer) case that the iterator is not deallocated promptly after
the iteration is finished.


Well, this was mostly surprisingly straightforward. I am sure there is some stranger fallout in error-case corner cases, but those were always somewhat buggy, so more likely a few got fixed.

What we will have to discuss is the NpyIter API. Since the iteration functions cannot indicate an error, the first place where we can set it for sure is the deallocation (although many places forget to check for error returns). None of that actually makes the situation worse, but it made me want to add a slight modification of NpyIter_Deallocate, as well as a function to check for such an error earlier if desired (needed for the Python exposure).

At some point, we may want to (or need to) pass the PyThreadState to the casting functions as a next step. To do that, we would require new API. I frankly still have no idea whether the PyGILState functions are considered bad, or if they are OK. If they are OK, there is not too much to worry about for now. (I sent an email to the python c-sig about it.)

In any case, on this topic: I think we have to do this retrofitting; the main discussion needed is about the NpyIter modification. There are currently two tests failing with regard to structured array fields. So I will owe a beer or two to whoever notices the (hopefully stupid) mistake leading to that :/.

Marking this as draft, because the NpyIter changes at least should probably get a test.

@seberg seberg added the Priority: high, 01 - Enhancement, 03 - Maintenance, and component: numpy._core labels Aug 7, 2020
@seberg seberg force-pushed the cast-error-return branch from 01f8b51 to 8a4b362 Compare August 7, 2020 22:56
@seberg seberg marked this pull request as draft August 7, 2020 22:57
@seberg seberg force-pushed the cast-error-return branch 2 times, most recently from af7fc01 to 2b677e7 Compare August 8, 2020 01:22
Member


I understand we can do without NpyIter_SafeDealloc. Do we need to export NpyIter_ErrorOccurred or can it remain an internal API?

Member Author


I am not sure what is best to do. We can definitely do without the first, yes. For the second one, we need to figure out what to do with the error which is set on the fly (possibly while the API is not held).

The current solution is to stash the error and raise it later, when we (probably) have the API and can return error values, because iternext does not report error returns. I think we will need to stash the error quite often to clean up correctly, but that does not have to be done by NpyIter itself in principle. It may be possible to just flag an error return; I think this was my first reaction, because that can't be disrupted, e.g. by a random PyErr_Clear() call.

The issue is that the iternext functions use 0 to indicate stop-iteration, so we can't distinguish an aborted iteration (due to an error) from a finished one. That is why I needed to add NpyIter_ErrOccurred(): there needs to be a way to know that the iteration was aborted when dealloc is not called immediately. That is mostly important for np.nditer (the Python API).
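The ambiguity described here can be sketched with a small Python mock (the class and its names are hypothetical, purely illustrative; the real functions are NpyIter C API):

```python
class MockIter:
    """Toy model of NpyIter's reporting constraint: iternext() returns
    0 both on normal exhaustion and after an internal error."""

    def __init__(self, data, fail_at=None):
        self._data = list(data)
        self._pos = 0
        self._fail_at = fail_at   # index at which a "cast error" happens
        self._error = None

    def iternext(self):
        # A 0 return alone cannot distinguish "finished" from "aborted".
        if self._error is not None or self._pos >= len(self._data):
            return 0
        if self._pos == self._fail_at:
            self._error = "cast failed"
            return 0
        self._pos += 1
        return 1

    def err_occurred(self):
        # Analogue of NpyIter_ErrOccurred(): the extra query that
        # disambiguates the 0 return before deallocation.
        return self._error is not None
```

A caller that only sees `iternext()` returning 0 must consult `err_occurred()` to learn whether the iteration was aborted.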

As to NpyIter_SafeDealloc: no, I am not sure it's the best design. If it seems nicer than the alternative, we could use it privately, I suppose. NpyIter_Deallocate may need the Python API, so as I wrote it here, it either has to check PyErr_Occurred() or the caller has to stash the error.

Right now, of course, we ignore that this type of thing can happen. In practice, NpyIter_Dealloc is simply called with an error already set, and we basically hope it will all work out and the correct error is raised (possibly much belated).

Member


maybe @eric-wieser has some thoughts

Member Author


Hmmm, there is probably a remaining hole here. When the iteration is never finished, either due to an error or because np.nditer is just not iterated to the end, we need to clean up buffers in that case as well. Technically, we should copy the buffer back if (but I guess only if) there was no error. Also an existing issue.

I guess right now we simply assume that NpyIter_Deallocate can handle such things (it's called by np.nditer.__exit__). Probably the Reset functions are in a similar spot.
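The copy-back-only-on-success idea can be sketched like this (a hypothetical `finalize_buffers` helper in Python, not NumPy's actual C function):

```python
def finalize_buffers(buffer, operand, error_occurred):
    """Toy model: write buffered results back to the operand only when
    no error occurred, but release the buffer either way."""
    if not error_occurred:
        # Copy-back is only meaningful for a successful iteration.
        operand[:len(buffer)] = buffer
    # Cleanup must happen regardless of the error state.
    buffer.clear()
    return operand
```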

@mattip
Member

mattip commented Aug 9, 2020

CI failure due to S390x, not related to this PR.

@seberg seberg force-pushed the cast-error-return branch from 262cb92 to 82fbf87 Compare August 10, 2020 23:57
@seberg
Member Author

seberg commented Aug 11, 2020

OK, I did not get to new tests today, but this now adds only a single function, NpyIter_FinalizeBuffers(iterator, error_state), with some wonkier logic in places where the API is technically not held (the iterator reset functions).

This fixes the following for example:

```python
def leak():
    arr = np.arange(10000)
    for i in range(1000):
        it = np.nditer(arr.astype(np.int32), op_dtypes=[np.object_], flags=["buffered", "external_loop"])
        next(it)
```

This is a no-error example; examples with errors should behave identically.

Alternatively, the same issue shows up with structured dtypes:

```python
def leak():
    arr = np.arange(10000)
    for i in range(1000):
        it = np.nditer(arr.astype("i,i"), op_dtypes=[np.dtype("O,i")], flags=["buffered", "external_loop"])
        next(it)
```

All of these examples leak the numbers from 1 to 8192 (or so), since that is the size of the first buffer chunk.
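Why a partially iterated, buffered object iterator leaks can be modelled in pure Python (illustrative only; the real buffers hold PyObject pointers that need an explicit Py_DECREF):

```python
import weakref

class _Boxed:
    """Stands in for an object-dtype value that a cast allocates."""

_leaked = []  # stands in for buffer references nothing ever releases

def partial_iteration(finalize_buffers):
    buffer = [_Boxed() for _ in range(4)]         # first buffer fill
    probes = [weakref.ref(obj) for obj in buffer]
    # Iteration stops early here, as with a single next(it) call.
    if finalize_buffers:
        buffer.clear()          # analogue of releasing buffered objects
    else:
        _leaked.append(buffer)  # the references survive deallocation
    # Count buffered objects still alive after "deallocation".
    return sum(ref() is not None for ref in probes)
```

With cleanup, no buffered object survives; without it, all of them do, matching the leak described above.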

@seberg
Member Author

seberg commented Aug 11, 2020

Not adding the new dealloc function basically means that we call PyErr_Occurred() a few times for no particularly good reason, other than that it is fine. I think the new function itself is good; I am not quite sure about the return value, since I dislike NPY_SUCCEED, but it's what NpyIter uses throughout :(.

@seberg
Member Author

seberg commented Aug 11, 2020

Sorry for the back and forth. I don't particularly like that the NpyIter user (and we) have to call `PyErr_Occurred()`, but here is the solution relying on no new API, but rather on that.

In particular, I think this means that there is no way to even find out whether a casting error occurred without PyErr_Occurred() (and thus the GIL). Usually that should not matter, of course, since the user will iterate exactly once (ufunc.at is different, but it's terrible).

Now, this is reasonable, as long as we are happy with leaving the error set (users can stash it), and with the fact that users must use PyErr_Occurred() (or its implicit call in NpyIter_Dealloc()).

I guess practically, it changes almost nothing, except that iteration may get aborted, which was not possible before.

@seberg
Member Author

seberg commented Aug 11, 2020

OK, a bit of proactive thinking would help me here. We don't have to figure out the full picture yet, but we do have two issues:

  1. There are a few places which technically should propagate the error (the Reset* functions), but don't. Technically, these are called without the GIL, so it would possibly be nice to know whether an error occurred without re-grabbing the GIL. I admit, that is a super corner-case API.

  2. Technically, we actually have to check floating point error flags as well (see bullet point 1 below). I honestly think the only way to do this is to add a single new function, privately for now if I must, which checks floating point flags as well as whether iteration was aborted due to a cast error (bullet point 2 below). This can be retrofitted to only require (and grab) the GIL when an error occurred; otherwise an error should already be set.

So who is supposed to check for errors, especially floating point errors? Either the iterator does it, or the iterator may need to be able to signal that FPEs should be checked. Does that require more API than the existing ufunc API?

And do we live with the fact that there may be no way to know whether an error occurred until calling NpyIter_Dealloc? It's probably fine, but dunno...


The full picture:

  1. We currently have broken behaviour, which also proves that, in a correct world, we sometimes can only report errors after the fact:

```python
np.positive(np.array([3e300]*10000), dtype=np.float32)  # OverflowWarning, because ufuncs always check
np.array([3e300]*10000).astype(dtype=np.float32)        # No warning, because NpyIter does not check
```
    

    I.e. the iterator may have to check for floating point error flags to behave correctly.

    • It is tempting to say that users should be allowed to do similar "after the fact" error checking. That seems like a can of worms to me, so I am tempted to disregard it. Users would have to grab the GIL and abort from the inner-loop. I.e. if we add cleanup, errors will still not be possible during cleanup! Only floating point errors are special (or any other new builtin error support).
  2. When we have the API (the GIL cannot be released), a Python error can be set, and we should abort iteration immediately. We currently never do that (we use faith-based error handling, after all). The only way the user (and NpyIter itself, but that could be fixed) can currently find out about this is PyErr_Occurred().

    • We do sometimes (and should generally) set errors in random ufuncs by temporarily grabbing the GIL (maybe only integer powers), and especially for warnings that is (or will be) useful, IMO. It seems just a lot more convenient to handle errors right away normally (yes, this requires the PyGILState API, and we may need to reserve passing in more info at some point, but that seems viable), and it makes just as much sense for casting. Honestly, the only arguments against this are floating point flags (where it would be costly) and some strange sense that ufunc inner loops should work without Python, which in my opinion would be misguided.
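The ufunc-style "check flags after the loop" behaviour from point 1 can be sketched in pure Python (the `_overflow_flag` global stands in for the hardware floating point status word; all names here are made up):

```python
import warnings

_overflow_flag = False  # stands in for the hardware FPE status flag

def _cast_loop(values, out):
    # Inner loop: silently sets the flag, never raises mid-iteration.
    global _overflow_flag
    for i, x in enumerate(values):
        if abs(x) > 3.4e38:        # pretend this is float32 overflow
            _overflow_flag = True
            out[i] = float("inf")
        else:
            out[i] = float(x)

def cast_checked(values):
    # Ufunc-style: inspect the accumulated flag after the loop and warn,
    # i.e. the "after the fact" reporting described above.
    global _overflow_flag
    _overflow_flag = False
    out = [0.0] * len(values)
    _cast_loop(values, out)
    if _overflow_flag:
        warnings.warn("overflow encountered in cast", RuntimeWarning)
    return out
```

An iterator that never inspects the flag (like the plain `.astype` path above) produces the same values but swallows the warning.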

seberg added 2 commits August 11, 2020 18:54
NpyIter cannot properly indicate errors right away, deferring them
in some cases until the actual cleanup occurs. To make that slightly
easier, this adds a new API function with a (non-standard for
NpyIter) error return of -1.
A second function only checks whether an error
occurred (i.e. `NpyIter_ErrOccurred()`), which is necessary in the
(rarer) case that the iterator is not deallocated promptly after
the iteration is finished.
@seberg seberg force-pushed the cast-error-return branch from 6a2cb7f to 101192c Compare August 11, 2020 23:55
@seberg seberg marked this pull request as ready for review August 11, 2020 23:57
@seberg
Member Author

seberg commented Aug 12, 2020

OK, marking ready for review. It's messed up, but there is nothing to do about it. I think the tests should cover things fairly well.

I would be happy to think about what new API can help fix the issue, and we may very much have to soon enough, but there is no obvious choice, so I can't just make a call on my own. And if we are unwilling to add new API, this is probably about as good as it gets.

The NpyIter changes should probably be looked at carefully, to ensure I did not accidentally modify things seriously. Generally, it seems the whole API might be a bit dubious about partial iterations; I guess it's plausible nobody ever used that...

@seberg
Member Author

seberg commented Aug 12, 2020

I don't think the codecov failure has too much relevance. Quite a few things simply can't be changed, or are obvious error-path corrections.

Member

@mattip mattip left a comment


LGTM. I would like to put this in soon to enable further refactoring.

@mattip mattip merged commit 67c7066 into numpy:master Aug 17, 2020
@mattip
Member

mattip commented Aug 17, 2020

Thanks @seberg

@seberg seberg deleted the cast-error-return branch August 17, 2020 14:09
DimitriPapadopoulos added a commit to DimitriPapadopoulos/numpy that referenced this pull request Sep 24, 2021
Comparison result is always the same
Comparison is always false because res <= 0.

Indeed the for(;;) iteration may stop on 3 different breaks:

            res = -1;
            break;

            if (!res) {
                break;

            if (!res) {
                break;

Therefore, after the loop, the only possible values of res are -1 and 0,
and (res > 0) is always false.

This originates in 37dc69c (numpy#17029).
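The control flow in the quoted commit message can be mirrored in a short Python sketch (illustrative only; the real loop is C inside NpyIter):

```python
def run_loop(iternext_results):
    """Mirror the for(;;) loop above: res leaves the loop only as
    -1 (the error break) or 0 (one of the two `if (!res)` breaks)."""
    results = iter(iternext_results)
    res = 0
    while True:
        step = next(results)
        if step == "error":
            res = -1
            break
        res = step        # 0 or 1, as iternext would return
        if not res:
            break
    return res
```

Since every exit path sets res to -1 or 0, a trailing `(res > 0)` check is indeed dead code; `res < 0` is the meaningful test.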
@mattip mattip mentioned this pull request Nov 21, 2022