BUG: Initialize the full nditer buffer in case of error #18813
Conversation
numpy/core/tests/test_nditer.py
Outdated
cananot -> cannot
Force-pushed from 3922836 to 0f9b316
Should the code that resets `skip_transfer = 0` remain? Is there ever a case where both `reuse_reduce_loops && prev_dataptrs[iop] == ad_ptrs[iop]` and `PyDataType_FLAGCHK(dtypes[iop], NPY_NEEDS_INIT)` hold?
The case you are describing seems very plausible to me. Note that NPY_NEEDS_INIT is pretty much equivalent to "object dtype", so it isn't much of a limitation.
Since this is limited to object dtype, and for all other dtypes the transfer is skipped, I would be surprised if it were necessary. But I have to dig deeper :/... In the case of an object read-write loop, we have to be sure that the buffer is not cleared, so that it can be reused (this translates to "move references" not being used to write it back to the array). I guess I have to try and make sure we have a test for this (we may well have one, but object-dtype tests tend to be a bit spotty).
Well, I think you are right: we always have to copy if `transferinfo[iop].write.func != NULL && PyDataType_REFCHK(transferinfo[iop].write.descriptors[0])` (or similar). But I am still struggling with a reproducer; maybe I have to set up a weirdly complicated nditer manually...
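For what it's worth, a manual buffered reduce over object dtype can be set up along the lines of the buffered-reduction recipe from the `nditer` documentation (`delay_bufalloc` plus `reset()`). The shapes and values below are illustrative assumptions, not the reproducer eventually found:

```python
import numpy as np

# Hedged sketch of a manually-built buffered reduce iterator over object
# dtype; shapes/values are illustrative assumptions only.
a = np.ones((4, 3), dtype=object)
b = np.zeros((), dtype=object)
with np.nditer([a, b],
               flags=["reduce_ok", "buffered", "delay_bufalloc", "refs_ok"],
               op_flags=[["readonly"], ["readwrite"]]) as it:
    it.operands[1][...] = 0   # initialize the reduce output before use
    it.reset()                # allocate buffers now that the output is valid
    for x, y in it:
        y[...] = y + x        # object addition through the buffered operands
```

Here `b` ends up holding the sum of all twelve ones; the interesting part for this PR is that the object buffers behind `x` and `y` have to stay in a cleanly initialized state even if the loop body raises.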
I guess it's also a bit confusing that NEEDS_INIT and HASREF are effectively synonymous... Anything that holds references must require initialization. But I can't see a reason why the other way around would make sense (requiring initialization for a reason other than HASREF).
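The pairing can be checked from Python via `np.dtype(...).flags`; the flag bit values below are copied from numpy's `ndarraytypes.h` (an assumption of this sketch, not something stated in the thread):

```python
import numpy as np

# Flag bits as defined in numpy's ndarraytypes.h (assumed values)
NPY_ITEM_REFCOUNT = 0x01  # items hold references (HASREF)
NPY_NEEDS_INIT = 0x08     # memory must be zeroed before use

flags = np.dtype(object).flags
assert flags & NPY_ITEM_REFCOUNT   # object dtype holds references...
assert flags & NPY_NEEDS_INIT      # ...and requires initialization
# plain numeric dtypes carry neither flag
assert not np.dtype(np.float64).flags & (NPY_ITEM_REFCOUNT | NPY_NEEDS_INIT)
```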
The easiest thing to do here would be to remove only line 2536, maybe with a `/* TODO: do we really need this? */`.
Yeah, that was just a (not really ideal) attempt to rephrase the current logic into something slightly more to the point. I would prefer to rephrase it (e.g. by just making it `if (!skip_transfer || PyDataType_REFCHK(dtypes[iop])) {`).
I finally found a reproducer for at least one way this can fail with reductions...
This is necessary because in some rare cases (reductions), we may not actually use the full buffer. In that case, the cleanup-on-error code would have to grow smart enough to handle these cases. It seems much simpler to just always initialize the full buffers, even if we may not end up using them. Admittedly, the old logic might have skipped the buffer clearing (especially the full buffer) in a fair bit of cases, but since this is only relevant for `object` dtype, I assume this is fine.
Buffers must always contain either NULL or valid references (assuming the dtype supports references) in order to allow cleanup in case of errors. It is thus unnecessary to clear buffers before every copy: if they contain NULL, all is fine; if they contain non-NULL, we would also have to DECREF those references first (so clearing alone would be incorrect as well). Buffers thus need the memset exactly once: directly upon allocation (which we do now). After that, any transfer from/to the buffer needs to ensure that the buffer is always left in a good state.
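To illustrate the error path this invariant protects, here is a constructed sketch (not the actual crashing case from gh-18810) where an exception is raised part-way through an object-dtype reduction, so the iterator's cleanup must only ever see NULL or valid references in its buffers:

```python
import numpy as np

# Hypothetical error-path exercise: an object whose addition raises
# mid-reduction; we only check that the error propagates cleanly.
class Boom:
    def __add__(self, other):
        raise ValueError("boom")
    __radd__ = __add__

arr = np.full((100, 4), Boom(), dtype=object)
try:
    np.sum(arr, axis=1, dtype=object)
    raised = False
except ValueError:
    raised = True
```

If the buffers were left uninitialized, the cleanup pass could DECREF garbage pointers; with the memset-once-at-allocation scheme it only ever encounters NULL or valid references.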
Force-pushed from 0f9b316 to b292603
```python
# force casting with `dtype=object`
res = np.sum(arr, axis=(1, 2), dtype=object, out=out)
assert_array_equal(res, np.full(8000, 4, dtype=object))
```
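For reference, a self-contained variant of the test snippet above; the `(8000, 2, 2)` input shape is an assumption chosen so each row of ones reduces to 4:

```python
import numpy as np
from numpy.testing import assert_array_equal

arr = np.ones((8000, 2, 2))        # hypothetical input shape
out = np.empty(8000, dtype=object)
# force casting with `dtype=object` into an object output buffer
res = np.sum(arr, axis=(1, 2), dtype=object, out=out)
assert_array_equal(res, np.full(8000, 4, dtype=object))
```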
I see. That is quite the edge case.
Thanks @seberg
The second commit removes the current `memset` calls that I am very sure don't make much sense. But to err on the safe side, we could keep them in for the 1.20.x branch.

Closes gh-18810