ENH: umath: ensure ufuncs are well-defined with memory overlapping inputs#8043
ENH: umath: ensure ufuncs are well-defined with memory overlapping inputs#8043charris merged 33 commits intonumpy:masterfrom
Conversation
5537a37 to
48253f2
Compare
There was a problem hiding this comment.
Seems rather large for inline ;)
There was a problem hiding this comment.
OTOH, I don't really want to put it in a .c file since the logic is tightly coupled to the macros here (and I suspect the multiarray/umath split adds to the mess). Removing the inline makes it emit compiler warnings. I could write this as a macro, but that's messy.
numpy/core/tests/test_mem_overlap.py
Outdated
There was a problem hiding this comment.
Should update this now that 1.12 is branched. The cversion and C-API number also need updating, see documentation in doc/HOWTO_RELEASE.rst.txt.
|
Just starting to look at this. |
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Now that I look at this again, I guess op_tmp should be placed back into op instead of being discarded.
There was a problem hiding this comment.
?. What other array? I think the operation here could be best explained with the help of some diagrams showing the data access and writebacks. The operation can be performed, and is valid, even if there is no overlap, just that it isn't necessary if no overlap.
There was a problem hiding this comment.
I think this might be help clarify behavior when the indexing is many to one in either array.
There was a problem hiding this comment.
This is only for two operands? For inplace operations, both inputs are read, but only one is written too. Or does this apply for three operands, two inputs and one output. Or is it more general than that?
There was a problem hiding this comment.
It applies to any number of operands.
numpy/core/src/multiarray/mapping.c
Outdated
There was a problem hiding this comment.
Hmm, so you are actually checking indexes rather than the resulting view?
There was a problem hiding this comment.
It's used to checking the overlap between the array operated on, and the index array used in ufunc.at, for example
x = np.arange(100)
np.invert.at(x[::-1], x) # <- crazy stuff if no x.copy() of the second argument
It doesn't look at the content of the indices, so it's a somewhat conservative check.
|
Just a first read through. This will need a release note eventually. |
|
Probably should bring in @seberg on this also. |
numpy/core/tests/test_mem_overlap.py
Outdated
There was a problem hiding this comment.
I would need to check again if these tests cover all places where the copyifoverlap behavior is in, and/or whether some of the tests are unnecessary.
There was a problem hiding this comment.
ok, checked --- removing any of NPY_COPY_IF_OVERLAP et al. causes some test to fail
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
why is this line added? a bug in the old code or due to the other changes?
There was a problem hiding this comment.
I think it's a noop. Not necessary to add.
There was a problem hiding this comment.
On second thought, I think it's useful to have here to ensure correctness, if we later on change NPY_ITER_COPY_IF_OVERLAP to make copies of also the input arrays.
Add a new NPY_ITER_COPY_IF_OVERLAP iterator flag to NpyIter, which instructs it to check if read operands overlap with write operands in memory, and make temporary copies to eliminate detected overlap. Thanks to Sebastian Berg.
These can differ if the iterator decides to make temporary copies of output arrays.
…ABLE loops These loops iterate over whole arrays in "trivial" order, so that it is possible to reason about the data dependency.
After this commit, all code paths in PyUFunc_GeneralizedFunction and PyUFunc_GenericFunction perform input/output operand memory overlap detection, and make copies to eliminate it.
…c_loops Because the wheremask loop does not write to all elements of the array, it needs to consider output arrays as READWRITE, in case COPY_IF_OVERLAP makes temporary copies. As the iterator may make temporary copies of arrays, make sure to pass those to __array_prepare__ instead of the originals.
This adds a new method PyArray_MapIterArrayCopyIfOverlap to the API.
|
Oh, right, now I understand! ...this rings a bell, too. @seberg, weren't you telling me at some point about how there's some code inside numpy that relies on implementing reducations via zero-strided calls to the non-reduce loops? Would that be affected by this? (My guess is that whatever code is using this trick is "below" the overlap checking in the stack so unaffected by it, and I'm totally happy with this not being something we guarantee to end-users. Also hopefully no-one is writing horrible |
|
The copies are triggered only if the iterator is given
NPY_ITER_COPY_IF_OVERLAP flag, which I was careful to set only if I
understand the code --- so it is not set for the reduction iterators,
and the overlap issue is handled before the iterator.
|
|
@pv Do you feel that this is about ready? |
|
@charris: I've read this several times through now, and only finding forgotten decref on error handling. I'm fairly confident now that it's what it says on the tin, and doesn't do anything more. It's a bit ugly that this adds |
|
One additional thing is that this probably won't work on PyPy due to the use of updateifcopy, but I guess that's part of a bigger pypy issue that needs separate fixes (probably sprinkling |
|
Well, I don't really understand the UPDATEIFCOPY issue, but since @mattip is on holiday, I'll comment anyway. All the new tests appear to pass on both pypy and pypy3. I'm guessing that's because the copy cannot be seen by the RPython GC. So I think this is OK for us. |
|
Yes, the updateifcopy arrays here are never passed back to Python. If
cpyext is effectively refcounted in this case, then there should be no
problem. I guess things would break also in other places if this wasn't
the case, as the iterator uses updateifcopy temporaries also for other
purposes.
|
numpy/core/src/multiarray/mapping.c
Outdated
|
|
||
| if (index_type & (HAS_FANCY | HAS_BOOL)) { | ||
| for (i = 0; i < num; ++i) { | ||
| if (indices[i].object != NULL && PyArray_Check(indices[i].object) && |
There was a problem hiding this comment.
Should have extra indent here to separate the condition from the block.
There was a problem hiding this comment.
Bit tricky with the long line, but it doesn't read well as is.
| } | ||
| } | ||
|
|
||
| if (extra_op != NULL && PyArray_Check(extra_op) && |
numpy/core/src/multiarray/mapping.c
Outdated
|
|
||
| /*NUMPY_API | ||
| * | ||
| * Use advanced indexing to iterate an array. Please note |
There was a problem hiding this comment.
This needs to be documented somewhere public. It seems dangerous to expose an API that may change...
There was a problem hiding this comment.
How could someone detect the change to this particular function?
There was a problem hiding this comment.
This the is old comment for PyArray_MapIterArray, not added by me.
There was a problem hiding this comment.
Anyway, removing it, since it's unlikely we will change this...
| } | ||
|
|
||
| if (flags & NPY_ITER_COPY_IF_OVERLAP) { | ||
| /* Perform operand memory overlap checks if requested */ |
There was a problem hiding this comment.
Could add more detail on what is being checked here, the notes below are a bit too local.
| * make sense to go further. | ||
| */ | ||
| may_share_memory = solve_may_share_memory( | ||
| op[iop], op[iother], 1); |
There was a problem hiding this comment.
Maybe extra indent here, or even break the args and align.
| return 1; | ||
| } | ||
|
|
||
| /* Arrays overlapping in memory may be equivalently iterable |
There was a problem hiding this comment.
Multiline comment not correctly formatted ;) Also below.
| } | ||
| else if (stride1 < 0) { | ||
| arr1_ahead = (stride1 <= stride2 && | ||
| (npy_uintp)PyArray_BYTES(arr1) <= (npy_uintp)PyArray_BYTES(arr2)); |
There was a problem hiding this comment.
Hmm... Out of curiousity, did the compiler complain without the cast?
There was a problem hiding this comment.
The casts are not needed (probably leftovers from earlier versions).
numpy/core/src/umath/ufunc_object.c
Outdated
| for (i = nin; i < nop; ++i) { | ||
| /* | ||
| * Because we don't write to all elements, the output arrays | ||
| * must be considered READWRITE by the iterator. |
There was a problem hiding this comment.
Updateifcopy copies all elements back, not only those for which wheremask is true, so the old content of the output array must be copied there.
|
Not done reviewing yet. Determining when a copy isn't needed for overlapping arrays seems tricky except in single operand the linear case. ISTR that the iterator may coalesce axes. |
It's unlikely we will break backward compatibility, so better not claim we will do it in the code.
|
I'm going to put this in so it will start getting more testing. I don't claim to have understood all the details, but I doubt I will discover any errors by inspection. The write up in the release notes is very nice, you should consider copying that over to the official documentation. There are still some style nits that I may attempt to clean up later, I would gently urge you to pay more attention to spaces around operators ('+') and line length, even though the latter may not always be as readable. Thanks for this work, Pauli. |
|
Wow, cool. Assuming nothing goes haywire, this is the next huge addition :)! |
Make ufuncs and ufunc operations to make temporary copies if input and output operands overlap in memory. Currently (cf. gh-6272, gh-1683) output from such operations is undefined. With this PR, the result is the same as if the operands were non-overlapping.
Fixes gh-1683
Adds new flags to NpyIter, and one new API function is added to deal with MapIter.
This PR does not fix up overlap handling in array assignment, that needs to be done later (and is slightly more hairy --- some common overlapping cases are in fact already handled). Unlike adding overlap-if-copy flag to NpyIter, it appears a similar flag would not be as useful in MapIter, as there are many special-cased code paths, and MapIter is not always made aware of all arrays operated on.