
Conversation

@seberg (Member) commented Apr 20, 2021:

This is necessary because in some rare cases (reductions), we may
not actually use the full buffer. In that case, the cleanup-on-error
code would have to be smart enough to handle those cases; it seems
much simpler to always initialize the full buffers, even if we may
not end up using them.

Admittedly, the old logic might have skipped the buffer clearing
(especially of the full buffer) in a fair number of cases, but since
this is only relevant for object dtype, I assume this is fine.


The second commit removes the current memset calls, which I am fairly sure do not make much sense. But to err on the safe side, we could keep them on the 1.20.x branch.

Closes gh-18810
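For context, an object-dtype reduction like the one below exercises the buffered reduction path this PR fixes (a minimal sketch; the shapes and values are illustrative, not the exact reproducer from gh-18810):

```python
import numpy as np

# A reduction that casts to object dtype goes through the buffered
# reduction machinery; before this fix, a partially-used object
# buffer could hold uninitialized pointers and crash during cleanup.
arr = np.ones((100, 2, 2))

# Reduce over the last two axes, forcing an object-dtype cast.
res = np.sum(arr, axis=(1, 2), dtype=object)

# Each output element is the sum of 2 * 2 ones.
assert res.shape == (100,)
assert all(x == 4 for x in res)
```

With the always-initialized buffers, the error-cleanup code can unconditionally treat buffer contents as NULL-or-valid references.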

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Apr 20, 2021
@charris charris added this to the 1.20.3 release milestone Apr 20, 2021
Contributor:

cananot -> cannot

@seberg seberg force-pushed the complex-reduction-buffer-fix branch from 3922836 to 0f9b316 Compare April 20, 2021 03:07
Member:

Should the code to reset skip_transfer = 0 remain? Is there ever a case where both reuse_reduce_loops && prev_dataptrs[iop] == ad_ptrs[iop] and PyDataType_FLAGCHK(dtypes[iop], NPY_NEEDS_INIT) hold at the same time?

Member Author:

The case you describe seems very plausible to me. Note that NPY_NEEDS_INIT is pretty much equivalent to "object dtype", so it isn't much of a limitation.

Since this is limited to object dtype, and for all other dtypes the transfer is skipped, I would be surprised if it was necessary. But I have to dig deeper :/... In the case of an object read-write loop, we have to be sure the buffer is not cleared so that it can be reused (this translates to "move references" not being used to write it back to the array). I guess I have to try and make sure we have a test for this (we may well have one, but object dtype tests tend to be a bit spotty).
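The read-write object operand case mentioned above can be set up from Python like this (an illustrative sketch, not the exact failing case; whether the iterator actually buffers here depends on strides and casts):

```python
import numpy as np

# An object array used as both input and output: the loop reads the
# operand, updates it, and the result must be written back to the
# array ("move references") without the buffer being cleared in
# between, otherwise the references would be lost.
a = np.array([1, 2, 3], dtype=object)
np.add(a, 10, out=a)  # in-place, read-write object operand
assert list(a) == [11, 12, 13]
```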

Member Author:

Well, I think you are right; we always have to copy if:

                    transferinfo[iop].write.func != NULL &&
                    PyDataType_REFCHK(transferinfo[iop].write.descriptors[0])

(or similar). But I am still struggling to find a reproducer; maybe I have to set up a weirdly complicated nditer manually...

Member Author:

I guess it's also a bit confusing that NEEDS_INIT and HASREF are effectively synonymous... Anything that holds references must require initialization, but I can't see a reason why the other direction would make sense (requiring initialization for a reason other than HASREF).
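That relationship can be checked from Python via np.dtype.flags, using the flag bit values from numpy's C headers (ndarraytypes.h); the constants below are copied from those headers, not part of the Python API:

```python
import numpy as np

# Flag bits as defined in numpy's ndarraytypes.h.
NPY_ITEM_REFCOUNT = 0x01   # roughly "HASREF": items hold references
NPY_NEEDS_INIT = 0x08      # memory must be zeroed before use

def ref_and_init(dt):
    """Return (holds references, needs init) for a dtype."""
    flags = np.dtype(dt).flags
    return bool(flags & NPY_ITEM_REFCOUNT), bool(flags & NPY_NEEDS_INIT)

# object dtype: holds references, so it also needs initialization
assert ref_and_init(object) == (True, True)
# float64: neither references nor zero-initialization required
assert ref_and_init(np.float64) == (False, False)
```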

Member:

The easiest thing to do here would be to remove only line 2536, maybe with a /* TODO: do we really need this? */

Member Author:

Yeah, that was just a (not really ideal) attempt to rephrase the current logic into something slightly more to the point. I would prefer to rephrase it (although just by making it if (!skip_transfer || PyDataType_REFCHK(dtypes[iop])) {).

I finally found a reproducer for at least one way this can fail with reductions...

seberg added 2 commits April 20, 2021 17:36
This is necessary because in some rare cases (reductions), we may
not actually use the full buffer.  In that case, the cleanup-on-error
code would have to be smart enough to handle those cases;
it seems much simpler to always initialize the full buffers, even
if we may not end up using them.

Admittedly, the old logic might have skipped the buffer clearing
(especially of the full buffer) in a fair number of cases, but since
this is only relevant for `object` dtype, I assume this is fine.
Buffer must always either contain NULL or valid references
(assuming the dtype supports references) in order to allow
cleanup in case of errors.

It is thus unnecessary to clear buffers before every copy: if
they contain NULL, all is fine, and if they contain non-NULL, we
would also have to DECREF those references first (so clearing
alone would be incorrect as well).

Buffers thus need the memset exactly once: directly upon allocation
(which we do now).  After this, any transfer from/to the buffer
needs to ensure that the buffer remains in a good state.
@seberg seberg force-pushed the complex-reduction-buffer-fix branch from 0f9b316 to b292603 Compare April 20, 2021 22:36
# setup assumed for illustration (the original snippet omits it):
arr = np.ones((8000, 2, 2))
out = np.empty(8000, dtype=object)
# force casting with `dtype=object`
res = np.sum(arr, axis=(1, 2), dtype=object, out=out)
assert_array_equal(res, np.full(8000, 4, dtype=object))

Member:

I see. That is quite the edge case.

@mattip mattip merged commit dabc536 into numpy:main Apr 20, 2021
@mattip (Member) commented Apr 20, 2021:

Thanks @seberg

@seberg seberg deleted the complex-reduction-buffer-fix branch April 20, 2021 23:37
@charris charris added component: numpy._core and removed 09 - Backport-Candidate PRs tagged should be backported labels May 5, 2021
@charris charris removed this from the 1.20.3 release milestone May 5, 2021


Successfully merging this pull request may close these issues.

Segfault when summing array slice from Pandas dataframe

4 participants