Skip to content

BUG: fix another cast setup in array_assign_subscript#27057

Merged
ngoldbaum merged 2 commits into
numpy:mainfrom
ngoldbaum:fix-array-assign-subscript
Aug 6, 2024
Merged

BUG: fix another cast setup in array_assign_subscript#27057
ngoldbaum merged 2 commits into
numpy:mainfrom
ngoldbaum:fix-array-assign-subscript

Conversation

@ngoldbaum

Copy link
Copy Markdown
Member

Thanks for the bug-finding @egesip!

Fixes #27053

@ngoldbaum ngoldbaum added 00 - Bug 09 - Backport-Candidate PRs tagged should be backported component: numpy.strings String dtypes and functions labels Jul 26, 2024
@ngoldbaum

Copy link
Copy Markdown
Member Author

Hmm, it looks like there are some other code paths that depend on the input and output descriptor being the same...

@ngoldbaum

Copy link
Copy Markdown
Member Author

I just pushed a hacky fix - I suspect @seberg is going to tell me I'm doing this wrong :)

It wasn't clear to me how exactly this code path is supposed to work, since it seems to depend on the inferred source dtype not being used in some cases.

Comment thread numpy/_core/src/multiarray/mapping.c Outdated
}
else {
src = self;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The below makes sense to me, but this one I don't understand? Legacy should work identically, no?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it should, in practice lots of tests seem to depend on the old behavior. As I said in the description, I don't fully understand what this code block is doing, it seems to be here to capture oddball conversions (unicode to half goes through here apparently - one of the failing tests I was looking at).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case you are using the wrong descr below, and similar tests must fail also for string dtype (maybe not because there is an object special path that uses refcheck).

Check MapIter, descr is passed, it sets up a buffering iterator to keep the copy loop simple, that ensures that the descriptor is identical between src/dst. It is not for your dtype, so you need to fetch the src one from the iterator.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried doing it this way, and it almost works, but the dtype array that lives on the iterator contains references to the dtypes that were passed in to create the iterator, so it still doesn't fix the issue for stringdtype. I tried checking for the case when the descriptor gets replaced and then updated the iterator's descriptor array, but for mysterious reasons that led to crashes.

Maybe the rewrite in the latest push is a bit more appealing? The basic difference between the case with stringdtype and the case that's breaking for other dtypes is for stringdtype the MapIter is allocating an array inline, but for other dtypes the extra_op is already allocated. I added a flag that marks whether or not the iterator allocated an array and used it here to distinguish between the cases.

If not I'd appreciate it if you could try your hand at a fix, I tried for several hours today to get this right but couldn't manage it along the lines that you outlined the other day.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better, I agree it's all a bit of a maze although, I want would like to see if we can't untangle it. I might help to make descr (the one used for the extra-op/tmp array) a strong reference.

One thing that would may be better for all paths anyway, may be to use descr = finalize_descr(descr). That way, we don't pollute the assigned arrays side car buffer with temporary copies!
That will mean that we would need to pass _NPY_ARRAY_ENSURE_DTYPE_IDENTITY when creating copies and actually make sure we honor that flag (which we should anyway).

Anyway, need to take another look and it may still be subtly wrong in another pass if REFCHK is false, although I am not sure that is ever relevant.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good idea to use finalize_descrs when setting up the iterator, I might try my hand at that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I think it may make sense to do it on descr that gets passed in. But not sure where to put it exactly.

@ngoldbaum ngoldbaum force-pushed the fix-array-assign-subscript branch from 37d1dac to 0237342 Compare July 29, 2024 19:24
@ngoldbaum ngoldbaum added this to the 2.1.0 release milestone Aug 1, 2024
@charris

charris commented Aug 5, 2024

Copy link
Copy Markdown
Member

This is one of the last fixes needed before branching.

@ngoldbaum

Copy link
Copy Markdown
Member Author

@seberg would you be ok with merging this as-is?

@seberg

seberg commented Aug 5, 2024

Copy link
Copy Markdown
Member

I suppose so, it would be nice to have some form of TODO to point out that it may still be broken (at least for potential other user DTypes).

But, yeah, if this is the one thing that is blocking 2.1 then fair.

@ngoldbaum ngoldbaum force-pushed the fix-array-assign-subscript branch from 0237342 to 6a0d05c Compare August 5, 2024 20:16
@ngoldbaum

Copy link
Copy Markdown
Member Author

Added a TODO

@seberg seberg left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I don't really think its quite right, but approving since it is probably right for the places where it matters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

00 - Bug component: numpy.strings String dtypes and functions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: StringDType Subscript assignment problems

3 participants