BUG: ensure text padding ufuncs handle stringdtype nan-like nulls#26353
Merged
ngoldbaum merged 1 commit intonumpy:mainfrom Apr 29, 2024
Merged
BUG: ensure text padding ufuncs handle stringdtype nan-like nulls#26353ngoldbaum merged 1 commit intonumpy:mainfrom
ngoldbaum merged 1 commit intonumpy:mainfrom
Conversation
32ca7eb to
197c915
Compare
Member
Author
|
I initially marked this as a backport candidate but that was wrong, these ufuncs won't be in numpy 2.0. |
mhvk
approved these changes
Apr 28, 2024
Contributor
mhvk
left a comment
There was a problem hiding this comment.
Guess we should have caught that the maximum width was really not necessary for StringDType. Anyway, solution looks all good!
Member
Author
|
Thanks for looking this over! |
seberg
pushed a commit
that referenced
this pull request
Apr 30, 2024
This fixes an issue similar to the one fixed by #26353. In particular, right now np.strings.replace calls the count ufunc to get the number of replacements. This is necessary for fixed-width strings, but it turns out to make it impossible to support null strings in replace. I went ahead and instead found the replacement counts inline in the ufunc loop. This lets me add support for nan-like null strings, which it turns out pandas needs.
charris
pushed a commit
to charris/numpy
that referenced
this pull request
May 2, 2024
…6355) This fixes an issue similar to the one fixed by numpy#26353. In particular, right now np.strings.replace calls the count ufunc to get the number of replacements. This is necessary for fixed-width strings, but it turns out to make it impossible to support null strings in replace. I went ahead and instead found the replacement counts inline in the ufunc loop. This lets me add support for nan-like null strings, which it turns out pandas needs.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Currently these ufuncs fail for arrays containing null strings with the following error:
The use of
str_leninnp.stringsis only necessary for fixed-width strings, so I moved the pythonwidthcalculations in thenp.stringswrappers after the check for stringdtype in each function and added width checking in the C++ ufunc loops.I also modified the tests to register the text-padding ufuncs as passing through nan-like null strings.