fix: ensure masked string/bytestring dtypes can hold 'nan' fill value… #3692
Merged
ianna merged 5 commits into scikit-hep:main on Oct 22, 2025
… in to_dataframe

When converting masked string or bytestring arrays to pandas DataFrames, narrow dtypes (e.g., U1, U2, S1, S2) may not have sufficient width to hold the fill value 'nan' (3 characters) or b'nan' (3 bytes). This caused issues during the conversion process.

This fix checks the dtype width before filling masked values and resizes the dtype to at least 3 characters/bytes when necessary, ensuring that 'nan' or b'nan' can be properly inserted for masked values.

- Check if dtype is string (U) or bytestring (S)
- Calculate character/byte width from dtype.itemsize
- Resize to minimum width of 3 if needed
- Add comprehensive tests for edge cases
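The steps listed in the description can be sketched in plain NumPy. This is a minimal illustration, not the code merged in this PR: the function name `fill_masked_strings` and its parameters are hypothetical. It relies on two NumPy facts: a `U` dtype stores each character in 4 bytes while an `S` dtype stores each in 1 byte, and assigning a value wider than a fixed-width dtype silently truncates it.

```python
import numpy as np


def fill_masked_strings(data, mask, min_width=3):
    """Return a copy of `data` with masked entries replaced by 'nan' or b'nan'.

    Hypothetical helper for illustration only; not Awkward Array's API.
    """
    data = np.asarray(data)
    mask = np.asarray(mask, dtype=bool)
    kind = data.dtype.kind
    if kind not in ("U", "S"):
        raise TypeError("expected a string (U) or bytestring (S) dtype")

    # NumPy uses 4 bytes per character for 'U' and 1 byte per character for 'S'.
    bytes_per_char = 4 if kind == "U" else 1
    width = data.dtype.itemsize // bytes_per_char

    if width < min_width:
        # Widen so the 3-character fill value is not truncated; astype copies.
        data = data.astype(f"{kind}{min_width}")
    else:
        data = data.copy()

    data[mask] = "nan" if kind == "U" else b"nan"
    return data


# Without widening, a U1 array keeps only the first character of 'nan'.
narrow = np.array(["a", "b"])  # dtype is U1
narrow[0] = "nan"
print(narrow[0])  # n

print(fill_masked_strings(np.array(["a", "b", "c"]), [False, True, False]).tolist())
# ['a', 'nan', 'c']
```

Copying in both branches keeps the caller's array unchanged, at the cost of one extra allocation when the dtype is already wide enough.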
5d7752b to bddd2ff
Codecov Report: ✅ All modified and coverable lines are covered by tests.
ianna (Member) requested changes on Oct 21, 2025:
@JamesBrofos - This looks great! Please check whether pandas is installed and skip the tests when it is not. We don't have pandas installed in all CI workflows. Thanks!
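One way to follow this request is to guard the pandas-dependent tests so they are skipped, not failed, when pandas is absent. The sketch below uses only the standard library; the class and test names are hypothetical and are not taken from this PR. In a pytest-based suite, calling `pytest.importorskip("pandas")` at the top of the test is a common one-line equivalent.

```python
import importlib.util
import unittest

# True only when pandas can be located in the current environment.
HAS_PANDAS = importlib.util.find_spec("pandas") is not None


@unittest.skipUnless(HAS_PANDAS, "pandas is not installed")
class TestNeedsPandas(unittest.TestCase):  # hypothetical test case
    def test_nan_fill_survives_in_dataframe(self):
        # Imported inside the test so that merely loading this module
        # never fails on a CI job without pandas.
        import pandas as pd

        frame = pd.DataFrame({"x": ["a", "nan"]})
        self.assertEqual(list(frame["x"]), ["a", "nan"])
```

When pandas is missing, the test runner reports the case as skipped and the run still counts as successful.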
233571b to b8306f1
The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3692
Member
@all-contributors please add @JamesBrofos for code
Contributor
I've put up a pull request to add @JamesBrofos! 🎉
ianna (Member) approved these changes on Oct 22, 2025:
@JamesBrofos - Great! Thanks for fixing it. The tests pass, so I will merge it. Thanks.
This PR addresses the concern in #3691