fix: ensure masked string/bytestring dtypes can hold 'nan' fill value… #3692

Merged: ianna merged 5 commits into scikit-hep:main from JamesBrofos:fix-to-dataframe-narrow-string-dtype on Oct 22, 2025
Conversation

@JamesBrofos (Contributor) commented Oct 21, 2025

… in to_dataframe

When converting masked string or bytestring arrays to pandas DataFrames, narrow dtypes (e.g., U1, U2, S1, S2) may not have sufficient width to hold the fill value 'nan' (3 characters) or b'nan' (3 bytes). This caused issues during the conversion process.
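A minimal sketch of the width problem, using plain NumPy rather than the Awkward code path (the array contents are made up for illustration). NumPy fixed-width string dtypes do not raise when a value is too long; they silently truncate it:

```python
import numpy as np

# A fixed-width unicode array: dtype U1 holds one character per element.
narrow = np.array(["a", "b", "c"], dtype="U1")

# Assigning a longer string does not raise; NumPy truncates it to fit.
narrow[1] = "nan"
print(narrow)  # ['a' 'n' 'c']

# The same happens for bytestrings: S2 holds at most two bytes.
narrow_bytes = np.array([b"x", b"y"], dtype="S2")
narrow_bytes[1] = b"nan"
print(narrow_bytes)  # [b'x' b'na']
```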

This fix checks the dtype width before filling masked values and resizes the dtype to at least 3 characters/bytes when necessary, ensuring that 'nan' or b'nan' can be properly inserted for masked values.

  • Check if dtype is string (U) or bytestring (S)
  • Calculate character/byte width from dtype.itemsize
  • Resize to minimum width of 3 if needed
  • Add tests for edge cases
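
The steps listed above can be sketched roughly as follows. This is not the code from the PR: `fill_masked_strings` and `NAN_WIDTH` are hypothetical names, and the sketch works on a plain NumPy array plus a boolean mask rather than on the arrays `to_dataframe` actually handles.

```python
import numpy as np

NAN_WIDTH = 3  # len("nan") == len(b"nan") == 3


def fill_masked_strings(values: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: write 'nan' or b'nan' where mask is True,
    widening the dtype first if it is too narrow."""
    kind = values.dtype.kind
    if kind == "U":
        # NumPy stores unicode as UCS-4, so each character takes 4 bytes.
        width = values.dtype.itemsize // 4
        fill = "nan"
    elif kind == "S":
        # Bytestrings take 1 byte per unit of width.
        width = values.dtype.itemsize
        fill = b"nan"
    else:
        raise TypeError(f"expected a string or bytestring dtype, got {values.dtype}")

    if width < NAN_WIDTH:
        # astype returns a widened copy; existing values are preserved.
        out = values.astype(f"{kind}{NAN_WIDTH}")
    else:
        out = values.copy()

    out[mask] = fill
    return out


mask = np.array([False, True])
strings = fill_masked_strings(np.array(["a", "b"], dtype="U1"), mask)
print(strings)  # ['a' 'nan']

bytestrings = fill_masked_strings(np.array([b"x", b"yz"], dtype="S2"), mask)
print(bytestrings)  # [b'x' b'nan']
```

Arrays that are already wide enough keep their dtype, so the widening only costs a copy in the narrow case.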

This PR addresses the concern in #3691

@JamesBrofos force-pushed the fix-to-dataframe-narrow-string-dtype branch from 5d7752b to bddd2ff on October 21, 2025 20:24

codecov bot commented Oct 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.71%. Comparing base (b749e49) to head (bb8e0d3).
⚠️ Report is 448 commits behind head on main.

Additional details and impacted files
Files with missing lines Coverage Δ
src/awkward/operations/ak_to_dataframe.py 92.08% <100.00%> (+1.46%) ⬆️

... and 199 files with indirect coverage changes

@ianna (Member) left a comment

@JamesBrofos - This looks great! Please check whether pandas is installed and skip the tests when it is not. We don't have pandas installed in all CI workflows. Thanks!
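
One way to skip tests when pandas is missing, sketched with the standard library only. The class and test names are hypothetical, and this is not necessarily how the PR's tests do it:

```python
import importlib.util
import unittest

# find_spec returns None when the package cannot be found, without importing it.
HAS_PANDAS = importlib.util.find_spec("pandas") is not None


@unittest.skipUnless(HAS_PANDAS, "pandas is not installed")
class TestNeedsPandas(unittest.TestCase):
    def test_dataframe_is_available(self):
        # Import inside the test so collecting this module never fails.
        import pandas as pd

        self.assertTrue(hasattr(pd, "DataFrame"))
```

In a pytest-based suite, calling `pytest.importorskip("pandas")` at the top of the test module has the same effect in one line.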

@JamesBrofos force-pushed the fix-to-dataframe-narrow-string-dtype branch from 233571b to b8306f1 on October 22, 2025 11:21
@JamesBrofos requested a review from @ianna on October 22, 2025 18:03
@github-actions

The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3692

@ianna (Member) commented Oct 22, 2025

@all-contributors please add @JamesBrofos for code

@allcontributors (Contributor)

@ianna

I've put up a pull request to add @JamesBrofos! 🎉

@ianna (Member) left a comment

@JamesBrofos - Great! Thanks for fixing it. The tests pass, so I will merge it. Thanks.

@ianna ianna merged commit 0f61879 into scikit-hep:main Oct 22, 2025
41 checks passed