fix: ensure masked string/bytestring dtypes can hold 'nan' fill value… by JamesBrofos · Pull Request #3692 · scikit-hep/awkward

JamesBrofos · 2025-10-21T20:19:01Z

… in to_dataframe

When converting masked string or bytestring arrays to pandas DataFrames, narrow dtypes (e.g., U1, U2, S1, S2) may not have sufficient width to hold the fill value 'nan' (3 characters) or b'nan' (3 bytes). This caused issues during the conversion process.

This fix checks the dtype width before filling masked values and resizes the dtype to at least 3 characters/bytes when necessary, ensuring that 'nan' or b'nan' can be properly inserted for masked values.

- Check if dtype is string (U) or bytestring (S)
- Calculate character/byte width from dtype.itemsize
- Resize to minimum width of 3 if needed
- Add tests for edge cases
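
The steps listed above can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the code merged in this PR: the helper name `widen_for_nan` and its `min_width` parameter are hypothetical. It relies on the fact that NumPy truncates values assigned into a fixed-width string array, so a `U1` array cannot store `'nan'` intact, and that unicode (`U`) dtypes use 4 bytes per character while bytestring (`S`) dtypes use 1.

```python
import numpy as np


def widen_for_nan(data, min_width=3):
    """Hypothetical helper: widen a string/bytestring array so 'nan' fits.

    Arrays of any other dtype, or already wide enough, are returned unchanged.
    """
    dtype = data.dtype
    if dtype.kind == "U":
        # Unicode dtypes are stored as UCS-4: 4 bytes per character.
        width = dtype.itemsize // 4
    elif dtype.kind == "S":
        # Bytestring dtypes use 1 byte per element of width.
        width = dtype.itemsize
    else:
        return data
    if width < min_width:
        # e.g. "U1" -> "U3" or "S2" -> "S3"; the mask of a MaskedArray is kept.
        return data.astype(f"{dtype.kind}{min_width}")
    return data


# A masked array of single-character strings (dtype U1).
values = np.ma.MaskedArray(
    np.array(["a", "b", "c"], dtype="U1"),
    mask=[False, True, False],
)

widened = widen_for_nan(values)   # dtype is now wide enough for 'nan'
filled = widened.filled("nan")    # masked slot receives the full fill value
print(widened.dtype, filled)
```

Widening before filling keeps the original values untouched and only changes the storage width, which is why the check can be skipped entirely for dtypes that are already at least 3 wide.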

This PR addresses the concern in #3691

codecov · 2025-10-21T21:08:34Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.71%. Comparing base (b749e49) to head (bb8e0d3).
⚠️ Report is 448 commits behind head on main.

Additional details and impacted files

Files with missing lines	Coverage Δ
src/awkward/operations/ak_to_dataframe.py	`92.08% <100.00%> (+1.46%)`	⬆️

... and 199 files with indirect coverage changes

ianna

@JamesBrofos - This looks great! Please check whether pandas is installed and skip the tests when it isn't. We don't have pandas installed in all CI workflows. Thanks!

tests/test_3692_to_dataframe_masked_string_dtype_resize.py
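
One way to honor this request is to guard the tests on whether pandas can be imported. The sketch below uses only the standard library so that it runs anywhere; the test class and its contents are hypothetical and are not taken from the PR's test file. In a pytest-based suite, `pytest.importorskip("pandas")` at module level serves the same purpose.

```python
import importlib.util
import unittest

# True only when pandas can be found in the current environment.
HAS_PANDAS = importlib.util.find_spec("pandas") is not None


@unittest.skipUnless(HAS_PANDAS, "pandas is not installed")
class TestNeedsPandas(unittest.TestCase):
    """Hypothetical test case that is skipped when pandas is missing."""

    def test_string_column_keeps_nan_text(self):
        # Import inside the test so the module itself loads without pandas.
        import pandas as pd

        df = pd.DataFrame({"x": ["a", "nan"]})
        self.assertEqual(list(df["x"]), ["a", "nan"])


# Run the case programmatically; a skipped test still counts as success.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNeedsPandas)
result = unittest.TestResult()
suite.run(result)
print("successful:", result.wasSuccessful(), "skipped:", len(result.skipped))
```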

github-actions · 2025-10-22T18:57:01Z

The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3692

ianna · 2025-10-22T19:46:09Z

@all-contributors please add @JamesBrofos for code

allcontributors · 2025-10-22T19:46:19Z

@ianna

I've put up a pull request to add @JamesBrofos! 🎉

ianna

@JamesBrofos - Great! Thanks for fixing it. The tests pass, I will merge it. Thanks.

JamesBrofos added 2 commits October 21, 2025 16:16

update name

bddd2ff

JamesBrofos force-pushed the fix-to-dataframe-narrow-string-dtype branch from 5d7752b to bddd2ff Compare October 21, 2025 20:24

ianna requested changes Oct 21, 2025

tests/test_3692_to_dataframe_masked_string_dtype_resize.py

fix imports

b8306f1

JamesBrofos force-pushed the fix-to-dataframe-narrow-string-dtype branch from 233571b to b8306f1 Compare October 22, 2025 11:21

style: pre-commit fixes

e6dbe52

JamesBrofos requested a review from ianna October 22, 2025 18:03

Merge branch 'main' into fix-to-dataframe-narrow-string-dtype

bb8e0d3

allcontributors bot mentioned this pull request Oct 22, 2025

docs: add JamesBrofos as a contributor for code #3693

Merged

ianna approved these changes Oct 22, 2025

ianna merged commit 0f61879 into scikit-hep:main Oct 22, 2025
41 checks passed

ianna merged 5 commits into scikit-hep:main from JamesBrofos:fix-to-dataframe-narrow-string-dtype