Skip to content

Conversation

@phdparedes
Copy link
Contributor

This pull request includes a minor fix in the _check_fill_value function in numpy/ma/core.py. The change adjusts the condition to include the character 'T' in the type check for ndtype.char.

This fix adds the StringDType data type character code 'T' to the list of accepted data types that can use a 'fill_value' as string. Currently just accepts None. "See numpy#29421" Issue.
@phdparedes phdparedes changed the title BUG: allow ma.MaskedArray.fill_value be a string when dtype=StringDType BUG: allow MaskedArray.fill_value be a string when dtype=StringDType Jul 23, 2025
@melissawm melissawm moved this to Awaiting a code review in NumPy first-time contributor PRs Jul 23, 2025
@ngoldbaum
Copy link
Member

Sorry - it's not totally clear to me what this fixes.

Can you add a test? Maybe there's an existing test for this in the masked array tests that check the other dtypes you can extend.

@ngoldbaum
Copy link
Member

It looks like if I just extend the existing tests I can trigger the bug this PR is fixing. See ngoldbaum@4343115. You can pull that commit and push or I can just push directly to your PR branch, whichever you feel most comfortable with. Generally I don't push to people's PR branches without explicit permission.

Once there are tests I'm happy to merge this.

@phdparedes
Copy link
Contributor Author

Oh I see now, my apologies Nathan, as a first time contributor I totally miss the testing inclusion with numpy development standards. I think I'll do the pull and push, but let me double check this just to make sure I'm understanding this correctly. Thank again.

ma2 = masked_array(["cde", "b", "a"], mask=[0, 1, 0], fill_value=fill, dtype=dt)
assert_equal(op(ma1, ma2)._data, op(ma1._data, ma2._data))


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

to appease the linter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding test functions for test_core.py now.

@phdparedes
Copy link
Contributor Author

Not ready to merge yet, I'm double checking something I just noticed.

@ngoldbaum
Copy link
Member

Not ready to merge yet, I'm double checking something I just noticed.

Still double-checking?

@phdparedes
Copy link
Contributor Author

I was checking the behavior of MaskedArray when is fill_value is not set. For the other dtypes it uses a default fill_value defined in default_filler dictionary

numpy/numpy/ma/core.py

Lines 176 to 185 in fda7b4c

default_filler = {'b': True,
'c': 1.e20 + 0.0j,
'f': 1.e20,
'i': 999999,
'O': '?',
'S': b'N/A',
'u': 999999,
'V': b'???',
'U': 'N/A'
}

I was wondering about adding one for StringDType like N/A as for string. As it is now still works if MaskedArray initializes without fill_value set, a fill_value is given regardless, just like for dtype=object O.
do you think I should add 'T' : 'N/A' or 'T' : b'N/A' to the default_filler dict?

@ngoldbaum
Copy link
Member

ngoldbaum commented Jul 28, 2025

I think using "N/A" is fine. It might make sense eventually to also do something sensible if someone wants to use a fill value that isn't a string, but we can handle that then.

You can use StringDType with non-string fill values, so you can do stuff like this:

In [10]: dt = np.dtypes.StringDType(na_object=None)

In [11]: arr = np.array(['hello', 'world', None], dtype=dt)

In [12]: arr
Out[12]: array(['hello', 'world', None], dtype=StringDType(na_object=None))

In [13]: arr[2] is None
Out[13]: True

but "N/A" is probably a fine default to use. If you wanted to make it so non-string fill values work, you'll need to set na_object on the StringDType instance on the MaskedArray.

@ngoldbaum
Copy link
Member

OK, awesome. It’s borderline but IMO this deserves a release note too.

See https://github.com/numpy/numpy/blob/main/doc/release/upcoming_changes/README.rst and the other release note fragments in that directory for instructions and examples.

@ngoldbaum
Copy link
Member

Oops, looks like I forgot to come back to this. I'll try to give this another once-over and hopefully hit the merge button later today.

@phdparedes
Copy link
Contributor Author

Let me know if there anything else you need me to do @ngoldbaum

@ngoldbaum
Copy link
Member

Thanks for your patience @phdparedes and for the contribution! Hope to see your github handle again :)

@ngoldbaum ngoldbaum merged commit 7683d85 into numpy:main Aug 13, 2025
76 checks passed
@github-project-automation github-project-automation bot moved this from Awaiting a code review to Completed in NumPy first-time contributor PRs Aug 13, 2025
IndifferentArea pushed a commit to IndifferentArea/numpy that referenced this pull request Dec 7, 2025
…pe` (numpy#29423)

* BUG: allow ma.MaskedArray.fill_value be a string when dtype=StringDType

This fix adds the StringDType data type character code 'T' to the list of accepted data types that can use a 'fill_value' as string. Currently just accepts None. "See numpy#29421" Issue.

* TST: add tests for StringDType MaskedArray use

* TST: add tests for fill_value in StringDType MaskedArray

* TST: fix format in the test functions added

* Add default fill value 'N/A' for StringDType

* TST: Add tests for StringDType masked array default fill and slicing

* DOC: add release note for StringDType fill_value support (numpygh-29423)

---------

Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
@valeriupredoi
Copy link

valeriupredoi commented Jan 6, 2026

hi folks, cheers for this - @ngoldbaum this has a side-effect that I'm not sure if it's a bug or a feature: previously a fill_value that was a one-element ndarray eg [-900.] was accepted and worked, now that'll toss a ValueError: setting an array element with a sequence in np.ma.masked_equal(data, missing_value) - see #30591

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Development

Successfully merging this pull request may close these issues.

3 participants