BUG: safer bincount casting by tylerjereddy · Pull Request #28355 · numpy/numpy

tylerjereddy · 2025-02-19T01:00:09Z

Fixes BUG: np.bincount casting uint64 to int64 #28354
Allow some uint types to be cast a bit more gracefully for np.bincount().

tylerjereddy · 2025-02-19T03:45:01Z

The CI failures for the new test case I added are real; npy_intp gets used in other places in the control flow as well, so patching locally to account for more explicit type casts alone isn't sufficient.

tylerjereddy · 2025-02-19T18:30:14Z

Briefly discussed at triage meeting this morning--Sebastian suggested forcing a cast to npy_intp, even for uint64 input, since indexing does that as well and it isn't really practical to guard against the unreasonably large ints that would wrap to negative values.

numpy/_core/src/multiarray/compiled_base.c

* Fixes numpy#28354 * Force usage of `npy_intp` type in `np.bincount()` and avoid unsafe casting errors with i.e., `npy_uint64`. This is similar to our behavior with indexing.

* `arr_bincount()` now only uses unsafe casting for integer input types, and the number of casting operations has been reduced for the code path used in above PR. * a test for non-crashing behavior with non-contiguous `bincount()` input has been added.

tylerjereddy · 2025-02-21T23:11:41Z

We shouldn't force-cast everything.

Ok, I added a guard for that.

PyArray_FromAny needs the flags to be passed because the later code relies on them..

I couldn't replicate the need for this because what I did originally was two casting operations instead of one--the Any cast was added in the control flow to precede the PyArray_ContiguousFromAny after it, mostly because I was being lazy about constructing the appropriate requirements/flags. In any case, I've simplified things to restore to a single casting operation now, with custom flags depending on input type, which I suspect is closer to what you'd want?

The current version should crash hard if you do something like np.bincount(np.arange(10000)[::2])

I couldn't find any evidence of a hard crash with that even when I intentionally made things worse with regards to C contiguity, though it may require i.e., pytest-repeat and some patience. In any case, I've added that test case as a "doesn't segfault" test just in case.

* The actual C function signature is `PyArray_Size(PyObject *op)`, and the compiler will issue a warning if you follow the current C API docs that specify `PyArrayObject*`, so clean this up. I noticed this while working on numpygh-28355, but it was also mentioned as an aside 7 years ago in numpygh-10977. [skip azp] [skip cirrus] [skip actions]

* The actual C function signature is `PyArray_Size(PyObject *op)`, and the compiler will issue a warning if you follow the current C API docs that specify `PyArrayObject*`, so clean this up. I noticed this while working on gh-28355, but it was also mentioned as an aside 7 years ago in gh-10977. [skip azp] [skip cirrus] [skip actions]

seberg · 2025-02-24T14:08:34Z

numpy/_core/src/multiarray/compiled_base.c

+        flags = NPY_ARRAY_WRITEABLE | NPY_ARRAY_ALIGNED | NPY_ARRAY_C_CONTIGUOUS;
+        if (PyArray_Size((PyObject *)list) &&


The deprecation makes this pretty messy, almost wondering if we should just finalize it...

In either case, I think the PyArray_Size() here ends up as an imprecise PyArray_Check() here (as it misses an empty uint8 array for example)?

I am not quite sure that not setting force-cast in the non-array case won't finalize the deprecation (in small parts).
Since both paths use PyArray_ContiguousFromAny, I think that is fine. Although, it may be that we are not relaxing allowing uint64 for an array-like here i.e. something like:

class a: def __array__(self): return np.ones(10, dtype=np.uint64)

seberg · 2025-02-24T14:10:26Z

numpy/_core/src/multiarray/compiled_base.c

    npy_intp *numbers, *ians, len, mx, mn, ans_size;
    npy_intp minlength = 0;
    npy_intp i;
+    int flags;


Would move this down into the if, but OK.

I think we can put this in, if we change things to a PyArray_Check below, unless I am missing something. It seems fine, even if it might not be 100% for array-likes.

I would lean towards not backporting. It has been around forever, and the reason it came up was just cupy testing against NumPy as far as I am aware.

* Based on reviewer feedback, narrow the scope of the `flags` variable in `arr_bincount()`. * Based on reviewer feedback, add an array-like test for the `uint64` casting issue, which indeed fails before and passes after adding a similar shim to the array-like code path in `arr_bincount()`. * Based on reviewer feedback, switch the patching from `PyArray_Size` to `PyArray_Check` in a few places.

tylerjereddy · 2025-02-28T00:31:47Z

@seberg I think I've addressed most of the comments in the latest commit (see commit message).

Covering the array-like case does add a bit of code duplication now I suppose. I didn't follow the empty uint8 worry, but switched to PyArray_Check in any case since I believe I was indeed guessing a bit with Size (now in two places, though didn't check we really need it in both).

Note that the test for array-like did fail before/pass after the additional coercion shim just added (I also noticed that our error message for passing in a raw array-like class, rather than instance, to bincount seems terrible, but that's probably another matter).

I don't think I followed the deprecation-related concern--I thought I was just coercing for integer type inputs and it was the non-integer cases that were deprecated. It is a bit confusing though, because the deprecation message itself seems simple while the commented discussion a bit higher up seems to obfuscate things...

seberg

Thanks, duplicating this seems indeed like the easiest path.

numpy/_core/src/multiarray/compiled_base.c

seberg · 2025-03-03T14:32:39Z

FWIW, I think this has been around forever and the issue was just a cupy test, I think. So I don't care much for backporting, but I also don't think there is a problem with that..

jakirkham · 2025-03-03T15:52:52Z

Thanks Tyler and Sebastian! 🙏

* BUG: safer bincount casting * Fixes numpy#28354 * Force usage of `npy_intp` type in `np.bincount()` and avoid unsafe casting errors with i.e., `npy_uint64`. This is similar to our behavior with indexing. * MAINT, BUG: PR 28355 revisions * `arr_bincount()` now only uses unsafe casting for integer input types, and the number of casting operations has been reduced for the code path used in above PR. * a test for non-crashing behavior with non-contiguous `bincount()` input has been added. * MAINT, BUG: PR 28355 revisions * Based on reviewer feedback, narrow the scope of the `flags` variable in `arr_bincount()`. * Based on reviewer feedback, add an array-like test for the `uint64` casting issue, which indeed fails before and passes after adding a similar shim to the array-like code path in `arr_bincount()`. * Based on reviewer feedback, switch the patching from `PyArray_Size` to `PyArray_Check` in a few places. * Update numpy/_core/src/multiarray/compiled_base.c --------- Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>

jakirkham · 2025-03-03T18:34:02Z

As 2.2.x looks pretty active, went ahead and created a backport to 2.2.x with PR: #28420

It looks like versions older than that are not seeing much activity. So didn't go further. Please let me know whether that would be worthwhile

tylerjereddy added 00 - Bug component: numpy._core labels Feb 19, 2025

charris added the 09 - Backport-Candidate PRs tagged should be backported label Feb 19, 2025

tylerjereddy force-pushed the treddy_issue_28354 branch from dca9fb4 to 8530c81 Compare February 19, 2025 21:27

tylerjereddy commented Feb 19, 2025

View reviewed changes

numpy/_core/src/multiarray/compiled_base.c Show resolved Hide resolved

tylerjereddy added 2 commits February 21, 2025 14:37

BUG: safer bincount casting

6948e3d

* Fixes numpy#28354 * Force usage of `npy_intp` type in `np.bincount()` and avoid unsafe casting errors with i.e., `npy_uint64`. This is similar to our behavior with indexing.

MAINT, BUG: PR 28355 revisions

f7046ed

* `arr_bincount()` now only uses unsafe casting for integer input types, and the number of casting operations has been reduced for the code path used in above PR. * a test for non-crashing behavior with non-contiguous `bincount()` input has been added.

tylerjereddy force-pushed the treddy_issue_28354 branch from 8530c81 to f7046ed Compare February 21, 2025 23:05

tylerjereddy mentioned this pull request Feb 21, 2025

DOC: fix C API docs for PyArray_Size #28382

Merged

seberg reviewed Feb 24, 2025

View reviewed changes

seberg approved these changes Feb 28, 2025

View reviewed changes

numpy/_core/src/multiarray/compiled_base.c Outdated Show resolved Hide resolved

Update numpy/_core/src/multiarray/compiled_base.c

29dbd5c

seberg merged commit d014c16 into numpy:main Mar 3, 2025
65 of 66 checks passed

tylerjereddy deleted the treddy_issue_28354 branch March 3, 2025 16:50

jakirkham mentioned this pull request Mar 3, 2025

BUG: safer bincount casting (backport to 2.2.x) #28420

Merged

charris removed the 09 - Backport-Candidate PRs tagged should be backported label Mar 3, 2025

seberg mentioned this pull request May 11, 2025

DOC: clarify int type constraint in numpy.bincount #23552

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

BUG: safer bincount casting#28355

BUG: safer bincount casting#28355
seberg merged 4 commits intonumpy:mainfrom
tylerjereddy:treddy_issue_28354

tylerjereddy commented Feb 19, 2025

Uh oh!

tylerjereddy commented Feb 19, 2025

Uh oh!

tylerjereddy commented Feb 19, 2025

Uh oh!

Uh oh!

tylerjereddy commented Feb 21, 2025

Uh oh!

seberg Feb 24, 2025

Uh oh!

seberg Feb 24, 2025

Uh oh!

tylerjereddy commented Feb 28, 2025 •

edited

Loading

Uh oh!

seberg left a comment

Uh oh!

Uh oh!

Uh oh!

seberg commented Mar 3, 2025

Uh oh!

jakirkham commented Mar 3, 2025

Uh oh!

jakirkham commented Mar 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		flags = NPY_ARRAY_WRITEABLE \| NPY_ARRAY_ALIGNED \| NPY_ARRAY_C_CONTIGUOUS;
		if (PyArray_Size((PyObject *)list) &&

Uh oh!

Conversation

tylerjereddy commented Feb 19, 2025

Uh oh!

tylerjereddy commented Feb 19, 2025

Uh oh!

tylerjereddy commented Feb 19, 2025

Uh oh!

Uh oh!

tylerjereddy commented Feb 21, 2025

Uh oh!

seberg Feb 24, 2025

Choose a reason for hiding this comment

Uh oh!

seberg Feb 24, 2025

Choose a reason for hiding this comment

Uh oh!

tylerjereddy commented Feb 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seberg left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

seberg commented Mar 3, 2025

Uh oh!

jakirkham commented Mar 3, 2025

Uh oh!

jakirkham commented Mar 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tylerjereddy commented Feb 28, 2025 •

edited

Loading