Skip to content

fix: allow numpy integers and 0-dim arrays through shape_item_as_index to support numpy 2.3#3531

Closed
lgray wants to merge 3 commits intomainfrom
fix_numpy2.3_expected_int
Closed

fix: allow numpy integers and 0-dim arrays through shape_item_as_index to support numpy 2.3#3531
lgray wants to merge 3 commits intomainfrom
fix_numpy2.3_expected_int

Conversation

@lgray
Copy link
Copy Markdown
Collaborator

@lgray lgray commented Jun 7, 2025

fixes #3530 (but we'll see what the tests say)

@lgray lgray changed the title check if shape item is numpy int dtype as well fix: check if shape item is numpy int dtype as well Jun 7, 2025
@lgray lgray requested a review from pfackeldey June 7, 2025 19:04
@lgray
Copy link
Copy Markdown
Collaborator Author

lgray commented Jun 7, 2025

Though now this technically violates the typing of ShapeItem, but the awkward authors should debate that and figure out what they want to do.

@ikrommyd
Copy link
Copy Markdown
Collaborator

ikrommyd commented Jun 7, 2025

I think the typing of ShapeItem atm is not perfect because we allow numpy ints in the shape of virtual arrays. The length of a virtual array is often the last element of some offsets so it's offsets[-1]. Since numpy 2, offsets[-1] is a numpy int and not a python int so there is a case to be made that the shapeitem should be updated to support np.integer indeed.

Edit: actually...I think we do length = int(length) for this case so that the shape tuple contains only python ints but now that I'm thinking this...That's probably bad for virtual arrays on GPU right? Cupy errors in this case I think and tells you to use .get() to get the integer on the cpu.

Copy link
Copy Markdown
Collaborator

@pfackeldey pfackeldey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer:

import operator 
from awkward._regularize import is_integer_like

class ArrayModuleNumpyLike:
    ...
    def shape_item_as_index(self, x1: ShapeItem) -> int:
        if x1 is unknown_length:
            raise TypeError("array module nplikes do not support unknown lengths")
        elif is_integer_like(x1):
            return operator.index(x1)
        else:
            raise TypeError(f"expected None or int type, received {x1}")

This should be working automatically for all relevant types.

This change (likely) needs to be propagated to the TypeTracer nplike as well: https://github.com/scikit-hep/awkward/blob/main/src/awkward/_nplikes/typetracer.py#L1040-L1046 to:

import operator 
from awkward._regularize import is_integer_like

class TypeTracer:
    ...
    def shape_item_as_index(self, x1: ShapeItem) -> IndexType:
        if x1 is unknown_length:
            return TypeTracerArray._new(np.dtype(np.int64), shape=())
        elif is_integer_like(x1):
            return operator.index(x1)
        else:
            raise TypeError(f"expected unknown_length or int type, received {x1}")

Could you give this a try?

@ikrommyd
Copy link
Copy Markdown
Collaborator

ikrommyd commented Jun 8, 2025

Just for context, this happens because in numpy 2.3 np.count_nonzero returns a numpy integer unlike numpy 2.2

In [2]: np.count_nonzero([1,2,3])
Out[2]: np.int64(3)
In [2]: np.count_nonzero([1,2,3])
Out[2]: 3

If the point of ShapeItem is to express "things that can live in the shape property" then maybe it should be only python built in integer and unknown_lengthand then it would be awkward's job to make sure no np.integer is passed inside the shape_item_as_index function. Numpy and other nplikes still return builtin python integers with their .shape property all the time right? I haven't seen an exception to this so far. However, I think this doesn't play nice with the CUDA backend.

Maybe ShapeItem typing should stay as is but there should be another function that checks whether the input "is integer like"? Maybe not? Upgrading ShapeItem and shape_item_as_index to handle np.integer is always an option.

In the case of this error, shape_item_as_index is used to make sure the input can go into nplike.arange and that it is not unknown_length if the backend is not typetracer:

nplike.arange(
nplike.shape_item_as_index(new_index.size - num_none),
dtype=self._index.dtype,
)
)

@ikrommyd
Copy link
Copy Markdown
Collaborator

ikrommyd commented Jun 8, 2025

@pfackeldey operator.index doesn't work with cupy

In [15]: x = cupy.array([1,2,3])

In [16]: operator.index(x[2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 1
----> 1 operator.index(x[2])

TypeError: 'ndarray' object cannot be interpreted as an integer

int does though but I assume there is some GPU -> RAM casting here

In [17]: x[2]
Out[17]: array(3)

In [18]: int(x[2])
Out[18]: 3

@lgray
Copy link
Copy Markdown
Collaborator Author

lgray commented Jun 8, 2025

Yeah getting that to not have to spend time copying a single int back to host would be nice. That's relatively extremely expensive.

@pfackeldey
Copy link
Copy Markdown
Collaborator

@pfackeldey operator.index doesn't work with cupy

In [15]: x = cupy.array([1,2,3])

In [16]: operator.index(x[2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 1
----> 1 operator.index(x[2])

TypeError: 'ndarray' object cannot be interpreted as an integer

int does though but I assume there is some GPU -> RAM casting here

In [17]: x[2]
Out[17]: array(3)

In [18]: int(x[2])
Out[18]: 3

Then, there should be a different dedicated shape_item_as_index implementation for the CuPy nplike. The implementations that I proposed for CPU and TypeTracer (should also work for JAX) are the most general and safe ways I can think of to implement this.

Copy link
Copy Markdown
Member

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgray - thanks for fixing it. I agree that there might be other issues with migration to Numpy 2.3, however, let's have a reproducer for the upcoming issues first. We can refactor and generalize then.

@lgray
Copy link
Copy Markdown
Collaborator Author

lgray commented Jun 10, 2025

While I'm not sure this fix is appropriate, it does fix this specific problem.
At least patching it atop my work area for fastjet.

I'll get to making a reproducer today so this can be more correctly managed.

@ikrommyd ikrommyd changed the title fix: check if shape item is numpy int dtype as well fix: allow numpy integers and 0-dim arrays through shape_item_as_index Jun 10, 2025
@ikrommyd
Copy link
Copy Markdown
Collaborator

@lgray @pfackeldey As discussed offline, I've pushed the fix that covers python ints, numpy ints, dim-0 arrays and should work with cupy as well.

@lgray could you check that this makes the fastjet test pass?
@ianna could you please re-review?

This is a quick fix for numpy 2.3 but we should rethink what a ShapeItem is and what shape_item_as_index is meant to do because it feels like the names are a bit miss-leading given what they are meant to do in the actual code.

@ikrommyd ikrommyd requested a review from ianna June 10, 2025 21:40
@ikrommyd
Copy link
Copy Markdown
Collaborator

ikrommyd commented Jun 10, 2025

@ianna I could add a test as well if you'd like? Or we can wait to revisit shape item and test then? Here is the simplest reproducer of the issue with numpy 2.3

import awkward as ak
import numpy as np

content = ak.contents.NumpyArray(np.array([0.0, 1.1, 2.2, 3.3, 4.4]))
index = ak.index.Index64(np.array([2, 2, 0, -1, 4], dtype=np.int64))
array = ak.contents.IndexedOptionArray(index, content)
array.to_packed()

@ikrommyd ikrommyd changed the title fix: allow numpy integers and 0-dim arrays through shape_item_as_index fix: allow numpy integers and 0-dim arrays through shape_item_as_index to support numpy 2.3 Jun 10, 2025
@ikrommyd
Copy link
Copy Markdown
Collaborator

The failures are one single test coming from the latest numexpr release. It's coming from this line:

if expr_key not in numexpr.necompiler._names_cache:
because numexpr.necompiler._names_cache is no longer a CacheDict but a threading.local() due to this commit: pydata/numexpr@33ee71b

I'll think tomorrow probably how we can deal with this.

@ianna
Copy link
Copy Markdown
Member

ianna commented Jun 11, 2025

@lgray @pfackeldey As discussed offline, I've pushed the fix that covers python ints, numpy ints, dim-0 arrays and should work with cupy as well.

@lgray could you check that this makes the fastjet test pass? @ianna could you please re-review?

This is a quick fix for numpy 2.3 but we should rethink what a ShapeItem is and what shape_item_as_index is meant to do because it feels like the names are a bit miss-leading given what they are meant to do in the actual code.

Hey — since this one’s already greenlit, let’s keep it as-is and handle any tweaks in a follow-up. Keeps things tidy for the history. Thanks a ton!

Copy link
Copy Markdown
Member

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is meant to be a real fix, it needs a reproducer, tests, and GPU profiling — else let’s roll with the approved quick fix.

@ikrommyd
Copy link
Copy Markdown
Collaborator

ikrommyd commented Jun 11, 2025

@ianna which version are you talking about? The current version of the PR is the appropriate quick fix that works with cuda and the typetracer backend. The original version that you green-lighted won't work with those two backends that's why we had a very short offline chat with Lindsey and Peter and put the current version of the PR as the "quick fix".

The “lengthier” fix would be implementing a separate function for the cupy backend and in general re thinking a bit how shape item works. I’m happy to put a test in this one. Without this fix, there are about 400 test failures though with numpy 2.3 they get noticed in the other NumExpr PR I have up.

@ianna
Copy link
Copy Markdown
Member

ianna commented Jun 11, 2025

@ianna which version are you talking about? The current version of the PR is the appropriate quick fix that works with cuda and the typetracer backend. The original version that you green-lighted won't work with those two backends that's why we had a very short offline chat with Lindsey and Peter and put the current version of the PR as the "quick fix".

The “lengthier” fix would be implementing a separate function for the cupy backend and in general re thinking a bit how shape item works. I’m happy to put a test in this one. Without this fix, there are about 400 test failures though with numpy 2.3 they get noticed in the other NumExpr PR I have up.

Thanks for the context — but to be clear, the initial quick fix addressed a specific, documented problem that was described in an issue. This new 'improved fix' currently doesn’t have an associated issue or a reproducer for the backend-specific problems it’s trying to address.

If there’s a deeper problem affecting CUDA and Typetracer, it really needs to be properly tracked in a new issue, with reproducers attached. Otherwise we risk merging untracked changes that solve symptoms without addressing or even documenting the underlying cause.

I’d prefer we hold off merging this version until that’s in place.

@ikrommyd
Copy link
Copy Markdown
Collaborator

Well for the cpu backend, I have a reproducer already which I can add as a test. For cuda, it's not tested in ci but I can create a reproducer which shows why the "original" fix fails and you can try it offline. For the typetracer, I'm not 100% sure if we can end up with this failure in actual code, but since we're allowing "integer like" objects in this function in the cpu backend, we might as well allow them in the typetracer case.

@ikrommyd
Copy link
Copy Markdown
Collaborator

After I tracked down the use cases of shape_item_as_index, it turns out we don't need to change the function but only a single usage of it. If y'all agree we can close this PR cause the most concise fix for numpy 2.3 support is in #3534.

@pfackeldey
Copy link
Copy Markdown
Collaborator

Superseded by #3534

@pfackeldey pfackeldey closed this Jun 11, 2025
@ikrommyd ikrommyd deleted the fix_numpy2.3_expected_int branch June 11, 2025 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support NumPy 2.3

4 participants