fix: allow numpy integers and 0-dim arrays through `shape_item_as_index` to support numpy 2.3 by lgray · Pull Request #3531 · scikit-hep/awkward

lgray · 2025-06-07T18:46:47Z

fixes #3530 (but we'll see what the tests say)

lgray · 2025-06-07T19:13:14Z

Though now this technically violates the typing of ShapeItem, but the awkward authors should debate that and figure out what they want to do.

src/awkward/_nplikes/array_module.py

ikrommyd · 2025-06-07T21:25:09Z

I think the typing of ShapeItem atm is not perfect because we allow numpy ints in the shape of virtual arrays. The length of a virtual array is often the last element of some offsets so it's offsets[-1]. Since numpy 2, offsets[-1] is a numpy int and not a python int so there is a case to be made that the shapeitem should be updated to support np.integer indeed.

Edit: actually...I think we do length = int(length) for this case so that the shape tuple contains only python ints but now that I'm thinking this...That's probably bad for virtual arrays on GPU right? Cupy errors in this case I think and tells you to use .get() to get the integer on the cpu.

pfackeldey

I think I'd prefer:

import operator 
from awkward._regularize import is_integer_like

class ArrayModuleNumpyLike:
    ...
    def shape_item_as_index(self, x1: ShapeItem) -> int:
        if x1 is unknown_length:
            raise TypeError("array module nplikes do not support unknown lengths")
        elif is_integer_like(x1):
            return operator.index(x1)
        else:
            raise TypeError(f"expected None or int type, received {x1}")

This should be working automatically for all relevant types.

This change (likely) needs to be propagated to the TypeTracer nplike as well: https://github.com/scikit-hep/awkward/blob/main/src/awkward/_nplikes/typetracer.py#L1040-L1046 to:

import operator 
from awkward._regularize import is_integer_like

class TypeTracer:
    ...
    def shape_item_as_index(self, x1: ShapeItem) -> IndexType:
        if x1 is unknown_length:
            return TypeTracerArray._new(np.dtype(np.int64), shape=())
        elif is_integer_like(x1):
            return operator.index(x1)
        else:
            raise TypeError(f"expected unknown_length or int type, received {x1}")

Could you give this a try?

ikrommyd · 2025-06-08T05:05:43Z

Just for context, this happens because in numpy 2.3 np.count_nonzero returns a numpy integer unlike numpy 2.2

In [2]: np.count_nonzero([1,2,3])
Out[2]: np.int64(3)

In [2]: np.count_nonzero([1,2,3])
Out[2]: 3

If the point of ShapeItem is to express "things that can live in the shape property" then maybe it should be only python built in integer and unknown_lengthand then it would be awkward's job to make sure no np.integer is passed inside the shape_item_as_index function. Numpy and other nplikes still return builtin python integers with their .shape property all the time right? I haven't seen an exception to this so far. However, I think this doesn't play nice with the CUDA backend.

Maybe ShapeItem typing should stay as is but there should be another function that checks whether the input "is integer like"? Maybe not? Upgrading ShapeItem and shape_item_as_index to handle np.integer is always an option.

In the case of this error, shape_item_as_index is used to make sure the input can go into nplike.arange and that it is not unknown_length if the backend is not typetracer:

awkward/src/awkward/contents/indexedoptionarray.py

Lines 1730 to 1734 in 8e2d4a8

    
               nplike.arange( 
        
                   nplike.shape_item_as_index(new_index.size - num_none), 
        
                   dtype=self._index.dtype, 
        
               ) 
        
           )

ikrommyd · 2025-06-08T06:06:46Z

@pfackeldey operator.index doesn't work with cupy

In [15]: x = cupy.array([1,2,3])

In [16]: operator.index(x[2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 1
----> 1 operator.index(x[2])

TypeError: 'ndarray' object cannot be interpreted as an integer

int does though but I assume there is some GPU -> RAM casting here

In [17]: x[2]
Out[17]: array(3)

In [18]: int(x[2])
Out[18]: 3

lgray · 2025-06-08T19:32:02Z

Yeah getting that to not have to spend time copying a single int back to host would be nice. That's relatively extremely expensive.

pfackeldey · 2025-06-09T15:26:02Z

@pfackeldey operator.index doesn't work with cupy

In [15]: x = cupy.array([1,2,3])

In [16]: operator.index(x[2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 1
----> 1 operator.index(x[2])

TypeError: 'ndarray' object cannot be interpreted as an integer

int does though but I assume there is some GPU -> RAM casting here

In [17]: x[2]
Out[17]: array(3)

In [18]: int(x[2])
Out[18]: 3

Then, there should be a different dedicated shape_item_as_index implementation for the CuPy nplike. The implementations that I proposed for CPU and TypeTracer (should also work for JAX) are the most general and safe ways I can think of to implement this.

ianna

@lgray - thanks for fixing it. I agree that there might be other issues with migration to Numpy 2.3, however, let's have a reproducer for the upcoming issues first. We can refactor and generalize then.

lgray · 2025-06-10T15:47:09Z

While I'm not sure this fix is appropriate, it does fix this specific problem.
At least patching it atop my work area for fastjet.

I'll get to making a reproducer today so this can be more correctly managed.

ikrommyd · 2025-06-10T21:40:39Z

@lgray @pfackeldey As discussed offline, I've pushed the fix that covers python ints, numpy ints, dim-0 arrays and should work with cupy as well.

@lgray could you check that this makes the fastjet test pass?
@ianna could you please re-review?

This is a quick fix for numpy 2.3 but we should rethink what a ShapeItem is and what shape_item_as_index is meant to do because it feels like the names are a bit miss-leading given what they are meant to do in the actual code.

ikrommyd · 2025-06-10T21:46:16Z

@ianna I could add a test as well if you'd like? Or we can wait to revisit shape item and test then? Here is the simplest reproducer of the issue with numpy 2.3

import awkward as ak
import numpy as np

content = ak.contents.NumpyArray(np.array([0.0, 1.1, 2.2, 3.3, 4.4]))
index = ak.index.Index64(np.array([2, 2, 0, -1, 4], dtype=np.int64))
array = ak.contents.IndexedOptionArray(index, content)
array.to_packed()

ikrommyd · 2025-06-10T23:12:30Z

The failures are one single test coming from the latest numexpr release. It's coming from this line:

awkward/src/awkward/_connect/numexpr.py

Line 82 in 8e2d4a8

if expr_key not in numexpr.necompiler._names_cache:

because numexpr.necompiler._names_cache is no longer a CacheDict but a threading.local() due to this commit: pydata/numexpr@33ee71b

I'll think tomorrow probably how we can deal with this.

ianna · 2025-06-11T10:59:05Z

@lgray @pfackeldey As discussed offline, I've pushed the fix that covers python ints, numpy ints, dim-0 arrays and should work with cupy as well.

@lgray could you check that this makes the fastjet test pass? @ianna could you please re-review?

This is a quick fix for numpy 2.3 but we should rethink what a ShapeItem is and what shape_item_as_index is meant to do because it feels like the names are a bit miss-leading given what they are meant to do in the actual code.

Hey — since this one’s already greenlit, let’s keep it as-is and handle any tweaks in a follow-up. Keeps things tidy for the history. Thanks a ton!

ianna

If this is meant to be a real fix, it needs a reproducer, tests, and GPU profiling — else let’s roll with the approved quick fix.

ikrommyd · 2025-06-11T12:06:13Z

@ianna which version are you talking about? The current version of the PR is the appropriate quick fix that works with cuda and the typetracer backend. The original version that you green-lighted won't work with those two backends that's why we had a very short offline chat with Lindsey and Peter and put the current version of the PR as the "quick fix".

The “lengthier” fix would be implementing a separate function for the cupy backend and in general re thinking a bit how shape item works. I’m happy to put a test in this one. Without this fix, there are about 400 test failures though with numpy 2.3 they get noticed in the other NumExpr PR I have up.

ianna · 2025-06-11T13:37:36Z

@ianna which version are you talking about? The current version of the PR is the appropriate quick fix that works with cuda and the typetracer backend. The original version that you green-lighted won't work with those two backends that's why we had a very short offline chat with Lindsey and Peter and put the current version of the PR as the "quick fix".

The “lengthier” fix would be implementing a separate function for the cupy backend and in general re thinking a bit how shape item works. I’m happy to put a test in this one. Without this fix, there are about 400 test failures though with numpy 2.3 they get noticed in the other NumExpr PR I have up.

Thanks for the context — but to be clear, the initial quick fix addressed a specific, documented problem that was described in an issue. This new 'improved fix' currently doesn’t have an associated issue or a reproducer for the backend-specific problems it’s trying to address.

If there’s a deeper problem affecting CUDA and Typetracer, it really needs to be properly tracked in a new issue, with reproducers attached. Otherwise we risk merging untracked changes that solve symptoms without addressing or even documenting the underlying cause.

I’d prefer we hold off merging this version until that’s in place.

ikrommyd · 2025-06-11T14:13:13Z

Well for the cpu backend, I have a reproducer already which I can add as a test. For cuda, it's not tested in ci but I can create a reproducer which shows why the "original" fix fails and you can try it offline. For the typetracer, I'm not 100% sure if we can end up with this failure in actual code, but since we're allowing "integer like" objects in this function in the cpu backend, we might as well allow them in the typetracer case.

ikrommyd · 2025-06-11T16:19:28Z

After I tracked down the use cases of shape_item_as_index, it turns out we don't need to change the function but only a single usage of it. If y'all agree we can close this PR cause the most concise fix for numpy 2.3 support is in #3534.

pfackeldey · 2025-06-11T16:35:05Z

Superseded by #3534

check if shape item is numpy int dtype as well

149b8ec

lgray changed the title ~~check if shape item is numpy int dtype as well~~ fix: check if shape item is numpy int dtype as well Jun 7, 2025

lgray temporarily deployed to docs June 7, 2025 18:59 — with GitHub Actions Inactive

lgray requested a review from pfackeldey June 7, 2025 19:04

ikrommyd reviewed Jun 7, 2025

View reviewed changes

src/awkward/_nplikes/array_module.py Show resolved Hide resolved

pfackeldey requested changes Jun 8, 2025

View reviewed changes

ianna approved these changes Jun 10, 2025

View reviewed changes

use is_integer_like and also fix the typetracer case

26900fc

ikrommyd changed the title ~~fix: check if shape item is numpy int dtype as well~~ fix: allow numpy integers and 0-dim arrays through shape_item_as_index Jun 10, 2025

ikrommyd requested a review from ianna June 10, 2025 21:40

pfackeldey approved these changes Jun 10, 2025

View reviewed changes

ikrommyd changed the title ~~fix: allow numpy integers and 0-dim arrays through shape_item_as_index~~ fix: allow numpy integers and 0-dim arrays through shape_item_as_index to support numpy 2.3 Jun 10, 2025

ikrommyd temporarily deployed to docs June 10, 2025 21:56 — with GitHub Actions Inactive

ikrommyd mentioned this pull request Jun 11, 2025

fix: make awkward compatible with numexpr 2.11 #3533

Merged

pin numexpr to make tests pass in this one

752bd48

ikrommyd deployed to docs June 11, 2025 11:05 — with GitHub Actions View deployment

ianna requested changes Jun 11, 2025

View reviewed changes

ikrommyd mentioned this pull request Jun 11, 2025

fix: numpy 2.3 compatibility #3534

Merged

pfackeldey closed this Jun 11, 2025

ikrommyd deleted the fix_numpy2.3_expected_int branch June 11, 2025 16:38

Conversation

lgray commented Jun 7, 2025

Uh oh!

lgray commented Jun 7, 2025

Uh oh!

Uh oh!

ikrommyd commented Jun 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pfackeldey left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ikrommyd commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ikrommyd commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lgray commented Jun 8, 2025

Uh oh!

pfackeldey commented Jun 9, 2025

Uh oh!

ianna left a comment

Choose a reason for hiding this comment

Uh oh!

lgray commented Jun 10, 2025

Uh oh!

ikrommyd commented Jun 10, 2025

Uh oh!

ikrommyd commented Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ikrommyd commented Jun 10, 2025

Uh oh!

ianna commented Jun 11, 2025

Uh oh!

ianna left a comment

Choose a reason for hiding this comment

Uh oh!

ikrommyd commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ianna commented Jun 11, 2025

Uh oh!

ikrommyd commented Jun 11, 2025

Uh oh!

ikrommyd commented Jun 11, 2025

Uh oh!

pfackeldey commented Jun 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ikrommyd commented Jun 7, 2025 •

edited

Loading

pfackeldey left a comment •

edited

Loading

ikrommyd commented Jun 8, 2025 •

edited

Loading

ikrommyd commented Jun 8, 2025 •

edited

Loading

ikrommyd commented Jun 10, 2025 •

edited

Loading

ikrommyd commented Jun 11, 2025 •

edited

Loading