MAINT,ENH: Simplify CScalar handling and ready it for arbitrary dtypes#9503

Merged
leofang merged 14 commits into cupy:main from seberg:cscalar-cleanup
Dec 22, 2025
Conversation

@seberg
Member

@seberg seberg commented Nov 26, 2025

This tries to simplify the scalar handling. In part just for maintenance and a small speed boost, but largely to make it easier to support arbitrary dtypes in the scalar code-path (i.e. split out from the ml_dtypes draft PR).

The simplification ideas are:

  1. Rely on NumPy C-API to convert Python scalars to C data that we can use for kernel launches.
  2. Simplify handling by pushing NEP 50 "weak scalar" handling into the CScalar and removing the distinction between "numpy scalar/CScalar" path.
  3. Make CScalar simply hold the original scalar.

The other change is making NumPy a build-time dependency.

I'll add some comments in-line, so discussions are easier to focus on each topic.

@seberg seberg requested a review from a team as a code owner November 26, 2025 10:19

x = cupy.empty((16,), dtype=cupy.uint64)
x[:] = -1
x[:] = cupy.int64(-1) # wrap-around cast to uint
Member Author

Maybe clearer to comment here, but this is an actual behavior change. Previously, things like uint8_arr + (-1) (which now raises an error in NumPy) were checked in guess_routine; now the check happens one level lower.

That means that direct elementwise kernel calls now also see the new behavior. This -1 behavior especially can be tedious, though. It's certainly possible to restore the old behavior if we think it is likely to create hassle.
(Or move it later if we find it does in practice after a release.)

Member

@asi1024 would know this a lot better since he built the whole CuPy JIT machinery, but I think somewhere in the compiler we have a way to change how Python scalars should be interpreted inside a JIT kernel. Maybe it's this:

def get_ctype_from_scalar(mode: str, x: Any) -> _cuda_types.Scalar:
    if isinstance(x, numpy.generic):
        return _cuda_types.Scalar(x.dtype)
    if mode == 'numpy':
        if isinstance(x, bool):
            return _cuda_types.Scalar(numpy.bool_)
        if isinstance(x, int):
            return _cuda_types.Scalar(numpy.int64)
        if isinstance(x, float):
            return _cuda_types.Scalar(numpy.float64)
        if isinstance(x, complex):
            return _cuda_types.Scalar(numpy.complex128)
    if mode == 'cuda':
        if isinstance(x, bool):
            return _cuda_types.Scalar(numpy.bool_)
        if isinstance(x, int):
            if -(1 << 31) <= x < (1 << 31):
                return _cuda_types.Scalar(numpy.int32)
            return _cuda_types.Scalar(numpy.int64)
        if isinstance(x, float):
            return _cuda_types.Scalar(numpy.float32)
        if isinstance(x, complex):
            return _cuda_types.Scalar(numpy.complex64)
    raise NotImplementedError(f'{x} is not scalar object.')

My thinking is: Maybe there is a way to leave the CuPy JIT default unchanged by this PR?

Member Author

In that case it is unchanged, since the JIT kernel discovers a reasonable type here (int64), which is also the final kernel type.

The change only occurs for places where the kernel C-type is explicitly a uint or narrower int.
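For reference, the integer branch of the 'cuda' mode quoted above can be sketched on its own (the function name here is mine, for illustration):

```python
def cuda_int_dtype(x: int) -> str:
    # Mirrors the quoted 'cuda'-mode rule: Python ints fitting in 32 bits
    # lower to int32, anything larger to int64.
    return 'int32' if -(1 << 31) <= x < (1 << 31) else 'int64'

print(cuda_int_dtype(7), cuda_int_dtype(1 << 40))
```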

return descr->f;
}
#endif
"""
Member Author

I'll confirm that this works for NumPy 1.x locally. As mentioned, we can simplify this dance to just the cdef PyArray_Pack() if we hard-require NumPy 2 (i.e. with NPY_TARGET_VERSION=NPY_2_0_API_VERSION, NumPy will generate an error when importing with NumPy 1.x).

But I wasn't sure we should make that a hard requirement yet, and while you may have to be me to know that this is all fine, I am me :).

Member

Q: Remind me what's the conclusion here? That we still allow importing with 1.x, we just wouldn't claim full support for it?

Member Author

Ah, I never removed this :/! I think we said we don't need <2.0; it still felt a bit strange to me to enforce that strictly in this PR. But let me just do this.

It is easy to restore if there is any doubt in the end. (This makes things a lot cleaner; all we'll have left is the Cython define for PyArray_Pack.)
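For future reference, hard-requiring the NumPy 2 C-API at build time usually looks something like this (a sketch of the common pattern, not CuPy's exact sources):

```c
/* Define the API floor before including the NumPy headers.  With this
 * target version, NumPy refuses to import the built extension under 1.x,
 * and PyArray_Pack (public since NumPy 2.0) is available directly. */
#define NPY_TARGET_VERSION NPY_2_0_API_VERSION
#include <numpy/arrayobject.h>
```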

# NOTE(seberg): This uses assignment logic, which is very subtly
# different from casting by rejecting nan -> int. This is *only*
# relevant for `casting="unsafe"` passed to ufuncs with `dtype=`.
# It also means we fail for out of bound integers (NEP 50 change).
Member Author

Maybe to explain this: this uses the same logic as arr_with_dtype[0] = value, which is exceedingly subtly different from a cast.
That truly only matters for calls like cp.add(cp.float64(np.nan), 1, casting="unsafe", dtype=int), so I think it's safe to ignore :).
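The assignment-vs-cast difference can be demonstrated with plain NumPy:

```python
import numpy as np

out = np.empty(1, dtype=np.int64)
try:
    out[0] = np.float64(np.nan)  # assignment logic rejects NaN -> int
    nan_rejected = False
except ValueError:
    nan_rejected = True

with np.errstate(invalid='ignore'):
    casted = np.float64(np.nan).astype(np.int64)  # an unsafe cast goes through
print(nan_rejected, type(casted).__name__)  # (the cast's value is undefined)
```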

@asi1024 asi1024 self-assigned this Nov 28, 2025
@asi1024 asi1024 added the cat:performance (Performance in terms of speed or memory consumption) and prio:high labels Nov 28, 2025
@leofang
Member

leofang commented Dec 11, 2025

/test mini

@seberg
Member Author

seberg commented Dec 11, 2025

/test mini

Comment on lines +67 to +73
try:
    _scalar.get_typename(dtype)  # allow if we know a C typename.
except (ValueError, KeyError):
    if not error:
        return False
    else:
        raise ValueError(f'Unsupported dtype {dtype}') from None
Member

Should we return True when _scalar.get_typename(dtype) does not raise an exception?

Member Author

Yes, good catch. This was prep for ml_dtypes, for which the current code would be wrong; this way we can route them through get_typename() rather than special-casing them in many places.
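A hedged sketch of the corrected control flow, with a stub standing in for _scalar.get_typename (the mapping and function names are illustrative, not CuPy's actual table):

```python
import numpy as np

_TYPENAMES = {np.dtype(np.float32): 'float', np.dtype(np.int64): 'long long'}

def get_typename(dtype):
    # Stand-in for _scalar.get_typename: raises KeyError for unknown dtypes.
    return _TYPENAMES[np.dtype(dtype)]

def is_supported(dtype, error=False):
    # As discussed above: return True once a C typename is known.
    try:
        get_typename(dtype)
    except (ValueError, KeyError):
        if not error:
            return False
        raise ValueError(f'Unsupported dtype {dtype}') from None
    return True

print(is_supported(np.float32), is_supported(np.float16))
```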

Member

@leofang leofang left a comment

Thanks, @seberg! LGTM overall. Left some comments/questions.

@@ -1,5 +1,7 @@
cimport cython # NOQA

from . cimport _scalar
Member

nit: use absolute import



@leofang
Member

leofang commented Dec 20, 2025

/test mini

@leofang
Member

leofang commented Dec 21, 2025

LGTM! CI is green too. Though Kenichi-san mentioned there is a code freeze now. Let's merge it after the freeze and also give @asi1024 a bit more time in case he wants to chime in 🙂

@leofang
Member

leofang commented Dec 21, 2025

@seberg forgot to ask, is there a simple reproducer for us to check the perf difference before and after this PR? Would be nice to know the expected ballpark improvement.

@leofang leofang added this to the v14 milestone Dec 21, 2025
@leofang leofang added the to-be-backported (Pull-requests to be backported to stable branch) label Dec 22, 2025
@leofang
Member

leofang commented Dec 22, 2025

Let's merge it after the freeze and also give @asi1024 a bit more time in case he wants to chime in 🙂

Code freeze is lifted. Let me get this merged. @asi1024 please let us know if you have any concerns and we can follow up in a separate PR.

@seberg forgot to ask, is there a simple reproducer for us to check the perf difference before and after this PR? Would be nice to know the expected ballpark improvement.

Would still be nice to keep a record in this PR for future reference, in case people come and ask what drove the decision to make NumPy a build-time dependency.

@leofang leofang merged commit 4d9486d into cupy:main Dec 22, 2025
61 checks passed
chainer-ci pushed a commit to chainer-ci/cupy that referenced this pull request Dec 22, 2025
MAINT,ENH: Simplify `CScalar` handling and ready it for arbitrary dtypes
@seberg seberg deleted the cscalar-cleanup branch December 23, 2025 12:19
@leofang leofang modified the milestones: v14, v14.0.0, v15 Dec 24, 2025
@seberg
Member Author

seberg commented Jan 8, 2026

is there a simple reproducer for us to check the perf difference before and after this PR?

Not sure about a single reproducer, but things like cp.add(f, f) (with f = np.float32(1.)) are e.g. unchanged compared to v13 (cp.add(1, 1) is maybe very slightly faster).
I think it's basically a wash. The call into NumPy costs a few ns, but OTOH I optimized an allocation away, which may even save some time in the end.

In the grand scheme, the perf differences here are just very small, I think.
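For the record, a micro-benchmark along these lines (the setup is mine; NumPy is used here so the sketch runs anywhere, substitute cupy for the real measurement):

```python
import timeit
import numpy as np

f = np.float32(1.0)  # scalar argument exercising the scalar code path
n = 100_000
t = timeit.timeit(lambda: np.add(f, f), number=n)
print(f'np.add(f, f): {t / n * 1e9:.0f} ns/call')
```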
