AI-generated port of ml_dtypes to numpy 2. by copybara-service[bot] · Pull Request #360 · jax-ml/ml_dtypes

copybara-service · 2026-02-24T07:28:53Z

AI-generated port of ml_dtypes to numpy 2.

hawkinsp · 2026-02-24T08:02:34Z

I have no intention of submitting this as is without doing a bunch of manual work on it first, but it shows that we can port ml_dtypes to NumPy 2's APIs with NumPy 2.4. I have not even read these changes closely myself.

I note there were a couple of things I found that might need fixes on the numpy side:

It appears despite NumPy claiming in dtype_api.h that:

// Copyswap is disabled
// #define NPY_DT_PyArray_ArrFuncs_copyswapn 3 + _NPY_DT_ARRFUNCS_OFFSET
// #define NPY_DT_PyArray_ArrFuncs_copyswap 4 + _NPY_DT_ARRFUNCS_OFFSET

The copyswap functions are still called, and if you don't define them we end up crashing, at least under NumPy 2.4:

* thread #1, name = 'custom_float_te', stop reason = signal SIGSEGV: address not mapped to object (fault address=0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x00007ffff5653391 libthird_Uparty_Spy_Snumpy_Slibmultiarray.so`PyArray_Byteswap(self=0x000050533c6398f0, inplace='\x01') at methods.c:546:13
    frame #2: 0x00007ffff5654021 libthird_Uparty_Spy_Snumpy_Slibmultiarray.so`PyArray_Byteswap(self=0x000050533c639ad0, inplace='\0') at methods.c:570:15
    frame #3: 0x00007ffff565b092 libthird_Uparty_Spy_Snumpy_Slibmultiarray.so`array_byteswap(self=0x000050533c639ad0, args=0x0000555556b72f00, kwds=0x000050533cfc2400) at methods.c:587:12
    frame #4: 0x0000555555db65cc custom_float_test`method_vectorcall_VARARGS_KEYWORDS(func=0x000050533d78b330, args=0x000050533fc31878, nargsf=9223372036854775809, kwnames=0x000050533ceb3970) at descrobject.c:365:14
    frame #5: 0x0000555555d934e5 custom_float_test`_PyObject_VectorcallTstate(tstate=0x0000555556c188a8, callable=0x000050533d78b330, args=0x000050533fc31878, nargsf=9223372036854775809, kwnames=0x000050533ceb3970) at pycore_call.h:92:11
    frame #6: 0x0000555555d94f6a custom_float_test`PyObject_Vectorcall(callable=0x000050533d78b330, args=0x000050533fc31878, nargsf=9223372036854775809, kwnames=0x000050533ceb3970) at call.c:325:12
    frame #7: 0x00005555561be369 custom_float_test`_PyEval_EvalFrameDefault(tstate=0x0000555556c188a8, frame=0x000050533fc317f0, throwflag=0) at bytecodes.c:2715:19
    frame #8: 0x0000555556189546 custom_float_test`_PyEval_EvalFrame(tstate=0x0000555556c188a8, frame=0x000050533fc31628, throwflag=0) at pycore_ceval.h:89:16
    frame #9: 0x000055555618940b custom_float_test`_PyEval_Vector(tstate=0x0000555556c188a8, func=0x000050533dc9f880, locals=0x0000000000000000, args=0x000050533cafa380, argcount=1, kwnames=0x000050533fccbb80) at ceval.c:1685:12
    frame #10: 0x0000555555d95578 custom_float_test`_PyFunction_Vectorcall(func=0x000050533dc9f880, stack=0x000050533cafa380, nargsf=1, kwnames=0x000050533fccbb80) at call.c:419:16
    frame #11: 0x0000555555d9c065 custom_float_test`_PyObject_VectorcallTstate(tstate=0x0000555556c188a8, callable=0x000050533dc9f880, args=0x000050533cafa380, nargsf=1, kwnames=0x000050533fccbb80) at pycore_call.h:92:11
    frame #12: 0x0000555555d9a289 custom_float_test`method_vectorcall(method=0x000050533cfc2000, args=0x000050533cafa388, nargsf=9223372036854775808, kwnames=0x000050533fccbb80) at classobject.c:61:18
    frame #13: 0x0000555555d94ee7 custom_float_test`_PyVectorcall_Call(tstate=0x0000555556c188a8, func=(custom_float_test`method_vectorcall at classobject.c:44), callable=0x000050533cfc2000, tuple=0x0000555556b72f00, kwargs=0x000050533cfc1c40) at call.c:283:24

The AI cunningly did this to get around the problem, but NumPy needs to either allow copyswap and copyswapn to be provided or not call them.

#ifndef NPY_DT_PyArray_ArrFuncs_copyswapn
#define NPY_DT_PyArray_ArrFuncs_copyswapn (3 + (1 << 11))
#endif

#ifndef NPY_DT_PyArray_ArrFuncs_copyswap
#define NPY_DT_PyArray_ArrFuncs_copyswap (4 + (1 << 11))
#endif
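For context on what these two slots have to do, here is a minimal Python model of the copyswapn semantics that byteswap() relies on: copy the items and optionally reverse the bytes inside each one. The function name and signature are illustrative only. NumPy's real ArrFuncs entries are C functions that also take strides and an array pointer, and this sketch makes no claim about how ml_dtypes implements them.

```python
def copyswapn_model(src: bytes, itemsize: int, swap: bool) -> bytes:
    """Hypothetical pure-Python model of a copyswapn-style routine.

    Copies every `itemsize`-byte element of `src`; when `swap` is true,
    the bytes inside each element are reversed (an endianness flip).
    """
    if itemsize <= 0 or len(src) % itemsize != 0:
        raise ValueError("buffer length must be a multiple of itemsize")
    if not swap:
        return bytes(src)  # plain copy, no byte reordering
    out = bytearray(len(src))
    for start in range(0, len(src), itemsize):
        # Reverse the bytes of one element and store it at the same offset.
        out[start:start + itemsize] = src[start:start + itemsize][::-1]
    return bytes(out)


# Two little-endian 16-bit items: 0x3F80 and 0x4000.
little = bytes([0x80, 0x3F, 0x00, 0x40])
big = copyswapn_model(little, itemsize=2, swap=True)
```

Swapping twice returns the original buffer, which is the property byteswap() depends on.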

It seems that it's not possible to override the dot operator correctly for a new-style user dtype. Is that correct?

seberg · 2026-02-24T08:28:00Z

it seems that it's not possible to override the dot operator correctly for a new-style user dtype. Is that correct?

I think we may have to do the same cunning trick here, which is fine. I was a bit in the "add when needed" mode for ArrFuncs, because really we should solve all of these differently...
The one other thing that I am not sure about is that we might need the old getitem function in order to return a Python float/integer for .item().

Either way, as much as it isn't nice, I think when it comes to ArrFuncs it's very much workable to brutally monkey-patch it.

hawkinsp · 2026-02-25T10:54:56Z

it seems that it's not possible to override the dot operator correctly for a new-style user dtype. Is that correct?

I think we may have to do the same cunning trick here, which is fine. I was a bit in the "add when needed" mode for ArrFuncs, because really we should solve all of these differently... The one other thing that I am not sure about is that we might need the old getitem function in order to return a Python float/integer for .item().

Either way, as much as it isn't nice, I think when it comes to ArrFuncs it's very much workable to brutally monkey-patch it.

Can this be done? The failure is:

>     result = np.dot(x, y)
E     TypeError: This function currently only supports native NumPy dtypes and old-style user dtypes, but the dtype was bcomplex32.
E     (The function may need to be updated to support arbitraryuser dtypes.)

I think the call chain there is something like this: PyArray_InnerProduct calls PyArray_ObjectType on its arguments, which promptly dies here: https://github.com/numpy/numpy/blob/10e9faf1afbecca9316ce752c8a1dc8807137edb/numpy/_core/src/multiarray/convert_datatype.c#L1907

I don't think this can be worked around from the dtype? We didn't even make it as far as calling the dtype's code.
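To illustrate why nothing on the dtype side can help, here is a toy Python model of a helper that must return an integer type number. Everything in it (ToyDType, toy_object_type, the type numbers, the use of max()) is hypothetical and greatly simplified. It is not NumPy's implementation, only a sketch of the shape of the problem: the helper rejects a dtype that has no type number before any dtype-provided code runs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyDType:
    name: str
    type_num: Optional[int]  # None models a new-style dtype without a type number


def toy_object_type(dtypes):
    """Toy stand-in for a helper whose result is an integer type number."""
    best = None
    for dt in dtypes:
        if dt.type_num is None:
            # Fails here, before any dtype-specific hook could be consulted.
            raise TypeError(f"only legacy dtypes are supported, got {dt.name}")
        # max() is a placeholder for real promotion rules.
        best = dt.type_num if best is None else max(best, dt.type_num)
    return best


legacy = ToyDType("float64", 12)          # illustrative number only
new_style = ToyDType("bcomplex32", None)  # name taken from the error message above

try:
    toy_object_type([new_style, new_style])
    failed_early = False
except TypeError:
    failed_early = True
```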

PiperOrigin-RevId: 871352005

seberg · 2026-02-25T11:40:03Z

:(, I had somehow missed that it failed this early, thought it was later. Let me make sure to fix this for 2.5.
But the question is still whether we can "backport" this part (I really naively thought this would be more about the ArrFuncs, where I wouldn't have any qualms).

Monkey-patching away this particular failure is likely too crazy :(. I think we would have to:

keep a legacy num = PyArray_RegisterDType() around
descr = PyArray_DescrFromType(num). Then Py_SET_TYPE(descr, NewDType), and
NewDType.flags |= 1; NewDType.num = descr->num (tell NumPy this is a legacy dtype. That should be fine because it can be used like one -- i.e. it has a type number now).

This whole dance is basically just to create a type number, because I liked the idea of not needing type numbers. And if there was no history here, maybe all of this would be less of a deal (i.e. find a solution without assigning a type number), but porting things...

In theory one could monkey-patch the other way around, but that seems even less desirable. FWIW, I think we can add code to NumPy to do the above in a sane way (i.e. a single new flag or so that says "my dtype is legacy compatible and should get a type number").

hawkinsp · 2026-02-25T11:43:32Z

BTW, I just made one more change in this branch, which is to:

set the kind characters to their natural kinds (f instead of V).
set all the type descriptor characters to the same one, ?, rather than claiming a unique one at random for each type.

The tests all seem to pass, which is great!

I wonder however if that's the right thing to do or not. I wonder what if anything still cares about the type descriptor characters?

seberg · 2026-02-25T12:54:18Z

I wonder however if that's the right thing to do or not. I wonder what if anything still cares about the type descriptor characters?

Not much really. Some downstream projects could in theory use it as C-API, but I doubt it overall.

Things that might break, we should maybe open NumPy issues (I can do that):

You need to implement __hash__ for sure.
I am not sure about __reduce__
Hmmmm, one thing I now realize is that the __array_interface__ could be a problem. Exporting things as <f2 for bfloat16 which then round-trips incorrectly :(.
So it might be that we have to fix the __array_interface__ repr in NumPy 2.5 before we can actually change the kind character.
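The round-trip problem can be seen without NumPy at all. Below is a minimal sketch using only the standard library, assuming bfloat16 is the upper 16 bits of an IEEE float32. The helper names are made up for illustration. It takes the bit pattern of bfloat16 1.0 and reinterprets it as IEEE half precision, which is what a consumer of a buffer labelled <f2 would do.

```python
import struct


def bfloat16_bits(value: float) -> int:
    """Return the bfloat16 bit pattern of `value` by truncating a float32.

    Truncation (no rounding) is enough for values that are exactly
    representable in bfloat16, such as 1.0.
    """
    (f32_bits,) = struct.unpack("<I", struct.pack("<f", value))
    return f32_bits >> 16


def read_as_float16(bits: int) -> float:
    """Reinterpret a 16-bit pattern as IEEE half precision (struct code 'e')."""
    (half,) = struct.unpack("<e", struct.pack("<H", bits))
    return half


bits = bfloat16_bits(1.0)        # 0x3F80
misread = read_as_float16(bits)  # what an '<f2' consumer would see
```

Here 1.0 stored as bfloat16 comes back as 1.875 once the same bytes are decoded as float16, so the exported data silently changes value.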

I would be tempted to leave the character at \0, which is the default right now. Just don't use ? that is the character of a boolean :).

The tests all seem to pass, which is great!

But I guess dot() is still broken, then?

hawkinsp · 2026-02-25T13:02:17Z

I wonder however if that's the right thing to do or not. I wonder what if anything still cares about the type descriptor characters?

Not much really. Some downstream projects could in theory use it as C-API, but I doubt it overall.

Things that might break, we should maybe open NumPy issues (I can do that):

You need to implement __hash__ for sure.

I am not sure about __reduce__

Hmmmm, one thing I now realize is that the __array_interface__ could be a problem. Exporting things as <f2 for bfloat16 which then round-trips incorrectly :(.
So it might be that we have to fix the __array_interface__ repr in NumPy 2.5 before we can actually change the kind character.

I would be tempted to leave the character at \0, which is the default right now. Just don't use ? that is the character of a boolean :).

The tests all seem to pass, which is great!

But I guess dot() is still broken, then?

Yes. dot is broken, I'm just skipping that for now.

hawkinsp · 2026-02-25T15:29:15Z

I would be tempted to leave the character at \0, which is the default right now. Just don't use ? that is the character of a boolean :).

I did this and after a couple of fixes it works.

numpy/numpy#30879 seems necessary now.

MaanasArora · 2026-03-02T13:15:49Z

Sorry, coming into this a bit late! But for dot, backporting does seem hard, though maybe not impossible if we just special-case the common dtype code with a minimal something for user dtypes? It won't really be the right thing to do though I guess, as it's essentially introducing a rough version of a (missed) feature.

On the NumPy side, I dug through the code and think the 'decision' on the dtype is basically made just here:

https://github.com/numpy/numpy/blob/dd102ade8afbe0bf16870cca75fa391fe17cc634/numpy/_core/src/multiarray/multiarraymodule.c#L988-L997

(There is a small reference below, but there's no real logic as far as I could tell.) So there might not be as much to port, hopefully, and we could just do "if not-legacy dtype, trim op descr"? But constructing the descr might have quirks, so it may not be as minimal as it seems, especially for backporting... trying to look more into this.

copybara-service bot force-pushed the test_871352005 branch 4 times, most recently from 16ac2b3 to c6fbd5f Compare February 24, 2026 08:00

hawkinsp mentioned this pull request Feb 24, 2026

Change type chars and document complex #357

Open

copybara-service bot force-pushed the test_871352005 branch 3 times, most recently from 10ca2f2 to 4a875a0 Compare February 25, 2026 10:31

copybara-service bot force-pushed the test_871352005 branch from 4a875a0 to 7c2ea5d Compare February 25, 2026 11:11

AI-generated port of ml_dtypes to numpy 2.

974cb8c

PiperOrigin-RevId: 871352005

copybara-service bot force-pushed the test_871352005 branch from 7c2ea5d to 974cb8c Compare February 25, 2026 11:36

hawkinsp mentioned this pull request Feb 25, 2026

ENH: Test ._is_numeric not .char in np.testing.assert_equal numpy/numpy#30879

Merged

MaanasArora mentioned this pull request Mar 4, 2026

BUG: Fix np.dot to allow user dtypes numpy/numpy#30931

Merged

charris mentioned this pull request Mar 7, 2026

ENH: Test .kind not .char in np.testing.assert_equal (#30879) numpy/numpy#30955

Merged

copybara-service[bot] wants to merge 1 commit into main from test_871352005
