Skip to content

ENH: Add internal real/imag ufuncs and use them for the attrs#30984

Merged
seberg merged 18 commits into
numpy:mainfrom
seberg:complex-attrs
Mar 29, 2026
Merged

ENH: Add internal real/imag ufuncs and use them for the attrs#30984
seberg merged 18 commits into
numpy:mainfrom
seberg:complex-attrs

Conversation

@seberg

@seberg seberg commented Mar 9, 2026

Copy link
Copy Markdown
Member

This is a work in progress, although already functional

I am trying to figure out how to restructure .imag and .real to:

  • Allow complex dtypes not defined inside NumPy
  • Ideally, not hard-code that they must be views
    • Fix object dtype behavior (a bit?) by not returning views.
    • It was also brought up by Warren.
  • Define one part of a future finfo (the ability to fetch the corresponding real type)
  • (while not breaking current strange behavior)

The approach here is to define internal imag and real ufuncs, plus we need a few small tweaks to allow those ufuncs to return actual views.

One wrinkle is that arr.imag = 0 works and right now is a bit hacked, because I wasn't sure I like a set_imag ufunc (but maybe that is the better solution?).
There is also a wrinkle that if arr.real/arr.imag still need weird special handling for backcompat since a ufunc really can't return a broadcasted array of zeros.

(The code looks wrong, but it is mainly boilerplate ufunc code+docs.)

Generally, I think this is a reasonable approach to make it a ufuncs, although there may be some details to hash out. Ping @ngoldbaum and @mhvk in case you have some thoughts already.

EDIT: Closes gh-1740

EDIT: Should have mentioned that I tried to use an agent to write some of the ufunc code, but it didn't do a job I liked, so there is practically nothing left of it.

Admittedly, this has a wrinkle in that I don't feel like adding a
`set_imag` ufunc as well or maybe we should?

This also fixes `object` arrays `.real` and `.imag`, although fixing
means trying to fetch the attribute (which is also not perfect, if
better? E.g. what if that is a method...)

It still has an awkward fallback: If there is *no* loop defined for
real/complex we assume it is a real value.  This is not good, but
I think the solution here is to come from the other side and identify
`ComplexDType` a bit clearer.

Why a ufunc?  Well, it allows fixing the object path here fairly
well by allowing to write a different function. It would also work
for polar coordinates.

@mhvk mhvk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg - I like the idea of the ufuncs, but two worries.

  1. doing .real and .imag happens a lot and is time-critical, did you check the performance? (e.g., most of my code dealing with complex numbers has def power(z): return z.real**2+z.imag**2). I fear you may have to stick with having the standard view taking as the default...
  2. I am less enchanted by the "magic" that now sometimes outputs can be views. I think these functions should behave like all ufuncs, e.g.,np.positive, and thus return a copy by default.

On (2), of course we do want to allow views and avoid actual work in that case, but could we introduce an actual signal that that is wanted (and that would thus work for np.positive as well)? At the risk of repurposing too much, maybe casting='view' or giving out=... an extra meaning, that a view is OK?

value = *reinterpret_cast<real_type *>(in);
}
else {
value = *reinterpret_cast<real_type *>(in + sizeof(real_type));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition should be done outside the loop (even if we do not exercise it!).

@mhvk

mhvk commented Mar 9, 2026

Copy link
Copy Markdown
Contributor

p.s. The broadcasted zero makes me wonder whether a ufunc is in fact the right option...

@seberg

This comment was marked as outdated.

@seberg

seberg commented Mar 10, 2026

Copy link
Copy Markdown
Member Author

OK, I did the iteration of moving it onto to the DType itself:

  • Registration now via ".imag" and ".real" similar to the sorts. (I forced the sorts to match "numpy:sort" exactly, just for simplicity.)
  • Removed the UFunc handling for views, but tweaked the promotion very slightly in order to allow a promoter that adds loops to the ufunc from the DType slot.
    (I always meant promoters being able to do this, but without this, the recursion guard might stop that from happening.)
    • I.e. the object path still calls into the UFunc, but the UFunc doesn't "own" the loops (sure, you could add additional loops to it that wouldn't affect .real and .imag).
  • The actual attributes, now access the resolve_descriptors directly. It is 20% slower (40ns -> 49ns) for me. But that is a very small thing (it might also go away if looking at things a bit closer).
    EDIT: Actually, at least practically all of this is attributable to the view creation itself taking longer on current main.

Again, still needs tests, but maybe this is the right approach. Maybe the ufuncs don't even need to exist, but it is a fairly convenient way for the new object path (in principle, it would be nice to call the ufunc core directly here, but that requires more refactor).

Still needs tests at least and clearly fixes!

EDIT: Still needs at least release notes (it is quite substantial with object arrays). But if we think this is a reasonable path, then I think it is far along now.

@seberg seberg added 01 - Enhancement 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes labels Mar 10, 2026
@seberg seberg marked this pull request as ready for review March 10, 2026 19:52
@MaanasArora

MaanasArora commented Mar 10, 2026

Copy link
Copy Markdown
Contributor

I went over this very briefly and it looks quite nice! The user dtype API in particular benefits from having ufuncs anyway, I think. Sorts and argsorts are similar insofar they're inplace, so having these too on the dtype fits well!

I'm still wrapping my head around this a bit, but yes, putting it on the dtype made it nicer as the ufuncs have less "responsibility" for the view path now. I think it is fine for the ufunc to return views even if atypical to start with; this is mostly internal after all.

About a signal that a view is wanted, casting="view" seems like a natural extension. But it is not likely to be many ufuncs that typically return views so maybe we can get away with them being a bit special (at least in the short run)?

Edit: Going to take a deeper look soon!

@seberg seberg changed the title WIP,ENH: Add internal real/imag ufuncs and use them for the attrs ENH: Add internal real/imag ufuncs and use them for the attrs Mar 11, 2026
@seberg seberg removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Mar 11, 2026

@mhvk mhvk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg - had another bit of a review now.

Some comments in-line, all small except perhaps that it may make sense to use the names "real" and "imag" directly, linking them to the existing np.real and np.imag functions -- after all, they will be using the ufuncs internally... With that, it would also not be necessary to change how the function comparison is done (i.e., "numpy:real" will work fine).

Comment thread doc/source/reference/c-api/array.rst Outdated
entry points, ``(module ':')? (object '.')* name``, with ``numpy``
the default module. Examples: ``sin``, ``strings.str_len``,
``numpy.strings:str_len``.
``"sort"``, ``"argsort"``, ``".real"``, ``".imag"`` and specific names

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bigger: should this just be real and imag? This since we do have np.real and np.imag functions, and presumably those are affected as well. (Indeed, I think those in principle could just be replaced by the new ufuncs, no? In which case there would be no need to mention this here...)

Smaller: did you mean to write "are" instead of "and"? Though perhaps write instead,

Note that some names do not correspond directly to ufuncs names, but
instead to functions that use ufuncs internally:
``"sort"``, ``"argsort"``, ``".real"``, ``".imag"`` 

Comment thread doc/source/reference/c-api/array.rst Outdated
Comment thread doc/source/reference/c-api/array.rst Outdated
Comment thread numpy/_core/code_generators/ufunc_docstrings.py Outdated
Comment thread numpy/_core/code_generators/ufunc_docstrings.py Outdated
{"_core._multiarray_umath.add", &add_spec},
{"numpy:sort", &sort_spec},
{"numpy._core.fromnumeric:argsort", &argsort_spec},
// These names must match exactly right now (not ufuncs)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true? I had done the above as a sanity check on our interpretation of names... (but it is fine if an exact match is required!).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I changed it for the ones not backed by ufuncs (as such!). (Same as above, I think it's fine to be strict for these...)

NPY_NO_EXPORT int
PyUFunc_AddLoopsFromSpecs(PyUFunc_LoopSlot *slots)
{
if (npy_cache_import_runtime(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to link real and imag to the corresponding functions, we could keep the old scheme (which I slightly prefer, but only slightly).

Comment thread numpy/_core/src/umath/dispatching.cpp Outdated
Comment thread numpy/_core/src/umath/real_imag_ufuncs.cpp Outdated
Comment thread numpy/lib/_type_check_impl.py Outdated
>>> a = np.array([2j, "a"], dtype=np.str_)
>>> np.isreal(a) # Warns about non-elementwise comparison
False
>>> np.isreal(a) # Compares as strings currently

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused what "Compares as strings currently" means - what would give rise to True?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, that can't be understood. np.imag(a) is the empty string "" and it then does "" == 0 which is False.
That doens't make sense... But it didn't change here.

(Just N.B. the big change here is as the object dtype behavior of returning a copy and trying to use getattr(obj, "real", obj); getattr(obj, "imag", 0) -- which is a decent guess, but of course a guess.)

@seberg

seberg commented Mar 17, 2026

Copy link
Copy Markdown
Member Author

Maybe to summarize:

  • I am still slightly in favor for just keeping this not-quite-ufuncs special. Maybe that is silly, but I see little reason to allowing the wider change (and who knows, maybe it even makes sense to implement numpy:imag, so that the ufunc works, but arr.imag does not)?!
    But happy to not use .imag, since it is probably just confusing.
    (But also not a strong opinion either way, in part, I also just didn't like to import more and more non-ufuncs there.)
  • As long as you don't think there is a major oddity around this fun "internal ufunc to implement object path mostly" setup, I think the main question to just consider once is whether the new object behavior makes sense (as the release note points out). I.e.:
    • getattr(obj, "real", obj)
    • getattr(obj, "imag", 0)
    • Of course neither works if real/imag are methods, or... And returning a write-only copy, could break funnier code :( that works with real object arrays.

@seberg

seberg commented Mar 25, 2026

Copy link
Copy Markdown
Member Author

@mhvk do you have any other worries here? As I said, I have a slight preference to just keep the strict string comparison for now (this is new and we can always broaden it up).
But otherwise if we are OK with the overall idea and the new behavior for objects, I think this should be good.

@seberg seberg added this to the 2.5.0 Release milestone Mar 25, 2026

@ngoldbaum ngoldbaum left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm experimenting with Claude Code to help me with code review. It spotted some issues below, only one of them is an outright bug (setting a struct field via a pointer before the error check for the pointer being NULL).

It also pointed out that Marten's comment about performance wasn't addressed. In particular:

calling resolve_descriptors + PyArray_NewFromDescr_int on a hot path is heavier than the old direct PyArray_NewFromDescrAndBase call. Checking whether .real/.imag result in a view (as opposed to calling the ufunc) already happens via view_offset, but the descriptor resolution overhead itself may be avoidable for the built-in complex dtypes by short-circuiting or caching.

Sorry that this was mostly me copy/pasting from Claude code. I'm pretty sure what it's saying about these topics makes sense but please let me know if this is annoying and you'd prefer I not do this.

Comment thread numpy/_core/src/multiarray/convert_datatype.c Outdated
Comment thread numpy/_core/src/umath/dispatching.cpp Outdated
// Hardcode slot names for attributes and non-ufuncs stored on the DType
// (Also avoids circular imports a bit.)
if (strcmp(slot->name, "real") == 0) {
Py_XSETREF(ufunc, npy_interned_str.real);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this pattern here and below is correct because np_interned_str.real is a borrowed reference. The next time through the loop, you'd incorrectly decref the borrowed reference. You need to INCREF first to create an owned reference. But also on Python 3.12+ interned strings are immortal so this won't ever cause any issues.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point. I'll just add the NEWREF, of course others might argue it is just as well to not and rely on it being immortal, but it doesn't really matter.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point. I'll just add the Py_NewRef, of course others might argue it is just as well to not and rely on it being immortal, but it doesn't really matter.

PyArray_Descr *loop_descrs[2] = {NULL, NULL};
npy_intp view_offset = NPY_MIN_INTP;
int res = meth->method->resolve_descriptors(
meth->method, meth->dtypes, descrs, loop_descrs, &view_offset);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you INCREF and DECREF the objects in loop_descrs below - but couldn't they be NULL, in principle?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they can only be NULL on error.

Comment thread numpy/_core/src/multiarray/dtypemeta.h Outdated
Comment thread numpy/_core/_add_newdocs.py Outdated
seberg and others added 2 commits March 25, 2026 16:38
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
@seberg

seberg commented Mar 25, 2026

Copy link
Copy Markdown
Member Author

Sorry that this was mostly me copy/pasting from Claude code. I'm pretty sure what it's saying about these topics makes sense but please let me know if this is annoying and you'd prefer I not do this.

I don't mind, but it didn't read the history: The slowdown is tiny and actually overshadowed by another, probably also meaningless, slowdown for creating the new view.
I don't think fast-paths are worth it here, this is very fast either way.

@mhvk mhvk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg - mostly (very) small comments left. One bigger one, though only a suggestion: instead of templates, might it be better to use static_data to decide between real and imag in the loops? The code is essentially the same, so that avoids doubling the memory use (for very little performance cost).

Comment thread doc/release/upcoming_changes/30984.capi.rst Outdated
Comment thread doc/source/reference/c-api/array.rst Outdated
Comment thread numpy/_core/code_generators/ufunc_docstrings.py Outdated
Comment thread numpy/_core/code_generators/ufunc_docstrings.py Outdated
Comment thread numpy/_core/src/umath/dispatching.cpp


/* We shouldn't normally use it, but define a simple loop anyway. */
template <typename real_type, bool real_part>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A disadvantage of templates might be that code gets duplicated a lot. In stringdtype_ufuncs.cpp, we use the array method static data to select between very similar codes (e.g., startswith_endswith). It may be worth doing the same here for real_part (possibly also for the promotor? for the loop at least, the tiny extra amount of time needed would seem easily outweighed by the extra memory used by the template).

@seberg seberg Mar 28, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll skip this... frankly the difference is 504 bytes, so a bloat of 0.005%. And that is ignoring the fact that the compiler should probably duplicate the loop to optimize it.

(The total binary increase is certainly more here, but just using the same wrong version for both gives that.)

}


template <auto component>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here again, I would suggest to pass in the component via static_data

@mhvk mhvk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, all fine now!

@seberg

seberg commented Mar 29, 2026

Copy link
Copy Markdown
Member Author

OK, I'll put this in then, thanks for the reviews. Not too happy that I need to pass the private flag to make it fully reliable for free-threading, but it is a different issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

.real doesn't work on object arrays (Trac #1142)

4 participants