ENH: Prepare umath module for the C++ by seiko2plus · Pull Request #19753 · numpy/numpy

seiko2plus · 2021-08-25T15:22:18Z

merge after #23298

This work aims to greatly reduce reliance on the C API, in order to facilitate the complete elimination of C code in the future and count on modern C++.

This request includes multiple patches, the main purpose of which is to impose a framework that facilitates the creation of universal functions using the C++ language without the need of use python code as generator.

Two examples have been added to show how to use it. You can explore them by reviewing ufunc_operations.cpp

TODO:

~~wrapping PyObject and PyArray~~ Adding Array to manage the array data only
~~tweak the C++ wrapper for universal functions(add Doxygen comment blocks, add static assertation)~~
will be included by ENH, SIMD: Add C++ wrapper for universal intrinsics #21057
add C++ types for float16, datetime, timedelta for type safety and overloading.
~~add testing unit for the C++ wrapper~~
add more examples

seiko2plus · 2021-11-10T18:50:58Z

There's a necessity to provide a light C++ wrapper for PyObject and PyArray, similar to Boost.Python and pybind11 but without supporting C++ exceptions to fit our approach(google style). I'm going to push the local changes once I get done with these wrappers.

The work has been scaled down and focused only on universal functions, so please disregard this comment.

seiko2plus · 2023-05-02T19:00:02Z

@mattip, @seberg, I'm done here and its ready for reviewing.

seberg · 2023-05-02T19:28:11Z

I am not sure I will make the meeting tomorrow but maybe you can organize a meeting together with @mattip some time at the end of the week if not? I need to let this sink, on first sight it looks like quite a bit new things compared to last time I had a brief look (introducing Slice and unwrapping the old API into your new one, and especially since we currently have yet another layer since that old API is not sustainable forever, that seems all not ideal to me TBH).

I guess you want that C++ style for loop. Not sure if I think it is worth a lot (but I am not big into C++), but maybe 1-3 lines of boilerplate, rather than another indirection and complicated wrapping layer would do the trick as well?

Anyway, have to look at it closer and discuss a bit and then think a bit more before I can make a deeper comments.

seberg · 2023-05-04T12:14:39Z

numpy/core/src/umath/ufunc_object.hpp

+                )...
+            );
+            std::apply(reinterpret_cast<TFunc>(data), params);
+        };


So, I think my main worry is this part. We already have two loop versions:

The legacy that is unwrapped here.

A new loop signature, which is largely the same (but adds more info and an error return). This is not accessible through PyUFunc_FromFuncAndDataAndSignatureAndIdentity, but we should aim to replace it sooner rather than later internally.

This signature is very much open to change, but I would like to be sure that updating existing code is easy eventually.

We don't need a "legacy" loop, we only have to support it when downstream gives us one.

This new-loop can be specialized for specific strides (in most ufunc calls) for speed, so that selection of SIMD can happen outside of the inner-loop itself.

Since you use the static void *data, the compiler cannot possibly optimize this away either. The actual (new) ufunc signature currently doesn't even include static data (but we can add that back in some form, it is likely useful also for generic heavy functions, where we currently just pass a function pointer also).

Note that one thing I would like mid-term, is to move the current if (stride[0] == 0), etc. branches out of the inner-loop.
The machinery exists, but we have to start using it. This refactor should help with that since it should make our ufunc generation just a bit less of a mess hopefully, and to get there we should update most/all ufunc signatures to the new-style and adapt the ufunc generation a fair bit.

I guess you want C++ iterators and I expect they are really helpful and maybe necessary for later SIMD machinery improvements since those make iteration more painful.

But I do need to ask what the alternatives are? Would it be really be terrible to build an iterator (or multiple iterators) inside the inner-loop with a one-line per-argument boilerplate? That way you dog-feed the C-signature explicitly but still iterate in whatever way we like. Also, I do think it would make it much clearer to have both code styles side-by-side without getting super confused. And we will always have both side-by-side, because the C-signature is still our public API (unless we have the guts to change that, which may be a path, but probably a more difficult and slower one than I would like right now).

One more question I have but not for Sayed explicitly. I suspected we would aim to just remove the Python generation completely, even if we end up with a rather long C++ file (that could be generated via an adapted old file once for reference).

But to me, that would be another advantage of starting of with not modifying the signature at all, because then we can do the two steps separately. (Yes, this requires a bit of double work, but on the plus side you can do it in chunks.)

A new loop signature, which is largely the same (but adds more info and an error return).

PyArrayMethod_Context, and int rather then void as return, well not sure how useful PyArrayMethod_Context is but it can be supported through C++ also new C++ signature can support either bool or int for handling errors even if I think that testing PyErr_Occurred() is quite enough or maybe we can enable C++ exception in some stages.

This is not accessible through PyUFunc_FromFuncAndDataAndSignatureAndIdentity, but we should aim to replace it sooner rather than later internally.

If we are moving to C++ then what the point of continue enhancing C interface? Building a new interface in C++ to count more on static optimization should significantly speed up the internal calls.

This signature is very much open to change, but I would like to be sure that updating existing code is easy eventually.

The current approach provides the best start for the highest layer but it costs extra later for sure we can enhance it later at least by cutting off the legacy layer.

This new-loop can be specialized for specific strides (in most ufunc calls) for speed, so that selection of SIMD can happen outside of the inner-loop itself.

I'm not sure how it's going to be useful for SIMD kernels, we usually support both contiguous and non-contiguous memory access as long as there's no memory overlapping. Specialize within the inner loop is much easier and can save binary size too, For example, we can easily implement a C++ meta class or function that takes two operators one for scalar operation and another for SIMD intrinsic then it's generating all routined branches including SIMD vector memory access, unrolling, reduction and scalar fallback.

Since you use the static void *data, the compiler cannot possibly optimize this away either.

I don't think it's necessary to support it for C++, there're better patterns.

Note that one thing I would like mid-term, is to move the current if (stride[0] == 0), etc. branches out of the inner-loop.

If counting on meta functions to generate these branches wouldn't be robust then we can discuss moving to specialize these branches through UFunc API, after this pr gets merged I will provide real examples in this matter.

I guess you want C++ iterators and I expect they are really helpful and maybe necessary for later SIMD machinery improvements since those make iteration more painful.

It is also important for type safety.

But I do need to ask what the alternatives are?
That way you dog-feed the C-signature explicitly but still iterate in whatever way we like.

Avoid using C++ functions pointers and wraps the C UFunc API through meta programming, this solution is zero-cost overhead but unfortunately not that flexible and increases the complexity of the code especially when exporting internal symbols for CPU runtime dispatching.

And we will always have both side-by-side, because the C-signature is still our public API (unless we have the guts to change that, which may be a path, but probably a more difficult and slower one than I would like right now).

I agree but nothing is perfect, maybe provides inline functions that can serve both directions when it's needs until C++ dominates.

suspected we would aim to just remove the Python generation

Yes, but we also have to get rid of C template sources too. all at once seems more convenient to me. The new signature works only internally and can be improved later at any time.

I made changes to the code, to remove the runtime overhead and pass the inner loops as objects rather than function pointers (fully meta) please take a look at commit 1969494 to realize the differences:

However, once we deal with runtime dispatch or any exported symbols the overhead will come back again. I still prefer the previous method.

Another suggestion to C++ signature is to change the output indicator from reference to non-const:

-(Slice<T> in0, Slice<T> in1, Slice<T> &out) +(const Slice<T> in0, const Slice<T> in1, Slice<T> out)

The number of inputs and outputs are deduced from the signature rather than specifying it manually in case we decided later to support non-static signatures similar to function overloading in C++.

I won't have time to look at this for a few weeks. But since Marten commented: Yes, we should revive this.

FWIW, I am still a bit unsure about the right choice of inner-loop. Being more used to C++ these days helps, though.

My main question I guess remains a bit: I think you wrote this under the assumption to make the full thing C++. I lack a bit of clarity that this is a good choice, since we must keep a first class public C-API.

I suppose I can rephrase this into: Please assume that you will never be able to remove the step to translate the current C call into a C++ internal call. Would you reconsider this API choice? If not, then probably fine. But if yes, what would it look like?
(Even if it is just something creating a MultiIter<...>, whether more complicated or not to do the C/C++ layer conversion.)

(I would have been very happy, and still would be happy, to tweak the C signature. But moving where the C/C++ interface remains a big unclear thing to me that I don't know if it could or should be done. If others are more clear on it, that would be fine...)

eric-wieser · 2023-05-04T13:28:22Z

numpy/core/src/common/npstd.hpp

+using Bool = std::conditional_t<sizeof(bool) == sizeof(unsigned char),
+    bool, Holder<unsigned char, kBool>>;
+/// single-precision floating-point type, compatible with IEEE-754.
+using Float = float;


Why is Holder used for long but not float?

Theoretically, only the following data types are required to be distinguished:

using UByte = Holder<unsigned char, kUByte>; using Bool = std::conditional_t<sizeof(bool) == sizeof(unsigned char), bool, Holder<unsigned char, kBool>>; using LongDouble = std::conditional_t<kLongDoubleIsDouble, Holder<double, kLongDouble>, long double>; using CLongDouble = std::conditional_t<kLongDoubleIsDouble, Holder<complex<double>, kCLongDouble>, complex<long double>>;

To forbid the use of built-in datatypes directly, using long instead of Long should break the build when kTypeID is involved:

https://github.com/numpy/numpy/blob/2beb25fa4e90923e7b260408bf45b81fe238cf44/numpy/core/src/common/npstd.hpp#L174C4-L176

Allowing raw integer built-in data types isn't type-safe when it comes to determining the data type IDs (template specialization), since they are commonly used as alias almost for all the generic std alais including fixed width data types.

eric-wieser · 2023-05-04T13:31:53Z

numpy/core/src/common/npstd.hpp

- * This type will be set to `void` when `npy_longdouble` is not defined
- * as `long double`.
+ * Note: this class doesn't handel reference counting
+ * via RAII or raise any kind of exceptions.


Why not? It seems a bit of a trap at the moment, because Object(1); leaks a reference, but Object(some_obj) doesn't.

Supporting RAII should clean up the code, I'm going to support it. My reason at the moment is that we disable C++ exceptions by default which means there's no support for STD runtime including memory operators e.g. new/delete.

mattip · 2023-05-18T07:33:05Z

This work aims to greatly reduce reliance on the C API, in order to facilitate the complete elimination of C code in the future and count on modern C++.

Is this a goal we have agreed upon and appears in a NEP somewhere? I think some of the design principles of NumPy rely on C and the Python C-API, but we should have a discussion around this. We need to maintain much of the C-API, which parts are we going to remove?

seiko2plus · 2023-05-19T23:17:38Z

Is this a goal we have agreed upon and appears in a NEP somewhere?

No, but this is a defacto once you count on C++ you will keep reducing the reliance on C code, over time C API will become just a wrapper for the C++ code.

We need to maintain much of the C-API

what needs is just avoid breaking the compatibility.

which parts are we going to remove?

In general, any new code should be written in C++ and should run semi-standalone. The rule is simple, just avoid exporting C symbols from C++ code as far as we can.

We need to maintain much of the C-API, which parts are we going to remove?

The current step is to reimplement the current inner loops of umath in fully C++ without exporting any symbols to C.

And the step after (not necessary in the short term) would be re-implement the current UFunc API in C++ and provides a wrapper for C without breaking computability. The purpose mainly provides static optimizations, but the C API wouldn't get benefits from this change only from the C++ side.

…array

mhvk

I went through all the commits mostly out of curiosity, and because with a first taste of C++ in the FFT routines, I thought I might as well try to see how things could be done better. As a results, my comments are probably mostly rather useless (and most are about spelling of comments anyway).

Anyway, mostly just wanted to comment that this looks really nice!

But maybe my general impression is not: overall, this looked really nice! I think I would have understood what was happening here far faster than the time it took to understand the .c.src files. And of course nice not to rely on pre-processing. It also seemed the real tricky parts would be nicely hidden from someone just trying to code up a new ufunc.

I guess the main surprise was to see the inner loop functions replaced by a template with a operator(). Is that just to show how that would look? Is there a big advantage to one or the other? FWIW, to me, the indirection made things less clear, though I think in the end all that is really required is a nice worked example of how to do things.

numpy/_core/src/common/slice.hpp

mhvk · 2024-02-14T01:33:18Z

numpy/_core/src/umath/ufunc_object.hpp

+ *     }
+ * }
+ *
+ * template<typename T>


This is really nice!

Yes but then I changed it to meta objects rather than function pointers due to @seberg concerns, I kept the doc as-is since we still haven't decided to favor one another.

Function pointers:

template<typename T> void Add(np::Slice<T> in0, np::Slice<T> in1, np::Slice<T> &out) {...} UFuncObject add_obj( "add", "this doc of add function", Add<Float>, Add<Double> );

Meta objects:

template <typename T> struct AddOP { void operator ()(Slice<T> in0, Slice<T> in1, Slice<T> &out) {...} }; UFuncObject<AddOP<Float>, AddOP<Double>> add_obj( "add", "this doc of add function" );

note that lambda can be supported too and also both styles above at the same time, however for simplicity of code I left only one.

It would seem best to pick just one. I tried to read the discussion with Stefan more on top, but it left me unclear what the advantage of function pointer vs meta object might be. Is it that one could quite easily use meta objects to wrap old-style ufunc loops?

Anyway, I think both styles will work, and I really like how either allows one to focus on what the loop does rather than on how the iteration works.

I think the point is the meta object should be able to optimize things away (so that the pointer arrays don't need to be copied around on every call). But I am not sure if I understood it right in the call.
This was indeed my main concern: That we for-ever keep translating different function signature styles. If this is all optimized away, then that doesn't matter much.

the advantage of function pointer vs meta object might be

Passing the function pointer as Sebastian explained will add an extra call for each time the inner loop is executed since the C++ function pointer will be stored on C ufunc data arbitrarily while the current approach (Meta-object function) is statically wrapping the C legacy function which allows the compiler to optimize it out.

Choosing function pointers provides more flexibility when it comes to sharing functions among different translation units which I would personally prefer also it improves the readability a bit, later on, we can cut extra calls by letting C++ take the lead over C code and initialize its own PyType_Slot.

Anyway, I'm still happy with meta-objects if this going to resolve this discussion and we can later discuss moving to C++ function pointers once we get rid of template sources and the umath python generator.

Ah, thanks, both, that really helps!

mhvk · 2024-02-14T01:36:06Z

numpy/_core/src/umath/ufunc_object.hpp

+        constexpr size_t idx = Index_<TFunc>(
+            std::make_index_sequence<kNumOverloads>{}
+        );
+        printf("%d\n", idx);


debug statement left in by mistake?

mhvk · 2024-02-14T01:37:50Z

numpy/_core/src/umath/ufunc_object.hpp

+    );
+    static_assert(
+        ((ufunc_helper::FuncDeduce<TFuncOverloads>::kNumOut == kNumOut_) && ...),
+        "Overloaded functions must have the same number of output,"


output -> outputs

mhvk · 2024-02-14T01:38:41Z

numpy/_core/src/umath/ufunc_object.hpp

+    );
+    static_assert(
+        (ufunc_helper::FuncDeduce<TFuncOverloads>::kIsSlice && ...),
+        "Only np::Slice<T> is supported as parmater type at current moment"


at current moment -> at the moment (or at present)

mhvk · 2024-02-14T01:51:51Z

numpy/_core/src/common/npstd.hpp

+  */
+class Object {
+  public:
+    /// Constract from PyObject.


Construct (also below).

mhvk · 2024-02-14T01:58:47Z

numpy/_core/src/umath/ufunc_operations.cpp

+    PyDict_SetItemString(mdict, "_ones_like", ones_like_obj);
+    Py_DECREF(ones_like_obj);
+
+    static UFuncObject copysign_obj(


This really does prove the point how nice this is!

mhvk · 2024-02-14T02:00:11Z

numpy/_core/src/umath/umathmodule.c

    if (InitOperators(d) < 0) {
        return -1;
    }
+    if (InitOperations(d) < 0) {


Maybe rename to InitOperatorsCpp or so - that will help people looking at this understand why there are two similarly named functions (and find the relevant file where they are defined more easily).

mhvk · 2024-02-14T02:04:23Z

numpy/_core/src/umath/ufunc_object.hpp

-        ((std::is_function_v<std::remove_pointer_t<TFuncOverloads>>) && ...),
-        "Expected function pointers"
-    );
+    // static_assert(


Commented out because not sure what the best way is?

mhvk · 2024-02-14T02:05:17Z

numpy/_core/src/umath/ufunc_operations.cpp

-        }
-        else {
-            a = 1;
+struct OnesLikeOP {


Bit unclear, at least just following the commits, why this indirection was made. I liked the first iteration better in that one is just writing an inner loop function, which seems more natural. But I guess this where documentation would come in...

I'm also curious 😸 a meta object of templated functions doesn't seem functionally different from the struct variant, which probably means I'm missing something obvious.

mhvk · 2024-02-14T02:16:35Z

p.s. Over in the FFT routines, we had to deal with exceptions because they could be raised in pocketfft and having seen that, it did make me wonder whether it isn't better and just do that, rather than stick to how things are handled in C. But almost certainly much better not gotten into for this PR!

EDIT: now read a bit of the google style guide and the arguments against exceptions are fairly persuasive... Anyway, definitely ignore this comment for this PR!

This another method for defining the inner loops of UFunc, Fully meta and doesn't add any overheads as long as there's no exported symbols involved e.g. runtime dispatching

jorenham · 2026-01-08T14:21:08Z

@seiko2plus any chance this could be revived? Otherwise we should probably close it.

github-actions bot added the 01 - Enhancement label Aug 25, 2021

seiko2plus force-pushed the cpp_and_umath branch 8 times, most recently from b314ac3 to 3edb342 Compare August 25, 2021 21:44

seiko2plus mentioned this pull request Aug 25, 2021

MAINT: Replace numpy custom generation engine by raw C++ #19713

Merged

rgommers added the C++ Related to introduction and use of C++ in the NumPy code base label Oct 27, 2021

seiko2plus mentioned this pull request Nov 10, 2021

BLD: do not position 'cxx=-std=c++11' as a default compiler flag #20344

Merged

seiko2plus marked this pull request as draft November 10, 2021 18:46

seiko2plus mentioned this pull request Nov 13, 2021

BLD: Verify the ability to compile C++ sources before initiating the build #20354

Merged

seiko2plus mentioned this pull request Dec 4, 2021

ENH: Generate the docstrings of umath into a separate C header #20515

Merged

seiko2plus force-pushed the cpp_and_umath branch 6 times, most recently from a53f2a8 to 37f9bd7 Compare March 16, 2023 16:41

seiko2plus force-pushed the cpp_and_umath branch 8 times, most recently from 471fb0c to bf2c4aa Compare March 28, 2023 20:30

seiko2plus marked this pull request as ready for review May 2, 2023 18:58

seiko2plus force-pushed the cpp_and_umath branch from 9a25ad0 to 2beb25f Compare May 2, 2023 19:06

seberg reviewed May 4, 2023

View reviewed changes

eric-wieser reviewed May 4, 2023

View reviewed changes

seiko2plus force-pushed the cpp_and_umath branch from 2beb25f to a2ba033 Compare May 19, 2023 22:54

seiko2plus force-pushed the cpp_and_umath branch from a2ba033 to 069ed49 Compare May 19, 2023 23:26

Mousius self-assigned this Feb 12, 2024

seiko2plus added 4 commits February 14, 2024 02:48

ENH: Implement np::Slice represents a one-dimensional subset of an …

2a7ef21

…array

ENH: supports copysign for Half

049a754

ENH: expose new C++ headers

0e20dc4

ENH: implement the new template class np::UFuncObject

04117e4

seiko2plus force-pushed the cpp_and_umath branch from 069ed49 to ecfd9f9 Compare February 14, 2024 00:51

mhvk reviewed Feb 14, 2024

View reviewed changes

seiko2plus added 5 commits February 18, 2024 03:16

ENH: Covers most of C API data types

fc1f569

ENH: reimplement copysign/ones_like using C++

0da4f0d

ENH: Use object function rather function pointers

68cfe70

This another method for defining the inner loops of UFunc, Fully meta and doesn't add any overheads as long as there's no exported symbols involved e.g. runtime dispatching

Fix copysign/oneslike doc string

0000ad8

Rename the main umath C++ function initlizer to InitCPPOperations

2574dd1

seiko2plus force-pushed the cpp_and_umath branch from ecfd9f9 to 2574dd1 Compare February 18, 2024 01:20

matthewfeickert mentioned this pull request Feb 19, 2024

simple index suddenly stopped working for scipy-openblas32, scipy-openblas64 scientific-python/upload-nightly-action#65

Closed

mhvk mentioned this pull request Mar 18, 2024

BUG: fix logic error in stringdtype maximum/minimum ufunc #26071

Merged

ngoldbaum mentioned this pull request Jun 17, 2024

ENH: C++ namespace and function overrides for npy_math #26729

Open

jorenham added the 57 - Close? Issues which may be closable unless discussion continued label Jan 8, 2026

Uh oh!

Conversation

seiko2plus commented Aug 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seiko2plus commented Nov 10, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seiko2plus commented May 2, 2023

Uh oh!

seberg commented May 2, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mattip commented May 18, 2023

Uh oh!

seiko2plus commented May 19, 2023

Uh oh!

mhvk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mhvk commented Feb 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jorenham commented Jan 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

seiko2plus commented Aug 25, 2021 •

edited

Loading

seiko2plus commented Nov 10, 2021 •

edited

Loading

mhvk commented Feb 14, 2024 •

edited

Loading