ENH, API: Add `sort_compare` slot to DType and use sort wrappers if provided #29987

MaanasArora · 2025-10-16T03:47:58Z

This follows #29900 and #29737 to add a sort_compare slot to the DType, which can be used to enable sorting for user dtypes without completely redefining the sort. Also uses it in StringDType to allocate the lock just once and thus fixes #26510.

Also, there are now wrappers for the existing sorts, so it is probably easier to move to the array method API later. However doing so would require templating the SIMD sorts and doing other infrastructure gymnastics, so not done here. This does not implement descending, which can actually also be done as a follow-up if we keep the API the same and just flip the comparison twice.

This still needs release notes and docs but wanted to push this before to clarify direction. Unfortunately, it is quite verbose, especially for StringDType. ping @seberg @mhvk @ngoldbaum if you're interested - thanks!

MaanasArora · 2025-10-16T03:49:34Z

numpy/_core/src/multiarray/array_method.c

+                //         "must provide a `resolve_descriptors` function if any "
+                //         "output DType is parametric. (method: %s)",
+                //         spec->name);
+                // return -1;


This should not be merged!!! I noticed we cannot actually not pass resolve_descriptors in most cases, at least if the dtype is parametric (since it's passed as output)! Should we special case somehow?

MaanasArora · 2025-10-16T03:53:47Z

numpy/_core/src/multiarray/stringdtype/dtype.c

+    }
+    NPY_DT_SLOTS(stringdtype)->argsort_meth = argsort_method->method;
+    Py_INCREF(argsort_method->method);
+    Py_DECREF(argsort_method);


This slot is already assigned - we probably need to decref the existing method, but I wonder if we should also check in PyUFunc_AddLoopsFromSpecs? Though one doesn't usually define both the compare and the sort...

…methods

mhvk

@MaanasArora - I just had a quick glance, to see how this would look. Overall, it makes sense and looks good. Let me know when it is ready for a more detailed review.

…ction

MaanasArora · 2025-11-11T04:10:56Z

Thanks for reviewing @mhvk! I've added docs and done some cleanup - I think this should be ready now for more detailed review.

I suppose a main part of the API that needs discussion is how to implement descending sorts (with NaNs to the end):

We could do the 'double flip' idea that @charris suggested, i.e. reverse array once before sorting and once after. In practice, if I'm understanding correctly, this can simply mean writing new descending sort loops that are identical except they do cmp(b, a, arr) > 0 instead of cmp(a, b, arr) < 0 (swap a, b, and sign). In this can we can simply pass the descriptor and don't need to change the API.
We could pass the PyArrayMethod_Context into the sort_compare rather than the descriptor, and let the function be a 'should swap' function. This is probably more flexible if we need to add more features to the slot or if we want custom descending behavior (which we probably don't?).

I've implemented this PR such that we could do the first in a follow-up (i.e. I use the descr), but happy to change direction if we prefer the second.

charris · 2025-11-12T03:36:54Z

reverse array once before sorting and once after.

That was to make stable descending sorts, but it doesn't fix the NaN problem, i.e., NaNs will still sort to the beginning. Which I suppose might be correct, but ... I think we need to settle what descending stable sorts look like and where the NaNs go. Sounds like an array-api discussion. The double reversal keeps the index order for elements with the same value.

seberg · 2025-11-12T08:26:55Z

I think we need to settle what descending stable sorts look like and where the NaNs go.

We think we have already settled that one pretty much in the earlier API discussions. Happy to bring it up there, but I don't think it is going to add anything, except a shrug and agree with my opinion. The point is, array libs have no clue about this anyway, you have to look at dataframes/tables for prior art and there both versions exists (but my feeling is our version is more common in newer API).

"Our version" was to specify where NaNs end up irrelevant of the sort order. This is the better user API, IMO, it also means that top_k/bottom_k == sort(x, asencing=True/False)[:k].
Unfortunately, from a sort-compare implementation point of view the opposite logic is actually a bit more obvious (i.e. say that NaN is larger than inf or smaller than -inf).

Anyway, I at least consider the discussion of how to handle NaNs by default as settled.

MaanasArora · 2025-11-12T11:09:44Z

That was to make stable descending sorts, but it doesn't fix the NaN problem, i.e., NaNs will still sort to the beginning.

Ah yes, sorry, NaNs will still get flipped, which means passing the context is necessary to enable sort compare to respond to descending. I'll do that now.

…enable descending sorts

MaanasArora · 2025-11-12T11:35:59Z

Sorry, just one more question: is there perhaps a dtype that can be used to test this PR? With the most recent refactor, StringDType actually does not need to use the indirection to sort compare, so if preferred I can remove it before more review (and do it in a follow up).

ngoldbaum · 2025-11-12T16:28:44Z

Sorry, just one more question: is there perhaps a dtype that can be used to test this PR?

We only really have the scaled float dtype for testing inside numpy. If one of the dtypes in the user-dtypes repo won't work, then this whole discussion is a little moot until someone writes a dtype that needs the functionality.

I haven't been following as closely as I'd like, so I'm not sure offhand what sort of dtype you'd need.

MaanasArora · 2025-11-12T17:02:26Z

If one of the dtypes in the user-dtypes repo won't work, this whole discussion is a little moot until someone writes a dtype that needs the functionality.

This functionality is already part of the arrfunc machinery (as NPY_DT_PyArray_ArrFuncs_compare) and the user-dtypes actually use it presently - so they do work for testing! I was just wondering if we'd like to add tests within NumPy (for CI for example), but it is completely fine if not.

This PR is basically a migration of the older compare slot to the DType and to use the new array methods for sorting, as far as I understand.

seberg · 2025-11-12T17:03:22Z

Sorry, just one more question: is there perhaps a dtype that can be used to test this PR?

We could rewire object and/or void (of course for void it would be nice to have setup code as well), but yeah, ScaledFloat is the only other one as a playground for this.

ngoldbaum · 2025-11-13T19:30:04Z

I was just wondering if we'd like to add tests within NumPy (for CI for example), but it is completely fine if not.

I think it would be nice to have a testing job that checked out the user-dtype repo and tried to build it and run the unit tests inside it with the latest numpy version.

Totally not necessary to do that to merge this PR though.

MaanasArora · 2025-11-27T04:39:57Z

Closing this after #30266 was merged - going to break this into two! Thanks.

github-actions bot added the 01 - Enhancement label Oct 16, 2025

MaanasArora commented Oct 16, 2025

View reviewed changes

ENH, API: Add sort_compare slot to DType and sort wrappers if provided

5c5ad5a

MaanasArora force-pushed the enh/sort-compare-slot branch from bdae010 to 5c5ad5a Compare October 16, 2025 04:07

FIX: Correct memory deallocation function calls in wrap_default_sort_…

8b091a8

…methods

MaanasArora mentioned this pull request Oct 31, 2025

ENH: New-style sorting for StringDType #30112

Closed

MaanasArora added 2 commits November 7, 2025 03:32

BUG: Fix uninitialized method_flags in PyArray_(Arg)Sort

6ccdc38

FIX: DECREF existing sort_compare based method in stringdtype sorts

52d77b1

seberg self-requested a review November 7, 2025 10:38

mhvk reviewed Nov 11, 2025

View reviewed changes

MaanasArora added 3 commits November 10, 2025 22:00

FIX: Update variable name for clarity in stringdtype_sort_compare fun…

f8eb545

…ction

REF: Wrap new-style generic sort loops in npysort module

5a9d2a5

DOC: Document NPY_DT_sort_compare slot [skip ci]

bac9933

ENH: Update sorting comparison functions to use context structure to …

86d8f02

…enable descending sorts

MaanasArora closed this Nov 27, 2025

MaanasArora mentioned this pull request Dec 11, 2025

ENH: Add sort_compare dtype slot to register new-style sorts #30415

Open

Uh oh!

ENH, API: Add sort_compare slot to DType and use sort wrappers if provided #29987

ENH, API: Add sort_compare slot to DType and use sort wrappers if provided #29987

Uh oh!

Conversation

MaanasArora commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MaanasArora Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

MaanasArora Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

mhvk left a comment

Choose a reason for hiding this comment

Uh oh!

MaanasArora commented Nov 11, 2025

Uh oh!

charris commented Nov 12, 2025

Uh oh!

seberg commented Nov 12, 2025

Uh oh!

MaanasArora commented Nov 12, 2025

Uh oh!

MaanasArora commented Nov 12, 2025

Uh oh!

ngoldbaum commented Nov 12, 2025

Uh oh!

MaanasArora commented Nov 12, 2025

Uh oh!

seberg commented Nov 12, 2025

Uh oh!

ngoldbaum commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MaanasArora commented Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ENH, API: Add `sort_compare` slot to DType and use sort wrappers if provided #29987

ENH, API: Add `sort_compare` slot to DType and use sort wrappers if provided #29987

MaanasArora commented Oct 16, 2025 •

edited

Loading

ngoldbaum commented Nov 13, 2025 •

edited

Loading