Skip to content

Conversation

@MaanasArora
Copy link
Contributor

@MaanasArora MaanasArora commented Mar 14, 2025

Resolves #26510.

Allocates the lock for the StringDType array before sort and releases after.

I noticed the sorting algorithms independently get the compare function from the descriptor, so I have created new helper functions in stringdtype/dtype.c but not sure if that's the right place. Changes have been made only in quicksort.cpp (but will add others later), so this is a draft but would appreciate feedback.

Posting simple benchmarks in a comment below. Thank you for reviewing!

@MaanasArora
Copy link
Contributor Author

Running this script on both branches:

import numpy
import random
import timeit

print(numpy.__version__)  # '2.0.0rc2'

options = ["a", "bb", "ccc", "dddd"]
lst = random.choices(options, k=1000)
arr_s = numpy.fromiter(lst, dtype="T", count=len(lst))

print(timeit.timeit(lambda: numpy.unique(arr_s), number=10000))

produces on master:

2.3.0.dev0+git20250310.c275e25
3.481267879999905

and on this branch:

2.3.0.dev0+git20250314.0eb0c8e
1.1663875859994732

@seberg
Copy link
Member

seberg commented Mar 25, 2025

@ngoldbaum just to let you know, I'll let you decide on whether you want this. I am starting to think it is time to implement a get_sortfunction slot but that doesn't mean we can't do this in the mean-time as it's a pretty big speed advantage.

@ngoldbaum
Copy link
Member

I am starting to think it is time to implement a get_sortfunction slot

Agreed. @MaanasArora as you said this was a draft, would you be up for a bigger refactor? IMO this functionality deserves support in the new DType system without relying on the ArrFuncs baggage.

but that doesn't mean we can't do this in the mean-time as it's a pretty big speed advantage.

Also agreed if you don't want to take this further.

@MaanasArora
Copy link
Contributor Author

MaanasArora commented Mar 25, 2025

Yes, agreed, and willing to do a larger refactor! I actually began by considering special casing array sorting for strings overall, but wondered what the preferred approach would be. I think the sorting routines are not very flexible and could use an overhaul.

PS just to clarify, adding a slot to the dtype will mean we will restructure the sorting to be more generic and allow replacing or extending compare etc.? I'll look further into this but would appreciate pointers.

@ngoldbaum
Copy link
Member

ngoldbaum commented Mar 25, 2025

adding a slot to the dtype will mean we will restructure the sorting to be more generic and allow replacing or extending compare etc.?

Take a look at numpy/_core/src/multiarray/dtypemeta.h - I'm talking about adding a new entry in NPY_DType_Slots that handles comparison. Adding a new slot takes some ceremony - there are some magic constants doing offsets on structs elsewhere in NumPy that need to be updated alongside any changes - but there are comments that should hopefully guide you along your way.

We already have entries for getitem and setitem as well as the legacy arrfuncs slots. You'd be migrating from using NPY_DT_SLOTS(dtype)->f->compare to some new api like NPY_DT_COMPARE(dtype) that uses its own slot and allows per-dtype setup for sorting.

@seberg
Copy link
Member

seberg commented Mar 25, 2025

PS just to clarify, adding a slot to the dtype will mean we will restructure the sorting to be more generic and allow replacing or extending compare etc.? I'll look further into this but would appreciate pointers.

Yes, if you look around, you will find for example get_clear_loop and the way users can specify it. I think sorting would look similar.
(I.e. the core sort loop probably always works on a contiguous chunk of memory that is aligned -- that part may be simpler would have to check. I assume it would work in-place, but we will also need an argsort.)
The get_sort_function -- or what we name it -- would then get the desired sort-kind passed in, that way we will also have an easier time of adding new sort methods in the future.
(We may want to provision for an ascending/descending flag even if we don't use it).


I see nathan has some other pointers, I don't expect mine to be quite enough, so please ask!

@MaanasArora
Copy link
Contributor Author

Thank you both, this was helpful! Starting to plan this now and will surely clarify if needed.

@MaanasArora
Copy link
Contributor Author

I've added the slots and done some patchy work around using it, and the stringdtype integration. Looking into how to better relate to the array funcs. WIP, but hopefully this is in the right direction!

@ngoldbaum
Copy link
Member

Yeah, I think you see we still have a lot of other functionality that we should have slots for. Definitely nonzero, at least.

@MaanasArora
Copy link
Contributor Author

Yes, nonzero and some other arrayfuncs could definitely use a slot!

Thank you for the guidance--I've completed most of the missing pieces I think. I assume we would deprecate some of the earlier uses more gradually, so I've fallen back on array funcs as defaults in some places. I think this should be ready for a first pass!

}

return 0;
}
Copy link
Contributor Author

@MaanasArora MaanasArora Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a lot of somewhat redundant code, but I added it here as a 'test' use which provides a boilerplate to replace the routines with a more efficient special-case indirect sort in a future PR. As a "bonus" it allows us to at least temporarily do the allocation this PR was intended for

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I'll let @seberg evaluate whether this API needs adjustments to fit in with the broader DType API but at a first glance it looks reasonable to me, especially if the common case with no specializations can be done with less boilerplate.

That said - there definitely are specialized string sorting implementations we could be using here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good, the thing that I would like to change is the PyArray_(Arg)SortFunc itself, so that it gets a context and auxdata instead of the "array". And this function would get NpyAuxData *out_auxdata in also.
(Although, maybe for sorting this is less interesting as it is not as common to sort many arrays in one.)

The unfortunate thing is that you need to wrap the existing functions if you do this or have a second path for the old function.

I am also considering if we should have a return -2 or so to indicate that the sort-kind is not supported (no error set), to allow NumPy to fall back to a different one.
(But I am not sure we need it, it is useful only for somtehing like mergesort/stablesort, explicitly.)
@charris may have a thought on that.

Copy link
Contributor Author

@MaanasArora MaanasArora Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into passing context now! It looks like a good idea, will try to implement. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change would be really nice; unfortunately both the PyArray_CompareFunc and the sort functions use the PyArrayObject right now, which we probably (!) shouldn't get from the context.

If I'm thinking right, we can define a new type such as PyArray_SortCompareFunc that uses the descr instead of the array and make new sort functions that do not use the array somehow (as we can no longer interchange the SortCompareFunc and the CompareFunc!), but we would still probably need the old functions to use with the older compare functions; I think the duplication will be quite complicated.

At the same time, this would be a missed opportunity to have the new CompareFunc type if we do deprecate later and want to go down this route...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two possible approaches here:

  1. We deal with it in the sort function, and just have to different calls depending on whether it is an old or new sort function.
  2. You ignore the fact that an array is currently passed (effectively). We do that in some other places as well, due to how terrible it is.
    That is, we wrap it into a dummy object for which basically the only valid field is arr->descr (and maybe arr->flags, don't recall). (See get_dummy_stack_array, yes this is terrible and even reviewers stumble over it, but...)
    If you do that, you can write a short function that wraps the old call into a function taking the new one.

I did the second for ufuncs (not sure if that was the easier!), so I suspect the first is likely the simplest here.


Allowing to set a compare function, seems like a nice idea (also to have a simple default).

It would be nice to move that into a default slot function. I.e. rather than setting it for StringDType here, auto-fill the slot with the function that tries to use the SortCompareFunc (If that slot is undefined, we can keep the slot filled with NULL).
That also removes the second check later.


About this SortCompareFunc, it may make sense to keep it "light-weight" (i.e. a single function even if that may not be ideal if you have to inspect the dtype to do the comparisons -- for example structured has to do this).

But I would like to think about what we need to sort things like NaNs if possible. Unfortunately, I am not immediately sure, i.e. <=> in C++ can return a partial order, which means that:

  • For us probably an "error" is a valid return (right now we can't propagate errors!).
  • "unordered" is a valid return, although I am not sure how to deal with it. If we have "compare(a, b) == unordered" (i.e. one or both are NaN), we don't yet know how to swap them. That may be possible to resolve with compare(b, b) and compare(a, a).
    But the only way I am quite sure how to resolve possible reversed sorting, etc. might be to have unordered_left, unordered_both, unordered_right.

Or we should keep it roughly as is, and accept that this function doesn't exist for all dtypes... A neat thing about having a clear order with "unordered" is that we could also use it from the comparison (u)funcs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try the first approach I think! I think it will help isolate the new API in a way that makes deprecation easier later too. I will also do this default logic for both compare and sort compare funcs.

As for allowing partial order: yes that could break the symmetry, I suppose. But it could be useful to make sorting more precise in the long run too, and so "unordered" does seem a better way to allow for those kinds of extensions. And then we can use this machinery as the go-to for anywhere comparison decisions need to be made as you said.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an update: after some thought, I think it might be quite nice to go with unordered_left, unordered_both, unordered_right, mainly because it saves us any issues with reverse sorting down the line, might as well get everything in, especially as you mentioned dataframes and that's clearly a very important use case. Working on this! I'll try to draft an API that can easily fill in 'defaults' somehow, so that the user-facing side can be used at different levels of complexity / customization.

Copy link
Contributor Author

@MaanasArora MaanasArora Jun 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a draft for this! Includes reversed (currently not functional, but working on that) and nan_position added into the context.

I've also restructured the context / legacy wrappers and updated the SortCompareFunc to use the sorting context instead of the descr; it should allow richer features down the line.

npy_intp, int, PyArray_SortFunc **);
typedef int *(PyArrayDTypeMeta_GetArgSortFunction)(PyArray_Descr *,
npy_intp, int, PyArray_ArgSortFunc **);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New stuff in the public API needs new API docs as well as a release note describing the new features.

Maybe also as a proof-of-concept, it looks like both quaddtype and mpfdtype in numpy-user-dtypes implement sorting - would you be willing to update them to use the new API in a PR to numpy-user-dtypes that depends on this PR to numpy? That should give you a feeling for whether this API is helpful for someone writing a new user dtype. It'll also be a form of documentation - we don't have great docs for writing user dtypes besides the examples in numpy-user-dtypes.

Copy link
Member

@ngoldbaum ngoldbaum Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also what should we do about the flags that got added before we made the dtype API public, e.g. NPY_DT_PyArray_ArrFuncs_compare? I guess we can deprecate them although I don't know how hard it would be to generate deprecation warnings if those are used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easy to generate a deprecation warning during registration (a bit tedious maybe, as you need explicit check).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add API docs and a release note, and willing to make a PR to numpy-user-dtypes! Will look into that.

Just to be clear, NPY_DT_PyArray_ArrFuncs_compare is still needed right? We can move it to a new slot rather than an arrayfunc but it's going to be different from the sort comparison for now if I'm thinking right (as it is user-facing rather than used in the sorting). Do we need to do this another way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't change slot numbers (unless they are guarded as private)! So the numbers are fixed (until they have not been used for a bit at least).
So yeah, I think we should keep it the old slot for now, maybe easier to make the deprecation a follow up.[^depr]

So, we just have to live with the numbering we got, I half thought I asked for an offset for the NPY_DT_PyArray_ArrFuncs slots, but maybe I didn't bother.
(It's not a big issue, the only thing is the convenience if slot numbers == slot offset so you don't need to translate it.)

[^depr] I think this is as simple as asking users to compile with the new NumPy, and then adding PyArray_RUNTIME_VERSION, but this PR is complicated enough due to API decisions for the new loops, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we just have to live with the numbering we got, I half thought I asked for an offset for the NPY_DT_PyArray_ArrFuncs slots, but maybe I didn't bother.

There is an offset, _NPY_DT_ARRFUNCS_OFFSET:

#define NPY_DT_MAX_ARRFUNCS_SLOT \
NPY_NUM_DTYPE_PYARRAY_ARRFUNCS_SLOTS + _NPY_DT_ARRFUNCS_OFFSET

}

return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I'll let @seberg evaluate whether this API needs adjustments to fit in with the broader DType API but at a first glance it looks reasonable to me, especially if the common case with no specializations can be done with less boilerplate.

That said - there definitely are specialized string sorting implementations we could be using here.

}

static inline PyArray_CompareFunc *
PyArray_SortCompare(PyArray_Descr *descr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call this PyArray_GetSortCompareFunction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, made this change. Thanks

* additional minor optimization skips the recursion depth checking on the
* smaller partition as it is always less than half of the remaining data and
* will thus terminate fast enough
* will thus terminate fast enough`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was added by mistake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry!

@MaanasArora MaanasArora changed the title ENH: Allocate lock only once in StringDType quicksort ENH, API: New sorting mechanism for DType API Mar 28, 2025
Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating so quickly :)

I think the new error path needs a little more thought, sorry for the back-and-forth.

sort-kind and order.

Additionally, the new `NPY_DT_sort_compare` slot can be used to provide a comparison function for
sorting, which will replace the default comparison function for the dtype in sorting functions. No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a note that the old arrfuncs slots may be deprecated in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

#define NPY_DT_sort_compare 11
#define NPY_DT_get_clear_loop 12
#define NPY_DT_get_fill_zero_loop 13
#define NPY_DT_finalize_descr 14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you re-order these so the slots that were already in the struct keep their old values? I don't know offhand if changing this order is problematic but it seems more consistent to not change the old values even if it's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done!

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, long comments, and I realize this is becoming a lot more complex than it may have looked initially. But, I would really like to see the context passed in/a new signature.

That also probably includes filling in/returning ARRAY_METHODFLAGS, even if the only useful flag is "requires GIL".

I also still tend to think it may make sense to have a magic return for "unsupported sort method", although should maybe ask Chuck once in a meeting about that.
(in principle I agree we usually just need stable and not-stable, but if we want users to be able to choose more precisely, I think it may make sense to allow us to fallback here. We could still use something like "no error indicated, but func == NULL for it even, but maybe a special return is easier.)


If needed, maybe we have to talk briefly about it synchronously? Or maybe just write the docs/signatures first that we want for the public API.

@@ -0,0 +1 @@
* `PyArray_GetSortFunction`, `PyArray_GetArgSortFunction`, and `PyArray_GetSortCompareFunction` have been added to the C-API. These functions return the sorting, argsorting, and sort comparison functions if provided for a given dtype in new slots. No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did not actually add them to the public C-API. Which is totally fine, though.

(I might start with adding a SortBuffer() function or so.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is still valid, I think we might as well remove this for now.

}

return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two possible approaches here:

  1. We deal with it in the sort function, and just have to different calls depending on whether it is an old or new sort function.
  2. You ignore the fact that an array is currently passed (effectively). We do that in some other places as well, due to how terrible it is.
    That is, we wrap it into a dummy object for which basically the only valid field is arr->descr (and maybe arr->flags, don't recall). (See get_dummy_stack_array, yes this is terrible and even reviewers stumble over it, but...)
    If you do that, you can write a short function that wraps the old call into a function taking the new one.

I did the second for ufuncs (not sure if that was the easier!), so I suspect the first is likely the simplest here.


Allowing to set a compare function, seems like a nice idea (also to have a simple default).

It would be nice to move that into a default slot function. I.e. rather than setting it for StringDType here, auto-fill the slot with the function that tries to use the SortCompareFunc (If that slot is undefined, we can keep the slot filled with NULL).
That also removes the second check later.


About this SortCompareFunc, it may make sense to keep it "light-weight" (i.e. a single function even if that may not be ideal if you have to inspect the dtype to do the comparisons -- for example structured has to do this).

But I would like to think about what we need to sort things like NaNs if possible. Unfortunately, I am not immediately sure, i.e. <=> in C++ can return a partial order, which means that:

  • For us probably an "error" is a valid return (right now we can't propagate errors!).
  • "unordered" is a valid return, although I am not sure how to deal with it. If we have "compare(a, b) == unordered" (i.e. one or both are NaN), we don't yet know how to swap them. That may be possible to resolve with compare(b, b) and compare(a, a).
    But the only way I am quite sure how to resolve possible reversed sorting, etc. might be to have unordered_left, unordered_both, unordered_right.

Or we should keep it roughly as is, and accept that this function doesn't exist for all dtypes... A neat thing about having a clear order with "unordered" is that we could also use it from the comparison (u)funcs.

NPY_SORTKIND which, int descending, PyArray_SortFunc **out_sort)
{
if (NPY_DT_SLOTS(NPY_DTYPE(descr))->get_sort_function == NULL) {
return -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to set an error (TypeError or DTypeTypeError, which is defined somewhere I think.)

(An error here will make sense after you move the fallback logic into a default slot filling. Or you could have the default slot raise an error, that is also completely fine.)

@MaanasArora
Copy link
Contributor Author

No worries; thank you for the detailed feedback actually! It's nice to be able to iron out the direction for the API. I'll address the docs and public API changes and keep working away at the SortFunc changes. Happy to have a synchronous chat if it seems useful.

typedef PyObject *(PyArrayDTypeMeta_GetItem)(PyArray_Descr *, char *);

typedef int (PyArray_CompareFuncWithDescr)(const void *, const void *,
PyArray_Descr *);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming is a bit weird here, but I didn't want to disturb the original type as it's used a lot. I think the SortCompareFunc should still be a unique type so will do that (even if only a clone of this type).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have slightly mixed feelings. On the one hand, I think this is the pragmatic thing to have.
On the other hand, we could also look this function from the np.less_than or np.great_than ufunc to implement sorting, I think.
(The problem there is still how to deal with unordered elements, a compare ufunc would work better...)

But, on the other hand, it seems pragmatic even if it won't work well e.g. for structured dtypes (performance issues), it will always work and provides an easy entry-point (we can also use this to define default comparison ufuncs).

So overall, I think I end up at just doing this, although I could imaging punting if we don't need it for StringDType (I suspect we do, though).

Would like to hear if @ngoldbaum has an opinion.

(A neater future path would also be if this was more of a header-only code binding generator job with us making the sorting patterns available maybe. I.e. if this was defined in a C++ class and our sort code available, the DType could compile the full loop and avoid calling such a helper everywhere.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is fine, if only because it exists right now 😄

@MaanasArora
Copy link
Contributor Author

MaanasArora commented Apr 5, 2025

Sorry for the bit of delay, I was thinking through this and essentially ended up with separating more the legacy sorting machinery from this API. This way, the new signatures can freely use context-related features and we do not have to create some sort of empty array or refactor a very large number of sorting-related files. Aside from how nice the feature is, I think this separation is actually a plus and even rolling back the stringdtype integration was worth it (in any case, user dtypes are not to define sorting with the internal, now legacy functions, so may be best to add a specialized routine).

We also have the new compare slot which defaults to sort_compare and related features now, though they're not used yet. Hopefully this is in the right direction. If it is, we can gradually move the older sorting machinery to the context signatures, thus converging. Thank you!

@ngoldbaum
Copy link
Member

Sorry for not getting to this yet. I'm going to try to make sure to give this a once-over next week.

I think you can fix the test failures by rebasing?

@MaanasArora
Copy link
Contributor Author

No worries, I have some things to address as well.

Just rebased--sorry not sure if things went perfectly smoothly.

@MaanasArora
Copy link
Contributor Author

MaanasArora commented Apr 11, 2025

Just brought this implementation with the new signature to parity with the previous one, including the usage in StringDType and ensuring use cases for the new and old sort functions are handled properly in the handlers. There is repetitive code but I guess we will phase out the legacy slots. Now we can make the context nicer if needed, and incorporate the auxdata!

@MaanasArora MaanasArora force-pushed the enh/faster-string-sorting branch from d7cd9ed to 0e8b6a5 Compare April 11, 2025 21:29
@MaanasArora
Copy link
Contributor Author

MaanasArora commented May 8, 2025

Hello! Getting back to this, is there anything I need to address? Thinking of adding the functions to the public C-API if things look fine.

Would we need to create a new C-API version (regenerate the hashes and such), and I guess it would come under 2.4, given how close 2.3 is?

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, yeah, this won't make 2.3, sorry. I think it might be good to discuss a bit in depth with @ngoldbaum some time (not next week, sorry).

Another thing that I would like addressed/discussed is the problem of reverse sorting.
I do think we need at least a reverse=True, I think it might make sense to also provision for a nan_position (if nan goes first or last).

(NULL/NA ordering is very important in dataframe world, and I am tempted to include this, even if we say that the value for now is always "last").

embedded references.
.. c:type:: int (PyArray_SortFunc)( \
void *start, npy_intp num, PyArrayMethod_Context *context, \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the context to the first spot just for similarity. I think I added a context for traversal functions, I am not sure that was smart, but since we have it, it may be a slightly better fit.

I might call start, data (not that it matters).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done!

NpyAuxData *auxdata, NpyAuxData **out_auxdata)
A function to sort a buffer of data. The *start* is a pointer to the
beginning of the buffer containing *num* elements. A function of this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
beginning of the buffer containing *num* elements. A function of this
beginning of the contiguous containing *num* elements. A function of this

It also should be aligned, but I have to think whether we should allow indicating support for unaligned data here. (Which would require flags, for ufuncs "supports unaligned" is flagged before get_loop(), although since here we always do contiguous, flagging it inside get_loop() is OK also -- that is, becuase get_loop() is not passed any strides).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Committed (modulo typo).

NpyAuxData **out_auxdata)
A function to arg-sort a buffer of data. The *start* is a pointer to the
beginning of the buffer containing *num* elements. The *tosort* is a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charris to confirm, even for argsorting it probably makes sense to always use a contiguous buffer for sorting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe some, but we don't sort the data being argsorted, so it will eventually be accessed out of order in any case if the data is random. All this is to say, it depends :)

Comment on lines 1888 to 1889
PyArrayMethod_Context *context, NpyAuxData *auxdata, \
NpyAuxData **out_auxdata)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PyArrayMethod_Context *context, NpyAuxData *auxdata, \
NpyAuxData **out_auxdata)
PyArrayMethod_Context *context, NpyAuxData *auxdata)

The out_auxdata belongs on the get_loop function!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done, thank you :)

.. c:macro:: NPY_DT_get_sort_function
.. c:type:: int *(PyArrayDTypeMeta_GetSortFunction)(PyArray_Descr *, \
npy_intp sort_kind, int descending, PyArray_SortFunc **out_sort);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs *out_flags, since we need the ability to indicate whether the GIL is required (I think we can ignore FPEs), but who knows if we'll have a reason for other flags eventually.

It also needs **out_auxdata, since auxdata needs to come from somewhere :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

}

sort = PyDataType_GetArrFuncs(PyArray_DESCR(op))->sort[which];
PyArray_GetSortFunction(PyArray_DESCR(op), which, 0, &sort);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, let's just use < 0 to decide if it's an error. In which case sort != NULL is assumed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, this is done!

typedef PyObject *(PyArrayDTypeMeta_GetItem)(PyArray_Descr *, char *);

typedef int (PyArray_CompareFuncWithDescr)(const void *, const void *,
PyArray_Descr *);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have slightly mixed feelings. On the one hand, I think this is the pragmatic thing to have.
On the other hand, we could also look this function from the np.less_than or np.great_than ufunc to implement sorting, I think.
(The problem there is still how to deal with unordered elements, a compare ufunc would work better...)

But, on the other hand, it seems pragmatic even if it won't work well e.g. for structured dtypes (performance issues), it will always work and provides an easy entry-point (we can also use this to define default comparison ufuncs).

So overall, I think I end up at just doing this, although I could imaging punting if we don't need it for StringDType (I suspect we do, though).

Would like to hear if @ngoldbaum has an opinion.

(A neater future path would also be if this was more of a header-only code binding generator job with us making the sorting patterns available maybe. I.e. if this was defined in a C++ class and our sort code available, the DType could compile the full loop and avoid calling such a helper everywhere.)

@MaanasArora
Copy link
Contributor Author

MaanasArora commented May 10, 2025

Thanks for the comments! And no worries, I was just making sure I wasn't missing something to do :)

I think I need to think a bit more about the best way to adjust this for the extra features you mentioned, yes. Unordered elements is definitely something to consider at this stage, so I might try to draft something for that soon enough; that should hopefully create a clearer story around these features!

@MaanasArora MaanasArora force-pushed the enh/faster-string-sorting branch from 39f5ef2 to 237b7f0 Compare May 13, 2025 22:39
@ngoldbaum
Copy link
Member

I want to call your attention to this suggestion: #28516 (comment).

Did you ever take a look at numpy-user-dtypes? A worked example would help.

@MaanasArora
Copy link
Contributor Author

Yes, sorry, I took a look actually--but was having some trouble with installing the dtype packages over the editable install of numpy. I'll push my draft anyway and try to address that. Thanks for the reminder.

allows defining custom argsorting implementations for each of the
sorting algorithms numpy implements.
sorting algorithms numpy implements. If `NPY_DT_get_argsort_function`
is defined, it will be used instead. This slot may be deprecated in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does it need to be defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are dtype slots, so they will be defined with the user dtype. Here is the example for StringDType.

Indicates that NaN values should be sorted to the end.
.. c:enum:: NPY_COMPARE_RESULT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, C enums may be unsigned depending on whether they are explicitly defined. Are these actually enums, or just here for markup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are actually enums, defined in dtype_api.h!

NPY_UNORDERED_BOTH = 4,
NPY_COMPARE_ERROR = 5,
} NPY_COMPARE_RESULT;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the comparison function need to handle all those values? What if it doesn't care? The current sorts only require NPY_LESS, and only use the (-1, 0, 1) sequence where that is conventional for code that comes from outside of Numpy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, perhaps it doesn't, though the main issue comes up in nan sorting (as in dataframes) as was mentioned by @seberg. If we don't include this, NaNs would be treated consistently in both ascending and descending sorts, causing their position to be flipped, if that is a problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see making sure that -1, 0, 1 retain their meaning for simplicity. You could also just not bother here and say if you want it you need to use the more complicated API...

Copy link
Contributor Author

@MaanasArora MaanasArora Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this mean certain user dtypes (that choose not to bother) would not support NaN order sensitive sorting? Should we allow to indicate that somehow?

Copy link
Member

@seberg seberg Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so dunno if it's good. Maybe the best thing is to pass in the information, then the choice to ignore it is rather obvious :).
(I.e. return -1, 0, 1, and pass in where to put NaN... unfortunately, it would be "value of NaN" style not "start/end", but "smallest/largest value")

Another thing is, @charris do you feel we should prepare for nan_order="undefined" thing?!

EDIT: I suppose maybe passing it in but returning -1, 0, 1 is the best?


typedef int (PyArray_SortFuncWithContext)(PyArrayMethod_SortContext *,
void *, npy_intp,
NpyAuxData *);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is in NpyAuxData?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any additional information that the sorting may need to access. I suppose it's for future compatibility, if say, we need to add dtypes that use auxiliary data in sorting or even sort multiple related arrays in some way, but @seberg would probably have a better idea of the use cases he thought of.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auxdata is what all casts (and ufuncs) are passed. It allows holding on to temporary data. E.g. casts need it for structured dtypes (and this could make sense here in the same way).

@charris
Copy link
Member

charris commented Aug 15, 2025

It would be helpful to document how this apparatus is intended to operate. Maybe with some diagrams with arrows tracing a call.

@charris
Copy link
Member

charris commented Aug 20, 2025

I have a somewhat different concept as to how this should be handled. I would just add two slots that take function dispatchers that get passed the context and return the relevant sorting implementation, if there is one, null otherwise.

The context would also contain a stable variable, maybe an inplace variable as well, and leave out the compare function (an implementation detail) and maybe be versioned? The returned function would have the comparison function built in (there might be another level of indirection here), making it an implementation detail (sorting comparison, not the dtype comparison). Essentially, this keeps most of the sorting implementation private. The net effect would be to make it possible to extend sorting functionality by adding to the context structure.

I don't think this is very different from what is here at the lower level, but the upper level is somewhat different. It just adds a level of indirection. For reuse we probably need access to all the numpy builtin dtypes, so that their sorting functions can be called from user dtypes if needed. Maybe a toplevel function dispatcher? We currently have builtin introsort, heapsort, timsort, and radix sort, do we want to expose those to users? We should add to those decending sorts and the nan options.

The main idea here is the explicit use of function dispatchers, and maybe an explicit distinction between builtin and user. The goal should be simplicity for the average user, with detailed selection for experts who want it. I am reminded of our random module in that regard.

I think discussion should be about the top level of this. The original idea for making the builtin sorting functions a library was to make it easy to add them to the existing dtype slots.

@seberg
Copy link
Member

seberg commented Aug 22, 2025

I would just add two slots that take function dispatchers that get passed the context and return the relevant sorting implementation, if there is one, null otherwise.

@charris can you sketch that out slightly more? I think, you are basically saying to drop the element-wise compare function and just do the PyArrayDTypeMeta_GetSortFunction:
https://github.com/numpy/numpy/pull/28516/files#diff-136726bc7d1ef2257dd24ea927c31155ad76543e9fbd6eeb96c1a8b78da458faR3556-R3565
(which is the function the DType defines)

Then, rather than exposing an elementwise function, we have a public GetSortFunction() so that you can "inherit" the behavior from another DType (i.e. if I implement a unit dtype, I can call the one for float64).
Since we have a context that is specific to sort (and not shared with casts/ufuncs), I agree it makes sense to put a lot of information on it (that way the sort implementation that is returned is passed the context as well and has the information if desired).

For StringDType that may make the implementation slightly more complicated, but that is not a big deal, and we can probably figure out another path to make it easy.

FWIW, I am not too worried about versioning. We can add things to the context if backwards compatible. If not, it is possible to add a GetSortFunctionX slot and deprecate the old one.

@MaanasArora
Copy link
Contributor Author

MaanasArora commented Aug 22, 2025

Thanks @charris, that makes sense! I think we can definitely simplify the implementation. If I’m understanding correctly, you mean we could drop the sort-compare and expose the internal npy_sort functions instead, to be used in user implementations?

That sounds really nice, but the current sort functions read the comparison from the DType (a legacy array function rather than a slot, via PyArray_GetArrFuncs(dtype)→compare). We could change these to read the sort_compare from the context, but it seems you would like not to pass the sort_compare in the context?

Essentially I am wondering if the sort-compare user is actually more common than the sort user, mainly because most 'compound' types like StringDType or multi-precision float would need a new sort_compare or a complete new sort, and sort-compare is easier. Of course a dtype could go all the way and rewrite the functions completely, but then exposing the builtin functions may not be that useful.

Of course this doesn’t mean we need a full new slot! It actually seems to be an implementation detail, yes – when you say ‘indirection’ at the sort-compare level, would you like something like this (except we pass the sort-compare some other way if you don’t want it in the context)?

stringdtype_get_sort_function(SortContext *ctx)
{
  npy_quicksort_with_context(ctx, &stringdtype_sort_compare, ...);

  // or something like this, though probably not this?
  NpySortContext_SetCompare(ctx, &stringdtype_sort_compare);
  npy_quicksort_with_context(ctx, ...);
}

I think this would actually be quite nice! Users can choose to do this, or reimplement the full function.

Sorry, this got long, but just wanted to clarify what you were thinking. Thanks again for your feedback; it was very helpful!

@charris
Copy link
Member

charris commented Aug 23, 2025

but the current sort functions read the comparison from the DType'

They don't, except for generic types. The comparison functions are defined as inline functions in npysort_common.h. They tend to be in the inner loop, so it is best to avoid a function call there.

What I am thinking is that for public use there are four attributes to select sorting shim functions. The following information needs to be passed, either in context or by arguments:

  • dtype (we could save one level of indirection by using a separate dispatcher for each dtype)
  • stable=True (array api). If False, the implementation might still be stable (radix sort).
  • decending=False (array api)
  • nan position.

Note that with included dtype, only one dispatcher is needed for all dtypes. Or we could omit the dtype and attach different dispatchers to each dtype.

The returned sorting shim function allocates/releases working memory, makes contiguous copies if needed, and calls the lowlevel sorting implementation. Arguments something like

  • array (for data and shape and flags)
  • axis
  • output array (if we want that option)

This is close to what the current implementation has evolved to. In the beginning we called by algorithm, but here call by functionality.

The question if we want to expose the lowest level functions, or maybe the shims, in addition to the dispatchers.

I don't intend to this to be the final proposal, but I think this design should start by answering the question of what we want the new interface to accomplish. Feedback and alternate proposals would be good to hear.

@MaanasArora
Copy link
Contributor Author

MaanasArora commented Aug 23, 2025

They don't, except for generic types. The comparison functions are defined as inline functions in npysort_common.h. They tend to be in the inner loop, so it is best to avoid a function call there.

Ah, right thanks, those are for most builtin types indeed. Only dtypes for which explicit dispatchers aren't defined actually take the npy_quick/heap/timsort. I suppose my question was moreso if every user dtype would have to necessarily write a sort function - currently, that is not needed, as we pass the compare arrfunc and use the generic sorts. But if we move to new slots as in this PR we need to somehow inherit the compare or define it, unless we want every dtype to write their own sort.

That aside, the API seems good! What I understand is that we pass stable instead of the sort_kind? The interface in this PR does contain:

  • dtype: one dispatcher for each, so not passed
  • sort_kind: this is the algorithm, rather than stable (this is not in the context, but can be moved there)
  • descending: in the context
  • nan_position: in the context

I'm not sure I fully understand the dispatcher idea but it seems useful to add a layer on top of the low-level sort. The lower-level generic implementation, as far as I can see, does not contain axis and output array arguments.

So, to be clear if I understand this correctly, if we want to implement these changes, I think we first need to decide:

  1. if / how to handle comparisons for user dtypes if a custom sort function is not defined
  2. what level of internal sort functionality we are exposing

And then maybe:

  1. Add a shim between the sort function, and make dispatching explicit
  2. Handle all memory handling (including axes and output) in the underlying shim
  3. Expose parts of the internal sorting library to be reused by users
  4. (if needed) change how sort comparisons are defined

Sorry if this is off base. I'll think about this further, thanks!

Edit: If all of this won't be done in this PR, perhaps just the user API with some dispatch architecture (such that it fixes the external API) would be a good place to start. Once it's fixed, we can change the internals.

@seberg
Copy link
Member

seberg commented Aug 23, 2025

I would be happy to say that we only need stable/unstable sort kind. The main comment I have is that

The returned sorting shim function allocates/releases working memory, makes contiguous copies if needed, and calls the lowlevel sorting implementation. [...]
Arguments something like array (for data and shape and flags)

We may be thinking of much the same, but I would like NumPy sort to deal with contiguous copies, result allocation and outer iteration, so that the (user) DType only needs to provide the lowlevel sorting implementation. (I.e. the same pattern that ufuncs/casts use)

I would like to achieve that anything that a user DType may provide doesn't take arrays. NumPy DTypes need no knowledge of arrays (except for some old API, one of which we are replacing here).

I suppose I am just repeating that DType->get_(arg)sortfunction() is what we should add (and this adds it, with the signature PyArrayDTypeMeta_Get(Arg)SortFunction).
I.e. I think this is just the right abstraction and mirrors ufuncs/casts, that is the DType (class/type) provides the lowlevel sorting implementation (and we give the DType the chance to do setup/have multiple, which isn't often needed but is useful for structured dtypes for example).

Although, I would be OK to just provide a DType->lowlevel_sort_implementation instead, because we can always add the above to when we want (and most DTypes don't need the get_ version. Right now no dtype needs it).

But beyond that, e.g. whether we should have a CompareFunction, I really don't have a strong opinion.

Note that with included dtype, only one dispatcher is needed for all dtypes. Or we could omit the dtype and attach different dispatchers to each dtype.

We should always include the dtype (i.e. PyArray_Descr), for example StringDType needs it to work with its elements and structured dtypes (e.g. in casts) also use it for specialization.


In a sense I would love if we could implement sorting as (simple) gufunc even if that is overcomplicated for sorting... Unfortunately, we don't have a defined way to pass the parameters like descending.

@charris
Copy link
Member

charris commented Aug 23, 2025

@seberg Your proposal sounds like the current implementation, but with more slots for the low level functions. As I see it, we would like eight slots rather than the current three, that is, three booleans:

  • ascending/decending
  • stable/possibly not stable
  • nan last/first

We currently only really use two: stable/possibly not stable. Note that we already have shims for functions that require workspace, the shims allocate/release working memory.

The question then becomes how to allow extensions without making disruptive changes. I suggest that the easiest way is to have a pointer to a structure dedicated to holding function pointers rather than a function that returns the wanted pointer. The only complication is that the function signature is different between the generic/builtin cases, but I suppose that can be handled by a NULL in the comparison function arguments.

If we are not worried about future extensions, we could just add eight slots, sixteen total with sort/argsort.

What is it you would like with dtype inheritance?

I am interested in what users might like, @mhvk and pandas might have some ideas.

@seberg
Copy link
Member

seberg commented Aug 23, 2025

As I see it, we would like eight slots rather

I would like to just add one slot actually (please forget about the point that we don't have to use that). I.e. the function (a second one for argsort):

def PyArrayDTypeMeta_GetSortFunction(sort_context, ...):
    return lowlevel_sort_implementation

Now, internally, the dtype may or may not have 8 different versions. Because the new lowlevel_sort_implementation will also get the sort_context passed, so in principle you could choose to merge the functions into a single version if it doesn't matter speed wise.

Why the GetSortFunction pattern?

  1. It mirrors ufuncs and casts.
  2. It allows us to have the 8 versions without actually adding 8 slots explicitly.
  3. For some dtypes it can make sense to specialize things and that works well here. You could of course do that on the first call in principle (but, this mirrors ufuncs and casts).

Now, passing sort_context rather than stable, descending, nan_order is totally fine by me, I guess for PyArrayDTypeMeta_GetSortFunction it seemed fine to not bunch it up for convenience (and because I didn't expect it to grow any time soon).

If we are not worried about future extensions, we could just add eight slots, sixteen total with sort/argsort.

We are not (overly) worried about the future, because these are not the same kind of slots as the old ones. These are slots on the DType class the PyArray_DTypeMeta and we can add and deprecate these slots basically as freely as in Python!

@seberg
Copy link
Member

seberg commented Aug 23, 2025

To stress this: Even if we add PyArrayDTypeMeta_GetSortFunction today, we can easily add a PyArrayDTypeMeta_GetSortFunctionWithMoreOptions in the next version and keep both around or deprecate the other one slowly.

So we should be comfortable with the choice for the mid-term, but we don't have to worry too much in case someone comes around and wants sorts that e.g. work with non-contiguous data (if we don't really expect that to happen).

Adding nan last/first and descending now, is because there is clear interest for it in the near or mid term (e.g. because of top_k, although NaN we could also just define "always to end", I like the fact that having it around also documents the expectation).

@charris
Copy link
Member

charris commented Aug 23, 2025

There are going to effectively be 8 slots even with the function + context, it is a matter of how they are accessed. I'd rather they be in a table instead of in a switch. That said, I'll note that the improvement in speed between the original quicksort implementation and the type specific one was about 5x. That was due to several things, first, builtin comparisons, and second, a cleaner and simpler version of quicksort.

@seberg
Copy link
Member

seberg commented Aug 23, 2025

There are going to effectively be 8 slots even with the function + context, it is a matter of how they are accessed

True, although some of them may be duplicate entries (if perfect specialization isn't worth it or unnecessary e.g. for integers which can ignore NaNs).

I'd rather they be in a table instead of in a switch

I don't mind that too much the slot could be a struct with 8 entries. Although, I prefer this because of the reasons I said above:

  1. It is how ufuncs/casts are designed and I think it gives more flexibility to user DTypes (this is for the public API). Even if a bit more verbose.
  2. It is much more obvious to do setup work there, which could be useful e.g. if anyone ever wanted to optimize structured dtype comparisons.
  3. (In the extreme case a future user like numba could even JIT a function.)

But yeah, it may be that that we repeat the pattern to dispatch to the specialized version a few more times with my preference, because it could be repeated for multiple DTypes rather than just once.
(I guess that may depend a bit on how we actually do it internally, we could use the same function for all our Numerical dtypes after all and use a static struct internally in principle.)

@seberg
Copy link
Member

seberg commented Aug 23, 2025

That said, I guess we could always add my GetSortFunction slot later if I want to, so in that sense, I shouldn't care about what approach we use, and if you prefer slots a lot (and nobody chimes in), I am very happy to go that way @charris.

Slots may have the advantage of being slightly easier to reason about for user DType authors as well. (i.e. for ufuncs, the user doesn't need to use the get_loop slot either, it's optional)

@MaanasArora
Copy link
Contributor Author

For the public API it seems easier and clearer (in intent) to me to use a function rather than a struct. I'm not too familiar, but I would think it is possible for compilers to optimize, say, NaN order and descending, if the user writes obvious fast paths; in such cases it may be simpler to allow the user to merge functions than write 8 wrappers or versions (and as @seberg said this is similar to ufuncs too). And not all dtypes would care (equally) about all options (especially nan). Also, this seems more consistent with the slot design we have so far.

Also I think it would be nice (if @charris meant that as well) to expose some of the internal sort machinery we have and in that case user dtypes can reuse it more easily with a single function, presumably, if we wrap things (efficiently) (though this ties into the discussion whether we need to generalize compare, which I'm not sure about anymore...)

@MaanasArora
Copy link
Contributor Author

MaanasArora commented Aug 23, 2025

To be clear I'm not against going with either approach if there are strong opinions, just thinking out loud! Happy to go with whatever everyone thinks makes sense :)

@charris
Copy link
Member

charris commented Aug 24, 2025

To be clear, I wasn't recommending actually adding 8 (16) slots to the dtype, that was just to illustrate the effective functionality. What I would suggest is adding a pointer to a structure instead of a function. Yes, some of the entries could be duplicates or NULL. For instance the only reason to maybe prefer quicksort over radix sort for short integers is that radix sort requires extra memory. The only difference from a function is using an index rather than a context. I suspect a function might want to use a table for implementation, converting the context to an index. So I guess it is just a matter of the easiest way to pass the selection information.

I would suggest either new PyArray_SortExt and PyArray_ArgSortExt functions that take more/different arguments, context maybe, or simply extending NPY_SORTKIND. We will want new sorting functions for the existing types that do not use the generic sorting functions, but rather builtin comparison functions. Such functions are easy to add, and it looks like the SIMD sorting functions can do descending sorts, not sure what they do for nans.

@charris
Copy link
Member

charris commented Aug 24, 2025

For descending sorts, we could just reverse the output of existing sorts as a quick, if dirty, solution. That would also work for floats if we added the option of sorting nans to the beginning.

@seberg
Copy link
Member

seberg commented Aug 25, 2025

For descending sorts, we could just reverse the output of existing sorts as a quick, if dirty, solution

Yeah, that's a good point. If we provide the family of functions (yes, via a single slot as you said), then we can auto-fill half the slots via reversing others.

So in the end, I think where we are heading is basically adding a sorting-slots struct. That was maybe always the right thing to do (even for the fewer ones in the PR, I guess I just didn't need it as much before; the one indirection should be meaningless).

Since I do like the idea of potentially adding a GetSortFunction(), I think we could do the following single slot:

struct {
    sortfunc *sort_ascending_stable;
    sortfunc *sort_ascending_unstable;
    sortfunc *sort_descending_stable;
    sortfunc *sort_descending_unstable;
    ...
    /* and argsorts */
    ...
    void *reserved1 = NULL;  // for a GetSortFunction probably
    void *reserved2 = NULL;  // GetArgsort
    /* could reserve more if we want a compare function, but no need to worry */
}

I don't have an opinion on how many sort functions we include. 4+4 we need, but maybe all 8+8 (as the NaN versions can be implemented via reversing if descending sorts to end).
I would personally be OK with relying on GetSortFunction for the full zoo, but I think you prefer to avoid it which is fine by me.

@charris
Copy link
Member

charris commented Aug 25, 2025

Been thinking about this some more, The current logic is:

  • Look for function with builtin compare (in table)
  • If no function (table entry is NULL, use generic sort with descr compare function.

That is pretty much what this PR does AFAICT.

What I think we need is to have more sorting options, i.e., decending, maybe nan position. To make those accessible, we need to either extend NPY_SORTKIND or add a new C-API function with appropriate arguments. I think that is the first decision to be made. Once that is decided, we should implement that for all current sort types and use it internally. That will allow the current tests to be used to check the infrastructure. That is steo one. The second step (new PR) would add descending builtin sorts, with extended tests. That could be broken down into several steps, maybe doing only generic/stable/etc at each point.

Thinking about descending sorts, stable sorts cannot be implemented by reversal. Indeed, I think we need to explicitly define what "stable" means for descending sorts, which should mean the equal elements are sorted in index order.

Come to think of it, reversal before sorting followed by reversal after should work in the stable case.

Table pointer or function is an implementation detail, I could go with either.

I think we should be able to get descending sorts added for 2.4.

@MaanasArora
Copy link
Contributor Author

Yes, that is the current logic! In order of priority:

  1. If a get_(arg)sort_function is found, that is used,
  2. If it is not found and a sort_compare is found, that is used with the new built-in functions,
  3. If that is not found, and the older arrfunc slot ((arg)sort), that is used (this includes SIMD implementations currently, it seems),
  4. Failing all that, we rely on the compare arrfunc and use the older built-in functions.

What I think we need is to have more sorting options

Just to be clear, for 1 and 2, the sort methods are restructured and nan_position and descending are actually functional! Those are retrieved from the context. It isn't completely redone, so it's patchy, but there is the basic infrastructure. This is done by patching in a compare_with_context function that replaces the compare arrfunc - it acts more like a 'should swap' function than a compare function, and reacts accordingly to these two options e.g., if the sort is descending, the effective comparison is flipped. Only if we get to 3 and 4 those options aren't available anymore, and context goes unused.

What is not done yet is the sort implementations for built-in dtypes, like numpy integers or floats - those are also implemented in SIMD AFAIK. We only have public API so far; none of this is in use internally, so yes, that should be next so we can test it here. This basically amounts to replacing options 3 and 4 with 1 and 2 for all dtypes. This should be possible by initializing the internal dtypes with the get_(arg)sort_function and (if needed) sort_compare slots.

@charris
Copy link
Member

charris commented Aug 26, 2025

How do you access those from the public C-API, PyArray_Sort doesn't have the arguments ...

@MaanasArora
Copy link
Contributor Author

MaanasArora commented Aug 26, 2025

Should have been precise, sorry, functional in terms of the actual sorting routines, the levels down from _new_(arg)sortlike, not at the level of PyArray_(Arg)Sort. By public API I meant the dtype author side; public user dtype authors can define sorts for their dtypes using the arguments in the context. I think functional is not the right term then, just wanted to clarify a default context is being created and used:

https://github.com/numpy/numpy/pull/28516/files#diff-e469e991ad3c24ac159bcc0469153ebb85fbed6f81c937c07e126f001a1007f7R1219

We do need to add the actual options as you said.

@mattip
Copy link
Member

mattip commented Aug 28, 2025

As I understand things, the first commit 87f410b is all that is needed to actually solve the slow string sorting. Perhaps it would be worthwhile to have that as a separate PR to solve the immediate problem, together with some benchmarks, until the dust settles around the wider refactor?

@MaanasArora
Copy link
Contributor Author

Moved this PR to #29737 given it would need major changes after #29642. Closing this - thanks everyone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ENH: np.unique and sorting is slow for StringDType

5 participants