-
-
Notifications
You must be signed in to change notification settings - Fork 12k
ENH: New-style sorting for StringDType #30266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Running this script: import numpy
import random
import timeit
print(numpy.__version__)
options = ["left", "right", "leftovers", "righty", "up", "down", numpy.nan]
lst = random.choices(options, k=1000)
arr_s = numpy.array(lst, dtype="T")
time = timeit.timeit(
"numpy.sort(arr_s, kind='stable')",
globals=globals(),
number=1000,
setup="import numpy",
)
print(f"Time taken for stable sort: {time:.6f} seconds")produces on main: and on this PR: |
|
Which Python version did you use to do the comparison? If it's 3.13 or newer I bet you'll see even better results using 3.12. |
ngoldbaum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, looks good to me.
This does add a teeny bit of per-array extra overhead in the sorting fast paths to call fill_sort_data_with_array. Maybe worth running the sorting benchmarks. It would only matter for tiny arrays though and I bet it's lost in the noise.
mhvk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaanasArora - this looks great! The edits to the code you added before all make sense. For the PR as is, I just have a suggestion about using get_loop to pass on the sort implementation function, which allows one to remove some of the wrappers.
A larger question -- quite possibly just for follow-up -- is whether the hooking in to sort is at the right level -- you probably won't be surprised to hear I thin it might be nicer (and more useful later for custom dtypes) if inside StringDType, we avoid calling internal sort functions, but rather wrap a new, standard ArrayMethod loop that covers the use case of sorting with a comparison function.
I guess this would still need your new npy_quicksort_impl, etc., but would avoid exposing them. Instead, we would expose a new general ArrayMethod loop (one each for sort and argort) that expects to find the comparison function somewhere (probably best in context->method->static_data; we have used that so-far private slot a lot in stringdtype_ufuncs.cpp). This new loop would then call the existing npy_quicksort, etc., based on the sort flags, passing on the comparison function.
Does this make sense?
Note that now that I realized this would still need the changes to npysort that you have here, it may make more sense to do this in follow-up -- since we do not expose any new API here, that is certainly possible (just want to confirm, though, that we are not planning to expose npy_quicksort_impl, etc.!).
| (PyArray_StringDTypeObject *)context->descriptors[0]; | ||
|
|
||
| npy_string_allocator *allocator = NpyString_acquire_allocator(sdescr); | ||
| int ret = sort_loop( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, it would feel more logical if sort_loop had the ArrayMethod signature rather than that of the (hopefully eventually removed) sort loop.
But obviously that would need to have a standard array method loop that uses a comparison function -- see main comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would be better, and also it seems to be a very useful feature if exposed - dtypes do not need to restrict themselves to sort_compare or similar functionality if they need to do setup work etc. Seems like a natural extension if it's okay in a follow-up; I just tried to keep this PR minimal!
|
|
||
| NPY_NO_EXPORT int | ||
| npy_mergesort(void *start, npy_intp num, void *varr) | ||
| npy_mergesort(void *start, npy_intp num, void *varr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: keep starting brace on its own line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done, thanks!
| */ | ||
|
|
||
| static inline void | ||
| fill_sort_data_with_array(void *varr, npy_intp *elsize, PyArray_CompareFunc **cmp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name confused me, since it suggested filling an array. Maybe just get_sort_data_from_array?
Though since it is only two lines, I also think it is also fine to repeat those 4 times in the code - it helps understanding why it is needed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the main idea was just to have it reusable along with a fill_sort_data_with_context for new arraymethod-style loops - in case we need to change some logic, we just change the helper. But I can see why this is a bit too minimal not to duplicate, and the sorts are legacy anyway - @ngoldbaum since you raised performance concerns with this, would you prefer just duplicating?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you get any speed benefit from explicitly in-lining, since the function is marked as inline, presumably the compiler will do that already. My main complaint really was about the name -- but if there is precedent in the same file, feel free to ignore that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK! There is no precedent - I've renamed the function. Thank you!
| PyArrayMethod_SortParameters *parameters = (PyArrayMethod_SortParameters *)context->parameters; | ||
| *flags |= NPY_METH_NO_FLOATINGPOINT_ERRORS; | ||
|
|
||
| if (parameters->flags == NPY_SORT_STABLE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the two loops that we return are identical except for passing on different NpyAuxData, I think we can simplify this to have just one loop, but already set the auxdata here (a nice aspect of what get_loop can do!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not sure I fully understand! We don't pass different transferdata? Do you mean we can use NpyAuxData to pass the legacy sort loop? Where would that go?
I can see using context->parameters inside the sort loop, instead of get_loop, though! That sounds like a nice idea if that's what you meant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry that I wasn't very. Basically, I think you can do,
stringdtype_sort_loop( // Just one sort loop!
PyArrayMethod_Context *context,
char *const *data, const npy_intp *dimensions, const npy_intp *strides,
NpyAuxData *transferdata)
{
return stringdtype_wrap_sort_loop(
context, data, dimensions, strides, transferdata,
transferdata); // transferdata contains npy_aquicksort_impl or npy_amergesort_impl
}
int
stringdtype_get_sort_loop(
PyArrayMethod_Context *context,
int aligned, int move_references,
const npy_intp *strides,
PyArrayMethod_StridedLoop **out_loop,
NpyAuxData **out_transferdata,
NPY_ARRAYMETHOD_FLAGS *flags)
{
PyArrayMethod_SortParameters *parameters = (PyArrayMethod_SortParameters *)context->parameters;
*flags |= NPY_METH_NO_FLOATINGPOINT_ERRORS;
if (parameters->flags == NPY_SORT_STABLE) {
*out_transferdata = &npy_amergesort_impl;
else if (parameters->flags == NPY_SORT_DEFAULT) {
*out_transferdata = &npy_aquicksort_impl;
else {
PyErr_SetString(PyExc_RuntimeError, "unsupported sort kind");
return -1;
}
*out_loop = (PyArrayMethod_StridedLoop *)stringdtype_sort_loop;
return 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if I am missing something obvious, but would this not cause a pointer issue (casting &npy_amergesort_impl to NpyAuxData *, for example, cannot be done)? Should we define a new struct with the auxdata base to cast to and back is what you mean? Happy to do that of course!
Edit: tried what you sent, and indeed there was a cast issue, at least the function cannot be directly called with the signature then...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it seems using NpyAuxData requires a struct definition with an auxdata base, so something like:
typedef struct {
NpyAuxData base;
PyArray_SortImpl *sort_impl;
} StringDTypeSortingAuxData;
with free and clone might work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need a struct...Note that you are using auxdata to pass the function in the code you have above as well, just a layer deeper.
Grepping, I see that in dtype_transfer.c, the same kind of thing is done, but it does have a cast:
| *out_transferdata = (NpyAuxData *)data; |
That said, maybe this does behave differently than the auxdata in legacy-style loops: it seems everywhere I can find it, memory is allocated for it. Let's ping @seberg, since that may just be because everywhere else there is actually an allocation needed. Unfortunately, the documentation does not quite help -- https://numpy.org/doc/stable/reference/c-api/array.html#c.PyArrayMethod_GetLoop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, memory probably needs to be allocated. Even in that example though, there is a struct, which is the type of data:
numpy/numpy/_core/src/multiarray/dtype_transfer.c
Lines 276 to 280 in 1aa3109
| typedef struct { | |
| NpyAuxData base; | |
| PyArray_Descr *descr; | |
| int move_references; | |
| } _object_to_any_auxdata; |
And after that descr and move_references are members of the struct.
Note that you are using auxdata to pass the function in the code you have above as well, just a layer deeper.
Maybe you are looking at the (arg)sort_loop parameter? There is an additional argument after transferdata:
int
stringdtype_wrap_sort_loop(
PyArrayMethod_Context *context,
char *const *data, const npy_intp *dimensions, const npy_intp *strides,
NpyAuxData *transferdata, PyArray_SortImpl *sort_loop)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaanasArora - Now had time to look a bit more carefully and with out_transferdata requiring allocation, it is just too much work. It is also not really necessary: if we move selection of the implementation in the wrapping loops, things can stay super simple. I just made a PR to your PR to implement that - have a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks - I've merged your PR!
| return ret; | ||
| } | ||
|
|
||
| int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need only one loop (say, stringdtype_sort_loop and stringdtype_argsort_loop if we use transferdata -- see comment on get_loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above!
|
Thanks for reviewing both! @mhvk I'm a bit confused about the @ngoldbaum I actually used Python 3.12.7. The improvement in 3.13 is a bit more modest (a bit less than 2x). I tried to run the sort benchmarks, but they seem quite flaky; perhaps we can just do away with the helper based on @mhvk's suggestion inline. |
Argh, too bad. Sorry! I'll defer to Marten on the API decisions, he has much better taste for that stuff than I do... |
|
@MaanasArora - OK, definitely happy to go with what you have here and do follow-up. Having something working that uses the new mechanism will help a lot for that anyway. |
MAINT: refactor to select sort method later, removing duplicated code paths
mhvk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small in-line comment.
Also, I wonder if we should use the ufunc registration rather than the private sort method pointers? Since there are probably people like me who copy & paste parts of numpy code, it is helpful to use public api as much as possible. But no big deal if you don't feel like it!
| return -1; | ||
| } | ||
|
|
||
| // sorting cannot be a view |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small one, but I think by setting *view_offset = 0 you actually indicate it can be a view (no view is NPY_MIN_INTP. I think it is fine not to touch view_offset at all, but maybe best to ask @seberg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, perhaps not good to touch it, thanks. I'll delete these lines, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted the lines!
|
@mhvk I did actually try using |
…solve_descriptors
|
@MaanasArora - OK, that's indeed unfortunate, but for another time! With your latest change, I think this is all ready! Still, maybe it would be good if @seberg had a look too... I do think this is all OK, but I wondered a little whether this case is supported by the Side note for future: the hackery here to get the standard sort/argsort implementation with a |
|
The test failure is unrelated. |
seberg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HAven't looked very close, but besides a small nit, looks good to me.
Yeah, the wrapping loop is for ufuncs, and frankly, this isn't really a ufunc. Even if it works, I suspect it would just be confusing.
I've been thinking that one approach might be for the API to have not just the ability to register a loop from a specification, but also to retrieve one.
Yes, that is something we could and should expose (there is some python stuff for ufuncs but just to allow experimenting).
But I don't think that would actually be useful here?
| return 0; | ||
| } | ||
|
|
||
| int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
| int | |
| static int |
(and elsewhere). We are not hiding symbols by default yet, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is done!
Indeed, not useful here yet. It would become useful if one could use it to retrieve the default "use compare slot" loop, and then wrap the loop in the allocate acquire/release part (i.e., we wouldn't have to get those loops through using the sort header file). |
|
Yes, I think a wrapping loop doesn't twist well to this idea and might be hard to maintain / evolve too.
Perhaps we could expose default sorting loops too? I think those would be pretty much necessary and was done (though very lightly) in the sort compare PR, so we could expose them directly, or through convenience functions (like this idea). C++ templating came up too - but it probably won't be absolutely necessary to keep parity since we were passing a comparison function even in the legacy code. It's the same idea anyway! |
|
Thanks for your persistence on this @MaanasArora! Merging... |
* ENH: New-style sorting for StringDType * BUG: fix descriptor handling in stringdtype_sort_resolve_descriptors * REF: Cleanup braces and function order * REF: rename fill_sort_data_with_array to get_sort_data_from_array for clarity * MAINT: Move sort type selection one level down, reducing duplication. * MAINT: Combine two levels of indirection * REF: remove unnecessary view_offset assignment in stringdtype_sort_resolve_descriptors * REF: change function visibility to static for stringdtype sort functions --------- Co-authored-by: Marten Henric van Kerkwijk <mhvk@astro.utoronto.ca>
* ENH: New-style sorting for StringDType * BUG: fix descriptor handling in stringdtype_sort_resolve_descriptors * REF: Cleanup braces and function order * REF: rename fill_sort_data_with_array to get_sort_data_from_array for clarity * MAINT: Move sort type selection one level down, reducing duplication. * MAINT: Combine two levels of indirection * REF: remove unnecessary view_offset assignment in stringdtype_sort_resolve_descriptors * REF: change function visibility to static for stringdtype sort functions --------- Co-authored-by: Marten Henric van Kerkwijk <mhvk@astro.utoronto.ca>
Sorry, many PRs - I'm closing the other one (#30112). This should be much simpler.
Implements sorts and argsorts for StringDType using the feature introduced in #29737. The sorts allocate the mutex lock only once, thus fixes #26510. Some refactors of the npysort code were needed which should help with further work (e.g., descending sorts and moving the builtin dtype machinery).
Posting benchmarks below. ping @seberg @ngoldbaum @mhvk - thanks for reviewing!