-
-
Notifications
You must be signed in to change notification settings - Fork 12k
ENH, API: Add sort_compare slot to DType and use sort wrappers if provided
#29987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // "must provide a `resolve_descriptors` function if any " | ||
| // "output DType is parametric. (method: %s)", | ||
| // spec->name); | ||
| // return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be merged!!! I noticed we cannot actually not pass resolve_descriptors in most cases, at least if the dtype is parametric (since it's passed as output)! Should we special case somehow?
| } | ||
| NPY_DT_SLOTS(stringdtype)->argsort_meth = argsort_method->method; | ||
| Py_INCREF(argsort_method->method); | ||
| Py_DECREF(argsort_method); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This slot is already assigned - we probably need to decref the existing method, but I wonder if we should also check in PyUFunc_AddLoopsFromSpecs? Though one doesn't usually define both the compare and the sort...
bdae010 to
5c5ad5a
Compare
mhvk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaanasArora - I just had a quick glance, to see how this would look. Overall, it makes sense and looks good. Let me know when it is ready for a more detailed review.
|
Thanks for reviewing @mhvk! I've added docs and done some cleanup - I think this should be ready now for more detailed review. I suppose a main part of the API that needs discussion is how to implement descending sorts (with NaNs to the end):
I've implemented this PR such that we could do the first in a follow-up (i.e. I use the descr), but happy to change direction if we prefer the second. |
That was to make stable descending sorts, but it doesn't fix the NaN problem, i.e., NaNs will still sort to the beginning. Which I suppose might be correct, but ... I think we need to settle what descending stable sorts look like and where the NaNs go. Sounds like an array-api discussion. The double reversal keeps the index order for elements with the same value. |
We think we have already settled that one pretty much in the earlier API discussions. Happy to bring it up there, but I don't think it is going to add anything, except a shrug and agree with my opinion. The point is, array libs have no clue about this anyway, you have to look at dataframes/tables for prior art and there both versions exists (but my feeling is our version is more common in newer API). "Our version" was to specify where NaNs end up irrelevant of the sort order. This is the better user API, IMO, it also means that Anyway, I at least consider the discussion of how to handle NaNs by default as settled. |
Ah yes, sorry, NaNs will still get flipped, which means passing the context is necessary to enable sort compare to respond to descending. I'll do that now. |
…enable descending sorts
|
Sorry, just one more question: is there perhaps a dtype that can be used to test this PR? With the most recent refactor, StringDType actually does not need to use the indirection to sort compare, so if preferred I can remove it before more review (and do it in a follow up). |
We only really have the scaled float dtype for testing inside numpy. If one of the dtypes in the user-dtypes repo won't work, then this whole discussion is a little moot until someone writes a dtype that needs the functionality. I haven't been following as closely as I'd like, so I'm not sure offhand what sort of dtype you'd need. |
This functionality is already part of the arrfunc machinery (as This PR is basically a migration of the older |
We could rewire |
I think it would be nice to have a testing job that checked out the user-dtype repo and tried to build it and run the unit tests inside it with the latest numpy version. Totally not necessary to do that to merge this PR though. |
|
Closing this after #30266 was merged - going to break this into two! Thanks. |
This follows #29900 and #29737 to add a
sort_compareslot to the DType, which can be used to enable sorting for user dtypes without completely redefining the sort. Also uses it inStringDTypeto allocate the lock just once and thus fixes #26510.Also, there are now wrappers for the existing sorts, so it is probably easier to move to the array method API later. However doing so would require templating the SIMD sorts and doing other infrastructure gymnastics, so not done here. This does not implement descending, which can actually also be done as a follow-up if we keep the API the same and just flip the comparison twice.
This still needs release notes and docs but wanted to push this before to clarify direction. Unfortunately, it is quite verbose, especially for StringDType. ping @seberg @mhvk @ngoldbaum if you're interested - thanks!