-
-
Notifications
You must be signed in to change notification settings - Fork 12k
ENH, API: New sorting slots for DType API #29737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I think this needs to be documented in the C API docs for the dtype API: https://numpy.org/devdocs/reference/c-api/array.html#slot-ids-and-api-function-typedefs |
|
Yes thanks, missed that. This is done! Will add a release note also. |
|
Awesome, thanks! Can you also update your PR over at numpy-user-dtypes to make use of the updated API (if any updates are needed)? numpy/numpy-user-dtypes#109 The CI over there is slow at the moment so hopefully that's not annoying. We need to add some caching for the quaddtype build. |
|
Sure! Currently that PR relies on the |
|
Actually, do you think we should get this PR to parity first instead? It doesn't seem those dtypes have too many considerations for overall sorting itself, and I think a lot of examples including StringDType rely on the comparison feature. I think a comparison system for using generic sorts is about non-negotiable at this point, so I might as well add it now. |
|
Sure, whichever you think would work best. Just wanted to make sure it's on your radar. |
|
Note that sortkind has the property that |
|
A question that I also raised in #29642 - is the design of having a struct table with pointers to different sort functions for different properties set in stone? To me, it feels illogical if we are having It seems to me more sensible to have just two functions (sort, argsort), both of which takes Implementation-wise, it would also seem easy enough for a developer to implement the table if they want to go that route, while the other way around is just annoying (I can easily imagine that ascending/descending is easily done in one function). But it may well be that I'm missing the bigger picture here! |
Only for the generic sorts. The type specific sorts gain speed by having Once this is settled, I plan to generate the needed descending sort functions. EDIT: It is also easy to add entries to a private tables without causing incompatibilities. I don't see the added complexities of hints as having much use at the moment, they can be added later if needed. The only real use I can see would be for out-of-memory sorts, and that would probably qualify as a requirement rather than a hint. |
|
In #28516 we actually had just two functions that mirrored ufuncs, |
|
@charris - yes, my example was perhaps poorly chosen. My question is really more general: isn't it better to define the API as just having only one The only downside I see is that this does not automatically provide a way to tell that a given dtype can sort descending, etc., hence the idea of a mask or specific return code. Though it probably is fine to generically just error if a sort function doesn't support it (same as function=NULL in the present proposal) and fall back to a standard sort using the comparison. |
|
@MaanasArora - sorry, missed your answer, which appeared while I wrote mine. I can see the wish for performance but it seems to me calling a function with Anyway, I don't want to make too much of this... |
|
@mhvk no problem! Yes, it would probably not impact performance terribly, and if user dtypes want to define a simple sort, even if slightly inefficient, I guess we could let them :) Another benefit of having the older architecture that @seberg suggested would be that it mirrors ufuncs, and allows for more flexible configuration with the context as you said. The reason (as far as I understand) we went with structs, was that I think tables are more explicit in a way — we have 4 ways of doing sorts, so declaring them outright is simpler in its own way as we have straightforward and performant routines to call without needing to consider configuration. Personally, I think the config doesn't change the code in the routines much at all so it might be simpler to just pass it into two functions. I guess it depends on whether the performance gains are good to have too. I'm happy to go with whichever approach everyone thinks is better for the project, I am just clarifying what I understand so far! |
Except that @charris also introduced an option to sort nans first or last... It is currently commented out, but it would make it eight options... Which is partially why I like passing on flags... 😺 Note that I'd be absolutely fine with going with a mechanism mirroring the ufuncs -- in the end, sorting could in principle easily be done with a p.s. Sorry to come to this discussion late! |
|
Right, so basically we have 4 typical implementations. With NaNs, it would be 8, although we wouldn't need all. I am now leaning towards not worrying about NaNs, though. I just doubt that sorting NaNs to the front is niche enough to make it hard.
I think optimization is actually a reason to expose the
Not sure this matters, but I think optimization is a reason for the UFuncs actually support both styles. The closest design to what ufuncs actually be having both! (Implementation wise, There was the question of making it easy to use the generic implementation by only providing a compare function. But it seems fair to defer that to a follow-up, it is also again be somewhat mutually exclusive (since the compare function could fill in all of partitioning, sorting, argsorting, and potentially even compare ufuncs!). |
|
I am quite happy to leave the design choice to @seberg here. My first instinct would be to go with the simplest implementation, but agree that a getter approach offers more flexibility for future changes. That said, the future is hard to predict, so we shouldn't spend too much time on it. I think the most important thing in the short term is to make alternative selections easy, SIMD for instance. I think Sebastian is thinking of run time dispatch as a possibility. |
|
@seberg - thanks for the details! I still should at some point look at how the ufuncs actually do it, but perhaps you can help by pointing out in the code or docs where things are defined... If ufuncs do both, then I think we should pick one of those for now, and mimic it as close as possible. (To me, the getter still seems the more scaleable solution, so I'd prefer that...) |
|
Ah, right, I would love to move the runtime dispatching into the getter. I didn't think of it TBH, because that isn't where it is for ufuncs right now (it may be there for casts, not sure). I think I convinced myself in the previous comment, that both is useful (for simplicity) but that it is perfectly reasonable to use two distinct slots for both (because you only need to use one of them). (I do have to go through the API here, although I suspect it is pretty much settled from the last PR.) |
Hehe, fine, we can also start with the getter. I was considering starting with this just to make the step of changing our internals slightly easier :). Honestly, I don't care, so if you like a getter maybe that is the way to go. @mhvk I think it's easier to look at the public API here. Not great layout, but you can see: https://numpy.org/doc/2.1/reference/c-api/array.html#c.NPY_METH_strided_loop the slots (of which most ufuncs only need the first strided-loop one, but casts need more). And further down the |
|
Thanks, that's useful! One question before proceeding is whether it might be possible to treat the sorting code as an array method. Since effectively these are |
Maybe, but that means 4 gufuncs here if we don't figure out parameters and I think that is too much for the immediate need (if interesting). And sorting may be a bit special? We simplified to contiguous loops since it sounded like supporting strided ones is unlikely to be useful (but of course one could likely wrangle that into gufuncs to focus on that). I suppose I agree that gufuncs might be the better thing, but without figuring out some loose ends that seems hard. And figuring out the loose ends seems maybe a bit much for this. |
|
I guess the question is whether we can get away with defining a |
I do suspect sorting is a complex enough operation that some consideration for future extensions is important; the getter solution seems to target many of them well enough and it seems more natural to what this API actually does (allow users to define custom sorts). If optimization is actually easier with it, that's great and seems more reason to go that way!
Makes sense! That should make the size of this PR much more palatable :) In practice we're still accessing the compare arrayfunc, so we can probably fit the stringdtype fix into this PR without needing to move to a slot for that. Though I'm happy to do it in a small follow-up too if you prefer.
Would it be possible to transition to gufuncs using this API down the line? We could presumably allow the same API that we have now and internally insert it into a gufunc? Edit: I mean the getter API, not the one implemented right now, since it seems we're going that way! |
I suppose. In the end, the missing piece of gufuncs is the fact that their
If we do create a gufunc with the above capabilities, then yes, that gufunc could use this mechanism to implement itself for all dtypes that support it via this (while also allowing to do it directly if you want to support e.g. strided versions). (The biggest annoyance/dance would be that if the I am not sure that all of this should matter. I like the idea of doing the above, but I don't like the idea of sitting on this change for it indefinitely. @mhvk what does this mean to you? To me it doesn't change much right now to be honest (I wish it did, but I don't want to stall things). Unless this is an argument for doing just the explicit 4 loops for now, because a |
|
Yes, we do need to move forward and concreteness would be good... But maybe first back to basics: looking again at the PR, there are two things I'd really like to change. Use the standard array method loop signatureI'd suggest we ensure that the (arg)sort loops have the standard array method loop signature: This is obviously needed if we move to (arg)sort being Use loop registration, not slots on the dtypeIt feels like a mistake to me to define things on the dtype, in a way moving us back to the previous API, where the dtype defined a set of "handy" functions in For registration, using the ufunc registration mechanism seems by far the easiest -- any custom one would mostly duplicate it. Making (arg)sort and inside the current sort/argsort code, the relevant loop would be looked up in those fake gufuncs. Alternatively, one could expose a registration function specifically for (arg)sort that wraps this. Decide on how to deal with the flagsThe way the array method system is currently written, there is no straightforward way to pass on runtime information to a ufuncs other than the dtypes. Thus, by far the easiest way to deal with flags is to imagine different (which would argue even more for wrapping this in a That said, it would seem relatively easy to add a new |
|
p.s. The above is mostly because this PR introduces new public API, which I feel we need to get somewhat right. A private implementation for |
|
The test failure is unrelated. |
seberg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, if you like, you could simplify the resolve_descriptors to just bail on non-native byte-order (even just assert).
Let me know when we got the method_flags (just the GIL one) and the loop_descrs/loop_descrs are used for the new path.
If you would like me to suggest a solution, let me know (or if you are sick of iterating, I can maybe also make an iteration).
| else { | ||
| loop_descrs[1] = given_descrs[0]; | ||
| Py_INCREF(loop_descrs[1]); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
view_offset makes little sense for sorting (a sort isn't a view, a cast may be, .imag may be a ufunc is fathomable but not supported... but a sort never is).
Thus I don't care what it is set to above, since you won't bother setting it here, nor will you check if it was set above.
In theory a dtype that can hold only a single element (and thus is always sorted) could set *view_offset = 0 here and then if we check that were it's called the sort could just return immediately, but that seems silly so...
| PyArray_DTypeMeta *dtypes[2], | ||
| PyArray_Descr *given_descrs[2], | ||
| PyArray_Descr *loop_descrs[2], | ||
| npy_intp *view_offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it doesn't make sense, especially if sort is already in-place (but not in the sense of being a no-op).
|
|
||
| NPY_BEGIN_THREADS_DESCR(descr); | ||
| if (!needs_api) { | ||
| NPY_BEGIN_THREADS_DESCR(descr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NPY_BEGIN_THREADS_DESCR checks the descriptor for the flag, but that is exactly what you do above. So no need for the DESCR part.
(I think the descr isn't changed yet, but if I missed it let me know.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's an explicit flag so should use that! This is done (also caught an important bug)
mhvk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this all looks great, and I have no further comments (the ones I made just now are just me realizing that I was probably wrong before...).
Might still be good if @seberg can have a last look.
Also, if we merge, we should squash the commits!
p.s. @seberg - it would be nice to expose the ability to get loops from a ufunc. Partially inspired by this PR, I wrote a little class that multiplies whatever is input by a constant, and ended up getting the legacy loop via ->types and ->functions just because it seemed impossible to get the loop directly using the regular API. Or did I miss something?
| else { | ||
| odescr = descr; | ||
| Py_INCREF(odescr); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still slightly wrong should have explained more maybe. Needscopy needs odescr to decide. (Rather than swap since we replace the swap logic!.)
So we should find odescrfirst, and where we do, eithercheck equivtypes or check swap (or always check equivtypes later, it should take the ==fast path anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, thanks a lot for the pointers! I was missing that odescr is still the first descriptor (as it's the intermediate value in a sense). I've done something to this effect, hopefully it's correct!
| NPY_BEGIN_THREADS_DESCR(descr); | ||
| if (!needs_api) { | ||
| NPY_BEGIN_THREADS; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we do also need to usethe descriptors, in this case the first (as the second we assume is default intp, I am fine with that). But the copying must use context->descrs[0].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm a bit confused, won't the NEEDS_PYAPI flag (in get_strided_loop) handle that logic, and in case it doesn't, we're checking op anyway? Or are we building in a safeguard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget about it, I think I just picked a pretty random place in the argsort loop to comment that the desriptor is important there also. You added that.
We need to add such public API, there is the Python side "unstable" API as |
seberg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be getting blind on this, but I think we can go ahead here and then continue and see how it pans out using it for more.
(I.e. we somewhat should change our sorts to actually use this, also since it'll be the only good way to get to reverse sorts anyway.)
Honestly, I think we could just put this in as is. It doesn't have meaningful additions right now as that would come with a way to actually register the sort!
| if (odescr == NULL) { | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't actually fail (unless the typenum is invalid), and we usually just use that assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so should remove the check? Did that.
|
OK, I think it's best to just put this in, thanks @MaanasArora, sorry that this is so difficult with the API design and ufunc back and forth. Right now, it changes (practically) nothing public API wise anyway. The added but unused |
|
Yay!! A really nice contribution, looking forward to the next steps, of allowing registration! |
|
Thank you both for reviewing; I learned a lot here! Looking forward to the next steps as well. |
|
Same here, I learned a lot (thanks @seberg! what you implemented is really nice, but it is non-trivial to understand without a simple example - which is why I like the scaled float case so much...) |
* ENH, API: New sorting slots for DType API * DOC: Add sorting context structures and functions to DType API documentation * DOC: Fix type signature syntax * DOC: Update data pointer type in sorting function signatures * DOC: Fix incorrect argsort function name * DOC: Fix formatting of function signature for PyArray_SortFuncWithContext * ENH: Initialize moving to an arraymethod table for user dtype sorts * ENH: Initialize PyArrayMethod_Context to zero in sorting functions * ENH: Add stable sort option to context in user sorts * DOC: Remove deprecated references from DType API sorting documentation * ENH: Refactor sorting parameters to use NPY_SORTKIND in PyArrayMethod_SortParameters * ENH: Remove unnecessary blank line in NPY_DType_Slots structure * ENH: Simplify axis check in PyArray_ArgSort * REF: Simplify PyArray_(Arg)Sort loop discovery logic * BUG: Add missing breaks in loop discovery in sorts * BUG: Simplify context initialization in PyArray_Sort and PyArray_ArgSort * BUG: Fix dangling pointer to context in PyArray_(Arg)Sort * BUG: Fix dangling context pointer in PyArray_(Arg)Sort * REF: Update sorting implementation to use new generalized array method system * ENH: Use dtype slots for sort and argsort arraymethods * STYLE: Add back blank line in methods.h * REF: Simplify conditional logic in PyArray_Sort and PyArray_ArgSort * BUG: Add error handling for strided loop retrieval in PyArray_(Arg)Sort * ENH: Fill out context and strides in sort strided loop calls * ENH: Add correct number of descriptors and strides in sorting loops * BUG: Correct descriptors and strides in argsort implementation * ENH: If-case parameters field in PyArrayMethod_Context * REF: Initialize PyArrayMethod_Context structure directly for sorting functions * ENH: Pass auxdata to dtype sorts * ENH: Update stride calculation for sorting to use item size * ENH: Refactor context initialization and descriptor resolution in sorting functions * ENH: Implement sorting and argsorting methods for scaled float dtype * DOC: Add comment to clarify that we ignore `view_offset` * ENH: Free auxdata and decref descriptors in sorting functions * DOC: Add comments to clarify that method flags are ignored in sorting functions * ENH: Add scaled float tests for sorting and argsorting with scaled and non-contiguous arrays * TST: Update scaled float argsort test to stride array * STYLE: Fix formatting in sfloat test_sort * BUG: Update sorting functions to check for sort method in cleanup instead of strided loop * REF: Fix scope for sort contexts * REF: Initialize sorting context only if taken * DOC: Add parameters member to PyArrayMethod_Context for loop parameterization * BUG: Move declaration of loop_descrs array in PyArray_ArgSort * DOC: Add comment clarifying arraymethod context parameters * DOC: Clarify description of parameters member in PyArrayMethod_Context * REF: Use existing variable to simplify NPY_DTYPE macro * BUG: Add missing output decref in PyArray_ArgSort * BUG: Add assertions to validate parameters in sorting functions of scaled float dtype * REF: Simplify argument passing by removing unnecessary dimensions and strides in casting functions * BUG: Fix castinfo changes * TST: assert exact stride values in sfloat sorts Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net> * TST: assert exact stride values in sfloat sorts Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net> * DOC: move versionchanged in ArrayMethod_Context Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net> * BUG: Fix stride initialization in _new_sortlike and ensure proper decref in PyArray_ArgSort * BUG: Correct dimension and stride initializations in sorting * BUG: Fix strides initialization in sorting Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net> * DOC: clarify function level in parameters for PyArrayMethod_Context Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net> * REF: initialize slots in-line for sfloat sorts * BUG: Remove incorrect dimension assertions in sorting loops * BUG: Correct size to sizeof * STYLE: revert newline deletion * STYLE: remove trailing whitespace * STYLE: revert delete newline * BUG: Fix incorrect stride assignment in _new_sortlike function * STYLE: Remove unnecessary whitespace * DOC: Improve comment Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net> * REF: Remove first branch in resolve_descriptors for sfloat sorts * TST: Add reverse sorting test for non-aligned arrays in sfloat sort tests * DOC: Add TODO comments for future registration method in sfloat_init_sort * ENH: Perform descriptor handling in sorting array method for byte order * DOC: Remove redundant versionchanged note in types-and-structures.rst * TST: Add stable and unstable sorting tests for sfloat arrays * ENH: Update stride assertions in sfloat_default_sort_loop for improved data validation * ENH: Add stride assertions in sfloat_default_argsort_loop for improved data validation * ENH: Change descriptor handling in sfloat_sort_resolve_descriptors and sfloat_argsort_resolve_descriptors for improved byte order management * ENH: Manage reference counting for argsort method in sfloat_init_sort to prevent memory leaks * ENH: Add assertion in sfloat_sort_resolve_descriptors to validate descriptor consistency * ENH: Add assertion in sfloat_argsort_resolve_descriptors to validate second descriptor type * ENH: Move error handling to sfloat_sort_get_loop and sfloat_argsort_get_loop for unsupported sort kinds * ENH: Manage reference counting for sort method in sfloat_init_sort to prevent memory leaks * REF: Clarify stride calculation in PyArray_Sort and PyArray_ArgSort * ENH: Add method flag handling in sorting methods to check for GIL * REF: Simplify descr element size retrieval in PyArray_Sort and PyArray_ArgSort * BUG: Replace NULL return with goto fail in PyArray_ArgSort for error handling * REF: Assert instead of handling non-native byteorder in sfloat sorts * BUG: Initialize return value to NULL in PyArray_ArgSort for error handling * REF: Remove descr-specific thread release in favor of pyapi flag in sorting methods * BUG: Fix needcopy logic in _new_sortlike and _new_argsortlike functions * BUG: Remove unnecessary NULL check for odescr in PyArray_ArgSort --------- Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
Supersedes #28516 after @charris defined the new C-API for
PyArray_Sortin #29642.I have kept this light so far, with a sorting table (struct) for the DType API, using the new signature with the context. The new design is untested so far because without
sort_compare, using it directly in say StringDType would be difficult, but happy to move this to parity if this looks right so far considering Chuck's comments. Thank you!