-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Closed
Description
I have an array with shape (500000, ) and chunks=2500.
I need to pick the top 5000 elements out of it.
The current topk implementation is very bad for it, as:
- it applies one kernel per chunk of the original input
- concatenates all the outputs of the first pass together into a single chunk
- performs a single, final sort
As my k is greater than the chunk size, effectively step 2 would create a single-chunk array or 500000 points.
The workaround would be to rechunk the array ahead of topk to larger chunks - which is often not the easiest thing to do when you're implementing a general-purpose library call.
As a solution I suggest to add a recursive=False option to both the method and the function; see prototype below:
https://gist.github.com/crusaderky/35d355d8124141aeea82213e0c7bccde
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels