ORippler left a comment
> The non-CUB `ARGSORT` would fail with 64k+ rows (`-ub 8192` with expert group selection) due to storing `nrows` in the 16-bit `y`-dimension.
According to the official CUDA docs, both dim3 and blockIdx resolve to the uint3 type, a vector of three unsigned ints that are each stored with 32-bit precision. Did you verify that blockIdx.y is actually stored with only 16-bit precision? That would be a pretty severe bug on the CUDA side, and I'd be interested in a repro so I can raise this internally.
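For context on the question above: the CUDA programming guide lists the maximum grid size as 2^31 - 1 in x but 65535 in y and z (compute capability 3.0 and later), so a launch can be rejected for an oversized grid.y even though blockIdx.y is a 32-bit field. The host-side sketch below illustrates that check under stated assumptions: `Dim3` is a hypothetical stand-in for `dim3` so the code compiles without the CUDA toolkit, and `grid_for_rows` is an illustrative row-to-grid mapping, not necessarily the one this PR uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3, so this compiles as plain C++.
struct Dim3 {
    uint32_t x, y, z;
};

// Grid size limits documented for compute capability 3.0 and later.
constexpr uint32_t MAX_GRID_X  = 2147483647u; // 2^31 - 1
constexpr uint32_t MAX_GRID_YZ = 65535u;      // 2^16 - 1

// True if one block per row can be placed along grid.y.
constexpr bool rows_fit_in_grid_y(uint32_t nrows) {
    return nrows >= 1 && nrows <= MAX_GRID_YZ;
}

// True if every grid dimension is within the documented limits.
constexpr bool is_launchable(Dim3 g) {
    return g.x >= 1 && g.x <= MAX_GRID_X &&
           g.y >= 1 && g.y <= MAX_GRID_YZ &&
           g.z >= 1 && g.z <= MAX_GRID_YZ;
}

// Illustrative mapping (an assumption): put the rows along grid.x,
// which has the much larger limit.
constexpr Dim3 grid_for_rows(uint32_t nrows) {
    return Dim3{nrows, 1u, 1u};
}

// Same mapping, but asserts that the resulting grid is within limits.
inline Dim3 checked_grid_for_rows(uint32_t nrows) {
    Dim3 g = grid_for_rows(nrows);
    assert(is_launchable(g));
    return g;
}
```

With these definitions, `rows_fit_in_grid_y(65536)` is false while `is_launchable(grid_for_rows(65536))` is true, which matches the 64k+ rows failure described in the quoted PR text.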
I can confirm this fixes my problem mentioned in #16691 (comment). Thanks @CISC!
@jeffbolznv @0cc4m Looks like Vulkan needs a fix too:
Sure, I'll take a look.
Vulkan fix is at #16851.