convert : enable expert group selection for all models with it #16691
Conversation
@jeffbolznv @am17an It would be great to have fusion for this as well. :) (Lines 953 to 976 in 84bf3c6)
Agreed. Let's catch up with the clamping change and get these common code utilities in, then I agree we should do this.
@ggerganov gentle ping |
Kimi-K2 also has it, but it's just 1 group:
"n_group": 1,
"topk_group": 1,
I've patched my and it's reduced the tg by around 5-6% and pp by around 10%.
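For context, DeepSeek-V3-style expert group selection can be sketched roughly as below. This is a hedged illustration, not llama.cpp's actual code, and the group scoring (max per group) is just one variant; the point is that with `n_group = 1` and `topk_group = 1` it degenerates to plain top-k, so for Kimi-K2 the feature changes performance (via the lost fusion) but not the selected experts:

```python
def group_limited_topk(scores, n_group, topk_group, top_k):
    """Sketch of group-limited expert routing (illustrative only).

    Experts are split into n_group equal groups; only the topk_group
    best-scoring groups stay eligible, then the final top_k experts are
    picked from those groups.
    """
    n_expert = len(scores)
    group_size = n_expert // n_group
    # score each group by its best expert (one possible variant)
    group_scores = [max(scores[g * group_size:(g + 1) * group_size])
                    for g in range(n_group)]
    kept = sorted(range(n_group), key=lambda g: group_scores[g],
                  reverse=True)[:topk_group]
    allowed = [i for g in sorted(kept)
               for i in range(g * group_size, (g + 1) * group_size)]
    return sorted(allowed, key=lambda i: scores[i], reverse=True)[:top_k]

scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6]
# n_group=1, topk_group=1: identical to plain top-k (the Kimi-K2 case)
print(group_limited_topk(scores, 1, 1, 2))   # -> [1, 3]
# with real grouping the eligible set shrinks before the final top-k
print(group_limited_topk(scores, 4, 2, 4))   # -> [1, 3, 2, 0]
```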
My guess is that this doesn't take the topk-moe path because of this change |
Yes, effectively disabling it.
To be expected, should be recoverable by fusion.
Also I didn't expect such a drastic drop in PP due to missing fusion. @jukofyork what is your setup?
Yeah, I think I may have been wrong about this as it's been a while since I compiled a new version of |
getting a segfault that doesn't occur when I override the keys: This version of
It seems that the crash occurs when The only limit I can see to do with so I wonder if it is to do with
Provided this version has the recent
Ah, yes, that would be it.
I guess we need a check and fallback to CUB. |
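The check-and-fallback idea can be sketched generically. This is a hypothetical illustration (the limit value and names are made up, and the real code is CUDA in `argsort.cu`): a fast specialized top-k path only handles rows up to some column limit, so the caller checks it and dispatches to a generic sort otherwise:

```python
import heapq

FAST_KERNEL_MAX_COLS = 1024  # hypothetical limit, stands in for the kernel's real one

def topk_fast(values, k):
    """Stand-in for a specialized kernel valid only below the column limit."""
    assert len(values) <= FAST_KERNEL_MAX_COLS
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]

def topk_generic(values, k):
    """Stand-in for the generic fallback (CUB in the CUDA code)."""
    return heapq.nlargest(k, range(len(values)), key=lambda i: values[i])

def topk(values, k):
    # dispatch: fast path when within the limit, otherwise fall back
    if len(values) <= FAST_KERNEL_MAX_COLS:
        return topk_fast(values, k)
    return topk_generic(values, k)
```

The alternative discussed below (swapping the parameters so the call stays within the fast path's limit) avoids the fallback entirely, which is what the eventual PR did.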
There is another test against llama.cpp/ggml/src/ggml-cuda/argsort.cu (Line 191 in 3464bda)
Sure, but that's
I think we can swap the parameters instead of falling back to CUB.
Yep, that worked, making a PR.
My issue was fixed by #16849.
Would this affect the output for models that have more than one expert group (e.g. DeepSeek V3 with its
Correct, provided you've updated the GGUF or overridden the metadata.
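For the metadata-override route, llama.cpp's `--override-kv` flag can set the keys at load time instead of reconverting. The exact key names below are an assumption based on the usual `{arch}.*` GGUF naming; check your file's actual metadata (e.g. with `gguf_dump.py` from gguf-py) before relying on them:

```sh
# Hypothetical key names -- verify against your GGUF's metadata first.
llama-cli -m model.gguf \
  --override-kv deepseek2.expert_group_count=int:1 \
  --override-kv deepseek2.expert_group_used_count=int:1
```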
Alright, I suppose then it would make things more technically correct in terms of matching the official model implementation... but perhaps I'll wait for fusion to be added lol.
Enable expert group selection for all models with it (requires reconversion or metadata editing).
Specifically, these models (maybe more):