Conversation
@corebonts not sure if you have time, but could you review or test this PR? I am struggling to get proper output from the model, and I don't see anything major that is incorrect in the code changes, but I might just not be able to spot it.
I can at least give it a try tomorrow. I hope I don't forget it :)
Thank you so much
I had a quick look, and I also got completely broken responses. I tried the 0.6B and 30B-A3B models from Ollama and the 0.6B bartowski quant from Hugging Face (just to double check).
Thanks for checking, that matches what I have as well
Sadly I did not make progress :/ I checked again, but I haven't found any problem. I even compared it with the qwen2 code, which is working, but did not find anything special there.
Same, I've looked pretty hard and even tried having some AI help, and I'm still not spotting anything. Not sure if there is some change upstream that is needed, but I quite frankly don't know what it would be
As long as llamafile has not rebased onto llama.cpp with minimal changes on top, it will be hard to know whether the problem is here or upstream.
Yes
They have changed the way self-attention is built in mainline.
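For context, a rough sketch of what that upstream change amounts to. This is an assumption based on llama.cpp #12828: Qwen3 appears to apply an RMSNorm to the Q and K projections per head ("QK-norm") before attention, where Qwen2 instead used a bias on the QKV projections. The function names, shapes, and the NumPy formulation below are illustrative only, not llama.cpp's actual ggml graph code.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm over the last axis: x / rms(x) * weight
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

def qk_norm_attention(q, k, v, q_norm_w, k_norm_w):
    # q, k, v: (n_heads, seq_len, head_dim)
    # The extra step (vs. Qwen2): per-head RMSNorm on Q and K.
    # Skipping this (or placing it wrong relative to RoPE) would
    # plausibly produce the broken output described in this thread.
    head_dim = q.shape[-1]
    q = rms_norm(q, q_norm_w)
    k = rms_norm(k, k_norm_w)
    # Standard scaled dot-product attention with a numerically
    # stable softmax over the key dimension.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8))
k = rng.standard_normal((2, 4, 8))
v = rng.standard_normal((2, 4, 8))
out = qk_norm_attention(q, k, v, np.ones(8), np.ones(8))
```

If this is the relevant difference, copying the new attention-graph construction from mainline (rather than patching the old Qwen2-style path) is the safer route.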
Thank you @ikawrakow, I will take a look and give it a try.

Edit: I did just copy/paste the necessary changes and it looks like they work with llamafile. Going to review a bit deeper.
Changes based on llama.cpp #12828.
Adds support for the Qwen3 and Qwen3MoE models. It looks like there will be more changes needed when the models are released.