Qwen3 Support #743

Merged — cjpais merged 3 commits into mozilla-ai:main from cjpais:qwen3-support, May 13, 2025
Conversation

@cjpais (Collaborator) commented Apr 13, 2025

Changes based on llama.cpp #12828.

Adds support for Qwen3 and Qwen3MoE models. It looks like there will be more changes when the models are released.

@cjpais (Collaborator, Author) commented Apr 30, 2025

@corebonts not sure if you have time, but could you review or test this PR? I am struggling to get proper output from the model. I don't see anything obviously incorrect in the code changes, but I might just not be able to spot it.

@corebonts (Contributor):
I can at least give it a try tomorrow. I hope I don't forget it :)

@cjpais (Collaborator, Author) commented Apr 30, 2025

Thank you so much!

@corebonts (Contributor):
I had a quick look, and I also got completely broken responses. I tried the 0.6B and 30B-A3B models from Ollama, and the 0.6B bartowski quant from Hugging Face (just to double-check).
So far I haven't seen anything odd in the code, but I will have another look later.

@cjpais (Collaborator, Author) commented May 1, 2025

Thanks for checking, that matches what I'm seeing as well.

@corebonts (Contributor):

Sadly, I haven't made any progress :/ I checked again but haven't found the problem. I even compared it with the qwen2 code, which does work, but didn't find anything notably different.

@cjpais (Collaborator, Author) commented May 1, 2025

Same here. I've looked pretty hard, and even with some AI help I'm still not spotting anything. There may be some upstream change that's needed, but I quite frankly don't know what it is.

@reneleonhardt:
As long as llamafile has not rebased onto llama.cpp with minimal changes on top, it will be hard to know whether the problem is here or upstream.

@cjpais (Collaborator, Author) commented May 2, 2025

Yes.

@ikawrakow (Contributor):
They have changed the way the self-attention graph is built in mainline llama.cpp, and that is why the PR is not working. I think it will be easier to use the model ports in ik_llama.cpp, where the graph is still built the old way. This is the PR for Qwen/Qwen3-MoE there.

@cjpais (Collaborator, Author) commented May 12, 2025

Thank you @ikawrakow, I will take a look and give it a try.

Edit: I copied over the necessary changes, and it looks like they work with llamafile. I'm going to review a bit deeper, since llm_build_moe_ffn in llamafile is missing some parameters compared to ik_llama.cpp; I either want to port those changes or verify that it behaves the same.

@cjpais cjpais merged commit 51b357b into mozilla-ai:main May 13, 2025
1 check passed