ICYMI, the new gemma3 architecture is on the main branch but it is not supported, I just tried:
@RafaAguilar only llama3 fp16 models are currently working with the MLX backend. I need to refine the loading code to support more models, debug the other model definitions, as well as add quantization support before this can come out of draft state.
Gotcha, if you need testers I would gladly help you. I will try to get into the code, but I'm not sure I have the time to delve into matrices in the coming weeks, although I'll try.
I've added some quant compatibility to be able to load more models with Q4_0, Q6_0, and Q8_0 tensors; however, they're all converted to FP16 at load time, so they don't provide quantization "benefits." I should be able to implement proper Q4 and Q8 support (with the benefits of reduced VRAM usage) once we get the new raw weight model loading implemented.
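For context on what "converted to FP16 at load time" involves, here is a minimal sketch of dequantizing one block, assuming the standard GGUF Q8_0 layout (a little-endian float16 scale followed by 32 int8 weights per 34-byte block). This is illustrative only, not the PR's loader code:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

const blockSize = 32 // elements per Q8_0 block

// f16ToF32 decodes an IEEE-754 half-precision value (the block scale).
func f16ToF32(h uint16) float32 {
	sign := float32(1)
	if h&0x8000 != 0 {
		sign = -1
	}
	exp := int((h >> 10) & 0x1f)
	mant := float32(h & 0x3ff)
	switch exp {
	case 0: // zero / subnormal
		return sign * mant * float32(math.Pow(2, -24))
	case 31: // inf / NaN
		if mant == 0 {
			return sign * float32(math.Inf(1))
		}
		return float32(math.NaN())
	default:
		return sign * (1 + mant/1024) * float32(math.Pow(2, float64(exp-15)))
	}
}

// dequantQ80 expands one 34-byte Q8_0 block (2-byte fp16 scale + 32 int8
// weights) into 32 float values: w[i] = scale * q[i].
func dequantQ80(block []byte) []float32 {
	scale := f16ToF32(binary.LittleEndian.Uint16(block[:2]))
	out := make([]float32, blockSize)
	for i, q := range block[2 : 2+blockSize] {
		out[i] = scale * float32(int8(q))
	}
	return out
}

func main() {
	// Block with scale 1.0 (fp16 0x3C00) and weights {2, -2, 0, ...}.
	block := append([]byte{0x00, 0x3C}, make([]byte, blockSize)...)
	block[2], block[3] = 2, 0xFE
	fmt.Println(dequantQ80(block)[:4]) // prints [2 -2 0 0]
}
```

This expansion is why the VRAM benefit disappears: the tensor ends up occupying full FP16 width in memory even though it was stored quantized on disk.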
Functional implementation on the latest backend and caching code. It still needs some debugging and rebasing/cleanup, and one unit test fails which still needs work.
The cache still has some bugs.
) ml.Tensor {
	a = a.Reshape(ctx, append([]int{1}, a.Shape()...)...).Permute(ctx, 0, 2, 1, 3).(*Array)
	// TODO figure out how to get offset wired up
	offset := 0
Not using positionIDs is probably also part of the issue where things fall apart after a few forward passes. The prompt will get correctly RoPEd since it starts at offset 0 and is all in one batch. However, every token after that will be at position 0 as well.
From a quick look earlier, it seems offset is only a scalar in the C interface of MLX, but it can be a vector in the Python version, same as we have here:
https://ml-explore.github.io/mlx/build/html/python/_autosummary/mlx.core.fast.rope.html
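To make the offset point concrete, here is a minimal sketch of how per-token absolute positions feed into rotary embeddings (illustrative only, not the PR's code; `base` stands in for the usual rope theta of 10000):

```go
package main

import (
	"fmt"
	"math"
)

// applyRoPE rotates each consecutive pair of dimensions of every token by an
// angle derived from that token's absolute position, offset+i. This is the
// crux of the review comment: if offset stays 0 during incremental decoding,
// every generated token gets rotated as if it sat at position 0.
func applyRoPE(x [][]float32, offset int, base float64) [][]float32 {
	out := make([][]float32, len(x))
	for i, tok := range x {
		pos := float64(offset + i) // absolute position of this token
		d := len(tok)
		row := make([]float32, d)
		for j := 0; j < d; j += 2 {
			theta := pos / math.Pow(base, float64(j)/float64(d))
			c, s := math.Cos(theta), math.Sin(theta)
			row[j] = float32(float64(tok[j])*c - float64(tok[j+1])*s)
			row[j+1] = float32(float64(tok[j])*s + float64(tok[j+1])*c)
		}
		out[i] = row
	}
	return out
}

func main() {
	tok := [][]float32{{1, 0, 0.5, -0.5}}
	fmt.Println(applyRoPE(tok, 0, 10000)[0]) // offset 0: token 0 is unrotated
	fmt.Println(applyRoPE(tok, 7, 10000)[0]) // offset 7: same token, rotated
}
```

A vector of positions (as in the Python `mlx.core.fast.rope` linked above) generalizes this: instead of `offset+i`, each token carries its own explicit position ID.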
@dhiltgen Thanks for your work. I wonder when MLX for Ollama will come out of the box?

I don't know if it will help, but applying the following file in some way may resolve the error in the action:

Really excited about this - thanks for your effort!

Also very excited to see this shipped!

Thank you for Ollama, and looking forward to MLX support. I'd be happy to test on my M4 Pro.
Hi! Are there still any plans to integrate MLX support into Ollama?

Any updates on this PR?

Any plans to move this forward?

Hi! Waiting for this a lot 🙂 any news?

Any updates? Can't wait for MLX support in Ollama!

Really need MLX support!! Thank you!
ollama!! c'mon with mlx! YOU CAN DO IT!!
If anyone wants to use MLX while we wait for Ollama to support it natively, we recently added experimental MLX support to Msty (1) with a similar interface to that of Ollama - downloading and managing MLX models, using them in split chats and in RAG, and much more.

Sadly this is a big no from me since the project is not open-source.

@CamilleHbp what about Jan, are they MLX compatible? We know that LMStudio is not FOSS, but they had a lot of traction. Also, how is this taking 6 months for Ollama to adopt? Why can't we just take GGUF (with quantization no less) and convert it to something convenient for Mac GPUs? Watching GGUF hog unified memory is a bit nuts.
Ollama! Come on now! You'll change the game with MLX support.

@pranavkafle even the alternatives are not safe: containers/ramalama#2104

Hey. What's the current status with this? I'm hungry for mlx support :)

we demand mlx support now! if it's possible

Is this something an extra brain and/or hands would help with?

@TG-Techie update the draft to match the current codebase, and also put up with "procedural issues".

It would be ideal to have mlx support!
This is now replaced by #13648, which includes an experimental MLX backend and initial image generation support for Mac and Linux.
Replaces #8490 on main.
Carries #9115, which should merge first.
A few key points:
To see it working:
Then