Conversation
* Added back support for --image in the CLI tool
* Added tests for the multimodal CLI
* Added optional mmproj parameter to the TUI tests too
* Addressed review comments
* Added a test covering multiple markers/images on the CLI
* Updated index.md
* Moar updates to index.md
* Updated quickstart.md
* Updated support + example llamafiles
* Added example files and examples + minor fixes
* Updated structure
* Removed security
* Updated source installation
* Updated README_0.10.0, now a frozen doc
* Removed ref to new_build_wip in whisperfile; make setup installs cosmocc
* Apply suggestion from @dpoulopoulos
* Addressed review comments
* Addressed review comments #2

---------

Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>
* Added per-mode help + nologo/ascii support
* If the model is missing, bump to help for the respective mode
* Updated skill not to use new_build_wip + improved it
* Removed stray new_build_wip reference
Update: I just merged some code to fix the following:
We should do some tests to try this on different platforms, at least:
@wingenlit, if you have some time to try it, let us know how it works with your setup! Happy to iterate on this until we are confident it works reliably.
Wooohoo I love the sparkfile idea 💙
Hi, I’ve been on llamafile since tinyblas came out. What I like is how llamafile hides a lot of the OS/arch complexity, so it’s easy to share with less technical people who want to run GGUF models. tinyblas gives a nice baseline GPU acceleration out of the box. However, to take advantage of platform acceleration (like cuBLAS) without writing lengthy how-to instructions, the recompile option is the one important trick. It’s been a good way to get near-full performance while keeping things lightweight.
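As a sketch of the workflow described above: llamafile's README documents a `--recompile` flag to force a rebuild of the native GPU module and a `--gpu` flag to pick the vendor, so switching from tinyBLAS to platform BLAS (e.g. cuBLAS) looks roughly like the following. The model filename here is a placeholder, not a real artifact from this thread.

```shell
# Force a rebuild of the native GPU support module so the platform
# BLAS (cuBLAS on NVIDIA) is used instead of the bundled tinyBLAS.
# -ngl 999 offloads as many layers as possible to the GPU.
./mistral-7b-instruct.llamafile --recompile --gpu nvidia -ngl 999 -p "hello"
```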
Code review found 4 issues:

* Lines 140 to 148 in 3a76415: compare with the CUDA pattern at lines 165 to 175 in 3a76415.
* Lines 207 to 212 in 3a76415: compare with the Metal pattern at lines 661 to 672 in 3a76415.

🤖 Generated with Claude Code
(see commit f3d1428)
Speed test, run with llama-benchy (0.3.5):

GB10 (updated):
* native llama.cpp Vulkan

AMD MI50:
* native llama.cpp Vulkan

Reference: llamafile CUDA on GB10
Uuuuh thanks for running this! So the take-home messages are:
This information is golden, thank you so much @wingenlit! 🙏


Brings support for Vulkan dylibs into llamafile.
Tested on:
A Windows build script will be created and tested in a follow-up PR.