Add image to cli by aittalam · Pull Request #912 · mozilla-ai/llamafile

aittalam · 2026-03-16T17:17:34Z

This PR re-introduces --image back into CLI mode, so that users can specify one or more images to a multimodal model.

aittalam · 2026-03-16T20:32:11Z

Code review

Found 3 issues:

Missing context-length check for multimodal path. The plain-text branch checks tokens.size() > params.n_ctx and gives a clear "prompt too long" error (line 338), but the has_images branch calls mtmd_helper_eval_chunks with no equivalent guard. A multimodal prompt with large images can silently exceed the context window, producing an opaque error or crash. The TUI implementation in chatbot_eval.cpp performs this check via mtmd_helper_get_n_pos(chunks). The existing // Check context comment at line 337 establishes this as a required step.

Lines 296 to 327 in 1bb0a96

    
           llama_pos n_past = 0; 
        
           if (has_images) { 
        
               // Use mtmd pipeline for multimodal prompt evaluation 
        
               mtmd_input_text text; 
        
               text.text = formatted_prompt.c_str(); 
        
               text.add_special = true; 
        
               text.parse_special = true; 
        
               mtmd::input_chunks chunks(mtmd_input_chunks_init()); 
        
               auto bitmaps_c_ptr = bitmaps.c_ptr(); 
        
               int32_t res = mtmd_tokenize(mtmd_ctx, chunks.ptr.get(), &text, 
        
                                           bitmaps_c_ptr.data(), bitmaps_c_ptr.size()); 
        
               if (res != 0) { 
        
                   fprintf(stderr, "error: failed to tokenize prompt with images (error %d)\n", res); 
        
                   mtmd_free(mtmd_ctx); 
        
                   common_sampler_free(sampler); 
        
                   llama_free(ctx); 
        
                   llama_model_free(model); 
        
                   return 6; 
        
               } 
        
               llama_pos new_n_past = 0; 
        
               if (mtmd_helper_eval_chunks(mtmd_ctx, ctx, chunks.ptr.get(), 
        
                                           0, 0, params.n_batch, true, &new_n_past)) { 
        
                   fprintf(stderr, "error: failed to evaluate prompt with images\n"); 
        
                   mtmd_free(mtmd_ctx); 
        
                   common_sampler_free(sampler); 
        
                   llama_free(ctx); 
        
                   llama_model_free(model); 
        
                   return 6; 
        
               } 
        
               n_past = new_n_past;

--mmproj without --image silently loads the vision model and proceeds. If the user passes --mmproj mmproj.gguf without any --image, the code loads the vision model (lines 197-215), sets has_images = false, and runs the plain text path -- wasting GPU/CPU memory with no warning. This is inconsistent with the opposite case (--image without --mmproj) which immediately errors at line 216-222.

llamafile/llamafile/chatbot_cli.cpp

Lines 194 to 222 in 1bb0a96

    
           // Initialize multimodal context if mmproj is specified 
        
           mtmd_context *mtmd_ctx = nullptr; 
        
           bool has_images = !params.image.empty(); 
        
           if (!params.mmproj.path.empty()) { 
        
               mtmd_context_params mparams = mtmd_context_params_default(); 
        
               mparams.use_gpu = params.mmproj_use_gpu; 
        
               mparams.n_threads = params.cpuparams.n_threads; 
        
               mparams.print_timings = false; 
        
               mparams.flash_attn_type = params.flash_attn_type; 
        
               mparams.warmup = params.warmup; 
        
               mparams.image_min_tokens = params.image_min_tokens; 
        
               mparams.image_max_tokens = params.image_max_tokens; 
        
               mtmd_helper_log_set((ggml_log_callback)llamafile_log_callback_null, NULL); 
        
               mtmd_ctx = mtmd_init_from_file(params.mmproj.path.c_str(), model, mparams); 
        
               if (!mtmd_ctx) { 
        
                   fprintf(stderr, "error: failed to load vision model: %s\n", 
        
                           params.mmproj.path.c_str()); 
        
                   common_sampler_free(sampler); 
        
                   llama_free(ctx); 
        
                   llama_model_free(model); 
        
                   return 5; 
        
               } 
        
           } else if (has_images) { 
        
               fprintf(stderr, "error: --image requires --mmproj to specify a vision model\n"); 
        
               common_sampler_free(sampler); 
        
               llama_free(ctx); 
        
               llama_model_free(model); 
        
               return 5; 
        
           }

Missing specific error messages for mtmd_tokenize failures. The TUI (chatbot_eval.cpp) provides actionable messages for different error codes (res == 1 -> "number of images doesn't match number of markers", res == 2 -> "image preprocessing error"), but the CLI only emits a generic "error: failed to tokenize prompt with images (error %d)". Users hitting these errors get an unhelpful numeric code instead of an actionable message.

llamafile/llamafile/chatbot_cli.cpp

Lines 306 to 315 in 1bb0a96

    
           int32_t res = mtmd_tokenize(mtmd_ctx, chunks.ptr.get(), &text, 
        
                                       bitmaps_c_ptr.data(), bitmaps_c_ptr.size()); 
        
           if (res != 0) { 
        
               fprintf(stderr, "error: failed to tokenize prompt with images (error %d)\n", res); 
        
               mtmd_free(mtmd_ctx); 
        
               common_sampler_free(sampler); 
        
               llama_free(ctx); 
        
               llama_model_free(model); 
        
               return 6; 
        
           }

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

aittalam · 2026-03-17T08:30:29Z

Addressed review comments:

Context-length check for multimodal path (950534a) — Added a guard using mtmd_helper_get_n_tokens() before calling mtmd_helper_eval_chunks(). On overflow, the error now shows a text/image token breakdown with a hint about --image-max-tokens. Also extracted repeated cleanup code into a cleanup() helper.
--mmproj without --image (7802835) — Vision model loading is now skipped when no --image is provided, avoiding wasted memory. Note that as pre-bundled llamafiles for multimodal models will come with mmproj weights, it does not make a lot of sense to warn users whenever they don't use them to describe images. For this reason, a LOG_INF is used for debug purposes, but nothing is show to users.
Specific mtmd_tokenize error messages (a8351ca) — Error codes 1 and 2 now produce actionable messages matching the TUI implementation, instead of a generic numeric code. This can be manually tested with e.g. ./o/llamafile/llamafile -m ~/zipaligner_files/Qwen3.5-4B-Q5_K_S.gguf --mmproj ~/zipaligner_files/302b92d565080b9cc0281186979ae75a7429ec23d14f6f7607a035539b21f3a6.gguf -ngl 9999 --cli --image ~/Downloads/img01.jpg --image ~/Downloads/img02.jpg -p "<__media__> Describe the image" --ctx-size 8192. Added integration tests for multi-image markers and marker mismatch (9224973).

@dpoulopoulos

* Update llama.cpp to a44d7712 and refresh patches * Updated apply-patches and renames * Removed Makefile patch, added as a removal * Refactored patches to be applied in the llama.cpp dir * Fixed apply-patches.sh * Updated tool_server_server.cpp.patch * Updated Makefile to pull llama.cpp deps * Update llama.cpp submodule with dependency submodules * added miniaudio.h.patch * Fixed wrong index in miniaudio patch * Updated patches * Updates to server.cpp * Added cosmocc-override.cmake * Made patches minimal * Removed common.h patch * Added cosmocc 4.0.2 target * Added readme * Updated llama.cpp to commit a44d77126c911d105f7f800c17da21b2a5b112d1 * Updated llama.cpp to commit dbc15a79672e72e0b9c1832adddf3334f5c9229c * Updated patches for newer llama.cpp version * Added miniaudio * Updated patches with common/download.cpp * Updated patches with common/download.cpp * Added extra deps to llama.cpp setup * Moved to using deps from the vendors folder * Removed miniaudio from added files * New BUILD.mk + common/chat.cpp patch * Updated cosmocc to 4.0.2 * Piping od with awk for better compatibility * Renamed miniaudio patch 🤦 * Updated README_0.10.0.md * Moved llama.cpp/common ahead of other deps * Added COSMO to server build * Update build/config.mk Co-authored-by: Peter Wilson <peter@mozilla.ai> * Update build/rules.mk Co-authored-by: Peter Wilson <peter@mozilla.ai> * First TUI iteration (keeping original llamafile dir for comparison) * Add comment blocks to rule file * Integrated llama.cpp server + TUI tool in the same llamafile tool * Code clean * Disable ggml logging when TUI * Updated README * Refactored code llamafile_new -> llamafile and simplified build * Simplified Makefile, updated README * Fixing uncaught exception on llama.cpp server termination when running from tui * LLAMAFILE_INCS -> INCLUDES to fix -iquote issue * Patching common/log to fix uncaught exception * Updated main, removed unused import, cleaned removeArgs code * Metal support - first iteration (only works on TUI) * Added metal support to standalone llama.cpp * Fixed removeArgs (again) * Workaround for segfault at exit when TUI+metal+server * Improved logging (now sending null callback back to metal) * Make sure standalone llama.cpp builds metal dylib if not present * Updated README_0.10.0.md * Fixed typo in readme * Improved g_interrupted_exit handling on TUI * Moved g_interrupted_exit to cover for both sigint and newline * Improved comments around sleep+exit in server.cpp * Improved LLAMAFILE_TUI documentation in BUILD.mk * Made GPU init message always appear as ephemeral * Added back missing pictures * Improved descriptions * Removed redundant block * Update llama.cpp submodule to f47edb8c19199bf0ab471f80e6a4783f0b43ef81 * Removed src_llama-hparams.cpp as not needed anymore * Updated tools_server_server-queue patch * Updated patch for tools/server/server.cpp * Moved patch * Minor updates to other patches * Updated BUILD.mk for llama.cpp * Updated llamafile code to use new llama.cpp main/server * Updated llama.cpp/BUILD.mk with fixes, mtmd new files, and without main/main * Used LLAMA_EXAMPLE_CLI for TUI params * fix(update-llama-cpp): Use `new_build_wip` as base ref. * Adding zipalign as a submodule (#848) * Added third_party/zipalign as submodule * Updated build for zipalign * Fixes to zipalign paths in Makefile * Fixed BUILD.mk to look for zlib.h into cosmocc's third_party/zlib * Updated creating_llamafiles.md * Updated makefile to also compile zipalign with make -j8 * Patching ggml to fix issues arising with multimodal models * Added reset-repo command to Makefile * Minor fixes to Makefile * Added more examples to creating_llamafiles.md * TUI support for mtmd API (#852) * fix(update-llama-cpp): Use `new_build_wip` as base ref. (#850) * TUI support for mtmd API - first sketch * Improved token counting (using n_pos as in llama.cpp server) * Removed extra logging from mtmd/clip and mtmd-helper * Fixed parsing bug in eval_string, factored out function, added tests --------- Co-authored-by: David de la Iglesia Castro <daviddelaiglesiacastro@gmail.com> * Added missing tests * Fix minja segfaulting in cosmo build (#858) * Added tests for minja regexp bug and example patch * Built an ad-hoc test for the cosmo build * Avoid updating it in place, do it only when success * Updated test to use actual patched minja code * Add cuda support (#859) * First attempt at cuda (still buggy, runs in TUI) * Setting free_struct for DSO's copy of ggml_backend_buft_alloc_buffer * Added cuda dep to llama.cpp's BUILD.mk * Fixed warnings with GGML_VERSION and GGML_COMMIT * Added cuda_cublas script, updated others * Rocm parallel version - refactored cuda (tinyblas) and cublas scripts * Using ggml/CMakeLists.txt as source of truth for GGML_VERSION / GGML_COMMIT - updated build/config.mk to retrieve GGML_VERSION from llama.cpp/ggml/CMakeLists.txt and GGML_COMMIT from git - added Makefile targets to cublas, cuda, rocm shell scripts - updated shell scripts to get variables from env, or fall back to reading from CMakeLists * Removed debug logging * Added comment to TinyBlas BF16->FP16 mapping * Factored out common build code in build-functions.sh * Minor fixes * Added output param to build scripts * Minor fix * Added support for --gpu and -ngl<=0 * Compacted the three GPU calls in llamafile_has_gpu * Minor fixes * Fixed cuda.c to copy dso in ~/.llamafile * Fixed BF16->FP16 issue with tinyblas * Updated llamafile.h comments * Add GGML version format validation * Made logging restriction consistent with metal * Free all function pointers in case of error * 862 bug metal dylib compilation (#863) * Suggested patch with -std=c++17 for cpp files * Read GGML_VERSION and GGML_COMMIT from build/config.mk * Fixed GGML_VERSION in BUILD.mk, added comments * Objects cleanup if compile fails, early fail for MAX_METAL_SRCS * Improved error message * Add cpu optimizations (#868) * First attempt at cuda (still buggy, runs in TUI) * Setting free_struct for DSO's copy of ggml_backend_buft_alloc_buffer * Added cuda dep to llama.cpp's BUILD.mk * Fixed warnings with GGML_VERSION and GGML_COMMIT * Added cuda_cublas script, updated others * Rocm parallel version - refactored cuda (tinyblas) and cublas scripts * Using ggml/CMakeLists.txt as source of truth for GGML_VERSION / GGML_COMMIT - updated build/config.mk to retrieve GGML_VERSION from llama.cpp/ggml/CMakeLists.txt and GGML_COMMIT from git - added Makefile targets to cublas, cuda, rocm shell scripts - updated shell scripts to get variables from env, or fall back to reading from CMakeLists * Removed debug logging * Added comment to TinyBlas BF16->FP16 mapping * Factored out common build code in build-functions.sh * Minor fixes * Added output param to build scripts * Minor fix * Added support for --gpu and -ngl<=0 * Compacted the three GPU calls in llamafile_has_gpu * Minor fixes * First iteration - import tinyblas files, update build, fix sgemm * Updated llamafile_sgemm interface to the one from moderl llama.cpp * Improved CPU ident, option to disable for testing/benchmarks * Added tests * Added IQK kernels for quants + test files * Q8_0 layout bug: test to check hypothesis + fix * Added q8 layout test to build, improved comments * Skills first commit (v0.1.0) (#870) * Fix timeout (#876) * Using wait_for() instead of wait() to avoid 72 mins timeout * tested and not timing out anymore * Fixed circular deps issue with .SECONDEXPANSION * Fix mmap issues when loading bundled model (#882) * Fix first iteration * Improved comments in patched llama-mmap.cpp * Simplified llama-map.cpp patch * Updated ggml/src/gguf.cpp to handle opening ggufs in llamafiles * Load gguf from .gguf, /zip/, @, .llamafile * Properly show think mode in TUI (#885) * Fix first iteration * Using llama.cpp's chat parser in TUI * Extend the approach to all models, similarly to what llama.cpp CLI does * Removed extra newline between initial info and chat format * Addressing reviews (handling interrupts + better logging) * Adding tools to use skill docs as a Claude plugin (#886) * Refactored skill docs to be used as Claude plugin * Added .llamafile_plugin dir with symlinks to docs * Added symlink from docs/AGENT.md to CLAUDE.md * Now you can install docs as a plugin with `/plugin marketplace add ./.llamafile_plugin` and `/plugin install llamafile` * Added tools/generate_patches.sh * Updated README_0.10.0.md * Created llama.cpp.patches/README.md * Minor updates to skills and patches README * skill updated to 0.1.1 (update upstream llama.cpp instructions) (#887) * Updated skill with llama.cpp upstream sync * Added check_patches tool * Update llama.cpp to b908baf1825b1a89afef87b09e22c32af2ca6548 (#888) * Update llama.cpp submodule to b908baf1825b1a89afef87b09e22c32af2ca6548 Updates patches and integration code for new llama.cpp version: - Regenerated all patches for updated upstream code - Added common_ngram-mod.cpp.patch (adds #include <algorithm>) - Added vendor_cpp-httplib_httplib.cpp.patch (XNU futex workaround moved from .h) - Added common/license.cpp stub for LICENSES symbol - Removed obsolete vendor_minja_minja.hpp.patch (jinja now built-in) - Removed obsolete vendor_cpp-httplib_httplib.h.patch (code moved to .cpp) - Updated chatbot.h/cpp for common_chat_syntax -> common_chat_parser_params rename - Removed minja test from tests/BUILD.mk * Updated license.cpp with the one generated by cmake in upstream llama.cpp * Updated info about license.cpp in patches' README * Remove minja from tests * Updated refs to minja in docs * Fix templating support for Apertus (#894) * Fixed templating issue with Apertus * Load the PEG parser in chatbot_main if one is provided * Add whisper (#880) * Updated whisper.cpp submodule from v1.6.2-168 (6739eb83) to v1.8.3 (2eeeba56). * Updated patches scripts + removed old patches * Added whisperfile + extra tools (mic2raw, mic2txt, stream, whisper-server) * Added slurp * Updated docs and man pages --------- Co-authored-by: angpt <anushrigupta@gmail.com> * Add support for legacy chat, cli, server modalities (#896) * Add CLI, SERVER, CHAT, and combined modes * Removed log 'path does not exist' * Added server routes to main.cpp * Fixing GPU log callbacks * Added --nothink feature for CLI * Refactored args + cleaned FLAGS * Enabled make cosmocc for any make version * Updated ci.yml to work with new llamafile / zipalign * Llamacpp 7f5ee549683d600ad41db6a295a232cdd2d8eb9f (#901) llama.cpp update, Qwen3.5 Think Mode & CLI Improvements llama.cpp Submodule Update Updated to 7f5ee549683d600ad41db6a295a232cdd2d8eb9f Updated associated patches (removed obsolete vendor_miniaudio_miniaudio.h.patch) Qwen3.5 Think Mode Support Proper handling of think/nothink mode in both chat and CLI modes Uses common_chat_templates_apply() with enable_thinking parameter instead of manually constructing prompts Correctly parses reasoning content using PEG parser with COMMON_REASONING_FORMAT_DEEPSEEK System Prompt Handling Captures -p/--prompt value early in argument parsing (needed for combined mode where server parsing excludes -p) /clear command now properly resets g_pending_file_content to prevent uploaded files from persisting after clear Code Quality Refactored cli_apply_chat_template() to return both prompt and parser params Added documentation comments for subtle pointer lifetime and argument parsing behaviors --------- Co-authored-by: Stuart Henderson <sthen@users.noreply.github.com> (OpenBSD supported versions update) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> (llama.cpp submodule update) * Updated Makefile to download cosmocc at end of setup * Added --image-min-tokens to TUI chat (#905) * Integration tests (#906) Adds integration tests for llamafiles: run a pre-built llamafile as well as the plain executable with a model passed as a parameter test tui (piping inputs to the process), server (sending HTTP requestS), cli (passing prompts as params), hybrid modes test plain text, multimodal, and tool calling with ad-hoc prompts test thinking vs no-thinking mode test CPU vs GPU * Added timeout multiplier * Added combined marker, improved combined tests * Added check for GPU presence * Added meaningful temperature test * Fixed platforms where sh is needed * Adding retry logic to server requests * Add Blackwell GPU architecture support for CUDA 13.x (#907) - Add sm_110f (Jetson Thor & family) and sm_121a (DGX Spark GB10) support for aarch64 platforms with CUDA 13.x - Add sm_120f (RTX 5000 series, RTX PRO Blackwell) support for x86_64 platforms with CUDA 13.x - Enable --compress-mode=size for optimized binary size on Blackwell GPUs - Detect CUDA version and host architecture at build time Co-authored-by: wingx <wingenlit@outlook.com> * Fix cuda combined mode (#909) * Implement chat in combined mode as an OpenAI client * Implemented stop_tui in tests * Fixed CLI tests using --verbose * Accept non-utf8 chars in responses * Simplify prompt for t=0 * Added patch for tools/server/server.cpp * Server output > devnull to avoid buffer fill up with --verbose * Add image to cli (#912) * Added back support for --image in CLI tool * Added tests for multimodal cli * Added optional mmproj parameter to TUI tests too * Addressed review comments * Added test to check multiple markers/images on cli * Review docs v0.10.0 (#911) * Updated index.md * Moar updates to index.md * Updated quickstart.md * Updated support + example llamafiles * Added example files and examples + minor fixes * Updated structure * Removed security * Updated source installation * Updated README_0.10.0, now frozen doc * Removed ref to new_build_wip in whisperfile, make setup installs cosmocc * Apply suggestion from @dpoulopoulos Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com> * Addressed review comments * Addressed review comments #2 --------- Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com> * Updated README.md, minor fix to docs/index.md * Minor fixes to install llamafile binary * Open next llama.cpp update PR to main * Updated copyrights * Removed old README, added 'based on' badges * Better version handling (#913) * Improve help (#914) * Added per-mode help + nologo/ascii support * If model is missing, bump to help for respective mode * Update skill docs (#915) * Updated skill not to use new_build_wip + improved it * Removed stray new_build_wip reference * Updated RELEASE.md for v0.10.0 --------- Co-authored-by: Peter Wilson <peter@mozilla.ai> Co-authored-by: daavoo <daviddelaiglesiacastro@gmail.com> Co-authored-by: angpt <anushrigupta@gmail.com> Co-authored-by: wingenlit <63510314+wingenlit@users.noreply.github.com> Co-authored-by: wingx <wingenlit@outlook.com> Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>

@dpoulopoulos

* Update llama.cpp to a44d7712 and refresh patches * Updated apply-patches and renames * Removed Makefile patch, added as a removal * Refactored patches to be applied in the llama.cpp dir * Fixed apply-patches.sh * Updated tool_server_server.cpp.patch * Updated Makefile to pull llama.cpp deps * Update llama.cpp submodule with dependency submodules * added miniaudio.h.patch * Fixed wrong index in miniaudio patch * Updated patches * Updates to server.cpp * Added cosmocc-override.cmake * Made patches minimal * Removed common.h patch * Added cosmocc 4.0.2 target * Added readme * Updated llama.cpp to commit a44d77126c911d105f7f800c17da21b2a5b112d1 * Updated llama.cpp to commit dbc15a79672e72e0b9c1832adddf3334f5c9229c * Updated patches for newer llama.cpp version * Added miniaudio * Updated patches with common/download.cpp * Updated patches with common/download.cpp * Added extra deps to llama.cpp setup * Moved to using deps from the vendors folder * Removed miniaudio from added files * New BUILD.mk + common/chat.cpp patch * Updated cosmocc to 4.0.2 * Piping od with awk for better compatibility * Renamed miniaudio patch 🤦 * Updated README_0.10.0.md * Moved llama.cpp/common ahead of other deps * Added COSMO to server build * Update build/config.mk Co-authored-by: Peter Wilson <peter@mozilla.ai> * Update build/rules.mk Co-authored-by: Peter Wilson <peter@mozilla.ai> * First TUI iteration (keeping original llamafile dir for comparison) * Add comment blocks to rule file * Integrated llama.cpp server + TUI tool in the same llamafile tool * Code clean * Disable ggml logging when TUI * Updated README * Refactored code llamafile_new -> llamafile and simplified build * Simplified Makefile, updated README * Fixing uncaught exception on llama.cpp server termination when running from tui * LLAMAFILE_INCS -> INCLUDES to fix -iquote issue * Patching common/log to fix uncaught exception * Updated main, removed unused import, cleaned removeArgs code * Metal support - first iteration (only works on TUI) * Added metal support to standalone llama.cpp * Fixed removeArgs (again) * Workaround for segfault at exit when TUI+metal+server * Improved logging (now sending null callback back to metal) * Make sure standalone llama.cpp builds metal dylib if not present * Updated README_0.10.0.md * Fixed typo in readme * Improved g_interrupted_exit handling on TUI * Moved g_interrupted_exit to cover for both sigint and newline * Improved comments around sleep+exit in server.cpp * Improved LLAMAFILE_TUI documentation in BUILD.mk * Made GPU init message always appear as ephemeral * Added back missing pictures * Improved descriptions * Removed redundant block * Update llama.cpp submodule to f47edb8c19199bf0ab471f80e6a4783f0b43ef81 * Removed src_llama-hparams.cpp as not needed anymore * Updated tools_server_server-queue patch * Updated patch for tools/server/server.cpp * Moved patch * Minor updates to other patches * Updated BUILD.mk for llama.cpp * Updated llamafile code to use new llama.cpp main/server * Updated llama.cpp/BUILD.mk with fixes, mtmd new files, and without main/main * Used LLAMA_EXAMPLE_CLI for TUI params * fix(update-llama-cpp): Use `new_build_wip` as base ref. * Adding zipalign as a submodule (#848) * Added third_party/zipalign as submodule * Updated build for zipalign * Fixes to zipalign paths in Makefile * Fixed BUILD.mk to look for zlib.h into cosmocc's third_party/zlib * Updated creating_llamafiles.md * Updated makefile to also compile zipalign with make -j8 * Patching ggml to fix issues arising with multimodal models * Added reset-repo command to Makefile * Minor fixes to Makefile * Added more examples to creating_llamafiles.md * TUI support for mtmd API (#852) * fix(update-llama-cpp): Use `new_build_wip` as base ref. (#850) * TUI support for mtmd API - first sketch * Improved token counting (using n_pos as in llama.cpp server) * Removed extra logging from mtmd/clip and mtmd-helper * Fixed parsing bug in eval_string, factored out function, added tests --------- Co-authored-by: David de la Iglesia Castro <daviddelaiglesiacastro@gmail.com> * Added missing tests * Fix minja segfaulting in cosmo build (#858) * Added tests for minja regexp bug and example patch * Built an ad-hoc test for the cosmo build * Avoid updating it in place, do it only when success * Updated test to use actual patched minja code * Add cuda support (#859) * First attempt at cuda (still buggy, runs in TUI) * Setting free_struct for DSO's copy of ggml_backend_buft_alloc_buffer * Added cuda dep to llama.cpp's BUILD.mk * Fixed warnings with GGML_VERSION and GGML_COMMIT * Added cuda_cublas script, updated others * Rocm parallel version - refactored cuda (tinyblas) and cublas scripts * Using ggml/CMakeLists.txt as source of truth for GGML_VERSION / GGML_COMMIT - updated build/config.mk to retrieve GGML_VERSION from llama.cpp/ggml/CMakeLists.txt and GGML_COMMIT from git - added Makefile targets to cublas, cuda, rocm shell scripts - updated shell scripts to get variables from env, or fall back to reading from CMakeLists * Removed debug logging * Added comment to TinyBlas BF16->FP16 mapping * Factored out common build code in build-functions.sh * Minor fixes * Added output param to build scripts * Minor fix * Added support for --gpu and -ngl<=0 * Compacted the three GPU calls in llamafile_has_gpu * Minor fixes * Fixed cuda.c to copy dso in ~/.llamafile * Fixed BF16->FP16 issue with tinyblas * Updated llamafile.h comments * Add GGML version format validation * Made logging restriction consistent with metal * Free all function pointers in case of error * 862 bug metal dylib compilation (#863) * Suggested patch with -std=c++17 for cpp files * Read GGML_VERSION and GGML_COMMIT from build/config.mk * Fixed GGML_VERSION in BUILD.mk, added comments * Objects cleanup if compile fails, early fail for MAX_METAL_SRCS * Improved error message * Add cpu optimizations (#868) * First attempt at cuda (still buggy, runs in TUI) * Setting free_struct for DSO's copy of ggml_backend_buft_alloc_buffer * Added cuda dep to llama.cpp's BUILD.mk * Fixed warnings with GGML_VERSION and GGML_COMMIT * Added cuda_cublas script, updated others * Rocm parallel version - refactored cuda (tinyblas) and cublas scripts * Using ggml/CMakeLists.txt as source of truth for GGML_VERSION / GGML_COMMIT - updated build/config.mk to retrieve GGML_VERSION from llama.cpp/ggml/CMakeLists.txt and GGML_COMMIT from git - added Makefile targets to cublas, cuda, rocm shell scripts - updated shell scripts to get variables from env, or fall back to reading from CMakeLists * Removed debug logging * Added comment to TinyBlas BF16->FP16 mapping * Factored out common build code in build-functions.sh * Minor fixes * Added output param to build scripts * Minor fix * Added support for --gpu and -ngl<=0 * Compacted the three GPU calls in llamafile_has_gpu * Minor fixes * First iteration - import tinyblas files, update build, fix sgemm * Updated llamafile_sgemm interface to the one from moderl llama.cpp * Improved CPU ident, option to disable for testing/benchmarks * Added tests * Added IQK kernels for quants + test files * Q8_0 layout bug: test to check hypothesis + fix * Added q8 layout test to build, improved comments * Skills first commit (v0.1.0) (#870) * Fix timeout (#876) * Using wait_for() instead of wait() to avoid 72 mins timeout * tested and not timing out anymore * Fixed circular deps issue with .SECONDEXPANSION * Fix mmap issues when loading bundled model (#882) * Fix first iteration * Improved comments in patched llama-mmap.cpp * Simplified llama-map.cpp patch * Updated ggml/src/gguf.cpp to handle opening ggufs in llamafiles * Load gguf from .gguf, /zip/, @, .llamafile * Properly show think mode in TUI (#885) * Fix first iteration * Using llama.cpp's chat parser in TUI * Extend the approach to all models, similarly to what llama.cpp CLI does * Removed extra newline between initial info and chat format * Addressing reviews (handling interrupts + better logging) * Adding tools to use skill docs as a Claude plugin (#886) * Refactored skill docs to be used as Claude plugin * Added .llamafile_plugin dir with symlinks to docs * Added symlink from docs/AGENT.md to CLAUDE.md * Now you can install docs as a plugin with `/plugin marketplace add ./.llamafile_plugin` and `/plugin install llamafile` * Added tools/generate_patches.sh * Updated README_0.10.0.md * Created llama.cpp.patches/README.md * Minor updates to skills and patches README * Update llama.cpp submodule to b908baf1825b1a89afef87b09e22c32af2ca6548 * skill updated to 0.1.1 (update upstream llama.cpp instructions) (#887) * Updated skill with llama.cpp upstream sync * Added check_patches tool * Update llama.cpp to b908baf1825b1a89afef87b09e22c32af2ca6548 Updates patches and integration code for new llama.cpp version: - Regenerated all patches for updated upstream code - Added common_ngram-mod.cpp.patch (adds #include <algorithm>) - Added vendor_cpp-httplib_httplib.cpp.patch (XNU futex workaround moved from .h) - Added common/license.cpp stub for LICENSES symbol - Removed obsolete vendor_minja_minja.hpp.patch (jinja now built-in) - Removed obsolete vendor_cpp-httplib_httplib.h.patch (code moved to .cpp) - Updated chatbot.h/cpp for common_chat_syntax -> common_chat_parser_params rename - Removed minja test from tests/BUILD.mk Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Update llama.cpp.patches/README.md for new patches - Update httplib.h.patch → httplib.cpp.patch (code moved upstream) - Remove minja.hpp.patch (minja replaced with built-in jinja) - Add ngram-mod.cpp.patch documentation - Document common/license.cpp in llamafile-files Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * First iteration * Update llama.cpp to b908baf1825b1a89afef87b09e22c32af2ca6548 (#888) * Update llama.cpp submodule to b908baf1825b1a89afef87b09e22c32af2ca6548 Updates patches and integration code for new llama.cpp version: - Regenerated all patches for updated upstream code - Added common_ngram-mod.cpp.patch (adds #include <algorithm>) - Added vendor_cpp-httplib_httplib.cpp.patch (XNU futex workaround moved from .h) - Added common/license.cpp stub for LICENSES symbol - Removed obsolete vendor_minja_minja.hpp.patch (jinja now built-in) - Removed obsolete vendor_cpp-httplib_httplib.h.patch (code moved to .cpp) - Updated chatbot.h/cpp for common_chat_syntax -> common_chat_parser_params rename - Removed minja test from tests/BUILD.mk * Updated license.cpp with the one generated by cmake in upstream llama.cpp * Updated info about license.cpp in patches' README * Remove minja from tests * Updated refs to minja in docs * Fix templating support for Apertus (#894) * Fixed templating issue with Apertus * Load the PEG parser in chatbot_main if one is provided * Add whisper (#880) * Updated whisper.cpp submodule from v1.6.2-168 (6739eb83) to v1.8.3 (2eeeba56). * Updated patches scripts + removed old patches * Added whisperfile + extra tools (mic2raw, mic2txt, stream, whisper-server) * Added slurp * Updated docs and man pages --------- Co-authored-by: angpt <anushrigupta@gmail.com> * Update whisper.cpp submodule to v1.8.3 * Refactored common functions in metal, vulkan, cuda into llamafile.c * Add support for legacy chat, cli, server modalities (#896) * Add CLI, SERVER, CHAT, and combined modes * Removed log 'path does not exist' * Added server routes to main.cpp * Fixing GPU log callbacks * Added --nothink feature for CLI * Refactored args + cleaned FLAGS * Enabled make cosmocc for any make version * Updated ci.yml to work with new llamafile / zipalign * Llamacpp 7f5ee549683d600ad41db6a295a232cdd2d8eb9f (#901) llama.cpp update, Qwen3.5 Think Mode & CLI Improvements llama.cpp Submodule Update Updated to 7f5ee549683d600ad41db6a295a232cdd2d8eb9f Updated associated patches (removed obsolete vendor_miniaudio_miniaudio.h.patch) Qwen3.5 Think Mode Support Proper handling of think/nothink mode in both chat and CLI modes Uses common_chat_templates_apply() with enable_thinking parameter instead of manually constructing prompts Correctly parses reasoning content using PEG parser with COMMON_REASONING_FORMAT_DEEPSEEK System Prompt Handling Captures -p/--prompt value early in argument parsing (needed for combined mode where server parsing excludes -p) /clear command now properly resets g_pending_file_content to prevent uploaded files from persisting after clear Code Quality Refactored cli_apply_chat_template() to return both prompt and parser params Added documentation comments for subtle pointer lifetime and argument parsing behaviors --------- Co-authored-by: Stuart Henderson <sthen@users.noreply.github.com> (OpenBSD supported versions update) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> (llama.cpp submodule update) * Updated Makefile to download cosmocc at end of setup * Added --image-min-tokens to TUI chat (#905) * Integration tests (#906) Adds integration tests for llamafiles: run a pre-built llamafile as well as the plain executable with a model passed as a parameter test tui (piping inputs to the process), server (sending HTTP requestS), cli (passing prompts as params), hybrid modes test plain text, multimodal, and tool calling with ad-hoc prompts test thinking vs no-thinking mode test CPU vs GPU * Added timeout multiplier * Added combined marker, improved combined tests * Added check for GPU presence * Added meaningful temperature test * Fixed platforms where sh is needed * Adding retry logic to server requests * Add Blackwell GPU architecture support for CUDA 13.x (#907) - Add sm_110f (Jetson Thor & family) and sm_121a (DGX Spark GB10) support for aarch64 platforms with CUDA 13.x - Add sm_120f (RTX 5000 series, RTX PRO Blackwell) support for x86_64 platforms with CUDA 13.x - Enable --compress-mode=size for optimized binary size on Blackwell GPUs - Detect CUDA version and host architecture at build time Co-authored-by: wingx <wingenlit@outlook.com> * Fix cuda combined mode (#909) * Implement chat in combined mode as an OpenAI client * Implemented stop_tui in tests * Fixed CLI tests using --verbose * Accept non-utf8 chars in responses * Simplify prompt for t=0 * Added patch for tools/server/server.cpp * Server output > devnull to avoid buffer fill up with --verbose * Add image to cli (#912) * Added back support for --image in CLI tool * Added tests for multimodal cli * Added optional mmproj parameter to TUI tests too * Addressed review comments * Added test to check multiple markers/images on cli * Review docs v0.10.0 (#911) * Updated index.md * Moar updates to index.md * Updated quickstart.md * Updated support + example llamafiles * Added example files and examples + minor fixes * Updated structure * Removed security * Updated source installation * Updated README_0.10.0, now frozen doc * Removed ref to new_build_wip in whisperfile, make setup installs cosmocc * Apply suggestion from @dpoulopoulos Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com> * Addressed review comments * Addressed review comments #2 --------- Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com> * Updated README.md, minor fix to docs/index.md * Minor fixes to install llamafile binary * Open next llama.cpp update PR to main * Updated copyrights * Removed old README, added 'based on' badges * Better version handling (#913) * Improve help (#914) * Added per-mode help + nologo/ascii support * If model is missing, bump to help for respective mode * Update skill docs (#915) * Updated skill not to use new_build_wip + improved it * Removed stray new_build_wip reference * Fixed --fit issue + fit mmproj size + free mem (#920) --------- Co-authored-by: Peter Wilson <peter@mozilla.ai> Co-authored-by: daavoo <daviddelaiglesiacastro@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: angpt <anushrigupta@gmail.com> Co-authored-by: wingenlit <63510314+wingenlit@users.noreply.github.com> Co-authored-by: wingx <wingenlit@outlook.com> Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>

Added back support for --image in CLI tool

f036f06

github-actions bot added documentation devops llamafile testing labels Mar 16, 2026

aittalam changed the base branch from main to new_build_wip March 16, 2026 17:18

aittalam added 2 commits March 16, 2026 17:34

Added tests for multimodal cli

454cf7e

Added optional mmproj parameter to TUI tests too

1bb0a96

aittalam self-assigned this Mar 16, 2026

aittalam marked this pull request as ready for review March 16, 2026 20:06

aittalam added 4 commits March 17, 2026 07:37

Addressed review comment 1

950534a

Addressed review comment #2

7802835

Addressed review comment #3

a8351ca

Added test to check multiple markers/images on cli

9224973

aittalam merged commit 0b19932 into new_build_wip Mar 17, 2026
1 check passed

aittalam deleted the add-image-to-cli branch March 17, 2026 08:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add image to cli#912

Add image to cli#912
aittalam merged 7 commits intonew_build_wipfrom
add-image-to-cli

aittalam commented Mar 16, 2026 •

edited

Loading

Uh oh!

aittalam commented Mar 16, 2026

Uh oh!

aittalam commented Mar 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aittalam commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aittalam commented Mar 16, 2026

Code review

Uh oh!

aittalam commented Mar 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aittalam commented Mar 16, 2026 •

edited

Loading