Merged
Code review found 3 issues:
- llamafile/llamafile/chatbot_cli.cpp, lines 296–327 (at 1bb0a96)
- llamafile/llamafile/chatbot_cli.cpp, lines 194–222 (at 1bb0a96)
- llamafile/llamafile/chatbot_cli.cpp, lines 306–315 (at 1bb0a96)

(Review generated with Claude Code.)
Addressed review comments.
aittalam added a commit that referenced this pull request on Mar 19, 2026:
* Update llama.cpp to a44d7712 and refresh patches
* Updated apply-patches and renames
* Removed Makefile patch, added as a removal
* Refactored patches to be applied in the llama.cpp dir
* Fixed apply-patches.sh
* Updated tool_server_server.cpp.patch
* Updated Makefile to pull llama.cpp deps
* Update llama.cpp submodule with dependency submodules
* added miniaudio.h.patch
* Fixed wrong index in miniaudio patch
* Updated patches
* Updates to server.cpp
* Added cosmocc-override.cmake
* Made patches minimal
* Removed common.h patch
* Added cosmocc 4.0.2 target
* Added readme
* Updated llama.cpp to commit a44d77126c911d105f7f800c17da21b2a5b112d1
* Updated llama.cpp to commit dbc15a79672e72e0b9c1832adddf3334f5c9229c
* Updated patches for newer llama.cpp version
* Added miniaudio
* Updated patches with common/download.cpp
* Updated patches with common/download.cpp
* Added extra deps to llama.cpp setup
* Moved to using deps from the vendors folder
* Removed miniaudio from added files
* New BUILD.mk + common/chat.cpp patch
* Updated cosmocc to 4.0.2
* Piping od with awk for better compatibility
* Renamed miniaudio patch 🤦
* Updated README_0.10.0.md
* Moved llama.cpp/common ahead of other deps
* Added COSMO to server build
* Update build/config.mk
  Co-authored-by: Peter Wilson <peter@mozilla.ai>
* Update build/rules.mk
  Co-authored-by: Peter Wilson <peter@mozilla.ai>
* First TUI iteration (keeping original llamafile dir for comparison)
* Add comment blocks to rule file
* Integrated llama.cpp server + TUI tool in the same llamafile tool
* Code clean
* Disable ggml logging when TUI
* Updated README
* Refactored code llamafile_new -> llamafile and simplified build
* Simplified Makefile, updated README
* Fixed uncaught exception on llama.cpp server termination when running from tui
* LLAMAFILE_INCS -> INCLUDES to fix -iquote issue
* Patching common/log to fix uncaught exception
* Updated main, removed unused import, cleaned removeArgs code
* Metal support - first iteration (only works on TUI)
* Added metal support to standalone llama.cpp
* Fixed removeArgs (again)
* Workaround for segfault at exit when TUI+metal+server
* Improved logging (now sending null callback back to metal)
* Make sure standalone llama.cpp builds metal dylib if not present
* Updated README_0.10.0.md
* Fixed typo in readme
* Improved g_interrupted_exit handling on TUI
* Moved g_interrupted_exit to cover for both sigint and newline
* Improved comments around sleep+exit in server.cpp
* Improved LLAMAFILE_TUI documentation in BUILD.mk
* Made GPU init message always appear as ephemeral
* Added back missing pictures
* Improved descriptions
* Removed redundant block
* Update llama.cpp submodule to f47edb8c19199bf0ab471f80e6a4783f0b43ef81
* Removed src_llama-hparams.cpp as not needed anymore
* Updated tools_server_server-queue patch
* Updated patch for tools/server/server.cpp
* Moved patch
* Minor updates to other patches
* Updated BUILD.mk for llama.cpp
* Updated llamafile code to use new llama.cpp main/server
* Updated llama.cpp/BUILD.mk with fixes, mtmd new files, and without main/main
* Used LLAMA_EXAMPLE_CLI for TUI params
* fix(update-llama-cpp): Use `new_build_wip` as base ref.
* Adding zipalign as a submodule (#848)
* Added third_party/zipalign as submodule
* Updated build for zipalign
* Fixes to zipalign paths in Makefile
* Fixed BUILD.mk to look for zlib.h in cosmocc's third_party/zlib
* Updated creating_llamafiles.md
* Updated makefile to also compile zipalign with make -j8
* Patching ggml to fix issues arising with multimodal models
* Added reset-repo command to Makefile
* Minor fixes to Makefile
* Added more examples to creating_llamafiles.md
* TUI support for mtmd API (#852)
* fix(update-llama-cpp): Use `new_build_wip` as base ref. (#850)
* TUI support for mtmd API - first sketch
* Improved token counting (using n_pos as in llama.cpp server)
* Removed extra logging from mtmd/clip and mtmd-helper
* Fixed parsing bug in eval_string, factored out function, added tests
  ---------
  Co-authored-by: David de la Iglesia Castro <daviddelaiglesiacastro@gmail.com>
* Added missing tests
* Fix minja segfaulting in cosmo build (#858)
* Added tests for minja regexp bug and example patch
* Built an ad-hoc test for the cosmo build
* Avoid updating it in place, do it only on success
* Updated test to use actual patched minja code
* Add cuda support (#859)
* First attempt at cuda (still buggy, runs in TUI)
* Setting free_struct for DSO's copy of ggml_backend_buft_alloc_buffer
* Added cuda dep to llama.cpp's BUILD.mk
* Fixed warnings with GGML_VERSION and GGML_COMMIT
* Added cuda_cublas script, updated others
* Rocm parallel version - refactored cuda (tinyblas) and cublas scripts
* Using ggml/CMakeLists.txt as source of truth for GGML_VERSION / GGML_COMMIT
  - updated build/config.mk to retrieve GGML_VERSION from llama.cpp/ggml/CMakeLists.txt and GGML_COMMIT from git
  - added Makefile targets to cublas, cuda, rocm shell scripts
  - updated shell scripts to get variables from env, or fall back to reading from CMakeLists
* Removed debug logging
* Added comment to TinyBlas BF16->FP16 mapping
* Factored out common build code in build-functions.sh
* Minor fixes
* Added output param to build scripts
* Minor fix
* Added support for --gpu and -ngl<=0
* Compacted the three GPU calls in llamafile_has_gpu
* Minor fixes
* Fixed cuda.c to copy dso in ~/.llamafile
* Fixed BF16->FP16 issue with tinyblas
* Updated llamafile.h comments
* Add GGML version format validation
* Made logging restriction consistent with metal
* Free all function pointers in case of error
* 862 bug metal dylib compilation (#863)
* Suggested patch with -std=c++17 for cpp files
* Read GGML_VERSION and GGML_COMMIT from build/config.mk
* Fixed GGML_VERSION in BUILD.mk, added comments
* Objects cleanup if compile fails, early fail for MAX_METAL_SRCS
* Improved error message
* Add cpu optimizations (#868)
* First attempt at cuda (still buggy, runs in TUI)
* Setting free_struct for DSO's copy of ggml_backend_buft_alloc_buffer
* Added cuda dep to llama.cpp's BUILD.mk
* Fixed warnings with GGML_VERSION and GGML_COMMIT
* Added cuda_cublas script, updated others
* Rocm parallel version - refactored cuda (tinyblas) and cublas scripts
* Using ggml/CMakeLists.txt as source of truth for GGML_VERSION / GGML_COMMIT
  - updated build/config.mk to retrieve GGML_VERSION from llama.cpp/ggml/CMakeLists.txt and GGML_COMMIT from git
  - added Makefile targets to cublas, cuda, rocm shell scripts
  - updated shell scripts to get variables from env, or fall back to reading from CMakeLists
* Removed debug logging
* Added comment to TinyBlas BF16->FP16 mapping
* Factored out common build code in build-functions.sh
* Minor fixes
* Added output param to build scripts
* Minor fix
* Added support for --gpu and -ngl<=0
* Compacted the three GPU calls in llamafile_has_gpu
* Minor fixes
* First iteration - import tinyblas files, update build, fix sgemm
* Updated llamafile_sgemm interface to the one from modern llama.cpp
* Improved CPU ident, option to disable for testing/benchmarks
* Added tests
* Added IQK kernels for quants + test files
* Q8_0 layout bug: test to check hypothesis + fix
* Added q8 layout test to build, improved comments
* Skills first commit (v0.1.0) (#870)
* Fix timeout (#876)
* Using wait_for() instead of wait() to avoid 72 mins timeout
* tested and not timing out anymore
* Fixed circular deps issue with .SECONDEXPANSION
* Fix mmap issues when loading bundled model (#882)
* Fix first iteration
* Improved comments in patched llama-mmap.cpp
* Simplified llama-map.cpp patch
* Updated ggml/src/gguf.cpp to handle opening ggufs in llamafiles
* Load gguf from .gguf, /zip/, @, .llamafile
* Properly show think mode in TUI (#885)
* Fix first iteration
* Using llama.cpp's chat parser in TUI
* Extend the approach to all models, similarly to what llama.cpp CLI does
* Removed extra newline between initial info and chat format
* Addressing reviews (handling interrupts + better logging)
* Adding tools to use skill docs as a Claude plugin (#886)
* Refactored skill docs to be used as Claude plugin
* Added .llamafile_plugin dir with symlinks to docs
* Added symlink from docs/AGENT.md to CLAUDE.md
* Now you can install docs as a plugin with `/plugin marketplace add ./.llamafile_plugin` and `/plugin install llamafile`
* Added tools/generate_patches.sh
* Updated README_0.10.0.md
* Created llama.cpp.patches/README.md
* Minor updates to skills and patches README
* skill updated to 0.1.1 (update upstream llama.cpp instructions) (#887)
* Updated skill with llama.cpp upstream sync
* Added check_patches tool
* Update llama.cpp to b908baf1825b1a89afef87b09e22c32af2ca6548 (#888)
* Update llama.cpp submodule to b908baf1825b1a89afef87b09e22c32af2ca6548
  Updates patches and integration code for new llama.cpp version:
  - Regenerated all patches for updated upstream code
  - Added common_ngram-mod.cpp.patch (adds #include <algorithm>)
  - Added vendor_cpp-httplib_httplib.cpp.patch (XNU futex workaround moved from .h)
  - Added common/license.cpp stub for LICENSES symbol
  - Removed obsolete vendor_minja_minja.hpp.patch (jinja now built-in)
  - Removed obsolete vendor_cpp-httplib_httplib.h.patch (code moved to .cpp)
  - Updated chatbot.h/cpp for common_chat_syntax -> common_chat_parser_params rename
  - Removed minja test from tests/BUILD.mk
* Updated license.cpp with the one generated by cmake in upstream llama.cpp
* Updated info about license.cpp in patches' README
* Remove minja from tests
* Updated refs to minja in docs
* Fix templating support for Apertus (#894)
* Fixed templating issue with Apertus
* Load the PEG parser in chatbot_main if one is provided
* Add whisper (#880)
* Updated whisper.cpp submodule from v1.6.2-168 (6739eb83) to v1.8.3 (2eeeba56).
* Updated patches scripts + removed old patches
* Added whisperfile + extra tools (mic2raw, mic2txt, stream, whisper-server)
* Added slurp
* Updated docs and man pages
  ---------
  Co-authored-by: angpt <anushrigupta@gmail.com>
* Add support for legacy chat, cli, server modalities (#896)
* Add CLI, SERVER, CHAT, and combined modes
* Removed log 'path does not exist'
* Added server routes to main.cpp
* Fixing GPU log callbacks
* Added --nothink feature for CLI
* Refactored args + cleaned FLAGS
* Enabled make cosmocc for any make version
* Updated ci.yml to work with new llamafile / zipalign
* Llamacpp 7f5ee549683d600ad41db6a295a232cdd2d8eb9f (#901)
  llama.cpp update, Qwen3.5 Think Mode & CLI Improvements
  llama.cpp Submodule Update:
  - Updated to 7f5ee549683d600ad41db6a295a232cdd2d8eb9f
  - Updated associated patches (removed obsolete vendor_miniaudio_miniaudio.h.patch)
  Qwen3.5 Think Mode Support:
  - Proper handling of think/nothink mode in both chat and CLI modes
  - Uses common_chat_templates_apply() with enable_thinking parameter instead of manually constructing prompts
  - Correctly parses reasoning content using PEG parser with COMMON_REASONING_FORMAT_DEEPSEEK
  System Prompt Handling:
  - Captures -p/--prompt value early in argument parsing (needed for combined mode where server parsing excludes -p)
  - /clear command now properly resets g_pending_file_content to prevent uploaded files from persisting after clear
  Code Quality:
  - Refactored cli_apply_chat_template() to return both prompt and parser params
  - Added documentation comments for subtle pointer lifetime and argument parsing behaviors
  ---------
  Co-authored-by: Stuart Henderson <sthen@users.noreply.github.com> (OpenBSD supported versions update)
  Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> (llama.cpp submodule update)
* Updated Makefile to download cosmocc at end of setup
* Added --image-min-tokens to TUI chat (#905)
* Integration tests (#906)
  Adds integration tests for llamafiles:
  - run a pre-built llamafile as well as the plain executable with a model passed as a parameter
  - test tui (piping inputs to the process), server (sending HTTP requests), cli (passing prompts as params), hybrid modes
  - test plain text, multimodal, and tool calling with ad-hoc prompts
  - test thinking vs no-thinking mode
  - test CPU vs GPU
* Added timeout multiplier
* Added combined marker, improved combined tests
* Added check for GPU presence
* Added meaningful temperature test
* Fixed platforms where sh is needed
* Adding retry logic to server requests
* Add Blackwell GPU architecture support for CUDA 13.x (#907)
  - Add sm_110f (Jetson Thor & family) and sm_121a (DGX Spark GB10) support for aarch64 platforms with CUDA 13.x
  - Add sm_120f (RTX 5000 series, RTX PRO Blackwell) support for x86_64 platforms with CUDA 13.x
  - Enable --compress-mode=size for optimized binary size on Blackwell GPUs
  - Detect CUDA version and host architecture at build time
  Co-authored-by: wingx <wingenlit@outlook.com>
* Fix cuda combined mode (#909)
* Implement chat in combined mode as an OpenAI client
* Implemented stop_tui in tests
* Fixed CLI tests using --verbose
* Accept non-utf8 chars in responses
* Simplify prompt for t=0
* Added patch for tools/server/server.cpp
* Server output > devnull to avoid buffer fill up with --verbose
* Add image to cli (#912)
* Added back support for --image in CLI tool
* Added tests for multimodal cli
* Added optional mmproj parameter to TUI tests too
* Addressed review comments
* Added test to check multiple markers/images on cli
* Review docs v0.10.0 (#911)
* Updated index.md
* Moar updates to index.md
* Updated quickstart.md
* Updated support + example llamafiles
* Added example files and examples + minor fixes
* Updated structure
* Removed security
* Updated source installation
* Updated README_0.10.0, now frozen doc
* Removed ref to new_build_wip in whisperfile, make setup installs cosmocc
* Apply suggestion from @dpoulopoulos
  Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>
* Addressed review comments
* Addressed review comments #2
  ---------
  Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>
* Updated README.md, minor fix to docs/index.md
* Minor fixes to install llamafile binary
* Open next llama.cpp update PR to main
* Updated copyrights
* Removed old README, added 'based on' badges
* Better version handling (#913)
* Improve help (#914)
* Added per-mode help + nologo/ascii support
* If model is missing, bump to help for respective mode
* Update skill docs (#915)
* Updated skill not to use new_build_wip + improved it
* Removed stray new_build_wip reference
* Updated RELEASE.md for v0.10.0

---------
Co-authored-by: Peter Wilson <peter@mozilla.ai>
Co-authored-by: daavoo <daviddelaiglesiacastro@gmail.com>
Co-authored-by: angpt <anushrigupta@gmail.com>
Co-authored-by: wingenlit <63510314+wingenlit@users.noreply.github.com>
Co-authored-by: wingx <wingenlit@outlook.com>
Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>
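One entry above, "Implement chat in combined mode as an OpenAI client", refers to driving the bundled server through its OpenAI-compatible API. As a rough, hypothetical sketch (not the project's actual code), a multimodal request body for such an endpoint can be assembled like this; the field names follow the common OpenAI chat-completions schema, and the base64 data-URL convention for images is an assumption here:

```python
import base64
import json

def build_chat_payload(prompt, image_bytes=None):
    """Build an OpenAI-style /v1/chat/completions request body.

    An image, when given, is embedded as a base64 data URL, the
    convention used by OpenAI-compatible multimodal endpoints.
    """
    content = [{"type": "text", "text": prompt}]
    if image_bytes is not None:
        b64 = base64.b64encode(image_bytes).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": "data:image/jpeg;base64," + b64},
        })
    return {
        "model": "local",  # local servers typically accept any model name
        "messages": [{"role": "user", "content": content}],
        "temperature": 0.0,  # deterministic output, as in the t=0 tests above
    }

# Two content parts: one text, one image.
payload = build_chat_payload("Describe this image.", b"\xff\xd8\xff\xe0")
print(len(payload["messages"][0]["content"]))  # 2
```

Such a payload would be POSTed as JSON to the server's chat-completions route; the endpoint path and port are configuration details not shown here.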
aittalam added a commit that referenced this pull request on Mar 26, 2026:
(This commit replays the same squashed history as the Mar 19 commit above; only the additional entries are listed here.)
* Update llama.cpp.patches/README.md for new patches
  - Update httplib.h.patch → httplib.cpp.patch (code moved upstream)
  - Remove minja.hpp.patch (minja replaced with built-in jinja)
  - Add ngram-mod.cpp.patch documentation
  - Document common/license.cpp in llamafile-files
* Update whisper.cpp submodule to v1.8.3
* Refactored common functions in metal, vulkan, cuda into llamafile.c
* Fixed --fit issue + fit mmproj size + free mem (#920)

---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This PR re-introduces `--image` in CLI mode, so that users can pass one or more images to a multimodal model.
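Accepting a repeatable `--image` flag is a standard CLI pattern. As an illustrative sketch only (llamafile's real argument parser is C++ and works differently), here is how a repeatable image option behaves in Python's argparse, where `action="append"` collects every occurrence into a list:

```python
import argparse

def parse_cli(argv):
    """Parse a hypothetical CLI with a prompt and repeatable --image flags."""
    parser = argparse.ArgumentParser(prog="llamafile-cli-sketch")
    parser.add_argument("-p", "--prompt", required=True,
                        help="text prompt to send to the model")
    parser.add_argument("--image", action="append", default=[],
                        help="image file for the multimodal model; may be repeated")
    return parser.parse_args(argv)

args = parse_cli(["-p", "Compare these.", "--image", "a.jpg", "--image", "b.jpg"])
print(args.image)  # ['a.jpg', 'b.jpg']
```

With `action="append"`, omitting the flag leaves an empty list, and each repetition adds one path in command-line order, which is what "one or more images" requires.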