* Update llama.cpp submodule to b908baf1825b1a89afef87b09e22c32af2ca6548

  Updates patches and integration code for the new llama.cpp version:
  - Regenerated all patches for the updated upstream code
  - Added common_ngram-mod.cpp.patch (adds #include <algorithm>)
  - Added vendor_cpp-httplib_httplib.cpp.patch (XNU futex workaround moved from .h)
  - Added common/license.cpp stub for the LICENSES symbol
  - Removed obsolete vendor_minja_minja.hpp.patch (jinja now built-in)
  - Removed obsolete vendor_cpp-httplib_httplib.h.patch (code moved to .cpp)
  - Updated chatbot.h/cpp for the common_chat_syntax -> common_chat_parser_params rename
  - Removed the minja test from tests/BUILD.mk
* Updated license.cpp with the one generated by cmake in upstream llama.cpp
* Updated info about license.cpp in the patches README
* Removed minja from tests
* Updated references to minja in docs
* Fixed templating issue with Apertus
* Load the PEG parser in chatbot_main if one is provided
* Updated whisper.cpp submodule from v1.6.2-168 (6739eb83) to v1.8.3 (2eeeba56)
* Updated patch scripts and removed old patches
* Added whisperfile plus extra tools (mic2raw, mic2txt, stream, whisper-server)
* Added slurp
* Updated docs and man pages
---------
Co-authored-by: angpt <anushrigupta@gmail.com>
* Added CLI, SERVER, CHAT, and combined modes
* Removed the 'path does not exist' log
* Added server routes to main.cpp
* Fixed GPU log callbacks
* Added --nothink feature for CLI
* Refactored args and cleaned up FLAGS
llama.cpp update, Qwen3.5 Think Mode & CLI Improvements
llama.cpp Submodule Update
Updated to 7f5ee549683d600ad41db6a295a232cdd2d8eb9f
Updated associated patches (removed obsolete vendor_miniaudio_miniaudio.h.patch)
Qwen3.5 Think Mode Support
Proper handling of think/nothink mode in both chat and CLI modes
Uses common_chat_templates_apply() with enable_thinking parameter instead of manually constructing prompts
Correctly parses reasoning content using PEG parser with COMMON_REASONING_FORMAT_DEEPSEEK
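Based on the description above, the template call presumably looks something along these lines (a sketch only, reconstructed from memory of llama.cpp's common chat API; exact field names may differ in the pinned revision):

```cpp
// Sketch: render the chat prompt via the template machinery instead of
// manual string construction. `messages`, `tmpls`, and `nothink_flag`
// are assumed to exist in the surrounding chatbot code.
common_chat_templates_inputs inputs;
inputs.messages        = messages;       // accumulated chat history
inputs.enable_thinking = !nothink_flag;  // think vs. nothink mode
auto chat_params = common_chat_templates_apply(tmpls.get(), inputs);
// chat_params.prompt holds the rendered prompt; reasoning content is
// later split out by the PEG parser using COMMON_REASONING_FORMAT_DEEPSEEK.
```

Letting the template decide how `enable_thinking` is rendered avoids hard-coding model-specific markers such as `/nothink` into the prompt text.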
System Prompt Handling
Captures -p/--prompt value early in argument parsing (needed for combined mode where server parsing excludes -p)
/clear command now properly resets g_pending_file_content to prevent uploaded files from persisting after clear
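The early flag capture described above can be sketched as a simple pre-scan of argv (a minimal illustration; the helper name is hypothetical and the actual llamafile parsing logic differs):

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: scan argv for -p/--prompt before handing the
// arguments to a mode-specific parser that may not recognize them
// (e.g. the server parser in combined mode, which excludes -p).
// Returns the captured value, or an empty string if the flag is absent.
static std::string capture_prompt_flag(int argc, char **argv) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (!strcmp(argv[i], "-p") || !strcmp(argv[i], "--prompt")) {
            return argv[i + 1];  // the value follows the flag
        }
    }
    return "";
}
```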
Code Quality
Refactored cli_apply_chat_template() to return both prompt and parser params
Added documentation comments for subtle pointer lifetime and argument parsing behaviors
---------
Co-authored-by: Stuart Henderson <sthen@users.noreply.github.com> (OpenBSD supported versions update)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> (llama.cpp submodule update)
Adds integration tests for llamafiles:
- run a pre-built llamafile as well as the plain executable with a model passed as a parameter
- test TUI (piping inputs to the process), server (sending HTTP requests), CLI (passing prompts as params), and hybrid modes
- test plain text, multimodal, and tool calling with ad-hoc prompts
- test thinking vs. no-thinking mode
- test CPU vs. GPU
* Added timeout multiplier
* Added combined marker, improved combined tests
* Added check for GPU presence
* Added meaningful temperature test
* Fixed platforms where sh is needed
* Added retry logic to server requests
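Retry logic for requests against a server that may still be starting up can be sketched generically (an illustrative helper, not the actual test code; the name and defaults are assumptions):

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical retry helper: invoke `request` up to `max_attempts` times,
// sleeping with exponential backoff between failed attempts.
// Returns true as soon as one attempt succeeds, false if all fail.
static bool retry_request(const std::function<bool()> &request,
                          int max_attempts = 5,
                          std::chrono::milliseconds initial_delay =
                              std::chrono::milliseconds(100)) {
    auto delay = initial_delay;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (request()) {
            return true;
        }
        if (attempt < max_attempts) {
            std::this_thread::sleep_for(delay);
            delay *= 2;  // exponential backoff between retries
        }
    }
    return false;
}
```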
- Add sm_110f (Jetson Thor & family) and sm_121a (DGX Spark GB10) support for aarch64 platforms with CUDA 13.x
- Add sm_120f (RTX 5000 series, RTX PRO Blackwell) support for x86_64 platforms with CUDA 13.x
- Enable --compress-mode=size for optimized binary size on Blackwell GPUs
- Detect CUDA version and host architecture at build time

Co-authored-by: wingx <wingenlit@outlook.com>
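The added targets translate to nvcc flags along these lines (a sketch, not the actual build rule; the exact spelling depends on the build system, and the family-specific `f` suffix is only accepted by newer CUDA toolchains):

```
# aarch64 host, CUDA 13.x (assumed nvcc invocation)
nvcc -gencode arch=compute_110f,code=sm_110f \
     -gencode arch=compute_121a,code=sm_121a \
     --compress-mode=size ...

# x86_64 host, CUDA 13.x
nvcc -gencode arch=compute_120f,code=sm_120f \
     --compress-mode=size ...
```

Detecting the CUDA version and host architecture at build time lets the Makefile emit only the `-gencode` entries the installed toolchain actually supports.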
* Implemented chat in combined mode as an OpenAI client
* Implemented stop_tui in tests
* Fixed CLI tests using --verbose
* Accept non-UTF-8 chars in responses
* Simplified prompt for t=0
* Added patch for tools/server/server.cpp
* Redirected server output to /dev/null to avoid buffer fill-up with --verbose
* Added back support for --image in the CLI tool
* Added tests for multimodal CLI
* Added optional mmproj parameter to TUI tests too
* Addressed review comments
* Added test to check multiple markers/images on CLI
* Updated index.md
* More updates to index.md
* Updated quickstart.md
* Updated support + example llamafiles
* Added example files and examples + minor fixes
* Updated structure
* Removed security
* Updated source installation
* Updated README_0.10.0, now a frozen doc
* Removed reference to new_build_wip in whisperfile; make setup now installs cosmocc
* Apply suggestion from @dpoulopoulos
* Addressed review comments
* Addressed review comments #2
---------
Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>
* Added per-mode help + nologo/ascii support
* If the model is missing, fall back to the help for the respective mode
* Updated skill not to use new_build_wip + improved it
* Removed stray new_build_wip reference
A modern version of llamafile, rebuilt with the intention of keeping it aligned with recent versions of llama.cpp