
llamafile reloaded (v0.10.0)#867

Merged
aittalam merged 127 commits into main from new_build_wip on Mar 19, 2026

Conversation

@aittalam
Member

A modern version of llamafile, rebuilt to stay aligned with recent versions of llama.cpp.

aittalam and others added 6 commits February 25, 2026 16:55
* Update llama.cpp submodule to b908baf1825b1a89afef87b09e22c32af2ca6548

Updates patches and integration code for new llama.cpp version:
- Regenerated all patches for updated upstream code
- Added common_ngram-mod.cpp.patch (adds #include <algorithm>)
- Added vendor_cpp-httplib_httplib.cpp.patch (XNU futex workaround moved from .h)
- Added common/license.cpp stub for LICENSES symbol
- Removed obsolete vendor_minja_minja.hpp.patch (jinja now built-in)
- Removed obsolete vendor_cpp-httplib_httplib.h.patch (code moved to .cpp)
- Updated chatbot.h/cpp for common_chat_syntax -> common_chat_parser_params rename
- Removed minja test from tests/BUILD.mk

* Updated license.cpp with the one generated by cmake in upstream llama.cpp
* Updated info about license.cpp in patches' README
* Remove minja from tests
* Updated refs to minja in docs
* Fixed templating issue with Apertus
* Load the PEG parser in chatbot_main if one is provided
* Updated whisper.cpp submodule from v1.6.2-168 (6739eb83) to v1.8.3 (2eeeba56).
* Updated patches scripts + removed old patches
* Added whisperfile + extra tools (mic2raw, mic2txt, stream, whisper-server)
* Added slurp
* Updated docs and man pages

---------

Co-authored-by: angpt <anushrigupta@gmail.com>
* Add CLI, SERVER, CHAT, and combined modes
* Removed log 'path  does not exist'
* Added server routes to main.cpp
* Fixing GPU log callbacks
* Added --nothink feature for CLI
* Refactored args + cleaned FLAGS
@github-actions github-actions bot added the devops label Mar 2, 2026
aittalam and others added 20 commits March 3, 2026 19:39
llama.cpp update, Qwen3.5 Think Mode & CLI Improvements

llama.cpp Submodule Update

    Updated to 7f5ee549683d600ad41db6a295a232cdd2d8eb9f
    Updated associated patches (removed obsolete vendor_miniaudio_miniaudio.h.patch)

Qwen3.5 Think Mode Support

    Proper handling of think/nothink mode in both chat and CLI modes
    Uses common_chat_templates_apply() with enable_thinking parameter instead of manually constructing prompts
    Correctly parses reasoning content using PEG parser with COMMON_REASONING_FORMAT_DEEPSEEK
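The reasoning parsing above relies on llama.cpp's PEG parser with COMMON_REASONING_FORMAT_DEEPSEEK. As a rough illustration of what that format separation does (a simplified Python sketch, not the actual PEG grammar; the function name is illustrative):

```python
import re

def split_reasoning(text):
    """Separate DeepSeek-style <think>...</think> reasoning from the
    final answer. The PR uses llama.cpp's PEG parser with
    COMMON_REASONING_FORMAT_DEEPSEEK; this regex is a simplification."""
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    # No reasoning block: the whole text is the answer.
    return "", text.strip()

reasoning, answer = split_reasoning(
    "<think>The user greets me.</think>Hello! How can I help?"
)
```

In nothink mode the model emits no `<think>` block, so the same function degrades gracefully to returning the full text as the answer.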

System Prompt Handling

    Captures -p/--prompt value early in argument parsing (needed for combined mode where server parsing excludes -p)
    /clear command now properly resets g_pending_file_content to prevent uploaded files from persisting after clear
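The early capture of `-p`/`--prompt` works around the server parser in combined mode not accepting `-p`. A minimal sketch of the pre-scan idea in Python (the helper name is hypothetical; the real code does this in C++ during argument parsing):

```python
def capture_prompt(argv):
    """Pre-scan argv for -p/--prompt before full argument parsing,
    since the server-side parser in combined mode does not accept -p.
    Returns (prompt or None, argv with the flag and its value removed)."""
    remaining, prompt = [], None
    it = iter(argv)
    for arg in it:
        if arg in ("-p", "--prompt"):
            prompt = next(it, None)  # consume the flag's value
        else:
            remaining.append(arg)
    return prompt, remaining

prompt, rest = capture_prompt(["--port", "8080", "-p", "You are terse."])
```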

Code Quality

    Refactored cli_apply_chat_template() to return both prompt and parser params
    Added documentation comments for subtle pointer lifetime and argument parsing behaviors


---------

Co-authored-by: Stuart Henderson <sthen@users.noreply.github.com> (OpenBSD supported versions update)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> (llama.cpp submodule update)
Adds integration tests for llamafiles:

    run a pre-built llamafile as well as the plain executable with a model passed as a parameter
    test TUI (piping inputs to the process), server (sending HTTP requests), CLI (passing prompts as params), and hybrid modes
    test plain text, multimodal, and tool calling with ad-hoc prompts
    test thinking vs no-thinking mode
    test CPU vs GPU


* Added timeout multiplier

* Added combined marker, improved combined tests

* Added check for GPU presence

* Added meaningful temperature test

* Fixed platforms where sh is needed

* Adding retry logic to server requests
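The retry logic compensates for the server still warming up when the first requests arrive. A self-contained sketch of the idea with exponential backoff (the helper name and delays are illustrative):

```python
import time

def with_retries(fn, attempts=5, delay=0.2):
    """Retry a server request a few times before failing, as the
    integration tests do while the llamafile server is booting."""
    last_err = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as err:  # e.g. ConnectionError before the server is up
            last_err = err
            time.sleep(delay * (2 ** i))  # exponential backoff
    raise last_err

# Simulate a server that refuses the first two connections.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server not up yet")
    return "ok"

result = with_retries(flaky, attempts=5, delay=0.01)
```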
- Add sm_110f (Jetson Thor & family) and sm_121a (DGX Spark GB10)
  support for aarch64 platforms with CUDA 13.x
- Add sm_120f (RTX 5000 series, RTX PRO Blackwell) support for x86_64
  platforms with CUDA 13.x
- Enable --compress-mode=size for optimized binary size on Blackwell GPUs
- Detect CUDA version and host architecture at build time

Co-authored-by: wingx <wingenlit@outlook.com>
* Implement chat in combined mode as an OpenAI client
* Implemented stop_tui in tests
* Fixed CLI tests using --verbose
* Accept non-utf8 chars in responses
* Simplify prompt for t=0
* Added patch for tools/server/server.cpp
* Server output > devnull to avoid buffer fill up with --verbose
* Added back support for --image in CLI tool
* Added tests for multimodal cli
* Added optional mmproj parameter to TUI tests too
* Addressed review comments
* Added test to check multiple markers/images on cli
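Chat in combined mode talks to the in-process server over its OpenAI-compatible API. A minimal sketch of such a client using only the standard library (the base URL and model name are placeholders; the endpoint path follows the OpenAI chat completions convention, and the lenient decode mirrors the "accept non-UTF-8 chars in responses" fix):

```python
import json
import urllib.request

def build_chat_request(messages, base_url="http://127.0.0.1:8080"):
    """Build an OpenAI-style chat completion request for a local
    llamafile server."""
    payload = {"model": "local", "messages": messages, "stream": False}
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def chat(messages, base_url="http://127.0.0.1:8080"):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(messages, base_url)) as r:
        raw = r.read().decode("utf-8", errors="replace")  # tolerate non-UTF-8 bytes
    return json.loads(raw)["choices"][0]["message"]["content"]

req = build_chat_request([{"role": "user", "content": "hi"}])
```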
* Updated index.md

* Moar updates to index.md

* Updated quickstart.md

* Updated support + example llamafiles

* Added example files and examples + minor fixes

* Updated structure

* Removed security

* Updated source installation

* Updated README_0.10.0, now frozen doc

* Removed ref to new_build_wip in whisperfile, make setup installs cosmocc

* Apply suggestion from @dpoulopoulos

Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>

* Addressed review comments

* Addressed review comments #2

---------

Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>
* Added per-mode help + nologo/ascii support
* If model is missing, bump to help for respective mode
* Updated skill not to use new_build_wip + improved it
* Removed stray new_build_wip reference
@aittalam aittalam marked this pull request as ready for review March 19, 2026 11:12
@aittalam aittalam merged commit 4cc1a5f into main Mar 19, 2026
3 checks passed
@aittalam aittalam deleted the new_build_wip branch March 19, 2026 11:13

4 participants