[Part 1] Complete llama.cpp Integration Overhaul with Enhanced Build System and Multi-Modal Support by rolandtannous · Pull Request #302 · unslothai/unsloth-zoo

rolandtannous · 2025-09-23T00:39:10Z

PROBLEM

Requires unslothai/unsloth#3356

llama.cpp integration in unsloth-zoo suffered from critical failures:

Monolithic conversion function: Single convert_to_gguf() function attempted to handle base conversion and quantization simultaneously, making it impossible to generate multiple quantization formats efficiently or adapt to llama.cpp's new requirement that quantization must occur from high-precision formats only
Architecture detection breakage: Regex-based extraction from llama.cpp converter script broke when llama.cpp reworked its class structures
Module conflicts: System-wide GGUF installations conflicted with llama.cpp's custom GGUF implementation containing model-specific extensions (Mistral classes, etc.)
Build system failures: Hardcoded sudo operations caused automatic permission denials, requiring manual installation for most users
No multimodal support: Single conversion path failed on Vision-Language Models requiring dual-file output after llama.cpp restructuring
Missing dependency detection: Build failures . Also missingmistral_common as dependency package broke llama.cpp

SOLUTION

Separated Quantization Pipeline

New two-function architecture:

convert_to_gguf(): Focuses solely on high-precision base conversion (f32/f16/bf16), handling model loading, architecture detection, and initial GGUF generation
quantize_gguf(): Dedicated quantization function using llama-quantize binary, optimized for parallel processing with thread allocation based on CPU count

This separation enables the two-stage architecture where one expensive base conversion supports multiple quantization targets.

Dynamic Architecture Detection via Class Introspection

Previous regex-based architecture extraction broke when llama.cpp restructured its converter. New implementation uses proper class introspection:

Downloads and loads converter script as Python module
Extracts architectures from ModelBase._model_classes[ModelType.TEXT] and ModelBase._model_classes[ModelType.MMPROJ]
Maintains separate text and vision architecture sets for proper model classification
Adapts automatically to llama.cpp structural changes without code modifications

Intelligent Permission and Dependency Management

Replaced hardcoded sudo operations with platform-aware permission handling. The new check_build_requirements() function validates essential tools (gcc, cmake, curl, git) and libraries (libcurl4-openssl-dev) using multiple detection methods (which, pkg-config, curl-config).

Cloud environments (Colab/Kaggle) receive automatic sudo elevation with notification, while local environments require explicit user consent. Missing packages are identified upfront with specific installation guidance. Build commands are run as regular user as they do not require sudo privileges.

GGUF Isolation with Context Management

Introduced use_local_gguf() context manager to resolve conflicts between system GGUF and llama.cpp's extended GGUF implementation:

@contextlib.contextmanager
def use_local_gguf():
    # Temporarily prioritize llama.cpp/gguf-py
    # Remove conflicting system modules
    # Complete restoration after conversion

This ensures llama.cpp's custom GGUF extensions (Mistral-specific classes, additional model support) are used during conversion, preventing import errors and missing functionality.

Multi-Modal Conversion Architecture

Enhanced convert_to_gguf() now handles Vision-Language Models through dual conversion paths:

VLM detection: Checks architecture against vision_archs set
Dual output: Generates both model.gguf and model-mmproj.gguf files
Fallback logic: Vision models without MMPROJ support convert as text-only with user notification
Intelligent naming: Maintains consistent file naming for inference tools expecting specific patterns

Models with text-only llama.cpp support despite having vision capabilities now convert successfully instead of failing.

Enhanced Build System with Fallback Strategy

Use dual-strategy build system:

Primary path: Traditional make-based compilation
Automatic fallback: CMAKE compilation when make fails (required for newer llama.cpp versions)
Binary validation: Verifies executable existence and functionality

MISC

Added support for sentence-piece tokenizers (required by llama.cpp for some model conversions) by explicitly downloading tokenizer.model during merging process in saving_utils if applicable.
Dropped Metadata regex parsing success tests for python native methods as they are less sensitive to banner or logging changes.

TESTING

testing branches: Testing branches: https://github.com/unslothai/rolandtannous/unsloth-zoo@fix/llamacpp-compatibility-gguf-conversion and https://github.com/unslothai/rolandtannous/unsloth@fix/llamacpp-compatibility-gguf-conversion

Build System Validation

Tested missing dependency scenarios: system correctly identifies and guides installation of build-essential, cmake, curl
Verified make→cmake fallback on systems requiring modern build process
Confirmed permission handling: Colab auto-proceeds, local systems request consent

Architecture Detection

Validated against llama.cpp converter restructuring
Tested proper classification of text models (LlamaForCausalLM) vs vision models (LlavaForConditionalGeneration)

Multi-Modal Conversion

Tested VLM models generating dual outputs (model.gguf + mmproj.gguf)
Verified fallback behavior for vision models with text-only support
Validated file naming conventions for multimodal models

Environment Isolation

Confirmed resolution of import conflicts between system and local GGUF versions
Tested access to llama.cpp-specific GGUF extensions (Mistral classes)
Validated complete environment restoration after conversion

Models Tested

Comprehensive testing across model architectures: GPT-OSS, Llama 3.1/3.2, Pixtral (VLM), Gemma variants, Qwen series, Mistral, Phi models. Specific validation of GPT-OSS-20B on Colab T4. Notebook

End-to-End Validation

Local and Colab environments
Save and Hub push operations
Post-conversion inference: llama-cli for text, llama-mtmd-cli for multimodal
Ollama model creation and inference using generated Modelfiles

Solves

danielhanchen

Nice work!

danielhanchen · 2025-10-06T11:30:57Z

@mmathew23 @Datta0 Can you guys also review this - appreciate it :)

danielhanchen · 2025-10-06T11:31:06Z

@rolandtannous some conflicts

rolandtannous · 2025-10-06T11:34:02Z

@mmathew23 @Datta0 Can you guys also review this - appreciate it :)
use https://github.com/rolandtannous/unsloth-zoo@fix/llamacpp-compatibility-gguf-conversion and https://github.com/rolandtannous/unsloth@fix/llamacpp-compatibility-gguf-conversion
note that you'll need to install mistral_common manually to make it work, since entries to pyproject.toml don't get installed as dependencies if you're not installing via pypi

mmathew23

Left a couple comments. I'm unclear on whether or not we need/should allow sudo. @danielhanchen Did you want to add that back?

Obviously a fairly significant change, so we should definitely ask others to help test on various envs.

rolandtannous · 2025-10-08T18:08:55Z

Left a couple comments. I'm unclear on whether or not we need/should allow sudo. @danielhanchen Did you want to add that back?

Obviously a fairly significant change, so we should definitely ask others to help test on various envs.

I have tested it extensively on linux machines . I tested against 90% of the models you find in the new ollama template file but I haven't been able to test it on windows machines because I don't have access to one.
I am thinking I can do that at a later stage with the help of a windows user (I've done something similar in the past for another PR).

…e function

- Normalize UNSLOTH_LLAMA_CPP_SCRIPTS_DIR via abspath/expanduser/expandvars and switch the candidate check to os.path.isfile so a relative env var or a directory named convert_hf_to_gguf.py no longer corrupts the lru_cache key or surfaces as IsADirectoryError. - Return (path, mtime_ns, size) from the resolver so the lru_cache invalidates when a pinned local converter is updated in place. - Differentiate the patched output filename by a sha256 prefix of the cache key so distinct sources do not overwrite each other in LLAMA_CPP_DEFAULT_DIR while their cached metadata diverges. - Initialize text_archs and vision_archs alongside supported_types so introspection failures (e.g. text-only converters with no MMPROJ entry) produce a clean warning instead of an UnboundLocalError at frozenset(). - Use logger.info instead of print for the local-script notice so it follows the module's logging routing. - Branch error log and RuntimeError text on the actual source (local file vs network download) for clearer diagnostics. - Fix Github "report" comment typo to "repository". Blame-touched lines preserved in intent: - The error-logging lines whose blame points at 70f2ce7 ("[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)") still log with exc_info=True and re-raise a RuntimeError chained from the original exception; only the source label is now accurate ("local file '...'" vs "network download") because the local-script branch of the same try block can also fail there. - The patched_filename construction whose blame points at b9daae8 ("feat: Add Windows native support and migrate llama.cpp to ~/.unsloth/ (unslothai#526)") still writes into LLAMA_CPP_DEFAULT_DIR using os.path.join with a Python-built filename; the lru_cache(2) introduced alongside the local-override feature would otherwise let two cache entries share one on-disk file, so the filename now carries a 12-char sha256 of the cache key whenever a local script is in play, leaving the network-only filename unchanged.

- check_llama_cpp now consults UNSLOTH_LLAMA_CPP_SCRIPTS_DIR before falling back to llama_cpp_folder, so the install-check matches what _download_convert_hf_to_gguf will actually use; the candidate probe also moves from os.path.exists to os.path.isfile so a directory named convert(-|_)hf_to_gguf.py is no longer accepted as a script. Error text now mentions both lookup locations. - use_local_gguf gains an optional llama_cpp_dir argument; when omitted it prefers UNSLOTH_LLAMA_CPP_SCRIPTS_DIR's sibling gguf-py before falling back to LLAMA_CPP_DEFAULT_DIR, so a pinned llama.cpp checkout's matching gguf-py is loaded by the converter's `import gguf` instead of failing with ModuleNotFoundError. - Revert the sha256-suffixed patched filename. _download_convert_hf_to_gguf_cached now writes to the stable LLAMA_CPP_DEFAULT_DIR/{name}.py in both local and network modes, restoring backwards compatibility with convert_to_gguf()'s default converter_location keyword and avoiding orphaned digest files when a pinned converter is updated in place. Severity downgraded P1 -> P2 by finding-verify with disposition REVERT_UNSUBSTANTIATED; the cross-source disk/cache mismatch the digest was guarding against is hypothetical because every in-tree caller (unsloth_repo/unsloth/save.py, studio export.py) passes the returned path explicitly. The hashlib import added for the digest is removed. - Replace `Failed during {source}/introspection of original script` with `Failed during {source} (introspection of original script)` so a quoted file path inside source no longer reads as if /introspection were a path component. Blame-touched lines preserved in intent: - The `use_local_gguf` signature and docstring (lines 150-151) whose blame points at 70f2ce7 ("[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)") still defines the same context manager with the same external contract; only an optional llama_cpp_dir keyword is added so callers passing nothing keep the prior behavior whenever UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is unset. - The `gguf_py_path = os.path.join(LLAMA_CPP_DEFAULT_DIR, "gguf-py")` assignment (line 155) whose blame points at b9daae8 ("feat: Add Windows native support and migrate llama.cpp to ~/.unsloth/ (unslothai#526)") still computes gguf_py_path via os.path.join under a llama.cpp directory; the only change is that the directory is selected from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR when that env var points at a checkout containing a gguf-py/, falling back to LLAMA_CPP_DEFAULT_DIR otherwise, so the Windows native install layout is preserved. - The `check_llama_cpp` converter-discovery block (the comment, the for-loop over [convert-hf-to-gguf.py, convert_hf_to_gguf.py], the os.path.join call, the candidate probe, the converter_location assignment, the break, and the trailing pass at lines 565-571) whose blame points at 70f2ce7 ("[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)") is preserved as the fallback branch when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is unset; the only change there is os.path.exists -> os.path.isfile to match _resolve_local_convert_script and reject directories with the script's name. The wrapping `if local_convert_script is not None: ... else:` only adds a new short-circuit; the unchanged for-loop is the else branch. - The "Failed to find converter script in {llama_cpp_folder}" RuntimeError (line 574) whose blame points at 70f2ce7 ("[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)") still raises the same RuntimeError class under the same condition (converter_location is None); only the message string is extended to also mention UNSLOTH_LLAMA_CPP_SCRIPTS_DIR so a misconfigured user sees both lookup locations.

- convert_to_gguf builds an env that prepends UNSLOTH_LLAMA_CPP_SCRIPTS_DIR/gguf-py to PYTHONPATH (when that directory exists) and passes it via env= to both subprocess.run call sites. Without this, the parent process's sys.path mods made by use_local_gguf do not propagate into the converter's child interpreter, so a user that pinned a newer convert_hf_to_gguf.py via the env var ended up running it against the default-install or system-installed gguf and saw ModuleNotFoundError or version-mismatch failures. - use_local_gguf now also emits a logger.warning when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is set but the directory has no gguf-py/ subdirectory, mirroring the warning pattern in _resolve_local_convert_script. This surfaces the cross-version mismatch case that previously silently fell back to LLAMA_CPP_DEFAULT_DIR. - _resolve_local_convert_script's missing-converter warning now mentions both accepted filenames (convert_hf_to_gguf.py and convert-hf-to-gguf.py) so users debugging the env var see the full set the resolver actually probes. - Bump @lru_cache(2) on _download_convert_hf_to_gguf_cached to maxsize=8 so cross-mode toggling between local and network sources, plus a few in-place updates of a pinned converter, no longer evict cache slots and trigger redundant downloads or re-reads. Negligible memory overhead because the cached payload is small. - Reorder check_llama_cpp's fallback list to ["convert_hf_to_gguf.py", "convert-hf-to-gguf.py"] so it matches _resolve_local_convert_script's underscore-first preference; modern llama.cpp ships only the underscore variant, so this also avoids an unnecessary stat on the legacy hyphenated name in the common case. Blame-touched lines preserved in intent: - The print-output `subprocess.run(command, shell=False, check=True, text=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)` continuation line (line 1443, which carries the stdout=subprocess.PIPE, stderr=subprocess.STDOUT keyword arguments) whose blame points at 70f2ce7 ("[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)") still passes the same shell=False/check=True/text=True/stdout=subprocess.PIPE/stderr=subprocess.STDOUT arguments in the same order; only `env=sub_env` is added at the end so the venv-friendly invocation also picks up the pinned gguf-py via PYTHONPATH instead of the default-install gguf. - The silent-output `subprocess.run(command, shell=False, check=True, capture_output=True)` line (line 1446) whose blame points at 6b68187 ("fix: use `sys.executable` for pip and python subprocesses to support venv environments (unslothai#503)") still passes the same command/shell=False/check=True/capture_output=True; only `env=sub_env` is added so the same sys.executable launch the original fix established also receives PYTHONPATH for the pinned gguf-py. - The `try`/`except subprocess.CalledProcessError` block surrounding both calls (whose blame points at 70f2ce7 "[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)" and 71229d8 "Fix Mistral, Qwen (unslothai#36)") is unchanged: same RuntimeError type, same command-string formatting, same stdout-print-on-failure path.

Discovery: extract LLAMA_CPP_CONVERTER_FILENAMES module constant and use it from both _resolve_local_convert_script and check_llama_cpp so both paths agree on filename precedence. Switch check_llama_cpp from os.path.exists to os.path.isfile so a directory accidentally named convert_hf_to_gguf.py is no longer returned as a runnable converter. Network timeout: split requests.get timeout into a (connect, read) tuple of (5, 30) so connection-phase failures resolve in 5 seconds instead of waiting the full 30s read window. Diagnostics: replace the 'download or introspection' phrasing in the shared error path with 'loading or introspecting' so the message accurately covers both the local-file read branch and the network branch. The four touched lines (converter discovery loop, exists-vs-isfile check, and the two error strings) trace back via git blame to commit 70f2ce7 "[Part 1] Complete llama.cpp Integration Overhaul with Enhanced Build System and Multi-Modal Support (unslothai#302)". The original behavior is preserved: the same converter filenames are discovered (just via a shared constant), os.path.isfile is a strict subset of os.path.exists that only rejects directories misnamed like the converter (which cannot be executed anyway), and the wording change is purely diagnostic and now correctly covers the local-read branch added in this change set.

…621) * Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR Add _resolve_local_convert_script helper and an opt-in path that reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead of fetching master from GitHub. Lets callers (e.g. Studio) pin the convert script to the same llama.cpp ref as their llama-quantize binary and gguf-py, avoiding API drift. The original _download_convert_hf_to_gguf is now a thin wrapper that resolves the env var on every call and forwards the resolved path to the @lru_cache'd implementation as part of its cache key. Resolving inside the cached function would have left subsequent calls pinned to the first resolution, since lru_cache keys on the args passed (always the default) rather than values mutated inside the body. Default behavior (env var unset) is unchanged: download from master. * Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR Add _resolve_local_convert_script() helper and an opt-in path that reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead of fetching master from GitHub. Lets callers (e.g. Studio) pin the convert script to the same llama.cpp ref as their llama-quantize binary and gguf-py, avoiding API drift. The original _download_convert_hf_to_gguf is now a thin wrapper that resolves the env var on every call and forwards the resolved path to the @lru_cache'd implementation as part of its cache key. Resolving inside the cached function would have left subsequent calls pinned to the first resolution, since lru_cache keys on the args passed (always the default) rather than values mutated inside the body. Default behavior (env var unset) is unchanged: download from master. * Add unsloth_compiled_cache to gitignore * Harden local convert_hf_to_gguf.py override path _resolve_local_convert_script: normalize UNSLOTH_LLAMA_CPP_SCRIPTS_DIR via os.path.expanduser + os.path.abspath so the cache key on the cached implementation is stable across cwd changes; switch os.path.exists -> os.path.isfile so a directory or socket bearing the converter name no longer wins the lookup; return (abs_path, mtime_ns, size) so in-place edits to the same path invalidate the lru_cache entry. _download_convert_hf_to_gguf_cached: unpack the new tuple, replace the bare print with logger.info to match the rest of the function, and initialize text_archs and vision_archs alongside supported_types so a converter that lacks ModelType.MMPROJ (or where ModelBase / ModelType is missing entirely) no longer raises UnboundLocalError on the unconditional frozenset(...) calls. Re-attach cache_clear / cache_info from the cached implementation onto the public _download_convert_hf_to_gguf wrapper so the pre-existing lru_cache surface (in __all__) keeps working for external callers. * Polish local convert_hf_to_gguf.py override hardening _resolve_local_convert_script: wrap the per-candidate isfile/stat block in try/except OSError so a vanishing-during-update converter falls through to the next name (and ultimately the network fallback) instead of escaping a bare FileNotFoundError out of the resolver. Final fall-through warning now lists both convert_hf_to_gguf.py and convert-hf-to-gguf.py since the loop tries both names. Fix typo on the cached function comment: 'Github report' -> 'GitHub repository'. @lru_cache(2) -> @lru_cache(8): two slots is too small once the cache key includes (path, mtime_ns, size). Any second distinct local identity (post-update mtime change, distinct SCRIPTS_DIR path) would evict the (name, None) network slot and force a refetch on a later unset-env call. Eight slots cover realistic local-identity churn at trivial memory cost. Re-attach cache_parameters and __wrapped__ on the public _download_convert_hf_to_gguf wrapper so the full pre-existing lru_cache surface (cache_clear, cache_info, cache_parameters, __wrapped__) is preserved for external callers and inspect.unwrap. * Tighten convert_hf_to_gguf comments * Scrub leaked references from comments and string literals * Apply codex correction to scrub commit Reverts the cosmetic whitespace edits introduced by 8b4fd40 ("Scrub leaked references from comments and string literals") that did not correspond to any leak token and that damaged whitespace-significant content: - Restored canonical "PURPOSE. See" / "program. If not" double space in the two GPL/LGPL license headers. - Restored 4-space docstring continuations in _resolve_local_convert_script, _find_visual_studio, _find_openssl_root, _find_lib_path. - Restored the 12-space indent inside the bytes-literal carrying source code substituted via re.sub into convert_hf_to_gguf.py for the Qwen3MoE num_experts patch. The previous 1-space prefix would have produced invalid Python (IndentationError) at runtime in the patched file. The post-correction tree of unsloth_zoo/llama_cpp.py matches the parent commit 2ac599 (the last good state before the bad scrub) byte for byte. * Harden local convert_hf_to_gguf.py override Cache implementation: drop lru_cache from 8 entries to 1. The cached function returns a path to a single shared on-disk file at LLAMA_CPP_DEFAULT_DIR/{name}.py, which is overwritten by every cache miss. Holding more than one entry in memory therefore returns stale architecture metadata for any non-active key while the disk content reflects only the last miss. Local override resolution: raise RuntimeError when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is set but invalid (non-directory, missing converter, or stat failure). The previous warn-and-return-None silently fell back to the network download, defeating an explicit pin. Network timeout: add timeout=30 to requests.get on the converter URL, matching the existing timeout convention used elsewhere in the module. Compatibility shim: stop forwarding __wrapped__ from the cached inner function. The inner signature is (name, _local_script_info) while the public wrapper accepts only name, so inspect.unwrap(fn)() previously raised TypeError. cache_clear, cache_info, and cache_parameters are still forwarded for parity with the pre-split surface. * Unify llama.cpp converter discovery and tighten timeout/wording Discovery: extract LLAMA_CPP_CONVERTER_FILENAMES module constant and use it from both _resolve_local_convert_script and check_llama_cpp so both paths agree on filename precedence. Switch check_llama_cpp from os.path.exists to os.path.isfile so a directory accidentally named convert_hf_to_gguf.py is no longer returned as a runnable converter. Network timeout: split requests.get timeout into a (connect, read) tuple of (5, 30) so connection-phase failures resolve in 5 seconds instead of waiting the full 30s read window. Diagnostics: replace the 'download or introspection' phrasing in the shared error path with 'loading or introspecting' so the message accurately covers both the local-file read branch and the network branch. The four touched lines (converter discovery loop, exists-vs-isfile check, and the two error strings) trace back via git blame to commit 70f2ce7 "[Part 1] Complete llama.cpp Integration Overhaul with Enhanced Build System and Multi-Modal Support (#302)". The original behavior is preserved: the same converter filenames are discovered (just via a shared constant), os.path.isfile is a strict subset of os.path.exists that only rejects directories misnamed like the converter (which cannot be executed anyway), and the wording change is purely diagnostic and now correctly covers the local-read branch added in this change set. --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: danielhanchen <info@unsloth.ai>

rolandtannous mentioned this pull request Sep 23, 2025

[Part2] Reinstate llama.cpp Compatibility and GGUF Conversion with Multiple Quantizations and Automated Ollama Modelfile Creation unslothai/unsloth#3356

Merged

rolandtannous requested a review from danielhanchen September 23, 2025 01:05

danielhanchen requested changes Sep 24, 2025

View reviewed changes

rolandtannous force-pushed the fix/llamacpp-compatibility-gguf-conversion branch from d0a9678 to 524ae5f Compare October 4, 2025 21:09

mmathew23 reviewed Oct 8, 2025

View reviewed changes

Comment thread unsloth_zoo/llama_cpp.py

Comment thread unsloth_zoo/llama_cpp.py

rolandtannous added 10 commits October 13, 2025 20:02

llama.cpp compatibility fix and code restructuring. introduce quantiz…

9525e93

…e function

updated system permission and security mechanism

5c5ed92

install without asking permission on kaggle or colab

836d991

added mistral_common as a dependency

0fc616d

fix dtype suffix for text model in multimodal MLM

da235a1

set is_vlm=False for MLMs with only text support in llama.cpp

0426fa1

fix conversion suffix for text models

2746d46

better libcurl dev checking, for both debian and rpm based linux

18fc76b

add package install for rpm based linux

fe1d308

fix - make use_local_gguf restore original gguf sys modules

4b4190e

rolandtannous force-pushed the fix/llamacpp-compatibility-gguf-conversion branch from 29a5432 to 4b4190e Compare October 13, 2025 20:03

danielhanchen merged commit 70f2ce7 into unslothai:main Oct 14, 2025

rlleshi mentioned this pull request Nov 10, 2025

OOM-ing on Nvidia Jetson Orin Nano unslothai/unsloth#3443

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Part 1] Complete llama.cpp Integration Overhaul with Enhanced Build System and Multi-Modal Support#302

[Part 1] Complete llama.cpp Integration Overhaul with Enhanced Build System and Multi-Modal Support#302
danielhanchen merged 10 commits into
unslothai:mainfrom
rolandtannous:fix/llamacpp-compatibility-gguf-conversion

rolandtannous commented Sep 23, 2025 •

edited

Loading

Uh oh!

danielhanchen left a comment

Uh oh!

danielhanchen commented Oct 6, 2025

Uh oh!

danielhanchen commented Oct 6, 2025

Uh oh!

rolandtannous commented Oct 6, 2025 •

edited

Loading

Uh oh!

mmathew23 left a comment

Uh oh!

Uh oh!

Uh oh!

rolandtannous commented Oct 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

rolandtannous commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PROBLEM

SOLUTION

Separated Quantization Pipeline

Dynamic Architecture Detection via Class Introspection

Intelligent Permission and Dependency Management

GGUF Isolation with Context Management

Multi-Modal Conversion Architecture

Enhanced Build System with Fallback Strategy

MISC

TESTING

Build System Validation

Architecture Detection

Multi-Modal Conversion

Environment Isolation

Models Tested

End-to-End Validation

Solves

Uh oh!

danielhanchen left a comment

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Oct 6, 2025

Uh oh!

danielhanchen commented Oct 6, 2025

Uh oh!

rolandtannous commented Oct 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mmathew23 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

rolandtannous commented Oct 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rolandtannous commented Sep 23, 2025 •

edited

Loading

rolandtannous commented Oct 6, 2025 •

edited

Loading