Skip to content

[Part 1] Complete llama.cpp Integration Overhaul with Enhanced Build System and Multi-Modal Support#302

Merged
danielhanchen merged 10 commits into
unslothai:mainfrom
rolandtannous:fix/llamacpp-compatibility-gguf-conversion
Oct 14, 2025
Merged

[Part 1] Complete llama.cpp Integration Overhaul with Enhanced Build System and Multi-Modal Support#302
danielhanchen merged 10 commits into
unslothai:mainfrom
rolandtannous:fix/llamacpp-compatibility-gguf-conversion

Conversation

@rolandtannous

@rolandtannous rolandtannous commented Sep 23, 2025

Copy link
Copy Markdown

PROBLEM

Requires unslothai/unsloth#3356

llama.cpp integration in unsloth-zoo suffered from critical failures:

  • Monolithic conversion function: Single convert_to_gguf() function attempted to handle base conversion and quantization simultaneously, making it impossible to generate multiple quantization formats efficiently or adapt to llama.cpp's new requirement that quantization must occur from high-precision formats only
  • Architecture detection breakage: Regex-based extraction from llama.cpp converter script broke when llama.cpp reworked its class structures
  • Module conflicts: System-wide GGUF installations conflicted with llama.cpp's custom GGUF implementation containing model-specific extensions (Mistral classes, etc.)
  • Build system failures: Hardcoded sudo operations caused automatic permission denials, requiring manual installation for most users
  • No multimodal support: Single conversion path failed on Vision-Language Models requiring dual-file output after llama.cpp restructuring
  • Missing dependency detection: Build failures . Also missingmistral_common as dependency package broke llama.cpp

SOLUTION

Separated Quantization Pipeline

New two-function architecture:

  • convert_to_gguf(): Focuses solely on high-precision base conversion (f32/f16/bf16), handling model loading, architecture detection, and initial GGUF generation
  • quantize_gguf(): Dedicated quantization function using llama-quantize binary, optimized for parallel processing with thread allocation based on CPU count

This separation enables the two-stage architecture where one expensive base conversion supports multiple quantization targets.

Dynamic Architecture Detection via Class Introspection

Previous regex-based architecture extraction broke when llama.cpp restructured its converter. New implementation uses proper class introspection:

  • Downloads and loads converter script as Python module
  • Extracts architectures from ModelBase._model_classes[ModelType.TEXT] and ModelBase._model_classes[ModelType.MMPROJ]
  • Maintains separate text and vision architecture sets for proper model classification
  • Adapts automatically to llama.cpp structural changes without code modifications

Intelligent Permission and Dependency Management

Replaced hardcoded sudo operations with platform-aware permission handling. The new check_build_requirements() function validates essential tools (gcc, cmake, curl, git) and libraries (libcurl4-openssl-dev) using multiple detection methods (which, pkg-config, curl-config).

Cloud environments (Colab/Kaggle) receive automatic sudo elevation with notification, while local environments require explicit user consent. Missing packages are identified upfront with specific installation guidance. Build commands are run as regular user as they do not require sudo privileges.

GGUF Isolation with Context Management

Introduced use_local_gguf() context manager to resolve conflicts between system GGUF and llama.cpp's extended GGUF implementation:

@contextlib.contextmanager
def use_local_gguf():
    # Temporarily prioritize llama.cpp/gguf-py
    # Remove conflicting system modules
    # Complete restoration after conversion

This ensures llama.cpp's custom GGUF extensions (Mistral-specific classes, additional model support) are used during conversion, preventing import errors and missing functionality.

Multi-Modal Conversion Architecture

Enhanced convert_to_gguf() now handles Vision-Language Models through dual conversion paths:

  • VLM detection: Checks architecture against vision_archs set
  • Dual output: Generates both model.gguf and model-mmproj.gguf files
  • Fallback logic: Vision models without MMPROJ support convert as text-only with user notification
  • Intelligent naming: Maintains consistent file naming for inference tools expecting specific patterns

Models with text-only llama.cpp support despite having vision capabilities now convert successfully instead of failing.

Enhanced Build System with Fallback Strategy

Use dual-strategy build system:

  • Primary path: Traditional make-based compilation
  • Automatic fallback: CMAKE compilation when make fails (required for newer llama.cpp versions)
  • Binary validation: Verifies executable existence and functionality

MISC

  • Added support for sentence-piece tokenizers (required by llama.cpp for some model conversions) by explicitly downloading tokenizer.model during merging process in saving_utils if applicable.
  • Dropped Metadata regex parsing success tests for python native methods as they are less sensitive to banner or logging changes.

TESTING

testing branches: Testing branches: https://github.com/unslothai/rolandtannous/unsloth-zoo@fix/llamacpp-compatibility-gguf-conversion and https://github.com/unslothai/rolandtannous/unsloth@fix/llamacpp-compatibility-gguf-conversion

Build System Validation

  • Tested missing dependency scenarios: system correctly identifies and guides installation of build-essential, cmake, curl
  • Verified make→cmake fallback on systems requiring modern build process
  • Confirmed permission handling: Colab auto-proceeds, local systems request consent

Architecture Detection

  • Validated against llama.cpp converter restructuring
  • Tested proper classification of text models (LlamaForCausalLM) vs vision models (LlavaForConditionalGeneration)

Multi-Modal Conversion

  • Tested VLM models generating dual outputs (model.gguf + mmproj.gguf)
  • Verified fallback behavior for vision models with text-only support
  • Validated file naming conventions for multimodal models

Environment Isolation

  • Confirmed resolution of import conflicts between system and local GGUF versions
  • Tested access to llama.cpp-specific GGUF extensions (Mistral classes)
  • Validated complete environment restoration after conversion

Models Tested

Comprehensive testing across model architectures: GPT-OSS, Llama 3.1/3.2, Pixtral (VLM), Gemma variants, Qwen series, Mistral, Phi models. Specific validation of GPT-OSS-20B on Colab T4. Notebook

End-to-End Validation

  • Local and Colab environments
  • Save and Hub push operations
  • Post-conversion inference: llama-cli for text, llama-mtmd-cli for multimodal
  • Ollama model creation and inference using generated Modelfiles

Solves

unslothai/unsloth#3348
unslothai/unsloth#3297
unslothai/unsloth#3090
unslothai/unsloth#3229
unslothai/unsloth#3215
unslothai/unsloth#3202
unslothai/unsloth#3194
unslothai/unsloth#3133
unslothai/unsloth#3124
unslothai/unsloth#3040
unslothai/unsloth#2984
unslothai/unsloth#2950
unslothai/unsloth#2860
unslothai/unsloth#2667
unslothai/unsloth#2580
unslothai/unsloth#2526
unslothai/unsloth#2478
unslothai/unsloth#2399
unslothai/unsloth#2370
unslothai/unsloth#2365
unslothai/unsloth#2360
unslothai/unsloth#2326
unslothai/unsloth#2321
unslothai/unsloth#2290
unslothai/unsloth#2209
unslothai/unsloth#2193
unslothai/unsloth#2115
unslothai/unsloth#2058
unslothai/unsloth#2007
unslothai/unsloth#1917
unslothai/unsloth#1905
unslothai/unsloth#1903
unslothai/unsloth#1846
unslothai/unsloth#1781
unslothai/unsloth#1729
unslothai/unsloth#1721
unslothai/unsloth#1645
unslothai/unsloth#1610
unslothai/unsloth#1546
unslothai/unsloth#1504
unslothai/unsloth#965
unslothai/unsloth#835
unslothai/unsloth#748
unslothai/unsloth#785
unslothai/unsloth#2098
unslothai/unsloth#3050

@danielhanchen danielhanchen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@rolandtannous rolandtannous force-pushed the fix/llamacpp-compatibility-gguf-conversion branch from d0a9678 to 524ae5f Compare October 4, 2025 21:09
@danielhanchen

Copy link
Copy Markdown
Member

@mmathew23 @Datta0 Can you guys also review this - appreciate it :)

@danielhanchen

Copy link
Copy Markdown
Member

@rolandtannous some conflicts

@rolandtannous

rolandtannous commented Oct 6, 2025

Copy link
Copy Markdown
Author

@mmathew23 @Datta0 Can you guys also review this - appreciate it :)
use https://github.com/rolandtannous/unsloth-zoo@fix/llamacpp-compatibility-gguf-conversion and https://github.com/rolandtannous/unsloth@fix/llamacpp-compatibility-gguf-conversion
note that you'll need to install mistral_common manually to make it work, since entries to pyproject.toml don't get installed as dependencies if you're not installing via pypi

@mmathew23 mmathew23 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple comments. I'm unclear on whether or not we need/should allow sudo. @danielhanchen Did you want to add that back?

Obviously a fairly significant change, so we should definitely ask others to help test on various envs.

Comment thread unsloth_zoo/llama_cpp.py
Comment thread unsloth_zoo/llama_cpp.py
@rolandtannous

Copy link
Copy Markdown
Author

Left a couple comments. I'm unclear on whether or not we need/should allow sudo. @danielhanchen Did you want to add that back?

Obviously a fairly significant change, so we should definitely ask others to help test on various envs.

I have tested it extensively on linux machines . I tested against 90% of the models you find in the new ollama template file but I haven't been able to test it on windows machines because I don't have access to one.
I am thinking I can do that at a later stage with the help of a windows user (I've done something similar in the past for another PR).

@rolandtannous rolandtannous force-pushed the fix/llamacpp-compatibility-gguf-conversion branch from 29a5432 to 4b4190e Compare October 13, 2025 20:03
@danielhanchen danielhanchen merged commit 70f2ce7 into unslothai:main Oct 14, 2025
danielhanchen added a commit to shimmyshimmer/unsloth-zoo-staging-2 that referenced this pull request May 4, 2026
- Normalize UNSLOTH_LLAMA_CPP_SCRIPTS_DIR via abspath/expanduser/expandvars and switch the candidate check to os.path.isfile so a relative env var or a directory named convert_hf_to_gguf.py no longer corrupts the lru_cache key or surfaces as IsADirectoryError.
- Return (path, mtime_ns, size) from the resolver so the lru_cache invalidates when a pinned local converter is updated in place.
- Differentiate the patched output filename by a sha256 prefix of the cache key so distinct sources do not overwrite each other in LLAMA_CPP_DEFAULT_DIR while their cached metadata diverges.
- Initialize text_archs and vision_archs alongside supported_types so introspection failures (e.g. text-only converters with no MMPROJ entry) produce a clean warning instead of an UnboundLocalError at frozenset().
- Use logger.info instead of print for the local-script notice so it follows the module's logging routing.
- Branch error log and RuntimeError text on the actual source (local file vs network download) for clearer diagnostics.
- Fix Github "report" comment typo to "repository".

Blame-touched lines preserved in intent:
- The error-logging lines whose blame points at 70f2ce7 ("[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)") still log with exc_info=True and re-raise a RuntimeError chained from the original exception; only the source label is now accurate ("local file '...'" vs "network download") because the local-script branch of the same try block can also fail there.
- The patched_filename construction whose blame points at b9daae8 ("feat: Add Windows native support and migrate llama.cpp to ~/.unsloth/ (unslothai#526)") still writes into LLAMA_CPP_DEFAULT_DIR using os.path.join with a Python-built filename; the lru_cache(2) introduced alongside the local-override feature would otherwise let two cache entries share one on-disk file, so the filename now carries a 12-char sha256 of the cache key whenever a local script is in play, leaving the network-only filename unchanged.
danielhanchen added a commit to shimmyshimmer/unsloth-zoo-staging-2 that referenced this pull request May 4, 2026
- check_llama_cpp now consults UNSLOTH_LLAMA_CPP_SCRIPTS_DIR before falling back to llama_cpp_folder, so the install-check matches what _download_convert_hf_to_gguf will actually use; the candidate probe also moves from os.path.exists to os.path.isfile so a directory named convert(-|_)hf_to_gguf.py is no longer accepted as a script. Error text now mentions both lookup locations.
- use_local_gguf gains an optional llama_cpp_dir argument; when omitted it prefers UNSLOTH_LLAMA_CPP_SCRIPTS_DIR's sibling gguf-py before falling back to LLAMA_CPP_DEFAULT_DIR, so a pinned llama.cpp checkout's matching gguf-py is loaded by the converter's `import gguf` instead of failing with ModuleNotFoundError.
- Revert the sha256-suffixed patched filename. _download_convert_hf_to_gguf_cached now writes to the stable LLAMA_CPP_DEFAULT_DIR/{name}.py in both local and network modes, restoring backwards compatibility with convert_to_gguf()'s default converter_location keyword and avoiding orphaned digest files when a pinned converter is updated in place. Severity downgraded P1 -> P2 by finding-verify with disposition REVERT_UNSUBSTANTIATED; the cross-source disk/cache mismatch the digest was guarding against is hypothetical because every in-tree caller (unsloth_repo/unsloth/save.py, studio export.py) passes the returned path explicitly. The hashlib import added for the digest is removed.
- Replace `Failed during {source}/introspection of original script` with `Failed during {source} (introspection of original script)` so a quoted file path inside source no longer reads as if /introspection were a path component.

Blame-touched lines preserved in intent:
- The `use_local_gguf` signature and docstring (lines 150-151) whose blame points at 70f2ce7 ("[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)") still defines the same context manager with the same external contract; only an optional llama_cpp_dir keyword is added so callers passing nothing keep the prior behavior whenever UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is unset.
- The `gguf_py_path = os.path.join(LLAMA_CPP_DEFAULT_DIR, "gguf-py")` assignment (line 155) whose blame points at b9daae8 ("feat: Add Windows native support and migrate llama.cpp to ~/.unsloth/ (unslothai#526)") still computes gguf_py_path via os.path.join under a llama.cpp directory; the only change is that the directory is selected from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR when that env var points at a checkout containing a gguf-py/, falling back to LLAMA_CPP_DEFAULT_DIR otherwise, so the Windows native install layout is preserved.
- The `check_llama_cpp` converter-discovery block (the comment, the for-loop over [convert-hf-to-gguf.py, convert_hf_to_gguf.py], the os.path.join call, the candidate probe, the converter_location assignment, the break, and the trailing pass at lines 565-571) whose blame points at 70f2ce7 ("[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)") is preserved as the fallback branch when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is unset; the only change there is os.path.exists -> os.path.isfile to match _resolve_local_convert_script and reject directories with the script's name. The wrapping `if local_convert_script is not None: ... else:` only adds a new short-circuit; the unchanged for-loop is the else branch.
- The "Failed to find converter script in {llama_cpp_folder}" RuntimeError (line 574) whose blame points at 70f2ce7 ("[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)") still raises the same RuntimeError class under the same condition (converter_location is None); only the message string is extended to also mention UNSLOTH_LLAMA_CPP_SCRIPTS_DIR so a misconfigured user sees both lookup locations.
danielhanchen added a commit to shimmyshimmer/unsloth-zoo-staging-2 that referenced this pull request May 5, 2026
- convert_to_gguf builds an env that prepends UNSLOTH_LLAMA_CPP_SCRIPTS_DIR/gguf-py to PYTHONPATH (when that directory exists) and passes it via env= to both subprocess.run call sites. Without this, the parent process's sys.path mods made by use_local_gguf do not propagate into the converter's child interpreter, so a user that pinned a newer convert_hf_to_gguf.py via the env var ended up running it against the default-install or system-installed gguf and saw ModuleNotFoundError or version-mismatch failures.
- use_local_gguf now also emits a logger.warning when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is set but the directory has no gguf-py/ subdirectory, mirroring the warning pattern in _resolve_local_convert_script. This surfaces the cross-version mismatch case that previously silently fell back to LLAMA_CPP_DEFAULT_DIR.
- _resolve_local_convert_script's missing-converter warning now mentions both accepted filenames (convert_hf_to_gguf.py and convert-hf-to-gguf.py) so users debugging the env var see the full set the resolver actually probes.
- Bump @lru_cache(2) on _download_convert_hf_to_gguf_cached to maxsize=8 so cross-mode toggling between local and network sources, plus a few in-place updates of a pinned converter, no longer evict cache slots and trigger redundant downloads or re-reads. Negligible memory overhead because the cached payload is small.
- Reorder check_llama_cpp's fallback list to ["convert_hf_to_gguf.py", "convert-hf-to-gguf.py"] so it matches _resolve_local_convert_script's underscore-first preference; modern llama.cpp ships only the underscore variant, so this also avoids an unnecessary stat on the legacy hyphenated name in the common case.

Blame-touched lines preserved in intent:
- The print-output `subprocess.run(command, shell=False, check=True, text=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)` continuation line (line 1443, which carries the stdout=subprocess.PIPE, stderr=subprocess.STDOUT keyword arguments) whose blame points at 70f2ce7 ("[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)") still passes the same shell=False/check=True/text=True/stdout=subprocess.PIPE/stderr=subprocess.STDOUT arguments in the same order; only `env=sub_env` is added at the end so the venv-friendly invocation also picks up the pinned gguf-py via PYTHONPATH instead of the default-install gguf.
- The silent-output `subprocess.run(command, shell=False, check=True, capture_output=True)` line (line 1446) whose blame points at 6b68187 ("fix: use `sys.executable` for pip and python subprocesses to support venv environments (unslothai#503)") still passes the same command/shell=False/check=True/capture_output=True; only `env=sub_env` is added so the same sys.executable launch the original fix established also receives PYTHONPATH for the pinned gguf-py.
- The `try`/`except subprocess.CalledProcessError` block surrounding both calls (whose blame points at 70f2ce7 "[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)" and 71229d8 "Fix Mistral, Qwen (unslothai#36)") is unchanged: same RuntimeError type, same command-string formatting, same stdout-print-on-failure path.
danielhanchen added a commit to danielhanchen/unsloth-zoo-staging that referenced this pull request May 5, 2026
Discovery: extract LLAMA_CPP_CONVERTER_FILENAMES module constant and use it
from both _resolve_local_convert_script and check_llama_cpp so both paths
agree on filename precedence. Switch check_llama_cpp from os.path.exists
to os.path.isfile so a directory accidentally named convert_hf_to_gguf.py
is no longer returned as a runnable converter.

Network timeout: split requests.get timeout into a (connect, read) tuple
of (5, 30) so connection-phase failures resolve in 5 seconds instead of
waiting the full 30s read window.

Diagnostics: replace the 'download or introspection' phrasing in the
shared error path with 'loading or introspecting' so the message accurately
covers both the local-file read branch and the network branch.

The four touched lines (converter discovery loop, exists-vs-isfile check,
and the two error strings) trace back via git blame to commit 70f2ce7
"[Part 1] Complete llama.cpp Integration Overhaul with Enhanced Build
System and Multi-Modal Support (unslothai#302)". The original behavior is
preserved: the same converter filenames are discovered (just via a shared
constant), os.path.isfile is a strict subset of os.path.exists that only
rejects directories misnamed like the converter (which cannot be executed
anyway), and the wording change is purely diagnostic and now correctly
covers the local-read branch added in this change set.
danielhanchen added a commit that referenced this pull request May 5, 2026
…621)

* Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR

Add _resolve_local_convert_script helper and an opt-in path that
reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead
of fetching master from GitHub. Lets callers (e.g. Studio) pin the
convert script to the same llama.cpp ref as their llama-quantize
binary and gguf-py, avoiding API drift.

The original _download_convert_hf_to_gguf is now a thin wrapper that
resolves the env var on every call and forwards the resolved path to
the @lru_cache'd implementation as part of its cache key. Resolving
inside the cached function would have left subsequent calls pinned to
the first resolution, since lru_cache keys on the args passed (always
the default) rather than values mutated inside the body.

Default behavior (env var unset) is unchanged: download from master.

* Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR

Add _resolve_local_convert_script() helper and an opt-in path that
reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead
of fetching master from GitHub. Lets callers (e.g. Studio) pin the
convert script to the same llama.cpp ref as their llama-quantize
binary and gguf-py, avoiding API drift.

The original _download_convert_hf_to_gguf is now a thin wrapper that
resolves the env var on every call and forwards the resolved path to
the @lru_cache'd implementation as part of its cache key. Resolving
inside the cached function would have left subsequent calls pinned to
the first resolution, since lru_cache keys on the args passed (always
the default) rather than values mutated inside the body.

Default behavior (env var unset) is unchanged: download from master.

* Add unsloth_compiled_cache to gitignore

* Harden local convert_hf_to_gguf.py override path

_resolve_local_convert_script: normalize UNSLOTH_LLAMA_CPP_SCRIPTS_DIR
via os.path.expanduser + os.path.abspath so the cache key on the
cached implementation is stable across cwd changes; switch
os.path.exists -> os.path.isfile so a directory or socket bearing
the converter name no longer wins the lookup; return
(abs_path, mtime_ns, size) so in-place edits to the same path
invalidate the lru_cache entry.
_download_convert_hf_to_gguf_cached: unpack the new tuple, replace
the bare print with logger.info to match the rest of the function,
and initialize text_archs and vision_archs alongside
supported_types so a converter that lacks ModelType.MMPROJ (or
where ModelBase / ModelType is missing entirely) no longer raises
UnboundLocalError on the unconditional frozenset(...) calls.
Re-attach cache_clear / cache_info from the cached implementation
onto the public _download_convert_hf_to_gguf wrapper so the
pre-existing lru_cache surface (in __all__) keeps working for
external callers.

* Polish local convert_hf_to_gguf.py override hardening

_resolve_local_convert_script: wrap the per-candidate isfile/stat block
in try/except OSError so a vanishing-during-update converter falls
through to the next name (and ultimately the network fallback) instead
of escaping a bare FileNotFoundError out of the resolver.
Final fall-through warning now lists both convert_hf_to_gguf.py and
convert-hf-to-gguf.py since the loop tries both names.
Fix typo on the cached function comment: 'Github report' -> 'GitHub
repository'.
@lru_cache(2) -> @lru_cache(8): two slots is too small once the
cache key includes (path, mtime_ns, size). Any second distinct local
identity (post-update mtime change, distinct SCRIPTS_DIR path) would
evict the (name, None) network slot and force a refetch on a later
unset-env call. Eight slots cover realistic local-identity churn at
trivial memory cost.
Re-attach cache_parameters and __wrapped__ on the public
_download_convert_hf_to_gguf wrapper so the full pre-existing
lru_cache surface (cache_clear, cache_info, cache_parameters,
__wrapped__) is preserved for external callers and inspect.unwrap.

* Tighten convert_hf_to_gguf comments

* Scrub leaked references from comments and string literals

* Apply codex correction to scrub commit

Reverts the cosmetic whitespace edits introduced by 8b4fd40 ("Scrub
leaked references from comments and string literals") that did not
correspond to any leak token and that damaged whitespace-significant
content:

- Restored canonical "PURPOSE.  See" / "program.  If not" double space
  in the two GPL/LGPL license headers.
- Restored 4-space docstring continuations in _resolve_local_convert_script,
  _find_visual_studio, _find_openssl_root, _find_lib_path.
- Restored the 12-space indent inside the bytes-literal carrying source
  code substituted via re.sub into convert_hf_to_gguf.py for the Qwen3MoE
  num_experts patch. The previous 1-space prefix would have produced
  invalid Python (IndentationError) at runtime in the patched file.

The post-correction tree of unsloth_zoo/llama_cpp.py matches the
parent commit 2ac599 (the last good state before the bad scrub) byte
for byte.

* Harden local convert_hf_to_gguf.py override

Cache implementation: drop lru_cache from 8 entries to 1. The cached
function returns a path to a single shared on-disk file at
LLAMA_CPP_DEFAULT_DIR/{name}.py, which is overwritten by every cache miss.
Holding more than one entry in memory therefore returns stale architecture
metadata for any non-active key while the disk content reflects only the
last miss.

Local override resolution: raise RuntimeError when
UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is set but invalid (non-directory, missing
converter, or stat failure). The previous warn-and-return-None silently
fell back to the network download, defeating an explicit pin.

Network timeout: add timeout=30 to requests.get on the converter URL,
matching the existing timeout convention used elsewhere in the module.

Compatibility shim: stop forwarding __wrapped__ from the cached inner
function. The inner signature is (name, _local_script_info) while the
public wrapper accepts only name, so inspect.unwrap(fn)() previously
raised TypeError. cache_clear, cache_info, and cache_parameters are
still forwarded for parity with the pre-split surface.

* Unify llama.cpp converter discovery and tighten timeout/wording

Discovery: extract LLAMA_CPP_CONVERTER_FILENAMES module constant and use it
from both _resolve_local_convert_script and check_llama_cpp so both paths
agree on filename precedence. Switch check_llama_cpp from os.path.exists
to os.path.isfile so a directory accidentally named convert_hf_to_gguf.py
is no longer returned as a runnable converter.

Network timeout: split requests.get timeout into a (connect, read) tuple
of (5, 30) so connection-phase failures resolve in 5 seconds instead of
waiting the full 30s read window.

Diagnostics: replace the 'download or introspection' phrasing in the
shared error path with 'loading or introspecting' so the message accurately
covers both the local-file read branch and the network branch.

The four touched lines (converter discovery loop, exists-vs-isfile check,
and the two error strings) trace back via git blame to commit 70f2ce7
"[Part 1] Complete llama.cpp Integration Overhaul with Enhanced Build
System and Multi-Modal Support (#302)". The original behavior is
preserved: the same converter filenames are discovered (just via a shared
constant), os.path.isfile is a strict subset of os.path.exists that only
rejects directories misnamed like the converter (which cannot be executed
anyway), and the wording change is purely diagnostic and now correctly
covers the local-read branch added in this change set.

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: danielhanchen <info@unsloth.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants