Skip to content

Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR#621

Merged
danielhanchen merged 13 commits into
unslothai:mainfrom
mmathew23:feature/use-local-llama-cpp-scripts
May 5, 2026
Merged

Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR#621
danielhanchen merged 13 commits into
unslothai:mainfrom
mmathew23:feature/use-local-llama-cpp-scripts

Conversation

@mmathew23

Copy link
Copy Markdown
Collaborator

Add _resolve_local_convert_script() helper and an opt-in path in _download_convert_hf_to_gguf() that reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead of fetching master from GitHub. Lets callers (e.g. Studio) pin the convert script to the same llama.cpp ref as their llama-quantize binary and gguf-py, avoiding API drift. The cache key now includes _local_script so env-var changes are honored. Default behavior (env var unset) is unchanged: download from master.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the ability to use a local convert_hf_to_gguf.py script via the UNSLOTH_LLAMA_CPP_SCRIPTS_DIR environment variable, providing a fallback to network downloads if the local script is unavailable. Feedback includes a recommendation to refactor the caching logic by resolving the local script path outside the @lru_cache decorated function to correctly handle environment variable changes. Additionally, it is suggested to replace a print statement with logger.info for consistency with the existing logging infrastructure.

Comment thread unsloth_zoo/llama_cpp.py Outdated
Comment thread unsloth_zoo/llama_cpp.py Outdated
original_content = response.content
# 1. Obtain the file (local override takes precedence over network)
if _local_script is not None:
print(f"Unsloth: Using local convert_hf_to_gguf.py from {_local_script}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use logger.info instead of print for consistency with the rest of the module and to ensure that the message is handled according to the configured logging settings.

Suggested change
print(f"Unsloth: Using local convert_hf_to_gguf.py from {_local_script}")
logger.info(f"Unsloth: Using local convert_hf_to_gguf.py from {_local_script}")

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason I used print is so that we can see the message in studio. I think llama.cpp logger.info's get filtered out.

mmathew23 added 2 commits May 4, 2026 20:43
Add _resolve_local_convert_script helper and an opt-in path that
reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead
of fetching master from GitHub. Lets callers (e.g. Studio) pin the
convert script to the same llama.cpp ref as their llama-quantize
binary and gguf-py, avoiding API drift.

The original _download_convert_hf_to_gguf is now a thin wrapper that
resolves the env var on every call and forwards the resolved path to
the @lru_cache'd implementation as part of its cache key. Resolving
inside the cached function would have left subsequent calls pinned to
the first resolution, since lru_cache keys on the args passed (always
the default) rather than values mutated inside the body.

Default behavior (env var unset) is unchanged: download from master.
Add _resolve_local_convert_script() helper and an opt-in path that
reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead
of fetching master from GitHub. Lets callers (e.g. Studio) pin the
convert script to the same llama.cpp ref as their llama-quantize
binary and gguf-py, avoiding API drift.

The original _download_convert_hf_to_gguf is now a thin wrapper that
resolves the env var on every call and forwards the resolved path to
the @lru_cache'd implementation as part of its cache key. Resolving
inside the cached function would have left subsequent calls pinned to
the first resolution, since lru_cache keys on the args passed (always
the default) rather than values mutated inside the body.

Default behavior (env var unset) is unchanged: download from master.
@mmathew23 mmathew23 force-pushed the feature/use-local-llama-cpp-scripts branch from 6315061 to ee7a105 Compare May 4, 2026 20:45

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee7a105629

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread unsloth_zoo/llama_cpp.py Outdated
Comment on lines +136 to +137
if os.path.exists(candidate):
return candidate

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Accept only file paths for local convert override

_resolve_local_convert_script() currently treats any existing path named convert_hf_to_gguf.py as valid. If that path is a directory (or another non-regular entry), _download_convert_hf_to_gguf_cached() will later call open(..., "rb") and fail with IsADirectoryError, causing conversion to abort instead of cleanly falling back to the network path. This is an actionable regression in the new local-override flow; the candidate check should require a regular file (e.g. os.path.isfile) before returning it.

Useful? React with 👍 / 👎.

@danielhanchen

Copy link
Copy Markdown
Member

This PR appears to address open issue(s). The duplicate detector matched the following open issues with HIGH confidence:

  • unslothai/unsloth#2365@xyehya — GGUF export fails because bundled/downstream llama.cpp converter lacks needed support; local converter override lets users use a compatible pinned script.

If this PR fixes any of them, consider adding closes #N / resolves #N to the description so the issue auto-closes on merge. If the match is wrong, ignore this comment.

@danielhanchen danielhanchen added the auto-reviewing Auto-review in progress label May 4, 2026
@danielhanchen danielhanchen added auto-approved Auto-review approved the PR and removed auto-reviewing Auto-review in progress labels May 5, 2026
@danielhanchen

Copy link
Copy Markdown
Member

Auto-review verdict: Approved

Adds opt-in UNSLOTH_LLAMA_CPP_SCRIPTS_DIR support so callers can pin convert_hf_to_gguf.py and its sibling gguf-py to a specific llama.cpp checkout instead of always fetching master, eliminating API drift between the converter script and the gguf module. Default network-download behavior is preserved when the env var is unset, and the converter cache, in-process introspection, and converter subprocess all consistently use the pinned checkout when it is set.

Reason: Local converter override works end-to-end with stat-keyed cache, hash-suffixed local filenames, and resolver-gated subprocess gguf-py propagation; all P1/P2 findings across 8 iterations were either fixed or rejected as out-of-scope.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b4595adffa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/llama_cpp.py Outdated
replacement = (
b"# Qwen3MoE seems to use num_local_experts instead of num_experts\n"
b" n_experts = self.hparams.get('num_experts', None) or self.hparams.get('num_local_experts')"
b" n_experts = self.hparams.get('num_experts', None) or self.hparams.get('num_local_experts')"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep injected n_experts line at original indentation

The replacement for the Qwen MoE patch now injects " n_experts = ..." with a single leading space, but the matched source line is inside an indented block. When this regex matches (as it does on llama.cpp versions containing n_experts = self.hparams[...]), the generated converter script ends up with inconsistent indentation and fails to execute with a Python indentation/syntax error, so GGUF conversion aborts before processing any model.

Useful? React with 👍 / 👎.

@danielhanchen danielhanchen force-pushed the feature/use-local-llama-cpp-scripts branch from b4595ad to ee7a105 Compare May 5, 2026 04:32
@danielhanchen

Copy link
Copy Markdown
Member

This PR appears to address open issue(s). The duplicate detector matched the following open issues with HIGH confidence:

  • unslothai/unsloth#3581@weoieoeo — GGUF export failure likely from llama.cpp/gguf-py mismatch; PR lets users use a matching local converter script.
  • unslothai/unsloth#2365@xyehya — Adds local llama.cpp converter override for GGUF export, addressing failures from bundled/upstream converter lacking Phi-4 architecture support.

If this PR fixes any of them, consider adding closes #N / resolves #N to the description so the issue auto-closes on merge. If the match is wrong, ignore this comment.

danielhanchen and others added 5 commits May 5, 2026 07:13
_resolve_local_convert_script: normalize UNSLOTH_LLAMA_CPP_SCRIPTS_DIR
via os.path.expanduser + os.path.abspath so the cache key on the
cached implementation is stable across cwd changes; switch
os.path.exists -> os.path.isfile so a directory or socket bearing
the converter name no longer wins the lookup; return
(abs_path, mtime_ns, size) so in-place edits to the same path
invalidate the lru_cache entry.
_download_convert_hf_to_gguf_cached: unpack the new tuple, replace
the bare print with logger.info to match the rest of the function,
and initialize text_archs and vision_archs alongside
supported_types so a converter that lacks ModelType.MMPROJ (or
where ModelBase / ModelType is missing entirely) no longer raises
UnboundLocalError on the unconditional frozenset(...) calls.
Re-attach cache_clear / cache_info from the cached implementation
onto the public _download_convert_hf_to_gguf wrapper so the
pre-existing lru_cache surface (in __all__) keeps working for
external callers.
_resolve_local_convert_script: wrap the per-candidate isfile/stat block
in try/except OSError so a vanishing-during-update converter falls
through to the next name (and ultimately the network fallback) instead
of escaping a bare FileNotFoundError out of the resolver.
Final fall-through warning now lists both convert_hf_to_gguf.py and
convert-hf-to-gguf.py since the loop tries both names.
Fix typo on the cached function comment: 'Github report' -> 'GitHub
repository'.
@lru_cache(2) -> @lru_cache(8): two slots is too small once the
cache key includes (path, mtime_ns, size). Any second distinct local
identity (post-update mtime change, distinct SCRIPTS_DIR path) would
evict the (name, None) network slot and force a refetch on a later
unset-env call. Eight slots cover realistic local-identity churn at
trivial memory cost.
Re-attach cache_parameters and __wrapped__ on the public
_download_convert_hf_to_gguf wrapper so the full pre-existing
lru_cache surface (cache_clear, cache_info, cache_parameters,
__wrapped__) is preserved for external callers and inspect.unwrap.
@danielhanchen danielhanchen removed the auto-reviewing Auto-review in progress label May 5, 2026
@danielhanchen

Copy link
Copy Markdown
Member

Auto-review verdict: Approved

Adds an UNSLOTH_LLAMA_CPP_SCRIPTS_DIR env var so users can pin a local convert_hf_to_gguf.py instead of always downloading from llama.cpp master. Useful for offline / version-pinned environments and Studio-style long-lived processes; the post-review version normalizes the path, hashes file identity into the cache key, gracefully handles vanishing files, preserves the full pre-existing lru_cache surface, and fixes a latent UnboundLocalError on text-only converters.

Reason: All accepted findings fixed; resolver/cache hardening complete; ancestry restored and pushed cleanly.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 274b11f983

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/llama_cpp.py Outdated
Comment on lines +903 to +904
@lru_cache(8)
def _download_convert_hf_to_gguf_cached(name, _local_script_info):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid reusing cached entries across script source switches

_download_convert_hf_to_gguf_cached is now keyed by _local_script_info, but it always writes to the same patched_filename ({LLAMA_CPP_DEFAULT_DIR}/{name}.py). If a caller runs once with network source, then with UNSLOTH_LLAMA_CPP_SCRIPTS_DIR, then switches back, the third call can hit the old cache entry and skip regeneration while the on-disk file still contains the local-script version from the second call. In that sequence, env-var changes are not actually honored and conversion may run with the wrong converter content.

Useful? React with 👍 / 👎.

Comment thread unsloth_zoo/llama_cpp.py Outdated
_download_convert_hf_to_gguf.cache_clear = _download_convert_hf_to_gguf_cached.cache_clear
_download_convert_hf_to_gguf.cache_info = _download_convert_hf_to_gguf_cached.cache_info
_download_convert_hf_to_gguf.cache_parameters = _download_convert_hf_to_gguf_cached.cache_parameters
_download_convert_hf_to_gguf.__wrapped__ = _download_convert_hf_to_gguf_cached.__wrapped__

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve __wrapped__ with the public function signature

Assigning _download_convert_hf_to_gguf.__wrapped__ = _download_convert_hf_to_gguf_cached.__wrapped__ exposes an unwrapped callable that requires the extra _local_script_info positional argument, whereas the previous __wrapped__ accepted just name (defaulted). Callers using inspect.unwrap or invoking .__wrapped__() directly will now get a TypeError, despite the comment stating this surface is preserved.

Useful? React with 👍 / 👎.

Reverts the cosmetic whitespace edits introduced by 8b4fd40 ("Scrub
leaked references from comments and string literals") that did not
correspond to any leak token and that damaged whitespace-significant
content:

- Restored canonical "PURPOSE.  See" / "program.  If not" double space
  in the two GPL/LGPL license headers.
- Restored 4-space docstring continuations in _resolve_local_convert_script,
  _find_visual_studio, _find_openssl_root, _find_lib_path.
- Restored the 12-space indent inside the bytes-literal carrying source
  code substituted via re.sub into convert_hf_to_gguf.py for the Qwen3MoE
  num_experts patch. The previous 1-space prefix would have produced
  invalid Python (IndentationError) at runtime in the patched file.

The post-correction tree of unsloth_zoo/llama_cpp.py matches the
parent commit 2ac599 (the last good state before the bad scrub) byte
for byte.
@danielhanchen

Copy link
Copy Markdown
Member

This PR appears to address open issue(s). The duplicate detector matched the following open issues with HIGH confidence:

  • unslothai/unsloth#4557@gary-mu — GGUF export fails inside llama.cpp convert_hf_to_gguf.py; PR allows pinning a local compatible converter instead of drifting upstream master.
  • unslothai/unsloth#3581@weoieoeo — GGUF export failure likely from mismatched llama.cpp/gguf tooling; PR allows using a local matching converter script.
  • unslothai/unsloth#3124@Oleg777778 — GGUF conversion failures from llama.cpp converter/version drift align with PR allowing a local matching convert_hf_to_gguf.py script.
  • unslothai/unsloth#2365@xyehya — GGUF conversion fails because embedded llama.cpp converter lacks Phi-4 support; PR lets users point Unsloth at a compatible local converter.
  • unslothai/unsloth#1551@poochat — GGUF export fails inside llama.cpp HF-to-GGUF converter; PR lets users pin a compatible local converter to avoid upstream drift.

…and 1 more — see orchestrator state for the full list.

If this PR fixes any of them, consider adding closes #N / resolves #N to the description so the issue auto-closes on merge. If the match is wrong, ignore this comment.

@danielhanchen danielhanchen added the auto-reviewing Auto-review in progress label May 5, 2026
Cache implementation: drop lru_cache from 8 entries to 1. The cached
function returns a path to a single shared on-disk file at
LLAMA_CPP_DEFAULT_DIR/{name}.py, which is overwritten by every cache miss.
Holding more than one entry in memory therefore returns stale architecture
metadata for any non-active key while the disk content reflects only the
last miss.

Local override resolution: raise RuntimeError when
UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is set but invalid (non-directory, missing
converter, or stat failure). The previous warn-and-return-None silently
fell back to the network download, defeating an explicit pin.

Network timeout: add timeout=30 to requests.get on the converter URL,
matching the existing timeout convention used elsewhere in the module.

Compatibility shim: stop forwarding __wrapped__ from the cached inner
function. The inner signature is (name, _local_script_info) while the
public wrapper accepts only name, so inspect.unwrap(fn)() previously
raised TypeError. cache_clear, cache_info, and cache_parameters are
still forwarded for parity with the pre-split surface.
Discovery: extract LLAMA_CPP_CONVERTER_FILENAMES module constant and use it
from both _resolve_local_convert_script and check_llama_cpp so both paths
agree on filename precedence. Switch check_llama_cpp from os.path.exists
to os.path.isfile so a directory accidentally named convert_hf_to_gguf.py
is no longer returned as a runnable converter.

Network timeout: split requests.get timeout into a (connect, read) tuple
of (5, 30) so connection-phase failures resolve in 5 seconds instead of
waiting the full 30s read window.

Diagnostics: replace the 'download or introspection' phrasing in the
shared error path with 'loading or introspecting' so the message accurately
covers both the local-file read branch and the network branch.

The four touched lines (converter discovery loop, exists-vs-isfile check,
and the two error strings) trace back via git blame to commit 70f2ce7
"[Part 1] Complete llama.cpp Integration Overhaul with Enhanced Build
System and Multi-Modal Support (unslothai#302)". The original behavior is
preserved: the same converter filenames are discovered (just via a shared
constant), os.path.isfile is a strict subset of os.path.exists that only
rejects directories misnamed like the converter (which cannot be executed
anyway), and the wording change is purely diagnostic and now correctly
covers the local-read branch added in this change set.
@danielhanchen danielhanchen removed the auto-reviewing Auto-review in progress label May 5, 2026
@danielhanchen

Copy link
Copy Markdown
Member

Auto-review verdict: Approved

PR adds an opt-in UNSLOTH_LLAMA_CPP_SCRIPTS_DIR override so unsloth-zoo can load convert_hf_to_gguf.py from a pinned local llama.cpp checkout instead of always downloading from master, with the cache key tracking mtime/size so in-place updates are honored. Hardening on top of the original change makes the env-var pin fail closed, drops a single-slot lru_cache, adds a (5,30) requests timeout, removes a misleading wrapped surface, and unifies converter discovery between the new resolver and check_llama_cpp.

Reason: All real review findings fixed; remaining concerns are stylistic or hypothetical and do not block merge.

@danielhanchen danielhanchen merged commit 58a2602 into unslothai:main May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-addresses-issue Pre-flight: appears to address an open issue auto-approved Auto-review approved the PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants