Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR by mmathew23 · Pull Request #621 · unslothai/unsloth-zoo

mmathew23 · 2026-05-04T20:26:06Z

Add _resolve_local_convert_script() helper and an opt-in path in _download_convert_hf_to_gguf() that reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead of fetching master from GitHub. Lets callers (e.g. Studio) pin the convert script to the same llama.cpp ref as their llama-quantize binary and gguf-py, avoiding API drift. The cache key now includes _local_script so env-var changes are honored. Default behavior (env var unset) is unchanged: download from master.

gemini-code-assist

Code Review

This pull request introduces the ability to use a local convert_hf_to_gguf.py script via the UNSLOTH_LLAMA_CPP_SCRIPTS_DIR environment variable, providing a fallback to network downloads if the local script is unavailable. Feedback includes a recommendation to refactor the caching logic by resolving the local script path outside the @lru_cache decorated function to correctly handle environment variable changes. Additionally, it is suggested to replace a print statement with logger.info for consistency with the existing logging infrastructure.

gemini-code-assist · 2026-05-04T20:32:04Z

-        original_content = response.content
+        # 1. Obtain the file (local override takes precedence over network)
+        if _local_script is not None:
+            print(f"Unsloth: Using local convert_hf_to_gguf.py from {_local_script}")


Use logger.info instead of print for consistency with the rest of the module and to ensure that the message is handled according to the configured logging settings.

Suggested change

print(f"Unsloth: Using local convert_hf_to_gguf.py from {_local_script}")

logger.info(f"Unsloth: Using local convert_hf_to_gguf.py from {_local_script}")

the reason I used print is so that we can see the message in studio. I think llama.cpp logger.info's get filtered out.

Add _resolve_local_convert_script helper and an opt-in path that reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead of fetching master from GitHub. Lets callers (e.g. Studio) pin the convert script to the same llama.cpp ref as their llama-quantize binary and gguf-py, avoiding API drift. The original _download_convert_hf_to_gguf is now a thin wrapper that resolves the env var on every call and forwards the resolved path to the @lru_cache'd implementation as part of its cache key. Resolving inside the cached function would have left subsequent calls pinned to the first resolution, since lru_cache keys on the args passed (always the default) rather than values mutated inside the body. Default behavior (env var unset) is unchanged: download from master.

Add _resolve_local_convert_script() helper and an opt-in path that reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead of fetching master from GitHub. Lets callers (e.g. Studio) pin the convert script to the same llama.cpp ref as their llama-quantize binary and gguf-py, avoiding API drift. The original _download_convert_hf_to_gguf is now a thin wrapper that resolves the env var on every call and forwards the resolved path to the @lru_cache'd implementation as part of its cache key. Resolving inside the cached function would have left subsequent calls pinned to the first resolution, since lru_cache keys on the args passed (always the default) rather than values mutated inside the body. Default behavior (env var unset) is unchanged: download from master.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee7a105629

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-04T20:49:00Z

+        if os.path.exists(candidate):
+            return candidate


Accept only file paths for local convert override

_resolve_local_convert_script() currently treats any existing path named convert_hf_to_gguf.py as valid. If that path is a directory (or another non-regular entry), _download_convert_hf_to_gguf_cached() will later call open(..., "rb") and fail with IsADirectoryError, causing conversion to abort instead of cleanly falling back to the network path. This is an actionable regression in the new local-override flow; the candidate check should require a regular file (e.g. os.path.isfile) before returning it.

Useful? React with 👍 / 👎.

danielhanchen · 2026-05-04T23:12:45Z

This PR appears to address open issue(s). The duplicate detector matched the following open issues with HIGH confidence:

unslothai/unsloth#2365 — @xyehya — GGUF export fails because bundled/downstream llama.cpp converter lacks needed support; local converter override lets users use a compatible pinned script.

If this PR fixes any of them, consider adding closes #N / resolves #N to the description so the issue auto-closes on merge. If the match is wrong, ignore this comment.

danielhanchen · 2026-05-05T02:22:25Z

Auto-review verdict: Approved

Adds opt-in UNSLOTH_LLAMA_CPP_SCRIPTS_DIR support so callers can pin convert_hf_to_gguf.py and its sibling gguf-py to a specific llama.cpp checkout instead of always fetching master, eliminating API drift between the converter script and the gguf module. Default network-download behavior is preserved when the env var is unset, and the converter cache, in-process introspection, and converter subprocess all consistently use the pinned checkout when it is set.

Reason: Local converter override works end-to-end with stat-keyed cache, hash-suffixed local filenames, and resolver-gated subprocess gguf-py propagation; all P1/P2 findings across 8 iterations were either fixed or rejected as out-of-scope.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b4595adffa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-05T02:26:52Z

            replacement = (
                b"# Qwen3MoE seems to use num_local_experts instead of num_experts\n"
-                b"            n_experts = self.hparams.get('num_experts', None) or self.hparams.get('num_local_experts')"
+                b" n_experts = self.hparams.get('num_experts', None) or self.hparams.get('num_local_experts')"


Keep injected n_experts line at original indentation

The replacement for the Qwen MoE patch now injects " n_experts = ..." with a single leading space, but the matched source line is inside an indented block. When this regex matches (as it does on llama.cpp versions containing n_experts = self.hparams[...]), the generated converter script ends up with inconsistent indentation and fails to execute with a Python indentation/syntax error, so GGUF conversion aborts before processing any model.

Useful? React with 👍 / 👎.

Add unsloth_compiled_cache to gitignore

danielhanchen · 2026-05-05T06:53:23Z

This PR appears to address open issue(s). The duplicate detector matched the following open issues with HIGH confidence:

unslothai/unsloth#3581 — @weoieoeo — GGUF export failure likely from llama.cpp/gguf-py mismatch; PR lets users use a matching local converter script.
unslothai/unsloth#2365 — @xyehya — Adds local llama.cpp converter override for GGUF export, addressing failures from bundled/upstream converter lacking Phi-4 architecture support.

If this PR fixes any of them, consider adding closes #N / resolves #N to the description so the issue auto-closes on merge. If the match is wrong, ignore this comment.

_resolve_local_convert_script: normalize UNSLOTH_LLAMA_CPP_SCRIPTS_DIR via os.path.expanduser + os.path.abspath so the cache key on the cached implementation is stable across cwd changes; switch os.path.exists -> os.path.isfile so a directory or socket bearing the converter name no longer wins the lookup; return (abs_path, mtime_ns, size) so in-place edits to the same path invalidate the lru_cache entry. _download_convert_hf_to_gguf_cached: unpack the new tuple, replace the bare print with logger.info to match the rest of the function, and initialize text_archs and vision_archs alongside supported_types so a converter that lacks ModelType.MMPROJ (or where ModelBase / ModelType is missing entirely) no longer raises UnboundLocalError on the unconditional frozenset(...) calls. Re-attach cache_clear / cache_info from the cached implementation onto the public _download_convert_hf_to_gguf wrapper so the pre-existing lru_cache surface (in __all__) keeps working for external callers.

_resolve_local_convert_script: wrap the per-candidate isfile/stat block in try/except OSError so a vanishing-during-update converter falls through to the next name (and ultimately the network fallback) instead of escaping a bare FileNotFoundError out of the resolver. Final fall-through warning now lists both convert_hf_to_gguf.py and convert-hf-to-gguf.py since the loop tries both names. Fix typo on the cached function comment: 'Github report' -> 'GitHub repository'. @lru_cache(2) -> @lru_cache(8): two slots is too small once the cache key includes (path, mtime_ns, size). Any second distinct local identity (post-update mtime change, distinct SCRIPTS_DIR path) would evict the (name, None) network slot and force a refetch on a later unset-env call. Eight slots cover realistic local-identity churn at trivial memory cost. Re-attach cache_parameters and __wrapped__ on the public _download_convert_hf_to_gguf wrapper so the full pre-existing lru_cache surface (cache_clear, cache_info, cache_parameters, __wrapped__) is preserved for external callers and inspect.unwrap.

danielhanchen · 2026-05-05T07:52:18Z

Auto-review verdict: Approved

Adds an UNSLOTH_LLAMA_CPP_SCRIPTS_DIR env var so users can pin a local convert_hf_to_gguf.py instead of always downloading from llama.cpp master. Useful for offline / version-pinned environments and Studio-style long-lived processes; the post-review version normalizes the path, hashes file identity into the cache key, gracefully handles vanishing files, preserves the full pre-existing lru_cache surface, and fixes a latent UnboundLocalError on text-only converters.

Reason: All accepted findings fixed; resolver/cache hardening complete; ancestry restored and pushed cleanly.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 274b11f983

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-05T07:54:08Z

+@lru_cache(8)
+def _download_convert_hf_to_gguf_cached(name, _local_script_info):


Avoid reusing cached entries across script source switches

_download_convert_hf_to_gguf_cached is now keyed by _local_script_info, but it always writes to the same patched_filename ({LLAMA_CPP_DEFAULT_DIR}/{name}.py). If a caller runs once with network source, then with UNSLOTH_LLAMA_CPP_SCRIPTS_DIR, then switches back, the third call can hit the old cache entry and skip regeneration while the on-disk file still contains the local-script version from the second call. In that sequence, env-var changes are not actually honored and conversion may run with the wrong converter content.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-05T07:54:08Z

+_download_convert_hf_to_gguf.cache_clear = _download_convert_hf_to_gguf_cached.cache_clear
+_download_convert_hf_to_gguf.cache_info = _download_convert_hf_to_gguf_cached.cache_info
+_download_convert_hf_to_gguf.cache_parameters = _download_convert_hf_to_gguf_cached.cache_parameters
+_download_convert_hf_to_gguf.__wrapped__ = _download_convert_hf_to_gguf_cached.__wrapped__


Preserve __wrapped__ with the public function signature

Assigning _download_convert_hf_to_gguf.__wrapped__ = _download_convert_hf_to_gguf_cached.__wrapped__ exposes an unwrapped callable that requires the extra _local_script_info positional argument, whereas the previous __wrapped__ accepted just name (defaulted). Callers using inspect.unwrap or invoking .__wrapped__() directly will now get a TypeError, despite the comment stating this surface is preserved.

Useful? React with 👍 / 👎.

Reverts the cosmetic whitespace edits introduced by 8b4fd40 ("Scrub leaked references from comments and string literals") that did not correspond to any leak token and that damaged whitespace-significant content: - Restored canonical "PURPOSE. See" / "program. If not" double space in the two GPL/LGPL license headers. - Restored 4-space docstring continuations in _resolve_local_convert_script, _find_visual_studio, _find_openssl_root, _find_lib_path. - Restored the 12-space indent inside the bytes-literal carrying source code substituted via re.sub into convert_hf_to_gguf.py for the Qwen3MoE num_experts patch. The previous 1-space prefix would have produced invalid Python (IndentationError) at runtime in the patched file. The post-correction tree of unsloth_zoo/llama_cpp.py matches the parent commit 2ac599 (the last good state before the bad scrub) byte for byte.

danielhanchen · 2026-05-05T09:23:55Z

This PR appears to address open issue(s). The duplicate detector matched the following open issues with HIGH confidence:

unslothai/unsloth#4557 — @gary-mu — GGUF export fails inside llama.cpp convert_hf_to_gguf.py; PR allows pinning a local compatible converter instead of drifting upstream master.
unslothai/unsloth#3581 — @weoieoeo — GGUF export failure likely from mismatched llama.cpp/gguf tooling; PR allows using a local matching converter script.
unslothai/unsloth#3124 — @Oleg777778 — GGUF conversion failures from llama.cpp converter/version drift align with PR allowing a local matching convert_hf_to_gguf.py script.
unslothai/unsloth#2365 — @xyehya — GGUF conversion fails because embedded llama.cpp converter lacks Phi-4 support; PR lets users point Unsloth at a compatible local converter.
unslothai/unsloth#1551 — @poochat — GGUF export fails inside llama.cpp HF-to-GGUF converter; PR lets users pin a compatible local converter to avoid upstream drift.

…and 1 more — see orchestrator state for the full list.

If this PR fixes any of them, consider adding closes #N / resolves #N to the description so the issue auto-closes on merge. If the match is wrong, ignore this comment.

Cache implementation: drop lru_cache from 8 entries to 1. The cached function returns a path to a single shared on-disk file at LLAMA_CPP_DEFAULT_DIR/{name}.py, which is overwritten by every cache miss. Holding more than one entry in memory therefore returns stale architecture metadata for any non-active key while the disk content reflects only the last miss. Local override resolution: raise RuntimeError when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is set but invalid (non-directory, missing converter, or stat failure). The previous warn-and-return-None silently fell back to the network download, defeating an explicit pin. Network timeout: add timeout=30 to requests.get on the converter URL, matching the existing timeout convention used elsewhere in the module. Compatibility shim: stop forwarding __wrapped__ from the cached inner function. The inner signature is (name, _local_script_info) while the public wrapper accepts only name, so inspect.unwrap(fn)() previously raised TypeError. cache_clear, cache_info, and cache_parameters are still forwarded for parity with the pre-split surface.

Discovery: extract LLAMA_CPP_CONVERTER_FILENAMES module constant and use it from both _resolve_local_convert_script and check_llama_cpp so both paths agree on filename precedence. Switch check_llama_cpp from os.path.exists to os.path.isfile so a directory accidentally named convert_hf_to_gguf.py is no longer returned as a runnable converter. Network timeout: split requests.get timeout into a (connect, read) tuple of (5, 30) so connection-phase failures resolve in 5 seconds instead of waiting the full 30s read window. Diagnostics: replace the 'download or introspection' phrasing in the shared error path with 'loading or introspecting' so the message accurately covers both the local-file read branch and the network branch. The four touched lines (converter discovery loop, exists-vs-isfile check, and the two error strings) trace back via git blame to commit 70f2ce7 "[Part 1] Complete llama.cpp Integration Overhaul with Enhanced Build System and Multi-Modal Support (unslothai#302)". The original behavior is preserved: the same converter filenames are discovered (just via a shared constant), os.path.isfile is a strict subset of os.path.exists that only rejects directories misnamed like the converter (which cannot be executed anyway), and the wording change is purely diagnostic and now correctly covers the local-read branch added in this change set.

danielhanchen · 2026-05-05T10:08:23Z

Auto-review verdict: Approved

PR adds an opt-in UNSLOTH_LLAMA_CPP_SCRIPTS_DIR override so unsloth-zoo can load convert_hf_to_gguf.py from a pinned local llama.cpp checkout instead of always downloading from master, with the cache key tracking mtime/size so in-place updates are honored. Hardening on top of the original change makes the env-var pin fail closed, drops a single-slot lru_cache, adds a (5,30) requests timeout, removes a misleading wrapped surface, and unifies converter discovery between the new resolver and check_llama_cpp.

Reason: All real review findings fixed; remaining concerns are stylistic or hypothetical and do not block merge.

gemini-code-assist Bot reviewed May 4, 2026

View reviewed changes

mmathew23 added 2 commits May 4, 2026 20:43

mmathew23 force-pushed the feature/use-local-llama-cpp-scripts branch from 6315061 to ee7a105 Compare May 4, 2026 20:45

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

mmathew23 mentioned this pull request May 4, 2026

Pin Studio GGUF export to llama.cpp's local convert script unslothai/unsloth#5275

Merged

danielhanchen added the auto-addresses-issue Pre-flight: appears to address an open issue label May 4, 2026

danielhanchen added the auto-reviewing Auto-review in progress label May 4, 2026

danielhanchen mentioned this pull request May 5, 2026

[tests] Review tests for PR #621 shimmyshimmer/unsloth-zoo-staging-2#19

Closed

danielhanchen added auto-approved Auto-review approved the PR and removed auto-reviewing Auto-review in progress labels May 5, 2026

chatgpt-codex-connector Bot reviewed May 5, 2026

View reviewed changes

danielhanchen force-pushed the feature/use-local-llama-cpp-scripts branch from b4595ad to ee7a105 Compare May 5, 2026 04:32

Datta0 added 2 commits May 5, 2026 04:41

Add unsloth_compiled_cache to gitignore

68e777a

Merge pull request unslothai#622 from Datta0/ignore_cache_git

50e614d

Add unsloth_compiled_cache to gitignore

danielhanchen mentioned this pull request May 5, 2026

Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR danielhanchen/unsloth-zoo-staging#12

Closed

Merge remote-tracking branch 'origin/main' into

0695fef

danielhanchen added the auto-reviewing Auto-review in progress label May 5, 2026

danielhanchen mentioned this pull request May 5, 2026

Pin Studio GGUF export to llama.cpp's local convert script danielhanchen/unsloth-staging-2#97

Closed

danielhanchen and others added 5 commits May 5, 2026 07:13

Tighten convert_hf_to_gguf comments

2ac5996

Scrub leaked references from comments and string literals

8b4fd40

Merge upstream feature/use-local-llama-cpp-scripts

274b11f

danielhanchen removed the auto-reviewing Auto-review in progress label May 5, 2026

chatgpt-codex-connector Bot reviewed May 5, 2026

View reviewed changes

danielhanchen added the auto-reviewing Auto-review in progress label May 5, 2026

danielhanchen added 2 commits May 5, 2026 09:41

danielhanchen removed the auto-reviewing Auto-review in progress label May 5, 2026

danielhanchen merged commit 58a2602 into unslothai:main May 5, 2026

	print(f"Unsloth: Using local convert_hf_to_gguf.py from {_local_script}")
	logger.info(f"Unsloth: Using local convert_hf_to_gguf.py from {_local_script}")

		@lru_cache(8)
		def _download_convert_hf_to_gguf_cached(name, _local_script_info):

Conversation

mmathew23 commented May 4, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

gemini-code-assist Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

mmathew23 May 4, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented May 4, 2026

Uh oh!

danielhanchen commented May 5, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented May 5, 2026

Uh oh!

danielhanchen commented May 5, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented May 5, 2026

Uh oh!

danielhanchen commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants