Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR#621
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces the ability to use a local convert_hf_to_gguf.py script via the UNSLOTH_LLAMA_CPP_SCRIPTS_DIR environment variable, providing a fallback to network downloads if the local script is unavailable. Feedback includes a recommendation to refactor the caching logic by resolving the local script path outside the @lru_cache decorated function to correctly handle environment variable changes. Additionally, it is suggested to replace a print statement with logger.info for consistency with the existing logging infrastructure.
| original_content = response.content | ||
| # 1. Obtain the file (local override takes precedence over network) | ||
| if _local_script is not None: | ||
| print(f"Unsloth: Using local convert_hf_to_gguf.py from {_local_script}") |
There was a problem hiding this comment.
Use logger.info instead of print for consistency with the rest of the module and to ensure that the message is handled according to the configured logging settings.
| print(f"Unsloth: Using local convert_hf_to_gguf.py from {_local_script}") | |
| logger.info(f"Unsloth: Using local convert_hf_to_gguf.py from {_local_script}") |
There was a problem hiding this comment.
the reason I used print is so that we can see the message in studio. I think llama.cpp logger.info's get filtered out.
Add _resolve_local_convert_script helper and an opt-in path that reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead of fetching master from GitHub. Lets callers (e.g. Studio) pin the convert script to the same llama.cpp ref as their llama-quantize binary and gguf-py, avoiding API drift. The original _download_convert_hf_to_gguf is now a thin wrapper that resolves the env var on every call and forwards the resolved path to the @lru_cache'd implementation as part of its cache key. Resolving inside the cached function would have left subsequent calls pinned to the first resolution, since lru_cache keys on the args passed (always the default) rather than values mutated inside the body. Default behavior (env var unset) is unchanged: download from master.
Add _resolve_local_convert_script() helper and an opt-in path that reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead of fetching master from GitHub. Lets callers (e.g. Studio) pin the convert script to the same llama.cpp ref as their llama-quantize binary and gguf-py, avoiding API drift. The original _download_convert_hf_to_gguf is now a thin wrapper that resolves the env var on every call and forwards the resolved path to the @lru_cache'd implementation as part of its cache key. Resolving inside the cached function would have left subsequent calls pinned to the first resolution, since lru_cache keys on the args passed (always the default) rather than values mutated inside the body. Default behavior (env var unset) is unchanged: download from master.
6315061 to
ee7a105
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee7a105629
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if os.path.exists(candidate): | ||
| return candidate |
There was a problem hiding this comment.
Accept only file paths for local convert override
_resolve_local_convert_script() currently treats any existing path named convert_hf_to_gguf.py as valid. If that path is a directory (or another non-regular entry), _download_convert_hf_to_gguf_cached() will later call open(..., "rb") and fail with IsADirectoryError, causing conversion to abort instead of cleanly falling back to the network path. This is an actionable regression in the new local-override flow; the candidate check should require a regular file (e.g. os.path.isfile) before returning it.
Useful? React with 👍 / 👎.
|
This PR appears to address open issue(s). The duplicate detector matched the following open issues with HIGH confidence:
If this PR fixes any of them, consider adding |
|
Auto-review verdict: Approved Adds opt-in UNSLOTH_LLAMA_CPP_SCRIPTS_DIR support so callers can pin convert_hf_to_gguf.py and its sibling gguf-py to a specific llama.cpp checkout instead of always fetching master, eliminating API drift between the converter script and the gguf module. Default network-download behavior is preserved when the env var is unset, and the converter cache, in-process introspection, and converter subprocess all consistently use the pinned checkout when it is set. Reason: Local converter override works end-to-end with stat-keyed cache, hash-suffixed local filenames, and resolver-gated subprocess gguf-py propagation; all P1/P2 findings across 8 iterations were either fixed or rejected as out-of-scope. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b4595adffa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| replacement = ( | ||
| b"# Qwen3MoE seems to use num_local_experts instead of num_experts\n" | ||
| b" n_experts = self.hparams.get('num_experts', None) or self.hparams.get('num_local_experts')" | ||
| b" n_experts = self.hparams.get('num_experts', None) or self.hparams.get('num_local_experts')" |
There was a problem hiding this comment.
Keep injected
n_experts line at original indentation
The replacement for the Qwen MoE patch now injects " n_experts = ..." with a single leading space, but the matched source line is inside an indented block. When this regex matches (as it does on llama.cpp versions containing n_experts = self.hparams[...]), the generated converter script ends up with inconsistent indentation and fails to execute with a Python indentation/syntax error, so GGUF conversion aborts before processing any model.
Useful? React with 👍 / 👎.
b4595ad to
ee7a105
Compare
Add unsloth_compiled_cache to gitignore
|
This PR appears to address open issue(s). The duplicate detector matched the following open issues with HIGH confidence:
If this PR fixes any of them, consider adding |
_resolve_local_convert_script: normalize UNSLOTH_LLAMA_CPP_SCRIPTS_DIR via os.path.expanduser + os.path.abspath so the cache key on the cached implementation is stable across cwd changes; switch os.path.exists -> os.path.isfile so a directory or socket bearing the converter name no longer wins the lookup; return (abs_path, mtime_ns, size) so in-place edits to the same path invalidate the lru_cache entry. _download_convert_hf_to_gguf_cached: unpack the new tuple, replace the bare print with logger.info to match the rest of the function, and initialize text_archs and vision_archs alongside supported_types so a converter that lacks ModelType.MMPROJ (or where ModelBase / ModelType is missing entirely) no longer raises UnboundLocalError on the unconditional frozenset(...) calls. Re-attach cache_clear / cache_info from the cached implementation onto the public _download_convert_hf_to_gguf wrapper so the pre-existing lru_cache surface (in __all__) keeps working for external callers.
_resolve_local_convert_script: wrap the per-candidate isfile/stat block in try/except OSError so a vanishing-during-update converter falls through to the next name (and ultimately the network fallback) instead of escaping a bare FileNotFoundError out of the resolver. Final fall-through warning now lists both convert_hf_to_gguf.py and convert-hf-to-gguf.py since the loop tries both names. Fix typo on the cached function comment: 'Github report' -> 'GitHub repository'. @lru_cache(2) -> @lru_cache(8): two slots is too small once the cache key includes (path, mtime_ns, size). Any second distinct local identity (post-update mtime change, distinct SCRIPTS_DIR path) would evict the (name, None) network slot and force a refetch on a later unset-env call. Eight slots cover realistic local-identity churn at trivial memory cost. Re-attach cache_parameters and __wrapped__ on the public _download_convert_hf_to_gguf wrapper so the full pre-existing lru_cache surface (cache_clear, cache_info, cache_parameters, __wrapped__) is preserved for external callers and inspect.unwrap.
|
Auto-review verdict: Approved Adds an UNSLOTH_LLAMA_CPP_SCRIPTS_DIR env var so users can pin a local convert_hf_to_gguf.py instead of always downloading from llama.cpp master. Useful for offline / version-pinned environments and Studio-style long-lived processes; the post-review version normalizes the path, hashes file identity into the cache key, gracefully handles vanishing files, preserves the full pre-existing lru_cache surface, and fixes a latent UnboundLocalError on text-only converters. Reason: All accepted findings fixed; resolver/cache hardening complete; ancestry restored and pushed cleanly. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 274b11f983
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @lru_cache(8) | ||
| def _download_convert_hf_to_gguf_cached(name, _local_script_info): |
There was a problem hiding this comment.
Avoid reusing cached entries across script source switches
_download_convert_hf_to_gguf_cached is now keyed by _local_script_info, but it always writes to the same patched_filename ({LLAMA_CPP_DEFAULT_DIR}/{name}.py). If a caller runs once with network source, then with UNSLOTH_LLAMA_CPP_SCRIPTS_DIR, then switches back, the third call can hit the old cache entry and skip regeneration while the on-disk file still contains the local-script version from the second call. In that sequence, env-var changes are not actually honored and conversion may run with the wrong converter content.
Useful? React with 👍 / 👎.
| _download_convert_hf_to_gguf.cache_clear = _download_convert_hf_to_gguf_cached.cache_clear | ||
| _download_convert_hf_to_gguf.cache_info = _download_convert_hf_to_gguf_cached.cache_info | ||
| _download_convert_hf_to_gguf.cache_parameters = _download_convert_hf_to_gguf_cached.cache_parameters | ||
| _download_convert_hf_to_gguf.__wrapped__ = _download_convert_hf_to_gguf_cached.__wrapped__ |
There was a problem hiding this comment.
Preserve
__wrapped__ with the public function signature
Assigning _download_convert_hf_to_gguf.__wrapped__ = _download_convert_hf_to_gguf_cached.__wrapped__ exposes an unwrapped callable that requires the extra _local_script_info positional argument, whereas the previous __wrapped__ accepted just name (defaulted). Callers using inspect.unwrap or invoking .__wrapped__() directly will now get a TypeError, despite the comment stating this surface is preserved.
Useful? React with 👍 / 👎.
Reverts the cosmetic whitespace edits introduced by 8b4fd40 ("Scrub leaked references from comments and string literals") that did not correspond to any leak token and that damaged whitespace-significant content: - Restored canonical "PURPOSE. See" / "program. If not" double space in the two GPL/LGPL license headers. - Restored 4-space docstring continuations in _resolve_local_convert_script, _find_visual_studio, _find_openssl_root, _find_lib_path. - Restored the 12-space indent inside the bytes-literal carrying source code substituted via re.sub into convert_hf_to_gguf.py for the Qwen3MoE num_experts patch. The previous 1-space prefix would have produced invalid Python (IndentationError) at runtime in the patched file. The post-correction tree of unsloth_zoo/llama_cpp.py matches the parent commit 2ac599 (the last good state before the bad scrub) byte for byte.
|
This PR appears to address open issue(s). The duplicate detector matched the following open issues with HIGH confidence:
…and 1 more — see orchestrator state for the full list. If this PR fixes any of them, consider adding |
Cache implementation: drop lru_cache from 8 entries to 1. The cached
function returns a path to a single shared on-disk file at
LLAMA_CPP_DEFAULT_DIR/{name}.py, which is overwritten by every cache miss.
Holding more than one entry in memory therefore returns stale architecture
metadata for any non-active key while the disk content reflects only the
last miss.
Local override resolution: raise RuntimeError when
UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is set but invalid (non-directory, missing
converter, or stat failure). The previous warn-and-return-None silently
fell back to the network download, defeating an explicit pin.
Network timeout: add timeout=30 to requests.get on the converter URL,
matching the existing timeout convention used elsewhere in the module.
Compatibility shim: stop forwarding __wrapped__ from the cached inner
function. The inner signature is (name, _local_script_info) while the
public wrapper accepts only name, so inspect.unwrap(fn)() previously
raised TypeError. cache_clear, cache_info, and cache_parameters are
still forwarded for parity with the pre-split surface.
Discovery: extract LLAMA_CPP_CONVERTER_FILENAMES module constant and use it from both _resolve_local_convert_script and check_llama_cpp so both paths agree on filename precedence. Switch check_llama_cpp from os.path.exists to os.path.isfile so a directory accidentally named convert_hf_to_gguf.py is no longer returned as a runnable converter. Network timeout: split requests.get timeout into a (connect, read) tuple of (5, 30) so connection-phase failures resolve in 5 seconds instead of waiting the full 30s read window. Diagnostics: replace the 'download or introspection' phrasing in the shared error path with 'loading or introspecting' so the message accurately covers both the local-file read branch and the network branch. The four touched lines (converter discovery loop, exists-vs-isfile check, and the two error strings) trace back via git blame to commit 70f2ce7 "[Part 1] Complete llama.cpp Integration Overhaul with Enhanced Build System and Multi-Modal Support (unslothai#302)". The original behavior is preserved: the same converter filenames are discovered (just via a shared constant), os.path.isfile is a strict subset of os.path.exists that only rejects directories misnamed like the converter (which cannot be executed anyway), and the wording change is purely diagnostic and now correctly covers the local-read branch added in this change set.
|
Auto-review verdict: Approved PR adds an opt-in UNSLOTH_LLAMA_CPP_SCRIPTS_DIR override so unsloth-zoo can load convert_hf_to_gguf.py from a pinned local llama.cpp checkout instead of always downloading from master, with the cache key tracking mtime/size so in-place updates are honored. Hardening on top of the original change makes the env-var pin fail closed, drops a single-slot lru_cache, adds a (5,30) requests timeout, removes a misleading wrapped surface, and unifies converter discovery between the new resolver and check_llama_cpp. Reason: All real review findings fixed; remaining concerns are stylistic or hypothetical and do not block merge. |
Add _resolve_local_convert_script() helper and an opt-in path in _download_convert_hf_to_gguf() that reads convert_hf_to_gguf.py from UNSLOTH_LLAMA_CPP_SCRIPTS_DIR instead of fetching master from GitHub. Lets callers (e.g. Studio) pin the convert script to the same llama.cpp ref as their llama-quantize binary and gguf-py, avoiding API drift. The cache key now includes _local_script so env-var changes are honored. Default behavior (env var unset) is unchanged: download from master.