fix: use sys.executable for pip and python subprocesses to support venv environments#503
Conversation
Summary of ChangesHello @rolandtannous, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness and compatibility of GGUF conversion processes across various Python virtual environments. By ensuring that subprocess calls correctly target the active Python interpreter and its associated tools, it eliminates common installation and execution failures, leading to a more reliable user experience regardless of the environment setup. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0652b78787
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| ] | ||
|
|
||
| PIP_OPTIONS = [ | ||
| f"{sys.executable} -m pip", # Always prefer the running interpreter's pip |
There was a problem hiding this comment.
Validate pip candidate before preferring sys.executable
Prepending f"{sys.executable} -m pip" to PIP_OPTIONS causes check_pip() to select that entry even when it cannot actually install packages (for example, interpreters where the pip module is absent), because check_pip() only filters on command-not-found text and does not check return codes. In this case try_execute(f"{pip} install ...") fails immediately and the code never reaches fallback options such as uv pip or pip3, so GGUF dependency installation regresses in those environments.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
@codex review
pipmust exist in the current interpreter — the package was installed with it (pip install unsloth-zoo unsloth)- The concern is pre-existing —
check_pip()has the same flaw for uv pip and every other entry already but check the previous point. sys.executable -m pipis actually the safest entry in the list because it's the only one guaranteed to target the correct environment
| if print_output: print(f"\nUnsloth: Converting {description}...") | ||
| args_str = " ".join(f"{k} {v}" for k, v in args.items()) | ||
| command = f"python {converter_location} {args_str} {input_folder}" | ||
| command = f"{sys.executable} {converter_location} {args_str} {input_folder}" |
There was a problem hiding this comment.
Quote interpreter path in conversion shell command
The updated conversion command inserts sys.executable into a shell=True command string without quoting, so interpreter paths containing spaces (such as Windows installs under Program Files or workspace paths with spaces) are tokenized incorrectly and fail before running the converter. This makes GGUF conversion fail in those path layouts after this change.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
updated to
command = f'"{sys.executable}" {converter_location} {args_str} {input_folder}'
should be enough @codex review
There was a problem hiding this comment.
Code Review
This pull request correctly fixes an issue with using the wrong Python interpreter in virtual environments by using sys.executable. The changes are a definite improvement. However, I've identified a critical command injection vulnerability related to the way subprocess commands are constructed and executed with shell=True. While this is a pre-existing issue, the modified code is part of the vulnerable section. I've added a comment with details on the vulnerability and a recommendation for a fix, even though a fully applicable code suggestion is not possible due to repository API limitations.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5405fc8bff
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ] | ||
|
|
||
| PIP_OPTIONS = [ | ||
| f"{sys.executable} -m pip", # Always prefer the running interpreter's pip |
There was a problem hiding this comment.
Skip sys.executable pip when pip module is unavailable
Adding f"{sys.executable} -m pip" as the first PIP_OPTIONS entry regresses uv-created environments that don’t seed pip by default (see uv venv --help: --seed is required to install pip). In check_pip(), availability is inferred only from “command not found” text and ignores non-zero exit codes, so python -m pip is treated as valid even when it fails with No module named pip; the later install then aborts and never falls through to uv pip, breaking GGUF setup for those users.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
pip must exist in the current interpreter — the main package was installed with it (pip install unsloth-zoo unsloth)
The concern is pre-existing — check_pip() has the same flaw for uv pip and every other entry already but check the previous point.
sys.executable -m pip is actually the safest entry in the list because it's the only one guaranteed to target the correct environment
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5405fc8bff
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ] | ||
|
|
||
| PIP_OPTIONS = [ | ||
| f"{sys.executable} -m pip", # Always prefer the running interpreter's pip |
There was a problem hiding this comment.
Skip broken pip candidate before returning from check_pip
Placing f"{sys.executable} -m pip" first causes a regression in environments where the running interpreter has no pip module (for example, unseeded uv venvs): check_pip() only treats command not found output as failure and does not check the subprocess exit code, so it returns this candidate even when it fails with No module named pip. The later install command then fails immediately instead of falling back to uv pip/pip entries, so GGUF conversion can no longer proceed in those environments.
Useful? React with 👍 / 👎.
|
Pushed follow-up fixes to this PR branch in commit What I changed
Validation
|
|
/gemini review |
There was a problem hiding this comment.
Code Review
This is an excellent pull request that addresses a critical issue with virtual environment support. The detailed problem description and analysis are very helpful. The changes are well-implemented, not only fixing the immediate problem by using sys.executable but also significantly improving the robustness and security of the subprocess calls. The rewrite of check_pip is much more reliable, and the switch to shell=False in convert_to_gguf is a great security enhancement. I have one minor suggestion to further improve the code's readability and performance.
| tokens = shlex.split(pip) | ||
| except ValueError: | ||
| return False | ||
| if tokens in (["pip"], ["pip3"], ["uv", "pip"], ["poetry"]): |
There was a problem hiding this comment.
For better performance and readability, consider using a set of tuples for the known commands. This allows for O(1) average time complexity for lookups, which is more efficient than the current O(n) for a tuple of lists. It's a more idiomatic way to check for membership in a collection of fixed items.
| if tokens in (["pip"], ["pip3"], ["uv", "pip"], ["poetry"]): | |
| if tuple(tokens) in {("pip",), ("pip3",), ("uv", "pip"), ("poetry",)}: |
- convert_to_gguf builds an env that prepends UNSLOTH_LLAMA_CPP_SCRIPTS_DIR/gguf-py to PYTHONPATH (when that directory exists) and passes it via env= to both subprocess.run call sites. Without this, the parent process's sys.path mods made by use_local_gguf do not propagate into the converter's child interpreter, so a user that pinned a newer convert_hf_to_gguf.py via the env var ended up running it against the default-install or system-installed gguf and saw ModuleNotFoundError or version-mismatch failures. - use_local_gguf now also emits a logger.warning when UNSLOTH_LLAMA_CPP_SCRIPTS_DIR is set but the directory has no gguf-py/ subdirectory, mirroring the warning pattern in _resolve_local_convert_script. This surfaces the cross-version mismatch case that previously silently fell back to LLAMA_CPP_DEFAULT_DIR. - _resolve_local_convert_script's missing-converter warning now mentions both accepted filenames (convert_hf_to_gguf.py and convert-hf-to-gguf.py) so users debugging the env var see the full set the resolver actually probes. - Bump @lru_cache(2) on _download_convert_hf_to_gguf_cached to maxsize=8 so cross-mode toggling between local and network sources, plus a few in-place updates of a pinned converter, no longer evict cache slots and trigger redundant downloads or re-reads. Negligible memory overhead because the cached payload is small. - Reorder check_llama_cpp's fallback list to ["convert_hf_to_gguf.py", "convert-hf-to-gguf.py"] so it matches _resolve_local_convert_script's underscore-first preference; modern llama.cpp ships only the underscore variant, so this also avoids an unnecessary stat on the legacy hyphenated name in the common case. Blame-touched lines preserved in intent: - The print-output `subprocess.run(command, shell=False, check=True, text=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)` continuation line (line 1443, which carries the stdout=subprocess.PIPE, stderr=subprocess.STDOUT keyword arguments) whose blame points at 70f2ce7 ("[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)") still passes the same shell=False/check=True/text=True/stdout=subprocess.PIPE/stderr=subprocess.STDOUT arguments in the same order; only `env=sub_env` is added at the end so the venv-friendly invocation also picks up the pinned gguf-py via PYTHONPATH instead of the default-install gguf. - The silent-output `subprocess.run(command, shell=False, check=True, capture_output=True)` line (line 1446) whose blame points at 6b68187 ("fix: use `sys.executable` for pip and python subprocesses to support venv environments (unslothai#503)") still passes the same command/shell=False/check=True/capture_output=True; only `env=sub_env` is added so the same sys.executable launch the original fix established also receives PYTHONPATH for the pinned gguf-py. - The `try`/`except subprocess.CalledProcessError` block surrounding both calls (whose blame points at 70f2ce7 "[Part 1] Complete llama.cpp Integration Overhaul ... Multi-Modal Support (unslothai#302)" and 71229d8 "Fix Mistral, Qwen (unslothai#36)") is unchanged: same RuntimeError type, same command-string formatting, same stdout-print-on-failure path.
Problem
When running from a Python virtual environment (
.venv, conda, poetry, etc.), GGUF conversion fails becausellama_cpp.pyshells out to barepipandpythoncommands, which resolve to the system Python — not the interpreter running the current process.This causes two distinct failures:
1.
pip installhits PEP 668 protection (exit code 1)check_pip()iterates["uv pip", "pip", "pip3", ...]and finds the system'spip, which on modern Debian/Ubuntu (PEP 668) refuses to install packages outside a virtual environment:2.
pythoncommand not found during GGUF conversion (exit code 127)The conversion command is hardcoded as:
On many Linux distributions, there is no bare
pythonbinary — onlypython3. Even wherepythonexists, it points to the system Python, not the venv interpreter, so it would lack the required packages.3.
uv pipfails in non-uv environments (exit code 2)On systems where
uvis installed globally but the Python environment is managed by conda, mamba, or a standard venv (not uv), and after #336 was merged,check_pip()findsuv pipfirst (it was the first entry inPIP_OPTIONS) and tries to use it. Butuv pip installrequires a uv-managed venv and fails:By placing
sys.executable -m pipbeforeuv pipinPIP_OPTIONS,check_pip()now finds a working pip before ever reaching theuv pipentry, resolving this issue as well.Root Cause
All three issues stem from the same root cause: subprocess calls resolve to system-wide executables instead of the currently-running Python interpreter's environment.
Fix
Replace bare
pip/pythonreferences withsys.executable, which always resolves to the Python binary running the current process. This is the officially recommended way to invoke pip programmatically.Change 1:
PIP_OPTIONS(line 70)Prepend
f"{sys.executable} -m pip"so thatcheck_pip()prefers the current interpreter's pip:PIP_OPTIONS = [ + f"{sys.executable} -m pip", # Always prefer the running interpreter's pip "uv pip", "pip", "pip3", "python3 -m pip", "py -m pip", "poetry", ]Change 2: GGUF conversion command (line 1030)
Use
sys.executableinstead of barepython:Why
sys.executable?sys.executablereturns the absolute path of the Python interpreter running the current process. It correctly resolves in all common environments:sys.executableresolves to/path/to/.venv/bin/python/path/to/conda/envs/myenv/bin/python/path/to/.venv/bin/python/path/to/.venv/bin/python/usr/bin/python3For
check_pip()specifically, the fallback list still contains"uv pip","pip", etc., so environments wheresys.executable -m pipdoesn't work (e.g., uv environments that don't ship pip) gracefully fall through to the next option.Alternatives Considered
sys.executable -m pip/sys.executablePATH/VIRTUAL_ENVenv vars before subprocess callsenvthrough many functionsUNSLOTH_STUDIO=1env var to switch behavior--break-system-packagesflagTesting
.venvenvironment on a PEP 668 system (Ubuntu with Python 3.12)convert_hf_to_gguf.pyruns under the venv Python with access to all required packages