fix(studio): inherit llama_extra_args and honor --no-mmproj #5902
Reloading the same GGUF from the UI without gguf_variant no longer drops CLI pass-through args like --no-mmproj. Skip mmproj download and launch when --no-mmproj is present in llama_extra_args. Co-authored-by: Cursor <cursoragent@cursor.com>
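The skip described in this commit message can be sketched as a small predicate. This is a minimal illustration under stated assumptions, not the PR's code: the helper names `has_no_mmproj` and `plan_mmproj` are hypothetical, and only the `--no-mmproj` flag and the `llama_extra_args` name come from the commit message.

```python
from typing import Optional, Sequence


def has_no_mmproj(llama_extra_args: Optional[Sequence[str]]) -> bool:
    """Return True when the pass-through args contain --no-mmproj."""
    return "--no-mmproj" in (llama_extra_args or [])


def plan_mmproj(is_vision_model: bool, llama_extra_args: Optional[Sequence[str]]) -> bool:
    """Decide whether to download the mmproj file and pass --mmproj at launch.

    Hypothetical helper: checking before the download avoids fetching a file
    that the flag would disable anyway.
    """
    return is_vision_model and not has_no_mmproj(llama_extra_args)
```

Making the decision before the download step, rather than relying on the server to ignore a later `--mmproj`, is what saves the download and the VRAM.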
for more information, see https://pre-commit.ci
Code Review
This pull request introduces a mechanism to disable the automatic downloading and launching of mmproj for vision models when the --no-mmproj flag is provided in the extra arguments. It also refactors the model inheritance logic in routes/inference.py to prevent inheriting arguments when there is an explicit variant change. Feedback points out a bug in the explicit_variant_change logic where a variant change is not detected if the stored variant is empty, and provides a code suggestion to fix it.
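The reviewer's point about an empty stored variant can be illustrated as follows. This is a sketch that assumes variants are compared as optional strings; both function names are hypothetical, the "buggy" form is only a guess at the shape of the problem, and the actual check in routes/inference.py may differ.

```python
from typing import Optional


def explicit_variant_change_buggy(requested: Optional[str], stored: Optional[str]) -> bool:
    # Requiring a truthy stored variant hides a change from "" or None
    # to a real variant.
    return bool(requested) and bool(stored) and requested != stored


def explicit_variant_change(requested: Optional[str], stored: Optional[str]) -> bool:
    # An explicit request counts as a change whenever it differs from what
    # is stored, including when nothing was stored.
    return bool(requested) and requested != (stored or None)
```

With `requested="UD-Q4_K_XL"` and `stored=""`, the first form reports no change while the second reports one.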
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d73634c9d1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Verification logs (patched branch, Arch Linux, 2026-05-31)
Installed from editable checkout:
Command:
unsloth studio run \
  -m unsloth/Qwen3.6-27B-MTP-GGUF \
  --gguf-variant UD-IQ2_XXS \
  --max-seq-length 8192 \
  --no-mmproj \
  --port 8889 --host 127.0.0.1 --silent
Reject inherited CLI args when the request changes gguf_variant or when omitted variant resolves differently from the stored extra_args source. Co-authored-by: Cursor <cursoragent@cursor.com>
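One way to express this rule is to compare the (repo, resolved variant) pair that the stored args came from with the pair being loaded. The sketch below rests on assumptions: `inherit_extra_args`, the `Source` alias, and the `resolve_default_variant` callback are hypothetical names used for illustration, not identifiers from the PR.

```python
from typing import Callable, List, Optional, Tuple

Source = Tuple[str, Optional[str]]  # (repo id, resolved GGUF variant)


def inherit_extra_args(
    stored_args: List[str],
    stored_source: Optional[Source],
    repo: str,
    requested_variant: Optional[str],
    resolve_default_variant: Callable[[str], Optional[str]],
) -> List[str]:
    """Return the args to reuse for this load, or [] when they must be dropped."""
    if stored_source is None:
        return []
    # An omitted variant is resolved first so it can be compared with the
    # variant the stored args were recorded for.
    variant = requested_variant or resolve_default_variant(repo)
    if (repo, variant) != stored_source:
        return []
    return list(stored_args)
```

Resolving the omitted variant before comparing covers both cases in the commit message: an explicit change of gguf_variant, and an omitted variant that resolves to something other than the stored one.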
Both Codex concerns were addressed in 99309f1:
No open review threads remain. @danielhanchen @rolandtannous: ready for review when you get a chance. CI is showing
Resolve inference.py conflict by keeping variant-aware llama_extra_args inheritance when gguf_variant is omitted from the reload request. Co-authored-by: Cursor <cursoragent@cursor.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9276e5bd0a
Pushed 69b3195 to close out the remaining review item:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 69b319508e
Synced with latest
Waiting on maintainer review/approval. pre-commit.ci is the only automated gate visible from fork PRs; GitHub Actions still require maintainer approval for first-time contributors.
Summary
- Inherit llama_extra_args when reloading the same GGUF from the UI without an explicit gguf_variant (fixes "Not inheriting llama_extra_args" on Apply/reload).
- Skip the mmproj download and --mmproj launch when --no-mmproj is present in llama_extra_args (avoids wasted download/VRAM before last-wins CLI parsing).
Reproduction logs (Arch Linux, dual RTX 5060 Ti, 2026-05-31)
After the UI replaced Qwen with Gemma, manually reloading Qwen from Chat Settings dropped CLI flags:
{"timestamp": "2026-05-31T13:29:23.365092Z", "event": "Downloading mmproj: unsloth/Qwen3.6-27B-MTP-GGUF/mmproj-F16.gguf"}
{"timestamp": "2026-05-31T13:29:23.980788Z", "event": "Using mmproj for vision: .../mmproj-F16.gguf"}
{"timestamp": "2026-05-31T13:29:23.980895Z", "event": "Starting llama-server: ... --mmproj .../mmproj-F16.gguf"}
(No "Appending user extra args" / no --no-mmproj on UI-initiated reload despite CLI passing --no-mmproj.)
Cross-model auto-load also showed args were not inherited when switching repos:
{"timestamp": "2026-05-31T13:28:02.349969Z", "event": "Not inheriting llama_extra_args: stored args came from ('unsloth/Qwen3.6-27B-MTP-GGUF', 'UD-IQ2_XXS'), loading ('unsloth/gemma-4-E2B-it-GGUF', 'UD-Q4_K_XL')"}
Test plan
- unsloth studio run -m … --no-mmproj -- … then reload the same model from Chat Settings without extra args
- No --mmproj when --no-mmproj was passed via CLI
Verification (2026-05-31, patched editable install)
See the verification comment on the PR. Summary:
- studio run -m Qwen (only helper pre-cache, no inference load for gemma-4-E2B)
- --no-mmproj honored: no mmproj download; llama-server launched with --no-mmproj only
- Inheriting llama_extra_args ... ['--no-mmproj'] on same-model reload
- test_llama_server_args.py tests passed
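The manual log check above could be automated with a small scanner over the JSON log lines. This is a hedged sketch: `mmproj_events` is a hypothetical helper, and it assumes only what the reproduction logs show, namely one JSON object per line with an "event" string.

```python
import json
from typing import Iterable, List


def mmproj_events(log_lines: Iterable[str]) -> List[str]:
    """Collect 'event' strings that indicate an mmproj download or launch."""
    hits: List[str] = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON output mixed into the log
        if not isinstance(record, dict):
            continue
        event = str(record.get("event", ""))
        # "--no-mmproj" does not contain the substring "--mmproj ",
        # so a launch with only --no-mmproj is not counted.
        if event.startswith(("Downloading mmproj", "Using mmproj")) or "--mmproj " in event:
            hits.append(event)
    return hits
```

An empty result for a run started with --no-mmproj would match the verification summary; any hit points at a reload path that dropped the flag.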