fix(studio): inherit llama_extra_args and honor --no-mmproj by jimdawdy-hub · Pull Request #5902 · unslothai/unsloth

jimdawdy-hub · 2026-05-31T13:54:20Z

Summary

Inherit stored llama_extra_args when reloading the same GGUF from the UI without an explicit gguf_variant (fixes "Not inheriting llama_extra_args" on Apply/reload).
Skip mmproj download and --mmproj launch when --no-mmproj is present in llama_extra_args (avoids wasted download/VRAM before last-wins CLI parsing).
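A minimal sketch of the second point, to show why checking the flag before the download matters. The function name, its parameters, and the `fetch_mmproj` callable are hypothetical and are not taken from the Studio code; the check is a plain membership test for brevity.

```python
from typing import Callable, List, Optional, Sequence


def build_launch_args_sketch(
    model_path: str,
    extra_args: Sequence[str],
    fetch_mmproj: Callable[[], Optional[str]],
) -> List[str]:
    """Assemble a llama-server argv, skipping mmproj work when it is opted out.

    Hypothetical helper: `fetch_mmproj` stands in for whatever downloads the
    projector file and returns its local path (or None if there is none).
    """
    argv = ["llama-server", "-m", model_path]
    if "--no-mmproj" not in extra_args:
        # Only pay for the download when the projector will actually be used.
        mmproj_path = fetch_mmproj()
        if mmproj_path:
            argv += ["--mmproj", mmproj_path]
    # User pass-through args are appended after the generated ones.
    argv += list(extra_args)
    return argv
```

With `--no-mmproj` in `extra_args`, `fetch_mmproj` is never called, so nothing is downloaded and the resulting command line carries only `--no-mmproj`.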

Reproduction logs (Arch Linux, dual RTX 5060 Ti, 2026-05-31)

After the UI replaced Qwen with Gemma, manually reloading Qwen from Chat Settings dropped CLI flags:

{"timestamp": "2026-05-31T13:29:23.365092Z", "event": "Downloading mmproj: unsloth/Qwen3.6-27B-MTP-GGUF/mmproj-F16.gguf"}
{"timestamp": "2026-05-31T13:29:23.980788Z", "event": "Using mmproj for vision: .../mmproj-F16.gguf"}
{"timestamp": "2026-05-31T13:29:23.980895Z", "event": "Starting llama-server: ... --mmproj .../mmproj-F16.gguf"}

(No `Appending user extra args` line and no `--no-mmproj` on the UI-initiated reload, even though the CLI passed `--no-mmproj`.)

Cross-model auto-load also showed args were not inherited when switching repos:

{"timestamp": "2026-05-31T13:28:02.349969Z", "event": "Not inheriting llama_extra_args: stored args came from ('unsloth/Qwen3.6-27B-MTP-GGUF', 'UD-IQ2_XXS'), loading ('unsloth/gemma-4-E2B-it-GGUF', 'UD-Q4_K_XL')"}

Test plan

Run `unsloth studio run -m … --no-mmproj -- …`, then reload the same model from Chat Settings without extra args
Confirm logs show inherited args and mmproj is not downloaded
Confirm llama-server command line does not include --mmproj when --no-mmproj was passed via CLI

Verification (2026-05-31, patched editable install)

See verification comment on the PR — summary:

No Gemma auto-load after studio run -m Qwen (only helper pre-cache, no inference load for gemma-4-E2B)
--no-mmproj honored — no mmproj download; llama-server launched with --no-mmproj only
Reload inherits args — Inheriting llama_extra_args ... ['--no-mmproj'] on same-model reload
165 test_llama_server_args.py tests passed

Reloading the same GGUF from the UI without gguf_variant no longer drops CLI pass-through args like --no-mmproj. Skip mmproj download and launch when --no-mmproj is present in llama_extra_args.

Co-authored-by: Cursor <cursoragent@cursor.com>

for more information, see https://pre-commit.ci

gemini-code-assist

Code Review

This pull request introduces a mechanism to disable the automatic downloading and launching of mmproj for vision models when the --no-mmproj flag is provided in the extra arguments. It also refactors the model inheritance logic in routes/inference.py to prevent inheriting arguments when there is an explicit variant change. Feedback points out a bug in the explicit_variant_change logic where a variant change is not detected if the stored variant is empty, and provides a code suggestion to fix it.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d73634c9d1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

jimdawdy-hub · 2026-05-31T14:17:07Z

Verification logs (patched branch, Arch Linux, 2026-05-31)

Installed from editable checkout: pip install -e /home/jim/Projects/unsloth (branch fix/studio-poll-cli-model-load, includes #5900 + #5901 + #5902).

Command:

unsloth studio run \
  -m unsloth/Qwen3.6-27B-MTP-GGUF \
  --gguf-variant UD-IQ2_XXS \
  --max-seq-length 8192 \
  --no-mmproj \
  --port 8889 --host 127.0.0.1 --silent

#5902 — `--no-mmproj` honored; no mmproj download on CLI load

Before (repro): Downloading mmproj: unsloth/Qwen3.6-27B-MTP-GGUF/mmproj-F16.gguf

After (patched):

{"timestamp": "2026-05-31T14:14:14.354872Z", "event": "Vision-capable GGUF loaded without a usable mmproj; image input will be disabled for this session"}
{"timestamp": "2026-05-31T14:14:14.355179Z", "event": "Appending user extra args to llama-server: ['--no-mmproj']"}
{"timestamp": "2026-05-31T14:14:14.355224Z", "event": "Starting llama-server: ... --no-mmproj"}

(No `Downloading mmproj` line. The llama-server command ends with `--no-mmproj`, not `--mmproj`.)
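The same check can be scripted against the JSON log lines. This is a sketch only: it assumes one JSON object per line with an `event` field, as in the excerpts above, and the function name is made up for illustration. Flags are matched as whole whitespace-separated tokens so that `--no-mmproj` is not mistaken for `--mmproj`.

```python
import json
from typing import Dict, Iterable


def summarize_mmproj_events(log_lines: Iterable[str]) -> Dict[str, bool]:
    """Report whether mmproj was downloaded or passed to llama-server."""
    summary = {
        "downloaded_mmproj": False,
        "launched_with_mmproj": False,
        "launched_with_no_mmproj": False,
    }
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line).get("event", "")
        if event.startswith("Downloading mmproj"):
            summary["downloaded_mmproj"] = True
        if event.startswith("Starting llama-server"):
            # Compare whole tokens, not substrings.
            tokens = event.split()
            if "--mmproj" in tokens:
                summary["launched_with_mmproj"] = True
            if "--no-mmproj" in tokens:
                summary["launched_with_no_mmproj"] = True
    return summary


patched_log = [
    '{"timestamp": "2026-05-31T14:14:14.355179Z", "event": "Appending user extra args to llama-server: [\'--no-mmproj\']"}',
    '{"timestamp": "2026-05-31T14:14:14.355224Z", "event": "Starting llama-server: ... --no-mmproj"}',
]
print(summarize_mmproj_events(patched_log))
```

For the patched excerpt only `launched_with_no_mmproj` comes out true; for the repro excerpt from the PR description the first two fields come out true instead.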

#5902 — UI reload inherits `llama_extra_args`

{"timestamp": "2026-05-31T14:15:33.870718Z", "event": "Inheriting llama_extra_args from previous load (same model, shadow-stripped): ['--no-mmproj']"}
{"timestamp": "2026-05-31T14:15:35.282062Z", "event": "Starting llama-server: ... --no-mmproj"}
{"timestamp": "2026-05-31T14:16:01.740749Z", "event": "Loaded GGUF model via llama-server: unsloth/Qwen3.6-27B-MTP-GGUF"}

Reload request: POST /api/inference/load with {"model_path":"unsloth/Qwen3.6-27B-MTP-GGUF","gguf_variant":"UD-IQ2_XXS"} (no llama_extra_args field).
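For anyone repeating the reload test outside the UI, the request can be built with the standard library. This is a sketch under assumptions: the base URL is taken from the `--host`/`--port` values in the command above, the helper name is made up, and any authentication the Studio backend may require is not covered here.

```python
import json
import urllib.request


def build_reload_request(base_url: str, model_path: str, gguf_variant: str) -> urllib.request.Request:
    """Build the same-model reload request without a llama_extra_args field."""
    payload = {"model_path": model_path, "gguf_variant": gguf_variant}
    return urllib.request.Request(
        url=f"{base_url}/api/inference/load",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_reload_request(
    "http://127.0.0.1:8889",  # assumed from --host/--port in the command above
    "unsloth/Qwen3.6-27B-MTP-GGUF",
    "UD-IQ2_XXS",
)
# To actually send it against a running instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Leaving `llama_extra_args` out of the payload is the point of the test: the server is expected to fall back to the args stored from the CLI load.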

#5900 / #5901 — no Gemma auto-load; Qwen stays active

Before (repro):

{"timestamp": "2026-05-31T13:28:02.197259Z", "event": "Detected remote GGUF repo 'unsloth/gemma-4-E2B-it-GGUF', variant=UD-Q4_K_XL, vision=True"}

After (patched session): no `Detected remote GGUF repo 'unsloth/gemma-4-E2B-it-GGUF'` event, so no Gemma inference load. The only Gemma activity is the background helper pre-cache:

{"timestamp": "2026-05-31T14:14:13.453443Z", "event": "Pre-caching helper GGUF: unsloth/gemma-4-E2B-it-GGUF/gemma-4-E2B-it-UD-Q4_K_XL.gguf"}

Status after CLI load:

{"active_model":"unsloth/Qwen3.6-27B-MTP-GGUF","model_identifier":"unsloth/Qwen3.6-27B-MTP-GGUF","gguf_variant":"UD-IQ2_XXS","is_vision":false}

Chat completion stayed on Qwen:

{"model":"unsloth/Qwen3.6-27B-MTP-GGUF","choices":[{"delta":{"content":"The user is asking me to reply with exactly \"OK\"..."}}]}

POST /api/inference/load count in session: 1 CLI load + 1 intentional same-model reload test — no Gemma load.

Unit tests

pytest studio/backend/tests/test_llama_server_args.py — 165 passed


jimdawdy-hub · 2026-06-09T04:31:08Z

Both Codex concerns were addressed in 99309f1:

Empty stored_variant with explicit request: explicit gguf_variant now always compares request_variant != stored_variant, dropping the old requirement that stored_variant be non-empty before doing the comparison.
Omitted variant / resolved config guard: when gguf_variant is omitted, inheritance is only rejected if stored_variant is non-empty and differs from the resolved config.gguf_variant (both lowercased).
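The two rules can be read as a single predicate. The sketch below is an illustration, not the code in `routes/inference.py`: the function and parameter names are hypothetical, it assumes the model itself already matches, it lowercases both sides in the explicit case as well (the comment above only states lowercasing for the omitted case), and it treats an empty request variant as omitted.

```python
from typing import Optional


def should_inherit_extra_args_sketch(
    request_variant: Optional[str],
    resolved_variant: Optional[str],
    stored_variant: Optional[str],
) -> bool:
    """Decide whether stored llama_extra_args may be reused for a same-model reload."""
    stored = (stored_variant or "").lower()
    if request_variant:
        # Explicit variant: always compare, even when the stored variant is empty.
        return request_variant.lower() == stored
    # Omitted variant: reject only if a stored variant exists and differs
    # from what the config resolved to.
    resolved = (resolved_variant or "").lower()
    if stored and stored != resolved:
        return False
    return True
```

Under these assumptions, an explicit `UD-Q4_K_XL` request against an empty stored variant is rejected, while an omitted variant that resolves to the stored `UD-IQ2_XXS` still inherits.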

No open review threads remain.

@danielhanchen @rolandtannous — ready for review when you get a chance. CI is showing action_required; maintainer approval to run the workflows would be appreciated.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9276e5bd0a



danielhanchen · 2026-06-11T14:31:05Z

Pushed 69b3195 to close out the remaining review item: extra_args_disable_mmproj() now recognises --no-mmproj-auto and mirrors llama-server's last-wins parsing for the --mmproj-auto / --no-mmproj / --no-mmproj-auto boolean (they all map to the same no_mmproj param in common/arg.cpp). Added tests for the alias and the last-wins ordering; all 166 tests in test_llama_server_args.py pass. The variant inheritance guard from the earlier thread is confirmed in the branch.
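A rough sketch of the last-wins behaviour described above, for readers who want the rule spelled out. It is not the body of `extra_args_disable_mmproj()`; the function name and the flag sets below are an illustration built only from the flags named in this comment.

```python
from typing import Iterable

# Per the comment above, these flags toggle the same boolean in llama-server.
DISABLE_FLAGS = {"--no-mmproj", "--no-mmproj-auto"}
ENABLE_FLAGS = {"--mmproj-auto"}


def mmproj_disabled_sketch(extra_args: Iterable[str]) -> bool:
    """Return True if the last mmproj toggle in extra_args disables mmproj."""
    disabled = False  # no toggle present: mmproj handling stays enabled
    for arg in extra_args:
        if arg in DISABLE_FLAGS:
            disabled = True
        elif arg in ENABLE_FLAGS:
            disabled = False
    return disabled


print(mmproj_disabled_sketch(["--no-mmproj", "--mmproj-auto"]))       # False
print(mmproj_disabled_sketch(["--mmproj-auto", "--no-mmproj-auto"]))  # True
```

Walking the list in order and letting each toggle overwrite the previous one is what makes the result match whichever flag appears last.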

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 69b319508e



jimdawdy-hub · 2026-06-11T19:26:32Z

Synced with latest main and cleared open review threads.

Merged origin/main into this branch; PR is mergeable again.
Resolved remaining Codex/Gemini review threads (including items already addressed in @danielhanchen's follow-up commits).

Waiting on maintainer review/approval. pre-commit.ci is the only automated gate visible from fork PRs; GitHub Actions still require maintainer approval for first-time contributors.

jimdawdy-hub requested review from danielhanchen and rolandtannous as code owners May 31, 2026 13:54

[pre-commit.ci] auto fixes from pre-commit.com hooks

a7992ba

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed May 31, 2026

Comment thread studio/backend/routes/inference.py Outdated

chatgpt-codex-connector Bot reviewed May 31, 2026

Comment thread studio/backend/routes/inference.py Outdated

jimdawdy-hub mentioned this pull request May 31, 2026

fix(studio): adopt server-loaded model before chat auto-load #5900

Merged

3 tasks

This was referenced May 31, 2026

fix(studio): poll inference status while CLI model is loading #5901

Open

fix(studio): load run.py by path for editable installs #5909

Merged

jimdawdy-hub and others added 2 commits June 2, 2026 21:50

fix(studio): tighten GGUF llama_extra_args variant inheritance guard

99309f1

Reject inherited CLI args when the request changes gguf_variant or when omitted variant resolves differently from the stored extra_args source.

Co-authored-by: Cursor <cursoragent@cursor.com>

Merge branch 'main' into fix/studio-llama-extra-args-mmproj

fa112ba

jimdawdy-hub and others added 2 commits June 9, 2026 20:22

Merge origin/main into fix/studio-llama-extra-args-mmproj

e34349e

Resolve inference.py conflict by keeping variant-aware llama_extra_args inheritance when gguf_variant is omitted from the reload request.

Co-authored-by: Cursor <cursoragent@cursor.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

9276e5b

for more information, see https://pre-commit.ci

chatgpt-codex-connector Bot reviewed Jun 10, 2026

Comment thread studio/backend/core/inference/llama_server_args.py Outdated

danielhanchen self-assigned this Jun 11, 2026

Treat --no-mmproj-auto and --mmproj-auto with last-wins parsing for PR …

69b3195

…unslothai#5902

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread studio/backend/routes/inference.py

danielhanchen and others added 2 commits June 11, 2026 15:05

Merge main into fix/studio-llama-extra-args-mmproj

975fc43

Merge remote-tracking branch 'origin/main' into fix/studio-llama-extr…

2d5760a

…a-args-mmproj

danielhanchen merged commit f22e890 into unslothai:main Jun 12, 2026
1 check passed

danielhanchen merged 9 commits into unslothai:main from jimdawdy-hub:fix/studio-llama-extra-args-mmproj

2 participants
