
fix(studio): inherit llama_extra_args and honor --no-mmproj #5902

Merged
danielhanchen merged 9 commits into unslothai:main from jimdawdy-hub:fix/studio-llama-extra-args-mmproj
Jun 12, 2026

@jimdawdy-hub (Contributor) commented May 31, 2026

Summary

  • Inherit stored llama_extra_args when reloading the same GGUF from the UI without an explicit gguf_variant (fixes "Not inheriting llama_extra_args" on Apply/reload).
  • Skip mmproj download and --mmproj launch when --no-mmproj is present in llama_extra_args (avoids wasted download/VRAM before last-wins CLI parsing).
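The mmproj skip amounts to checking the pass-through args before fetching the projector. A minimal sketch with a plain membership test (aliases and last-wins ordering are not handled); build_llama_server_argv and the fetch_mmproj callback are hypothetical stand-ins for Studio's download and launch steps, not its actual functions.

```python
from typing import Callable, Optional, Sequence


def build_llama_server_argv(
    model_path: str,
    extra_args: Sequence[str],
    fetch_mmproj: Callable[[], Optional[str]],
) -> list[str]:
    """Hypothetical helper: assemble a llama-server command line.

    fetch_mmproj stands in for the (assumed) step that downloads the
    projector and returns its local path. It is only called when the
    user's pass-through args do not contain --no-mmproj.
    """
    argv = ["llama-server", "-m", model_path]
    if "--no-mmproj" not in extra_args:
        mmproj_path = fetch_mmproj()
        if mmproj_path:
            argv += ["--mmproj", mmproj_path]
    # User pass-through args go last, matching the "Appending user extra args" log line.
    argv += list(extra_args)
    return argv
```

Because the check happens before fetch_mmproj is called, a --no-mmproj run never triggers the download at all, rather than downloading and then relying on llama-server to ignore the file.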

Reproduction logs (Arch Linux, dual RTX 5060 Ti, 2026-05-31)

After the UI replaced Qwen with Gemma, manually reloading Qwen from Chat Settings dropped CLI flags:

{"timestamp": "2026-05-31T13:29:23.365092Z", "event": "Downloading mmproj: unsloth/Qwen3.6-27B-MTP-GGUF/mmproj-F16.gguf"}
{"timestamp": "2026-05-31T13:29:23.980788Z", "event": "Using mmproj for vision: .../mmproj-F16.gguf"}
{"timestamp": "2026-05-31T13:29:23.980895Z", "event": "Starting llama-server: ... --mmproj .../mmproj-F16.gguf"}

(No "Appending user extra args" line and no --no-mmproj on the UI-initiated reload, even though the CLI had passed --no-mmproj.)

Cross-model auto-load also showed args were not inherited when switching repos:

{"timestamp": "2026-05-31T13:28:02.349969Z", "event": "Not inheriting llama_extra_args: stored args came from ('unsloth/Qwen3.6-27B-MTP-GGUF', 'UD-IQ2_XXS'), loading ('unsloth/gemma-4-E2B-it-GGUF', 'UD-Q4_K_XL')"}
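The log line above suggests stored args are tagged with the (repo, variant) pair they came from and compared against the model being loaded. A minimal sketch of that check, assuming exact tuple equality; the helper names are hypothetical, and the merged code's handling of omitted variants and its other filtering of inherited args are not modeled here.

```python
import logging
from typing import Optional, Sequence, Tuple

logger = logging.getLogger(__name__)

# (repo id, GGUF variant): the same pair the Studio log prints for each side.
Source = Tuple[str, str]


def mismatch_message(stored: Source, loading: Source) -> str:
    # !r renders each tuple the way it appears in the log line above.
    return (
        "Not inheriting llama_extra_args: stored args came from "
        f"{stored!r}, loading {loading!r}"
    )


def inherited_extra_args(
    stored_args: Sequence[str],
    stored_source: Optional[Source],
    loading_source: Source,
) -> list[str]:
    """Reuse stored pass-through args only when (repo, variant) matches."""
    if not stored_args or stored_source is None:
        return []
    if stored_source != loading_source:
        logger.info(mismatch_message(stored_source, loading_source))
        return []
    return list(stored_args)
```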

Test plan

  • unsloth studio run -m … --no-mmproj -- … then reload the same model from Chat Settings without extra args
  • Confirm logs show inherited args and mmproj is not downloaded
  • Confirm llama-server command line does not include --mmproj when --no-mmproj was passed via CLI
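The last two test-plan checks can be automated by scanning Studio's JSON-lines log output. A sketch, assuming each line is a JSON object with an "event" field as in the excerpts in this PR; check_no_mmproj_run is a hypothetical name. Tokens are matched exactly, so --no-mmproj is never mistaken for --mmproj.

```python
import json
from typing import Iterable


def check_no_mmproj_run(log_lines: Iterable[str]) -> list[str]:
    """Return problems found in JSON-lines Studio logs (an empty list means pass).

    Assumes one JSON object per line with an "event" field; lines that are
    not JSON objects are skipped.
    """
    problems = []
    saw_start = False
    for line in log_lines:
        try:
            event = json.loads(line).get("event", "")
        except (json.JSONDecodeError, AttributeError):
            continue
        if event.startswith("Downloading mmproj"):
            problems.append(f"unexpected mmproj download: {event}")
        if event.startswith("Starting llama-server:"):
            saw_start = True
            tokens = event.split()  # exact tokens, so "--no-mmproj" != "--mmproj"
            if "--mmproj" in tokens:
                problems.append("llama-server launched with --mmproj")
            if "--no-mmproj" not in tokens:
                problems.append("llama-server launched without --no-mmproj")
    if not saw_start:
        problems.append("no 'Starting llama-server' event found")
    return problems
```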

Verification (2026-05-31, patched editable install)

See verification comment on the PR — summary:

  • No Gemma auto-load after studio run -m Qwen (only helper pre-cache, no inference load for gemma-4-E2B)
  • --no-mmproj honored — no mmproj download; llama-server launched with --no-mmproj only
  • Reload inherits args: "Inheriting llama_extra_args ... ['--no-mmproj']" logged on same-model reload
  • 165 test_llama_server_args.py tests passed

Reloading the same GGUF from the UI without gguf_variant no longer drops
CLI pass-through args like --no-mmproj. Skip mmproj download and launch
when --no-mmproj is present in llama_extra_args.

Co-authored-by: Cursor <cursoragent@cursor.com>

@gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request introduces a mechanism to disable the automatic downloading and launching of mmproj for vision models when the --no-mmproj flag is provided in the extra arguments. It also refactors the model inheritance logic in routes/inference.py to prevent inheriting arguments when there is an explicit variant change. Feedback points out a bug in the explicit_variant_change logic where a variant change is not detected if the stored variant is empty, and provides a code suggestion to fix it.

Comment thread studio/backend/routes/inference.py Outdated

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d73634c9d1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread studio/backend/routes/inference.py Outdated
@jimdawdy-hub (Contributor, Author) commented

Verification logs (patched branch, Arch Linux, 2026-05-31)

Installed from editable checkout: pip install -e /home/jim/Projects/unsloth (branch fix/studio-poll-cli-model-load, includes #5900 + #5901 + #5902).

Command:

unsloth studio run \
  -m unsloth/Qwen3.6-27B-MTP-GGUF \
  --gguf-variant UD-IQ2_XXS \
  --max-seq-length 8192 \
  --no-mmproj \
  --port 8889 --host 127.0.0.1 --silent

#5902: --no-mmproj honored; no mmproj download on CLI load

Before (repro): Downloading mmproj: unsloth/Qwen3.6-27B-MTP-GGUF/mmproj-F16.gguf

After (patched):

{"timestamp": "2026-05-31T14:14:14.354872Z", "event": "Vision-capable GGUF loaded without a usable mmproj; image input will be disabled for this session"}
{"timestamp": "2026-05-31T14:14:14.355179Z", "event": "Appending user extra args to llama-server: ['--no-mmproj']"}
{"timestamp": "2026-05-31T14:14:14.355224Z", "event": "Starting llama-server: ... --no-mmproj"}

(No "Downloading mmproj" line; the llama-server command ends with --no-mmproj, not --mmproj.)

#5902 — UI reload inherits llama_extra_args

{"timestamp": "2026-05-31T14:15:33.870718Z", "event": "Inheriting llama_extra_args from previous load (same model, shadow-stripped): ['--no-mmproj']"}
{"timestamp": "2026-05-31T14:15:35.282062Z", "event": "Starting llama-server: ... --no-mmproj"}
{"timestamp": "2026-05-31T14:16:01.740749Z", "event": "Loaded GGUF model via llama-server: unsloth/Qwen3.6-27B-MTP-GGUF"}

Reload request: POST /api/inference/load with {"model_path":"unsloth/Qwen3.6-27B-MTP-GGUF","gguf_variant":"UD-IQ2_XXS"} (no llama_extra_args field).
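The reload request can be reproduced outside the UI with the standard library. A minimal sketch that builds, but does not send, the POST from the verification, assuming the local Studio server accepts an unauthenticated JSON body on /api/inference/load (authentication requirements are not covered in this PR); build_reload_request is a hypothetical helper.

```python
import json
import urllib.request

# Host and port taken from the `unsloth studio run ... --port 8889 --host 127.0.0.1` command above.
STUDIO_URL = "http://127.0.0.1:8889"


def build_reload_request(model_path: str, gguf_variant: str) -> urllib.request.Request:
    """Build (but do not send) a same-model reload request.

    llama_extra_args is deliberately omitted, so the server is expected to
    inherit the args stored from the previous load of this model.
    """
    body = {"model_path": model_path, "gguf_variant": gguf_variant}
    return urllib.request.Request(
        f"{STUDIO_URL}/api/inference/load",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_reload_request("unsloth/Qwen3.6-27B-MTP-GGUF", "UD-IQ2_XXS")
# To send it against a running Studio instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```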

#5900 / #5901 — no Gemma auto-load; Qwen stays active

Before (repro):

{"timestamp": "2026-05-31T13:28:02.197259Z", "event": "Detected remote GGUF repo 'unsloth/gemma-4-E2B-it-GGUF', variant=UD-Q4_K_XL, vision=True"}

After (patched session): no "Detected remote GGUF repo 'unsloth/gemma-4-E2B-it-GGUF'" inference load, only the background helper pre-cache:

{"timestamp": "2026-05-31T14:14:13.453443Z", "event": "Pre-caching helper GGUF: unsloth/gemma-4-E2B-it-GGUF/gemma-4-E2B-it-UD-Q4_K_XL.gguf"}

Status after CLI load:

{"active_model":"unsloth/Qwen3.6-27B-MTP-GGUF","model_identifier":"unsloth/Qwen3.6-27B-MTP-GGUF","gguf_variant":"UD-IQ2_XXS","is_vision":false}

Chat completion stayed on Qwen:

{"model":"unsloth/Qwen3.6-27B-MTP-GGUF","choices":[{"delta":{"content":"The user is asking me to reply with exactly \"OK\"..."}}]}

POST /api/inference/load count in session: 1 CLI load + 1 intentional same-model reload test — no Gemma load.

Unit tests

pytest studio/backend/tests/test_llama_server_args.py — 165 passed

jimdawdy-hub and others added 2 commits June 2, 2026 21:50
Reject inherited CLI args when the request changes gguf_variant or when
omitted variant resolves differently from the stored extra_args source.

Co-authored-by: Cursor <cursoragent@cursor.com>
@jimdawdy-hub (Contributor, Author) commented

Both Codex concerns were addressed in 99309f1:

  • Empty stored_variant with explicit request: explicit gguf_variant now always compares request_variant != stored_variant, dropping the old requirement that stored_variant be non-empty before doing the comparison.
  • Omitted variant / resolved config guard: when gguf_variant is omitted, inheritance is only rejected if stored_variant is non-empty and differs from the resolved config.gguf_variant (both lowercased).
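The two rules above combine into a single guard. A hedged sketch, not the code in 99309f1: parameter names mirror the comment, resolved_variant stands in for config.gguf_variant, variant_blocks_inheritance is a hypothetical name, and lowercasing in the explicit branch is an assumption (the comment only states it for the omitted case).

```python
from typing import Optional


def variant_blocks_inheritance(
    request_variant: Optional[str],
    stored_variant: Optional[str],
    resolved_variant: Optional[str],
) -> bool:
    """Return True when stored llama_extra_args must NOT be inherited.

    request_variant: gguf_variant from the load request (None or "" if omitted).
    stored_variant: variant recorded alongside the stored extra args.
    resolved_variant: variant the server resolved for this load.
    """
    stored = (stored_variant or "").lower()
    if request_variant:
        # Explicit variant: always compare, even when the stored variant is empty
        # (the case the earlier non-empty check let through).
        return request_variant.lower() != stored
    # Omitted variant: reject only if a stored variant exists and differs
    # from what the request resolved to.
    return bool(stored) and stored != (resolved_variant or "").lower()
```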

No open review threads remain.

@danielhanchen @rolandtannous — ready for review when you get a chance. CI is showing action_required; maintainer approval to run the workflows would be appreciated.

jimdawdy-hub and others added 2 commits June 9, 2026 20:22
Resolve inference.py conflict by keeping variant-aware llama_extra_args
inheritance when gguf_variant is omitted from the reload request.

Co-authored-by: Cursor <cursoragent@cursor.com>

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9276e5bd0a


Comment thread studio/backend/core/inference/llama_server_args.py Outdated
@danielhanchen danielhanchen self-assigned this Jun 11, 2026
@danielhanchen (Member) commented

Pushed 69b3195 to close out the remaining review item: extra_args_disable_mmproj() now recognises --no-mmproj-auto and mirrors llama-server's last-wins parsing for the --mmproj-auto / --no-mmproj / --no-mmproj-auto boolean (they all map to the same no_mmproj param in common/arg.cpp). Added tests for the alias and the last-wins ordering; all 166 tests in test_llama_server_args.py pass. The variant inheritance guard from the earlier thread is confirmed in the branch.
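llama-server's last-wins handling of these three spellings can be mirrored with a single left-to-right scan. A sketch under the assumption, as described above, that --mmproj-auto turns automatic mmproj back on while --no-mmproj and --no-mmproj-auto turn it off; disables_mmproj_last_wins is a hypothetical name, not the merged extra_args_disable_mmproj().

```python
from typing import Sequence

# Spellings named in the comment above; all of them drive one no_mmproj boolean.
_ENABLE = {"--mmproj-auto"}
_DISABLE = {"--no-mmproj", "--no-mmproj-auto"}


def disables_mmproj_last_wins(extra_args: Sequence[str]) -> bool:
    """Return True if the last mmproj toggle in extra_args disables mmproj."""
    disabled = False  # assumed default: automatic mmproj stays on unless turned off
    for arg in extra_args:
        if arg in _DISABLE:
            disabled = True
        elif arg in _ENABLE:
            disabled = False
    return disabled
```

Tracking a single boolean and overwriting it on every toggle is what makes the result match the final flag rather than any flag, so `--no-mmproj --mmproj-auto` leaves mmproj enabled.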

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 69b319508e


Comment thread studio/backend/routes/inference.py
@jimdawdy-hub (Contributor, Author) commented

Synced with latest main and cleared open review threads.

  • Merged origin/main into this branch; PR is mergeable again.
  • Resolved remaining Codex/Gemini review threads (including items already addressed in @danielhanchen's follow-up commits).

Waiting on maintainer review/approval. pre-commit.ci is the only automated gate visible from fork PRs; GitHub Actions still require maintainer approval for first-time contributors.

@danielhanchen danielhanchen merged commit f22e890 into unslothai:main Jun 12, 2026
1 check passed