fix(studio): adopt server-loaded model before chat auto-load #5900
When the user starts Studio via `studio run -m`, the web UI could still auto-load a different cached GGUF on the first message because the chat checkpoint was empty. Sync from /api/inference/status before falling back to autoLoadSmallestModel so CLI-loaded models are not replaced. Co-authored-by: Cursor <cursoragent@cursor.com>
Code Review
This pull request introduces a mechanism to adopt an already active model on the inference server into the chat UI checkpoint, avoiding unnecessary model reloading. This is implemented via the new tryAdoptServerActiveModel function, which is integrated into the auto-loading flow. The review feedback highlights two main improvements: wrapping tryAdoptServerActiveModel in a try-catch block to prevent bypassing necessary cleanup logic during errors, and simplifying a redundant conditional check in tryAdoptServerActiveModel which also allows for the removal of an unused import.
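The error-handling point in this review can be illustrated with a minimal TypeScript sketch. It is not the PR's code: `runAutoLoad`, the `AutoLoadDeps` bundle, and the `cleanup` callback are hypothetical names, and the signatures given to `tryAdoptServerActiveModel` and `autoLoadSmallestModel` are assumptions (the real functions are presumably async).

```typescript
// Hypothetical dependency bundle; signatures are assumptions, not the PR's API.
interface AutoLoadDeps {
  tryAdoptServerActiveModel: () => boolean; // true when a server model was adopted
  autoLoadSmallestModel: () => void;
  cleanup: () => void; // stand-in for the cleanup that must not be bypassed
}

type AutoLoadResult = "adopted" | "auto-loaded" | "failed";

function runAutoLoad(deps: AutoLoadDeps): AutoLoadResult {
  try {
    // Prefer the model the server already has loaded.
    if (deps.tryAdoptServerActiveModel()) {
      return "adopted";
    }
    // Nothing to adopt: fall back to the smallest cached model.
    deps.autoLoadSmallestModel();
    return "auto-loaded";
  } catch {
    // An error in either step still reaches the cleanup.
    deps.cleanup();
    return "failed";
  }
}
```

Keeping the adopt call inside the same `try` as the fallback means a failed status request is handled exactly like a failed load.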
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 559082ea9f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Verification logs (patched branch, Arch Linux, 2026-05-31)
Installed from editable checkout:
Command:
unsloth studio run \
  -m unsloth/Qwen3.6-27B-MTP-GGUF \
  --gguf-variant UD-IQ2_XXS \
  --max-seq-length 8192 \
  --no-mmproj \
  --port 8889 --host 127.0.0.1 --silent
#5902 —
Extract shared inference-status hydration for refresh() and CLI adopt paths so the first chat turn gets reasoning/tools flags. Wrap auto-load (including adopt) in try/catch for image-edit cleanup, and drop the redundant adopt call in run(). Co-authored-by: Cursor <cursoragent@cursor.com>
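A rough TypeScript sketch of the shared-hydration idea follows. The store and status shapes (`checkpoint`, `activeModel`, `supportsReasoning`, `supportsTools`) are hypothetical, and the function names are guessed from the commit messages rather than copied from the code.

```typescript
// Hypothetical shapes; the real status payload and store fields may differ.
interface StatusPayload {
  activeModel: string | null;
  supportsReasoning: boolean;
  supportsTools: boolean;
}

interface ChatStore {
  checkpoint: string | null;
  supportsReasoning: boolean;
  supportsTools: boolean;
}

// Single place that copies server status into the store.
function applyInferenceStatusToStore(store: ChatStore, status: StatusPayload): ChatStore {
  return {
    ...store,
    checkpoint: status.activeModel,
    supportsReasoning: status.supportsReasoning,
    supportsTools: status.supportsTools,
  };
}

// refresh() path: always mirror the server.
function refreshFromStatus(store: ChatStore, status: StatusPayload): ChatStore {
  return applyInferenceStatusToStore(store, status);
}

// Adopt path: only fill an empty checkpoint, but hydrate the same flags.
function adoptFromStatus(store: ChatStore, status: StatusPayload): ChatStore {
  if (store.checkpoint !== null || status.activeModel === null) {
    return store;
  }
  return applyInferenceStatusToStore(store, status);
}
```

Because both paths go through one helper, the adopt path cannot set the checkpoint while leaving the reasoning and tools flags stale on the first chat turn.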
All three Codex concerns (error handling for @rolandtannous — ready for review when you get a chance. CI is showing |
Resolve conflicts by keeping shared apply-inference-status-to-store hydration while adopting main's refresh/load paths. Co-authored-by: Cursor <cursoragent@cursor.com>
Merge conflict resolution
Merged latest
Conflicts resolved:
The shared helper already includes Review threads: all three Codex/Gemini threads are resolved (redundant checkpoint guard was fixed in PR is MERGEABLE; waiting on |
Pushed 4cd23a6 with two small hardening changes to
The shared hydration extraction looks faithful to the original refresh block. Nice catch on the CLI model being replaced by the fallback auto-load.
💡 Codex Review
Reviewed commit: 4dc92f1646
💡 Codex Review
Reviewed commit: 9899d3bdb8
…rver-active-model
Synced with latest
Waiting on maintainer review/approval. pre-commit.ci is the only automated gate visible from fork PRs; GitHub Actions still require maintainer approval for first-time contributors.
Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
- Sync the chat checkpoint from `/api/inference/status` before running `autoLoadSmallestModel()`.
- Prevent a model loaded with `unsloth studio run -m …` from being replaced by the smallest cached GGUF (or the Gemma fallback download) on the first chat message.

Problem
If the browser polls status before the CLI load finishes, the UI has no checkpoint. Sending a message then triggers auto-load, which can unload the CLI-loaded model and load a different one (reproduced with Qwen via CLI → Gemma via UI auto-load).
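The intended ordering at message-send time can be sketched as a small pure function in TypeScript. The `InferenceStatus` shape, its `activeModel` field, and the `decideNextStep` name are assumptions made for illustration; the actual status payload is not shown in this PR.

```typescript
// Hypothetical status shape; only the endpoint name comes from the PR.
interface InferenceStatus {
  activeModel: string | null;
}

type NextStep =
  | { kind: "use-checkpoint"; model: string }
  | { kind: "adopt-server-model"; model: string }
  | { kind: "auto-load-smallest" };

// Decide what to do when the user sends the first message.
// `status` is a fresh /api/inference/status result, or null if the request failed.
function decideNextStep(checkpoint: string | null, status: InferenceStatus | null): NextStep {
  if (checkpoint) {
    // The UI already knows which model to chat with.
    return { kind: "use-checkpoint", model: checkpoint };
  }
  if (status && status.activeModel) {
    // The early poll missed the CLI load, but the server has a model now.
    return { kind: "adopt-server-model", model: status.activeModel };
  }
  // Nothing is loaded anywhere, so the fallback is safe.
  return { kind: "auto-load-smallest" };
}
```

The key point is that the status check is repeated when the message is sent, so an early poll that saw no active model no longer forces the fallback.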
Reproduction logs (Arch Linux, dual RTX 5060 Ti, 2026-05-31)
Command:
- `POST /api/inference/load` for Qwen
- `GET /api/inference/status` (5s before load completes)
- `Model loaded: unsloth/Qwen3.6-27B-MTP-GGUF (UD-IQ2_XXS)`
- `GET /api/models/cached-gguf` (auto-load path)
- `POST /api/inference/load` for `unsloth/gemma-4-E2B-it-GGUF` (Qwen unloaded)
- `/v1/chat/completions` on Gemma, not Qwen

Early status poll (no active model yet):
{"timestamp": "2026-05-31T13:27:34.249462Z", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200, "process_time_ms": 43.35}

Auto-load replaces Qwen with Gemma:
{"timestamp": "2026-05-31T13:28:02.197259Z", "event": "Detected remote GGUF repo 'unsloth/gemma-4-E2B-it-GGUF', variant=UD-Q4_K_XL, vision=True"}
{"timestamp": "2026-05-31T13:28:02.349969Z", "event": "Not inheriting llama_extra_args: stored args came from ('unsloth/Qwen3.6-27B-MTP-GGUF', 'UD-IQ2_XXS'), loading ('unsloth/gemma-4-E2B-it-GGUF', 'UD-Q4_K_XL')"}
{"timestamp": "2026-05-31T13:28:13.976568Z", "event": "Loaded GGUF model via llama-server: unsloth/gemma-4-E2B-it-GGUF"}

Test plan
- Run `unsloth studio run -m unsloth/Qwen3.6-27B-MTP-GGUF --gguf-variant UD-IQ2_XXS …` and open the UI before "Model loaded" appears
- Send the first chat message and confirm there is no `/api/inference/load` for a different repo

Verification (2026-05-31, patched editable install)
See verification comment on the PR — summary:
- `studio run -m` Qwen (only helper pre-cache, no inference load for gemma-4-E2B)
- `--no-mmproj` honored: no mmproj download; llama-server launched with `--no-mmproj` only
- `Inheriting llama_extra_args ... ['--no-mmproj']` on same-model reload
- `test_llama_server_args.py` tests passed
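The "no load for a different repo" check could be scripted against the server's JSON logs, roughly as in the TypeScript sketch below. It assumes every load is logged with the `Loaded GGUF model via llama-server: <repo>` event text seen in the reproduction logs; `unexpectedLoads` is a hypothetical helper, not part of the repo.

```typescript
// Event prefix taken from the reproduction logs; assumed to be stable.
const LOADED_PREFIX = "Loaded GGUF model via llama-server: ";

// Return every repo that was loaded but is not the one passed to the CLI.
function unexpectedLoads(logLines: string[], expectedRepo: string): string[] {
  const unexpected: string[] = [];
  for (const line of logLines) {
    let event: unknown;
    try {
      event = (JSON.parse(line) as { event?: unknown }).event;
    } catch {
      continue; // skip lines that are not JSON objects
    }
    if (typeof event !== "string" || !event.startsWith(LOADED_PREFIX)) {
      continue; // not a model-load event
    }
    const repo = event.slice(LOADED_PREFIX.length);
    if (repo !== expectedRepo) {
      unexpected.push(repo);
    }
  }
  return unexpected;
}
```

Run over the reproduction logs with `unsloth/Qwen3.6-27B-MTP-GGUF` as the expected repo, this would flag the Gemma load; on the patched branch the result should be empty.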