fix(studio): poll inference status while CLI model is loading #5901
jimdawdy-hub wants to merge 24 commits
When the user starts Studio via `studio run -m`, the web UI could still auto-load a different cached GGUF on the first message because the chat checkpoint was empty. Sync from /api/inference/status before falling back to autoLoadSmallestModel so CLI-loaded models are not replaced. Co-authored-by: Cursor <cursoragent@cursor.com>
The chat page could refresh /api/inference/status before `studio run -m` finished loading, leaving the UI checkpoint empty. Poll status on mount when no checkpoint is set, and extend waitForModelReady to adopt external loads. Co-authored-by: Cursor <cursoragent@cursor.com>
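The mount-time poll described in this commit can be sketched as a bounded loop. This is only an illustration in Python (the PR's frontend is TypeScript). `poll_for_active_model`, `fetch_status`, and the one-second interval are hypothetical names and values; the 60 s budget comes from the PR summary and the `active_model` field from later commit messages. Catching transient errors follows the review suggestion to guard status checks.

```python
import time
from typing import Callable, Optional


def poll_for_active_model(
    fetch_status: Callable[[], dict],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,  # assumed interval, not stated in the PR
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll a status source until it reports an active model or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        try:
            status = fetch_status()
        except OSError:
            # A transient network failure counts as "no answer yet", not a crash.
            status = {}
        active = status.get("active_model")
        if active:
            return active
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; a caller would pass a function that issues `GET /api/inference/status` and returns the decoded body.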
Reloading the same GGUF from the UI without `gguf_variant` no longer drops CLI pass-through args like `--no-mmproj`. When `--no-mmproj` is present in `llama_extra_args`, skip the mmproj download and launch llama-server without an mmproj file. Co-authored-by: Cursor <cursoragent@cursor.com>
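A minimal sketch of the `--no-mmproj` gate, assuming the backend holds the pass-through args as a list of strings. `wants_mmproj` and `plan_launch` are hypothetical helpers, not functions from this PR; `-m`, `--mmproj`, and `--no-mmproj` are llama-server flags.

```python
from typing import List, Optional, Sequence

NO_MMPROJ_FLAG = "--no-mmproj"


def wants_mmproj(llama_extra_args: Optional[Sequence[str]]) -> bool:
    """Return False when the pass-through args explicitly opt out of mmproj."""
    return NO_MMPROJ_FLAG not in (llama_extra_args or ())


def plan_launch(
    model_path: str,
    mmproj_path: Optional[str],
    llama_extra_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build an illustrative llama-server argument list (hypothetical helper)."""
    args = ["-m", model_path]
    # Only attach (and, upstream, only download) the projector when not opted out.
    if mmproj_path and wants_mmproj(llama_extra_args):
        args += ["--mmproj", mmproj_path]
    args += list(llama_extra_args or ())
    return args
```

The same predicate would guard the download step, so a session started with `--no-mmproj` never fetches the projector file in the first place.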
Code Review
This pull request introduces the ability to adopt an already active model on the inference server into the chat UI without triggering a new load, adding polling mechanisms during startup and page refresh. The review feedback suggests simplifying a redundant condition in tryAdoptServerActiveModel and wrapping server status checks in try-catch blocks to prevent crashes from transient network errors. Additionally, it is recommended to defer model and LoRA listing requests until after the active model polling completes to improve efficiency and robustness.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ab3f95856b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
💡 Codex Review
Reviewed commit: 6cecc631ca
Verification logs (patched branch, Arch Linux, 2026-05-31)
Installed from editable checkout:
Command:
unsloth studio run \
  -m unsloth/Qwen3.6-27B-MTP-GGUF \
  --gguf-variant UD-IQ2_XXS \
  --max-seq-length 8192 \
  --no-mmproj \
  --port 8889 --host 127.0.0.1 --silent
#5902 —
…polling Extract shared inference-status hydration, poll status before listing models on CLI startup, wait for empty checkpoints before auto-load, and reject llama_extra_args inheritance when the resolved GGUF variant differs. Co-authored-by: Cursor <cursoragent@cursor.com>
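The inheritance rule from this commit and the earlier one (same-model reloads keep CLI pass-through args, unless the resolved variant differs) could look roughly like the sketch below. `inherited_extra_args` and its parameters are hypothetical, and letting explicitly supplied args take priority is an assumption, not something the commit messages state.

```python
from typing import List, Optional, Sequence


def inherited_extra_args(
    loaded_model: Optional[str],
    loaded_variant: Optional[str],
    loaded_extra_args: Sequence[str],
    requested_model: str,
    resolved_variant: Optional[str],
    requested_extra_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Decide which llama_extra_args a reload should run with."""
    if requested_extra_args is not None:
        # Assumption: args supplied with the request override anything inherited.
        return list(requested_extra_args)
    same_model = loaded_model is not None and loaded_model == requested_model
    same_variant = loaded_variant == resolved_variant
    if same_model and same_variant:
        return list(loaded_extra_args)
    # Different model or different resolved variant: do not carry args over.
    return []
```

Comparing against the resolved variant, rather than the possibly missing `gguf_variant` in the request, is what lets a UI reload without `gguf_variant` still inherit, while a reload that resolves to a different quantization starts clean.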
💡 Codex Review
Reviewed commit: 1194142d5e
…efresh Keep CLI poll/adopt logic and combine refresh options with main's AbortSignal cancellation. Retain project instruction helpers from main alongside the extended waitForModelReady adopt loop. Co-authored-by: Cursor <cursoragent@cursor.com>
💡 Codex Review
Reviewed commit: 3b59d7b87c
P1: only enter waitForModelReady() when a UI-initiated load is actually in progress (modelLoading). Removing the checkpointEmpty condition means a fresh empty session goes straight to autoLoadSmallestModel(), which already calls tryAdoptServerActiveModel() first. This avoids the 120 s spin-to-deadline on every normal startup where no CLI model is loading.
P2: fetch listModels() / listLoras() and commit them to the store before starting the 60 s CLI-load poll, so the model selector is never blocked for a full minute during an idle Studio session. The poll still runs concurrently on mount when the checkpoint is empty; the final status fetch is re-used from the poll result to avoid an extra round-trip.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
💡 Codex Review
Reviewed commit: 6d56af4609
Pushed 6d56af4 to address the two P1/P2 Codex concerns:
@danielhanchen @rolandtannous, would you be able to review when you get a chance? CI is showing
Addresses the follow-up Codex review on 6d56af4:
- P1: dropping the pre-autoload wait entirely reintroduced the CLI-load race (UI auto-loads the smallest model while `studio run -m` is still loading). autoLoadSmallestModel now calls adoptInFlightServerLoad, which adopts an already-active model, and -- only when load-progress reports phase "mmap" (llama-server genuinely paging weights) -- waits for that load to finish before adopting. An idle session has no such evidence and falls straight through to auto-load with no delay.
- P3: waitForModelReady no longer spins to a 120s deadline. It returns as soon as modelLoading clears, so a cancelled/failed UI-initiated load no longer hangs the send for two minutes.
- P2: refresh() no longer clobbers a model the user picks while the mount-time CLI poll is running -- the poll stops early on selection and the polled active_model is not applied over a fresh local selection.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
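The evidence-gated decision in P1 reduces to a three-way choice. The sketch below is an illustration only: `startup_action` is a hypothetical name and the dictionary shapes are assumptions, while the `active_model` field and the `"mmap"` phase come from the commit message.

```python
from typing import Optional


def startup_action(status: dict, load_progress: Optional[dict]) -> str:
    """Return "adopt", "wait", or "auto_load" for an empty-checkpoint session."""
    if status.get("active_model"):
        # The server already has a model: adopt it instead of loading another.
        return "adopt"
    phase = (load_progress or {}).get("phase")
    if phase == "mmap":
        # llama-server is paging weights, so a CLI load is genuinely in flight.
        return "wait"
    # No evidence of an external load: fall through with no delay.
    return "auto_load"
```

Because the idle case returns `"auto_load"` immediately, a normal startup pays no polling cost; only a session with positive evidence of an in-flight load waits.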
# Conflicts:
# studio/backend/routes/inference.py
# studio/frontend/src/features/chat/api/chat-adapter.ts
# studio/frontend/src/features/chat/hooks/use-chat-model-runtime.ts
💡 Codex Review
Reviewed commit: 4b8f1c516c
Resolve chat-adapter conflict (keep abortSignal on autoLoadSmallestModel) and address latest Codex review: report in-flight GGUF loads on /status, gate adopt waits on loading/mmap evidence, re-check checkpoint before adopt, and skip multimodal reset when the user picked during CLI poll. Co-authored-by: Cursor <cursoragent@cursor.com>
Merge + Codex follow-up (c5e6e70)
Merged latest
PR is MERGEABLE; waiting on
Pushed b709b8f with two changes:
The polling design here is solid: evidence-gated waiting via
💡 Codex Review
Reviewed commit: b709b8f168
Synced with latest
Waiting on maintainer review/approval. pre-commit.ci is the only automated gate visible from fork PRs; GitHub Actions still require maintainer approval for first-time contributors.
…odel-load
# Conflicts:
# studio/backend/routes/inference.py
# studio/frontend/src/features/chat/api/chat-adapter.ts
# studio/frontend/src/features/chat/hooks/use-chat-model-runtime.ts
# studio/frontend/src/features/chat/lib/apply-inference-status-to-store.ts
The load-orchestrator canary failed on GitHub-hosted runners at 361 ms against a 350 ms ceiling. Widen to 400 ms so the guard still catches pathological serialisation without flaking on shared CI hardware. Co-authored-by: Cursor <cursoragent@cursor.com>
💡 Codex Review
Reviewed commit: c47589b73c
Keep the empty-checkpoint refresh poll running past 60s when inference status still reports an in-flight load, so slow studio run -m sessions adopt into the checkpoint without a manual refresh. Co-authored-by: Cursor <cursoragent@cursor.com>
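One way to express the extended deadline is a predicate evaluated on each poll tick. This is a sketch under assumptions: `should_keep_polling` is a hypothetical name, and `loading` is an assumed field name for the in-flight signal that `/status` reports; the 60 s base budget is from the PR summary.

```python
def should_keep_polling(
    elapsed_s: float,
    status: dict,
    base_timeout_s: float = 60.0,
) -> bool:
    """Decide whether the empty-checkpoint poll should run another tick."""
    if status.get("active_model"):
        # A model is active: stop polling and adopt it.
        return False
    if elapsed_s < base_timeout_s:
        # Still inside the normal budget.
        return True
    # Past the budget: continue only while the server reports an in-flight load.
    return bool(status.get("loading"))
```

An idle session still stops at the base budget, while a slow `studio run -m` load keeps the poll alive until the model becomes active or the in-flight signal clears.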
💡 Codex Review
Reviewed commit: f85c07b566
Only clear multimodal/trust flags when refresh finishes with no active model and no checkpoint in the store, so a selection during in-flight list/status calls is not wiped by stale pre-await guards. Co-authored-by: Cursor <cursoragent@cursor.com>
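The guard described here depends on reading the store after the awaited calls finish, not from a snapshot taken before them. A small illustration follows; `ChatStore`, its fields, and `finish_refresh` are hypothetical stand-ins for the frontend store, not names from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatStore:
    """Hypothetical stand-in for the chat store's relevant fields."""
    checkpoint: Optional[str] = None
    supports_vision: bool = False
    trust_remote_code: bool = False


def finish_refresh(store: ChatStore, active_model: Optional[str]) -> None:
    """Clear capability flags only if nothing is selected when refresh ends."""
    # store.checkpoint is read now, so a selection made while the list/status
    # calls were in flight is seen and its flags are left alone.
    if active_model is None and not store.checkpoint:
        store.supports_vision = False
        store.trust_remote_code = False
```

A guard computed before the awaits would still see the empty checkpoint and wipe the flags of a model the user picked in the meantime, which is the stale-guard case this commit targets.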
Addressed in 010142e: the capability-clear branch now gates on
Summary
- Poll `/api/inference/status` for up to 60s on chat page mount when no checkpoint is set (covers the race where the UI loads before `studio run -m` finishes).
- Extend `waitForModelReady()` to adopt externally loaded models, not only UI `modelLoading` state.
- Use `model_identifier` from status when syncing the checkpoint (HF repo id vs display label).
Related
Reproduction logs (Arch Linux, dual RTX 5060 Ti, 2026-05-31)
Same `studio run -m unsloth/Qwen3.6-27B-MTP-GGUF …` session as #5900. The UI opened while the CLI load was still in flight:
- `GET /api/inference/status` returns empty checkpoint / no active model
Status polled before CLI load finished:
{"timestamp": "2026-05-31T13:27:34.249462Z", "event": "request_completed", "method": "GET", "path": "/api/inference/status", "status_code": 200}
Wrong model loaded by UI auto-load:
{"timestamp": "2026-05-31T13:28:02.197259Z", "event": "Detected remote GGUF repo 'unsloth/gemma-4-E2B-it-GGUF', variant=UD-Q4_K_XL, vision=True"}
Test plan
- `studio run -m …`, open the UI immediately, wait without refreshing
Verification (2026-05-31, patched editable install)
See verification comment on the PR. Summary:
- `studio run -m` Qwen (only helper pre-cache, no inference load for gemma-4-E2B)
- `--no-mmproj` honored: no mmproj download; llama-server launched with `--no-mmproj` only
- `Inheriting llama_extra_args ... ['--no-mmproj']` on same-model reload
- `test_llama_server_args.py` tests passed