Studio: expose --parallel / -np flag on `unsloth studio run` by danielhanchen · Pull Request #5737 · unslothai/unsloth

danielhanchen · 2026-05-23T06:15:24Z

Summary

unsloth studio run was hardcoding llama_parallel_slots=4 in run_kwargs at unsloth_cli/commands/studio.py:803, leaving users unable to tune llama-server's concurrent decode slot count even though the entire downstream path (engine, KV-cache VRAM math, studio.backend.run.run_server(llama_parallel_slots=...)) already accepted any N.

This change adds a --parallel / --n-parallel / -np typer option (default 4 -- matches the previous hardcoded value), forwards it into run_kwargs, and pins the new CLI surface with 4 unit tests.

Why now

The flag was discovered during a sweep analysis where I wanted to run multiple concurrent Studio benchmark requests against one loaded model. The plumbing already worked end-to-end -- the only gap was the CLI surface. From studio/backend/core/inference/llama_cpp.py the argv to llama-server already passes --parallel <n_parallel> (line ~2842), and KV-cache fitting at _fit_context_to_vram already accounts for slot count when computing per-layer SWA vs global cache split.

Concurrency safety

No refactor of routes/inference.py is needed:

cancel_event = threading.Event() and prev_text = "" are per-request locals in every streaming handler (lines 2090 / 2253 / 2298 / 2447 / 2618 / 2777 / 2933).
The cancel registry is keyed by cancel_id / session_id, not by backend identity.
LlamaCppBackend._lock / _serial_load_lock only wrap load/unload, not generate_chat_completion.
state.tool_policy is process-global but read-only per request.

Caveats users should know

Higher N means each slot gets ctx / N KV cache. With -c 8192 -np 8 each call effectively has 1024 tokens of context. The --no-context-shift flag (which Studio sets unconditionally) means an over-ctx request 400s instead of rotating, so users tuning this should be aware. The help text mentions this trade-off.

unsloth studio (the bare default command, no subcommand) still defaults to llama_parallel_slots=1 via run_server's own default; this PR does not change that path. Only the one-liner studio run -- which already silently used 4 -- now exposes the knob.

Tests

unsloth_cli/tests/test_studio_run_parallel_flag.py (4 tests):

option is registered with all 3 aliases (--parallel / --n-parallel / -np)
default is 4 (no behaviour change)
range guards are 1 <= N <= 64
run_kwargs no longer contains the literal llama_parallel_slots = 4

All 4 pass locally.

Test plan

Unit tests pass
Manual: unsloth studio run --help shows the new flag
Manual: unsloth studio run --parallel 8 --model X boots and serves 8 concurrent decode slots (verifiable via concurrent /v1/chat/completions requests)

gemini-code-assist · 2026-05-23T06:15:28Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 256ac52a35

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T06:16:35Z

+    parallel: int = typer.Option(
+        4,
+        "--parallel",
+        "--n-parallel",
+        "-np",


Forward --parallel during studio venv re-exec

When run() executes outside the Studio venv, it rebuilds args and re-execs into unsloth studio run, but that forwarded argument list never includes the new parallel value. In that common path, any user-provided --parallel/--n-parallel/-np is silently dropped and the child process falls back to the default of 4, so the new flag is ineffective unless the command is already running inside the Studio venv.

Useful? React with 👍 / 👎.

The CLI was hardcoding `llama_parallel_slots=4` in `run_kwargs` at `unsloth_cli/commands/studio.py`, leaving users unable to tune the concurrent decode slot count even though the engine, KV-cache math, and `studio.backend.run.run_server(llama_parallel_slots=...)` plumbing all already accepted any N. This change adds a `--parallel` / `--n-parallel` / `-np` typer option (default 4 -- matches the previous hardcoded value), forwards it into `run_kwargs`, and pins the new surface with 4 unit tests. Per-request state in `routes/inference.py` is already isolated (`cancel_event` and `prev_text` are per-request locals in every streaming handler; the `_lock` / `_serial_load_lock` only wrap load/unload, not chat completions), so no concurrency refactor is needed alongside this -- the engine layer already handles N concurrent requests on one loaded model when llama-server is told to. Range guards: 1 <= N <= 64. With higher N each slot gets ctx/N KV cache; users tuning this should be aware that per-call context shrinks proportionally. `unsloth studio` (the bare default command, no subcommand) still defaults to llama_parallel_slots=1 via `run_server`'s own default; this PR does not change that path -- it only exposes the knob on the one-liner `studio run` command that already silently used 4.

for more information, see https://pre-commit.ci

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c5aadebcf4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T14:03:50Z

    from studio.backend.run import run_server, _resolve_external_ip

-    run_kwargs = dict(host = host, port = port, silent = True, llama_parallel_slots = 4)
+    run_kwargs = dict(host = host, port = port, silent = True, llama_parallel_slots = parallel)


Forward --parallel in venv re-exec arguments

The new parallel option is only applied in-process (run_kwargs), but in the common non-Studio-venv path this command execs into unsloth studio run and rebuilds args without adding --parallel <value>, so user-specified values are dropped and the child falls back to the default of 4. Fresh evidence in this commit: the new option is defined/used here, while the re-exec argument list (later in run()) still omits any parallel flag.

Useful? React with 👍 / 👎.

`unsloth studio run` re-execs into the Studio venv when invoked from outside it (the common path). The arg-builder forwards every typer option but the new --parallel, so the child re-execs at the default 4 and any user value is silently dropped. Worse: pre-PR users who already pass `-np N` as a pass-through extra (where llama.cpp's last-wins parsing made it stick) silently lose N after this PR lands. Forward --parallel explicitly in the re-exec arg list. While auditing the re-exec path, also drop the colliding 1-char short aliases -m (--model) and -f (--frontend) plus the redundant -hfr. Click's short-option clustering had been silently mis-parsing ~11 llama-server short flags via the pass-through path: -fa as `-f a`, -mg 0 as `-m g` + stray 0, -fitt 1024 as `-f itt` + stray 1024, -hff path as `-f f` + stray `-h path`, -cmoe / -cram / -sm / -ncmoe etc. The docstring promise ("any flag this command does not recognize is forwarded verbatim") was silently violated. -hf (2-char) is kept because Click treats multi-char shorts atomically (no clustering of -hff / -hfv / -hffv / -hft) and -hf is documented in basics/api/README.md. --model / --hf-repo / --frontend long forms all unchanged. studio_default keeps -f because it has no pass-through. Tests: - test_studio_run_parallel_flag.py: 8 new re-exec coverage cases (all 3 aliases, 3 platforms via sys.platform mock, pre-PR `-np` regression, mixed with pass-through extras). - test_studio_run_short_alias_clashes.py (new): surface checks that the removed shorts cannot reappear, plus 11 parametrized cases proving each previously-broken llama-server short flag now passes through verbatim, plus a happy-path test that documented -hf still works for `org/repo:variant` syntax. All 27 tests pass. Negative test (revert either fix) shows the new tests catch the regression.

for more information, see https://pre-commit.ci

The pre-PR docstring listed --port, -c / --ctx-size, --api-key, -ngl, --jinja, --flash-attn, --no-context-shift as "rejected with HTTP 400", but only --port and --api-key (plus other networking / auth / model identity / single-model UI flags) are actually in studio/backend/core/inference/llama_server_args.py's denylist. -c / -ngl / --jinja / --flash-attn / --no-context-shift are pass-through and last-wins-override Studio's auto-set value. Rewrite the docstring to match the real denylist groups and point at the canonical source. Also add --parallel to one of the examples now that it is a first-class flag.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2409535bbd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-24T13:57:35Z

+            "ignore_unknown_options": True,
+        },
+    )(studio_mod.run)
+    result = CliRunner(mix_stderr = False).invoke(app, args, catch_exceptions = True)


Remove unsupported mix_stderr from CliRunner

The new tests instantiate CliRunner(mix_stderr=False), but that constructor kwarg is not available in older Click releases (including environments where typer pulls an older click), which raises TypeError before any assertions run. In practice this makes the added regression suites fail immediately instead of validating behavior (I reproduced this failure locally, and the same pattern is also present in unsloth_cli/tests/test_studio_run_short_alias_clashes.py).

Useful? React with 👍 / 👎.

…5741 comments (#5746) * ci: broaden Linux llama.cpp runtime pattern to lib*.so* #5741 patched the explicit Linux pattern list to add ``libllama-*-impl.so*`` after ggml-org/llama.cpp#23462 (between b9279 and b9283) split each binary's entry code into a paired ``lib<binary>-impl.so`` shared library. Same class of upstream repackaging will hit us again whenever a new shared lib is added. Mirror what macOS already does and replace the per-lib list with a single ``lib*.so*`` glob. ``copy_globs`` (line 3614) unions patterns, so the per-variant ``libggml-cuda.so*`` / ``libggml-hip.so*`` entries were never filtering anything; the spec lives in ``runtime_payload_health_groups`` (line 5209) which keeps the explicit minimum-required list per variant. Dry-run against b9296-bin-ubuntu-x64.tar.gz: 40 files copied (all ggml, llama, mtmd, impl variants + the two binaries we ship), 22 skipped (other CLIs, rpc-server, LICENSE). Functionally equal to the post-#5741 set. * cleanup: trim #5741 comments on the pydantic split Comments added in #5741 explained the original bug in full each time. They are mostly redundant with the commit message and the PR. Trim them to one short paragraph per site. No behavior change. * ci: narrow Windows runtime pattern to llama-server.exe + llama-quantize.exe Studio only invokes llama-server and llama-quantize. Mac and Linux already filter to those two binaries; Windows was the odd one out with ``*.exe`` copying every CLI upstream ships (llama-cli, llama-bench, llama-mtmd-cli, ...). Dry-run on b9296 (win cpu-x64, cpu-arm64, cuda-13.1, hip-radeon): 20 unused EXEs skipped per variant, all DLLs (incl. the new llama-*-impl.dll family) still copied via ``*.dll``. ``existing_install_matches_choice`` already checks llama-server.exe exists explicitly (line 5297), so the health gate is unchanged.

In full FT, AdamW weight decay shrinks the parameter directly so the implicit prior is W -> 0. In LoRA the trained parameters are A and B while the effective weight is W = W_init + (alpha/r) * B @ A; decaying A and B separately drives BA -> 0, hence W -> W_init rather than 0. The previous default of 0.01 inherited from full-FT recipes adds a measurable pull on the merged adapter back toward the base model over a few thousand steps. 0.001 keeps a small Frobenius-norm prior on ||A||^2 + ||B||^2 for numerical stability without meaningfully biasing the merged weight toward init, and aligns with the value used across the unsloth notebook templates.

* Studio: strip orphan tool_call XML from streamed visible content The speculative-buffer state machine in `studio/backend/core/inference/llama_cpp.py` can slice a tool_call XML block between the silent DRAINING path and the user-visible content_accum, depending on when in the model's emission the BUFFERING -> STREAMING -> DRAINING transitions fire. Three leak shapes were observed in a 2026-05-22 sweep of 900 Qwen3.5 / Qwen3.6 GGUF runs: Pre-fix XML leak rate: 20/900 (2.22%), concentrated 6.7% on the larger Q8 / MTP configs: Qwen3.6-35B-A3B Q8_0 4/60 (6.7%) Qwen3.6-35B-A3B-MTP Q4 4/60 (6.7%) Qwen3.5-35B-A3B Q8_0 3/60 (5.0%) Qwen3.6-27B Q8_0 3/60 (5.0%) The existing `_TOOL_XML_RE` only matched well-formed `<tool_call>...</tool_call>` and `<function=...></function>` pairs, so unterminated openings (close was DRAINED) and orphan closes (opening was DRAINED) survived the strip and reached the user. Fix relaxes the regex to also strip: 1. Orphan opening up to end-of-string: `(?:</tool_call>|\Z)` 2. Orphan closing tag: bare `</tool_call>` / `</function>` Verified on the full sweep: 20/900 -> 0/900 (100% of detected leaks eliminated). 16 unit tests in `test_tool_xml_strip.py` pin all three leak shapes plus the well-formed cases, plus parametrised checks on the 5 actual real-world leak samples from the sweep data. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Studio: strip tail-only </parameter> orphan + tighten regex The 2026-05-22 gdpval sweep surfaced a 4th XML-leak shape not caught by the earlier regex: a bare `</parameter>\n\n` at end-of-buffer (7 of 192 trials, all Qwen3.5-27B + a few Qwen3.6-27B). The model emits the full `<tool_call><function=...><parameter=...>...content... </parameter></function></tool_call>` envelope, the speculative buffer DRAINS the opening tags as intended, but EOS (max_tokens cutoff) truncates the outer `</function></tool_call>` close, leaving just `</parameter>` as the visible tail. We strip this ONLY when end-anchored (`\s*\Z`) so legitimate mid-text uses (user code samples, documentation discussing the Qwen tool-call XML shape) survive. Verified on the 192-trial gdpval corpus: before=7, after=0. While at it, fold the five top-level alternations into three by sharing tag-name and prefix subgroups: <tool_call>... + <function=\w+>... + --> <(?:tool_call|function=\w+)>... </tool_call> | </function> --> </(?:tool_call|function)> Semantically identical (verified by replay over the 192-trial corpus + adversarial inputs, 0 diffs) and 1.34x faster on real workloads. Backtracking-safety pinned by two new perf guards (256KB '<' spam, 1000x orphan opens). Tests: 16 -> 28 (6 new functional + 4 well-formed-vs-orphan + 2 perf guards). * Tighten comments in XML-strip regex and tests Code says what it does; comments were repeating it. Strip the verbose explanations down to the WHY-only bits (engine quirk, tail-anchor rationale, real-world source of each test sample). No code changes. inference.py: 21 -> 12 lines around _TOOL_XML_RE test_tool_xml_strip.py: 343 -> 259 lines (-84) Tests: 28/28 still pass. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

…liases, fix test harness Round 1 review fixes for #5737: 1. Deny --parallel / --n-parallel / -np in the pass-through validator. Without this, `unsloth studio run --model X --parallel 8 -- --parallel 999` would last-win-override the running llama-server slot count while Studio's app.state.llama_parallel_slots and KV-cache fitting stay at the typer value (8), so the resource plan and the running process disagree. Also bypasses the typer 1..64 range guard. Reject so the only path is the first-class typer flag. 2. Backwards-compat shim for -m / -hfr / -f. Dropping the short aliases from typer broke any script using `unsloth studio run -m X` or `-hfr Y` or `-f dist`. Add _consume_legacy_short_aliases which pops EXACT whole-token matches (or `-x=value` inline form) from ctx.args into the corresponding typer parameter. Clustered tokens (`-fa`, `-mg`, `-fitt`, ...) are left in the pass-through tail unchanged. --model becomes Optional with an explicit missing-required check after the preprocessor so legacy `-m X` still satisfies the "must specify a model" requirement. 3. Drop mix_stderr from CliRunner. Typer 0.25.1 / Click 8.4.1 removed the kwarg; the test harness raised TypeError before exercising the PR behaviour. Tests run cleanly on current and older Typer/Click. 4. Correct the -np regression test docstring. Pre-PR `-np 8` was clustered by Click as `-p 8` (port=8) + stray `-n`, silently breaking the port binding -- not "passed through as 8 slots". The post-PR assertion (child gets --parallel 8) is unchanged. 5. Update studio run docstring listing rejected flags so it now correctly includes --parallel / -np / --n-parallel. New tests: - test_llama_server_args.py: parametrized denylist coverage for --parallel / --n-parallel / -np including equals-form, including out-of-range bypass attempts (999, 0). is_managed_flag flips True. - test_studio_run_short_alias_clashes.py: legacy -m / -hfr / -f promote to typer params; --model X + -m Y conflict errors; clustered -mg / -fa / -fitt still pass through (the original bug fix holds). 132 tests pass (98 backend + 34 cli).

chatgpt-codex-connector · 2026-05-24T14:11:01Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

for more information, see https://pre-commit.ci

chatgpt-codex-connector · 2026-05-24T14:11:19Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

…nd missing model Three additional edge cases for the -m / -hfr / -f preprocessor: - `-m unsloth/foo:UD-Q4_K_XL` round-trips through both the preprocessor and _split_repo_variant so the child sees --model + --gguf-variant. - `-m=foo` inline value form is promoted just like `-m foo`. - Missing --model after the preprocessor raises typer.Exit(2) cleanly (replacing typer's pre-PR required-flag enforcement now that --model is Optional to allow the legacy promotion path).

chatgpt-codex-connector · 2026-05-24T14:12:24Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

for more information, see https://pre-commit.ci

chatgpt-codex-connector · 2026-05-24T14:12:47Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

chatgpt-codex-connector · 2026-05-24T14:19:36Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

chatgpt-codex-connector · 2026-05-24T14:20:07Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 453fcb072b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-25T11:15:36Z

+    parallel: int = typer.Option(
+        1,
+        "--parallel",
+        "--n-parallel",


Honor --parallel passed before studio run

Defining --parallel/--n-parallel on the studio callback causes unsloth studio --parallel 8 run --model X to accept the flag but silently ignore it for run(): the callback returns early for subcommands, and run() falls back to its own default (4). I verified this path by invoking the CLI and capturing the re-exec argv, which forwarded --parallel 4 instead of 8. This is a user-visible misconfiguration because the same option name appears valid in both positions but only works when placed after run.

Useful? React with 👍 / 👎.

…-in-depth Three Opus subagent reviewers (adversarial-user / maintenance / cross-file consistency) flagged a consistent set of cleanups; folded into one commit to avoid the pre-commit.ci force-push race. unsloth_cli/commands/studio.py: - Extract _PARALLEL_MIN / _PARALLEL_MAX / _PARALLEL_DEFAULT_RUN / _PARALLEL_DEFAULT_PLAIN module-level constants and use them in both typer Options (plain studio_default = 1, studio run = 4). - _expand_attached_np_short now rewrites -np<junk> when the suffix starts with a digit (or signed digit) so '-np8x' surfaces as a clean '-np takes an int' typer error instead of a baffling '--port invalid' complaint after Click clusters '-n -p 8x'. - Re-exec forwarding emits --load-in-4bit / --no-load-in-4bit explicitly in both directions; previously the True default relied on both layers sharing the same default forever. - run() docstring now explicitly says --parallel / -np pass-through via llama_extra_args is denied (use the typer flag above). studio/backend/run.py: - Mirror the parallel constants and route the argparse default, range check, and error message through them. Help text mentions the asymmetry with 'unsloth studio run' so direct-launch dev users aren't confused by Default 1 in isolation. studio/backend/core/inference/llama_server_args.py: - _flag_name strips surrounding whitespace before denylist lookup so a caller can't slip a managed flag past the boundary with a trailing space (the trimmed form is what downstream parsers see). Tests: - New typer-aliases-subset-of-denylist invariant: every alias the typer Option claims as --parallel on run() MUST be in the backend parallel denylist group. Catches the failure mode where someone adds a new alias and forgets the boundary. - Extended denylist parametrize to cover ~14 previously untested aliases (-mu, -dr, -hfv/-hfrv/-hffv family, -mmu, full --ui group, --models-preset / --models-autoload / --no-models-autoload). - Whitespace-padded denylist rejection (' --parallel', '-np ', etc). - --load-in-4bit re-exec test pinning both polarities + default. - -np<junk> argv rewriter regression tests. - Cross-reference headers between the two test files.

for more information, see https://pre-commit.ci

…ent --parallel Round 14 (reviewer.py --parallel 20 with gpt-5.3-codex-spark) flagged two real P1s and a stale-rebase warning. All three addressed. - studio/backend/core/inference/llama_server_args.py: widen _flag_name so -np<digit-prefix> with trailing junk (-np8x, -np-1foo, -np+1bar, -np9zzz) classifies as managed flag -np, matching the CLI _expand_attached_np_short rewriter. Without this, POST /api/inference/load with llama_extra_args=['-np8x'] slipped past the boundary while the CLI canonicalised the same form. The two sides now agree on every digit-prefix form. - unsloth_cli/commands/studio.py: reject --parallel on the studio group when a subcommand is invoked. Pre-PR the studio callback had no --parallel; my Round 12 addition made 'unsloth studio --parallel 8 run ...' silently drop the 8 because typer doesn't propagate parent options into subcommand kwargs. Now errors with exit 2 and a message pointing the operator at the correct invocation ('unsloth studio run --parallel 8 ...'). - Picked up origin/main via merge (parent commit 0caf0526): the pre-flight stale-rebase detector found 2 lines on main in studio/backend/core/export/export.py missing from PR HEAD. Merged cleanly with no conflicts. Tests: - Parametrised denylist coverage for -np<digit-prefix>+junk forms. - New runtime test confirms exit 2 + helpful error when the group --parallel is supplied alongside an invoked subcommand. - Test that the default group --parallel value still lets a subcommand resolve (no false-positive rejection).

for more information, see https://pre-commit.ci

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: acefedae30

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-25T12:29:26Z

+        # options into subcommand kwargs. Reject explicitly so the
+        # operator gets told where to put the flag instead of getting
+        # the run subcommand's default.
+        if parallel != _PARALLEL_DEFAULT_PLAIN:


Reject parent --parallel even when explicitly set to 1

The callback only errors when parallel != _PARALLEL_DEFAULT_PLAIN, so unsloth studio --parallel 1 run --model ... is treated as valid even though parent options are not propagated to subcommands. In that case the explicit --parallel 1 is silently ignored and run() falls back to its own default (4), producing a user-visible misconfiguration that this guard is intended to prevent.

Useful? React with 👍 / 👎.

Comment-only pass over the seven PR-touched files; trim verbose docstrings, collapse multi-line section dividers, and drop redundant prose that the code already conveys. No behaviour change.

Shorten the test_studio_run_parallel_flag.py module docstring and the `Re-exec arg-builder coverage` block. No behaviour change.

for more information, see https://pre-commit.ci

Trim docstrings and inline comments in studio.py, run.py, llama_server_args.py, and unsloth_cli/__init__.py. No behaviour change; all 215 tests still pass.

for more information, see https://pre-commit.ci

chatgpt-codex-connector · 2026-05-26T00:37:12Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Resolve studio.py conflict by keeping both: - PR #5737's `_PARALLEL_*` constants block - main's `_iter_editable_studio_source_roots` + `_find_frontend_dist` helpers (PR #5782). run.py auto-merged cleanly. 215 PR tests still pass.

chatgpt-codex-connector · 2026-05-27T03:08:57Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

`--embedding` and `--rerank` flip llama-server into single-endpoint mode, which breaks Studio's /v1/chat/completions hop. llama-server's own `--tools` flag silently stacks on top of Studio's tool policy resolved by `--enable-tools` / `--disable-tools`. Add all three (plus the `--embeddings` / `--reranking` plural aliases) to the boundary denylist so HTTP /load and pass-through extras both reject them cleanly instead of silently desyncing the server surface. Test added to the existing `test_denylist_rejects_all_aliases` parametrize. 220 tests pass.

chatgpt-codex-connector · 2026-05-27T03:18:55Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Two cross-OS CI findings: 1. `test_typer_parallel_aliases_are_subset_of_backend_denylist` was doing `from core.inference.llama_server_args import _DENYLIST_GROUPS` which triggers `core/inference/__init__.py` and pulls in the full backend chain (fastapi / structlog / loggers / utils.hardware). The invariant only needs the constants tuple, so load the module directly via `importlib.util.spec_from_file_location` -- the test now runs with just typer + pytest installed. 2. `test_legacy_frontend_alias_still_promotes_to_frontend` asserted the literal string `"/tmp/dist"` after the value round-trips through `Path()`. On Windows `str(Path("/tmp/dist"))` is `"\tmp\dist"`, so the assertion tripped on the same logical path. Compare via `Path(x) == Path("/tmp/dist")` so the test passes on every OS. Both surfaced by the staging-4 cross-OS CI; no production-code change. 220 tests still pass locally.

chatgpt-codex-connector · 2026-05-27T04:49:48Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

for more information, see https://pre-commit.ci

chatgpt-codex-connector · 2026-05-27T04:50:01Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Same fix as the previous CLI-test commit: import the module via `importlib.util.spec_from_file_location` instead of `from core.inference.llama_server_args import ...`, so the test no longer needs the full backend chain (fastapi / structlog / loggers / utils.hardware) installed via `core/inference/__init__.py`. The boundary validator is intentionally dependency-free; its unit tests should reflect that.

chatgpt-codex-connector · 2026-05-27T04:52:39Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Without quotes, 'name: PR unslothai#5737 ubuntu-latest' becomes 'PR' (the rest is YAML comment), all three workflows collide on the same name, so concurrency group key '${{ github.workflow }}-${{ github.ref }}' matches across OSes and cancel-in-progress kills sibling workflows. Naming them PR-5737-<os> keeps each in its own concurrency lane.

PR #5784 ("Improve image generation UI") rewrote the message-input textarea's static `aria-label="Message input"` into a JSX conditional `aria-label={overlay ? "Image edit instructions" : "Message input"}` but did not update the RTL bidi-attribute regression test, leaving the literal-string `find('aria-label="Message input"')` anchor with no match. The `Repo tests (CPU)` job has been red on main since. Anchor on the inner `"Message input"` string literal instead -- it survives both spellings and still pins the same textarea element so the `dir="auto"` assertion has the right block to inspect. Verified by re-running the exact CI command: 954 passed, 3 skipped, 23 deselected (was 948 passed, 1 failed).

chatgpt-codex-connector · 2026-05-27T05:30:52Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

chatgpt-codex-connector Bot reviewed May 23, 2026

View reviewed changes

danielhanchen and others added 2 commits May 23, 2026 14:00

[pre-commit.ci] auto fixes from pre-commit.com hooks

c5aadeb

for more information, see https://pre-commit.ci

danielhanchen force-pushed the expose-parallel-flag branch from a705850 to c5aadeb Compare May 23, 2026 14:00

chatgpt-codex-connector Bot reviewed May 23, 2026

View reviewed changes

danielhanchen and others added 3 commits May 24, 2026 13:44

[pre-commit.ci] auto fixes from pre-commit.com hooks

64d5769

for more information, see https://pre-commit.ci

chatgpt-codex-connector Bot reviewed May 24, 2026

View reviewed changes

danielhanchen and others added 4 commits May 24, 2026 14:10

danielhanchen requested review from Datta0, pluesclues and rolandtannous as code owners May 24, 2026 14:10

[pre-commit.ci] auto fixes from pre-commit.com hooks

476818f

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

c0ab80b

for more information, see https://pre-commit.ci

Scrub .github/workflows for staging push (matches staging base)

e8bbae1

This was referenced May 24, 2026

Studio: expose --parallel / -np flag on unsloth studio run shimmyshimmer/unsloth-staging-4#53

Closed

[tests] Studio: expose --parallel / -np flag on unsloth studio run shimmyshimmer/unsloth-staging-4#54

Closed

danielhanchen added the auto-reviewing Auto-review in progress label May 24, 2026

chatgpt-codex-connector Bot reviewed May 25, 2026

View reviewed changes

danielhanchen and others added 5 commits May 25, 2026 11:39

[pre-commit.ci] auto fixes from pre-commit.com hooks

f2657da

for more information, see https://pre-commit.ci

fix: repair mlx studio base export save_method (#5727)

cccd9c6

[pre-commit.ci] auto fixes from pre-commit.com hooks

acefeda

for more information, see https://pre-commit.ci

chatgpt-codex-connector Bot reviewed May 25, 2026

View reviewed changes

danielhanchen and others added 5 commits May 25, 2026 13:35

Studio: tighten code comments across --parallel PR

1d3e250

Comment-only pass over the seven PR-touched files; trim verbose docstrings, collapse multi-line section dividers, and drop redundant prose that the code already conveys. No behaviour change.

Studio: trim remaining verbose docstrings missed in last pass

6c2068f

Shorten the test_studio_run_parallel_flag.py module docstring and the `Re-exec arg-builder coverage` block. No behaviour change.

[pre-commit.ci] auto fixes from pre-commit.com hooks

a7800b3

for more information, see https://pre-commit.ci

Studio: second comment-tightening pass across PR-touched code

1a53342

Trim docstrings and inline comments in studio.py, run.py, llama_server_args.py, and unsloth_cli/__init__.py. No behaviour change; all 215 tests still pass.

[pre-commit.ci] auto fixes from pre-commit.com hooks

7ce0778

for more information, see https://pre-commit.ci

Merge origin/main into expose-parallel-flag

4141754

Resolve studio.py conflict by keeping both: - PR #5737's `_PARALLEL_*` constants block - main's `_iter_editable_studio_source_roots` + `_find_frontend_dist` helpers (PR #5782). run.py auto-merged cleanly. 215 PR tests still pass.

danielhanchen mentioned this pull request May 27, 2026

PR #5737 cross-OS validation (ubuntu / macos-14 / windows) shimmyshimmer/unsloth-staging-4#57

Closed

[pre-commit.ci] auto fixes from pre-commit.com hooks

d7153ca

for more information, see https://pre-commit.ci

danielhanchen merged commit 649b9f7 into main May 27, 2026
36 checks passed

danielhanchen deleted the expose-parallel-flag branch May 27, 2026 06:13

Imagineer99 mentioned this pull request Jun 11, 2026

[Feature] Add max_concurrency setting for Unsloth API inference #5481

Closed

Uh oh!

Conversation

danielhanchen commented May 23, 2026

Summary

Why now

Concurrency safety

Caveats users should know

Tests

Test plan

Uh oh!

gemini-code-assist Bot commented May 23, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 24, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot commented May 26, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 27, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 27, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 27, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 27, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 27, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development