security: default Studio host to 127.0.0.1 and prompt before auto-start#1
Merged
Merged
Conversation
Closes unslothai#4684. Previously Unsloth Studio bound to 0.0.0.0 (all interfaces) by default and the installer silently auto-started a server at the end of install, contradicting the documented "privacy first / 100% offline and locally" guarantee and exposing the service on the network without user consent. Changes: - studio/backend/run.py: change run_server() default host to 127.0.0.1; extract argparse setup into _make_argument_parser() for testability - unsloth_cli/commands/studio.py: change typer.Option default to 127.0.0.1 - install.sh: remove -H 0.0.0.0 from generated launch-studio.sh launcher template; replace silent auto-start with a [Y/n] prompt - install.ps1: replace silent auto-start with a Read-Host [Y/n] prompt - README.md: simplify launch examples to `unsloth studio` (no -H flag); note that -H 0.0.0.0 is available for cloud/network use Users who need all-interface binding (cloud VMs, LAN sharing) can still pass -H 0.0.0.0 explicitly. No other logic was changed: _is_port_free(), startup_banner, and health-check paths all already handle 127.0.0.1 correctly. Tests (TDD — written before implementation): - studio/backend/tests/test_host_defaults.py: AST inspection of run_server() parameter default and argparse --host default - tests/studio/test_cli_studio_defaults.py: AST inspection of typer.Option default for studio_default() --host parameter - tests/sh/test_install_host_defaults.sh: static analysis of installer scripts and README launch section https://claude.ai/code/session_012umxRmBdeDV5U7Xhm1utu6
Bedrovelsen
pushed a commit
that referenced
this pull request
Apr 21, 2026
…rt (unslothai#5049) * Chat-template repair: warn-by-default, AST classification, dict support Follow-up hardening on top of PR unslothai#4426 (which fixed the unslothai#4150 RuntimeError for ChatML LoRA reloads). Behavior changes: - Warn-by-default instead of RuntimeError. When fix_chat_template cannot repair a broken template, emit a warning and return the original. Set UNSLOTH_STRICT_CHAT_TEMPLATE=1 to restore the pre-warn hard fail. Fixes the UX where a missing `{% if add_generation_prompt %}` block on a saved LoRA (typical after LlamaFactory / Axolotl re-serialize) would block model loading entirely. - Local path vs HF hub distinguished in the warning message. For local paths the message points at the likely downstream tool; for HF IDs it points at the upstream model maintainers. Previously both said "file a bug report to the maintainers of <path>" even when <path> was the user's own saves/ directory. - Dict / list chat_template now handled. Hermes-3 ships with {default, tool_use} and the previous code crashed with AttributeError: 'dict' object has no attribute 'find' when entering _fix_chat_template with a dict. Each variant is now fixed independently; structure is preserved. Internals: - _find_end_position now matches all four Jinja whitespace-control variants ({% %}, {%- %}, {% -%}, {%- -%}) and returns the rightmost endfor/endif so multi-for templates aren't locked onto the first loop. Previously {%- endfor -%} (both-side dash, used by Qwen3-Guard) was silently bypassed. - _has_add_generation_prompt_block uses Jinja AST via jinja2.nodes.If/Name walks instead of substring matching, so templates that hide the block behind comments or dash-style variants are classified correctly. - _template_ends_with_toplevel_for gates the GH#4150 ChatML repair on the AST: only fires when the last structural top-level node is a For (standard ChatML shape), ignoring trailing pure-whitespace output nodes. Templates wrapped in an outer If (Qwen3-Guard) are now explicitly skipped at the _fix_chat_template level as well, not just at load_correct_tokenizer's name-based exemption. - _validate_patched_template renders the patched template with and without add_generation_prompt and confirms the patched output responds to the flag by appending (not replacing) content. If validation fails, the patch is discarded and we fall through to the warn path. Verified with an expanded regression suite in tests/: - test_fix_chat_template_pr4426.py: 42/42 template-matrix cells - test_load_correct_tokenizer_pr4426.py: 5/5 tokenizer loads - test_chat_template_followups.py: 10/10 new follow-up tests - test_mistral_pr4426.py: 5 Mistral variants byte-identical - test_qwen_pr4426.py: 14 Qwen variants byte-identical (Qwen1.5, Qwen2, Qwen2.5-Instruct/Coder/Math/VL, Qwen3, Qwen3-Coder, QwQ, Qwen3-Guard-Gen) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Guard _validate_patched_template against read-only chat_template If tokenizer.chat_template is a property or otherwise read-only, the validation helper would crash with AttributeError when trying to temporarily set the patched template. Catch the assignment failure and return False (skip validation), and best-effort restore in the finally block. * Replace regex separator inference with render-diff; broaden repair to non-ChatML templates The previous `_infer_assistant_separator` was a four-tier regex heuristic that only worked on ChatML-shaped templates and forced a hard `<|im_start|>` / `<|im_end|>` presence gate on Case 2 repair. This meant a Llama-3, Gemma, or Phi-3 template stripped of its generation-prompt block by a downstream tool (LlamaFactory, Axolotl, etc.) would still warn-and-return even though the structural shape is identical to the ChatML case the PR already handles. This replaces the regex with `_derive_assistant_prefix_by_render`: render the template with two dialogs that differ only in assistant content, then `os.path.commonprefix` on the tails captures the exact assistant-turn prefix the template emits. The template itself is ground truth, so non-ChatML shapes work as long as the assistant block is a literal the template emits once per message. Three guards keep the derivation safe: A. both assistant renders extend the base render (no reordering); B. the divergence point is exactly the content-insertion site (sentinel follows the common prefix); C. a user-role cross-check: if a render with a user sentinel also emits the same prefix, role has no effect on output and we reject. A render failure on [user, user] (e.g. Gemma's `raise_exception` alternation check) is evidence that role matters; we accept. Sentinels differ at character 0 so `commonprefix` cannot absorb them, and trailing whitespace/comments after the last `{% endfor %}` are stripped before probing (they would appear in base but not after the appended assistant turn and break Guard A). `_fix_chat_template` and `_repair_string_template` now thread an `is_sharegpt` kwarg; `_fix_chat_template` retries once with `is_sharegpt=True` if the first probe returns None (dual-probe fallback for dict/list callers). The ChatML `<|im_start|>` / `<|im_end|>` hard gate in Case 2 is dropped. `_infer_assistant_separator` is deleted. Verified via: - tests/test_fix_chat_template_pr4426.py: 51/51 cells (new Llama-3, Gemma, Phi-3 broken-template rows all repair FIX-OK) - tests/test_load_correct_tokenizer_pr4426.py: 5/5 - tests/test_chat_template_followups.py: 18/18 (T11-T18 cover non-ChatML repair + probe failure modes) - tests/test_mistral_pr4426.py: 5/5 byte-identical - tests/test_qwen_pr4426.py: 14/14 byte-identical (Qwen3-Guard AST gate still rejects) - tests/hermes3_lora_pr4426.py reload: patched template ends with `<|im_start|>assistant\n`, inference returns sensible output. - temp/sim/battery.py: 79/79 followup; vs baseline: 0 regressions, 9 improvements. - Spot-check probe on real stripped tokenizers (Hermes-3, Phi-4, Llama-3.2-1B, Gemma-3-1B): all derive the expected prefix. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address reviewer findings: variant routing, positive-gate detection, comment-safe end scan Resolves three reviewer findings on PR unslothai#5049 (`fix/chat-template-followups`): Finding #1 [10/10]: dict/list variants now route through `_fix_chat_template_for_tokenizer` via a new `_VariantTokenizerProxy` adapter. Previously the dict/list branches called `_fix_chat_template` directly, silently bypassing the warn/strict (`UNSLOTH_STRICT_CHAT_TEMPLATE`) contract, the `no == yes` diagnostic, broken-existing-block detection, and `_validate_patched_template` guard. The proxy swaps `base.chat_template` to the variant string before each `apply_chat_template` call so tokenizer globals (`bos_token`, custom filters, `raise_exception`) remain available; if the base is read-only it falls back to isolated Jinja rendering. Finding unslothai#2 [1/10]: `_has_add_generation_prompt_block` now requires the `If` body to contain at least one `Output` node (a new `_if_body_emits_content` helper walks descendants). This distinguishes a real generation-prompt block from a header guard like `{% if not add_generation_prompt is defined %}{% set ... %}{% endif %}` (body contains only `Assign`) which references the name but emits nothing. Also dropped a now-redundant `"add_generation_prompt" not in scrubbed` guard in `_fix_chat_template` Case 2 so header-guarded templates still get repaired. Finding unslothai#4 [1/10]: `_find_end_position` now replaces Jinja comments with equal-length whitespace before scanning for `{% endfor %}` / `{% endif %}` tokens. This prevents a trailing comment containing those tokens from being picked as the real end tag. Positions in the padded string map 1:1 to positions in the original template. Tests: - tests/test_chat_template_followups.py: 21/21 (T19 strict-mode dict variant, T20 header-guard repair, T21 comment-endfor trap added; T4/T5 stubs updated with a working apply_chat_template that routes through Jinja). - tests/test_fix_chat_template_pr4426.py: 51/51 cells unchanged. - tests/test_load_correct_tokenizer_pr4426.py: 5/5. - tests/test_mistral_pr4426.py: 5/5 byte-identical. - tests/test_qwen_pr4426.py: 14/14 byte-identical. - temp/sim/battery.py: 79/79 followup; 0 regressions vs baseline. - Phase 3 Hermes-3 broken-LoRA reload: inference still returns `'The answer to the equation 2+2 is 4.'`. - Spot-checks on Hermes-3 / Phi-4 / Llama-3.2-1B / Gemma-3-1B real stripped templates: probe still derives the expected prefix. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Tighten comments in chat-template helpers Pure comment minimization across `_find_end_position`, `_has_add_generation_prompt_block`, `_if_body_emits_content`, `_derive_assistant_prefix_by_render`, `_fix_chat_template` Case 2, and `_VariantTokenizerProxy`. No behavior change; same intent, fewer lines. All 21 follow-up tests and the 51-cell Phase 1 matrix still pass. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Sandbox probe, fix is_sharegpt validator mismatch, reject negated gates Three real bugs from the 10-agent Opus review: 1. Probe now uses `jinja2.sandbox.SandboxedEnvironment` instead of bare `jinja2.Environment`. The probe renders at model-load time (before the user calls `apply_chat_template`), so it was a new eager code-execution surface that the base HF tokenizer loading does not have. SandboxedEnvironment blocks attribute-chain exploits at negligible cost. 2. `_repair_string_template` now tries validation with both `is_sharegpt=False` and `is_sharegpt=True`. Previously, when `_fix_chat_template` internally fell back to the other schema via its dual-probe, the outer validation still used the caller's original `is_sharegpt` -- rendering with the wrong message keys and spuriously dropping a valid repair. 3. `_has_add_generation_prompt_block` now skips `If` nodes whose test is a `Not` expression. A negated gate like `{% if not add_generation_prompt %}{{ x }}{% endif %}` fires when agp=False, so its emitting body is not a generation block -- but the old code counted any Name reference regardless of polarity. Cleanup: removed unused `self._label`, added `\r` escape in generation-block literal, switched variant labels to `!r` formatting, removed redundant `import os as _os`. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jinja2.sandbox import and sandbox proxy fallback Two critical findings from the 20-reviewer pass: 1. [20/20] The proxy read-only fallback used bare `jinja2.Environment`, not sandboxed. All 20 reviewers independently reproduced marker-file creation via `cycler.__init__.__globals__['os'].system(...)` during `fix_chat_template()`. Fixed: fallback now uses `from jinja2.sandbox import SandboxedEnvironment`. 2. [14/20] The render-diff probe did `import jinja2` then referenced `jinja2.sandbox.SandboxedEnvironment`. `jinja2.sandbox` is a submodule that is NOT auto-imported by `import jinja2` on Jinja 3.1.6. This caused `AttributeError` (swallowed by `except Exception`), making the entire Case 2 repair path silently return None in a clean process. The 6 reviewers who saw it work had `jinja2.sandbox` pre-imported by an earlier module in their process. Fixed: both the probe and the proxy fallback now use `from jinja2.sandbox import SandboxedEnvironment`. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes unslothai#4684.
Previously Unsloth Studio bound to 0.0.0.0 (all interfaces) by default
and the installer silently auto-started a server at the end of install,
contradicting the documented "privacy first / 100% offline and locally"
guarantee and exposing the service on the network without user consent.
Changes:
extract argparse setup into _make_argument_parser() for testability
template; replace silent auto-start with a [Y/n] prompt
unsloth studio(no -H flag);note that -H 0.0.0.0 is available for cloud/network use
Users who need all-interface binding (cloud VMs, LAN sharing) can still
pass -H 0.0.0.0 explicitly. No other logic was changed: _is_port_free(),
startup_banner, and health-check paths all already handle 127.0.0.1
correctly.
Tests (TDD — written before implementation):
run_server() parameter default and argparse --host default
typer.Option default for studio_default() --host parameter
scripts and README launch section