Skip to content

[tests] security: default Studio host to 127.0.0.1 and prompt before auto-start#8

Closed
danielhanchen wants to merge 3 commits into
mainfrom
pr-4864-tests
Closed

[tests] security: default Studio host to 127.0.0.1 and prompt before auto-start#8
danielhanchen wants to merge 3 commits into
mainfrom
pr-4864-tests

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Staging mirror of unslothai#4864

Original PR: unslothai#4864
Author: @Bedrovelsen

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.


Original description

Closes unslothai#4684.

Previously Unsloth Studio bound to 0.0.0.0 (all interfaces) by default and the installer silently auto-started a server at the end of install, contradicting the documented "privacy first / 100% offline and locally" guarantee and exposing the service on the network without user consent.

Changes:

  • studio/backend/run.py: change run_server() default host to 127.0.0.1; extract argparse setup into _make_argument_parser() for testability
  • unsloth_cli/commands/studio.py: change typer.Option default to 127.0.0.1
  • install.sh: remove -H 0.0.0.0 from generated launch-studio.sh launcher template; replace silent auto-start with a [Y/n] prompt
  • install.ps1: replace silent auto-start with a Read-Host [Y/n] prompt
  • README.md: simplify launch examples to unsloth studio (no -H flag); note that -H 0.0.0.0 is available for cloud/network use

Users who need all-interface binding (cloud VMs, LAN sharing) can still pass -H 0.0.0.0 explicitly. No other logic was changed: _is_port_free(), startup_banner, and health-check paths all already handle 127.0.0.1 correctly.

Tests (TDD — written before implementation):

  • studio/backend/tests/test_host_defaults.py: AST inspection of run_server() parameter default and argparse --host default
  • tests/studio/test_cli_studio_defaults.py: AST inspection of typer.Option default for studio_default() --host parameter
  • tests/sh/test_install_host_defaults.sh: static analysis of installer scripts and README launch section

https://claude.ai/code/session_012umxRmBdeDV5U7Xhm1utu6


This PR contains test changes only (3 files). Code changes are in a separate PR.

Test files:

  • studio/backend/tests/test_host_defaults.py
  • tests/sh/test_install_host_defaults.sh
  • tests/studio/test_cli_studio_defaults.py

claude and others added 2 commits April 5, 2026 20:29
Closes unslothai#4684.

Previously Unsloth Studio bound to 0.0.0.0 (all interfaces) by default
and the installer silently auto-started a server at the end of install,
contradicting the documented "privacy first / 100% offline and locally"
guarantee and exposing the service on the network without user consent.

Changes:
- studio/backend/run.py: change run_server() default host to 127.0.0.1;
  extract argparse setup into _make_argument_parser() for testability
- unsloth_cli/commands/studio.py: change typer.Option default to 127.0.0.1
- install.sh: remove -H 0.0.0.0 from generated launch-studio.sh launcher
  template; replace silent auto-start with a [Y/n] prompt
- install.ps1: replace silent auto-start with a Read-Host [Y/n] prompt
- README.md: simplify launch examples to `unsloth studio` (no -H flag);
  note that -H 0.0.0.0 is available for cloud/network use

Users who need all-interface binding (cloud VMs, LAN sharing) can still
pass -H 0.0.0.0 explicitly. No other logic was changed: _is_port_free(),
startup_banner, and health-check paths all already handle 127.0.0.1
correctly.

Tests (TDD — written before implementation):
- studio/backend/tests/test_host_defaults.py: AST inspection of
  run_server() parameter default and argparse --host default
- tests/studio/test_cli_studio_defaults.py: AST inspection of
  typer.Option default for studio_default() --host parameter
- tests/sh/test_install_host_defaults.sh: static analysis of installer
  scripts and README launch section

https://claude.ai/code/session_012umxRmBdeDV5U7Xhm1utu6

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9df70654f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +49 to +50
"launcher template: no hardcoded 'studio -H 0.0.0.0'" \
"$_launcher" "studio -H 0.0.0.0"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Broaden insecure-host pattern to include --host form

This regression test only rejects the literal string studio -H 0.0.0.0. If install.sh, install.ps1, or README.md switches to the equivalent studio --host 0.0.0.0, these checks will pass even though the insecure all-interfaces default is still present. Because this script is meant to enforce the security default, the narrow match creates a false negative path.

Useful? React with 👍 / 👎.

Comment on lines +96 to +100
has_host_option = any(
isinstance(a, ast.Constant) and a.value in ("--host", "-H")
for a in default.args
)
if has_host_option:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate host override behavior instead of option presence

This test is described as verifying that users can still override --host, but the implementation only checks for the existence of a call containing --host/-H. It would still pass if runtime validation later rejects values like 0.0.0.0, so it does not actually guard the behavior the docstring promises and can miss regressions in explicit override support.

Useful? React with 👍 / 👎.

continue
args = func_node.args.args
defaults = func_node.args.defaults
offset = len(args) - len(defaults)
import ast
from pathlib import Path

import pytest
import ast
from pathlib import Path

import pytest
@danielhanchen

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d1d5d1e30

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


# The primary Launch example must show 'unsloth studio' without -H 0.0.0.0
# (0.0.0.0 may appear later as an opt-in note, but not in the code block).
_readme_launch=$(awk '/#### Launch/{found=1} found{print} /^####/{if(!/#### Launch/)found=0}' "$README" | head -10)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Scan entire Launch section before asserting host safety

Limiting the README check to head -10 means the test can miss an insecure primary launch command if the section gains a few explanatory lines before the code block; in that case studio -H 0.0.0.0 could reappear below line 10 and this regression test would still pass. Since this script is intended to enforce a security default, truncating the section creates a false-negative path.

Useful? React with 👍 / 👎.

echo ""
echo "=== install.sh end-of-install block ==="

_end=$(tail -40 "$INSTALL_SH")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid fixed tail windows for end-of-install validation

Using tail -40 (and similarly tail -20 for PowerShell) makes the checks sensitive to file length rather than the actual install block boundaries, so adding unrelated lines at the end can move insecure launch commands out of the scanned window and let the test pass incorrectly. This weakens the regression guard for the installer’s default host behavior.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d1d5d1e30

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +57 to +59
assert_contains \
"install.sh: interactive block prompts user (bash read)" \
"$_end" "read"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Match read as a command, not any substring

This check currently passes on any occurrence of the substring read, so the test can report success even when there is no interactive prompt command (for example, if comments/messages include words like already). That weakens the security regression guard this test is meant to provide. The assertion should match an actual shell read invocation (or the specific prompt text) rather than a raw substring.

Useful? React with 👍 / 👎.

Comment on lines +96 to +100
has_host_option = any(
isinstance(a, ast.Constant) and a.value in ("--host", "-H")
for a in default.args
)
if has_host_option:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Verify host override is propagated, not just declared

This test claims to ensure --host can be explicitly overridden, but it only checks that a typer.Option with --host/-H exists. A regression where studio_default still defines the option but ignores the parsed value (e.g., hard-codes 127.0.0.1 when launching) would still pass here, so the intended behavior is not actually protected.

Useful? React with 👍 / 👎.

danielhanchen added a commit that referenced this pull request May 25, 2026
…guard for PR unslothai#5754

Round 9 reviewer flagged a pile of handoff asymmetries: every
GPU-owning lifecycle change (training, export, chat, images) needed
its own bespoke unload sequence and they had drifted out of sync.
Some skipped llama-server is_active; some missed safetensors
loading_models; export and training did not check is_export_active.

Backend handoff (P1)
  * routes/inference.py: new _release_chat_for / _release_export_for
    helpers. Both treat llama-server as held when is_loaded OR
    is_active, safetensors as held when active_model_name OR
    loading_models is non-empty, and export as held when
    current_checkpoint OR is_export_active. Both helpers run their
    unloads in worker threads so async routes do not block the
    event loop.
  * routes/training.py: replaces its bespoke inline llama / safe /
    export unload sequence with await _release_chat_for / _release_
    export_for.
  * routes/export.py: same swap for the chat unload chain (export
    still does NOT call _release_export_for on itself).
  * routes/inference.py GGUF + standard chat-load paths: now use
    _release_export_for to drop a settled export, and the standard
    path's llama unload now also handles is_active=True (round 9
    review #8).

Backend reject-on-active export (P1 #5)
  * routes/inference.py: new _raise_if_export_active. Symmetric
    with _raise_if_training_active: a long-running export is
    refused with HTTP 409 instead of being silently killed when
    /images/load or /load arrives. Diffusion / images load and
    both chat-load paths call it.
  * core/inference/diffusion.py _release_other_gpu_owners_for_
    diffusion: no longer tears down an in-flight export job. Only
    drops a SETTLED export checkpoint (current_checkpoint
    populated, is_export_active False). Round 9 review #5 -- the
    previous behavior could terminate an in-flight export and
    leave a partial output artifact.

Token leak via logger.exception (P1 #6)
  * core/inference/diffusion.py: load-failure logging now uses
    logger.error(..., exc_msg) with the already-scrubbed string
    and exc_info=False. logger.exception() with the raw Exception
    would expose any hf_... token that diffusers / huggingface_hub
    embedded in the message or traceback locals, defeating the
    earlier in-flight scrub.

Dependency pinning (P1 #11)
  * pyproject.toml: huggingfacenotorch optional extra now pins
    diffusers>=0.37.0. Previously the floor was only set in
    studio/backend/requirements/no-torch-runtime.txt, so a normal
    pip install would resolve diffusers 0.36.0 (no
    Flux2KleinPipeline) and the default curated FLUX.2 klein
    Images model would fail at runtime.

Cache-delete exact match (P1 #14)
  * routes/models.py /delete-cached: llama.cpp and safetensors
    guards now match on exact repo-id (case-insensitive) instead
    of prefix. Diffusion guard already does this; the chat guards
    were the remaining surface where loading org/model-v2
    blocked deleting org/model.
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_release_llama_for()`` now verifies ``llama.unload_model``
did not return False AND that ``is_loaded`` / ``is_active`` /
``loading_model_identifier`` are all cleared after the call. The
previous version only treated raised exceptions as failure, so a
subprocess refusing to terminate or an in-flight GGUF download
let the next workload allocate on top.

P1 #2: ``DiffusionBackend._release_other_gpu_owners_for_diffusion``
now raises RuntimeError when ``exp._shutdown_subprocess`` fails on
a settled checkpoint. Direct backend callers used to log at debug
level and proceed toward diffusion allocation while the export
checkpoint still owned VRAM.

P1 #3 + P1 #7: ``/images/load`` no longer drops chat + idle export
before the cheap backend validation runs. ``DiffusionBackend.load_model``
already calls the strict ``_release_other_gpu_owners_for_diffusion``
and ``_release_chat_backend_for_diffusion`` helpers AFTER family
inference and GGUF filename checks pass, so the GPU is still
freed before allocation and a malformed payload no longer
silently unloads the user's chat / chat-export pair.

P1 #4: ``_release_chat_backend_for_diffusion`` now also rejects a
post-unload state where ``loading_model_identifier`` is still set,
matching the route-level ``_release_llama_for`` strictness. A GGUF
download mid-flight before the diffusion handoff used to slip
through and end up double-owning VRAM after diffusion allocated.

P1 #5: ``_release_diffusion_for`` no longer swallows a post-unload
``status()`` failure as ``after = {}``. Training / chat / export
handoffs need proof that the diffusion pipeline released VRAM;
the helper now raises HTTP 503 when the verification status call
itself raises, so the caller retries.

P1 #6: ``DiffusionBackend._release_other_gpu_owners_for_diffusion``
raises RuntimeError when ``get_export_backend()`` itself raises.
Direct backend callers used to silently ``return`` here and
proceed to GPU allocation without being able to verify export
ownership.

P1 #8: ``/training/start`` releases settled export BEFORE chat,
matching the chat-load helpers. If idle export shutdown fails the
user's chat model is preserved instead of being dropped for a
training run that never starts.

P2 #9: GGUF load-error scrubber also collapses ``local_gguf_path``,
the resolved HF cache path passed to
``transformer_cls.from_single_file()``. Without this an exception
like ``OSError: cannot load /home/alice/.cache/huggingface/.../flux.gguf``
would leak the operator's filesystem layout through ``last_error``
and ``/images/status``.

All 85 diffusion-relevant backend tests pass locally.
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_release_safetensors_chat_for`` now re-reads
``active_model_name`` and ``loading_models`` after each unload AND
runs a final sweep against the initial owned-name set. The previous
helper trusted ``unload_model() -> True`` even though the
orchestrator can respond ``unloaded`` while still holding weights
or a concurrent ``load`` can repopulate the tracker between calls.
Per-name and global post-state mismatches now raise HTTP 503 so
the caller retries.

P1 #2: same post-state guarantee inside
``_release_chat_backend_for_diffusion`` for direct backend
callers. ``DiffusionBackend.load_model`` now raises RuntimeError
when the safetensors tracker still owns a previously-resident
name after the unload, matching the route-level helper. The route
layer's existing classifier maps the new wording to HTTP 503.

P1 #3: ``DiffusionBackend.load_model`` now preflights the full
diffusers repo (or explicit GGUF ``base_repo``) via
``hf_hub_download(filename="model_index.json")`` BEFORE the
chat / export unload runs. The GGUF path was already covered by
the existing ``hf_hub_download(gguf_filename)`` round-trip; the
full-repo path used to skip validation and let a typo / private /
gated repo only surface inside ``from_pretrained`` AFTER the
user's chat model was already dropped. Local paths are checked
structurally (must be a directory containing ``model_index.json``)
so we do not network-round-trip for an on-disk miss. Error
messages route through ``_display_repo_id`` so an absolute
filesystem path does not leak the operator's layout.

P1 #6: ``/api/inference/unload`` (the direct chat unload endpoint)
now treats ``unload_model() -> False`` AND a leftover state
(``is_loaded`` / ``is_active`` / ``loading_model_identifier`` for
GGUF, ``active_model_name`` / ``loading_models`` for safetensors)
as 503 instead of unconditionally responding
``status="unloaded"``. The UI used to show the model as gone while
the backend still owned VRAM.

P2 #7: extended the /images/load RuntimeError -> HTTPException
marker list with ``still active or loading after unload`` and
``still loading after unload``. Round 18 introduced these exact
phrasings on the backend side; without the extension a retryable
unload failure was returning HTTP 400 to the user instead of 503.

P2 #8: removed the unused ``unsloth_backend = get_inference_backend()``
eager construction in the GGUF chat-load branch. Eager
construction made the GGUF-only path needlessly fail or pay
startup cost when the safetensors backend was unavailable / lazy;
``_release_safetensors_chat_for`` already handles that case as a
no-op.

All 85 diffusion-relevant + 98 related backend tests pass locally.
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_gpu_workload_busy_for_helper`` in
``utils/datasets/llm_assist.py`` now also gates on the GGUF chat
backend (llama-server) AND the safetensors chat backend. Round 23
extended it to training + export but missed Chat, so a helper /
advisor GGUF could still race a loaded chat model for VRAM.
Both checks fail closed when status is unverifiable.

P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff
unloads so the diffusion release runs BEFORE the chat releases.
A wedged diffusion unload used to fire AFTER chat was already
gone, so the user lost both on a single failure. Drop chat last
so an earlier failure preserves it. Applied to
``/training/start`` (training.py), ``/export/load`` (export.py),
``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch
(routes/inference.py).

P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens
``model_path`` and ``gguf_variant`` via the shared
``_validate_logged_identifier`` helper, so control characters
and URL-form HF tokens can no longer log-line-smuggle.

P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and
``variant`` the same way.

P1 #9: ``/download-progress`` ``repo_id`` query parameter is
also hardened; the value flows into log lines deep inside
``_get_repo_size_cached`` on lookup failure.

P1 #11: ``CheckFormatRequest.dataset_name`` and
``AiAssistMappingRequest.{dataset_name, model_name}`` in
``models/datasets.py`` now apply the same control-char +
embedded-HF-token validators, matching every other public
request-body model.

All 115 diffusion + training-validation + cached_gguf + export
+ inference model-validation tests pass locally.

(P1 #6 native-path-lease enforcement for diffusion local paths
and P1 #12 React Compiler frontend lint deferred -- both need
focused design / frontend touchups separate from this batch.)
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_release_llama_for()`` now verifies ``llama.unload_model``
did not return False AND that ``is_loaded`` / ``is_active`` /
``loading_model_identifier`` are all cleared after the call. The
previous version only treated raised exceptions as failure, so a
subprocess refusing to terminate or an in-flight GGUF download
let the next workload allocate on top.

P1 #2: ``DiffusionBackend._release_other_gpu_owners_for_diffusion``
now raises RuntimeError when ``exp._shutdown_subprocess`` fails on
a settled checkpoint. Direct backend callers used to log at debug
level and proceed toward diffusion allocation while the export
checkpoint still owned VRAM.

P1 #3 + P1 #7: ``/images/load`` no longer drops chat + idle export
before the cheap backend validation runs. ``DiffusionBackend.load_model``
already calls the strict ``_release_other_gpu_owners_for_diffusion``
and ``_release_chat_backend_for_diffusion`` helpers AFTER family
inference and GGUF filename checks pass, so the GPU is still
freed before allocation and a malformed payload no longer
silently unloads the user's chat / chat-export pair.

P1 #4: ``_release_chat_backend_for_diffusion`` now also rejects a
post-unload state where ``loading_model_identifier`` is still set,
matching the route-level ``_release_llama_for`` strictness. A GGUF
download mid-flight before the diffusion handoff used to slip
through and end up double-owning VRAM after diffusion allocated.

P1 #5: ``_release_diffusion_for`` no longer swallows a post-unload
``status()`` failure as ``after = {}``. Training / chat / export
handoffs need proof that the diffusion pipeline released VRAM;
the helper now raises HTTP 503 when the verification status call
itself raises, so the caller retries.

P1 #6: ``DiffusionBackend._release_other_gpu_owners_for_diffusion``
raises RuntimeError when ``get_export_backend()`` itself raises.
Direct backend callers used to silently ``return`` here and
proceed to GPU allocation without being able to verify export
ownership.

P1 #8: ``/training/start`` releases settled export BEFORE chat,
matching the chat-load helpers. If idle export shutdown fails the
user's chat model is preserved instead of being dropped for a
training run that never starts.

P2 #9: GGUF load-error scrubber also collapses ``local_gguf_path``,
the resolved HF cache path passed to
``transformer_cls.from_single_file()``. Without this an exception
like ``OSError: cannot load /home/alice/.cache/huggingface/.../flux.gguf``
would leak the operator's filesystem layout through ``last_error``
and ``/images/status``.

All 85 diffusion-relevant backend tests pass locally.
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_release_safetensors_chat_for`` now re-reads
``active_model_name`` and ``loading_models`` after each unload AND
runs a final sweep against the initial owned-name set. The previous
helper trusted ``unload_model() -> True`` even though the
orchestrator can respond ``unloaded`` while still holding weights
or a concurrent ``load`` can repopulate the tracker between calls.
Per-name and global post-state mismatches now raise HTTP 503 so
the caller retries.

P1 #2: same post-state guarantee inside
``_release_chat_backend_for_diffusion`` for direct backend
callers. ``DiffusionBackend.load_model`` now raises RuntimeError
when the safetensors tracker still owns a previously-resident
name after the unload, matching the route-level helper. The route
layer's existing classifier maps the new wording to HTTP 503.

P1 #3: ``DiffusionBackend.load_model`` now preflights the full
diffusers repo (or explicit GGUF ``base_repo``) via
``hf_hub_download(filename="model_index.json")`` BEFORE the
chat / export unload runs. The GGUF path was already covered by
the existing ``hf_hub_download(gguf_filename)`` round-trip; the
full-repo path used to skip validation and let a typo / private /
gated repo only surface inside ``from_pretrained`` AFTER the
user's chat model was already dropped. Local paths are checked
structurally (must be a directory containing ``model_index.json``)
so we do not network-round-trip for an on-disk miss. Error
messages route through ``_display_repo_id`` so an absolute
filesystem path does not leak the operator's layout.

P1 #6: ``/api/inference/unload`` (the direct chat unload endpoint)
now treats ``unload_model() -> False`` AND a leftover state
(``is_loaded`` / ``is_active`` / ``loading_model_identifier`` for
GGUF, ``active_model_name`` / ``loading_models`` for safetensors)
as 503 instead of unconditionally responding
``status="unloaded"``. The UI used to show the model as gone while
the backend still owned VRAM.

P2 #7: extended the /images/load RuntimeError -> HTTPException
marker list with ``still active or loading after unload`` and
``still loading after unload``. Round 18 introduced these exact
phrasings on the backend side; without the extension a retryable
unload failure was returning HTTP 400 to the user instead of 503.

P2 #8: removed the unused ``unsloth_backend = get_inference_backend()``
eager construction in the GGUF chat-load branch. Eager
construction made the GGUF-only path needlessly fail or pay
startup cost when the safetensors backend was unavailable / lazy;
``_release_safetensors_chat_for`` already handles that case as a
no-op.

All 85 diffusion-relevant + 98 related backend tests pass locally.
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_gpu_workload_busy_for_helper`` in
``utils/datasets/llm_assist.py`` now also gates on the GGUF chat
backend (llama-server) AND the safetensors chat backend. Round 23
extended it to training + export but missed Chat, so a helper /
advisor GGUF could still race a loaded chat model for VRAM.
Both checks fail closed when status is unverifiable.

P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff
unloads so the diffusion release runs BEFORE the chat releases.
A wedged diffusion unload used to fire AFTER chat was already
gone, so the user lost both on a single failure. Drop chat last
so an earlier failure preserves it. Applied to
``/training/start`` (training.py), ``/export/load`` (export.py),
``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch
(routes/inference.py).

P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens
``model_path`` and ``gguf_variant`` via the shared
``_validate_logged_identifier`` helper, so control characters
and URL-form HF tokens can no longer log-line-smuggle.

P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and
``variant`` the same way.

P1 #9: ``/download-progress`` ``repo_id`` query parameter is
also hardened; the value flows into log lines deep inside
``_get_repo_size_cached`` on lookup failure.

P1 #11: ``CheckFormatRequest.dataset_name`` and
``AiAssistMappingRequest.{dataset_name, model_name}`` in
``models/datasets.py`` now apply the same control-char +
embedded-HF-token validators, matching every other public
request-body model.

All 115 diffusion + training-validation + cached_gguf + export
+ inference model-validation tests pass locally.

(P1 #6 native-path-lease enforcement for diffusion local paths
and P1 #12 React Compiler frontend lint deferred -- both need
focused design / frontend touchups separate from this batch.)
danielhanchen added a commit that referenced this pull request May 25, 2026
Two universal-consensus round-31 reviewer findings.

Concurrency: /images/load was leaking the public-load pending
counter on any pre-finally HTTPException (round 31 P1 #1, 11/12
votes). _raise_if_helper_advisor_busy("diffusion") published the
counter, then _resolve_diffusion_repo_for_request ran outside the
clearing try/finally. A request like repo_id="/tmp/model" with no
native_path_lease returned 400 and left public_load_pending() true
until process restart, permanently blocking AI Assist. Fix mirrors
the training / export pattern: track diffusion_load_window_published
in an outer try, publish the flag right after the helper-busy
check succeeds, and clear in an outer finally that only fires when
the flag is set. This also closes round 31 P1 #6: a second
request's failure can no longer decrement a still-active first
request's counter, because the second request has not yet flipped
its own publish flag.

Security: _looks_like_local_diffusion_path missed cwd-relative
directories (round 31 P1 #2, 8/12 votes). DiffusionBackend.
load_model accepts repo_id="exports/my-flux" as a local directory
via Path(repo_id).expanduser().is_dir(), but the detector only
flagged values starting with /, ~, ./, ../, backslash, or
absolute. Tightened the detector to also reject:
  * weight-file suffixes (.gguf / .safetensors / .bin / .pt / .pth)
  * non-2-segment values (`owner`, `a/b/c`, `owner/`, `/repo`, `//`)
  * 2-segment values whose parts are `.` or `..`
  * 2-segment values that actually resolve to an existing local
    path under backend CWD (last-resort exists() probe).
The existence probe is a minor side-channel for an already-
authenticated caller, accepted in exchange for closing the silent
bypass of the new lease boundary. Valid Hub ids like
unsloth/FLUX.2-klein-base-4B-GGUF, microsoft/Phi-3.5-mini-instruct
still pass through unchanged.

Skipped (consistent with prior rounds):
  * R31 P1 #3 (Tauri / native lease enum missing
    `load-diffusion-model` op): architectural surface; defer until
    the Images page actually surfaces a local-path picker.
  * R31 P1 #4-#5, #8: studio.txt / constraints.txt / pyproject hub
    pins. Live B200 install path with huggingface_hub==0.36.2,
    transformers==4.57.6, diffusers==0.37.1 imports
    Flux2KleinPipeline cleanly. The is_offline_mode import error
    only triggers when transformers 5.x is paired with hub 0.x,
    which the constraints pin prevents.
  * R31 P1 #7 (find_spec vs real import): a full transformers
    import at module load breaks tests that stub huggingface_hub;
    find_spec is the existing tradeoff.

98 targeted backend tests pass (test_diffusion_routes,
test_diffusion_backend, test_inference_model_validation,
test_models_get_model_config_case_resolution, test_data_recipe_seed,
test_training_raw_support, test_export_log_cursor).
danielhanchen added a commit that referenced this pull request May 25, 2026
Two round-33 reviewer findings: hub-floor consistency and the
multipart upload filename validator gap.

Dependencies: reverted the round-26 huggingface_hub>=1.3.0 floor
in no-torch-runtime.txt and pyproject.toml (round 33 P1 #1-#5,
4/12 vote consensus). studio.txt forces huggingface_hub==0.36.2
to match the transformers==4.57.6 pin in extras-no-deps.txt, so
the 1.3.0 floor was internally inconsistent. Reviewers
reproduced the resolver conflict on a fresh install.

Empirical justification (re-verified on the live B200 host before
the revert): huggingface_hub 0.36.2 + transformers 4.57.6 +
diffusers 0.37.1 imports Flux2KleinPipeline cleanly and runs
end-to-end image generation. transformers 4.57.6 carries its own
transformers.utils.hub.is_offline_mode and does not actually need
huggingface_hub.is_offline_mode at import time. The original bump
was guarding against the (never-realised) transformers 5.x path,
which extras-no-deps explicitly pins away.

Validation: multipart /seed/upload-unstructured-file now applies
the same _no_control_chars and _reject_embedded_hf_token checks
to file.filename that SeedInspectUploadRequest.filename already
applies in the JSON variant (round 33 P1 #7). The filename is
reflected back to the client, persisted in the per-file meta
JSON, and echoed by error responses, so the JSON-side hardening
must not be asymmetric with the multipart path.

Skipped (consistent with prior rounds):
  * Find_spec vs full import (R33 P1 #6): preserves test
    compatibility with the huggingface_hub stub fixture.
  * React hooks set-state-in-effect lint (R33 P1 #8): codebase
    has 146 pre-existing violations of the same rule;
    studio-frontend-ci does not gate on lint.
  * Direct DiffusionBackend.load_model bypass (R33 P1 #9): the
    route is the only production entry point, and the backend
    helper now publishes its own diffusion-backend pending tag
    (round 32 P1 #3). Direct-caller hardening would require
    duplicating the lease check into load_model itself, which
    is out of scope for the route-layer security boundary.
  * One-segment Hub IDs (R33 P2 #10): strict 2-segment Hub id
    check is intentional; one-segment names are not valid Hub
    ids.
  * Cwd-relative shadow of Hub IDs (R33 P2 #11): documented
    side-channel tradeoff accepted in round 31 commit msg.

97 targeted backend tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Change default host to 127.0.0.1 instead of 0.0.0.0 / Remove installation auto start instance without notice

2 participants