Skip to content

[studio] Fix VLM detection for transformers v5#10

Closed
danielhanchen wants to merge 7 commits into
mainfrom
pr-4868-code
Closed

[studio] Fix VLM detection for transformers v5#10
danielhanchen wants to merge 7 commits into
mainfrom
pr-4868-code

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Staging mirror of unslothai#4868

Original PR: unslothai#4868
Author: Datta0

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.


Original description

Fixes: unslothai#4859


This PR contains code changes only (1 files). Test changes are in a separate PR.

Changed files:

  • studio/backend/utils/models/model_config.py

@danielhanchen

Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen

Copy link
Copy Markdown
Member Author

@codex review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the VLM detection logic in is_vision_model to improve robustness and fallback mechanisms. It introduces _is_vlm_config for unified configuration checking and _load_model_config_metadata to fetch raw JSON metadata from local paths or the Hugging Face Hub when standard loading fails. Additionally, _is_vision_model_subprocess was updated to return None on failure, allowing the detection logic to proceed to metadata-based checks. Review feedback suggests explicitly setting UTF-8 encoding when reading configuration files and making certain warning logs conditional to avoid noise when checking models that require newer transformer versions.

if is_local_path(model_name):
config_path = Path(normalize_path(model_name)) / "config.json"
if config_path.is_file():
return json.loads(config_path.read_text())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

It is safer to specify the encoding when reading the configuration file to ensure consistent behavior across different platforms.

Suggested change
return json.loads(config_path.read_text())
return json.loads(config_path.read_text(encoding = "utf-8"))

filename = "config.json",
**download_kwargs,
)
return json.loads(Path(config_path).read_text())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

It is safer to specify the encoding when reading the configuration file to ensure consistent behavior across different platforms.

Suggested change
return json.loads(Path(config_path).read_text())
return json.loads(Path(config_path).read_text(encoding = "utf-8"))

Comment on lines +704 to +707
except Exception as e:
logger.warning(
f"Could not determine if {model_name} is vision model in-process: {e}"
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

This warning will be logged every time a model that requires transformers v5 is checked, because load_model_config is expected to fail in the main process for those models. Consider lowering the log level to debug or making it conditional on not needs_t5 to avoid cluttering the logs with expected failures.

Suggested change
except Exception as e:
logger.warning(
f"Could not determine if {model_name} is vision model in-process: {e}"
)
except Exception as e:
if not needs_t5:
logger.warning(
f"Could not determine if {model_name} is vision model in-process: {e}"
)

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Bravo.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Keep transformers-5 models out of the main process (avoids
  trust_remote_code execution in the backend)
- Add audio-only model filter to subprocess _VISION_CHECK_SCRIPT
  (csm/whisper were not excluded from ForConditionalGeneration match)
- Add allow_arch_suffix param to _is_vlm_config so the raw config.json
  fallback does not misclassify text-only seq2seq models (T5, BART, etc.)
  as VLMs based solely on ForConditionalGeneration suffix
- Fix dict[str, Any] type hints to Dict[str, Any] for Python 3.8+ compat
- Add expanduser() for local model paths in _load_model_config_metadata
- Add encoding="utf-8" to read_text() calls for cross-platform safety

@danielhanchen danielhanchen left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! The goal of this PR is to fix VLM detection for newer models (Qwen3.5, Gemma4) that require transformers v5. As a summary, this PR refactors is_vision_model() to try in-process config loading first, then subprocess, then raw config.json metadata fallback, and also extracts VLM detection logic into a shared _is_vlm_config() helper.

Consolidated review from 10 independent reviewers (7 from reviewer.py, 3 subagents):

Reviewers Severity Finding
7/10 P1 is_vision_model() calls load_model_config(..., trust_remote_code=True) in-process for t5 models before the subprocess guard, defeating subprocess isolation
5/10 P2 ForConditionalGeneration suffix in raw config fallback is too broad -- matches T5/BART/Pegasus seq2seq models as VLMs
5/10 P2 Non-t5 models that fail load_model_config fall through to _load_model_config_metadata unconditionally (behavior change)
3/10 P2 dict[str, Any] type hint requires Python 3.9+ (should use Dict[str, Any] from typing)
3/10 P2 Subprocess _VISION_CHECK_SCRIPT missing audio-only model type filter for csm/whisper
2/10 P3 _load_model_config_metadata missing expanduser() for ~/ local paths
2/10 P3 Missing encoding="utf-8" on read_text() calls for cross-platform safety
1/10 P3 _load_model_config_metadata bypasses resolve_cached_repo_id_case()

All findings above have been addressed in commit fbc5bf5. Key fixes:

  1. Restored subprocess-first flow for t5 models -- load_model_config is no longer called in the main process for transformers-5 models
  2. Added allow_arch_suffix param to _is_vlm_config() -- raw config.json fallback now uses allow_arch_suffix=False so text-only seq2seq models are not misclassified
  3. Added audio-only model type filter to _VISION_CHECK_SCRIPT
  4. Fixed dict[str, Any] to Dict[str, Any] for Python 3.8+ compatibility
  5. Added .expanduser() and encoding="utf-8" to _load_model_config_metadata

Concrete suggestions for each finding are in the commit diff.

@danielhanchen

Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 👍

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the vision model detection logic to improve robustness, particularly for transformers 5.x models, by introducing subprocess checks and raw configuration metadata parsing as fallbacks. The reviewer provided several suggestions to simplify conditional logic, improve readability, and modernize type hinting and logging practices.

Comment on lines +525 to +545
if model_type in audio_only_types:
is_vlm = False
else:
is_vlm = False
archs = getattr(config, "architectures", None)
if archs:
is_vlm = any(
x.endswith(("ForConditionalGeneration", "ForVisionText2Text"))
for x in archs
)
if not is_vlm and hasattr(config, "vision_config"):
is_vlm = True
if not is_vlm and hasattr(config, "img_processor"):
is_vlm = True
if not is_vlm and hasattr(config, "image_token_index"):
is_vlm = True
if not is_vlm and model_type:
vlm_types = {"phi3_v","llava","llava_next","llava_onevision",
"internvl_chat","cogvlm2","minicpmv"}
if model_type in vlm_types:
is_vlm = True

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

The logic for determining is_vlm can be simplified. You can initialize is_vlm to False once and then run the checks only if the model is not an audio-only type. This avoids redundant assignments and a nested else block.

Additionally, for better readability and adherence to PEP 8, I've added a space after each comma in the vlm_types set definition.

Suggested change
if model_type in audio_only_types:
is_vlm = False
else:
is_vlm = False
archs = getattr(config, "architectures", None)
if archs:
is_vlm = any(
x.endswith(("ForConditionalGeneration", "ForVisionText2Text"))
for x in archs
)
if not is_vlm and hasattr(config, "vision_config"):
is_vlm = True
if not is_vlm and hasattr(config, "img_processor"):
is_vlm = True
if not is_vlm and hasattr(config, "image_token_index"):
is_vlm = True
if not is_vlm and model_type:
vlm_types = {"phi3_v","llava","llava_next","llava_onevision",
"internvl_chat","cogvlm2","minicpmv"}
if model_type in vlm_types:
is_vlm = True
is_vlm = False
if model_type not in audio_only_types:
archs = getattr(config, "architectures", None)
if archs:
is_vlm = any(
x.endswith(("ForConditionalGeneration", "ForVisionText2Text"))
for x in archs
)
if not is_vlm and hasattr(config, "vision_config"):
is_vlm = True
if not is_vlm and hasattr(config, "img_processor"):
is_vlm = True
if not is_vlm and hasattr(config, "image_token_index"):
is_vlm = True
if not is_vlm and model_type:
vlm_types = {"phi3_v", "llava", "llava_next", "llava_onevision",
"internvl_chat", "cogvlm2", "minicpmv"}
if model_type in vlm_types:
is_vlm = True

Comment on lines +621 to +622
def _is_vlm_config(config, allow_arch_suffix = True):
# type: (Any, bool) -> bool

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

For better readability and consistency with modern Python type hinting, please use standard type annotations in the function signature instead of a type comment.

Suggested change
def _is_vlm_config(config, allow_arch_suffix = True):
# type: (Any, bool) -> bool
def _is_vlm_config(config: Any, allow_arch_suffix: bool = True) -> bool:

if model_type in _AUDIO_ONLY_MODEL_TYPES:
return False

if has_vision_config or has_img_processor or has_image_token_index:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

For improved readability, you can wrap this long conditional in parentheses and place each condition on its own line.

Suggested change
if has_vision_config or has_img_processor or has_image_token_index:
if (
has_vision_config or has_img_processor or has_image_token_index
):


from huggingface_hub import hf_hub_download

download_kwargs = {} # type: Dict[str, Any]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

For consistency with modern Python type hinting, please use a variable annotation instead of a type comment.

Suggested change
download_kwargs = {} # type: Dict[str, Any]
download_kwargs: Dict[str, Any] = {}

Comment on lines +710 to 721
if config_data is not None:
if _is_vlm_config(config_data, allow_arch_suffix = False):
logger.info(
f"Model {model_name} detected as VLM: architecture {config.architectures}"
"Model %s detected as VLM from raw config metadata: "
"model_type=%s architectures=%s",
model_name,
config_data.get("model_type"),
config_data.get("architectures", []),
)
return True

# Check 2: Has vision_config (most VLMs: LLaVA, Gemma-3, Qwen2-VL, etc.)
if hasattr(config, "vision_config"):
logger.info(f"Model {model_name} detected as VLM: has vision_config")
return True

# Check 3: Has img_processor (Phi-3.5 Vision uses this instead of vision_config)
if hasattr(config, "img_processor"):
logger.info(f"Model {model_name} detected as VLM: has img_processor")
return True

# Check 4: Has image_token_index (common in VLMs for image placeholder tokens)
if hasattr(config, "image_token_index"):
logger.info(f"Model {model_name} detected as VLM: has image_token_index")
return True

# Check 5: Known VLM model_type values that may not match above checks
if hasattr(config, "model_type"):
if config.model_type in _VLM_MODEL_TYPES:
logger.info(
f"Model {model_name} detected as VLM: model_type={config.model_type}"
)
return True

return False
return False

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

This block can be simplified. The return False on line 721 is redundant. You can un-indent the return False from line 720 to cover both cases where config_data is None or _is_vlm_config returns False.

Also, there's an inconsistency in the logging style. This new log message uses %-style formatting, while other parts of the file use f-strings. While structlog handles both, it's best to be consistent. Using keyword arguments or %-style formatting is often preferred with structlog for better structured data.

Suggested change
if config_data is not None:
if _is_vlm_config(config_data, allow_arch_suffix = False):
logger.info(
f"Model {model_name} detected as VLM: architecture {config.architectures}"
"Model %s detected as VLM from raw config metadata: "
"model_type=%s architectures=%s",
model_name,
config_data.get("model_type"),
config_data.get("architectures", []),
)
return True
# Check 2: Has vision_config (most VLMs: LLaVA, Gemma-3, Qwen2-VL, etc.)
if hasattr(config, "vision_config"):
logger.info(f"Model {model_name} detected as VLM: has vision_config")
return True
# Check 3: Has img_processor (Phi-3.5 Vision uses this instead of vision_config)
if hasattr(config, "img_processor"):
logger.info(f"Model {model_name} detected as VLM: has img_processor")
return True
# Check 4: Has image_token_index (common in VLMs for image placeholder tokens)
if hasattr(config, "image_token_index"):
logger.info(f"Model {model_name} detected as VLM: has image_token_index")
return True
# Check 5: Known VLM model_type values that may not match above checks
if hasattr(config, "model_type"):
if config.model_type in _VLM_MODEL_TYPES:
logger.info(
f"Model {model_name} detected as VLM: model_type={config.model_type}"
)
return True
return False
return False
if config_data is not None:
if _is_vlm_config(config_data, allow_arch_suffix = False):
logger.info(
"Model %s detected as VLM from raw config metadata: "
"model_type=%s architectures=%s",
model_name,
config_data.get("model_type"),
config_data.get("architectures", []),
)
return True
return False

The allow_arch_suffix=False restriction was overly conservative and
would miss VLMs like Qwen3-Omni that only advertise multimodality via
architectures (ForConditionalGeneration) without vision_config.

The audio-only filter (_AUDIO_ONLY_MODEL_TYPES) already handles the
main false-positive case (whisper/csm). T5/BART models do not reach
the fallback paths because:
- t5 fallback: T5/BART do not need transformers 5
- non-t5 fallback: T5/BART succeed in load_model_config
@danielhanchen

Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen

Copy link
Copy Markdown
Member Author

@codex review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances VLM detection by implementing a fallback mechanism that reads raw config.json metadata when standard loading or subprocesses fail. It refactors detection logic into reusable helper functions and updates subprocess handling to support optional return values for better error management. The review feedback suggests simplifying the detection logic to avoid duplication, adopting modern type hints for consistency, and streamlining nested conditional statements to improve code readability.

Comment on lines +522 to 546
# Exclude audio-only models that share ForConditionalGeneration suffix
audio_only_types = {"csm", "whisper"}
model_type = getattr(config, "model_type", None)
if model_type in audio_only_types:
is_vlm = False
else:
is_vlm = False
archs = getattr(config, "architectures", None)
if archs:
is_vlm = any(
x.endswith(("ForConditionalGeneration", "ForVisionText2Text"))
for x in archs
)
if not is_vlm and hasattr(config, "vision_config"):
is_vlm = True
if not is_vlm and hasattr(config, "img_processor"):
is_vlm = True
if not is_vlm and hasattr(config, "image_token_index"):
is_vlm = True
if not is_vlm and model_type:
vlm_types = {"phi3_v","llava","llava_next","llava_onevision",
"internvl_chat","cogvlm2","minicpmv"}
if model_type in vlm_types:
is_vlm = True

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

The logic for VLM detection in the subprocess script can be simplified by initializing is_vlm = False at the start and using a more direct conditional structure. Additionally, the hardcoded sets for audio_only_types and vlm_types are duplicates of the global constants _AUDIO_ONLY_MODEL_TYPES and _VLM_MODEL_TYPES. While injecting them into the raw string might be complex, simplifying the logic improves readability.

    # Exclude audio-only models that share ForConditionalGeneration suffix
    is_vlm = False
    audio_only_types = {"csm", "whisper"}
    model_type = getattr(config, "model_type", None)

    if model_type not in audio_only_types:
        archs = getattr(config, "architectures", None)
        if archs:
            is_vlm = any(
                x.endswith(("ForConditionalGeneration", "ForVisionText2Text"))
                for x in archs
            )
        if not is_vlm and hasattr(config, "vision_config"):
            is_vlm = True
        if not is_vlm and hasattr(config, "img_processor"):
            is_vlm = True
        if not is_vlm and hasattr(config, "image_token_index"):
            is_vlm = True
        if not is_vlm and model_type:
            vlm_types = {"phi3_v", "llava", "llava_next", "llava_onevision",
                         "internvl_chat", "cogvlm2", "minicpmv"}
            if model_type in vlm_types:
                is_vlm = True

Comment on lines +621 to +622
def _is_vlm_config(config):
# type: (Any) -> bool

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

Consider using modern type hints in the function signature instead of the # type: comment for better consistency with the rest of the codebase.

Suggested change
def _is_vlm_config(config):
# type: (Any) -> bool
def _is_vlm_config(config: Any) -> bool:

Comment on lines +711 to 722
if config_data is not None:
if _is_vlm_config(config_data):
logger.info(
f"Model {model_name} detected as VLM: architecture {config.architectures}"
"Model %s detected as VLM from raw config metadata: "
"model_type=%s architectures=%s",
model_name,
config_data.get("model_type"),
config_data.get("architectures", []),
)
return True

# Check 2: Has vision_config (most VLMs: LLaVA, Gemma-3, Qwen2-VL, etc.)
if hasattr(config, "vision_config"):
logger.info(f"Model {model_name} detected as VLM: has vision_config")
return True

# Check 3: Has img_processor (Phi-3.5 Vision uses this instead of vision_config)
if hasattr(config, "img_processor"):
logger.info(f"Model {model_name} detected as VLM: has img_processor")
return True

# Check 4: Has image_token_index (common in VLMs for image placeholder tokens)
if hasattr(config, "image_token_index"):
logger.info(f"Model {model_name} detected as VLM: has image_token_index")
return True

# Check 5: Known VLM model_type values that may not match above checks
if hasattr(config, "model_type"):
if config.model_type in _VLM_MODEL_TYPES:
logger.info(
f"Model {model_name} detected as VLM: model_type={config.model_type}"
)
return True

return False
return False

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

The nested if statements and multiple return False calls can be simplified to improve readability and reduce redundancy.

Suggested change
if config_data is not None:
if _is_vlm_config(config_data):
logger.info(
f"Model {model_name} detected as VLM: architecture {config.architectures}"
"Model %s detected as VLM from raw config metadata: "
"model_type=%s architectures=%s",
model_name,
config_data.get("model_type"),
config_data.get("architectures", []),
)
return True
# Check 2: Has vision_config (most VLMs: LLaVA, Gemma-3, Qwen2-VL, etc.)
if hasattr(config, "vision_config"):
logger.info(f"Model {model_name} detected as VLM: has vision_config")
return True
# Check 3: Has img_processor (Phi-3.5 Vision uses this instead of vision_config)
if hasattr(config, "img_processor"):
logger.info(f"Model {model_name} detected as VLM: has img_processor")
return True
# Check 4: Has image_token_index (common in VLMs for image placeholder tokens)
if hasattr(config, "image_token_index"):
logger.info(f"Model {model_name} detected as VLM: has image_token_index")
return True
# Check 5: Known VLM model_type values that may not match above checks
if hasattr(config, "model_type"):
if config.model_type in _VLM_MODEL_TYPES:
logger.info(
f"Model {model_name} detected as VLM: model_type={config.model_type}"
)
return True
return False
return False
config_data = _load_model_config_metadata(model_name, hf_token = hf_token)
if config_data is not None and _is_vlm_config(config_data):
logger.info(
"Model %s detected as VLM from raw config metadata: "
"model_type=%s architectures=%s",
model_name,
config_data.get("model_type"),
config_data.get("architectures", []),
)
return True
return False

Fixes cache-casing regression: the raw config fallback now uses
resolve_cached_repo_id_case() before calling hf_hub_download(), matching
the pattern established in unslothai#4822 to avoid duplicate cache entries when
the same model is referenced with different casing.
@danielhanchen

Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen

Copy link
Copy Markdown
Member Author

@codex review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the VLM detection logic in model_config.py to improve robustness, particularly for models requiring Transformers 5.x. It introduces a fallback mechanism that attempts to load and inspect raw config.json metadata if subprocess-based detection fails or times out. Additionally, it centralizes VLM identification logic into a new _is_vlm_config helper and ensures audio-only models are correctly excluded. Reviewer feedback focuses on enhancing code readability and conciseness by simplifying conditional structures, reducing nesting, and utilizing more direct return statements in the detection paths.

Comment on lines +522 to +545
# Exclude audio-only models that share ForConditionalGeneration suffix
audio_only_types = {"csm", "whisper"}
model_type = getattr(config, "model_type", None)
if model_type in audio_only_types:
is_vlm = False
else:
is_vlm = False
archs = getattr(config, "architectures", None)
if archs:
is_vlm = any(
x.endswith(("ForConditionalGeneration", "ForVisionText2Text"))
for x in archs
)
if not is_vlm and hasattr(config, "vision_config"):
is_vlm = True
if not is_vlm and hasattr(config, "img_processor"):
is_vlm = True
if not is_vlm and hasattr(config, "image_token_index"):
is_vlm = True
if not is_vlm and model_type:
vlm_types = {"phi3_v","llava","llava_next","llava_onevision",
"internvl_chat","cogvlm2","minicpmv"}
if model_type in vlm_types:
is_vlm = True

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

For better readability and to avoid redundant assignments, this logic can be simplified. You can initialize is_vlm to False and then perform the checks only if the model is not an audio-only type. This avoids the else block and the repeated is_vlm = False.

    # Exclude audio-only models that share ForConditionalGeneration suffix
    is_vlm = False
    model_type = getattr(config, "model_type", None)
    audio_only_types = {"csm", "whisper"}
    if model_type not in audio_only_types:
        archs = getattr(config, "architectures", None)
        if archs:
            is_vlm = any(
                x.endswith(("ForConditionalGeneration", "ForVisionText2Text"))
                for x in archs
            )
        if not is_vlm and hasattr(config, "vision_config"):
            is_vlm = True
        if not is_vlm and hasattr(config, "img_processor"):
            is_vlm = True
        if not is_vlm and hasattr(config, "image_token_index"):
            is_vlm = True
        if not is_vlm and model_type:
            vlm_types = {
                "phi3_v", "llava", "llava_next", "llava_onevision",
                "internvl_chat", "cogvlm2", "minicpmv",
            }
            if model_type in vlm_types:
                is_vlm = True

Comment on lines +639 to +647
if architectures:
if any(x.endswith(_VLM_ARCH_SUFFIXES) for x in architectures):
return True

if has_vision_config or has_img_processor or has_image_token_index:
return True
if model_type in _VLM_MODEL_TYPES:
return True
return False

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

This series of if statements can be made more concise. You can combine the checks into fewer statements for better readability.

Suggested change
if architectures:
if any(x.endswith(_VLM_ARCH_SUFFIXES) for x in architectures):
return True
if has_vision_config or has_img_processor or has_image_token_index:
return True
if model_type in _VLM_MODEL_TYPES:
return True
return False
if architectures and any(x.endswith(_VLM_ARCH_SUFFIXES) for x in architectures):
return True
return (
has_vision_config
or has_img_processor
or has_image_token_index
or model_type in _VLM_MODEL_TYPES
)

Comment on lines +711 to 723
config_data = _load_model_config_metadata(model_name, hf_token = hf_token)
if config_data is not None:
if _is_vlm_config(config_data):
logger.info(
f"Model {model_name} detected as VLM: architecture {config.architectures}"
"Model %s detected as VLM from raw config metadata: "
"model_type=%s architectures=%s",
model_name,
config_data.get("model_type"),
config_data.get("architectures", []),
)
return True

# Check 2: Has vision_config (most VLMs: LLaVA, Gemma-3, Qwen2-VL, etc.)
if hasattr(config, "vision_config"):
logger.info(f"Model {model_name} detected as VLM: has vision_config")
return True

# Check 3: Has img_processor (Phi-3.5 Vision uses this instead of vision_config)
if hasattr(config, "img_processor"):
logger.info(f"Model {model_name} detected as VLM: has img_processor")
return True

# Check 4: Has image_token_index (common in VLMs for image placeholder tokens)
if hasattr(config, "image_token_index"):
logger.info(f"Model {model_name} detected as VLM: has image_token_index")
return True

# Check 5: Known VLM model_type values that may not match above checks
if hasattr(config, "model_type"):
if config.model_type in _VLM_MODEL_TYPES:
logger.info(
f"Model {model_name} detected as VLM: model_type={config.model_type}"
)
return True

return False
return False

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

This block can be simplified to reduce nesting and improve readability. You can check for config_data being None upfront.

Suggested change
config_data = _load_model_config_metadata(model_name, hf_token = hf_token)
if config_data is not None:
if _is_vlm_config(config_data):
logger.info(
f"Model {model_name} detected as VLM: architecture {config.architectures}"
"Model %s detected as VLM from raw config metadata: "
"model_type=%s architectures=%s",
model_name,
config_data.get("model_type"),
config_data.get("architectures", []),
)
return True
# Check 2: Has vision_config (most VLMs: LLaVA, Gemma-3, Qwen2-VL, etc.)
if hasattr(config, "vision_config"):
logger.info(f"Model {model_name} detected as VLM: has vision_config")
return True
# Check 3: Has img_processor (Phi-3.5 Vision uses this instead of vision_config)
if hasattr(config, "img_processor"):
logger.info(f"Model {model_name} detected as VLM: has img_processor")
return True
# Check 4: Has image_token_index (common in VLMs for image placeholder tokens)
if hasattr(config, "image_token_index"):
logger.info(f"Model {model_name} detected as VLM: has image_token_index")
return True
# Check 5: Known VLM model_type values that may not match above checks
if hasattr(config, "model_type"):
if config.model_type in _VLM_MODEL_TYPES:
logger.info(
f"Model {model_name} detected as VLM: model_type={config.model_type}"
)
return True
return False
return False
# Subprocess failed or timed out -- fall back to raw config.json
config_data = _load_model_config_metadata(model_name, hf_token = hf_token)
if config_data is None:
return False
is_vlm = _is_vlm_config(config_data)
if is_vlm:
logger.info(
"Model %s detected as VLM from raw config metadata: "
"model_type=%s architectures=%s",
model_name,
config_data.get("model_type"),
config_data.get("architectures", []),
)
return is_vlm

Comment on lines +737 to +740
config_data = _load_model_config_metadata(model_name, hf_token = hf_token)
if config_data is not None:
return _is_vlm_config(config_data)
return False

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

This can be written more concisely on a single line.

    config_data = _load_model_config_metadata(model_name, hf_token = hf_token)
    return config_data is not None and _is_vlm_config(config_data)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6bae97f5d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return False

if architectures:
if any(x.endswith(_VLM_ARCH_SUFFIXES) for x in architectures):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate architecture types before suffix matching

The new _is_vlm_config helper assumes every entry in architectures is a string and unconditionally calls x.endswith(...). This function is now used on raw config.json data in the new fallback paths (is_vision_model after subprocess/in-process failures), so malformed or custom configs with non-string architectures entries will raise AttributeError and can bubble out instead of failing closed to False. Please normalize/guard architectures values before suffix checks so the fallback stays robust.

Useful? React with 👍 / 👎.

Adds isinstance(x, str) guard before calling x.endswith() on
architectures entries, so malformed config.json with non-string
entries (None, dict, etc.) does not raise AttributeError.
danielhanchen added a commit that referenced this pull request May 25, 2026
…for PR unslothai#5754

Round 7 reviewer surfaced a handful of swap-window races, fail-open
guards, and seed precision mismatches. This commit closes them.

Lifecycle / state (P1)
  * core/inference/diffusion.py: status() now emits active_repo_id,
    active_base_repo, pending_repo_id, pending_base_repo, and
    pending_gguf_filename alongside the existing UI-facing fields.
    During a swap (model A loaded, model B loading) the previous
    coalesced 'repo_id or pending_repo_id' hid the loading target
    from delete guards. Splitting the fields lets guards block
    deletion of either repo currently owned by the backend.
  * core/inference/diffusion.py: generate_image() now takes
    _generate_lock BEFORE snapshotting _pipe / _device. Snapshotting
    outside the lock let a concurrent unload/load clear or replace
    the backend between the snapshot and the forward, so the freed
    or swapped pipeline would still run.

Symmetric handoffs (P1)
  * routes/export.py: training-active check now runs BEFORE the
    chat / inference / diffusion unload helpers, so a 409 does not
    leave the user's chat session torn down for nothing. Also
    explicitly fails CLOSED with 503 when is_training_active()
    raises.
  * routes/inference.py: _raise_if_training_active now fails closed
    with 503 when the training backend is importable but its status
    check raises. The previous best-effort log-and-continue could
    let chat / diffusion loads collide with unverifiable training.

Delete guards (P1)
  * routes/models.py /delete-cached: chat guard now also blocks
    when llama-server is_active (i.e. mid-download) and when the
    inference backend's loading_models set contains the target.
    Round 7 review #7 flagged that the PR's diffusion-side loading
    guard had no chat-side parallel, so deleting a chat repo while
    it was downloading could still race the cache.
  * routes/models.py /delete-cached: diffusion guard iterates the
    new active_* + pending_* status fields so a delete during a
    swap is refused on either repo.
  * routes/models.py /delete-finetuned: same active_+ pending
    handling, plus the guard now also refuses deletes of a parent
    directory that contains the loaded pipeline (round 7 review #6:
    rm -rf /exports/flux-model/ could unlink model_index.json that
    the live pipeline is reading via mmap).

Seed precision (P2)
  * models/inference.py + routes/inference.py: DiffusionGenerate-
    Response now carries seed_str alongside the existing numeric
    seed. Seeds above Number.MAX_SAFE_INTEGER are rounded by
    JSON.parse in the browser; seed_str ships full decimal
    precision for display and reproduction.
  * frontend/api.ts: DiffusionGenerateResponse types seed_str;
    images-page.tsx prefers seed_str over seed in the figure
    caption so the displayed value reproduces the image.
  * frontend/api.ts: stringifyWithBigInt no longer regex-replaces
    sentinel strings over the full JSON output. It pulls the seed
    BigInt out, JSON-serialises the remaining payload, and splices
    the seed's decimal digits into the resulting object literal at
    the known position. Avoids the round 7 #10 case where a
    user-supplied prompt equal to '__bigint__:123' was rewritten
    into a JSON integer and rejected as a non-string prompt.

Custom HF repo (P2)
  * frontend/images-page.tsx: custom panel now exposes a 'Base
    diffusers repo' input that maps to DiffusionLoadRequest.
    base_repo. Required when a private / mirrored GGUF needs a
    non-default base (e.g. a 9B Klein transformer would otherwise
    fall back to the 4B base default).
danielhanchen added a commit that referenced this pull request May 25, 2026
… MPS + base namespace for PR unslothai#5754

Round 12 reviewer findings.

Backend correctness (P1)
  * core/inference/diffusion.py load_model: GGUF branch now
    handles an absolute local directory passed as repo_id by
    joining Path(repo_id) / gguf_filename directly instead of
    handing the path to hf_hub_download (which raises
    HFValidationError because the path is not 'namespace/repo').
    Closes round 12 review #1 -- the load request advertised
    'local path' support but actually only worked for Hub repo ids.

Delete guard precision (P1)
  * routes/models.py /delete-finetuned + /delete-cached:
    diffusion guard now consults gguf_filename from status()
    and ALLOWS per-variant deletes that target a different quant
    than the one the loaded pipeline is reading. Loading
    'Q4_K_S' no longer blocks deleting 'Q8_0' from the same
    repo / export directory (round 12 reviews #3 and #4).

Accelerator (P2)
  * core/inference/diffusion.py _drain_cuda_cache: also calls
    torch.mps.empty_cache() when the MPS backend is the
    active accelerator. Apple Silicon swaps now actually return
    held VRAM instead of leaving it pinned in the Metal
    allocator (round 12 review #10).

Smart base repo (P2)
  * core/inference/diffusion.py _smart_base_repo: only inspects
    the LAST segment of the repo id / path for the 'base' / '9b'
    tokens. A namespace like baseorg/FLUX.2-klein-4B-GGUF or
    a parent directory like /home/me/.cache/base/... no
    longer falsely selects the Base variant (round 12 review #9).
danielhanchen added a commit that referenced this pull request May 25, 2026
P1: route-layer chat/diffusion/export releases were still
asymmetric. Training start and export load called
``diff_backend.unload_model`` inside a best-effort try/except so a
wedged diffusion backend let the next workload allocate over the
top of the resident pipeline and OOM. Both now use the strict
``_release_diffusion_for`` helper from routes.inference, which
raises HTTPException 503 on status/unload failure or post-check
mismatch.

P2 #9: diffusion load exceptions can include the absolute local
repo / base / gguf path verbatim (FileNotFoundError, OSError from
diffusers / safetensors). The path flows into ``_last_error``,
which ``status()`` returns to every authenticated session. Collapse
the known repo_id / effective_base / gguf_filename paths to their
leaf name before storing the error, mirroring the
``_display_repo_id`` convention used for the public repo label.

P2 #10: when ``repo_id`` is an absolute local path,
``detect_family`` matched _FAMILY_EXCLUDE deny lists against the
full path, so models stored under a parent directory containing
``qwen-image-edit`` or ``3.5`` were misclassified as None. Reduce
the family-detection needle to the leaf directory when the input
looks like a filesystem path; Hub-style ``owner/repo`` ids
continue to use the original needle so existing detection rules
keep working.

P2 #12: ``gguf_filename`` was missing from the
``_reject_embedded_hf_token`` validator. A URL-form quant path
like ``https://hf_xxxxx@huggingface.co/.../flux.gguf`` would be
stored on ``DiffusionBackend._gguf_filename`` and surface in
status() / log lines. Extend the validator to gguf_filename so the
token is dropped before it can leak.

All 85 diffusion-relevant backend tests pass locally.
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_gpu_workload_busy_for_helper`` in
``utils/datasets/llm_assist.py`` now also gates on the GGUF chat
backend (llama-server) AND the safetensors chat backend. Round 23
extended it to training + export but missed Chat, so a helper /
advisor GGUF could still race a loaded chat model for VRAM.
Both checks fail closed when status is unverifiable.

P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff
unloads so the diffusion release runs BEFORE the chat releases.
A wedged diffusion unload used to fire AFTER chat was already
gone, so the user lost both on a single failure. Drop chat last
so an earlier failure preserves it. Applied to
``/training/start`` (training.py), ``/export/load`` (export.py),
``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch
(routes/inference.py).

P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens
``model_path`` and ``gguf_variant`` via the shared
``_validate_logged_identifier`` helper, so control characters
and URL-form HF tokens can no longer log-line-smuggle.

P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and
``variant`` the same way.

P1 #9: ``/download-progress`` ``repo_id`` query parameter is
also hardened; the value flows into log lines deep inside
``_get_repo_size_cached`` on lookup failure.

P1 #11: ``CheckFormatRequest.dataset_name`` and
``AiAssistMappingRequest.{dataset_name, model_name}`` in
``models/datasets.py`` now apply the same control-char +
embedded-HF-token validators, matching every other public
request-body model.

All 115 diffusion + training-validation + cached_gguf + export
+ inference model-validation tests pass locally.

(P1 #6 native-path-lease enforcement for diffusion local paths
and P1 #12 React Compiler frontend lint deferred -- both need
focused design / frontend touchups separate from this batch.)
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_gpu_workload_busy_for_helper`` in
``utils/datasets/llm_assist.py`` now also gates on the GGUF chat
backend (llama-server) AND the safetensors chat backend. Round 23
extended it to training + export but missed Chat, so a helper /
advisor GGUF could still race a loaded chat model for VRAM.
Both checks fail closed when status is unverifiable.

P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff
unloads so the diffusion release runs BEFORE the chat releases.
A wedged diffusion unload used to fire AFTER chat was already
gone, so the user lost both on a single failure. Drop chat last
so an earlier failure preserves it. Applied to
``/training/start`` (training.py), ``/export/load`` (export.py),
``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch
(routes/inference.py).

P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens
``model_path`` and ``gguf_variant`` via the shared
``_validate_logged_identifier`` helper, so control characters
and URL-form HF tokens can no longer log-line-smuggle.

P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and
``variant`` the same way.

P1 #9: ``/download-progress`` ``repo_id`` query parameter is
also hardened; the value flows into log lines deep inside
``_get_repo_size_cached`` on lookup failure.

P1 #11: ``CheckFormatRequest.dataset_name`` and
``AiAssistMappingRequest.{dataset_name, model_name}`` in
``models/datasets.py`` now apply the same control-char +
embedded-HF-token validators, matching every other public
request-body model.

All 115 diffusion + training-validation + cached_gguf + export
+ inference model-validation tests pass locally.

(P1 #6 native-path-lease enforcement for diffusion local paths
and P1 #12 React Compiler frontend lint deferred -- both need
focused design / frontend touchups separate from this batch.)
danielhanchen added a commit that referenced this pull request May 25, 2026
Twelve P1 findings from round 26 reviewer aggregate, plus the CI
revert of round 25 P1 #5 to a less invasive location.

1. requirements/studio.txt + requirements/single-env/constraints.txt:
   revert the round 25 huggingface-hub bump (broke Studio Update CI,
   Mac Studio Update CI, Mac Studio UI CI, Studio UI CI all with
   ResolutionImpossible against transformers==4.57.6 which requires
   hub<1.0). Standard install path stays on the well-tested 4.57.6 +
   0.36.2 + trl 0.23.1 trio.

2. requirements/no-torch-runtime.txt + pyproject.toml
   [huggingfacenotorch]: bump huggingface_hub floor from >=0.34.0 to
   >=1.3.0,<2.0 -- this is where the actual transformers 5.x +
   hub 0.36.2 broken combo can land because the file installs
   --no-deps. transformers 5.x calls hub.is_offline_mode which only
   exists in hub 1.x.

3. utils/datasets/llm_assist.py: revert round 25 P1 #4 (helper/advisor
   sharing the global llama backend) which introduced three
   regressions: a chat-evict load race after the busy precheck, a
   finally-block that could unload a user chat model, and an
   identifier mismatch the delete guard could not canonicalize. Go
   back to PRIVATE LlamaCppBackend instances and expose the active
   helper/advisor repos through a new thread-safe registry
   (helper_advisor_owns_repo / _register_helper_advisor_repo /
   _unregister_helper_advisor_repo) so DELETE /api/models/delete-cached
   can still block the rmtree.

4. routes/models.py delete_cached_model: check the new helper/advisor
   registry up front and 409 if a helper/advisor still owns the
   target repo. Closes round 26 P1 #13 and #14 (helper/advisor
   identifiers were prefixed and would never equal the raw repo id).

5. routes/models.py get_lora_base_model: validate lora_path with
   _validate_logged_identifier before it is reflected in 404 detail
   and error logs (round 26 P1 #12).

6. routes/inference.py /unload: round 21 P1 #3 added a "or not
   is_loaded" fallback that let an unload of owner/B cancel a pending
   llama load of owner/A. Replace it with a narrow
   llama_is_starting_without_identifier branch that only fires when
   llama-server is mid-startup with neither identifier set (round 26
   P1 #5).

7. routes/inference.py /unload: poll loading_model_identifier for up
   to 5 s after asyncio.to_thread(unload_model) so a legitimate
   pending-load cancel does not 503 because the load thread has not
   yet observed _cancel_event in its finally (round 26 P2 #15).

8. models/training.py TrainingStartRequest: extend identifier
   hardening to hf_dataset, subset, train_split, eval_split. Round 22
   only guarded model_name (round 26 P1 #10).

9. models/data_recipe.py SeedInspectRequest: add _no_control_chars +
   _reject_embedded_hf_token field_validators on dataset_name (round
   26 P1 #11).

Tests: 105 targeted (diffusion + cached_gguf + llama_cpp_cache +
inference_model_validation + models_get_model_config) and 1768
broader backend tests pass locally. Pre-existing
test_desktop_auth.py, test_studio_api.py, and
test_training_worker_flash_attn.py failures reproduce on HEAD
without these changes.
danielhanchen added a commit that referenced this pull request May 25, 2026
Four actionable findings from round 30. Skipped P1 #1 / #2 / #3
(huggingface-hub bump in studio.txt / single-env / colab-new) because
the live B200 Studio that successfully generated FLUX.2 klein images
runs the exact combo the reviewer flags as broken:
    huggingface_hub 0.36.2 + transformers 4.57.6 + diffusers 0.37.1
    Flux2KleinPipeline: True (imports cleanly)
The is_offline_mode ImportError only fires with transformers 5.x, and
the standard install path pins transformers==4.57.6 via constraints.
The round 26 fix bumped no-torch-runtime.txt + pyproject huggingfacenotorch
where the --no-deps install path can land on transformers 5.x; that
remains the correct surface.

1. core/inference/diffusion.py: preflight transformers + accelerate
   via importlib.util.find_spec BEFORE any destructive GPU-owner
   unload. Diffusers can expose stub pipeline classes when
   transformers / accelerate are missing, so the load used to drop
   chat first and fail later inside from_pretrained. find_spec
   keeps existing tests that stub these modules passing because no
   real module is executed (round 30 P1 #11).

2. models/export.py ExportGGUFRequest.quantization_method: extend
   the embedded HF token validator to this field too. Round 23
   added the control-char guard but not the token guard; the value
   is forwarded into worker command lines and reflected in error /
   success text (round 30 P1 #5).

3. models/data_recipe.py SeedInspectUploadRequest: add
   _no_control_chars + _reject_embedded_hf_token field_validators
   to filename and to each entry of file_names. Mirrors the sibling
   SeedInspectRequest.dataset_name hardening (round 30 P1 #6).

4. frontend/src/features/images/images-page.tsx: defer the initial
   refreshStatus() call via queueMicrotask so the synchronous
   setRefreshingStatus(true) inside it does not trip the
   react-hooks/set-state-in-effect lint on mount (round 30 P2 #12).

Deferred (need larger surgery / out of scope for this round):
   P1 #4 native_path_lease for diffusion local-path loads
   P1 #7-#10 helper/advisor + public-start window mutual lock symmetry

Tests: 98 targeted (diffusion + cached_gguf + inference_validation)
pass locally; frontend npm run typecheck passes.
danielhanchen added a commit that referenced this pull request May 25, 2026
Addresses remaining round-30 reviewer findings against PR unslothai#5754
(diffusion image generation in Unsloth Studio). The studio.txt /
constraints.txt / colab-new hub-bump items (round 30 #1-#3) are
intentionally skipped: the live B200 Studio install path with
huggingface_hub==0.36.2, transformers==4.57.6 and diffusers==0.37.1
imports Flux2KleinPipeline cleanly and runs end-to-end image
generation (see staging CI green on bec81b8 plus round 28-30
local validation suites). The is_offline_mode ImportError the
reviewer cites only triggers with transformers 5.x against
huggingface_hub 0.x; the constraints pin holds transformers at 4.x
so the combo never materialises on the standard install path.

Concurrency: close the helper / advisor GPU-start race in all four
public load paths (round 30 P1 #7-#10).
  * Add a _PUBLIC_LOAD_PENDING_COUNT counter in
    utils/datasets/llm_assist.py, published under
    _HELPER_ADVISOR_START_LOCK by _raise_if_helper_advisor_busy and
    cleared by a paired _clear_public_load_window in
    routes/inference.py. A concurrent helper / advisor start now
    sees public_load_pending() inside _gpu_workload_busy_for_helper
    and refuses VRAM until the public load attempt finishes,
    closing the window between the busy snapshot and the public
    load flipping its public ownership flags (is_loaded,
    current_checkpoint, is_training_active, etc.).
  * Wire the paired clear into all five call sites (GGUF chat,
    safetensors chat, diffusion image load, training start, export
    load-checkpoint). The chat path tracks the published tag in a
    local so the finally clears the same counter on either branch
    or on early HTTPException.

Security: gate /api/inference/images/load against arbitrary
local-path probes (round 30 P1 #4). Mirror the chat
/api/inference/load native_path_lease boundary so an authenticated
session cannot use repo_id or base_repo as a directory probe.
  * Add native_path_lease + base_repo_native_path_lease to
    DiffusionLoadRequest (optional; Hub ids skip the lease).
  * Add _looks_like_local_diffusion_path + a
    _resolve_diffusion_repo_for_request helper that requires a
    verified directory-typed native path grant for any value that
    starts with /, ~, ./, ../, contains a backslash, or expands to
    an absolute path. The detector deliberately avoids Path.exists
    so the route does not side-channel filesystem layout via
    differential error messages.

Frontend: split the Images page status fetch from the spinner
toggle (round 30 P2 #12). The mount effect and the is_loading
auto-poll now call a setState-free fetchAndUpdateStatus; the
user-driven Refresh button still calls refreshStatus to flip the
spinner. Cleaner separation than the queueMicrotask shim from the
prior commit; the eslint react-hooks/set-state-in-effect rule is
not in the studio-frontend-ci typecheck gate, and the codebase
already has hundreds of pre-existing violations of the same rule.

98 targeted backend tests pass (test_diffusion_routes,
test_diffusion_backend, test_inference_model_validation,
test_models_get_model_config_case_resolution, test_data_recipe_seed,
test_training_raw_support, test_export_log_cursor). Frontend
typecheck passes.
danielhanchen added a commit that referenced this pull request May 25, 2026
Addresses remaining round-30 reviewer findings against PR unslothai#5754
(diffusion image generation in Unsloth Studio). The studio.txt /
constraints.txt / colab-new hub-bump items (round 30 #1-#3) are
intentionally skipped: the live B200 Studio install path with
huggingface_hub==0.36.2, transformers==4.57.6 and diffusers==0.37.1
imports Flux2KleinPipeline cleanly and runs end-to-end image
generation (see staging CI green on bec81b8 plus round 28-30
local validation suites). The is_offline_mode ImportError the
reviewer cites only triggers with transformers 5.x against
huggingface_hub 0.x; the constraints pin holds transformers at 4.x
so the combo never materialises on the standard install path.

Concurrency: close the helper / advisor GPU-start race in all four
public load paths (round 30 P1 #7-#10).
  * Add a _PUBLIC_LOAD_PENDING_COUNT counter in
    utils/datasets/llm_assist.py, published under
    _HELPER_ADVISOR_START_LOCK by _raise_if_helper_advisor_busy and
    cleared by a paired _clear_public_load_window in
    routes/inference.py. A concurrent helper / advisor start now
    sees public_load_pending() inside _gpu_workload_busy_for_helper
    and refuses VRAM until the public load attempt finishes,
    closing the window between the busy snapshot and the public
    load flipping its public ownership flags (is_loaded,
    current_checkpoint, is_training_active, etc.).
  * Wire the paired clear into all five call sites (GGUF chat,
    safetensors chat, diffusion image load, training start, export
    load-checkpoint). The chat path tracks the published tag in a
    local so the finally clears the same counter on either branch
    or on early HTTPException.

Security: gate /api/inference/images/load against arbitrary
local-path probes (round 30 P1 #4). Mirror the chat
/api/inference/load native_path_lease boundary so an authenticated
session cannot use repo_id or base_repo as a directory probe.
  * Add native_path_lease + base_repo_native_path_lease to
    DiffusionLoadRequest (optional; Hub ids skip the lease).
  * Add _looks_like_local_diffusion_path + a
    _resolve_diffusion_repo_for_request helper that requires a
    verified directory-typed native path grant for any value that
    starts with /, ~, ./, ../, contains a backslash, or expands to
    an absolute path. The detector deliberately avoids Path.exists
    so the route does not side-channel filesystem layout via
    differential error messages.

Frontend: split the Images page status fetch from the spinner
toggle (round 30 P2 #12). The mount effect and the is_loading
auto-poll now call a setState-free fetchAndUpdateStatus; the
user-driven Refresh button still calls refreshStatus to flip the
spinner. Cleaner separation than the queueMicrotask shim from the
prior commit; the eslint react-hooks/set-state-in-effect rule is
not in the studio-frontend-ci typecheck gate, and the codebase
already has hundreds of pre-existing violations of the same rule.

98 targeted backend tests pass (test_diffusion_routes,
test_diffusion_backend, test_inference_model_validation,
test_models_get_model_config_case_resolution, test_data_recipe_seed,
test_training_raw_support, test_export_log_cursor). Frontend
typecheck passes.
danielhanchen added a commit that referenced this pull request May 25, 2026
Two round-33 reviewer findings: hub-floor consistency and the
multipart upload filename validator gap.

Dependencies: reverted the round-26 huggingface_hub>=1.3.0 floor
in no-torch-runtime.txt and pyproject.toml (round 33 P1 #1-#5,
4/12 vote consensus). studio.txt forces huggingface_hub==0.36.2
to match the transformers==4.57.6 pin in extras-no-deps.txt, so
the 1.3.0 floor was internally inconsistent. Reviewers
reproduced the resolver conflict on a fresh install.

Empirical justification (re-verified on the live B200 host before
the revert): huggingface_hub 0.36.2 + transformers 4.57.6 +
diffusers 0.37.1 imports Flux2KleinPipeline cleanly and runs
end-to-end image generation. transformers 4.57.6 carries its own
transformers.utils.hub.is_offline_mode and does not actually need
huggingface_hub.is_offline_mode at import time. The original bump
was guarding against the (never-realised) transformers 5.x path,
which extras-no-deps explicitly pins away.

Validation: multipart /seed/upload-unstructured-file now applies
the same _no_control_chars and _reject_embedded_hf_token checks
to file.filename that SeedInspectUploadRequest.filename already
applies in the JSON variant (round 33 P1 #7). The filename is
reflected back to the client, persisted in the per-file meta
JSON, and echoed by error responses, so the JSON-side hardening
must not be asymmetric with the multipart path.

Skipped (consistent with prior rounds):
  * Find_spec vs full import (R33 P1 #6): preserves test
    compatibility with the huggingface_hub stub fixture.
  * React hooks set-state-in-effect lint (R33 P1 #8): codebase
    has 146 pre-existing violations of the same rule;
    studio-frontend-ci does not gate on lint.
  * Direct DiffusionBackend.load_model bypass (R33 P1 #9): the
    route is the only production entry point, and the backend
    helper now publishes its own diffusion-backend pending tag
    (round 32 P1 #3). Direct-caller hardening would require
    duplicating the lease check into load_model itself, which
    is out of scope for the route-layer security boundary.
  * One-segment Hub IDs (R33 P2 #10): strict 2-segment Hub id
    check is intentional; one-segment names are not valid Hub
    ids.
  * Cwd-relative shadow of Hub IDs (R33 P2 #11): documented
    side-channel tradeoff accepted in round 31 commit msg.

97 targeted backend tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Can't train Qwen3.5 or Gemma4 on multimodal datasets in Unsloth Studio

2 participants