build(deps): bump the pip group across 3 directories with 19 updates #3

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/pip-d357939a1d

dependabot[bot] commented on behalf of github on Apr 6, 2026

Updates the requirements on datasets, setuptools, setuptools-scm, pandas, huggingface-hub, transformers, trl, data-designer-engine, pytest, pytest-rerunfailures, scikit-learn, torchao, chardet, faker, fsspec, python-json-logger, sqlfluff, data-designer and data-designer-config to permit the latest version.
Updates datasets to 4.5.0

Release notes

Sourced from datasets's releases.

4.5.0

Dataset Features

  • Add lance format support by @eddyxu in huggingface/datasets#7913

    • Support for both Lance datasets (including metadata / manifests) and standalone .lance files
    • e.g. with lance-format/fineweb-edu:

      from datasets import load_dataset

      ds = load_dataset("lance-format/fineweb-edu", streaming=True)
      for example in ds["train"]:
          ...

What's Changed

New Contributors

Full Changelog: huggingface/datasets@4.4.2...4.5.0

Commits

Updates setuptools from 80.9.0 to 82.0.1

Changelog

Sourced from setuptools's changelog.

v82.0.1

Bugfixes

  • Fix the loading of launcher manifest.xml file. (#5047)
  • Replaced deprecated json.__version__ with fixture in tests. (#5186)

Improved Documentation

  • Add advice about how to improve predictability when installing sdists. (#5168)

Misc

v82.0.0

Deprecations and Removals

  • pkg_resources has been removed from Setuptools. Most common uses of pkg_resources have been superseded by the [importlib.resources](https://docs.python.org/3/library/importlib.resources.html) and [importlib.metadata](https://docs.python.org/3/library/importlib.metadata.html) projects. Projects and environments relying on pkg_resources for namespace packages or other behavior should depend on older versions of setuptools. (#3085)
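For code that still imports pkg_resources, the two most common calls, `pkg_resources.get_distribution(name).version` and `pkg_resources.resource_string(package, resource)`, have direct standard-library replacements. A minimal sketch, assuming Python 3.9+ (for `importlib.resources.files`); the helper names `installed_version` and `read_package_text` are hypothetical, and the distribution and package names in the usage lines are only examples:

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.resources import files
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Stand-in for pkg_resources.get_distribution(name).version.

    Returns None instead of raising when the distribution is not installed.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def read_package_text(package: str, resource: str) -> str:
    """Stand-in for pkg_resources.resource_string(package, resource).decode()."""
    return files(package).joinpath(resource).read_text(encoding="utf-8")


# Example usage: query an installed distribution and read a file shipped in a package.
print(installed_version("setuptools"))  # None if setuptools is not installed
print(read_package_text("json", "__init__.py")[:40])
```

Returning `None` rather than re-raising mirrors how many projects wrapped `get_distribution` in a try/except; adjust if a missing distribution should be a hard error.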

v81.0.0

Deprecations and Removals

  • Removed support for the --dry-run parameter to setup.py. This one feature by its nature threads through lots of core and ancillary functionality, adding complexity and friction. Removal of this parameter will help decouple the compiler functionality from distutils and thus the eventual full integration of distutils. These changes do affect some class and function signatures, so any derivative functionality may require some compatibility shims to support their expected interface. Please report any issues to the Setuptools project for investigation. (#4872)

v80.10.2

Bugfixes

  • Update vendored dependencies. (#5159)

Misc

... (truncated)

Commits
  • 5a13876 Bump version: 82.0.0 → 82.0.1
  • 51ab8f1 Avoid using (deprecated) 'json.__version__' in tests (#5194)
  • f9c37b2 Docs/CI: Fix intersphinx references (#5195)
  • 8173db2 Docs: Fix intersphinx references
  • 09bafbc Fix past tense on newsfragment
  • 461ea56 Add news fragment
  • c4ffe53 Avoid using (deprecated) 'json.__version__' in tests
  • 749258b Cleanup pkg_resources dependencies and configuration (#5175)
  • 2019c16 Parse ext-module.define-macros from pyproject.toml as list of tuples (#5169)
  • b809c86 Sync setuptools schema with validate-pyproject (#5157)
  • Additional commits viewable in compare view

Updates setuptools-scm from 9.2.0 to 9.2.2

Changelog

Sourced from setuptools-scm's changelog.

v9.2.2

Fixed

  • fix #1231: don't warn about tool.setuptools.dynamic.version when only using file finder. The warning about combining version guessing with setuptools dynamic versions should only be issued when setuptools-scm is performing version inference, not when it's only being used for its file finder functionality.

v9.2.1

Fixed

  • Fix #1216: accept usages of version = attr: in setuptools config and emit a warning for them. Unfortunately, dozens of projects cargo-culted that antipattern.

Commits
  • e56b78f Merge pull request #1232 from RonnyPfannschmidt/fix-1231-dont-warn-when-no-guess
  • 4f55e95 docs: update changelog for v9.2.2 patch release
  • 95a0c47 fix: don't warn about tool.setuptools.dynamic.version when only using file fi...
  • 338f562 Merge pull request #1226 from RonnyPfannschmidt/prepare-release
  • a893634 Prepare release v9.2.1
  • ad83282 Merge pull request #1225 from pypa/pre-commit-ci-update-config
  • 20a4464 [pre-commit.ci] pre-commit autoupdate
  • 70f6942 Merge pull request #1219 from RonnyPfannschmidt/fix-1216-explicitly-deprecate...
  • 14d85c0 Install Mercurial on Windows runners via Chocolatey
  • 8c5cec9 Fix API stability check workflow to install griffe and improve reporting
  • Additional commits viewable in compare view

Updates pandas to 3.0.2

Release notes

Sourced from pandas's releases.

pandas 3.0.2

We are pleased to announce the release of pandas 3.0.2. This is a patch release in the 3.0.x series and includes some regression fixes and bug fixes. We recommend that all users of the 3.0.x series upgrade to this version.

See the full whatsnew for a list of all the changes.

Pandas 3.0 supports Python 3.11 and higher. The release can be installed from PyPI:

python -m pip install --upgrade pandas==3.0.*

Or from conda-forge:

conda install -c conda-forge pandas=3.0

Please report any issues with the release on the pandas issue tracker.

Thanks to all the contributors who made this release possible.

Commits
  • ab90747 RLS: 3.0.2 (#64934)
  • 6f27013 Backport PR #64931 on branch 3.0.x (DOC/BLD: temporary disable upload of docs...
  • 48ddc60 Backport PR #64664 on branch 3.0.x (BUG: DataFrame.sum() crashes on empty Dat...
  • 8774488 [backport 3.0.x] PERF: fix slow python loop in validation for ArrowStringArra...
  • 33af6cc Backport PR #64133 on branch 3.0.x (BUG: str.find returns byte offset instead...
  • 4ef49d8 [backport 3.0.x] BUG: fix convert_dtypes dropping values from sliced mixed-dt...
  • 0668f34 [backport 3.0.x] BUG: Fix HDFStore.put with StringDtype columns and compressi...
  • 23f2f44 [backport 3.0.x] BUG: Suppress unnecessary RuntimeWarning in to_datetime with...
  • 83ba804 Backport PR #64886: BUG: Compute Variance of Complex Numbers Correctly (#64892)
  • bb5ca1a Backport PR #64386 on branch 3.0.x (BUG: fix sort_index AssertionError with R...
  • Additional commits viewable in compare view

Updates datasets from 4.3.0 to 4.8.4

Updates huggingface-hub from 0.36.2 to 1.9.0

Release notes

Sourced from huggingface-hub's releases.

[v1.9.0] Agent-Aware CLI, Spaces Volumes, and more

🚀 Spaces Volumes: Mount Models, Datasets, and Buckets Directly

Hugging Face Spaces now support mounting volumes, giving your Space direct filesystem access to models, datasets, and storage buckets. This replaces the deprecated persistent storage feature.

from huggingface_hub import HfApi, Volume

api = HfApi()
api.set_space_volumes(
    repo_id="username/my-space",
    volumes=[
        Volume(type="model", source="username/my-model", mount_path="/models", read_only=True),
        Volume(type="bucket", source="username/my-bucket", mount_path="/data"),
    ],
)

Volumes can also be set at creation time via create_repo(space_volumes=...) and duplicate_repo(space_volumes=...), and from the CLI with the --volume / -v flag:

# Create a Space with volumes mounted
hf repos create my-space --type space --space-sdk gradio \
    -v hf://gpt2:/models -v hf://buckets/org/b:/data

# Duplicate a Space with volumes
hf repos duplicate org/my-space my-space --type space \
    -v hf://gpt2:/models -v hf://buckets/org/b:/data

🤖 The hf CLI Now Auto-Detects AI Agents and Adapts Its Output

AI coding agents (Claude Code, Cursor, Codex, Copilot, Gemini, ...) increasingly use the hf CLI to interact with the Hub. Until now, the output was designed for humans - ANSI colors, padded tables, emoji booleans, truncated cells - making it hard for agents to parse reliably.

Starting with v1.9, the CLI automatically detects when it's running inside an agent and adapts its output: no ANSI, no truncation, tab-separated tables, compact JSON, full timestamps. No configuration needed - it just works. This is only a first step toward making the hf CLI the primary entry point to the Hugging Face Hub for AI agents!

Agent mode is auto-detected but you can also force a mode explicitly with --format:

hf models ls --limit 5                  # auto-detect
hf models ls --limit 5 --format agent   # force agent-friendly output
hf models ls --limit 5 --format json    # structured JSON
hf models ls --limit 5 --format quiet   # IDs only, great for piping

Here's what an agent sees compared to a human:

... (truncated)

Commits
  • b768bb2 Release: v1.9.0
  • 9d30ff2 Release: v1.9.0.rc0
  • 657b8b9 chore: remove claude.yml workflow file (#4031)
  • 38d48d9 [CLI] Migrate models, datasets, spaces, papers to out singleton (#4...
  • 4e2337d [CLI] enrich CLI errors with available options and commands (#4034)
  • ea1f4b7 Support volumes at repo creation and duplication (#4035)
  • 993d645 [FEAT] Support skills from hf skills (#3956)
  • bb7dc6e Add HF_HUB_DISABLE_SYMLINKS env variable to force no-symlink cache (#4032)
  • 2593ff8 Do not scan CACHEDIR.TAG file in cache (#4036)
  • b8d92a2 [Fix] Validate shard filenames in sharded checkpoint index files (#4033)
  • Additional commits viewable in compare view

Updates transformers from 4.57.6 to 5.5.0

Release notes

Sourced from transformers's releases.

Release v5.5.0

New Model additions

Gemma4

Gemma 4 is a multimodal model with pretrained and instruction-tuned variants, available in 1B, 13B, and 27B parameter sizes. The architecture is mostly the same as in previous Gemma versions. The key differences are a vision processor that can encode images into a fixed token budget and a spatial 2D RoPE that encodes vision-specific information along the height and width axes.

You can find all the original Gemma 4 checkpoints under the Gemma 4 release.

The key difference from previous Gemma releases is the new design to process images of different sizes using a fixed-budget number of tokens. Unlike many models that squash every image into a fixed square (like 224×224), Gemma 4 keeps the image's natural aspect ratio while resizing it to fit the token budget. There are a couple of constraints to follow:

  • The total number of pixels must fit within a patch budget
  • Both height and width must be divisible by 48 (= patch size 16 × pooling kernel 3)
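The two constraints can be turned into a simple resize rule. The sketch below is hypothetical and is not the transformers image processor: it assumes 16×16-pixel patches, a 3×3 pooling kernel (so each soft token covers 9 patches), and the default budget of 280 soft tokens, then scales an image uniformly and rounds each side down to a multiple of 48. The function name `fit_to_token_budget` is illustrative only.

```python
import math

PATCH_SIZE = 16                               # pixels per patch side
POOLING_KERNEL = 3                            # 3x3 patches pooled into one soft token
SIDE_MULTIPLE = PATCH_SIZE * POOLING_KERNEL   # 48


def fit_to_token_budget(height: int, width: int, soft_tokens: int = 280) -> tuple[int, int]:
    """Hypothetical resize: keep the aspect ratio, make both sides multiples of 48,
    and keep the total pixel count within the soft-token budget."""
    max_pixels = soft_tokens * POOLING_KERNEL**2 * PATCH_SIZE**2
    # Uniform scale factor that makes the image area equal to the pixel budget.
    scale = math.sqrt(max_pixels / (height * width))

    def snap(side: float) -> int:
        # Round down to a multiple of 48, but never below one multiple.
        return max(SIDE_MULTIPLE, math.floor(side / SIDE_MULTIPLE) * SIDE_MULTIPLE)

    new_h, new_w = snap(height * scale), snap(width * scale)
    # The minimum-size clamp can push very elongated images over budget;
    # if that happens, shrink the longer side until the area fits again.
    if new_h * new_w > max_pixels:
        if new_h >= new_w:
            new_h = snap(max_pixels / new_w)
        else:
            new_w = snap(max_pixels / new_h)
    return new_h, new_w


print(fit_to_token_budget(1080, 1920))  # (576, 1056)
```

Under these assumptions a 1080×1920 frame maps to 576×1056 pixels, i.e. 36 × 66 = 2,376 patches or 264 soft tokens, just inside the 280-token default. Rounding down rather than to the nearest multiple is what guarantees the budget is never exceeded.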

Important: Gemma 4 does not apply the standard ImageNet mean/std normalization that many other vision models use. The model's own patch embedding layer handles the final scaling internally (shifting values to the [-1, 1] range).

The number of "soft tokens" (aka vision tokens) an image processor can produce is configurable. The supported options are outlined below and the default is 280 soft tokens per image.

| Soft tokens | Patches (before pooling) | Approx. image area |
|---|---|---|
| 70 | 630 | ~161K pixels |
| 140 | 1,260 | ~323K pixels |
| 280 | 2,520 | ~645K pixels |
| 560 | 5,040 | ~1.3M pixels |
| 1,120 | 10,080 | ~2.6M pixels |
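Each row of the table follows from the patch and pooling sizes given above: one soft token pools a 3×3 block of patches, and each patch covers 16×16 pixels. A short check under exactly those assumptions (the helper name `budget_for` is illustrative):

```python
PATCH_SIZE = 16      # pixels per patch side
POOLING_KERNEL = 3   # 3x3 patches per soft token


def budget_for(soft_tokens: int) -> tuple[int, int]:
    """Return (patches before pooling, pixel area) for a soft-token budget."""
    patches = soft_tokens * POOLING_KERNEL**2
    pixels = patches * PATCH_SIZE**2
    return patches, pixels


for tokens in (70, 140, 280, 560, 1120):
    patches, pixels = budget_for(tokens)
    print(f"{tokens:>5} soft tokens -> {patches:>6,} patches, {pixels:>9,} pixels")
```

For example, 280 soft tokens give 280 × 9 = 2,520 patches and 2,520 × 256 = 645,120 pixels, which is the ~645K figure in the table.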

To encode positional information for each patch in the image, Gemma 4 uses a learned 2D position embedding table. The position table stores up to 10,240 positions per axis, which allows the model to handle very large images. Each position is a learned vector with the same dimension as the patch embedding. Gemma 4's 2D RoPE independently rotates half of the attention head dimensions by the x-axis position and the other half by the y-axis position. This allows the model to understand spatial relationships like "above," "below," "left of," and "right of."
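The axis-split rotation described above can be sketched in pure Python: a standard rotate-half RoPE is applied to the first half of a vector using the patch's column (x) index and to the second half using its row (y) index. The frequency base of 10000, which half gets which axis, and the helper names are assumptions for illustration, not Gemma 4's actual implementation. The usage lines check the property that makes RoPE useful: the query/key score depends only on the relative (row, column) offset.

```python
import math


def rope_1d(vec: list[float], pos: float, base: float = 10000.0) -> list[float]:
    """Rotate-half RoPE: pair dim i with dim i + d/2 and rotate by pos * freq_i."""
    half = len(vec) // 2
    out = [0.0] * len(vec)
    for i in range(half):
        theta = pos * base ** (-i / half)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out


def rope_2d(vec: list[float], row: int, col: int) -> list[float]:
    """Hypothetical 2D RoPE (len(vec) divisible by 4): first half encodes x, second half y."""
    half = len(vec) // 2
    return rope_1d(vec[:half], col) + rope_1d(vec[half:], row)


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.1 * i for i in range(8)]
k = [1.0 - 0.05 * i for i in range(8)]
# Both pairs are offset by 3 rows and 4 columns, so the scores should match.
s1 = dot(rope_2d(q, 2, 3), rope_2d(k, 5, 7))
s2 = dot(rope_2d(q, 12, 13), rope_2d(k, 15, 17))
rq = rope_2d(q, 2, 3)
print(s1, s2)
```

Because each rotation is orthogonal, the vector norm is unchanged; only the relative angle between query and key carries position.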

NomicBERT

NomicBERT is a BERT-inspired encoder model that applies Rotary Position Embeddings (RoPE) to create reproducible long-context text embeddings. It is the first fully reproducible, open-source text embedding model with an 8192-token context length that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on the short-context MTEB and long-context LoCo benchmarks. The model generates dense vector embeddings for various tasks including search, clustering, and classification using specific instruction prefixes.

Links: Documentation | Paper

MusicFlamingo

Music Flamingo is a fully open large audio–language model designed for robust understanding and reasoning over music. It builds upon the Audio Flamingo 3 architecture by including Rotary Time Embeddings (RoTE), which injects temporal position information to enable the model to handle audio sequences up to 20 minutes. The model features a unified audio encoder across speech, sound, and music with special sound boundary tokens for improved audio sequence modeling.

Links: Documentation | Paper

... (truncated)

Commits
  • c1c3424 update
  • 20bff68 update release workflow
  • 8956441 v5.5.0
  • 5135e5e casually dropping the most capable open weights on the planet (#45192)
  • a594e09 Internalise the NomicBERT model (#43067)
  • 4932e97 Fix resized LM head weights being overwritten by post_init (#45079)
  • 57e8413 [Qwen3.5 MoE] Add _tp_plan to ForConditionalGeneration (#45124)
  • b10552e Fix TypeError: 'NoneType' object is not iterable in GenerationMixin.generate ...
  • 423f2a3 fix(models): Fix dtype mismatch in SwitchTransformers and TimmWrapperModel (#...
  • ade7a05 Generalize gemma vision mask to videos (#45185)
  • Additional commits viewable in compare view

Updates trl from 0.23.1 to 1.0.0

Release notes

Sourced from trl's releases.

v1.0.0

Read our blog post for an overview of TRL v1.

Features

Asynchronous GRPO

Asynchronous GRPO decouples generation from the gradient update loop by offloading rollouts to an external vLLM server. Generation runs in parallel while training continues, eliminating idle GPU time and improving hardware utilization.

from trl.experimental.async_grpo import AsyncGRPOTrainer
from trl.rewards import accuracy_reward
from datasets import load_dataset

dataset = load_dataset("trl-lib/DeepMath-103K", split="train")
trainer = AsyncGRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    train_dataset=dataset,
)
trainer.train()

by @qgallouedec in huggingface/trl#5293
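The decoupling idea can be illustrated without TRL or vLLM: a generator thread keeps producing rollouts into a bounded queue while the training loop consumes them in batches, so neither side waits for the other to finish a whole phase. Everything here (`generate_rollout`, `train_step`, the queue size, the sleep-based latency) is a hypothetical stand-in, not how AsyncGRPOTrainer is implemented.

```python
import queue
import threading
import time


def generate_rollout(step: int) -> dict:
    """Hypothetical stand-in for an inference-server call returning one rollout."""
    time.sleep(0.01)  # simulate generation latency
    return {"step": step, "completion": f"answer-{step}"}


def train_step(batch: list[dict]) -> None:
    """Hypothetical stand-in for one gradient update."""
    time.sleep(0.01)  # simulate compute


def run(num_rollouts: int = 20, batch_size: int = 4, max_queue: int = 8) -> int:
    rollouts: queue.Queue = queue.Queue(maxsize=max_queue)  # bounded: caps staleness
    done = object()  # sentinel marking the end of generation

    def producer() -> None:
        for step in range(num_rollouts):
            rollouts.put(generate_rollout(step))  # blocks if training falls behind
        rollouts.put(done)

    threading.Thread(target=producer, daemon=True).start()

    updates, batch = 0, []
    while True:
        item = rollouts.get()
        if item is done:
            break
        batch.append(item)
        if len(batch) == batch_size:
            train_step(batch)  # generation continues in the other thread meanwhile
            updates += 1
            batch = []
    if batch:  # flush a final partial batch
        train_step(batch)
        updates += 1
    return updates


print(run())  # 20 rollouts in batches of 4 -> 5 updates
```

The bounded queue is the design choice that matters here: generation can run ahead of training by at most `max_queue` rollouts, which limits how far off-policy the consumed samples can get.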

Variational Sequence-Level Soft Policy Optimization (VESPO)

VESPO addresses training instability in off-policy RL caused by policy staleness, asynchronous updates, and train-inference mismatches. Rather than relying on heuristic token-level clipping (GRPO) or sequence-length normalization (GSPO), VESPO derives a principled reshaping kernel from a variational framework. In practice, this yields a smooth, asymmetric Gamma weighting function that gracefully suppresses extreme sequence-level importance weights without introducing length bias. It can be enabled via the loss_type parameter of GRPOConfig:

from trl import GRPOConfig, GRPOTrainer

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    args=GRPOConfig(loss_type="vespo"),
    # ... remaining trainer arguments
)

by @casinca in huggingface/trl#5199
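For context on the "length bias" mentioned above: a raw sequence-level importance weight is the product of per-token probability ratios, so it compounds with sequence length, while GSPO-style length normalization takes the geometric mean instead. The sketch below computes only those two quantities from per-token log-probabilities under the current and old policies; it does not implement VESPO's Gamma reshaping kernel, whose exact form is not given here, and `sequence_weights` is an illustrative name.

```python
import math


def sequence_weights(logp_new: list[float], logp_old: list[float]) -> tuple[float, float]:
    """Return (raw, length_normalized) sequence-level importance weights.

    raw               = prod_t pi_new(y_t) / pi_old(y_t)
    length_normalized = raw ** (1 / |y|)   (geometric mean of token ratios)
    """
    log_ratio = sum(n - o for n, o in zip(logp_new, logp_old))
    raw = math.exp(log_ratio)
    normalized = math.exp(log_ratio / len(logp_new))
    return raw, normalized


# Same per-token drift (+0.05 nats per token), different sequence lengths.
short_w = sequence_weights([-0.95] * 10, [-1.0] * 10)
long_w = sequence_weights([-0.95] * 200, [-1.0] * 200)
print(short_w)  # raw ~ e^0.5 ~ 1.65, normalized ~ e^0.05 ~ 1.05
print(long_w)   # raw ~ e^10 ~ 22026, normalized ~ e^0.05 ~ 1.05
```

The raw weight explodes for the longer sequence even though every token drifted by the same amount; this is the kind of extreme weight that clipping, normalization, or VESPO's reshaping is meant to tame.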

Divergence Proximal Policy Optimization (DPPO)

... (truncated)

Commits
  • f3e9ac1 Release: v1.0 (#5409)
  • e8d5dfc Add second version of Qwen 3.5 chat template to chat_template_utils (#5405)
  • 71ff6a2 Add HF_TOKEN environment variable to workflow files (#5397)
  • 1ee3975 Add vLLM inference to the Base Self-Distillation Trainer (#5388)
  • 79e6e79 Move disable_config=True from generate to GenerationConfig (#5384)
  • 83d68dd chore: update pr_template_check.yml (#5393)
  • 4cb7ab1 Enhance PR template check to exclude reopened PRs from first-time contributor...
  • 32a40bf Enforce PR template for first-time contributors and document AI usage policy ...
  • 8e69b68 Mark test_rloo[fsdp2] as xfail for transformers 5.4.0 (#5387)
  • c264266 Remove deprecated TRACKIO_SPACE_ID env var from all scripts (#5365)
  • Additional commits viewable in compare view

Updates data-designer-engine from 0.5.4 to 0.5.5

Updates pandas from 2.3.3 to 3.0.2

Updates pytest to 9.0.2

Release notes

Sourced from pytest's releases.

9.0.2

pytest 9.0.2 (2025-12-06)

Bug fixes

  • #13896: The terminal progress feature added in pytest 9.0.0 has been disabled by default, except on Windows, due to compatibility issues with some terminal emulators.

    You may enable it again by passing -p terminalprogress. We may enable it by default again once compatibility improves in the future.

    Additionally, when the environment variable TERM is dumb, the escape codes are no longer emitted, even if the plugin is enabled.

  • #13904: Fixed the TOML type of the tmp_path_retention_count settings in the API reference from number to string.

  • #13946: The private config.inicfg attribute was changed in a breaking manner in pytest 9.0.0. Due to its usage in the ecosystem, it is now restored to working order using a compatibility shim. It will be deprecated in pytest 9.1 and removed in pytest 10.

  • #13965: Fixed quadratic-time behavior when handling unittest subtests in Python 3.10.

Improved documentation

  • #4492: The API Reference now contains cross-reference-able documentation of pytest's command-line flags.

Commits
  • 3d10b51 Prepare release version 9.0.2
  • 188750b Merge pull request #14030 from pytest-dev/patchback/backports/9.0.x/1e4b01d1f...
  • b7d7bef Merge pull request #14014 from bluetech/compat-note
  • bd08e85 Merge pull request #14013 from pytest-dev/patchback/backports/9.0.x/922b60377...
  • bc78386 Add CLI options reference documentation (#13930)
  • 5a4e398 Fix docs typo (#14005) (#14008)
  • d7ae6df Merge pull request #14006 from pytest-dev/maintenance/update-plugin-list-tmpl...
  • 556f6a2 pre-commit: fix rst-lint after new release (#13999) (#14001)
  • c60fbe6 Fix quadratic-time behavior when handling unittest subtests in Python 3.10 ...
  • 73d9b01 Merge pull request #13995 from nicoddemus/patchback/backports/9.0.x/1b5200c0f...
  • Additional commits viewable in compare view

Updates pytest-rerunfailures from 15.1 to 16.1

Changelog

Sourced from pytest-rerunfailures's changelog.

16.1 (2025-10-10)

  • Drop support for Python 3.9.

  • Changed "localhost" to "127.0.0.1" to avoid bad hostname resolution.

  • Added --force-reruns to override rerun count globally. Fixes [#306](https://github.com/pytest-dev/pytest-rerunfailures/issues/306).

16.0.1 (2025-09-02)

  • Reverted the ability to access error attributes because of an incompatibility with [pytest-xdist](https://github.com/pytest-dev/pytest-xdist/issues/843). Fixes [#302](https://github.com/pytest-dev/pytest-rerunfailures/issues/302), [#303](https://github.com/pytest-dev/pytest-rerunfailures/issues/303).

16.0 (2025-08-29)

Breaking changes

  • Drop support for pytest < 8.

Features

  • Add support for pytest 8.4.x.

  • Add support for upcoming Python 3.14.

  • Allow @pytest.mark.flaky(condition) to accept a callable or a string to be evaluated. The evaluated string has access to the exception instance via the error object. ([#230](https://github.com/pytest-dev/pytest-rerunfailures/issues/230))

Commits
  • b015092 Preparing release 16.1
  • c1666dd Prepare release.
  • 8d04ad9 Fix NotImplementedError crash when using xdist schedulers without `mark_tes...
  • cb8ede7 Add a --force-reruns to override rerun count globally (#307)
  • 5e01132 Bump actions/setup-python from 5 to 6 in the actions group (#310)
  • 88e0023 Drop support for Python 3.9. (#308)
  • df47974 Change 'localhost' to '127.0.0.1' (#305)
  • f149c7d Back to development: 16.1
  • f97618f Preparing release 16.0.1
  • c60d17d Prepare release.
  • Additional commits viewable in compare view

Updates scikit-learn from 1.7.1 to 1.8.0

Release notes

Sourced from scikit-learn's releases.

Release 1.8.0

We're happy to announce the 1.8.0 release.

You can read the release highlights under https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_8_0.html and the long version of the change log under https://scikit-learn.org/stable/whats_new/v1.8.html

This version supports Python versions 3.11 to 3.14, including free-threaded CPython.

You can upgrade with pip as usual:

pip install -U scikit-learn

The conda-forge builds can be installed using:

conda install -c conda-forge scikit-learn

Scikit-learn 1.7.2

We're happy to announce the 1.7.2 release.

This release contains a few bug fixes and is the first version supporting Python 3.14.

You can see the changelog here: https://scikit-learn.org/stable/whats_new/v1.7.html#version-1-7-2

You can upgrade with pip as usual:

pip install -U scikit-learn

The conda-forge builds can be installed using:

conda install -c conda-forge scikit-learn

Thanks to everyone who contributed to this release!

Commits
  • 646da0f [cd build]
  • 4f4f283 Generate changelog
  • 967dcde Set version
  • cb1424b DOC Release highlights for 1.8 (#32809)
  • 5645b27 🔒 🤖 CI Update lock files for main CI build(s) 🔒 🤖 (#32859)
  • 6b9fb11 🔒 🤖 CI Update lock files for free-threaded CI build(s) 🔒 :rob...
  • a0f6d88 🔒 🤖 CI Update lock files for array-api CI build(s) 🔒 🤖 ...
  • c1de8fc FIX Make get_namespace handle pandas dataframe input (#32838)
  • 764249a Fix _safe_indexing with non integer arrays on array API inputs (#32840)
  • eca5e0a FIX Add new default max_samples=None in Bagging estimators (#32825)
  • Additional commits viewable in compare view

Updates torchao from 0.14.0 to 0.17.0

Release notes

Sourced from torchao's releases.

v0.17.0

Highlights

We are excited to announce the 0.17 release of torchao! This release adds support for CuteDSL MXFP8 MoE kernels, per-head FP8 quantized low-precision attention, ABI stability, and more!

CuteDSL MXFP8 MoE Kernels

We added a new CuteDSL MXFP8 quantization kernel for 3d expert weights that writes scale factors directly to blocked layout for tensorcores: pytorch/ao#4090

  • Used for scaling along dim1 in the backward pass of MoE training with grouped GEMMs.
  • ~12% speedup over the previous two-kernel “quantize, then scale layout transformation” approach!

Per-Head FP8 Quantized Low Precision Attention

We added a new API for per-head fp8 quantized attention with FA3 as the backend (pytorch/ao#3959 and pytorch/ao#3857)

  • Users can either choose to use the elementary blocks as direct replacements for `F.scaled_dot_product_attention` or use the high-level wrapper, which replaces all F.SDPA calls within a module with the low precision attention variant.
  • Running torch.compile on a wrapped module will enable RoPE fusion where appropriate
  • Results show a 1.84x speedup on Wan2.1-T2V-1.3B, 1.23x speedup on LLaMA 3 prefill with high sequence lengths (131k), 1.07x speedup on flux.1-schnell with 2048x2048 image size
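"Per-head" quantization means each attention head gets its own scale, so a head with large activations does not crush the resolution of the others. A pure-Python sketch of absmax scale computation into the FP8 E4M3 range (largest finite value 448), assuming one flat list of values per head; it only scales and clips, does not round to real FP8 values, and is not torchao's kernel or API. The function names are illustrative.

```python
FP8_E4M3_MAX = 448.0  # largest finite float8 e4m3fn value


def per_head_scales(x: list[list[float]]) -> list[float]:
    """x[h] holds all values of head h; return one dequantization scale per head."""
    scales = []
    for head in x:
        amax = max(abs(v) for v in head)
        # Guard all-zero heads so we never divide by zero.
        scales.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
    return scales


def quantize_per_head(x: list[list[float]]) -> tuple[list[list[float]], list[float]]:
    """Scale each head into [-448, 448]; dequantize with q * scale."""
    scales = per_head_scales(x)
    q = [
        [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / s)) for v in head]
        for head, s in zip(x, scales)
    ]
    return q, scales


# Two heads with very different magnitudes.
heads = [[0.01, -0.02, 0.005], [120.0, -300.0, 45.0]]
q, scales = quantize_per_head(heads)
print(scales)                                # one scale per head
print([max(abs(v) for v in h) for h in q])   # each head now spans roughly the full +/-448 range
```

With a single per-tensor scale, the first head's values would be divided by 300 / 448 and land near 0.03, far below the range where FP8 has useful precision; per-head scales avoid that.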

Example Usage of Direct Replacement:

from torchao.prototype.attention.fp8_fa3 import fp8_fa3_sdpa, fp8_fa3_rope_sdpa
out = fp8_fa3_sdpa(q, k, v)

Example Usage of Wrapper:

import torch
from torchao.prototype.attention import (
    AttentionBackend,
    LowPrecisionAttentionConfig,
    apply_low_precision_attention,
)

# Instantiate any nn.Module() (MyModel and inputs are placeholders)
model = MyModel()
# Simple SDPA replacement
config = LowPrecisionAttentionConfig(backend=AttentionBackend.FP8_FA3)
model = apply_low_precision_attention(model, config)
# Flash activation is handled internally by the wrapper
output = model(inputs)
# Torch.compile will enable rope fusion
model = torch.compile(model)

PyTorch ABI stability

... (truncated)

Commits

Updates chardet to 7.4.0.post2

Changelog

Sourced from chardet's changelog.

Changelog

Note: Entries marked "via Claude" were developed with [Claude Code](https://claude.ai/code). Dan directed the design, reviewed all output, and takes responsibility for the result. Unmarked entries by Dan were written without AI assistance.

7.4.0 (2026-03-26)

Performance:

  • Switched to a dense zlib-compressed model format (v2): models are now stored as contiguous memoryview slices of a single decompressed blob, eliminating per-model struct.unpack overhead. Cold start (import + first detect) dropped from ~75ms to ~13ms with mypyc. ([Dan Blanchard](https://github.com/dan-blanchard) via Claude, [#354](https://github.com/chardet/chardet/pull/354))
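The idea behind the v2 format, decompressing one blob once and handing out zero-copy memoryview slices instead of unpacking each model separately, can be sketched with the standard library. The layout here (a name to (offset, length) index over raw bytes) and the helper names are hypothetical stand-ins; chardet's actual file format is not described in the changelog.

```python
import zlib


def pack_models(models: dict[str, bytes]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Concatenate every model into one buffer, record (offset, length), compress once."""
    index: dict[str, tuple[int, int]] = {}
    chunks = []
    offset = 0
    for name, data in models.items():
        index[name] = (offset, len(data))
        chunks.append(data)
        offset += len(data)
    return zlib.compress(b"".join(chunks)), index


def load_models(blob: bytes, index: dict[str, tuple[int, int]]) -> dict[str, memoryview]:
    """Decompress once; each model becomes a zero-copy slice of the same buffer."""
    view = memoryview(zlib.decompress(blob))
    return {name: view[off:off + length] for name, (off, length) in index.items()}


blob, index = pack_models({"utf-8": b"\x01\x02\x03", "cp1252": b"\x10\x20", "koi8-r": b"\xff" * 4})
models = load_models(blob, index)
print(bytes(models["cp1252"]))  # b'\x10 '
```

Slicing a memoryview copies nothing, so the only per-model cost at load time is building the dictionary entry; the single `zlib.decompress` call dominates cold start.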

Accuracy:

  • Accuracy improved from 98.6% to 99.3% (2499/2517 files) through a combination of training and scoring improvements:

    • Eliminated train/test data overlap by content-fingerprinting test suite articles and excluding them from training data ([#351](https://github.com/chardet/chardet/pull/351))
    • Added MADLAD-400 and Wikipedia as supplemental training sources to fill gaps left by exclusion filtering ([#351](https://github.com/chardet/chardet/pull/351))
    • Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training (instead of being crushed by global normalization), and weighted by per-bigram IDF so encoding-specific byte patterns contribute proportionally to how discriminative they are ([#352](https://github.com/chardet/chardet/pull/352))
    • Added encoding-aware substitution filtering: character substitutions during training now only apply for ...

      Description has been truncated

Updates the requirements on [datasets](https://github.com/huggingface/datasets), [setuptools](https://github.com/pypa/setuptools), [setuptools-scm](https://github.com/pypa/setuptools-scm), [pandas](https://github.com/pandas-dev/pandas), [huggingface-hub](https://github.com/huggingface/huggingface_hub), [transformers](https://github.com/huggingface/transformers), [trl](https://github.com/huggingface/trl), data-designer-engine, [pytest](https://github.com/pytest-dev/pytest), [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures), [scikit-learn](https://github.com/scikit-learn/scikit-learn), [torchao](https://github.com/pytorch/ao), [chardet](https://github.com/chardet/chardet), [faker](https://github.com/joke2k/faker), [fsspec](https://github.com/fsspec/filesystem_spec), [python-json-logger](https://github.com/nhairs/python-json-logger), [sqlfluff](https://github.com/sqlfluff/sqlfluff), [data-designer](https://github.com/NVIDIA-NeMo/DataDesigner) and data-designer-config to permit the latest version.

Updates `datasets` to 4.5.0
- [Release notes](https://github.com/huggingface/datasets/releases)
- [Commits](huggingface/datasets@3.4.1...4.5.0)

Updates `setuptools` from 80.9.0 to 82.0.1
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](pypa/setuptools@v80.9.0...v82.0.1)

Updates `setuptools-scm` from 9.2.0 to 9.2.2
- [Release notes](https://github.com/pypa/setuptools-scm/releases)
- [Changelog](https://github.com/pypa/setuptools-scm/blob/v9.2.2/CHANGELOG.md)
- [Commits](pypa/setuptools-scm@v9.2.0...v9.2.2)

Updates `pandas` to 3.0.2
- [Release notes](https://github.com/pandas-dev/pandas/releases)
- [Commits](pandas-dev/pandas@v2.0.0...v3.0.2)

Updates `datasets` from 4.3.0 to 4.8.4
- [Release notes](https://github.com/huggingface/datasets/releases)
- [Commits](huggingface/datasets@3.4.1...4.5.0)

Updates `huggingface-hub` from 0.36.2 to 1.9.0
- [Release notes](https://github.com/huggingface/huggingface_hub/releases)
- [Commits](huggingface/huggingface_hub@v0.36.2...v1.9.0)

Updates `transformers` from 4.57.6 to 5.5.0
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](huggingface/transformers@v4.57.6...v5.5.0)

Updates `trl` from 0.23.1 to 1.0.0
- [Release notes](https://github.com/huggingface/trl/releases)
- [Changelog](https://github.com/huggingface/trl/blob/main/RELEASE.md)
- [Commits](huggingface/trl@v0.23.1...v1.0.0)

Updates `data-designer-engine` from 0.5.4 to 0.5.5

Updates `pandas` from 2.3.3 to 3.0.2
- [Release notes](https://github.com/pandas-dev/pandas/releases)
- [Commits](pandas-dev/pandas@v2.0.0...v3.0.2)

Updates `pytest` to 9.0.2
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@1.0.0b3...9.0.2)

Updates `pytest-rerunfailures` from 15.1 to 16.1
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](pytest-dev/pytest-rerunfailures@15.1...16.1)

Updates `scikit-learn` from 1.7.1 to 1.8.0
- [Release notes](https://github.com/scikit-learn/scikit-learn/releases)
- [Commits](scikit-learn/scikit-learn@1.7.1...1.8.0)

Updates `torchao` from 0.14.0 to 0.17.0
- [Release notes](https://github.com/pytorch/ao/releases)
- [Commits](https://github.com/pytorch/ao/commits/v0.17.0)

Updates `chardet` to 7.4.0.post2
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@3.0.2...7.4.0.post2)

Updates `faker` to 40.12.0
- [Release notes](https://github.com/joke2k/faker/releases)
- [Changelog](https://github.com/joke2k/faker/blob/master/CHANGELOG.md)
- [Commits](joke2k/faker@v20.1.0...v40.12.0)

Updates `fsspec` to 2026.3.0
- [Commits](fsspec/filesystem_spec@2025.3.0...2026.3.0)

Updates `python-json-logger` to 4.1.0
- [Release notes](https://github.com/nhairs/python-json-logger/releases)
- [Changelog](https://github.com/nhairs/python-json-logger/blob/main/docs/changelog.md)
- [Commits](nhairs/python-json-logger@v3.0.0...v4.1.0)

Updates `sqlfluff` to 4.1.0
- [Release notes](https://github.com/sqlfluff/sqlfluff/releases)
- [Changelog](https://github.com/sqlfluff/sqlfluff/blob/main/CHANGELOG.md)
- [Commits](sqlfluff/sqlfluff@3.2.0...4.1.0)

Updates `data-designer` from 0.5.4 to 0.5.5
- [Release notes](https://github.com/NVIDIA-NeMo/DataDesigner/releases)
- [Commits](NVIDIA-NeMo/DataDesigner@v0.5.4...v0.5.5)

Updates `data-designer-config` from 0.5.4 to 0.5.5

---
updated-dependencies:
- dependency-name: datasets
  dependency-version: 4.5.0
  dependency-type: direct:development
  dependency-group: pip
- dependency-name: setuptools
  dependency-version: 82.0.1
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: pip
- dependency-name: setuptools-scm
  dependency-version: 9.2.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: pip
- dependency-name: pandas
  dependency-version: 3.0.2
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: datasets
  dependency-version: 4.8.4
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: pip
- dependency-name: huggingface-hub
  dependency-version: 1.9.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: pip
- dependency-name: transformers
  dependency-version: 5.5.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: pip
- dependency-name: trl
  dependency-version: 1.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: pip
- dependency-name: data-designer-engine
  dependency-version: 0.5.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: pip
- dependency-name: pandas
  dependency-version: 3.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: pip
- dependency-name: pytest
  dependency-version: 9.0.2
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: pytest-rerunfailures
  dependency-version: '16.1'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: pip
- dependency-name: scikit-learn
  dependency-version: 1.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: pip
- dependency-name: torchao
  dependency-version: 0.17.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: pip
- dependency-name: chardet
  dependency-version: 7.4.0.post2
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: faker
  dependency-version: 40.12.0
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: fsspec
  dependency-version: 2026.3.0
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: python-json-logger
  dependency-version: 4.1.0
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: sqlfluff
  dependency-version: 4.1.0
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: data-designer
  dependency-version: 0.5.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: pip
- dependency-name: data-designer-config
  dependency-version: 0.5.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot Bot added the dependencies and python labels Apr 6, 2026
dependabot Bot commented on behalf of github Apr 6, 2026

Superseded by #21.

@dependabot dependabot Bot closed this Apr 6, 2026
@dependabot dependabot Bot deleted the dependabot/pip/pip-d357939a1d branch April 6, 2026 18:11
danielhanchen added a commit that referenced this pull request May 25, 2026
… MPS + base namespace for PR unslothai#5754

Round 12 reviewer findings.

Backend correctness (P1)
  * core/inference/diffusion.py load_model: GGUF branch now
    handles an absolute local directory passed as repo_id by
    joining Path(repo_id) / gguf_filename directly instead of
    handing the path to hf_hub_download (which raises
    HFValidationError because the path is not 'namespace/repo').
    Closes round 12 review #1 -- the load request advertised
    'local path' support but actually only worked for Hub repo ids.
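A minimal sketch of the local-directory branch described above. It assumes a hypothetical `resolve_gguf_file` helper and is not the actual `load_model` code. An absolute directory is joined with the filename on disk, and anything else is treated as a Hub repo id and passed to `huggingface_hub.hf_hub_download`. The `download` parameter is an assumption that exists only so the sketch can run without network access.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_gguf_file(
    repo_id: str,
    gguf_filename: str,
    token: Optional[str] = None,
    download: Optional[Callable[..., str]] = None,
) -> str:
    """Return a local path for ``gguf_filename`` (hypothetical helper)."""
    local_dir = Path(repo_id).expanduser()
    if local_dir.is_absolute() and local_dir.is_dir():
        # Local export directory: join directly instead of calling the Hub,
        # which would reject the path as an invalid 'namespace/repo' id.
        candidate = local_dir / gguf_filename
        if not candidate.is_file():
            # Report only the leaf so the error does not leak the full layout.
            raise FileNotFoundError(f"{gguf_filename} not found in {local_dir.name}")
        return str(candidate)
    if download is None:
        from huggingface_hub import hf_hub_download as download
    return download(repo_id=repo_id, filename=gguf_filename, token=token)
```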

Delete guard precision (P1)
  * routes/models.py /delete-finetuned + /delete-cached:
    diffusion guard now consults gguf_filename from status()
    and ALLOWS per-variant deletes that target a different quant
    than the one the loaded pipeline is reading. Loading
    'Q4_K_S' no longer blocks deleting 'Q8_0' from the same
    repo / export directory (round 12 reviews #3 and #4).
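The per-variant rule can be sketched as a pure predicate. The helper name is hypothetical, and matching a variant such as `Q4_K_S` against the loaded GGUF filename by its stem suffix is an assumption about how variants map to files, not something this commit specifies.

```python
from pathlib import PurePath
from typing import Optional


def delete_conflicts_with_loaded(
    loaded_repo: Optional[str],
    loaded_gguf: Optional[str],
    target_repo: str,
    target_variant: Optional[str],
) -> bool:
    """True if deleting (target_repo, target_variant) would remove files the
    loaded diffusion pipeline is reading (hypothetical helper)."""
    if not loaded_repo or loaded_repo != target_repo:
        return False  # nothing from this repo / export dir is loaded
    if loaded_gguf is None or target_variant is None:
        # Full diffusers pipeline, or a whole-repo delete: always a conflict.
        return True
    # Per-variant delete: only the quant currently in use is protected.
    # Assumed naming: the variant is the last separator-delimited part of the stem.
    stem = PurePath(loaded_gguf).stem.lower()
    variant = target_variant.lower()
    return stem == variant or any(stem.endswith(sep + variant) for sep in "-._")
```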

Accelerator (P2)
  * core/inference/diffusion.py _drain_cuda_cache: also calls
    torch.mps.empty_cache() when the MPS backend is the
    active accelerator. Apple Silicon swaps now actually return
    held VRAM instead of leaving it pinned in the Metal
    allocator (round 12 review #10).
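A sketch of draining whichever allocator is active, assuming PyTorch 2.x, where `torch.mps.empty_cache()` exists alongside `torch.cuda.empty_cache()`. The function name is hypothetical. The optional `torch_mod` argument is an assumption added so the logic can be exercised with a stand-in object.

```python
import gc


def drain_accelerator_caches(torch_mod=None) -> list:
    """Release cached allocator blocks on CUDA and/or MPS (hypothetical helper).

    Returns the names of the caches that were drained.
    """
    if torch_mod is None:
        import torch as torch_mod  # real PyTorch when no stand-in is passed
    # Drop unreferenced tensors first so their blocks are actually reclaimable.
    gc.collect()
    drained = []
    if torch_mod.cuda.is_available():
        torch_mod.cuda.empty_cache()
        drained.append("cuda")
    mps_backend = getattr(torch_mod.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        # Apple Silicon: hand cached Metal allocations back to the system.
        torch_mod.mps.empty_cache()
        drained.append("mps")
    return drained
```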

Smart base repo (P2)
  * core/inference/diffusion.py _smart_base_repo: only inspects
    the LAST segment of the repo id / path for the 'base' / '9b'
    tokens. A namespace like baseorg/FLUX.2-klein-4B-GGUF or
    a parent directory like /home/me/.cache/base/... no
    longer falsely selects the Base variant (round 12 review #9).
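A minimal sketch of "only inspect the last segment". The helper name is hypothetical, and splitting the leaf into alphanumeric tokens is an assumption rather than the exact `_smart_base_repo` logic. The caller would then test for `"base"` or `"9b"` in the returned set.

```python
import re


def leaf_tokens(repo_or_path: str) -> set:
    """Lower-cased alphanumeric tokens of the last repo-id / path segment."""
    # Normalise Windows separators and trailing slashes, then keep the leaf only,
    # so namespaces ("baseorg/...") and parent dirs (".../base/...") are ignored.
    leaf = repo_or_path.replace("\\", "/").rstrip("/").rsplit("/", 1)[-1]
    return {tok for tok in re.split(r"[^a-z0-9]+", leaf.lower()) if tok}
```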
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_release_llama_for()`` now verifies ``llama.unload_model``
did not return False AND that ``is_loaded`` / ``is_active`` /
``loading_model_identifier`` are all cleared after the call. The
previous version only treated raised exceptions as failure, so a
subprocess refusing to terminate or an in-flight GGUF download
let the next workload allocate on top.
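The stricter check can be sketched as follows. The attribute names come from the paragraph above, but the helper itself is hypothetical and raises `RuntimeError` rather than an HTTP error. Only an explicit `False` counts as a refused unload, so a `None` return is still accepted if the post-state is clean.

```python
def verify_llama_released(llama) -> None:
    """Unload, then prove the GGUF backend no longer owns or fetches weights."""
    if llama.unload_model() is False:
        raise RuntimeError("GGUF backend refused to unload")
    # An in-flight download shows up only as loading_model_identifier.
    leftovers = [
        name
        for name in ("is_loaded", "is_active", "loading_model_identifier")
        if getattr(llama, name, None)
    ]
    if leftovers:
        raise RuntimeError(
            "GGUF backend still active or loading after unload: " + ", ".join(leftovers)
        )
```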

P1 #2: ``DiffusionBackend._release_other_gpu_owners_for_diffusion``
now raises RuntimeError when ``exp._shutdown_subprocess`` fails on
a settled checkpoint. Direct backend callers used to log at debug
level and proceed toward diffusion allocation while the export
checkpoint still owned VRAM.

P1 #3 + P1 #7: ``/images/load`` no longer drops chat + idle export
before the cheap backend validation runs. ``DiffusionBackend.load_model``
already calls the strict ``_release_other_gpu_owners_for_diffusion``
and ``_release_chat_backend_for_diffusion`` helpers AFTER family
inference and GGUF filename checks pass, so the GPU is still
freed before allocation and a malformed payload no longer
silently unloads the user's chat / chat-export pair.

P1 #4: ``_release_chat_backend_for_diffusion`` now also rejects a
post-unload state where ``loading_model_identifier`` is still set,
matching the route-level ``_release_llama_for`` strictness. A GGUF
download mid-flight before the diffusion handoff used to slip
through and end up double-owning VRAM after diffusion allocated.

P1 #5: ``_release_diffusion_for`` no longer swallows a post-unload
``status()`` failure as ``after = {}``. Training / chat / export
handoffs need proof that the diffusion pipeline released VRAM;
the helper now raises HTTP 503 when the verification status call
itself raises, so the caller retries.

P1 #6: ``DiffusionBackend._release_other_gpu_owners_for_diffusion``
raises RuntimeError when ``get_export_backend()`` itself raises.
Direct backend callers used to silently ``return`` here and
proceed to GPU allocation without being able to verify export
ownership.

P1 #8: ``/training/start`` releases settled export BEFORE chat,
matching the chat-load helpers. If idle export shutdown fails, the
user's chat model is preserved instead of being dropped for a
training run that never starts.

P2 #9: GGUF load-error scrubber also collapses ``local_gguf_path``,
the resolved HF cache path passed to
``transformer_cls.from_single_file()``. Without this, an exception
like ``OSError: cannot load /home/alice/.cache/huggingface/.../flux.gguf``
would leak the operator's filesystem layout through ``last_error``
and ``/images/status``.

All 85 diffusion-relevant backend tests pass locally.
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_release_safetensors_chat_for`` now re-reads
``active_model_name`` and ``loading_models`` after each unload AND
runs a final sweep against the initial owned-name set. The previous
helper trusted ``unload_model() -> True`` even though the
orchestrator can respond ``unloaded`` while still holding weights
or a concurrent ``load`` can repopulate the tracker between calls.
Per-name and global post-state mismatches now raise HTTP 503 so
the caller retries.

P1 #2: same post-state guarantee inside
``_release_chat_backend_for_diffusion`` for direct backend
callers. ``DiffusionBackend.load_model`` now raises RuntimeError
when the safetensors tracker still owns a previously-resident
name after the unload, matching the route-level helper. The route
layer's existing classifier maps the new wording to HTTP 503.

P1 #3: ``DiffusionBackend.load_model`` now preflights the full
diffusers repo (or explicit GGUF ``base_repo``) via
``hf_hub_download(filename="model_index.json")`` BEFORE the
chat / export unload runs. The GGUF path was already covered by
the existing ``hf_hub_download(gguf_filename)`` round-trip; the
full-repo path used to skip validation and let a typo / private /
gated repo only surface inside ``from_pretrained`` AFTER the
user's chat model was already dropped. Local paths are checked
structurally (must be a directory containing ``model_index.json``)
so an on-disk miss does not cost a network round trip. Error
messages route through ``_display_repo_id`` so an absolute
filesystem path does not leak the operator's layout.
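A sketch of the preflight under stated assumptions. The helper name is hypothetical, and `download` is an injectable stand-in for `huggingface_hub.hf_hub_download`. Local paths get a purely structural check, Hub ids fetch only `model_index.json`, and both happen before anything is unloaded.

```python
import os
from typing import Callable, Optional


def preflight_diffusers_repo(
    repo_id: str,
    token: Optional[str] = None,
    download: Optional[Callable[..., str]] = None,
) -> str:
    """Fail fast if ``repo_id`` is not a loadable diffusers repo (hypothetical)."""
    if os.path.isabs(repo_id) or os.path.isdir(repo_id):
        # Local path: structural check only, no network round trip.
        index = os.path.join(repo_id, "model_index.json")
        if not (os.path.isdir(repo_id) and os.path.isfile(index)):
            leaf = os.path.basename(os.path.normpath(repo_id))  # never the full path
            raise RuntimeError(f"{leaf} is not a diffusers directory (missing model_index.json)")
        return index
    if download is None:
        from huggingface_hub import hf_hub_download as download
    # Typos, private repos and gated repos raise here, before any unload.
    return download(repo_id=repo_id, filename="model_index.json", token=token)
```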

P1 #6: ``/api/inference/unload`` (the direct chat unload endpoint)
now treats ``unload_model() -> False`` AND a leftover state
(``is_loaded`` / ``is_active`` / ``loading_model_identifier`` for
GGUF, ``active_model_name`` / ``loading_models`` for safetensors)
as 503 instead of unconditionally responding
``status="unloaded"``. The UI used to show the model as gone while
the backend still owned VRAM.

P2 #7: extended the /images/load RuntimeError -> HTTPException
marker list with ``still active or loading after unload`` and
``still loading after unload``. Round 18 introduced these exact
phrasings on the backend side; without the extension a retryable
unload failure was returning HTTP 400 to the user instead of 503.
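The marker-based mapping can be sketched as a substring classifier. Only the two phrasings named above are included; the real marker list is longer, and the function name is hypothetical.

```python
# The two phrasings added in this round; the actual list contains more markers.
RETRYABLE_MARKERS = (
    "still active or loading after unload",
    "still loading after unload",
)


def http_status_for_load_error(exc: RuntimeError) -> int:
    """503 (retry later) for unload/handoff races, 400 for bad requests."""
    message = str(exc)
    if any(marker in message for marker in RETRYABLE_MARKERS):
        return 503
    return 400
```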

P2 #8: removed the unused ``unsloth_backend = get_inference_backend()``
eager construction in the GGUF chat-load branch. Eager
construction made the GGUF-only path needlessly fail or pay
startup cost when the safetensors backend was unavailable / lazy;
``_release_safetensors_chat_for`` already handles that case as a
no-op.

All 85 diffusion-relevant + 98 related backend tests pass locally.
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_preflight_full_diffusers_repo(effective_base, hf_token)``
now runs for every load mode, including the GGUF-with-auto-base
path. Round 19 only preflighted the full repo or an explicit
``base_repo``, so an auto-picked companion that turned out to be
gated / private / missing still unloaded the user's chat model
before ``from_pretrained`` failed. ``effective_base`` is the same
value that feeds every downstream allocation, so preflighting it
unconditionally catches all three modes.

P1 #2: ``diffusers.GGUFQuantizationConfig`` (which imports the
``gguf`` package at construction time) is now built up front,
inside the same try block that surfaces "Re-run Studio setup".
Previously the missing-dependency exception fired AFTER
``_release_other_gpu_owners_for_diffusion`` and
``_release_chat_backend_for_diffusion`` had already taken the
chat / export models down. The downstream from_single_file call
reuses the same ``quant_config`` reference.

P1 #4: ``studio/backend/requirements/studio.txt`` now lists
``diffusers>=0.37.0`` and ``gguf>=0.10.0``. These were only in
the extras files, so fresh standard Studio installs failed on
/images/load with the round 20 P1 #2 dependency error message.

P1 #5: ``LoadRequest``, ``UnloadRequest``, and
``ValidateModelRequest`` now apply the same control-character +
embedded-HF-token validators that ``DiffusionLoadRequest``
already had. /api/inference/load, /api/inference/validate, and
/api/inference/unload used to accept newline / tab / control
characters in ``model_path`` (log-line smuggling) and URL-form
``https://hf_xxxxx@huggingface.co/...`` (credential leak through
structured log sinks).
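A plain-Python sketch of the two checks. With Pydantic v2 they would typically be attached to fields through `field_validator`. The token regex, including its minimum length of 20, is an assumption: a looser pattern would also catch short placeholders such as `hf_xxxxxxxx`, at the cost of false positives on model names like `...-hf_finetuned`.

```python
import re
import unicodedata
from typing import Optional

# Assumption: HF access tokens look like "hf_" plus a long alphanumeric tail.
_HF_TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]{20,}")


def no_control_chars(value: str) -> str:
    # Unicode category "Cc" covers \n, \r, \t, NUL, DEL and other C0/C1 controls.
    if any(unicodedata.category(ch) == "Cc" for ch in value):
        raise ValueError("must not contain control characters")
    return value


def reject_embedded_hf_token(value: str) -> str:
    if _HF_TOKEN_RE.search(value):
        raise ValueError("must not embed a Hugging Face token")
    return value


def check_identifier(value: Optional[str]) -> Optional[str]:
    """Hypothetical combined check for identifiers that reach log lines."""
    if value is None:
        return None
    return reject_embedded_hf_token(no_control_chars(value))
```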

P2 #6: ``_collapse_local`` in the diffusion load-error scrubber
now resolves relative candidates and adds the absolute form to
the substring set. A relative ``exports/my-flux`` used to leak
``/mnt/disks/.../exports/my-flux/...`` via downstream library
errors because the scrubber only matched the original literal.
Replacement is longest-first so a leaf-only context survives.
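A sketch of the scrubbing idea, using a hypothetical `collapse_local_paths` helper rather than the actual `_collapse_local`. Each candidate is registered both as given and in its resolved absolute form, and replacements run longest-first so a full absolute path collapses to its leaf in one step.

```python
import os


def collapse_local_paths(message: str, candidates: list) -> str:
    """Replace local paths in ``message`` with their leaf names (hypothetical)."""
    replacements = {}
    for cand in candidates:
        if not cand:
            continue
        leaf = os.path.basename(os.path.normpath(cand))
        # Match the literal the user supplied and its resolved absolute form,
        # which is what downstream libraries usually print.
        for form in {cand, os.path.abspath(cand)}:
            replacements[form] = leaf
    # Longest first: collapse the absolute path as a whole before a shorter
    # relative literal contained in it can be replaced.
    for form in sorted(replacements, key=len, reverse=True):
        message = message.replace(form, replacements[form])
    return message
```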

All 85 diffusion-relevant + 35 related model-validation tests
pass locally.

(P1 #3 cross-workload GPU handoff lock is deferred: deserves a
focused design pass across /images/load, /chat/load (both
branches), /training/start, and /export/load to pick a lock
boundary that does not deadlock against the backend load locks
or stall the SSE log stream.)
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1 + #2: ``LoadRequest._no_embedded_hf_tokens`` and
``ValidateModelRequest._no_embedded_hf_tokens`` now cover
``gguf_variant`` in addition to ``model_path``. A caller could
pass a variant like ``Q4_K_M-hf_xxxxxxxx`` that flowed into
structured log sinks via the GGUF resolver path; the matching
``DiffusionLoadRequest`` validator already covered every string
field, so this restores parity.

P1 #3: ``/api/inference/unload`` now also matches the llama
``loading_model_identifier`` when picking the GGUF branch. A
pending GGUF download (``is_active`` still False,
``loading_model_identifier`` populated) used to fall through to
the safetensors branch and respond ``status="unloaded"`` while
llama-server kept downloading.

P1 #4 + #5: the final safetensors-handoff sweeps (route-level
``_release_safetensors_chat_for`` and backend
``_release_chat_backend_for_diffusion``) now check ``active_model_name``
and ``loading_models`` WITHOUT the initial ``owned_names`` filter.
A concurrent ``/load`` that landed AFTER the snapshot was
previously ignored, so a chat model that began loading during the
unload window let training / export / GGUF chat / diffusion start
anyway and race the new chat for VRAM.

P2 #6: added ``_preflight_diffusers_subfolder_config`` and
invoked it for GGUF loads with a transformer class
(``effective_base``, ``"transformer"``). A custom base companion
that had ``model_index.json`` but lacked
``transformer/config.json`` previously passed the round 19
preflight, unloaded chat, then failed inside
``from_single_file``.

P2 #7: ``_scrub_validation_obj`` in main.py also scrubs string
dict KEYS. Pydantic ``string_type`` errors surface ``input``
verbatim, and a malformed payload like
``{"repo_id": {"hf_xxxxx": "owner/repo"}}`` would otherwise leak
the token through the 422 response body.
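A sketch of recursive scrubbing that also rewrites dict keys, under the same assumed token regex as elsewhere. The function name is hypothetical and stands in for the `main.py` helper.

```python
import re
from typing import Any

# Assumed token shape: "hf_" followed by a long alphanumeric tail.
_HF_TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]{20,}")


def scrub_for_response(obj: Any) -> Any:
    """Redact HF tokens in strings, list items, dict values AND dict keys."""
    if isinstance(obj, str):
        return _HF_TOKEN_RE.sub("hf_***", obj)
    if isinstance(obj, dict):
        # Keys are scrubbed too: Pydantic echoes malformed ``input`` verbatim.
        return {scrub_for_response(k): scrub_for_response(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [scrub_for_response(v) for v in obj]
    if isinstance(obj, tuple):
        return tuple(scrub_for_response(v) for v in obj)
    return obj
```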

All 85 diffusion-relevant + 35 model-validation tests pass
locally. Existing fakes for ``hf_hub_download`` updated to
accept the new ``subfolder=`` kwarg the round 21 preflight uses.

(P1 #3 cross-workload GPU handoff lock from round 20 is still
deferred; round 21's P1 #4 / #5 raised the sweep-level guarantee,
which closes the most common race without the deadlock risk of
holding a process-wide lock across the entire load.)
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``TrainingStartRequest.model_name`` now runs the same
control-character and embedded-HF-token validators that the chat
and diffusion request models gained in rounds 5 / 15 / 20 / 21.
``/api/training/start`` previously accepted newline / tab /
control characters and URL-form ``hf_xxxxx`` tokens that flowed
into structured-log sinks via "Loading model %s" lines.

P1 #2: ``_run_with_helper`` in ``utils/datasets/llm_assist.py``
now skips the helper GGUF when the diffusion image backend
reports loaded / loading. The public chat / training / export
routes already do this through ``_release_diffusion_for``, but
this dataset-side helper loaded llama-server directly with no
diffusion guard, so an Images-page allocation would race the
helper for VRAM. New ``_diffusion_image_model_busy`` helper
fails closed (treats status() failure as busy) so the resident
image model is preserved instead of being overwritten.

P1 #3: same ``_diffusion_image_model_busy`` guard added to
``_run_multi_pass_advisor`` (the dataset conversion advisor),
which has the same direct llama.cpp load shape.

P2 #4: the early "Could not infer a diffusion family" RuntimeError
now routes ``repo_id`` through ``_display_repo_id`` before
formatting. A local absolute path that did not match any known
family used to leak the operator's filesystem layout via the 400
response body, last_error, and log line.

All 97 diffusion + training-validation + related tests pass
locally.
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1 + #2 + #6: extended the chat / diffusion / training
identifier hardening to every export-side request model.
ExportCommonOptions (parent of ExportMergedModelRequest /
ExportBaseModelRequest / ExportLoRAAdapterRequest) now applies
_no_control_chars and _reject_embedded_hf_token to repo_id and
base_model_id; ExportGGUFRequest gets the same on its repo_id
plus a control-char check on quantization_method; and
LoadCheckpointRequest validates checkpoint_path. Previously
"/api/export/*" accepted newline-smuggled identifiers and
URL-form ``hf_xxxxx`` tokens that flowed into log lines.

P1 #3 + #4: ``_run_with_helper`` and ``_run_multi_pass_advisor``
now use a shared ``_gpu_workload_busy_for_helper`` that gates on
diffusion (round 22 already), training, AND export. The round 22
guard only checked diffusion, so the dataset helper / advisor
could still load llama-server on top of an active training run
or a resident export checkpoint. Each step fails closed
(unverifiable status counts as busy) so the user's primary
workload is preserved.
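The shared gate can be sketched as a list of named probes that fail closed. The probe names and the `first_busy_workload` helper are hypothetical, and each probe stands in for a backend status call.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_busy_workload(
    probes: Iterable[Tuple[str, Callable[[], bool]]],
) -> Optional[str]:
    """Name of the first GPU workload that is, or might be, busy (hypothetical)."""
    for name, is_busy in probes:
        try:
            if is_busy():
                return name
        except Exception:
            # Fail closed: an unreadable status counts as busy, so the helper
            # does not load llama-server on top of the primary workload.
            return name
    return None
```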

P1 #5: PublishDatasetRequest in models/data_recipe.py also
applies the identifier hardening to repo_id; the publish path
previously accepted control characters and URL-form tokens.

P1 #7-10: added _validate_logged_identifier helper to
routes/models.py and applied it to the path / query parameter
endpoints that flow into logger.info(...) calls --
``/config/{model_name}``, ``/check-vision/{model_name}``,
``/check-embedding/{model_name}``, ``/gguf-variants``. Mapped
the validator's ValueError to HTTP 422 so the client sees the
same shape as a Pydantic validation failure.

P2 #11 + #12: ``Loading diffusion model %s`` and
``Diffusion load failed for %s`` log lines route ``repo_id`` /
``effective_base`` through ``_display_repo_id`` (collapses
absolute local paths to the leaf, still scrubs HF tokens)
instead of plain ``_redact_hf_tokens``. The error path was
already collapsed in the user-facing 400 / RuntimeError, but
the structured-log lines kept the full path.

All 97 diffusion + training-validation + related tests pass
locally.
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_gpu_workload_busy_for_helper`` in
``utils/datasets/llm_assist.py`` now also gates on the GGUF chat
backend (llama-server) AND the safetensors chat backend. Round 23
extended it to training + export but missed Chat, so a helper /
advisor GGUF could still race a loaded chat model for VRAM.
Both checks fail closed when status is unverifiable.

P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff
unloads so the diffusion release runs BEFORE the chat releases.
A wedged diffusion unload used to fire AFTER chat was already
gone, so the user lost both on a single failure. Drop chat last
so an earlier failure preserves it. Applied to
``/training/start`` (training.py), ``/export/load`` (export.py),
``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch
(routes/inference.py).
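A sketch of the ordering rule as an ordered list of release steps that stops at the first failure. `HandoffError` and `run_gpu_handoff` are hypothetical names. The point is that the chat release sits last, so a wedged diffusion unload leaves the chat model resident.

```python
from typing import Callable, Sequence, Tuple


class HandoffError(RuntimeError):
    """Raised when one release step fails; later owners were left untouched."""

    def __init__(self, step: str, released: list) -> None:
        super().__init__(f"GPU handoff stopped at {step!r}; released so far: {released}")
        self.step = step
        self.released = released


def run_gpu_handoff(steps: Sequence[Tuple[str, Callable[[], None]]]) -> list:
    """Run release steps in order, stopping at the first failure (hypothetical)."""
    released = []
    for name, release in steps:
        try:
            release()
        except Exception as exc:
            raise HandoffError(name, list(released)) from exc
        released.append(name)
    return released
```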

P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens
``model_path`` and ``gguf_variant`` via the shared
``_validate_logged_identifier`` helper, so control characters
and URL-form HF tokens can no longer log-line-smuggle.

P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and
``variant`` the same way.

P1 #9: ``/download-progress`` ``repo_id`` query parameter is
also hardened; the value flows into log lines deep inside
``_get_repo_size_cached`` on lookup failure.

P1 #11: ``CheckFormatRequest.dataset_name`` and
``AiAssistMappingRequest.{dataset_name, model_name}`` in
``models/datasets.py`` now apply the same control-char +
embedded-HF-token validators, matching every other public
request-body model.

All 115 diffusion + training-validation + cached_gguf + export
+ inference model-validation tests pass locally.

(P1 #6 native-path-lease enforcement for diffusion local paths
and P1 #12 React Compiler frontend lint deferred -- both need
focused design / frontend touchups separate from this batch.)
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1 + #2 + #6: extended the chat / diffusion / training
identifier hardening to every export-side request model.
ExportCommonOptions (parent of ExportMergedModelRequest /
ExportBaseModelRequest / ExportLoRAAdapterRequest) now applies
_no_control_chars and _reject_embedded_hf_token to repo_id and
base_model_id; ExportGGUFRequest gets the same on its repo_id
plus a control-char check on quantization_method; and
LoadCheckpointRequest validates checkpoint_path. Previously
"/api/export/*" accepted newline-smuggled identifiers and
URL-form ``hf_xxxxx`` tokens that flowed into log lines.

P1 #3 + #4: ``_run_with_helper`` and ``_run_multi_pass_advisor``
now use a shared ``_gpu_workload_busy_for_helper`` that gates on
diffusion (round 22 already), training, AND export. The round 22
guard only checked diffusion, so the dataset helper / advisor
could still load llama-server on top of an active training run
or a resident export checkpoint. Each step fails closed
(unverifiable status counts as busy) so the user's primary
workload is preserved.
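
A fail-closed busy check over several workloads might look like the sketch below. Passing status probes in as callables is an assumption made to keep the example self-contained; ``gpu_workload_busy_for_helper`` here is not the project's signature:

```python
from typing import Callable, Mapping, Optional, Tuple


def gpu_workload_busy_for_helper(
    probes: Mapping[str, Callable[[], bool]],
) -> Tuple[bool, Optional[str]]:
    """Return (busy, reason). Each probe reports whether its workload holds VRAM.

    Fails closed: a probe that raises counts as busy, so an unverifiable
    status never lets the helper evict the user's primary workload.
    """
    for name, probe in probes.items():
        try:
            if probe():
                return True, f"{name} is active"
        except Exception:
            return True, f"{name} status unavailable"
    return False, None
```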

P1 #5: PublishDatasetRequest in models/data_recipe.py also
applies the identifier hardening to repo_id; the publish path
previously accepted control characters and URL-form tokens.

P1 #7-10: added _validate_logged_identifier helper to
routes/models.py and applied it to the path / query parameter
endpoints that flow into logger.info(...) calls --
``/config/{model_name}``, ``/check-vision/{model_name}``,
``/check-embedding/{model_name}``, ``/gguf-variants``. Mapped
the validator's ValueError to HTTP 422 so the client sees the
same shape as a Pydantic validation failure.
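
A sketch of mapping the validator's ``ValueError`` to a 422. ``HTTPError`` is a hypothetical stand-in for ``fastapi.HTTPException`` (which carries ``status_code`` and ``detail``), and the checks reuse an assumed token pattern:

```python
import re
import unicodedata


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException (status_code + detail)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


_HF_TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]{5,}")  # assumed token shape


def validate_logged_identifier(value: str, field: str) -> str:
    """Validate a path/query identifier before it reaches logger.info(...)."""
    try:
        if any(unicodedata.category(ch) == "Cc" for ch in value):
            raise ValueError("contains control characters")
        if _HF_TOKEN_RE.search(value):
            raise ValueError("embeds a Hugging Face token")
    except ValueError as exc:
        # 422 matches what clients already see for Pydantic body failures.
        raise HTTPError(422, f"{field}: {exc}") from None
    return value
```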

P2 #11 + #12: ``Loading diffusion model %s`` and
``Diffusion load failed for %s`` log lines route ``repo_id`` /
``effective_base`` through ``_display_repo_id`` (collapses
absolute local paths to the leaf, still scrubs HF tokens)
instead of plain ``_redact_hf_tokens``. The error path was
already collapsed in the user-facing 400 / RuntimeError, but
the structured-log lines kept the full path.
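
One way to collapse absolute paths to their leaf while still redacting tokens is sketched below. ``display_repo_id`` is a hypothetical reimplementation of the described behaviour, and the token regex is an assumption:

```python
import ntpath
import posixpath
import re

_HF_TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]{5,}")  # assumed token shape


def display_repo_id(repo_id: str) -> str:
    """Return a log-safe form of a repo id or local path.

    Absolute POSIX/Windows paths and ~-paths collapse to their final
    component so logs do not reveal the operator's directory layout;
    HF tokens are redacted either way.
    """
    if posixpath.isabs(repo_id) or ntpath.isabs(repo_id) or repo_id.startswith("~"):
        repo_id = re.split(r"[\\/]", repo_id.rstrip("\\/"))[-1] or repo_id
    return _HF_TOKEN_RE.sub("hf_***", repo_id)
```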

All 97 diffusion + training-validation + related tests pass
locally.
danielhanchen added a commit that referenced this pull request May 25, 2026
P1 #1: ``_gpu_workload_busy_for_helper`` in
``utils/datasets/llm_assist.py`` now also gates on the GGUF chat
backend (llama-server) AND the safetensors chat backend. Round 23
extended it to training + export but missed Chat, so a helper /
advisor GGUF could still race a loaded chat model for VRAM.
Both checks fail closed when status is unverifiable.

P1 #2 / #3 / #4 / #5: re-ordered the route-level GPU-handoff
unloads so the diffusion release runs BEFORE the chat releases.
A wedged diffusion unload used to fire AFTER chat was already
gone, so the user lost both on a single failure. Drop chat last
so an earlier failure preserves it. Applied to
``/training/start`` (training.py), ``/export/load`` (export.py),
``/chat/load`` GGUF branch and ``/chat/load`` safetensors branch
(routes/inference.py).
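
The ordering argument reduces to "run releases in sequence and stop at the first failure". A minimal sketch, with ``run_handoff`` as a hypothetical helper:

```python
from typing import Callable, List, Sequence, Tuple


def run_handoff(releases: Sequence[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run GPU-owner releases in order, stopping at the first failure.

    Callers list the diffusion release FIRST and the chat release LAST,
    so a wedged diffusion unload raises before chat has been touched.
    """
    released = []
    for name, release in releases:
        release()  # an exception propagates; later releases never run
        released.append(name)
    return released
```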

P1 #7 + P2 #13: ``/delete-finetuned`` body now hardens
``model_path`` and ``gguf_variant`` via the shared
``_validate_logged_identifier`` helper, so control characters
and URL-form HF tokens can no longer log-line-smuggle.

P1 #8 + #10: ``/delete-cached`` body hardens ``repo_id`` and
``variant`` the same way.

P1 #9: ``/download-progress`` ``repo_id`` query parameter is
also hardened; the value flows into log lines deep inside
``_get_repo_size_cached`` on lookup failure.

P1 #11: ``CheckFormatRequest.dataset_name`` and
``AiAssistMappingRequest.{dataset_name, model_name}`` in
``models/datasets.py`` now apply the same control-char +
embedded-HF-token validators, matching every other public
request-body model.

All 115 diffusion + training-validation + cached_gguf + export
+ inference model-validation tests pass locally.

(P1 #6 native-path-lease enforcement for diffusion local paths
and P1 #12 React Compiler frontend lint deferred -- both need
focused design / frontend touchups separate from this batch.)
danielhanchen added a commit that referenced this pull request May 25, 2026
Twelve P1 findings from the round 26 reviewer aggregate, plus a
CI-driven revert that moves round 25 P1 #5 to a less invasive location.

1. requirements/studio.txt + requirements/single-env/constraints.txt:
   revert the round 25 huggingface-hub bump (broke Studio Update CI,
   Mac Studio Update CI, Mac Studio UI CI, Studio UI CI all with
   ResolutionImpossible against transformers==4.57.6 which requires
   hub<1.0). Standard install path stays on the well-tested 4.57.6 +
   0.36.2 + trl 0.23.1 trio.

2. requirements/no-torch-runtime.txt + pyproject.toml
   [huggingfacenotorch]: bump huggingface_hub floor from >=0.34.0 to
   >=1.3.0,<2.0 -- this is where the actual transformers 5.x +
   hub 0.36.2 broken combo can land because the file installs
   --no-deps. transformers 5.x calls hub.is_offline_mode which only
   exists in hub 1.x.

3. utils/datasets/llm_assist.py: revert round 25 P1 #4 (helper/advisor
   sharing the global llama backend) which introduced three
   regressions: a chat-evict load race after the busy precheck, a
   finally-block that could unload a user chat model, and an
   identifier mismatch the delete guard could not canonicalize. Go
   back to PRIVATE LlamaCppBackend instances and expose the active
   helper/advisor repos through a new thread-safe registry
   (helper_advisor_owns_repo / _register_helper_advisor_repo /
   _unregister_helper_advisor_repo) so DELETE /api/models/delete-cached
   can still block the rmtree.

4. routes/models.py delete_cached_model: check the new helper/advisor
   registry up front and 409 if a helper/advisor still owns the
   target repo. Closes round 26 P1 #13 and #14 (helper/advisor
   identifiers were prefixed and would never equal the raw repo id).

5. routes/models.py get_lora_base_model: validate lora_path with
   _validate_logged_identifier before it is reflected in 404 detail
   and error logs (round 26 P1 #12).

6. routes/inference.py /unload: round 21 P1 #3 added a "or not
   is_loaded" fallback that let an unload of owner/B cancel a pending
   llama load of owner/A. Replace it with a narrow
   llama_is_starting_without_identifier branch that only fires when
   llama-server is mid-startup with neither identifier set (round 26
   P1 #5).

7. routes/inference.py /unload: poll loading_model_identifier for up
   to 5 s after asyncio.to_thread(unload_model) so a legitimate
   pending-load cancel does not 503 because the load thread has not
   yet observed _cancel_event in its finally (round 26 P2 #15).

8. models/training.py TrainingStartRequest: extend identifier
   hardening to hf_dataset, subset, train_split, eval_split. Round 22
   only guarded model_name (round 26 P1 #10).

9. models/data_recipe.py SeedInspectRequest: add _no_control_chars +
   _reject_embedded_hf_token field_validators on dataset_name (round
   26 P1 #11).
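
The settling wait in item 7 can be sketched as a bounded poll. This is a synchronous illustration with a hypothetical ``wait_for_identifier_clear``; an async route would use ``await asyncio.sleep(...)`` instead of ``time.sleep``:

```python
import time
from typing import Callable, Optional


def wait_for_identifier_clear(
    read_identifier: Callable[[], Optional[str]],
    timeout_s: float = 5.0,
    interval_s: float = 0.05,
) -> bool:
    """Poll until the pending-load identifier clears or the timeout expires.

    Gives a cancelled load thread time to observe its cancel event and
    reset the identifier in its finally block. Returns True if it cleared.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_identifier() is None:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```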

Tests: 105 targeted (diffusion + cached_gguf + llama_cpp_cache +
inference_model_validation + models_get_model_config) and 1768
broader backend tests pass locally. Pre-existing
test_desktop_auth.py, test_studio_api.py, and
test_training_worker_flash_attn.py failures reproduce on HEAD
without these changes.
danielhanchen added a commit that referenced this pull request May 25, 2026
Twelve actionable P1/P2 findings from the round 28 reviewer aggregate.
Skipped #3 (studio.txt huggingface-hub bump) because the empirical
CI evidence in round 26 contradicts that suggestion: bumping the
pin there breaks installs that apply constraints.txt
(transformers==4.57.6 requires hub<1.0). The actual broken combo
only happens via the --no-deps no-torch path, which is already
bumped in no-torch-runtime.txt and pyproject.toml huggingfacenotorch.

1. utils/datasets/llm_assist.py: split _HELPER_ADVISOR_REFCOUNT
   into CACHE vs GPU counters. helper_advisor_owns_repo (used by
   delete-cache) reads CACHE; helper_advisor_busy (used by public
   handoffs) reads GPU. precache_helper_gguf now registers with
   gpu_owner=False so a background pre-cache download does not
   503 every chat / training / export / diffusion load.

2. utils/datasets/llm_assist.py: introduce _HELPER_ADVISOR_START_LOCK
   and wrap the busy precheck + register pair in _run_with_helper
   and _run_multi_pass_advisor. Two concurrent helper / advisor
   invocations could both pass _gpu_workload_busy_for_helper before
   either registered, then OOM each other.

3. utils/datasets/llm_assist.py: _gpu_workload_busy_for_helper now
   also returns True when another helper/advisor already holds the
   private LlamaCppBackend.

4. routes/inference.py: add _raise_if_helper_advisor_busy(workload)
   that 503s when AI Assist owns the GPU. Wire it into both chat
   load branches (GGUF + safetensors) BEFORE the existing
   _release_export_for / _release_diffusion_for calls so we do not
   first tear down an idle export / diffusion just to fail on the
   helper check.

5. routes/training.py + routes/export.py + diffusion.load_model:
   call the helper-busy check FIRST before any release helper
   fires. Mirrors the chat-load ordering.

6. routes/inference.py _release_llama_for: poll
   loading_model_identifier for up to 5 s after unload_model() so a
   cancelled pending GGUF download has time to clear its
   identifier. Mirrors the same wait round 26 added to the explicit
   /api/inference/unload route.

7. core/inference/diffusion.py _release_chat_backend_for_diffusion:
   same 5 s settling wait for cancelled pending GGUF downloads.

8. models/inference.py LoadRequest: validate every llama_extra_args
   entry through _no_control_chars + _reject_embedded_hf_token.
   The list was forwarded verbatim to a logged llama-server command
   line, so a smuggled control char or hf_... token would land in
   logs and subprocess args.

9. routes/models.py /gguf-download-progress: apply
   _validate_logged_identifier to repo_id and variant, matching the
   round 24 hardening on the adjacent generic /download-progress.

10. routes/inference.py diffusion-load RuntimeError classifier:
    treat "AI Assist ..." messages as retryable 503 instead of 400
    (round 28 P2 #15). Mirrors the round 18/19 markers for chat
    unload failures.
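
Items 1 to 3 can be sketched as two refcounts guarded by one start lock. The function names mirror those in the message, but the bodies, the ``Counter`` storage and ``try_register`` / ``unregister`` are assumptions. Note that ``gpu_busy`` runs under the lock, so it must not take the same (non-reentrant) lock:

```python
import threading
from collections import Counter
from typing import Callable

_START_LOCK = threading.Lock()
_CACHE_REFS: Counter = Counter()  # repo -> holders that need the files on disk
_GPU_REFS: Counter = Counter()    # repo -> holders that also hold VRAM


def helper_advisor_owns_repo(repo: str) -> bool:
    """Delete-cache guard: any holder, GPU or download-only, pins the files."""
    with _START_LOCK:
        return _CACHE_REFS[repo] > 0


def helper_advisor_busy() -> bool:
    """Public-handoff guard: only GPU-owning holders block chat/training/export."""
    with _START_LOCK:
        return sum(_GPU_REFS.values()) > 0


def try_register(repo: str, gpu_owner: bool, gpu_busy: Callable[[], bool]) -> bool:
    """Busy precheck + register as one atomic step, so two concurrent
    helper starts cannot both pass the check before either registers."""
    with _START_LOCK:
        if gpu_owner and (gpu_busy() or sum(_GPU_REFS.values()) > 0):
            return False
        _CACHE_REFS[repo] += 1
        if gpu_owner:
            _GPU_REFS[repo] += 1
        return True


def unregister(repo: str, gpu_owner: bool) -> None:
    with _START_LOCK:
        _CACHE_REFS[repo] -= 1
        if gpu_owner:
            _GPU_REFS[repo] -= 1
```

A pre-cache download registers with ``gpu_owner=False``, so it pins the files against deletion without blocking public loads.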

Tests: 105 targeted + 1768 broader backend tests pass locally.
danielhanchen added a commit that referenced this pull request May 25, 2026
Four actionable findings from round 30. Skipped P1 #1 / #2 / #3
(huggingface-hub bump in studio.txt / single-env / colab-new) because
the live B200 Studio that successfully generated FLUX.2 klein images
runs the exact combo the reviewer flags as broken:
    huggingface_hub 0.36.2 + transformers 4.57.6 + diffusers 0.37.1
    Flux2KleinPipeline: True (imports cleanly)
The is_offline_mode ImportError only fires with transformers 5.x, and
the standard install path pins transformers==4.57.6 via constraints.
The round 26 fix bumped no-torch-runtime.txt + pyproject huggingfacenotorch
where the --no-deps install path can land on transformers 5.x; that
remains the correct surface.

1. core/inference/diffusion.py: preflight transformers + accelerate
   via importlib.util.find_spec BEFORE any destructive GPU-owner
   unload. Diffusers can expose stub pipeline classes when
   transformers / accelerate are missing, so the load used to drop
   chat first and fail later inside from_pretrained. find_spec
   keeps existing tests that stub these modules passing because no
   real module is executed (round 30 P1 #11).

2. models/export.py ExportGGUFRequest.quantization_method: extend
   the embedded HF token validator to this field too. Round 23
   added the control-char guard but not the token guard; the value
   is forwarded into worker command lines and reflected in error /
   success text (round 30 P1 #5).

3. models/data_recipe.py SeedInspectUploadRequest: add
   _no_control_chars + _reject_embedded_hf_token field_validators
   to filename and to each entry of file_names. Mirrors the sibling
   SeedInspectRequest.dataset_name hardening (round 30 P1 #6).

4. frontend/src/features/images/images-page.tsx: defer the initial
   refreshStatus() call via queueMicrotask so the synchronous
   setRefreshingStatus(true) inside it does not trip the
   react-hooks/set-state-in-effect lint on mount (round 30 P2 #12).
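
The non-importing dependency preflight from item 1 can be sketched with ``importlib.util.find_spec``. Skipping names already present in ``sys.modules`` is an assumption added here: ``find_spec`` raises ``ValueError`` for a loaded module whose ``__spec__`` is ``None``, which is common for hand-built test stubs:

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_modules(names: Iterable[str] = ("transformers", "accelerate")) -> List[str]:
    """Return the top-level modules that cannot be located, without importing them.

    Modules already in sys.modules (including test stubs) count as present;
    otherwise importlib.util.find_spec asks the import system's finders,
    which locate a top-level package without executing it.
    """
    missing = []
    for name in names:
        if name in sys.modules:
            continue
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


def preflight_diffusion_deps() -> None:
    """Raise BEFORE any destructive unload if a required package is absent."""
    missing = missing_modules()
    if missing:
        raise RuntimeError(
            "Image generation needs " + ", ".join(missing) + "; nothing was unloaded."
        )
```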

Deferred (need larger surgery / out of scope for this round):
   P1 #4 native_path_lease for diffusion local-path loads
   P1 #7-#10 helper/advisor + public-start window mutual lock symmetry

Tests: 98 targeted (diffusion + cached_gguf + inference_validation)
pass locally; frontend npm run typecheck passes.
danielhanchen added a commit that referenced this pull request May 25, 2026
Addresses remaining round-30 reviewer findings against PR unslothai#5754
(diffusion image generation in Unsloth Studio). The studio.txt /
constraints.txt / colab-new hub-bump items (round 30 #1-#3) are
intentionally skipped: the live B200 Studio install path with
huggingface_hub==0.36.2, transformers==4.57.6 and diffusers==0.37.1
imports Flux2KleinPipeline cleanly and runs end-to-end image
generation (see staging CI green on bec81b8 plus round 28-30
local validation suites). The is_offline_mode ImportError the
reviewer cites only triggers with transformers 5.x against
huggingface_hub 0.x; the constraints pin holds transformers at 4.x
so the combo never materialises on the standard install path.

Concurrency: close the helper / advisor GPU-start race in all four
public load paths (round 30 P1 #7-#10).
  * Add a _PUBLIC_LOAD_PENDING_COUNT counter in
    utils/datasets/llm_assist.py, published under
    _HELPER_ADVISOR_START_LOCK by _raise_if_helper_advisor_busy and
    cleared by a paired _clear_public_load_window in
    routes/inference.py. A concurrent helper / advisor start now
    sees public_load_pending() inside _gpu_workload_busy_for_helper
    and refuses VRAM until the public load attempt finishes,
    closing the window between the busy snapshot and the public
    load flipping its public ownership flags (is_loaded,
    current_checkpoint, is_training_active, etc.).
  * Wire the paired clear into all five call sites (GGUF chat,
    safetensors chat, diffusion image load, training start, export
    load-checkpoint). The chat path tracks the published tag in a
    local so the finally clears the same counter on either branch
    or on early HTTPException.

Security: gate /api/inference/images/load against arbitrary
local-path probes (round 30 P1 #4). Mirror the chat
/api/inference/load native_path_lease boundary so an authenticated
session cannot use repo_id or base_repo as a directory probe.
  * Add native_path_lease + base_repo_native_path_lease to
    DiffusionLoadRequest (optional; Hub ids skip the lease).
  * Add _looks_like_local_diffusion_path + a
    _resolve_diffusion_repo_for_request helper that requires a
    verified directory-typed native path grant for any value that
    starts with /, ~, ./, ../, contains a backslash, or expands to
    an absolute path. The detector deliberately avoids Path.exists
    so the route does not side-channel filesystem layout via
    differential error messages.
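
The lexical detector and lease gate can be sketched as follows. ``lease_dir`` is a hypothetical stand-in for the directory a verified native_path_lease was minted for; how the lease itself is verified is outside this sketch:

```python
import os
from typing import Optional


def looks_like_local_diffusion_path(value: str) -> bool:
    """Purely lexical check: no Path.exists() probe, so no filesystem side channel."""
    if value.startswith(("/", "~", "./", "../")) or "\\" in value:
        return True
    return os.path.isabs(os.path.expanduser(value))


class LeaseRequired(ValueError):
    pass


def resolve_diffusion_repo(value: str, lease_dir: Optional[str]) -> str:
    """Hub ids pass through; local-looking values need a matching directory grant."""
    if not looks_like_local_diffusion_path(value):
        return value
    if lease_dir is None:
        raise LeaseRequired("Native path grant is required")
    resolved = os.path.realpath(os.path.expanduser(value))
    if resolved != os.path.realpath(lease_dir):
        raise LeaseRequired("Native path grant does not match the requested path")
    return resolved
```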

Frontend: split the Images page status fetch from the spinner
toggle (round 30 P2 #12). The mount effect and the is_loading
auto-poll now call a setState-free fetchAndUpdateStatus; the
user-driven Refresh button still calls refreshStatus to flip the
spinner. Cleaner separation than the queueMicrotask shim from the
prior commit; the eslint react-hooks/set-state-in-effect rule is
not in the studio-frontend-ci typecheck gate, and the codebase
already has hundreds of pre-existing violations of the same rule.

98 targeted backend tests pass (test_diffusion_routes,
test_diffusion_backend, test_inference_model_validation,
test_models_get_model_config_case_resolution, test_data_recipe_seed,
test_training_raw_support, test_export_log_cursor). Frontend
typecheck passes.
danielhanchen added a commit that referenced this pull request May 25, 2026
Addresses remaining round-30 reviewer findings against PR unslothai#5754
(diffusion image generation in Unsloth Studio). The studio.txt /
constraints.txt / colab-new hub-bump items (round 30 #1-#3) are
intentionally skipped: the live B200 Studio install path with
huggingface_hub==0.36.2, transformers==4.57.6 and diffusers==0.37.1
imports Flux2KleinPipeline cleanly and runs end-to-end image
generation (see staging CI green on bec81b8 plus round 28-30
local validation suites). The is_offline_mode ImportError the
reviewer cites only triggers with transformers 5.x against
huggingface_hub 0.x; the constraints pin holds transformers at 4.x
so the combo never materialises on the standard install path.

Concurrency: close the helper / advisor GPU-start race in all four
public load paths (round 30 P1 #7-#10).
  * Add a _PUBLIC_LOAD_PENDING_COUNT counter in
    utils/datasets/llm_assist.py, published under
    _HELPER_ADVISOR_START_LOCK by _raise_if_helper_advisor_busy and
    cleared by a paired _clear_public_load_window in
    routes/inference.py. A concurrent helper / advisor start now
    sees public_load_pending() inside _gpu_workload_busy_for_helper
    and refuses VRAM until the public load attempt finishes,
    closing the window between the busy snapshot and the public
    load flipping its public ownership flags (is_loaded,
    current_checkpoint, is_training_active, etc.).
  * Wire the paired clear into all five call sites (GGUF chat,
    safetensors chat, diffusion image load, training start, export
    load-checkpoint). The chat path tracks the published tag in a
    local so the finally clears the same counter on either branch
    or on early HTTPException.

Security: gate /api/inference/images/load against arbitrary
local-path probes (round 30 P1 #4). Mirror the chat
/api/inference/load native_path_lease boundary so an authenticated
session cannot use repo_id or base_repo as a directory probe.
  * Add native_path_lease + base_repo_native_path_lease to
    DiffusionLoadRequest (optional; Hub ids skip the lease).
  * Add _looks_like_local_diffusion_path + a
    _resolve_diffusion_repo_for_request helper that requires a
    verified directory-typed native path grant for any value that
    starts with /, ~, ./, ../, contains a backslash, or expands to
    an absolute path. The detector deliberately avoids Path.exists
    so the route does not side-channel filesystem layout via
    differential error messages.

Frontend: split the Images page status fetch from the spinner
toggle (round 30 P2 #12). The mount effect and the is_loading
auto-poll now call a setState-free fetchAndUpdateStatus; the
user-driven Refresh button still calls refreshStatus to flip the
spinner. Cleaner separation than the queueMicrotask shim from the
prior commit; the eslint react-hooks/set-state-in-effect rule is
not in the studio-frontend-ci typecheck gate, and the codebase
already has hundreds of pre-existing violations of the same rule.

98 targeted backend tests pass (test_diffusion_routes,
test_diffusion_backend, test_inference_model_validation,
test_models_get_model_config_case_resolution, test_data_recipe_seed,
test_training_raw_support, test_export_log_cursor). Frontend
typecheck passes.
danielhanchen added a commit that referenced this pull request May 25, 2026
Two universal-consensus round-31 reviewer findings.

Concurrency: /images/load was leaking the public-load pending
counter on any pre-finally HTTPException (round 31 P1 #1, 11/12
votes). _raise_if_helper_advisor_busy("diffusion") published the
counter, then _resolve_diffusion_repo_for_request ran outside the
clearing try/finally. A request like repo_id="/tmp/model" with no
native_path_lease returned 400 and left public_load_pending() true
until process restart, permanently blocking AI Assist. Fix mirrors
the training / export pattern: track diffusion_load_window_published
in an outer try, publish the flag right after the helper-busy
check succeeds, and clear in an outer finally that only fires when
the flag is set. This also closes round 31 P1 #6: a second
request's failure can no longer decrement a still-active first
request's counter, because the second request has not yet flipped
its own publish flag.
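
The outer-try / publish-flag pattern can be sketched with a minimal stand-in marker. ``PendingMarker``, ``HelperBusy`` and ``images_load`` are hypothetical names; ``check_and_publish`` folds the helper-busy check and the publish into one step for brevity:

```python
from typing import Callable


class HelperBusy(RuntimeError):
    pass


class PendingMarker:
    """Minimal stand-in for the public-load pending counter."""

    def __init__(self, helper_busy: bool = False) -> None:
        self.count = 0
        self.helper_busy = helper_busy

    def check_and_publish(self) -> None:
        if self.helper_busy:
            raise HelperBusy("AI Assist owns the GPU")
        self.count += 1

    def clear(self) -> None:
        self.count -= 1


def images_load(marker: PendingMarker, resolve: Callable[[], str],
                load: Callable[[str], None]) -> None:
    published = False
    try:
        marker.check_and_publish()  # may raise (503-style HelperBusy)
        published = True            # flipped only once THIS request published
        repo = resolve()            # may raise, e.g. 400 "Native path grant is required"
        load(repo)
    finally:
        if published:               # never decrement a marker this request did not set
            marker.clear()
```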

Security: _looks_like_local_diffusion_path missed cwd-relative
directories (round 31 P1 #2, 8/12 votes).
DiffusionBackend.load_model accepts repo_id="exports/my-flux" as a
local directory via Path(repo_id).expanduser().is_dir(), but the
detector only flagged values that start with /, ~, ./ or ../,
contain a backslash, or are absolute paths. Tightened the detector
to also reject:
  * weight-file suffixes (.gguf / .safetensors / .bin / .pt / .pth)
  * non-2-segment values (`owner`, `a/b/c`, `owner/`, `/repo`, `//`)
  * 2-segment values whose parts are `.` or `..`
  * 2-segment values that actually resolve to an existing local
    path under backend CWD (last-resort exists() probe).
The existence probe is a minor side-channel for an already-
authenticated caller, accepted in exchange for closing the silent
bypass of the new lease boundary. Valid Hub ids like
unsloth/FLUX.2-klein-base-4B-GGUF, microsoft/Phi-3.5-mini-instruct
still pass through unchanged.
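
The tightened rules can be sketched as a single predicate. This is a hypothetical reimplementation; the ``cwd`` parameter stands in for the backend's working directory so the last-resort existence probe is testable:

```python
import os

_WEIGHT_SUFFIXES = (".gguf", ".safetensors", ".bin", ".pt", ".pth")


def looks_like_local_diffusion_path(value: str, cwd: str = ".") -> bool:
    """Treat anything that is not a clean owner/name Hub id as a local path."""
    if value.startswith(("/", "~", "./", "../")) or "\\" in value:
        return True
    if os.path.isabs(os.path.expanduser(value)):
        return True
    if value.lower().endswith(_WEIGHT_SUFFIXES):
        return True
    parts = value.split("/")
    if len(parts) != 2 or not all(parts) or any(p in (".", "..") for p in parts):
        return True
    # Last resort (accepted side channel): an existing path under the backend CWD.
    return os.path.exists(os.path.join(cwd, value))
```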

Skipped (consistent with prior rounds):
  * R31 P1 #3 (Tauri / native lease enum missing
    `load-diffusion-model` op): architectural surface; defer until
    the Images page actually surfaces a local-path picker.
  * R31 P1 #4-#5, #8: studio.txt / constraints.txt / pyproject hub
    pins. Live B200 install path with huggingface_hub==0.36.2,
    transformers==4.57.6, diffusers==0.37.1 imports
    Flux2KleinPipeline cleanly. The is_offline_mode import error
    only triggers when transformers 5.x is paired with hub 0.x,
    which the constraints pin prevents.
  * R31 P1 #7 (find_spec vs real import): a full transformers
    import at module load breaks tests that stub huggingface_hub;
    find_spec is the existing tradeoff.

98 targeted backend tests pass (test_diffusion_routes,
test_diffusion_backend, test_inference_model_validation,
test_models_get_model_config_case_resolution, test_data_recipe_seed,
test_training_raw_support, test_export_log_cursor).
danielhanchen added a commit that referenced this pull request May 25, 2026
Three round-32 reviewer findings, plus documentation cleanup for
the local-path Tauri/FE plumbing gap.

Concurrency: direct DiffusionBackend.load_model callers now publish
the helper/advisor pending marker symmetrically (round 32 P1 #3).
_raise_if_helper_advisor_busy_for_diffusion gains an optional
publish_pending flag; load_model passes True so the destructive
unload window is gated by a "diffusion-backend" tag published
under _HELPER_ADVISOR_START_LOCK. The route layer's "diffusion"
tag and the backend's "diffusion-backend" tag refcount
independently (sum > 0 still blocks helper starts), so neither
side's clear can erase the other's still-active marker. The
existing _release_chat_backend_for_diffusion(check_helper_advisor=True)
path stays snapshot-only (publish_pending defaults to False), so
test / direct callers of that helper do not leak a counter.

Validation: export save_directory now rejects ALL ASCII control
characters (round 32 P1, save_directory tab finding). The earlier
CR / LF only guard missed TAB / VT / FF / DEL, which a caller
could smuggle past the export worker's logged subprocess argv.
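
The gap between the two guards can be shown directly. Both functions are illustrative sketches, not the export route's code:

```python
def has_crlf(value: str) -> bool:
    """The earlier guard: only CR and LF."""
    return "\r" in value or "\n" in value


def has_ascii_control(value: str) -> bool:
    """The tightened guard: every ASCII control character, 0x00-0x1F plus DEL (0x7F)."""
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in value)
```

TAB, VT (``\x0b``), FF (``\x0c``) and DEL (``\x7f``) all pass ``has_crlf`` but are caught by ``has_ascii_control``.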

Documentation: DiffusionLoadRequest.repo_id and base_repo updated
to reflect that local-path support is gated on a Tauri /
frontend load-diffusion-model directory lease producer that has
not shipped yet (round 32 P1 #1 from multiple reviewers). The
backend lease boundary is correct; what is missing is the FE /
native side that mints the matching grant. Until that lands,
local paths through the Images route always 400 with "Native
path grant is required", which the docstring now spells out.

Skipped (consistent with prior rounds):
  * Hub-pin findings (R32 P1 #4-#6): the live B200 install with
    huggingface_hub==0.36.2 + transformers==4.57.6 +
    diffusers==0.37.1 verifiably imports Flux2KleinPipeline. The
    empirical justification is documented in the R30 and R30
    follow-up commit messages.
  * Tauri / native enum surgery (R32 P1 #1, 6 votes): real
    architectural work but out of scope for this PR's Python
    surface. Documented now; FE / Rust ticket to follow.

98 targeted backend tests pass (test_diffusion_routes,
test_diffusion_backend, test_inference_model_validation,
test_models_get_model_config_case_resolution, test_data_recipe_seed,
test_training_raw_support, test_export_log_cursor).
danielhanchen added a commit that referenced this pull request May 25, 2026
Two round-33 reviewer findings: hub-floor consistency and the
multipart upload filename validator gap.

Dependencies: reverted the round-26 huggingface_hub>=1.3.0 floor
in no-torch-runtime.txt and pyproject.toml (round 33 P1 #1-#5,
4/12 vote consensus). studio.txt forces huggingface_hub==0.36.2
to match the transformers==4.57.6 pin in extras-no-deps.txt, so
the 1.3.0 floor was internally inconsistent. Reviewers
reproduced the resolver conflict on a fresh install.

Empirical justification (re-verified on the live B200 host before
the revert): huggingface_hub 0.36.2 + transformers 4.57.6 +
diffusers 0.37.1 imports Flux2KleinPipeline cleanly and runs
end-to-end image generation. transformers 4.57.6 carries its own
transformers.utils.hub.is_offline_mode and does not actually need
huggingface_hub.is_offline_mode at import time. The original bump
was guarding against the (never-realised) transformers 5.x path,
which extras-no-deps explicitly pins away.

Validation: multipart /seed/upload-unstructured-file now applies
the same _no_control_chars and _reject_embedded_hf_token checks
to file.filename that SeedInspectUploadRequest.filename already
applies in the JSON variant (round 33 P1 #7). The filename is
reflected back to the client, persisted in the per-file meta
JSON, and echoed by error responses, so the JSON-side hardening
must not be asymmetric with the multipart path.

Skipped (consistent with prior rounds):
  * find_spec vs full import (R33 P1 #6): find_spec preserves test
    compatibility with the huggingface_hub stub fixture.
  * React hooks set-state-in-effect lint (R33 P1 #8): codebase
    has 146 pre-existing violations of the same rule;
    studio-frontend-ci does not gate on lint.
  * Direct DiffusionBackend.load_model bypass (R33 P1 #9): the
    route is the only production entry point, and the backend
    helper now publishes its own diffusion-backend pending tag
    (round 32 P1 #3). Direct-caller hardening would require
    duplicating the lease check into load_model itself, which
    is out of scope for the route-layer security boundary.
  * One-segment Hub IDs (R33 P2 #10): strict 2-segment Hub id
    check is intentional; one-segment names are not valid Hub
    ids.
  * Cwd-relative shadow of Hub IDs (R33 P2 #11): documented
    side-channel tradeoff accepted in round 31 commit msg.

97 targeted backend tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file python Pull requests that update python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants