
[Jobs] Decouple Job hardware from Spaces, auto-sync enums with Hub API#4266

Merged
Wauplin merged 14 commits into main from decouple-job-hardware-from-spaces on May 27, 2026



@Wauplin (Collaborator) commented on May 26, 2026

Summary

Jobs hardware was previously tied to SpaceHardware: same enum, same values, same code paths. As the two diverge (e.g. Jobs don't support ZeroGPU, and Spaces don't have the same GPU catalog), this coupling creates friction in the UX and leaves commands badly documented.

In addition, the hardcoded enum means every infra change requires a manual PR like #4259.

This PR decouples them and makes future hardware updates fully automated:

  • Own enum for Jobs: JobHardware(str, Enum) in _jobs_api.py, independent of SpaceHardware. The existing JobHardware dataclass (the return type of list_jobs_hardware()) is renamed to JobHardwareInfo to free up the name. ⚠️ This is a breaking change, but perfectly fine IMO.
  • Future-proof CLI: a new SoftChoice(click.Choice) type shows known values for docs/autocomplete but accepts any string, so older CLI versions don't reject new server-side flavors. It is used for --flavor in the jobs, spaces, and repos CLIs.
  • Automated sync: utils/check_hardware_flavors.py fetches the live hardware catalog from the Hub API and patches both enums. New flavors are appended; removed ones get a # legacy comment (kept for backward compatibility). A daily CI workflow (.github/workflows/update-hardware-flavors.yaml) runs the script and opens a PR via huggingface-hub-bot when something changes.
  • Docs: the hardcoded hardware list in docs/source/en/guides/jobs.md is replaced with a pointer to hf jobs hardware / list_jobs_hardware() so it stays current automatically.
```console
$ hf jobs run --flavor future-gpu-x99 python:3.12 echo "works with unknown flavors"
Job started with ID: ...
```
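The pass-through behavior shown in the CLI example comes from SoftChoice. Below is a minimal sketch of the idea, assuming click 8.x: it subclasses click.Choice so help and completion still list the known values, but returns the raw string instead of failing on an unknown one. The flavor list and the toy `run` command are hypothetical, not the actual huggingface_hub implementation.

```python
import click


class SoftChoice(click.Choice):
    """click.Choice variant that lists known values but accepts any string.

    Known values go through click.Choice's normal conversion; unknown ones are
    returned unchanged instead of producing a usage error.
    """

    def convert(self, value, param, ctx):
        try:
            return super().convert(value, param, ctx)
        except click.BadParameter:
            # Unknown to this client version: pass it through and let the server decide.
            return value


# Hypothetical subset of flavors, for illustration only.
KNOWN_FLAVORS = ["cpu-basic", "cpu-upgrade", "t4-small", "a10g-small"]


@click.command()
@click.option("--flavor", type=SoftChoice(KNOWN_FLAVORS), default="cpu-basic", show_default=True)
def run(flavor: str) -> None:
    """Toy command that only echoes the selected flavor."""
    click.echo(f"flavor={flavor}")
```

With this, `run --flavor future-gpu-x99` echoes the value instead of exiting with a usage error, while `run --help` still lists the four known values.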
```python
>>> from huggingface_hub import JobHardware, JobHardwareInfo
>>> JobHardware.CPU_BASIC
<JobHardware.CPU_BASIC: 'cpu-basic'>
>>> from huggingface_hub import list_jobs_hardware
>>> list_jobs_hardware()[0]
JobHardwareInfo(name='cpu-basic', pretty_name='CPU Basic', ...)
```
```console
$ python utils/check_hardware_flavors.py
Fetching hardware flavors from the Hub API...
  Spaces: 29 flavors
  Jobs:   28 flavors

✅ All good! Hardware enums are up to date.
```
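A rough sketch of the append-or-mark-legacy rule that the sync script applies, operating on the enum body as a list of source lines. The function names, the regex, and the naming rule (uppercase, dashes to underscores, inferred from existing members such as A10G_LARGEX2) are assumptions. The real utils/check_hardware_flavors.py also fetches the catalog from the Hub API, which is omitted here, and this sketch does not handle a flavor that reappears after being marked legacy.

```python
from __future__ import annotations

import re

# Matches enum member lines such as:    CPU_BASIC = "cpu-basic"  # legacy
MEMBER_RE = re.compile(r'^\s*(?P<name>[A-Z0-9_]+) = "(?P<value>[^"]+)"(?P<rest>.*)$')


def to_member_name(flavor: str) -> str:
    """Assumed naming rule, e.g. 'a10g-largex2' -> 'A10G_LARGEX2'."""
    return flavor.upper().replace("-", "_")


def sync_enum_body(body: list[str], live_flavors: list[str], indent: str = "    ") -> list[str]:
    """Return enum body lines with new flavors appended and removed ones marked '# legacy'.

    Existing members are never deleted, so downstream references keep working.
    """
    live = set(live_flavors)
    known: set[str] = set()
    updated: list[str] = []
    for line in body:
        match = MEMBER_RE.match(line)
        if match is None:
            updated.append(line)  # docstring, blank line, comment: keep as-is
            continue
        known.add(match["value"])
        if match["value"] not in live and "# legacy" not in match["rest"]:
            line = line.rstrip() + "  # legacy"
        updated.append(line)
    for flavor in live_flavors:  # keep the API's order for new entries
        if flavor not in known:
            updated.append(f'{indent}{to_member_name(flavor)} = "{flavor}"')
            known.add(flavor)
    return updated
```

For example, with a body containing CPU_BASIC and T4_SMALL and a live catalog of cpu-basic and l4x1, T4_SMALL gains a `# legacy` comment and `L4X1 = "l4x1"` is appended; running it again on the result changes nothing.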

🤖 Generated with Claude Code


Note

Medium Risk
The breaking rename of the JobHardware dataclass to JobHardwareInfo, together with the enum/catalog split, affects the public API and CLI flavor validation; automated enum edits need human review before merge.

Overview
Jobs hardware is decoupled from Spaces: a dedicated JobHardware enum drives job APIs and types, while SpaceHardware is trimmed to the Spaces catalog. The old JobHardware dataclass from list_jobs_hardware() is renamed to JobHardwareInfo (breaking rename for consumers who imported the dataclass).

The CLI uses the new SoftChoice type for --flavor / --hardware, so help lists known values while unknown Hub flavors still pass through. In the docs, the Jobs guide's hardware section becomes an auto-generated table plus pointers to hf jobs hardware / list_jobs_hardware().

Automation: utils/check_hardware_flavors.py syncs both enums (and the Jobs doc table) from the live Hub API: new flavors are appended, and removed ones are kept with a # legacy comment. A daily GitHub Actions workflow runs the script with --update, runs make style, and opens a bot PR when anything drifts.

Reviewed by Cursor Bugbot for commit 26fc3b2. Bugbot is set up for automated code reviews on this repo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bot-ci-comment

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread utils/check_hardware_flavors.py Outdated
Comment thread .github/workflows/update-hardware-flavors.yaml Fixed
@Wauplin marked this pull request as draft on May 26, 2026 14:13

Collaborator Author

For this one, I reviewed the code quickly.
I mainly focused on testing it locally in all sorts of situations (adding one flavor, removing one, adding several while removing several, etc.). It always worked correctly, so I think "it's fine".

@Wauplin requested a review from hanouticelina on May 26, 2026 14:32
@Wauplin marked this pull request as ready for review on May 26, 2026 14:33
github-actions[bot]

This comment was marked as low quality.

Comment thread src/huggingface_hub/hf_api.py

@hanouticelina (Collaborator) left a comment

Thank you!

```diff
 env: dict[str, Any] | None,
 secrets: dict[str, Any] | None,
-flavor: SpaceHardware | None,
+flavor: JobHardware | str | None,
```

Collaborator

str is redundant, I think: JobHardware already inherits from str.

Suggested change:
```diff
-flavor: JobHardware | str | None,
+flavor: JobHardware | None,
```

Collaborator Author

Here we can pass flavor="something" where "something" is not a JobHardware member, so the type should reflect that. We could use flavor: str | None instead since, as you said, JobHardware is a str, but not the other way around. Still, I prefer to name JobHardware explicitly for self-documentation purposes.
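To make the typing point concrete: every JobHardware member is a str, but not every str is a JobHardware member, which is why the annotation keeps `| str`. The sketch below uses a two-member stand-in for the enum and a hypothetical normalize_flavor helper; it is not huggingface_hub code.

```python
from __future__ import annotations

from enum import Enum


class JobHardware(str, Enum):
    """Illustrative two-member subset; the real enum is synced from the Hub API."""

    CPU_BASIC = "cpu-basic"
    T4_SMALL = "t4-small"


def normalize_flavor(flavor: JobHardware | str | None) -> JobHardware | str | None:
    """Hypothetical helper: map known strings to enum members, pass unknown ones through."""
    if flavor is None:
        return None
    try:
        # Lookup by value; an existing member is returned unchanged.
        return JobHardware(flavor)
    except ValueError:
        # Not in this client's enum (e.g. a flavor added server-side later): keep the raw str.
        return flavor
```

Here `normalize_flavor("t4-small")` returns `JobHardware.T4_SMALL`, while `normalize_flavor("future-gpu-x99")` returns the plain string, which a `JobHardware | None` annotation would not describe.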

```diff
 "arguments": [],
 "environment": env or {},
-"flavor": flavor or SpaceHardware.CPU_BASIC,
+"flavor": flavor or JobHardware.CPU_BASIC,
```

Collaborator

should we delegate the default to the server instead?

@Wauplin (Collaborator Author), May 27, 2026

we could, but at the moment flavor is required by the server; otherwise you get:

```
Bad request:
* Invalid option: expected one of "cpu-basic"|"cpu-upgrade"|"cpu-performance"|"cpu-xl"|"sprx8"|"zero-a10g"|"t4-small"|"t4-medium"|"l4x1"|"l4x4"|"l40sx1"|"l40sx4"|"l40sx8"|"a10g-small"|"a10g-large"|"a10g-largex2"|"a10g-largex4"|"a100-large"|"a100x4"|"a100x8"|"h200"|"h200x2"|"h200x4"|"h200x8"|"rtx-pro-6000"|"rtx-pro-6000x2"|"rtx-pro-6000x4"|"rtx-pro-6000x8"|"inf2x6" * at flavor
```

(just tested it myself)
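A small sketch of why the client fills in the default itself: because the server requires flavor, the payload always carries one, and a str-based enum member serializes to its plain value. The build_payload name is hypothetical, and only the three keys visible in the diff are reproduced; the real request body contains other fields.

```python
from __future__ import annotations

import json
from enum import Enum
from typing import Any


class JobHardware(str, Enum):
    """Illustrative subset of Job flavors."""

    CPU_BASIC = "cpu-basic"
    A10G_SMALL = "a10g-small"


def build_payload(
    env: dict[str, Any] | None = None,
    flavor: JobHardware | str | None = None,
) -> dict[str, Any]:
    """Hypothetical builder containing only the keys visible in the diff."""
    return {
        "arguments": [],
        "environment": env or {},
        # The server currently rejects requests without a flavor, so default client-side.
        "flavor": flavor or JobHardware.CPU_BASIC,
    }


# str-based enum members serialize as their plain value.
print(json.dumps(build_payload()))  # {"arguments": [], "environment": {}, "flavor": "cpu-basic"}
```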

Comment thread src/huggingface_hub/cli/_cli_utils.py
Comment thread utils/check_hardware_flavors.py Outdated
```yaml
on:
  workflow_dispatch:
  schedule:
    - cron: "0 4 * * *" # Every day at 4am
```

Collaborator

maybe once a week is enough, no? No strong opinion.

Collaborator Author

🤷
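For reference, the weekly cadence suggested in this thread would only change the cron expression. A sketch keeping the same trigger layout; GitHub Actions evaluates `schedule` cron expressions in UTC, and the choice of Monday is arbitrary:

```yaml
on:
  workflow_dispatch:
  schedule:
    - cron: "0 4 * * 1" # Every Monday at 04:00 UTC
```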

@github-actions[bot] (Contributor) left a comment

Pinact found unpinned actions in this repo.

3 inline suggestion(s) below — click Apply suggestion on each.

Comment thread .github/workflows/update-hardware-flavors.yaml Outdated
Comment thread .github/workflows/update-hardware-flavors.yaml Outdated
Comment thread .github/workflows/update-hardware-flavors.yaml Outdated
Comment thread utils/check_hardware_flavors.py
Comment on lines -211 to -213
```diff
-- CPU: `cpu-basic`, `cpu-upgrade`
-- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`,`a100-large`
-- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`
```

Member

The only drawback is that we lose a bit of SEO/built-in info for agents (without invoking a script).

Maybe we can maintain a list the same way we maintain the docstring? (in addition to documenting the Python code to get it)

Collaborator Author

Good idea, pushed 26fc3b2 to address it. The output of hf jobs hardware is now listed in the docs, with hardware and cost.
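A rough sketch of how such a generated table can be kept current: render rows to Markdown and splice them between marker comments in the guide, so the sync job rewrites only that region. The HardwareRow fields, the marker strings, and the function names are hypothetical; the PR does not show how the table is actually generated.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class HardwareRow:
    """Hypothetical, trimmed-down stand-in for JobHardwareInfo (field names assumed)."""

    name: str
    pretty_name: str
    cost: str


# Hypothetical markers delimiting the auto-generated region of the guide.
START = "<!-- hardware-table:start -->"
END = "<!-- hardware-table:end -->"


def render_table(rows: list[HardwareRow]) -> str:
    """Render rows as a Markdown table."""
    lines = ["| Flavor | Hardware | Cost |", "|---|---|---|"]
    for row in rows:
        lines.append(f"| `{row.name}` | {row.pretty_name} | {row.cost} |")
    return "\n".join(lines)


def update_doc(doc: str, rows: list[HardwareRow]) -> str:
    """Replace everything between START and END with a freshly rendered table."""
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), flags=re.DOTALL)
    if not pattern.search(doc):
        raise ValueError("table markers not found in document")
    replacement = f"{START}\n{render_table(rows)}\n{END}"
    # A callable replacement avoids backslash escapes being interpreted by re.sub.
    return pattern.sub(lambda _match: replacement, doc, count=1)
```

Because the region is fully regenerated, running `update_doc` twice with the same rows yields the same text, which keeps the daily sync from opening empty PRs.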


Co-authored-by: célina <hanouticelina@gmail.com>
Comment thread utils/check_hardware_flavors.py
Comment thread utils/check_hardware_flavors.py
@Wauplin requested a review from hanouticelina on May 27, 2026 14:16
@Wauplin requested a review from julien-c on May 27, 2026 14:18

@cursor[bot] left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.


```diff
 CPU_UPGRADE = "cpu-upgrade"
-CPU_PERFORMANCE = "cpu-performance"
-CPU_XL = "cpu-xl"
-SPRX8 = "sprx8"
```


SpaceHardware members deleted instead of marked legacy

Medium Severity

Twelve SpaceHardware enum members (including CPU_PERFORMANCE, CPU_XL, SPRX8, H200, H200X2, H200X4, H200X8, RTX_PRO_6000 variants, and INF2X6) are outright deleted. The companion script check_hardware_flavors.py is designed to keep removed flavors with a # legacy comment for backward compatibility, but the manual changes in this PR don't follow that pattern. Any downstream code referencing e.g. SpaceHardware.H200 will raise AttributeError.
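To illustrate the trade-off Bugbot flags (which the author accepts as intentional): a member kept with a `# legacy` comment stays addressable, while a deleted one raises AttributeError for any downstream reference. Both enum classes and the member_value helper below are hypothetical stand-ins, not SpaceHardware itself.

```python
from __future__ import annotations

from enum import Enum


class SpaceHardwareWithLegacy(str, Enum):
    """Stand-in following the sync script's convention: removed flavors stay, marked legacy."""

    CPU_BASIC = "cpu-basic"
    H200 = "h200"  # legacy


class SpaceHardwareTrimmed(str, Enum):
    """Stand-in for the manual change in this PR: the member is deleted outright."""

    CPU_BASIC = "cpu-basic"


def member_value(enum_cls: type[Enum], name: str) -> str | None:
    """Return the member's value, or None where `enum_cls.<name>` would raise AttributeError."""
    try:
        return getattr(enum_cls, name).value
    except AttributeError:
        return None


print(member_value(SpaceHardwareWithLegacy, "H200"))  # h200
print(member_value(SpaceHardwareTrimmed, "H200"))  # None
```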


Collaborator Author

that's on purpose, yes

@hanouticelina (Collaborator) left a comment

all good, thank you!

```diff
 > For comprehensive guidance on running model training jobs with TRL on Hugging Face infrastructure, check out the [TRL Jobs Training documentation](https://huggingface.co/docs/trl/main/en/jobs_training). It covers fine-tuning recipes, hardware selection, and best practices for training models efficiently.
 
-Available `flavor` options:
+Here is the full list of available hardware to run Jobs:
```

Collaborator

Nice!

@Wauplin merged commit ffc07b6 into main on May 27, 2026
26 checks passed
@Wauplin deleted the decouple-job-hardware-from-spaces branch on May 27, 2026 14:48
@huggingface-hub-bot (Contributor)

This PR has been shipped as part of the v1.17.0 release.


Labels

None yet

Projects

None yet


4 participants