[Jobs] Decouple Job hardware from Spaces, auto-sync enums with Hub API #4266
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
For this one I reviewed the code quickly.
I mainly focused on testing it locally in all sorts of situations (adding 1, removing 1, adding multiple while removing multiple, etc.). It always worked correctly, so I think "it's fine".
  env: dict[str, Any] | None,
  secrets: dict[str, Any] | None,
- flavor: SpaceHardware | None,
+ flavor: JobHardware | str | None,
`str` is redundant, I think: `JobHardware` already inherits from `str`.
- flavor: JobHardware | str | None,
+ flavor: JobHardware | None,
Here we can pass `flavor="something"` where "something" is not a `JobHardware` member, so the type should reflect that. We could use `flavor: str | None` instead since, as you said, `JobHardware` is a `str` but not the other way around. Still, I prefer to be explicit about `JobHardware` for self-documentation purposes.
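A minimal sketch of the typing argument in this thread, using a hypothetical two-member stand-in for `JobHardware` (the real enum is generated from the Hub API). Every member of a `(str, Enum)` is a `str`, but a raw string such as a flavor added server-side is not a member, which is why the annotation keeps `| str`. The helper `normalize_flavor` is illustrative only, not part of `huggingface_hub`.

```python
from __future__ import annotations

from enum import Enum


class JobHardware(str, Enum):
    # Hypothetical subset; the real members come from the Hub API.
    CPU_BASIC = "cpu-basic"
    T4_SMALL = "t4-small"


def normalize_flavor(flavor: JobHardware | str | None) -> str | None:
    """Accept an enum member, a raw string (e.g. a newer server-side flavor), or None."""
    if isinstance(flavor, JobHardware):
        return flavor.value
    return flavor


# A member is a str, so `flavor: str | None` would also type-check...
assert isinstance(JobHardware.T4_SMALL, str)
# ...but a raw string is not a member, so `JobHardware | None` alone would be too narrow.
assert not isinstance("some-new-flavor", JobHardware)
```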
  "arguments": [],
  "environment": env or {},
- "flavor": flavor or SpaceHardware.CPU_BASIC,
+ "flavor": flavor or JobHardware.CPU_BASIC,
should we delegate the default to the server instead?
We could, but at the moment `flavor` is required by the server; otherwise you get:
Bad request:
* Invalid option: expected one of "cpu-basic"|"cpu-upgrade"|"cpu-performance"|"cpu-xl"|"sprx8"|"zero-a10g"|"t4-small"|"t4-medium"|"l4x1"|"l4x4"|"l40sx1"|"l40sx4"|"l40sx8"|"a10g-small"|"a10g-large"|"a10g-largex2"|"a10g-largex4"|"a100-large"|"a100x4"|"a100x8"|"h200"|"h200x2"|"h200x4"|"h200x8"|"rtx-pro-6000"|"rtx-pro-6000x2"|"rtx-pro-6000x4"|"rtx-pro-6000x8"|"inf2x6" * at flavor
(just tested it myself)
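Because the server currently rejects requests without `flavor`, the default has to be filled in client-side. A sketch of that fallback, assuming only the three payload keys visible in the diff above (`arguments`, `environment`, `flavor`); `build_job_payload` and the two-member enum are hypothetical stand-ins, not the library's code.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class JobHardware(str, Enum):
    # Hypothetical subset; the real members come from the Hub API.
    CPU_BASIC = "cpu-basic"
    A10G_SMALL = "a10g-small"


def build_job_payload(
    env: dict[str, Any] | None,
    flavor: JobHardware | str | None,
) -> dict[str, Any]:
    """Partial job spec with a client-side default flavor (the server requires one)."""
    chosen = flavor or JobHardware.CPU_BASIC
    return {
        "arguments": [],
        "environment": env or {},
        # Send the plain string so unknown, newer flavors pass through unchanged.
        "flavor": chosen.value if isinstance(chosen, JobHardware) else chosen,
    }
```

Delegating the default to the server would only become possible once the API stops requiring the field.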
on:
  workflow_dispatch:
  schedule:
    - cron: "0 4 * * *" # Every day at 4am
Maybe once a week is enough? No strong opinion.
- CPU: `cpu-basic`, `cpu-upgrade`
- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`
The only drawback is that we lose a bit of SEO/built-in info for agents (without invoking a script).
Maybe we can maintain a list the same way we maintain the docstring? (in addition to documenting the Python code to get it)
Good idea.
Pushed 26fc3b2 to address it: the output of `hf jobs hardware` is now listed in the docs, with hardware and cost.
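One way to keep such a docs table in sync is to render it from the same data `hf jobs hardware` prints. The sketch below is hypothetical: the fields of the hardware info objects are not shown in this PR, so `hardware_table` takes plain `(flavor, cost)` pairs and emits a Markdown table.

```python
from __future__ import annotations

from collections.abc import Iterable


def hardware_table(rows: Iterable[tuple[str, str]]) -> str:
    """Render hypothetical (flavor, cost) pairs as a Markdown table."""
    lines = ["| Hardware | Cost |", "| --- | --- |"]
    for flavor, cost in rows:
        # Flavor names are code identifiers, so format them as inline code.
        lines.append(f"| `{flavor}` | {cost} |")
    return "\n".join(lines)
```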
Co-authored-by: célina <hanouticelina@gmail.com>
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 26fc3b2.
  CPU_UPGRADE = "cpu-upgrade"
- CPU_PERFORMANCE = "cpu-performance"
- CPU_XL = "cpu-xl"
- SPRX8 = "sprx8"
SpaceHardware members deleted instead of marked legacy
Medium Severity
Twelve SpaceHardware enum members (including CPU_PERFORMANCE, CPU_XL, SPRX8, H200, H200X2, H200X4, H200X8, RTX_PRO_6000 variants, and INF2X6) are outright deleted. The companion script check_hardware_flavors.py is designed to keep removed flavors with a # legacy comment for backward compatibility, but the manual changes in this PR don't follow that pattern. Any downstream code referencing e.g. SpaceHardware.H200 will raise AttributeError.
That's on purpose, yes.
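Since the removal is intentional, downstream code that still references e.g. `SpaceHardware.H200` will raise `AttributeError`, as Bugbot notes. A sketch of a tolerant lookup for such callers, using a hypothetical trimmed enum and a hypothetical `hardware_value` helper; it assumes the consuming code only needs the flavor string, which is what a `str`-based enum member carries anyway.

```python
from __future__ import annotations

from enum import Enum


class SpaceHardware(str, Enum):
    # Hypothetical trimmed enum: H200 is absent, mirroring the removal in this PR.
    CPU_BASIC = "cpu-basic"
    T4_SMALL = "t4-small"


def hardware_value(member_name: str, fallback: str) -> str:
    """Return the enum value if the member still exists, else the raw flavor string."""
    member = getattr(SpaceHardware, member_name, None)
    if isinstance(member, SpaceHardware):
        return member.value
    return fallback


try:
    SpaceHardware.H200  # removed member
except AttributeError:
    print("SpaceHardware.H200 no longer exists")
```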
hanouticelina left a comment
all good, thank you!
> For comprehensive guidance on running model training jobs with TRL on Hugging Face infrastructure, check out the [TRL Jobs Training documentation](https://huggingface.co/docs/trl/main/en/jobs_training). It covers fine-tuning recipes, hardware selection, and best practices for training models efficiently.

- Available `flavor` options:
+ Here is the full list of available hardware to run Jobs:
This PR has been shipped as part of the v1.17.0 release.



Summary
Jobs hardware was previously tied to `SpaceHardware`: the same enum, same values, same code paths. As the two diverge (e.g. Jobs don't support ZeroGPU, and Spaces don't have the same GPU catalog), this coupling causes friction in the UX, with badly documented commands. In addition, the hardcoded enum means every infra change requires a manual PR like #4259.
This PR decouples them and makes future hardware updates fully automated:

- A new `JobHardware(str, Enum)` in `_jobs_api.py`, independent of `SpaceHardware`. The existing `JobHardware` dataclass (return type of `list_jobs_hardware()`) is renamed to `JobHardwareInfo` to free the name.
- A `SoftChoice(click.Choice)` type that shows known values for docs/autocomplete but accepts any string, so older CLI versions don't reject new server-side flavors. Used for `--flavor` in the jobs, spaces, and repos CLIs.
- `utils/check_hardware_flavors.py` fetches the live hardware catalog from the Hub API and patches both enums. New flavors are appended; removed ones get a `# legacy` comment (kept for backward compat). A daily CI workflow (`.github/workflows/update-hardware-flavors.yaml`) runs the script and opens a PR via `huggingface-hub-bot` when something changes.
- The hardware list in `docs/source/en/guides/jobs.md` is replaced with a pointer to `hf jobs hardware` / `list_jobs_hardware()` so it stays current automatically.

🤖 Generated with Claude Code
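The soft-validation idea behind `SoftChoice` (list the known values, accept anything) can be illustrated without click. The sketch below is a stdlib `argparse` analogue, not the PR's `SoftChoice(click.Choice)` implementation; this `SoftChoice` class and the `jobs-run` program name are hypothetical.

```python
from __future__ import annotations

import argparse
from collections.abc import Sequence


class SoftChoice:
    """argparse `type=` callable that advertises known values but accepts any string.

    Stdlib analogue of the PR's click-based SoftChoice; not the actual implementation.
    """

    def __init__(self, known: Sequence[str]) -> None:
        self.known = tuple(known)

    def __call__(self, value: str) -> str:
        # Deliberately no membership check: flavors added server-side after this
        # client was released must still reach the API.
        return value

    @property
    def metavar(self) -> str:
        # Shown in --help, in the same {a,b,c} style argparse uses for `choices=`.
        return "{" + ",".join(self.known) + "}"


def build_parser(known: Sequence[str]) -> argparse.ArgumentParser:
    flavor_type = SoftChoice(known)
    parser = argparse.ArgumentParser(prog="jobs-run")
    parser.add_argument("--flavor", type=flavor_type, metavar=flavor_type.metavar)
    return parser
```

With `choices=[...]` instead, argparse would exit with an "invalid choice" error for any flavor the client does not know yet, which is the failure mode older CLI versions would otherwise hit.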
Note
Medium Risk
The breaking rename of the `JobHardware` dataclass to `JobHardwareInfo` and the enum/catalog split affect the public API and CLI flavor validation; automated enum edits need human review before merge.

Overview
Jobs hardware is decoupled from Spaces: a dedicated `JobHardware` enum drives job APIs and types, while `SpaceHardware` is trimmed to the Spaces catalog. The old `JobHardware` dataclass from `list_jobs_hardware()` is renamed to `JobHardwareInfo` (a breaking rename for consumers who imported the dataclass).

CLI and docs use the new `SoftChoice` for `--flavor`/`--hardware`, so help lists known values but unknown Hub flavors still pass through. The hardware section of the Jobs guide is an auto-generated table plus pointers to `hf jobs hardware` / `list_jobs_hardware()`.

Automation: `utils/check_hardware_flavors.py` syncs both enums (and the Jobs doc table) from the live Hub API: new flavors are appended, removed ones are kept with `# legacy`. A daily GitHub Actions workflow runs `--update` and `make style`, and opens a bot PR when anything drifts.

Reviewed by Cursor Bugbot for commit 26fc3b2. Bugbot is set up for automated code reviews on this repo.
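A minimal sketch of the enum-sync rule described for `utils/check_hardware_flavors.py`: keep known flavors in place, tag those the API no longer returns with `# legacy`, and append new ones. It is not the actual script; fetching the catalog and patching the source file are left out, `render_enum` and `member_name` are hypothetical, and the member-naming rule (upper-case, `-` to `_`) is inferred from names like `RTX_PRO_6000` and `H200X2` in this PR.

```python
from __future__ import annotations

from collections.abc import Sequence

LEGACY_MARKER = "  # legacy"


def member_name(flavor: str) -> str:
    # e.g. "a10g-largex2" -> "A10G_LARGEX2", "rtx-pro-6000x2" -> "RTX_PRO_6000X2"
    return flavor.upper().replace("-", "_")


def render_enum(class_name: str, known: Sequence[str], live: Sequence[str]) -> str:
    """Regenerate a `(str, Enum)` class from the known flavors and the live catalog.

    Flavors still served keep their position; flavors no longer returned by the
    API stay (for backward compatibility) but get a `# legacy` comment; new
    flavors are appended at the end.
    """
    live_set = set(live)
    known_set = set(known)
    lines = [f"class {class_name}(str, Enum):"]
    for flavor in known:
        line = f'    {member_name(flavor)} = "{flavor}"'
        if flavor not in live_set:
            line += LEGACY_MARKER
        lines.append(line)
    for flavor in live:
        if flavor not in known_set:
            lines.append(f'    {member_name(flavor)} = "{flavor}"')
    return "\n".join(lines) + "\n"
```

Keeping removed members, rather than deleting them, means an automated run never breaks downstream attribute access; deliberate deletions, like the `SpaceHardware` trimming in this PR, stay a manual decision.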