[CLI] Support `hf://` URIs in `hf upload` and `hf download` #4297
`hf upload` and `hf download` now accept a single `hf://` URI as repo id, reading the repo type, revision and path from the URI. Passing `--repo-type` or `--revision` alongside a URI is rejected, and bucket URIs are not supported by these commands. Plain repo ids keep working as before. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
```python
# Same as `RepoTypeOpt` but optional (defaults to `None` rather than `model`). Used by commands that
# accept an `hf://` URI as repo id: a `None` default lets us tell apart "user did not pass --repo-type"
# from "user explicitly passed --repo-type model", which is required to detect conflicts with the URI.
RepoTypeOptionalOpt = Annotated[
```
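To illustrate why the `None` default matters, here is a minimal, stdlib-only sketch. The `RepoType` enum and the `resolve_repo_type` helper are stand-ins written for this example, not the actual huggingface_hub code. With a `None` default, an explicit `--repo-type model` arrives as `RepoType.model` and can be rejected next to a URI; with a `model` default it would be indistinguishable from the user omitting the flag.

```python
from enum import Enum
from typing import Optional


class RepoType(str, Enum):
    """Stand-in for the CLI's repo-type enum (assumption for this sketch)."""

    model = "model"
    dataset = "dataset"
    space = "space"


def resolve_repo_type(cli_repo_type: Optional[RepoType], uri_type: Optional[str]) -> str:
    """Pick the effective repo type.

    `cli_repo_type` is the parsed --repo-type option (None when omitted);
    `uri_type` is the type read from an hf:// URI, or None for a plain repo id.
    """
    if uri_type is not None:
        # Any non-None value means the user typed --repo-type explicitly,
        # even `--repo-type model`, so it conflicts with the URI.
        if cli_repo_type is not None:
            raise ValueError("'--repo-type' cannot be used with an 'hf://' URI.")
        return uri_type
    # Plain repo ids keep the historical default of "model".
    return (cli_repo_type or RepoType.model).value
```

For example, `resolve_repo_type(None, None)` returns `"model"`, while `resolve_repo_type(RepoType.model, "dataset")` raises.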
I'll need to double-check this part (we likely don't need `RepoTypeOpt` to default to "model" in the first place).
Reverted in 9bf05aa: it was too complex to get working without changing too many places.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
and this one more specifically @mishig25
Nice call rejecting
Branch on the `hf://` prefix instead of `is_hf_uri()` so a malformed URI surfaces a precise HfUriError (already formatted in cli/_errors.py) rather than silently falling through to the plain-repo-id path and failing later. Also preserve a trailing '/' from the URI path so `hf download` routes folder URIs through the subfolder download code path (e.g. `.../data/` -> `data/**`). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
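The trailing-slash rule in this commit can be sketched as a tiny routing function. `plan_download` and its return values are hypothetical names chosen for illustration; in practice a folder glob like this could be passed to a filter such as the `allow_patterns` argument of `snapshot_download`, but this sketch does not call the library and is not the CLI's actual code.

```python
from typing import Optional, Tuple


def plan_download(path_in_repo: Optional[str]) -> Tuple[str, Optional[str]]:
    """Decide how to download a path taken from an hf:// URI (illustrative sketch).

    Returns (mode, target):
      - ("snapshot", None) when the URI has no path: download the whole repo;
      - ("subfolder", "<dir>/**") when the path ends with "/": download a folder;
      - ("file", "<path>") otherwise: download a single file.
    """
    if not path_in_repo:
        return ("snapshot", None)
    if path_in_repo.endswith("/"):
        # Keeping the trailing "/" is what lets us recognise a folder URI:
        # "data/" becomes the glob "data/**".
        return ("subfolder", path_in_repo + "**")
    return ("file", path_in_repo)
```

If the trailing slash were stripped during parsing, `plan_download("data")` would route to `("file", "data")`, which is the misrouting this commit avoids.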
```python
# `repo_id` may be a plain repo id or an `hf://` URI (e.g. `hf://datasets/my-org/my-dataset@v1.0/data/`).
# When a URI is provided, it is authoritative for the repo type, revision and (optionally) path in repo,
# so explicit `--repo-type` / `--revision` options are forbidden alongside it.
# We branch on the `hf://` prefix (the user's *intent*) rather than on whether the string parses as a
# valid URI: a malformed URI then surfaces a precise `HfUriError` (formatted globally in `cli/_errors.py`)
# instead of silently falling through to the plain-repo-id path and failing later with an opaque error.
if repo_id.startswith(constants.HF_PROTOCOL):
    if repo_type is not None:
        raise CLIError(f"'--repo-type' cannot be used with an 'hf://' URI ('{repo_id}').")
    if revision is not None:
        raise CLIError(f"'--revision' cannot be used with an 'hf://' URI ('{repo_id}').")
    uri = parse_hf_uri(repo_id)
    if uri.is_bucket:
        raise CLIError("Buckets are not supported by `hf upload`. Use `hf sync` instead.")
    repo_id, repo_type_str, revision = uri.id, uri.type, uri.revision
    if uri.path_in_repo:
        if path_in_repo is not None:
            raise CLIError(
                f"Cannot combine a path in the hf:// URI ('{uri.path_in_repo}') with the `path_in_repo` argument ('{path_in_repo}')."
            )
        path_in_repo = uri.path_in_repo
else:
    repo_type_str = (repo_type or RepoType.model).value
```
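The upload check above has a download-side counterpart: per the PR description, `hf download` rejects a path embedded in the URI when positional filenames are also given. A minimal sketch of that rule follows; `resolve_download_filenames` and `CLIErrorSketch` are hypothetical names for illustration, not the CLI's actual code.

```python
from typing import List, Optional


class CLIErrorSketch(Exception):
    """Hypothetical stand-in for the CLI's user-facing error type."""


def resolve_download_filenames(uri_path: Optional[str], filenames: List[str]) -> List[str]:
    """Combine the path embedded in an hf:// URI with positional filenames.

    As with `path_in_repo` on upload, the two sources are mutually exclusive.
    """
    if uri_path:
        if filenames:
            raise CLIErrorSketch(
                f"Cannot combine a path in the hf:// URI ('{uri_path}') "
                f"with positional filenames ({', '.join(filenames)})."
            )
        return [uri_path]
    # No path in the URI (or a plain repo id): positional filenames are used as-is.
    return list(filenames)
```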
This logic is duplicated between upload and download on purpose. I first tested a factorized version, but it bloated the code and made things more complex than they needed to be, hence the current structure. Happy to revisit in the future if this turns out to be a bad decision.
This PR has been shipped as part of the v1.19.0 release.
This PR lets `hf upload` and `hf download` accept a single `hf://` URI in place of the positional repo id. The URI follows the usual `hf://[<TYPE>/]<ID>[@<REVISION>][/<PATH>]` grammar, so the repo type, revision and (optionally) the file path are all read straight from it. This is mostly a convenience to avoid having to pass `--repo-type` / `--revision` / filename arguments.
When a URI is provided, it is the single source of truth: passing `--repo-type` or `--revision` on top of it raises an error. A path embedded in the URI can't be combined with positional filenames (download) or the `path_in_repo` argument (upload) either.
No breaking changes expected.
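The grammar above can be illustrated with a small, self-contained parser. This is not huggingface_hub's `parse_hf_uri`: the `datasets`/`spaces`/`models`/`buckets` prefix mapping, the assumption that the id is always `<namespace>/<name>`, and the assumption that revisions contain no `/` are simplifications made for this sketch. It mirrors the behaviors the PR relies on: type, revision and path are read from the URI, a trailing `/` in the path is kept, and a string that starts with `hf://` but does not match raises an error instead of being treated as a plain repo id.

```python
from dataclasses import dataclass
from typing import Optional

HF_PROTOCOL = "hf://"

# Hypothetical mapping from URI type prefix to repo type (assumption for this sketch).
TYPE_PREFIXES = {"datasets": "dataset", "spaces": "space", "models": "model", "buckets": "bucket"}


class UriSketchError(ValueError):
    """Raised for strings that start with hf:// but do not match the grammar."""


@dataclass
class ParsedUri:
    type: str
    id: str
    revision: Optional[str]
    path_in_repo: Optional[str]

    @property
    def is_bucket(self) -> bool:
        return self.type == "bucket"


def parse_uri_sketch(uri: str) -> ParsedUri:
    if not uri.startswith(HF_PROTOCOL):
        raise UriSketchError(f"Not an hf:// URI: {uri!r}")
    parts = uri[len(HF_PROTOCOL):].split("/")
    # Optional leading type segment; URIs without one default to models.
    repo_type = "model"
    if parts and parts[0] in TYPE_PREFIXES:
        repo_type = TYPE_PREFIXES[parts.pop(0)]
    # Assumption: the id is always <namespace>/<name>, with "@<revision>" attached to <name>.
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise UriSketchError(f"Missing <namespace>/<name> in {uri!r}")
    namespace, name_and_rev = parts[0], parts[1]
    name, sep, revision = name_and_rev.partition("@")
    if not name or (sep and not revision):
        raise UriSketchError(f"Malformed repo name or revision in {uri!r}")
    # Everything after the id is the path; a trailing "/" is kept on purpose.
    path = "/".join(parts[2:]) or None
    return ParsedUri(repo_type, f"{namespace}/{name}", revision or None, path)
```

For example, `parse_uri_sketch("hf://datasets/my-org/my-dataset@v1.0/data/")` yields type `"dataset"`, id `"my-org/my-dataset"`, revision `"v1.0"` and path `"data/"`.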
Examples
The download/upload guides and the CLI reference are updated accordingly.
Note
Low Risk
CLI-only convenience layer on existing download/upload APIs with explicit validation; default behavior for plain repo IDs is unchanged.
Overview
`hf download` and `hf upload` now accept an `hf://` URI as the repo argument, encoding repo type, optional revision, and optional path in one string (e.g. `hf://datasets/org/repo@branch/data/file.parquet`).
When the argument starts with `hf://`, `parse_hf_uri` drives routing: single-file vs snapshot vs subfolder (trailing `/` preserved for downloads). `--repo-type` and `--revision` are rejected if set alongside a URI; embedded paths cannot be combined with extra positional filenames (download) or `path_in_repo` (upload). Bucket URIs error with a pointer to `hf sync`.
`RepoTypeOptionalOpt` replaces the default `model` repo type on these commands so "no `--repo-type`" is distinguishable from an explicit `--repo-type model`, which matters for URI conflict checks. Plain repo IDs still default to models when `--repo-type` is omitted.
Docs (CLI, download, upload guides and CLI reference) and tests cover URI examples, validation, and malformed `hf://` handling via `HfUriError`.
Reviewed by Cursor Bugbot for commit 4ca3210. Bugbot is set up for automated code reviews on this repo.