[CLI] Support hf:// URIs in hf upload and hf download #4297

Merged: Wauplin merged 8 commits into main from cli-support-hf-uri-upload-download on Jun 5, 2026

@Wauplin Wauplin commented May 29, 2026

Collaborator

This PR lets hf upload and hf download accept a single hf:// URI in place of the positional repo id. The URI follows the usual hf://[<TYPE>/]<ID>[@<REVISION>][/<PATH>] grammar, so the repo type, revision and (optionally) the file path are all read straight from it. This is mostly a convenience to avoid having to pass --repo-type / --revision / filename arguments.
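To make the grammar concrete, here is a minimal sketch of how such a URI could be split into its parts. It is not the library's `parse_hf_uri`: the function name `sketch_parse_hf_uri`, the `ParsedUri` dataclass, the accepted type prefixes, and the special handling of `refs/pr/<N>` revisions are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

HF_PREFIX = "hf://"

# Assumption: these plural prefixes map to repo types; the real parser may accept more.
_TYPE_PREFIXES = {"models": "model", "datasets": "dataset", "spaces": "space"}

# Assumption: `refs/pr/<N>` is the only revision form that itself contains '/'.
_PR_REVISION = re.compile(r"^(refs/pr/\d+)(?:/(.*))?$")


@dataclass
class ParsedUri:
    type: str
    id: str
    revision: Optional[str]
    path_in_repo: str


def sketch_parse_hf_uri(uri: str) -> ParsedUri:
    """Split hf://[<TYPE>/]<ID>[@<REVISION>][/<PATH>] (illustrative only)."""
    if not uri.startswith(HF_PREFIX):
        raise ValueError(f"Not an hf:// URI: {uri!r}")
    rest = uri[len(HF_PREFIX):]

    # Optional type prefix; without one, assume a model repo.
    repo_type = "model"
    first, _, remainder = rest.partition("/")
    if first in _TYPE_PREFIXES:
        repo_type, rest = _TYPE_PREFIXES[first], remainder

    if "@" in rest:
        # Limitation: a path containing '@' without a revision is not handled.
        repo_id, _, after = rest.partition("@")
        match = _PR_REVISION.match(after)
        if match:
            revision, path = match.group(1), match.group(2) or ""
        else:
            revision, _, path = after.partition("/")
    else:
        # Assumption: the repo id is always `<namespace>/<name>`.
        parts = rest.split("/", 2)
        repo_id = "/".join(parts[:2])
        path = parts[2] if len(parts) > 2 else ""
        revision = None

    if not repo_id:
        raise ValueError(f"Missing repo id in {uri!r}")
    return ParsedUri(repo_type, repo_id, revision, path)
```

With these assumptions, the dataset example from this description splits into the repo `HuggingFaceM4/FineVision`, the revision `refs/pr/1` and the path `data/train.parquet`.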

When a URI is provided, it is the single source of truth: passing --repo-type or --revision on top of it raises an error. A path embedded in the URI can't be combined with positional filenames (download) or the path_in_repo argument (upload) either.

No breaking changes expected.

Examples

# Download a single file from a dataset at a given revision
hf download hf://datasets/HuggingFaceM4/FineVision@refs/pr/1/data/train.parquet

# Download an entire repo
hf download hf://datasets/google/fleurs

# Upload a single file to a dataset on a specific branch
hf upload hf://datasets/Wauplin/my-cool-dataset@my-branch/data/train.csv ./train.csv
# Trying to also pass --repo-type / --revision is rejected
$ hf download hf://datasets/google/fleurs --revision main
Error: '--revision' cannot be used with an 'hf://' URI ('hf://datasets/google/fleurs').

The download/upload guides and the CLI reference are updated accordingly.


Note

Low Risk
CLI-only convenience layer on existing download/upload APIs with explicit validation; default behavior for plain repo IDs is unchanged.

Overview
hf download and hf upload now accept an hf:// URI as the repo argument, encoding repo type, optional revision, and optional path in one string (e.g. hf://datasets/org/repo@branch/data/file.parquet).

When the argument starts with hf://, parse_hf_uri drives routing: single-file vs snapshot vs subfolder (trailing / preserved for downloads). --repo-type and --revision are rejected if set alongside a URI; embedded paths cannot be combined with extra positional filenames (download) or path_in_repo (upload). Bucket URIs error with a pointer to hf sync.
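The routing described above can be sketched as a small decision function. This is an illustration under assumptions, not the CLI's code: `sketch_route_download` and the mode names are hypothetical, and only the `data/` to `data/**` pattern mapping comes from the PR itself.

```python
from typing import List, Optional, Tuple


def sketch_route_download(path_in_repo: str) -> Tuple[str, Optional[List[str]]]:
    """Return a (mode, allow_patterns) pair for the path embedded in a URI.

    Hypothetical helper: the mode names are made up for this sketch.
    """
    if not path_in_repo:
        # No path in the URI: download the whole repo.
        return ("snapshot", None)
    if path_in_repo.endswith("/"):
        # A preserved trailing '/' marks a folder: match everything under it.
        return ("subfolder", [f"{path_in_repo}**"])
    # Otherwise treat the path as a single file.
    return ("file", None)
```

Keeping the trailing `/` is what lets the folder case be told apart from a file whose name has no extension.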

RepoTypeOptionalOpt replaces the default model repo type on these commands so “no --repo-type” is distinguishable from an explicit --repo-type model, which matters for URI conflict checks. Plain repo IDs still default to models when --repo-type is omitted.
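A minimal sketch of why the `None` default matters, assuming a simplified `RepoType` enum and a hypothetical `sketch_resolve_repo_type` helper (neither is the CLI's actual definition):

```python
from enum import Enum
from typing import Optional


class RepoType(str, Enum):
    # Simplified stand-in for the CLI's enum.
    model = "model"
    dataset = "dataset"
    space = "space"


def sketch_resolve_repo_type(repo_arg: str, repo_type: Optional[RepoType] = None) -> Optional[str]:
    """Return the effective repo type, or None when the URI will supply it."""
    if repo_arg.startswith("hf://"):
        # With a None default, an explicit `--repo-type model` is detectable here.
        if repo_type is not None:
            raise ValueError(f"'--repo-type' cannot be used with an 'hf://' URI ('{repo_arg}').")
        return None  # the type is read from the URI instead
    # Plain repo ids keep defaulting to models.
    return (repo_type or RepoType.model).value
```

If the option defaulted to `model`, the first branch could not distinguish an omitted flag from an explicit `--repo-type model`, so the conflict check would either never fire or fire on every URI.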

Docs (CLI, download, upload guides and CLI reference) and tests cover URI examples, validation, and malformed hf:// handling via HfUriError.

Reviewed by Cursor Bugbot for commit 4ca3210.

`hf upload` and `hf download` now accept a single `hf://` URI as repo id,
reading the repo type, revision and path from the URI. Passing `--repo-type`
or `--revision` alongside a URI is rejected, and bucket URIs are not supported
by these commands. Plain repo ids keep working as before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Same as `RepoTypeOpt` but optional (defaults to `None` rather than `model`). Used by commands that
# accept an `hf://` URI as repo id: a `None` default lets us tell apart "user did not pass --repo-type"
# from "user explicitly passed --repo-type model", which is required to detect conflicts with the URI.
RepoTypeOptionalOpt = Annotated[

Collaborator Author

I'll need to double-check this part (it's likely that we don't need RepoTypeOpt to default to "model" in the first place).

Collaborator Author

addressed in 3071ec7

Collaborator Author

reverted in 9bf05aa, too complex to get it working without changing too many places

@bot-ci-comment


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@julien-c

Member

and this one more specifically @mishig25

julien-c commented May 30, 2026

Member

Nice call rejecting --repo-type/--revision when passing a URI

Branch on the `hf://` prefix instead of `is_hf_uri()` so a malformed URI
surfaces a precise HfUriError (already formatted in cli/_errors.py) rather
than silently falling through to the plain-repo-id path and failing later.

Also preserve a trailing '/' from the URI path so `hf download` routes folder
URIs through the subfolder download code path (e.g. `.../data/` -> `data/**`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
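The difference between the two branching strategies in this commit can be sketched with a toy parser. Everything here is hypothetical: `SketchUriError` stands in for `HfUriError`, `sketch_parse` only checks for a missing repo id, and the assumption that a validity helper like `is_hf_uri()` behaves as "try to parse, return a bool" is mine, not confirmed by the PR.

```python
PREFIX = "hf://"


class SketchUriError(ValueError):
    """Hypothetical stand-in for the library's HfUriError."""


def sketch_parse(value: str) -> str:
    """Toy parser: only rejects a missing repo id."""
    if not value.startswith(PREFIX):
        raise SketchUriError(f"Not an hf:// URI: {value!r}")
    body = value[len(PREFIX):]
    if not body or body.startswith(("/", "@")):
        raise SketchUriError(f"Missing repo id in {value!r}")
    return body


def sketch_is_uri(value: str) -> bool:
    # Assumed behaviour of a validity check: parse and swallow the error.
    try:
        sketch_parse(value)
    except SketchUriError:
        return False
    return True


def route_by_validity(value: str) -> str:
    # A malformed URI silently becomes a "plain repo id" and fails later.
    return "uri" if sketch_is_uri(value) else "plain-repo-id"


def route_by_prefix(value: str) -> str:
    # The prefix expresses intent, so a malformed URI raises right away.
    if value.startswith(PREFIX):
        sketch_parse(value)
        return "uri"
    return "plain-repo-id"
```

For an input such as `hf://@main`, `route_by_validity` quietly picks the plain-repo-id path, while `route_by_prefix` raises the parser's error at the point where the user's mistake is easiest to explain.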
Comment on lines +153 to +175
# `repo_id` may be a plain repo id or an `hf://` URI (e.g. `hf://datasets/my-org/my-dataset@v1.0/data/`).
# When a URI is provided, it is authoritative for the repo type, revision and (optionally) path in repo,
# so explicit `--repo-type` / `--revision` options are forbidden alongside it.
# We branch on the `hf://` prefix (the user's *intent*) rather than on whether the string parses as a
# valid URI: a malformed URI then surfaces a precise `HfUriError` (formatted globally in `cli/_errors.py`)
# instead of silently falling through to the plain-repo-id path and failing later with an opaque error.
if repo_id.startswith(constants.HF_PROTOCOL):
    if repo_type is not None:
        raise CLIError(f"'--repo-type' cannot be used with an 'hf://' URI ('{repo_id}').")
    if revision is not None:
        raise CLIError(f"'--revision' cannot be used with an 'hf://' URI ('{repo_id}').")
    uri = parse_hf_uri(repo_id)
    if uri.is_bucket:
        raise CLIError("Buckets are not supported by `hf upload`. Use `hf sync` instead.")
    repo_id, repo_type_str, revision = uri.id, uri.type, uri.revision
    if uri.path_in_repo:
        if path_in_repo is not None:
            raise CLIError(
                f"Cannot combine a path in the hf:// URI ('{uri.path_in_repo}') with the `path_in_repo` argument ('{path_in_repo}')."
            )
        path_in_repo = uri.path_in_repo
else:
    repo_type_str = (repo_type or RepoType.model).value

Collaborator Author

This logic is duplicated between upload and download on purpose. I first tested a factorized version, but it was bloating the code and making things more complex than they should be, hence the current shape. Happy to revisit in the future if it turns out to be a bad decision.

@Wauplin Wauplin requested a review from hanouticelina June 1, 2026 16:15
@Wauplin Wauplin marked this pull request as ready for review June 1, 2026 16:15
Wauplin and others added 6 commits June 2, 2026 10:57
Reverts:
- 52efb10 better
- 3b347bdf good
- 4ad8936 a
- 3071ec7 remove RepoTypeOptionalOpt in favor of RepoTypeOpt

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@hanouticelina hanouticelina left a comment

Collaborator

perfect!

@Wauplin Wauplin merged commit 9fd9bce into main Jun 5, 2026
26 checks passed
@Wauplin Wauplin deleted the cli-support-hf-uri-upload-download branch June 5, 2026 11:23
@huggingface-hub-bot

Contributor

This PR has been shipped as part of the v1.19.0 release.
