
[Core] Migrate hf:// URI parsing to centralized parse_hf_uri#4189

Merged
Wauplin merged 33 commits into main from use-parse-hf-uri
May 19, 2026

Conversation


@Wauplin Wauplin commented May 5, 2026

Collaborator

Summary

Follow-up to #4158. Replaces all scattered ad-hoc hf:// URI parsers with the centralized parse_hf_uri/parse_hf_mount helpers.

Not touched (as mentioned in #4158): repo_type_and_id_from_hf_id and RepoUrl — these are mixed parsers accepting both hf:// URIs and https://huggingface.co/... URLs and need a separate migration plan.

Breaking changes

  1. BucketUrl.handle → BucketUrl.uri: attribute renamed and type changed from str to HfUri. Any code accessing bucket_url.handle must switch to bucket_url.uri (use .to_uri() for the string form).

  2. Volume.to_hf_handle() → Volume.to_uri(): method renamed. Any code calling .to_hf_handle() must switch to .to_uri().

  3. Single-segment repo IDs no longer supported in HfFileSystem: paths like gpt2, gpt2/file.txt, or gpt2@dev/file.txt no longer resolve. Repo IDs must use the namespace/name format (e.g. username/gpt2).

  4. Single-segment repo IDs rejected in CLI -v flags: hf://gpt2:/data is no longer accepted. Use the hf://namespace/name:/mount form instead.
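A minimal sketch of what the stricter `-v` check means in practice, assuming the mount format implied by the `Volume` URI builder in this PR (`hf://{type}s/{source}[@revision][path]:{mount_path}[:ro|:rw]`) and assuming that a URI without a type prefix refers to a model repo. `parse_mount_sketch`, `MountSketch` and `MountParseError` are hypothetical names for illustration; this is not the library's `parse_hf_mount`.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed type prefixes, inferred from the `hf://{type}s/...` format used in this PR.
_TYPE_PREFIXES = {"models": "model", "datasets": "dataset", "spaces": "space", "buckets": "bucket"}


class MountParseError(ValueError):
    """Hypothetical error type for this sketch (not the library's HfUriError)."""


@dataclass(frozen=True)
class MountSketch:
    type: str
    repo_id: str
    revision: Optional[str]
    path: str
    mount_path: str
    read_only: Optional[bool]


def parse_mount_sketch(value: str) -> MountSketch:
    if not value.startswith("hf://"):
        raise MountParseError(f"Not an hf:// URI: {value!r}")
    rest = value[len("hf://"):]
    # The mount target starts at the first ':/' (a colon followed by an absolute path).
    source, sep, mount = rest.partition(":/")
    if not sep:
        raise MountParseError(f"Missing ':/mount_path' in {value!r}")
    mount_path = "/" + mount
    read_only: Optional[bool] = None
    for suffix, flag in ((":ro", True), (":rw", False)):
        if mount_path.endswith(suffix):
            mount_path, read_only = mount_path[: -len(suffix)], flag
            break
    segments = source.split("/")
    repo_type = "model"  # assumption: no prefix means a model repo
    if segments[0] in _TYPE_PREFIXES:
        repo_type = _TYPE_PREFIXES[segments.pop(0)]
    if len(segments) < 2:
        # Single-segment IDs such as 'gpt2' are rejected.
        raise MountParseError(f"Repo ID must be 'namespace/name' in {value!r}")
    namespace, name_and_rev, *path_parts = segments
    name, _, revision = name_and_rev.partition("@")
    if not namespace or not name:
        raise MountParseError(f"Repo ID must be 'namespace/name' in {value!r}")
    return MountSketch(
        type=repo_type,
        repo_id=f"{namespace}/{name}",
        revision=revision or None,
        path="/".join(path_parts),
        mount_path=mount_path,
        read_only=read_only,
    )
```

Revisions containing `/` (e.g. `refs/pr/1`) and other edge cases a real parser has to handle are deliberately left out of this sketch.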

Improvements

  • @ in filenames is now treated as literal (e.g. hf://a/b/file@v1.txt parses correctly instead of erroring).
  • HfUriError is now caught by the CLI error handler and shows a clean message instead of a raw traceback.
  • New is_hf_uri public helper for validating hf:// URIs.
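To illustrate the `@` change, here is a sketch under the assumption that a revision marker is only recognized on the repo-name segment (as in `hf://namespace/name@rev/path`), so any `@` further along the path stays part of the filename. `parse_uri_sketch`, `is_hf_uri_sketch`, `UriSketch` and `UriSketchError` are hypothetical stand-ins, not the actual `parse_hf_uri`/`is_hf_uri` implementations.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed type prefixes; a URI without one is treated as a model repo in this sketch.
_TYPE_PREFIXES = {"models": "model", "datasets": "dataset", "spaces": "space", "buckets": "bucket"}


class UriSketchError(ValueError):
    """Hypothetical error raised by this sketch."""


@dataclass(frozen=True)
class UriSketch:
    type: str
    id: str
    revision: Optional[str]
    path: str


def parse_uri_sketch(uri: str) -> UriSketch:
    if not uri.startswith("hf://"):
        raise UriSketchError(f"Not an hf:// URI: {uri!r}")
    segments = uri[len("hf://"):].split("/")
    repo_type = "model"
    if segments[0] in _TYPE_PREFIXES:
        repo_type = _TYPE_PREFIXES[segments.pop(0)]
    if len(segments) < 2:
        raise UriSketchError(f"Expected 'namespace/name' in {uri!r}")
    namespace, name_and_rev, *path_parts = segments
    # Only the repo-name segment may carry '@revision'; '@' anywhere in the path is literal.
    name, _, revision = name_and_rev.partition("@")
    if not namespace or not name:
        raise UriSketchError(f"Expected 'namespace/name' in {uri!r}")
    return UriSketch(repo_type, f"{namespace}/{name}", revision or None, "/".join(path_parts))


def is_hf_uri_sketch(value: str) -> bool:
    """Return True if `value` parses as an hf:// URI under this sketch's rules."""
    try:
        parse_uri_sketch(value)
    except UriSketchError:
        return False
    return True
```

With these rules, `hf://a/b/file@v1.txt` resolves to repo `a/b`, no revision, and path `file@v1.txt`.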

🤖 Generated with Claude Code


Note

Medium Risk
Medium risk because this is a cross-cutting refactor of URI parsing that introduces breaking API/CLI behavior changes (renamed fields/methods and stricter URI formats), which could impact downstream integrations and edge-case path resolution.

Overview
Centralizes all hf:// parsing on parse_hf_uri/parse_hf_mount and removes multiple ad-hoc parsers across buckets, copy helpers, CLI volume parsing, and HfFileSystem.

Introduces breaking API/CLI changes: BucketUrl.handle is replaced by BucketUrl.uri (HfUri), Volume.to_hf_handle() is renamed to to_uri(), and CLI output/docs/examples are updated from “handle” wording to canonical “URI” formatting.

Tightens URI validation/semantics: HfFileSystem and CLI -v/--volume now reject single-segment repo IDs (must be namespace/name), CLI errors now format HfUriError cleanly, and URI parsing is improved to treat @ in filenames as literal (while still supporting revision markers where valid).

Reviewed by Cursor Bugbot for commit a471279. Bugbot is set up for automated code reviews on this repo.

Replace scattered ad-hoc hf:// URI parsers with the centralized
`parse_hf_uri`/`parse_hf_mount` helpers introduced in #4158.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Wauplin Wauplin marked this pull request as draft May 5, 2026 09:55

bot-ci-comment Bot commented May 5, 2026


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Add explicit type annotations for `repo_type` and `revision_in_path`
to satisfy mypy's narrowing across branches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread src/huggingface_hub/cli/_cli_utils.py
Comment thread src/huggingface_hub/_space_api.py
Comment thread src/huggingface_hub/cli/_cli_utils.py
Comment thread src/huggingface_hub/cli/buckets.py
Comment thread src/huggingface_hub/_buckets.py Outdated
return False


@functools.lru_cache

Collaborator Author


Added since some logic now parses the same URI multiple times (once to check if it's a URI, then to check if it's a bucket, then to reuse its parsed values, etc.). Easier to use an LRU cache than to parse the value once and pass it everywhere.
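A rough sketch of that trade-off, assuming a pure parsing function keyed on the URI string: `functools.lru_cache` memoizes successful parses, so repeated checks on the same URI reuse one result object. `parse_cached_sketch`, `ParsedSketch` and `is_uri_sketch` are hypothetical names; the result is frozen because cached objects are shared between all callers.

```python
import functools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: the cached instance is shared, so it must not be mutated
class ParsedSketch:
    repo_id: str
    revision: Optional[str]
    path: str


@functools.lru_cache
def parse_cached_sketch(uri: str) -> ParsedSketch:
    # Hypothetical minimal parser for "hf://namespace/name[@rev][/path]".
    if not uri.startswith("hf://"):
        raise ValueError(f"Not an hf:// URI: {uri!r}")
    namespace, _, rest = uri[len("hf://"):].partition("/")
    name_and_rev, _, path = rest.partition("/")
    name, _, revision = name_and_rev.partition("@")
    if not namespace or not name:
        raise ValueError(f"Expected 'namespace/name' in {uri!r}")
    return ParsedSketch(f"{namespace}/{name}", revision or None, path)


def is_uri_sketch(value: str) -> bool:
    # The validity check goes through the same cached parser, so a later
    # "real" parse of the same string is a cache hit.
    try:
        parse_cached_sketch(value)
    except ValueError:
        return False
    return True
```

Note that `lru_cache` does not cache raised exceptions, so invalid inputs are re-parsed on every call; the bare decorator uses the default `maxsize=128`.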

namespace: str = field(init=False)
bucket_id: str = field(init=False)
handle: str = field(init=False)
uri: HfUri = field(init=False)

@Wauplin Wauplin May 5, 2026

Collaborator Author


intentional breaking change

IMO ok to break as I don't expect anyone to use it at the moment (and it's nice to consistently use the 'uri' naming)

Collaborator


yes makes sense!

Comment thread src/huggingface_hub/_space_api.py Outdated
revision = f"@{self.revision}" if self.revision else ""
ro = {True: ":ro", False: ":rw", None: ""}.get(self.read_only, "")
return f"hf://{self.type}s/{self.source}{revision}{path}:{self.mount_path}{ro}"
def to_hf_mount_uri(self) -> str:

Collaborator Author


intentional breaking change
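For reference, a runnable version of the URI builder shown in the diff above, wrapped in a hypothetical `VolumeSketch` dataclass. The `type`, `source`, `revision`, `read_only` and `mount_path` attributes come from the snippet; the `path` field and the way its leading `/` is normalized are assumptions, since the diff only shows a local `path` variable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumeSketch:
    type: str  # e.g. "model", "dataset", "space" (rendered as "models", "datasets", ...)
    source: str  # e.g. "namespace/name"
    mount_path: str
    revision: Optional[str] = None
    path: str = ""  # assumption: optional sub-path inside the source repo
    read_only: Optional[bool] = None

    def to_uri(self) -> str:
        revision = f"@{self.revision}" if self.revision else ""
        # Assumption: normalize the sub-path so it always starts with a single "/".
        path = f"/{self.path.lstrip('/')}" if self.path else ""
        # None means "no explicit mode", so no suffix is emitted.
        ro = {True: ":ro", False: ":rw", None: ""}.get(self.read_only, "")
        return f"hf://{self.type}s/{self.source}{revision}{path}:{self.mount_path}{ro}"
```

For example, `VolumeSketch(type="dataset", source="ns/name", mount_path="/data", revision="dev", path="sub", read_only=True).to_uri()` gives `hf://datasets/ns/name@dev/sub:/data:ro`.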

Comment thread src/huggingface_hub/hf_file_system.py
Wauplin and others added 8 commits May 5, 2026 17:30
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `parsed` variable was reused across two branches with incompatible
types (HfUri vs None). Renamed to `parsed_or_none` with an explicit
`HfUri | None` annotation in the second branch to satisfy mypy.
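A small illustration of the mypy issue described in this commit message, using a hypothetical `HfUriStub` type in place of `HfUri`: mypy fixes a variable's type at its first assignment, so binding a possibly-`None` value to the same name in another branch is rejected, while a separately named, explicitly annotated variable type-checks. The helper functions here are hypothetical as well.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HfUriStub:
    """Hypothetical stand-in for HfUri, only used to illustrate typing."""

    id: str


def parse_stub(value: str) -> HfUriStub:
    if not value.startswith("hf://"):
        raise ValueError(f"Not an hf:// URI: {value!r}")
    return HfUriStub(value[len("hf://"):])


def try_parse_stub(value: str) -> HfUriStub | None:
    try:
        return parse_stub(value)
    except ValueError:
        return None


def describe(value: str, strict: bool) -> str:
    if strict:
        parsed = parse_stub(value)  # mypy infers `parsed: HfUriStub` here
        return parsed.id
    # Re-binding `parsed` to `HfUriStub | None` would be an "incompatible types
    # in assignment" error, so a separate, explicitly annotated name is used.
    parsed_or_none: HfUriStub | None = try_parse_stub(value)
    return parsed_or_none.id if parsed_or_none is not None else "<not a URI>"
```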

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Wauplin Wauplin marked this pull request as ready for review May 6, 2026 12:59
@Wauplin Wauplin requested review from hanouticelina and lhoestq May 6, 2026 13:00

Wauplin commented May 6, 2026

Collaborator Author

PR should finally be ready for review! I tried to account for all possible use cases that were previously handled. This PR introduces a few intentional breaking changes (see PR description). Hopefully they won't impact anyone 🤞 Parsing logic should be much cleaner now 😃


lhoestq commented May 6, 2026

Member

Could HfUri work like RepoUrl and be a string subclass? This would avoid bucket.uri.to_uri(), which I find a bit weird.


Wauplin commented May 6, 2026

Collaborator Author

Could HfUri work like RepoUrl and be a string subclass? This would avoid bucket.uri.to_uri(), which I find a bit weird.

I'd rather not, no. RepoUrl is a str subclass for legacy reasons rather than as a deliberate choice. What we can do is add an alias method BucketUrl.to_uri() to avoid BucketUrl.uri.to_uri().

The reason why I don't want to implement something like str(HfUri) is that in the future I'm planning to add a to_url() method to get the url on the Hub of any resource described by a URI (e.g. a repo, a folder in a bucket, a file in a specific revision of a dataset, etc.). That should help simplify some other scattered logic we have + easily print URLs in the CLI.
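A minimal sketch of the design described here, with all names hypothetical: the URI is a plain frozen dataclass (not a `str` subclass) exposing an explicit `to_uri()` and a possible future `to_url()`, plus the proposed `to_uri()` alias on the bucket wrapper. The Hub URL patterns (`/datasets/`, `/spaces/`, `/blob/<revision>/<path>`, defaulting to `main`) are assumptions for illustration; paths are treated as files, and buckets are intentionally left without a URL mapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UriObjectSketch:
    """Hypothetical value object: NOT a str subclass, so every conversion is explicit."""

    type: str  # "model", "dataset", "space" or "bucket" in this sketch
    id: str  # "namespace/name"
    revision: Optional[str] = None
    path: str = ""

    def to_uri(self) -> str:
        revision = f"@{self.revision}" if self.revision else ""
        path = f"/{self.path}" if self.path else ""
        return f"hf://{self.type}s/{self.id}{revision}{path}"

    def to_url(self, endpoint: str = "https://huggingface.co") -> str:
        # Assumed mapping to Hub web pages; not an existing library method.
        prefixes = {"model": "", "dataset": "datasets/", "space": "spaces/"}
        if self.type not in prefixes:
            raise NotImplementedError(f"No URL mapping for {self.type!r} in this sketch")
        url = f"{endpoint}/{prefixes[self.type]}{self.id}"
        if self.path:
            url += f"/blob/{self.revision or 'main'}/{self.path}"
        elif self.revision:
            url += f"/tree/{self.revision}"
        return url


@dataclass
class BucketUrlSketch:
    uri: UriObjectSketch

    def to_uri(self) -> str:
        # Alias suggested in the thread, so call sites avoid `bucket_url.uri.to_uri()`.
        return self.uri.to_uri()
```

Keeping the object separate from `str` leaves room for several explicit renderings (`to_uri()`, `to_url()`) without any of them being the implicit "string form".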


lhoestq commented May 6, 2026

Member

I see! In that case I have a small pref for bucket.uri.to_string(), but not a big deal


Wauplin commented May 6, 2026

Collaborator Author

I see! In that case I have a small pref for bucket.uri.to_string(), but not a big deal

hmm, but I want to be able to do bucket.uri.to_uri() and bucket.uri.to_url() actually

@hanouticelina hanouticelina left a comment

Collaborator


Thank you for simplifying the logic here! The PR looks good overall; I left a couple of comments and I will take another look before merging.

Up to you if you want to do it in another PR, but let's drop the local try/except ValueError blocks in cli/buckets.py now that the global handler exists; non-URI parse failures will still get a reasonable error message.

Collaborator


I love the cleaning done here! ❤️

Comment thread src/huggingface_hub/utils/_hf_uris.py Outdated
Comment thread src/huggingface_hub/_buckets.py Outdated

Comment thread src/huggingface_hub/_buckets.py Outdated
Comment thread src/huggingface_hub/hf_api.py
Comment thread src/huggingface_hub/hf_api.py
Comment thread src/huggingface_hub/hf_file_system.py Outdated

Wauplin commented May 18, 2026

Collaborator Author

Up to you if you want to do it in another PR, but let's drop the local try/except ValueError blocks in cli/buckets.py now that the global handler exists; non-URI parse failures will still get a reasonable error message.

yes correct 👍 addressed in c843f32
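The pattern agreed on here, sketched with hypothetical names (`HfUriErrorSketch`, `run_cli`, `cp_command`): commands let URI errors propagate, and a single top-level wrapper turns them into a one-line message and a non-zero exit code instead of a traceback. The real CLI's handler is not reproduced, and `HfUriErrorSketch` subclasses `ValueError` only for this sketch; `HfUriError`'s actual base class is not assumed.

```python
import sys
from typing import Callable, List, Optional


class HfUriErrorSketch(ValueError):
    """Hypothetical stand-in for HfUriError."""


def run_cli(command: Callable[[List[str]], None], argv: Optional[List[str]] = None) -> int:
    """Run `command`, turning URI errors into a one-line message and exit code 1."""
    try:
        command(argv if argv is not None else sys.argv[1:])
    except HfUriErrorSketch as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1
    return 0


def cp_command(argv: List[str]) -> None:
    # No local try/except: invalid URIs propagate to the single top-level handler.
    for arg in argv:
        if arg.startswith("hf://") and "/" not in arg[len("hf://"):]:
            raise HfUriErrorSketch(f"Invalid hf:// URI {arg!r}: expected 'hf://namespace/name'")
```

Centralizing the handler means each command stays free of error-formatting code, and every URI failure is reported the same way.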


Wauplin commented May 18, 2026

Collaborator Author

@hanouticelina Addressed all of your comments, thanks for the review!

@cursor cursor Bot left a comment



Cursor Bugbot has reviewed your changes using default mode and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 19cdfc0.

Comment thread src/huggingface_hub/hf_file_system.py

@hanouticelina hanouticelina left a comment

Collaborator


Awesome 🔥 thanks again!


Wauplin commented May 19, 2026

Collaborator Author

Youhou! Merging 🙌

@Wauplin Wauplin merged commit 70e6fd9 into main May 19, 2026
21 checks passed
@Wauplin Wauplin deleted the use-parse-hf-uri branch May 19, 2026 12:49
@huggingface-hub-bot

Contributor

This PR has been shipped as part of the v1.16.0 release.


Labels

None yet

Projects

None yet


3 participants