[Core] Migrate hf:// URI parsing to centralized parse_hf_uri#4189
Conversation
Replace scattered ad-hoc hf:// URI parsers with the centralized `parse_hf_uri`/`parse_hf_mount` helpers introduced in #4158. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Add explicit type annotations for `repo_type` and `revision_in_path` to satisfy mypy's narrowing across branches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| return False | ||
|
|
||
|
|
||
| @functools.lru_cache |
There was a problem hiding this comment.
added since some logic is now parsing multiple time the same URI (once to check if it's an URI, then to check if it's a bucket, then to reuse its parsed values, etc.). Easier to use a LRU cache rather than parsing the value once and passing it everywhere
| namespace: str = field(init=False) | ||
| bucket_id: str = field(init=False) | ||
| handle: str = field(init=False) | ||
| uri: HfUri = field(init=False) |
There was a problem hiding this comment.
intentional breaking change
IMO ok to break as I don't expect anyone to use it at the moment (and it's nice to consistently use the 'uri' naming)
| revision = f"@{self.revision}" if self.revision else "" | ||
| ro = {True: ":ro", False: ":rw", None: ""}.get(self.read_only, "") | ||
| return f"hf://{self.type}s/{self.source}{revision}{path}:{self.mount_path}{ro}" | ||
| def to_hf_mount_uri(self) -> str: |
There was a problem hiding this comment.
intentional breaking change
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `parsed` variable was reused across two branches with incompatible types (HfUri vs None). Renamed to `parsed_or_none` with an explicit `HfUri | None` annotation in the second branch to satisfy mypy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
PR should finally be ready for review! I tried to account for all possible use cases that were previously handled. This PR is introducing a few breaking changes (see PR description) that are intentional. Hopefully won't impact anyone 🤞 Parsing logic should be must cleaner now 😃 |
|
Could HfUri work like RepoUrl and be a string subclass ? This would avoid |
I'd rather not no. The reason why I don't want to implement something like |
|
I see ! in that case I have a small pref for |
hmm, but I want to be able to do |
hanouticelina
left a comment
There was a problem hiding this comment.
Thank you for simplifying the logic here! the PR looks globally good, I left a couple of comments and I will take another look before merging.
up to you if you want to do it in another PR, but let's drop the local try/except ValueError blocks in cli/buckets.py now that the global handler exists, non-URI parse failures still get a reasonable error message.
There was a problem hiding this comment.
I love the cleaning done here! ❤️
| namespace: str = field(init=False) | ||
| bucket_id: str = field(init=False) | ||
| handle: str = field(init=False) | ||
| uri: HfUri = field(init=False) |
yes correct 👍 addressed in c843f32 |
|
@hanouticelina Addressed all of your comments, thanks for the review! |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default mode and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 19cdfc0. Configure here.
hanouticelina
left a comment
There was a problem hiding this comment.
Awesome 🔥 thanks again!
|
Youhou! Merging 🙌 |
|
This PR has been shipped as part of the v1.16.0 release. |

Summary
Follow-up to #4158. Replaces all scattered ad-hoc
hf://URI parsers with the centralizedparse_hf_uri/parse_hf_mounthelpers.Not touched (as mentioned in #4158):
repo_type_and_id_from_hf_idandRepoUrl— these are mixed parsers accepting bothhf://URIs andhttps://huggingface.co/...URLs and need a separate migration plan.Breaking changes
BucketUrl.handle→BucketUrl.uri— Attribute renamed and type changed fromstrtoHfUri. Any code accessingbucket_url.handlemust switch tobucket_url.uri(use.to_uri()for the string form).Volume.to_hf_handle()→Volume.to_uri()— Method renamed. Any code calling.to_hf_handle()must switch to.to_uri().Single-segment repo IDs no longer supported in
HfFileSystem— Paths likegpt2,gpt2/file.txt, orgpt2@dev/file.txtno longer resolve. Repo IDs must use thenamespace/nameformat (e.g.username/gpt2).Single-segment repo IDs rejected in CLI
-vflags —hf://gpt2:/datais no longer accepted. Must usehf://namespace/name:/mountform.Improvements
@in filenames is now treated as literal (e.g.hf://a/b/file@v1.txtparses correctly instead of erroring).HfUriErroris now caught by the CLI error handler and shows a clean message instead of a raw traceback.is_hf_uripublic helper for validatinghf://URIs.🤖 Generated with Claude Code
Note
Medium Risk
Medium risk because this is a cross-cutting refactor of URI parsing that introduces breaking API/CLI behavior changes (renamed fields/methods and stricter URI formats), which could impact downstream integrations and edge-case path resolution.
Overview
Centralizes all
hf://parsing onparse_hf_uri/parse_hf_mountand removes multiple ad-hoc parsers across buckets, copy helpers, CLI volume parsing, andHfFileSystem.Introduces breaking API/CLI changes:
BucketUrl.handleis replaced byBucketUrl.uri(HfUri),Volume.to_hf_handle()is renamed toto_uri(), and CLI output/docs/examples are updated from “handle” wording to canonical “URI” formatting.Tightens URI validation/semantics:
HfFileSystemand CLI-v/--volumenow reject single-segment repo IDs (must benamespace/name), CLI errors now formatHfUriErrorcleanly, and URI parsing is improved to treat@in filenames as literal (while still supporting revision markers where valid).Reviewed by Cursor Bugbot for commit a471279. Bugbot is set up for automated code reviews on this repo. Configure here.