[CLI] Add `hf models card` and `hf datasets card` commands #4118
Add commands to fetch model/dataset cards (README) from the Hub with three output modes: full card (default), `--metadata` (YAML frontmatter as JSON), and `--text` (markdown body only).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4118      +/-   ##
==========================================
+ Coverage   75.00%   77.01%   +2.00%
==========================================
  Files         145      167      +22
  Lines       13978    18948    +4970
==========================================
+ Hits        10484    14592    +4108
- Misses       3494     4356     +862
Wauplin left a comment
Thanks for the addition. Could you also add `hf spaces card`? It can be useful for quickly checking the Space metadata.
Review comment on `class TestModelsCardCommand:`
Can you replace all the tests with real-world ones? For example:

    def test_models_card_full(self, runner: CliRunner) -> None:
        # Live call against the Hub: no mocks, just check the card content.
        result = runner.invoke(app, ["models", "card", "Qwen/Qwen3.6-35B-A3B"])
        assert "library_name: transformers" in result.stdout
        assert "# Qwen3.6-35B-A3B" in result.stdout

No mocks and no need to check the exit code, which makes the whole test more readable IMO.
Co-authored-by: Lucain <lucainp@gmail.com>
- Add `hf spaces card` command to complete the models/datasets/spaces trio
- Replace mocked unit tests for models/datasets card with single live tests using @with_production_testing (Wauplin's preferred pattern)
- Add live test for spaces card
- Document `hf spaces card` in CLI guide
- Regenerate package_reference/cli.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous examples used enzostvs/deepsite (consistent with `hf spaces info`) but that Space has no public README, so `hf spaces card enzostvs/deepsite` returns 404. Switch examples, docs, and the live test to mteb/leaderboard, which has a public card.
Also tighten the dataset live-test assertion to check for the body heading rather than just "FineWeb" (which appears in both YAML and body).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cc @Wauplin Updated tests and added support for Spaces cards.
This PR has been shipped as part of the v1.13.0 release.
Summary
- `hf models card <model_id>` and `hf datasets card <dataset_id>` commands that print the repo card (README) to stdout
- `--metadata` (just the YAML frontmatter as JSON), `--text` (just the markdown body)
- `--metadata` and `--text` are mutually exclusive
Motivation
`hf models info` gives you structured Hub metadata (downloads, tags, pipeline_tag, siblings, etc.) but not the human-authored card content. The card text is where you find the stuff that `info` doesn't surface: usage examples with actual code, training details, known limitations, intended use cases, benchmark results with context, and architecture descriptions. Put simply: `info` tells you what a model is, `card` tells you how to use it and why.
`info` does include a `card_data` field, but it's the raw YAML string, not parsed. `--metadata` returns the same data as structured JSON via `out.dict()`, so it works with `--format` and is easy to pipe into `jq` or consume programmatically.
Agents and humans can already get card content via `hf download <repo_id> README.md` or curl, but that writes to a file and gives you the raw README with no way to split the YAML frontmatter from the prose. `hf models card` outputs directly to stdout, and the `--metadata`/`--text` flags let you grab just the part you need.
For agents specifically, having a low-friction way to read model documentation helps reduce hallucination. Agents tend to default to recommending models they've memorised from training data (often outdated, e.g. still reaching for early Llama models), and fabricate usage details rather than checking the actual card. A single command that returns the real card content makes it easy for agents to look things up rather than guess. This is particularly valuable for newer models that post-date the agent's training cutoff.
For humans, it's a quick way to check a model's docs from the terminal without opening a browser — useful when comparing models or scripting.
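To make the frontmatter/body split from the motivation concrete, here is a minimal stdlib-only sketch. It is not the PR's implementation: `split_card`, `naive_metadata`, and the sample card are hypothetical names and data introduced for illustration, and the flat `key: value` parsing is a deliberate simplification standing in for a real YAML parser.

```python
import json


def split_card(content: str) -> tuple[str, str]:
    """Split a README into (frontmatter_yaml, markdown_body).

    Assumes the frontmatter, when present, starts with a '---' line
    and ends at the next '---' line.
    """
    lines = content.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", content  # no frontmatter at all
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            yaml_block = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return yaml_block, body
    return "", content  # unterminated frontmatter: treat everything as body


def naive_metadata(yaml_block: str) -> dict[str, str]:
    """Parse flat 'key: value' pairs only (hypothetical helper, not real YAML parsing)."""
    data = {}
    for line in yaml_block.splitlines():
        key, sep, value = line.partition(":")
        # Skip nested entries, list items, and comments.
        if sep and key.strip() and not line.startswith((" ", "-", "#")):
            data[key.strip()] = value.strip()
    return data


# Hypothetical sample card, not taken from any real repo.
sample = """---
license: apache-2.0
library_name: transformers
---
# Example model

Usage notes go here.
"""

yaml_block, body = split_card(sample)
print(json.dumps(naive_metadata(yaml_block), indent=2))  # roughly the part --metadata exposes
print(body)                                              # roughly the part --text exposes
```

The point of the sketch is only that the two flags address disjoint parts of the same file, which is why grabbing one without the other is awkward when all you have is the raw README.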
Examples
Design notes
- `--metadata` not `--yaml`: We considered `--yaml` (the source format) and `--frontmatter` (the structural term) but went with `--metadata` because it describes what you're extracting rather than where it lives. It also pairs cleanly with `--text`: both flags describe the kind of content you want. And it avoids confusion with `--format`, which controls output format: `--metadata --format json` reads clearly as "give me the metadata, formatted as JSON".
- `--revision` support: `RepoCard.load()` doesn't currently pass `revision` to `hf_hub_download`. Could be added to `RepoCard.load()` in a follow-up and then wired through here.
- `--format` is accepted even though the default and `--text` modes output free-form text (where `--format json` produces no output, same as `hf papers read`). We kept it because `--metadata` goes through `out.dict()` and genuinely benefits from it (e.g. `--format human` for pretty-printed JSON). This follows the majority CLI pattern: `hf papers read` is the only command that omits `--format`.
🤖 Generated with Claude Code
Note
Low Risk
Low risk: adds new read-only CLI subcommands that fetch and print repo card content, with minimal impact on existing command behavior.
Overview
Adds new `hf models card` and `hf datasets card` CLI subcommands to fetch a repo card (README) and print it to stdout, with `--metadata` (YAML frontmatter parsed to structured JSON via `out.dict`) or `--text` (markdown body only) modes and a mutual-exclusion check.
Updates the CLI docs/reference to document these new commands and adds CLI tests covering full/metadata/text outputs and invalid flag combinations.
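As a rough illustration of the three modes and the mutual-exclusion check described in the overview, the sketch below uses the stdlib `argparse` as a stand-in for whatever CLI framework the PR actually uses (its implementation is not shown here). The program name `card-sketch` and the functions `build_parser` and `select_mode` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser; the real command is `hf models card`.
    parser = argparse.ArgumentParser(prog="card-sketch")
    parser.add_argument("repo_id")
    # Flags in one mutually exclusive group cannot be combined:
    # argparse reports an error and exits if both are passed.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--metadata", action="store_true")
    group.add_argument("--text", action="store_true")
    return parser


def select_mode(argv: list[str]) -> str:
    """Return which of the three output modes the arguments select."""
    args = build_parser().parse_args(argv)
    if args.metadata:
        return "metadata"
    if args.text:
        return "text"
    return "full"  # default: print the whole card


print(select_mode(["mteb/leaderboard"]))
print(select_mode(["mteb/leaderboard", "--metadata"]))
print(select_mode(["mteb/leaderboard", "--text"]))

try:
    select_mode(["mteb/leaderboard", "--metadata", "--text"])
except SystemExit as exc:
    # argparse signals a usage error by exiting with a non-zero status.
    print("rejected with exit code", exc.code)
```

Declaring the exclusion at the parser level keeps the "invalid flag combination" case out of the command body, which is one plausible way to get the behavior the tests are said to cover.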
Reviewed by Cursor Bugbot for commit 524fc2c. Bugbot is set up for automated code reviews on this repo.