
[CLI] Add file listing to models/datasets/spaces ls#4166

Merged
Wauplin merged 12 commits into main from cursor/repo-file-listing-f6e6
Apr 29, 2026


@Wauplin Wauplin commented Apr 29, 2026

Collaborator

Context: we can list the file tree of a bucket but not of a repo. This PR adds support for that.

For reviewers: most of the code is moved from the existing code in buckets.py into a "repo/bucket-agnostic version". For tests, I've added a few for model repos covering the main use cases, but only 1 each for datasets and spaces (same logic anyway).

Note for a follow-up PR: when there are too many files (>1000?), IMO we should truncate the output and show a warning: "Output has been truncated. Pass --full to get full list". This would be a change for both buckets and repos, so I think it's out of scope for this PR.


Summary

hf models ls, hf datasets ls, and hf spaces ls can now list files from an individual repo when called with a repo ID, matching the existing behavior of hf buckets ls <bucket_id>. When a positional repo_id argument is given, the ls command switches from "list repos" to "list files in that repo" mode.

Supports the same options as bucket file listing: --tree, -R (recursive), -h (human-readable sizes), plus --revision.

Examples:

$ hf models ls meta-llama/Llama-3.2-1B-Instruct --format json | jq ".[].path"
Hint: Use -R to list files recursively.
"original"
".gitattributes"
"LICENSE.txt"
"README.md"
"USE_POLICY.md"
"config.json"
"generation_config.json"
"model.safetensors"
"special_tokens_map.json"
"tokenizer.json"
"tokenizer_config.json"
$ hf datasets ls nvidia/Nemotron-Personas-Korea -R 
              2026-04-20 21:09:08  data/
              2026-04-23 07:42:48  images/
        2504  2026-04-20 21:07:51  .gitattributes
       35929  2026-04-23 07:42:48  README.md
   220263908  2026-04-20 21:09:08  data/train-00000-of-00009.parquet
   220182609  2026-04-20 21:09:08  data/train-00001-of-00009.parquet
   220265631  2026-04-20 21:09:08  data/train-00002-of-00009.parquet
   220283830  2026-04-20 21:09:08  data/train-00003-of-00009.parquet
   220304978  2026-04-20 21:09:08  data/train-00004-of-00009.parquet
   220183836  2026-04-20 21:09:08  data/train-00005-of-00009.parquet
   220384182  2026-04-20 21:09:08  data/train-00006-of-00009.parquet
   220272087  2026-04-20 21:09:08  data/train-00007-of-00009.parquet
   220254045  2026-04-20 21:09:08  data/train-00008-of-00009.parquet
        6148  2026-04-20 23:13:09  images/.DS_Store
       49060  2026-04-20 23:13:09  images/nemotron_personas_korea_age_group_distribution.png
      183123  2026-04-20 23:13:09  images/nemotron_personas_korea_approach.png
      458678  2026-04-20 23:13:09  images/nemotron_personas_korea_education_distribution.png
      108946  2026-04-20 23:13:09  images/nemotron_personas_korea_education_map.png
      152334  2026-04-20 23:13:09  images/nemotron_personas_korea_field_stats.png
      307342  2026-04-20 23:13:09  images/nemotron_personas_korea_household_type_distribution.png
       97610  2026-04-20 23:13:09  images/nemotron_personas_korea_marital_status_distribution.png
      225771  2026-04-20 23:13:09  images/nemotron_personas_korea_occupation.png
      185838  2026-04-20 23:13:09  images/nemotron_personas_korea_schema_en.png
      197596  2026-04-23 07:42:48  images/nemotron_personas_korea_schema_ko.png
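The column layout in the `-R` output above (right-aligned size, timestamp, path, with an empty size column for directories) can be sketched as follows. This is only an illustration, not the code from the PR: `format_size` and `format_line` are hypothetical names, the 12-character size column is inferred from the sample output, and the decimal units with one decimal place used for the human-readable mode are an assumption about what `-h` might print.

```python
from datetime import datetime
from typing import Optional

# Unit suffixes for the human-readable mode (decimal units are an assumption).
_UNITS = ("B", "KB", "MB", "GB", "TB")


def format_size(num_bytes: int, human_readable: bool = False) -> str:
    """Return the raw byte count, or a value with a unit suffix if requested."""
    if not human_readable:
        return str(num_bytes)
    size = float(num_bytes)
    for unit in _UNITS:
        if size < 1000 or unit == _UNITS[-1]:
            # Plain bytes need no decimals; larger units keep one decimal.
            return f"{int(size)} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1000
    raise AssertionError("unreachable: the last unit always returns")


def format_line(
    path: str,
    size: Optional[int],
    mtime: datetime,
    human_readable: bool = False,
) -> str:
    """Format one listing row; directories pass size=None and get a blank column."""
    size_col = "" if size is None else format_size(size, human_readable)
    date_col = mtime.strftime("%Y-%m-%d %H:%M:%S")
    return f"{size_col:>12}  {date_col}  {path}"


if __name__ == "__main__":
    print(format_line("data/", None, datetime(2026, 4, 20, 21, 9, 8)))
    print(format_line(".gitattributes", 2504, datetime(2026, 4, 20, 21, 7, 51)))
    print(format_line(".gitattributes", 2504, datetime(2026, 4, 20, 21, 7, 51), True))
```

Keeping the size column fixed-width even when it is empty is what lets directory rows line up with file rows.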


When called with a repo ID, 'hf models ls', 'hf datasets ls', and
'hf spaces ls' now list files in the corresponding repo, matching
the behavior of 'hf buckets ls <bucket_id>'.

Supports --tree, -R (recursive), -h (human-readable), and --revision.

Shared file listing helpers are factored into _file_listing.py, and
buckets.py is refactored to use them too.

Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
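To illustrate why a helper like the one behind `--tree` can be shared between buckets and repos, here is a minimal sketch that renders a tree from nothing but a flat list of paths. It is not the content of `_file_listing.py`; `build_tree` and `render_tree` are hypothetical names, and the box-drawing style is an assumption about the output.

```python
from typing import Dict, Iterable, List

# A nested dict keyed by path component; files and empty folders map to {}.
Tree = Dict[str, "Tree"]


def build_tree(paths: Iterable[str]) -> Tree:
    """Turn flat 'a/b/c' paths into a nested dict of path components."""
    root: Tree = {}
    for path in paths:
        node = root
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, {})
    return root


def render_tree(node: Tree, prefix: str = "") -> List[str]:
    """Render a nested dict as indented lines with box-drawing connectors."""
    lines: List[str] = []
    names = sorted(node)
    for index, name in enumerate(names):
        is_last = index == len(names) - 1
        lines.append(f"{prefix}{'└── ' if is_last else '├── '}{name}")
        # Children of the last entry are indented without a vertical bar.
        child_prefix = prefix + ("    " if is_last else "│   ")
        lines.extend(render_tree(node[name], child_prefix))
    return lines


if __name__ == "__main__":
    paths = ["README.md", "data/a.parquet", "data/b.parquet"]
    print("\n".join(render_tree(build_tree(paths))))
```

Because the functions only see path strings, the same code would work whether the paths come from a bucket or from a model, dataset, or space repo.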
@bot-ci-comment


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

cursoragent and others added 8 commits April 29, 2026 07:38
The tree_bucket fixture was patching the now-removed _format_mtime
in buckets.py. Updated to patch format_date in _file_listing.py.

Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
Use @with_production_testing against real repos:
- t5-small (model): JSON, quiet, tree, recursive outputs
- rajpurkar/squad (dataset): JSON output
- gradio/theme_builder (space): JSON output

Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
@Wauplin Wauplin requested a review from hanouticelina April 29, 2026 12:19
@Wauplin Wauplin marked this pull request as ready for review April 29, 2026 12:19

@Wauplin Wauplin left a comment

Collaborator Author

✔️ (should be ready for review)

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 25a82b2.

Comment thread src/huggingface_hub/cli/models.py
Comment thread src/huggingface_hub/cli/buckets.py
Comment on lines +121 to +134
if search is not None:
    raise typer.BadParameter("Cannot use --search when listing files.")
if author is not None:
    raise typer.BadParameter("Cannot use --author when listing files.")
if filter is not None:
    raise typer.BadParameter("Cannot use --filter when listing files.")
if num_parameters is not None:
    raise typer.BadParameter("Cannot use --num-parameters when listing files.")
if sort is not None:
    raise typer.BadParameter("Cannot use --sort when listing files.")
if limit != 10:
    raise typer.BadParameter("Cannot use --limit when listing files.")
if expand is not None:
    raise typer.BadParameter("Cannot use --expand when listing files.")

Collaborator Author

this logic is largely duplicated between models, datasets and spaces, but I didn't find a nice way to factorize it while keeping it easy to read, so I kept the duplicated logic
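For reference, one possible way to factor the checks quoted above is a table-driven helper. This is a sketch of an alternative, not what the PR does: `reject_repo_search_options` is a hypothetical name, and `BadParameter` here is a local stand-in for `typer.BadParameter` so the example runs without Typer installed.

```python
from typing import Any, Mapping, Optional


class BadParameter(ValueError):
    """Local stand-in for typer.BadParameter (assumption, keeps the sketch stdlib-only)."""


def reject_repo_search_options(
    options: Mapping[str, Any],
    defaults: Optional[Mapping[str, Any]] = None,
) -> None:
    """Raise if any repo-search option differs from its default in file-listing mode.

    `options` maps parameter names (e.g. "num_parameters") to the values received.
    `defaults` maps names to non-None defaults (e.g. {"limit": 10}); any name
    missing from it is assumed to default to None.
    """
    defaults = defaults or {}
    for name, value in options.items():
        if value != defaults.get(name):
            flag = "--" + name.replace("_", "-")
            raise BadParameter(f"Cannot use {flag} when listing files.")


if __name__ == "__main__":
    # Mirrors the checks above: everything defaults to None except limit=10.
    reject_repo_search_options(
        {"search": None, "author": None, "sort": None, "limit": 10, "expand": None},
        defaults={"limit": 10},
    )
    try:
        reject_repo_search_options({"num_parameters": "min:6B"}, defaults={"limit": 10})
    except BadParameter as err:
        print(err)
```

The trade-off is the one named in the comment: the explicit `if` chain is longer but readable at a glance, while the helper hides the flag names behind a dict. Both versions share one limitation, since comparing against the default means an explicit `--limit 10` cannot be told apart from an omitted `--limit`.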

@hanouticelina hanouticelina left a comment

Collaborator

Looks good!

    expand: ExpandOpt = None,
    human_readable: Annotated[
        bool,
        typer.Option("--human-readable", "-h", help="Show sizes in human readable format (only for listing files)."),

Collaborator

this already exists in buckets and I didn't notice before, but -h will collide with --help, no?

Collaborator Author

not a problem IMO (the -h for human-readable takes precedence)
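The general idea of reserving `-h` for a flag while help stays reachable through `--help` can be shown with the standard library. The CLI in this PR is built on Typer, so the sketch below is only an analogy using `argparse`; it does not demonstrate how Typer resolves the collision, and `build_parser` and the `ls-sketch` program name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where -h means --human-readable and help is --help only."""
    # add_help=False stops argparse from claiming both -h and --help itself.
    parser = argparse.ArgumentParser(prog="ls-sketch", add_help=False)
    parser.add_argument("--help", action="help", help="Show this message and exit.")
    parser.add_argument(
        "-h",
        "--human-readable",
        action="store_true",
        help="Show sizes in human readable format.",
    )
    parser.add_argument("repo_id", nargs="?", help="Repo whose files should be listed.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-h", "some/repo"])
    print(args.human_readable, args.repo_id)
```

With this setup `-h` never prints help, so users who expect `-h` as a help shortcut have to use `--help` instead.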

@Wauplin Wauplin merged commit bc4069b into main Apr 29, 2026
21 checks passed
@Wauplin Wauplin deleted the cursor/repo-file-listing-f6e6 branch April 29, 2026 15:24
@huggingface-hub-bot

Contributor

This PR has been shipped as part of the v1.13.0 release.


3 participants