Skip to content

Avoid reuploading preuploaded LFS files in upload-large-folder#4165

Merged
Wauplin merged 1 commit into
huggingface:mainfrom
Dev-Jahn:fix-upload-large-folder-skip-reupload
Apr 29, 2026
Merged

Avoid reuploading preuploaded LFS files in upload-large-folder#4165
Wauplin merged 1 commit into
huggingface:mainfrom
Dev-Jahn:fix-upload-large-folder-skip-reupload

Conversation

@Dev-Jahn

@Dev-Jahn Dev-Jahn commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

upload_large_folder stores whether a file has already been preuploaded in LocalUploadFileMetadata.is_uploaded, but _build_hacky_operation did not restore that state on the CommitOperationAdd created for the commit step.

As a result, create_commit could call preupload_lfs_files again for files that were already uploaded by the large-folder preupload step. This makes the commit phase unnecessarily re-read/re-upload LFS files after resuming or after preupload completion.

This PR restores _is_uploaded from the local metadata and clears path_or_fileobj for already uploaded LFS files, matching the behavior of preupload_lfs_files after a successful upload.

Tests

  • uv run python -m compileall -q src/huggingface_hub/_upload_large_folder.py tests/test_upload_large_folder.py
  • uv run --with pytest --with pytest-mock --with pytest-env --with-editable . pytest tests/test_upload_large_folder.py -q
  • uv run --with ruff ruff format --check src/huggingface_hub/_upload_large_folder.py tests/test_upload_large_folder.py
  • uv run --with ruff ruff check src/huggingface_hub/_upload_large_folder.py tests/test_upload_large_folder.py

Copilot AI review requested due to automatic review settings April 29, 2026 06:21

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes upload_large_folder resume behavior by restoring the “already preuploaded” LFS state when rebuilding commit operations from local metadata, preventing redundant LFS uploads during the commit phase.

Changes:

  • Restore CommitOperationAdd._is_uploaded from LocalUploadFileMetadata.is_uploaded when building operations.
  • For already-uploaded LFS files, clear path_or_fileobj to match the post-upload “free memory” behavior.
  • Add a regression test ensuring _build_hacky_operation preserves LFS preupload state.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/huggingface_hub/_upload_large_folder.py Propagates is_uploaded into rebuilt commit operations and clears content for preuploaded LFS items to avoid reupload.
tests/test_upload_large_folder.py Adds coverage for preserving LFS preupload state when rebuilding operations from local metadata.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@bot-ci-comment

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@codecov

codecov Bot commented Apr 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.97%. Comparing base (1daa48b) to head (f48dd76).
⚠️ Report is 304 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4165      +/-   ##
==========================================
+ Coverage   75.00%   76.97%   +1.96%     
==========================================
  Files         145      168      +23     
  Lines       13978    19060    +5082     
==========================================
+ Hits        10484    14671    +4187     
- Misses       3494     4389     +895     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Wauplin Wauplin left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding this out and the PR @Dev-Jahn ! I don't get why we never caught this issue but after carefully reviewing this PR + #2254 and #3100, I can confirm this looks like a complete oversight from us. And thanks for the minimal test as well :)

Note that if the metadata is older than 20h, we automatically set is_uploaded to False even if metadata says otherwise because the uploaded Xet file might have been garbage-collected in the meantime (if uploaded but not committed).

Anyway, thanks for the fix!

@Wauplin Wauplin merged commit c1775a5 into huggingface:main Apr 29, 2026
20 of 21 checks passed
@huggingface-hub-bot

Copy link
Copy Markdown
Contributor

This PR has been shipped as part of the v1.13.0 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants