feat: pass skip_sha256=True to hf_xet for bucket uploads #3900

Merged
Wauplin merged 8 commits into main from skip-sha256-bucket-upload on Mar 13, 2026

@Wauplin (Contributor) commented Mar 10, 2026

Bucket uploads don't need SHA-256 in the shard metadata (the sha_index GSI is only used for LFS pointer resolution, which doesn't apply to buckets). Pass skip_sha256=True to hf_xet.upload_files() and upload_bytes() in the bucket upload path to skip the SHA-256 computation, removing the main CPU bottleneck on non-SHA-NI instances.

Depends on: huggingface/xet-core#679

This PR is orthogonal to #3876 (which passes sha256 to hf-xet to avoid recomputation on model/dataset upload).
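
A minimal sketch of what the flag saves, not hf_xet's actual implementation. `fake_upload_files` is a hypothetical stand-in that hashes each payload with SHA-256 unless `skip_sha256=True` is passed. Only the keyword name `skip_sha256` comes from this PR; the function name, the return shape, and the in-memory payloads are assumptions made for illustration.

```python
import hashlib
from typing import Optional


def fake_upload_files(
    payloads: dict[str, bytes], *, skip_sha256: bool = False
) -> dict[str, Optional[str]]:
    """Hypothetical stand-in for an uploader; NOT the real hf_xet API.

    Returns a mapping of name -> SHA-256 hex digest, or None when hashing is skipped.
    """
    metadata: dict[str, Optional[str]] = {}
    for name, data in payloads.items():
        # The full pass over the data is the CPU cost that skip_sha256=True avoids.
        metadata[name] = None if skip_sha256 else hashlib.sha256(data).hexdigest()
    return metadata


payloads = {"a.bin": b"hello", "b.bin": b"world"}

# Model/dataset-style upload: the digest is computed and recorded.
with_hash = fake_upload_files(payloads)

# Bucket-style upload: no digest is computed, so no hashing cost is paid.
without_hash = fake_upload_files(payloads, skip_sha256=True)
```

In this sketch the skipped digest simply shows up as `None`; how hf_xet represents a missing SHA-256 in shard metadata is not stated in this thread.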

cc @XciD


Note

Medium Risk
Touches the bucket upload path and changes parameters passed to hf_xet, which could affect upload metadata/compatibility if downstream expects SHA-256; scope is limited to buckets.

Overview
Bucket uploads via Xet now skip SHA-256 computation. The bucket upload flow in hf_api.py passes skip_sha256=True to hf_xet.upload_files and hf_xet.upload_bytes to avoid hashing overhead.

Adds an integration test (TestBucketXetUploadSkipSha256) that spies on hf_xet calls during batch_bucket_files to assert skip_sha256 is set for both file-path and in-memory byte uploads, and verifies the objects land in the bucket.

Written by Cursor Bugbot for commit 1fddd7e.

Bucket uploads don't need SHA-256 in the shard metadata (the sha_index
GSI is only used for LFS pointer resolution, which doesn't apply to
buckets). Pass skip_sha256=True to hf_xet.upload_files() and
upload_bytes() in the bucket upload path to skip the SHA-256
computation, removing the main CPU bottleneck on non-SHA-NI instances.

Depends on: huggingface/xet-core#679

Co-authored-by: Lucain <Wauplin@users.noreply.github.com>

Replace the two mock-based tests with a single integration test that:
- Creates a real Bucket on staging Hub
- Uploads files from both filepath and bytes in a single batch
- Wraps (not mocks) hf_xet.upload_files and hf_xet.upload_bytes to
  verify skip_sha256=True is passed
- Verifies files are actually uploaded by listing the bucket tree

Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
@Wauplin (Contributor, Author) commented Mar 10, 2026

Let's wait for next hf-xet release + update setup.py once it's done.

The test wraps the real hf_xet functions, so it fails when the
installed hf_xet predates the skip_sha256 parameter (xet-core#679).
Use inspect.signature to detect support and pytest.skip accordingly.

Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
@XciD (Member) commented Mar 10, 2026

cc @rajatarya for viz

cursoragent and others added 2 commits March 10, 2026 09:22
hf_xet.upload_files is a compiled built-in function, so
inspect.signature() raises ValueError. Catch it and skip the test
when the signature can't be introspected (older hf_xet).

Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
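
A hedged sketch of the detection described in the two commits above: use `inspect.signature` to check for the keyword, and treat a signature that cannot be introspected as "unknown". The helper name `accepts_kwarg` and the two sample functions are hypothetical and are not part of the PR.

```python
import inspect
from typing import Callable, Optional


def accepts_kwarg(func: Callable, name: str) -> Optional[bool]:
    """Return True/False if the signature is known, None if it can't be introspected."""
    try:
        params = inspect.signature(func).parameters
    except (ValueError, TypeError):
        # Compiled callables may expose no signature metadata at all.
        return None
    if name in params:
        return True
    # A **kwargs catch-all also accepts the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def new_upload(paths, skip_sha256=False):  # hypothetical newer function
    return paths


def old_upload(paths):  # hypothetical older function without the parameter
    return paths


supported = accepts_kwarg(new_upload, "skip_sha256")
unsupported = accepts_kwarg(old_upload, "skip_sha256")
```

In a test, a `False` or `None` result would map to a `pytest.skip(...)` call. Returning `None` rather than `False` keeps "definitely unsupported" distinct from "could not tell".
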
Use try/except TypeError around upload_files/upload_bytes calls with
skip_sha256=True, falling back to calls without it for older hf_xet
versions. TypeError for unknown kwargs on compiled functions is raised
before any I/O, so the fallback is safe.

Update test to check call_args_list[0] (the first attempt always
includes skip_sha256=True) instead of requiring the function to
accept it.

Co-authored-by: Lucain <Wauplin@users.noreply.github.com>
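
The fallback in the commit above can be sketched as follows, with the same caveat it states: the retry is only safe if the `TypeError` comes from argument binding, before any I/O. `old_upload` and `upload_with_fallback` are hypothetical names used for illustration, and a later comment in this thread drops the fallback entirely.

```python
from unittest.mock import Mock


def old_upload(paths):
    """Hypothetical older uploader that predates skip_sha256."""
    return [f"uploaded:{p}" for p in paths]


def upload_with_fallback(upload_fn, paths):
    """Try the new keyword first; retry without it if the callee rejects it."""
    try:
        return upload_fn(paths, skip_sha256=True)
    except TypeError:
        # Assumption: the TypeError comes from argument binding, before any I/O.
        return upload_fn(paths)


# Wrap the old function so both attempts are recorded.
spy = Mock(wraps=old_upload)
result = upload_with_fallback(spy, ["a.txt"])

# The first attempt always carries skip_sha256=True, even when it is rejected.
first_call, second_call = spy.call_args_list
```

One limitation of this pattern is that a `TypeError` raised inside a callee that does accept the keyword would also trigger the retry, which is a reason to prefer requiring a minimum hf-xet version once one is available.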

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@hanouticelina (Contributor) left a comment

looks good to me! thank you

@Wauplin (Contributor, Author) commented Mar 12, 2026

I've updated the logic to remove the try/except (let's always assume skip_sha256 is exposed) and to clean up the bucket after the test.

Cannot merge until skip_sha256 is exposed publicly in hf-xet (see here)

@Wauplin (Contributor, Author) left a comment

hf-xet 1.4.2 got released with the fix => will merge as soon as CI is ✔️

@Wauplin Wauplin merged commit 787603e into main Mar 13, 2026
21 of 22 checks passed
@Wauplin Wauplin deleted the skip-sha256-bucket-upload branch March 13, 2026 09:33