Streamline and aggregate file updates for reporting to python#340
Merged
Streamline and aggregate file updates for reporting to python#340
Conversation
Collaborator
Author
|
Note that this depends on #328; Will update this PR once that is merged. |
assafvayner
approved these changes
May 19, 2025
Contributor
assafvayner
left a comment
There was a problem hiding this comment.
Can you provide a sample python snippet of how to use the new constructs?
rajatarya
pushed a commit
that referenced
this pull request
Feb 27, 2026
## Summary Wrap download progress updaters in `AggregatingProgressUpdater` to eliminate GIL contention when Python callers provide per-file progress callbacks. The upload path has had this aggregation since v1.1.3 (PR #340), but the download path was missed. Without aggregation, each XORB chunk triggers a `spawn_blocking` + `Python::with_gil()` callback. With many concurrent file downloads, this causes severe GIL contention — measured as a **4x throughput reduction** (3000 MB/s → 750 MB/s on a 25 Gbps link). The fix wraps the caller-provided `TrackingProgressUpdater` in an `AggregatingProgressUpdater` (200ms flush interval) inside `download_file_with_updater()`, matching the pattern already used by `FileUploadSession`. This reduces Python callback frequency from thousands/sec to ~5/sec per file while preserving progress bar feedback. ## Root cause When `huggingface_hub` calls `hf_xet.download_files()`, it passes a per-file Python callback for progress bar updates. On the Rust side, each callback invocation goes through: ``` report_bytes_written() / report_transfer_progress() → tokio::spawn(register_updates()) → spawn_blocking(Python::with_gil(callback)) ``` With the detailed download progress tracking added in PR #645 (hf-xet v1.3.0), both `report_bytes_written` and `report_transfer_progress` fire per chunk, roughly doubling callback frequency. With 8+ concurrent file downloads, each spawning dozens of concurrent XORB streams, the GIL becomes a severe bottleneck. ## History The problem has existed since xet download support was introduced, but worsened over time: | Version | Date | Impact | |---------|------|--------| | `huggingface_hub v0.30.0` / `hf-xet 0.1.x` | Mar 2025 | Moderate — synchronous `with_gil()` per chunk, but hf_xet was an optional extra | | `huggingface_hub v0.31.0` / `hf-xet >=1.1.0` | May 2025 | Moderate — hf-xet became a hard dependency on x86_64/arm64 | | `hf-xet v1.1.3` | Jun 2025 | Upload path fixed with `AggregatingProgressUpdater` (PR #340); download path left unprotected | | `hf-xet v1.3.0` | Feb 2026 | **Severe** — PR #645 added detailed per-chunk progress tracking to downloads, doubling callback frequency without aggregation | PR #340 explicitly noted: *"each [update] has to acquire a global GIL lock. This negatively affects the upload speed on fast connections"* — the same problem, but only the upload side was addressed. ## Benchmarks Downloading 3 safetensors files (16.1 GB total) from `Qwen/Qwen3.5-35B-A3B` on a 25 Gbps machine: | Test | Before | After | |------|--------|-------| | `download_files()` with `progress_updater=None` (baseline) | 3119 MB/s | 3119 MB/s | | `download_files()` with per-file Python callbacks | **746 MB/s** | **1789 MB/s** | | `snapshot_download()` (full Python CLI path with tqdm) | ~750 MB/s | **2395 MB/s** | Progress callback overhead drops from **4x slowdown to <1%**.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
With the current incremental progress updates, the amount of updates going to python is substantial, and each has to acquire a global GIL lock. This negatively affects the upload speed on fast connections.
This PR introduces an intermediate aggregation class that quickly aggregates all the incoming progress updates, then sends the aggregated update list to hf_xet once every 200 ms. With this, the thread contention experienced by the frequent incremental updates is eliminated while still reporting accurate progress to the user.