fix: wrap TrackingProgressUpdater in AggregatingProgressUpdater#668
Merged
rajatarya merged 2 commits intohuggingface:mainfrom Feb 27, 2026
Merged
Conversation
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Wrap download progress updaters in
AggregatingProgressUpdaterto eliminate GIL contention when Python callers provide per-file progress callbacks.The upload path has had this aggregation since v1.1.3 (PR #340), but the download path was missed. Without aggregation, each XORB chunk triggers a
spawn_blocking+Python::with_gil()callback. With many concurrent file downloads, this causes severe GIL contention — measured as a 4x throughput reduction (3000 MB/s → 750 MB/s on a 25 Gbps link).The fix wraps the caller-provided
TrackingProgressUpdaterin anAggregatingProgressUpdater(200ms flush interval) insidedownload_file_with_updater(), matching the pattern already used byFileUploadSession. This reduces Python callback frequency from thousands/sec to ~5/sec per file while preserving progress bar feedback.Root cause
When
huggingface_hubcallshf_xet.download_files(), it passes a per-file Python callback for progress bar updates. On the Rust side, each callback invocation goes through:With the detailed download progress tracking added in PR #645 (hf-xet v1.3.0), both
report_bytes_writtenandreport_transfer_progressfire per chunk, roughly doubling callback frequency. With 8+ concurrent file downloads, each spawning dozens of concurrent XORB streams, the GIL becomes a severe bottleneck.History
The problem has existed since xet download support was introduced, but worsened over time:
huggingface_hub v0.30.0/hf-xet 0.1.xwith_gil()per chunk, but hf_xet was an optional extrahuggingface_hub v0.31.0/hf-xet >=1.1.0hf-xet v1.1.3AggregatingProgressUpdater(PR #340); download path left unprotectedhf-xet v1.3.0PR #340 explicitly noted: "each [update] has to acquire a global GIL lock. This negatively affects the upload speed on fast connections" — the same problem, but only the upload side was addressed.
Benchmarks
Downloading 3 safetensors files (16.1 GB total) from
Qwen/Qwen3.5-35B-A3Bon a 25 Gbps machine:download_files()withprogress_updater=None(baseline)download_files()with per-file Python callbackssnapshot_download()(full Python CLI path with tqdm)Progress callback overhead drops from 4x slowdown to <1%.