Skip to content

fix: wrap TrackingProgressUpdater in AggregatingProgressUpdater#668

Merged
rajatarya merged 2 commits intohuggingface:mainfrom
Hugoch:fix/gil-contention-low-bandwidth
Feb 27, 2026
Merged

fix: wrap TrackingProgressUpdater in AggregatingProgressUpdater#668
rajatarya merged 2 commits intohuggingface:mainfrom
Hugoch:fix/gil-contention-low-bandwidth

Conversation

@Hugoch
Copy link
Member

@Hugoch Hugoch commented Feb 26, 2026

Summary

Wrap download progress updaters in AggregatingProgressUpdater to eliminate GIL contention when Python callers provide per-file progress callbacks.

The upload path has had this aggregation since v1.1.3 (PR #340), but the download path was missed. Without aggregation, each XORB chunk triggers a spawn_blocking + Python::with_gil() callback. With many concurrent file downloads, this causes severe GIL contention — measured as a 4x throughput reduction (3000 MB/s → 750 MB/s on a 25 Gbps link).

The fix wraps the caller-provided TrackingProgressUpdater in an AggregatingProgressUpdater (200ms flush interval) inside download_file_with_updater(), matching the pattern already used by FileUploadSession. This reduces Python callback frequency from thousands/sec to ~5/sec per file while preserving progress bar feedback.

Root cause

When huggingface_hub calls hf_xet.download_files(), it passes a per-file Python callback for progress bar updates. On the Rust side, each callback invocation goes through:

report_bytes_written() / report_transfer_progress()
  → tokio::spawn(register_updates())
    → spawn_blocking(Python::with_gil(callback))

With the detailed download progress tracking added in PR #645 (hf-xet v1.3.0), both report_bytes_written and report_transfer_progress fire per chunk, roughly doubling callback frequency. With 8+ concurrent file downloads, each spawning dozens of concurrent XORB streams, the GIL becomes a severe bottleneck.

History

The problem has existed since xet download support was introduced, but worsened over time:

Version Date Impact
huggingface_hub v0.30.0 / hf-xet 0.1.x Mar 2025 Moderate — synchronous with_gil() per chunk, but hf_xet was an optional extra
huggingface_hub v0.31.0 / hf-xet >=1.1.0 May 2025 Moderate — hf-xet became a hard dependency on x86_64/arm64
hf-xet v1.1.3 Jun 2025 Upload path fixed with AggregatingProgressUpdater (PR #340); download path left unprotected
hf-xet v1.3.0 Feb 2026 Severe — PR #645 added detailed per-chunk progress tracking to downloads, doubling callback frequency without aggregation

PR #340 explicitly noted: "each [update] has to acquire a global GIL lock. This negatively affects the upload speed on fast connections" — the same problem, but only the upload side was addressed.

Benchmarks

Downloading 3 safetensors files (16.1 GB total) from Qwen/Qwen3.5-35B-A3B on a 25 Gbps machine:

Test Before After
download_files() with progress_updater=None (baseline) 3119 MB/s 3119 MB/s
download_files() with per-file Python callbacks 746 MB/s 1789 MB/s
snapshot_download() (full Python CLI path with tqdm) ~750 MB/s 2395 MB/s

Progress callback overhead drops from 4x slowdown to <1%.

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants