
Streamline and aggregate file updates for reporting to python #340

Merged
hoytak merged 9 commits into main from hoytak/250516-aggregate-updates
May 20, 2025

Conversation

@hoytak (Collaborator) commented May 16, 2025

With the current incremental progress updates, the number of updates sent to Python is substantial, and each one has to acquire the global GIL lock. This noticeably reduces upload speed on fast connections.

This PR introduces an intermediate aggregation class that cheaply accumulates all incoming progress updates, then sends the aggregated update list on to Python once every 200 ms. This eliminates the thread contention caused by the frequent incremental updates while still reporting accurate progress to the user.
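
The idea can be sketched in plain Python (an illustrative model, not the actual hf_xet `AggregatingProgressUpdater` API — class and method names here are invented): increments are accumulated under a cheap local lock, and the expensive downstream callback fires at most once per flush interval.

```python
import threading
import time
from collections import defaultdict

class AggregatingUpdater:
    """Sketch of a time-based aggregating progress updater: buffers
    per-item byte counts and forwards a batched dict to a downstream
    callback at most once per flush interval."""

    def __init__(self, flush_callback, flush_interval=0.2):
        self._flush_callback = flush_callback
        self._flush_interval = flush_interval
        self._lock = threading.Lock()
        self._pending = defaultdict(int)
        self._last_flush = time.monotonic()

    def update(self, item, n_bytes):
        # Hot path: just accumulate under a local lock; no downstream call.
        with self._lock:
            self._pending[item] += n_bytes
            now = time.monotonic()
            if now - self._last_flush < self._flush_interval:
                return
            batch, self._pending = dict(self._pending), defaultdict(int)
            self._last_flush = now
        # Expensive downstream call (e.g. one that must take the GIL)
        # happens at most once per interval, outside the lock.
        self._flush_callback(batch)

    def flush(self):
        # Drain any remaining buffered updates (e.g. at end of transfer).
        with self._lock:
            batch, self._pending = dict(self._pending), defaultdict(int)
        if batch:
            self._flush_callback(batch)
```

The key design point is that the per-update cost is one short critical section on a private lock; the global resource (the GIL, in the real code) is only touched on flush.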

@hoytak hoytak requested review from assafvayner and seanses May 16, 2025 20:41
@hoytak (Collaborator, Author) commented May 16, 2025

Note that this depends on #328; will update this PR once that is merged.

@assafvayner (Contributor) left a comment

Can you provide a sample python snippet of how to use the new constructs?

@hoytak hoytak merged commit 4faec0b into main May 20, 2025
4 checks passed
@hoytak hoytak deleted the hoytak/250516-aggregate-updates branch May 20, 2025 18:35
rajatarya pushed a commit that referenced this pull request Feb 27, 2026
## Summary

Wrap download progress updaters in `AggregatingProgressUpdater` to
eliminate GIL contention when Python callers provide per-file progress
callbacks.

The upload path has had this aggregation since v1.1.3 (PR #340), but the
download path was missed. Without aggregation, each XORB chunk triggers
a `spawn_blocking` + `Python::with_gil()` callback. With many concurrent
file downloads, this causes severe GIL contention — measured as a **4x
throughput reduction** (3000 MB/s → 750 MB/s on a 25 Gbps link).

The fix wraps the caller-provided `TrackingProgressUpdater` in an
`AggregatingProgressUpdater` (200ms flush interval) inside
`download_file_with_updater()`, matching the pattern already used by
`FileUploadSession`. This reduces Python callback frequency from
thousands/sec to ~5/sec per file while preserving progress bar feedback.
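
A quick sanity check on the rates quoted above (a sketch; the 64 KiB chunk size is an assumption for illustration, not the actual xet chunk size):

```python
# With a 200 ms flush interval, each file flushes at most once per interval.
flush_interval_s = 0.2
aggregated_rate = 1 / flush_interval_s  # callbacks/sec per file
assert aggregated_rate == 5.0           # the "~5/sec per file" figure

# Without aggregation, callbacks scale with throughput. At the degraded
# 750 MB/s and an assumed 64 KiB chunk size:
throughput_bytes_per_s = 750 * 1024 * 1024
chunk_bytes = 64 * 1024
raw_rate = throughput_bytes_per_s / chunk_bytes
assert raw_rate == 12000.0  # "thousands/sec", consistent with the claim
```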

## Root cause

When `huggingface_hub` calls `hf_xet.download_files()`, it passes a
per-file Python callback for progress bar updates. On the Rust side,
each callback invocation goes through:

```
report_bytes_written() / report_transfer_progress()
  → tokio::spawn(register_updates())
    → spawn_blocking(Python::with_gil(callback))
```

With the detailed download progress tracking added in PR #645 (hf-xet
v1.3.0), both `report_bytes_written` and `report_transfer_progress` fire
per chunk, roughly doubling callback frequency. With 8+ concurrent file
downloads, each spawning dozens of concurrent XORB streams, the GIL
becomes a severe bottleneck.
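
The serialization effect can be mimicked in plain Python with an ordinary lock standing in for the GIL (a toy model; the real path goes through `spawn_blocking` and `Python::with_gil`): when every per-chunk callback must hold one global lock, total wall-clock time grows with the total number of callbacks, regardless of how parallel the downloads are.

```python
import threading
import time

def serialized_callback_time(n_threads, callbacks_per_thread, hold_s):
    """Toy model of GIL contention: each 'callback' holds a single
    global lock for hold_s seconds; returns total wall-clock time."""
    gil = threading.Lock()  # stand-in for the GIL

    def worker():
        for _ in range(callbacks_per_thread):
            with gil:           # every progress callback serializes here
                time.sleep(hold_s)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    t0 = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - t0

# Because all lock holds serialize, 8 threads take roughly 8x as long
# as 1 thread for the same per-thread callback count.
```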

## History

The problem has existed since xet download support was introduced, but
worsened over time:

| Version | Date | Impact |
|---------|------|--------|
| `huggingface_hub v0.30.0` / `hf-xet 0.1.x` | Mar 2025 | Moderate — synchronous `with_gil()` per chunk, but hf_xet was an optional extra |
| `huggingface_hub v0.31.0` / `hf-xet >=1.1.0` | May 2025 | Moderate — hf-xet became a hard dependency on x86_64/arm64 |
| `hf-xet v1.1.3` | Jun 2025 | Upload path fixed with `AggregatingProgressUpdater` (PR #340); download path left unprotected |
| `hf-xet v1.3.0` | Feb 2026 | **Severe** — PR #645 added detailed per-chunk progress tracking to downloads, doubling callback frequency without aggregation |

PR #340 explicitly noted: *"each [update] has to acquire a global GIL
lock. This negatively affects the upload speed on fast connections"* —
the same problem, but only the upload side was addressed.

## Benchmarks

Downloading 3 safetensors files (16.1 GB total) from
`Qwen/Qwen3.5-35B-A3B` on a 25 Gbps machine:

| Test | Before | After |
|------|--------|-------|
| `download_files()` with `progress_updater=None` (baseline) | 3119 MB/s | 3119 MB/s |
| `download_files()` with per-file Python callbacks | **746 MB/s** | **1789 MB/s** |
| `snapshot_download()` (full Python CLI path with tqdm) | ~750 MB/s | **2395 MB/s** |

Progress callback overhead drops from **4x slowdown to <1%**.
