Conversation
**Review Notes**

Reviewed the full diff (+1,824 / -661 across 42 files). The architecture is clean — separating transfer bytes (network) from decompressed bytes (disk) through independent call sites is the right approach, and extracting progress reporting out of the retry wrapper is a cleaner abstraction.

**Items worth discussing**

**What looks good**

(I'm still working through a Claude-assisted review of this PR and will update with more comments/questions as I get through it. But this summary raises some good suggestions around adding comments.)
Good feedback; updated in response to a few of the points there (the atomic ordering, the `merge_in` change, and the unrelated changes).
**rajatarya** left a comment:
Love seeing these changes. Did as thorough a review as I could with all the file changes - and overall I LOVE these changes. Minor comments on adding docstrings and comments.
> `pub fn correctness_verification_tracker() -> Arc<Self> {`
As a debug-only checker, can this function be annotated that way?
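A minimal sketch of the annotation being suggested here: gate the debug-only constructor behind `#[cfg(debug_assertions)]` so it does not exist at all in release builds. `Updater` is a stand-in for the real type; this is illustrative, not the PR's actual code.

```rust
use std::sync::Arc;

// `Updater` stands in for the real progress-updater type.
pub struct Updater;

impl Updater {
    /// Debug-only correctness checker; compiled out of release builds.
    #[cfg(debug_assertions)]
    pub fn correctness_verification_tracker() -> Arc<Self> {
        Arc::new(Updater)
    }
}

fn main() {
    // Call sites must be gated the same way (or use cfg!() with an Option).
    #[cfg(debug_assertions)]
    let _tracker = Updater::correctness_verification_tracker();
    println!("ok");
}
```

An alternative, shown elsewhere in this PR, is a runtime `cfg!(debug_assertions)` check that yields `Option::None` in release builds; the attribute form removes the symbol entirely.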
> `pub xorb_hash: MerkleHash,`
> `pub chunk_range: ChunkRange,`
> `pub xorb_block_index: usize,`
> `/// All file-term chunk ranges covered by this xorb block, sorted by range start.`
Needs a comment for the pub struct here.
> `progress_updater: None,`
> `progress_updater: {`
> `    if cfg!(debug_assertions) {`
> `        Some(DownloadTaskUpdater::correctness_verification_tracker())`
nit: I'd love it if there was a comment explaining that this is a test hook for unit-tests and is compiled out.
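A minimal sketch of the test-hook pattern discussed in this thread: in debug builds a verification tracker is installed as the progress updater, and in release builds the branch is compiled out and the field stays `None`. `Tracker` is a stand-in for the real `DownloadTaskUpdater`; the shape is illustrative only.

```rust
use std::sync::Arc;

struct Tracker;

// Test-only hook: lets unit tests verify progress accounting.
// cfg!(debug_assertions) is a compile-time constant, so the dead branch
// is optimized away in release builds.
fn debug_progress_updater() -> Option<Arc<Tracker>> {
    if cfg!(debug_assertions) {
        Some(Arc::new(Tracker))
    } else {
        None
    }
}

fn main() {
    let updater = debug_progress_updater();
    // Present exactly when the build has debug assertions enabled.
    assert_eq!(updater.is_some(), cfg!(debug_assertions));
    println!("updater installed: {}", updater.is_some());
}
```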
> `#[cfg(debug_assertions)]`
> `{`
> `    let refs = &file_term.xorb_block.references;`
It would be good if there was a comment describing this assert. Not sure if folks down the road will remember why these debug assertions exist.
> `reconstruct_and_verify_full(&client, &file_contents, test_config()).await;`
> `}`
> `// ==================== Progress tracker tests ====================`
Love seeing all the new tests!
> `let self_ = Arc::new(self);`
> `let try_count = AtomicUsize::new(0);`
> `// Extract the partial progress reporting function from the connection permit if it exists`
Love seeing this progress reporting getting pulled out of retry wrapping. Definitely cleaner abstractions this way!
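A sketch of the abstraction being praised here, with all names invented for illustration: the retry loop knows nothing about progress reporting, and the caller extracts the progress callback (in the real code, from the connection permit) and invokes it inside the retried closure only when an attempt succeeds.

```rust
// Generic retry loop: no progress-reporting parameters in its signature.
// A real implementation would add backoff between attempts.
fn retry<T, E>(mut attempts: u32, mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(_) if attempts > 1 => attempts -= 1, // transient failure: try again
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    let mut progress = 0u64;
    let mut report = |n: u64| progress += n; // extracted progress callback

    let mut tries = 0;
    let result: Result<&str, &str> = retry(3, || {
        tries += 1;
        if tries < 3 {
            return Err("transient");
        }
        report(128); // report only for the successful attempt
        Ok("done")
    });
    assert_eq!(result, Ok("done"));
    assert_eq!(progress, 128);
    println!("ok after {tries} tries");
}
```

Keeping reporting out of the retry wrapper avoids double-counting bytes across failed attempts, which appears to be part of the cleanup this comment applauds.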
> `match result {`
> `    Ok((_compressed_len, chunk_byte_indices)) => {`
> `        if let Some(expected) = uncompressed_size_if_known {`
> `            debug_assert_eq!(`
I think this should log `error!()` or `warn!()` if the expected bytes are not returned.
This code still returns `Ok()` in that case; does that cause problems downstream?
In this case, it could mean that the remote data is corrupted somehow, so I'm not sure what to do at that point. It's one of the things that should never happen. Practically, however, it will still use the remote data as reference and proceed as normal...
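A hedged sketch of the suggestion in this thread: keep the debug assertion, but also surface the mismatch in release builds, where `debug_assert_eq!` compiles out. `check_decompressed_len` is a hypothetical helper, not code from the PR, and the real code would use the `tracing` `warn!`/`error!` macros rather than `eprintln!`.

```rust
// Returns true when the decompressed length matches the expected size
// (or no expectation is known). A mismatch likely means the remote data
// is corrupted: it should never happen, but is logged rather than fatal.
fn check_decompressed_len(actual: usize, expected: Option<usize>) -> bool {
    match expected {
        Some(exp) if exp != actual => {
            eprintln!("warn: expected {exp} decompressed bytes, got {actual}");
            debug_assert_eq!(exp, actual, "decompressed size mismatch");
            false
        }
        _ => true,
    }
}

fn main() {
    // Matching or unknown sizes pass the check.
    assert!(check_decompressed_len(1024, Some(1024)));
    assert!(check_decompressed_len(1024, None));
    println!("ok");
}
```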
> `serialized_data.clone(),`
> `block_size,`
> `upload_reporter.clone(),`
> `);`
Love seeing the progress reporting have its own clean abstraction and having Upload and Download have their own ProgressStream implementations.
Co-authored-by: Cursor <cursoragent@cursor.com>
## Summary

Wrap download progress updaters in `AggregatingProgressUpdater` to eliminate GIL contention when Python callers provide per-file progress callbacks.

The upload path has had this aggregation since v1.1.3 (PR #340), but the download path was missed. Without aggregation, each XORB chunk triggers a `spawn_blocking` + `Python::with_gil()` callback. With many concurrent file downloads, this causes severe GIL contention — measured as a **4x throughput reduction** (3000 MB/s → 750 MB/s on a 25 Gbps link).

The fix wraps the caller-provided `TrackingProgressUpdater` in an `AggregatingProgressUpdater` (200ms flush interval) inside `download_file_with_updater()`, matching the pattern already used by `FileUploadSession`. This reduces Python callback frequency from thousands/sec to ~5/sec per file while preserving progress bar feedback.

## Root cause

When `huggingface_hub` calls `hf_xet.download_files()`, it passes a per-file Python callback for progress bar updates. On the Rust side, each callback invocation goes through:

```
report_bytes_written() / report_transfer_progress()
  → tokio::spawn(register_updates())
    → spawn_blocking(Python::with_gil(callback))
```

With the detailed download progress tracking added in PR #645 (hf-xet v1.3.0), both `report_bytes_written` and `report_transfer_progress` fire per chunk, roughly doubling callback frequency. With 8+ concurrent file downloads, each spawning dozens of concurrent XORB streams, the GIL becomes a severe bottleneck.
## History

The problem has existed since xet download support was introduced, but worsened over time:

| Version | Date | Impact |
|---------|------|--------|
| `huggingface_hub v0.30.0` / `hf-xet 0.1.x` | Mar 2025 | Moderate — synchronous `with_gil()` per chunk, but hf_xet was an optional extra |
| `huggingface_hub v0.31.0` / `hf-xet >=1.1.0` | May 2025 | Moderate — hf-xet became a hard dependency on x86_64/arm64 |
| `hf-xet v1.1.3` | Jun 2025 | Upload path fixed with `AggregatingProgressUpdater` (PR #340); download path left unprotected |
| `hf-xet v1.3.0` | Feb 2026 | **Severe** — PR #645 added detailed per-chunk progress tracking to downloads, doubling callback frequency without aggregation |

PR #340 explicitly noted: *"each [update] has to acquire a global GIL lock. This negatively affects the upload speed on fast connections"* — the same problem, but only the upload side was addressed.

## Benchmarks

Downloading 3 safetensors files (16.1 GB total) from `Qwen/Qwen3.5-35B-A3B` on a 25 Gbps machine:

| Test | Before | After |
|------|--------|-------|
| `download_files()` with `progress_updater=None` (baseline) | 3119 MB/s | 3119 MB/s |
| `download_files()` with per-file Python callbacks | **746 MB/s** | **1789 MB/s** |
| `snapshot_download()` (full Python CLI path with tqdm) | ~750 MB/s | **2395 MB/s** |

Progress callback overhead drops from **4x slowdown to <1%**.
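The aggregation idea described above can be sketched in a single-threaded form: accumulate byte counts and invoke the expensive callback (in hf-xet, a GIL-holding Python call) at most once per flush interval. The real `AggregatingProgressUpdater` is async and shared across tasks; only the batching logic is shown, and the type and field names here are illustrative.

```rust
use std::time::{Duration, Instant};

struct AggregatingUpdater<F: FnMut(u64)> {
    callback: F,
    pending: u64,
    last_flush: Instant,
    interval: Duration,
}

impl<F: FnMut(u64)> AggregatingUpdater<F> {
    fn new(callback: F, interval: Duration) -> Self {
        Self { callback, pending: 0, last_flush: Instant::now(), interval }
    }

    /// Called once per chunk; cheap unless the flush interval has elapsed.
    fn update(&mut self, bytes: u64) {
        self.pending += bytes;
        if self.last_flush.elapsed() >= self.interval {
            self.flush();
        }
    }

    /// Deliver accumulated progress and reset the timer.
    fn flush(&mut self) {
        if self.pending > 0 {
            (self.callback)(self.pending);
            self.pending = 0;
        }
        self.last_flush = Instant::now();
    }
}

fn main() {
    let mut total = 0u64;
    let mut calls = 0u32;
    {
        let mut agg = AggregatingUpdater::new(
            |b| { total += b; calls += 1; },
            Duration::from_millis(200),
        );
        for _ in 0..1000 {
            agg.update(64 * 1024); // per-chunk updates; no callback per chunk
        }
        agg.flush(); // final flush when the transfer completes
    }
    println!("{total} bytes reported in {calls} callback(s)");
    assert_eq!(total, 1000 * 64 * 1024);
}
```

Because the callback fires on a timer rather than per chunk, total bytes are preserved exactly while callback frequency is bounded by the interval, which is what makes the progress bars accurate yet cheap.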
This PR adds detailed progress reporting to the download path.
To support this, `UploadProgressStream` was split into three classes: a common `StreamProgressReporter`, plus download- and upload-specific versions. This also allows us to simplify the `RetryWrapper` API.
More tracking was added to the file reconstruction paths to properly report progress.
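A hedged sketch of the split described above: a shared `StreamProgressReporter` core reused by upload- and download-specific wrappers. Apart from the name `StreamProgressReporter`, every type and method name here is illustrative guesswork, not the PR's real API.

```rust
// Shared core: counts bytes reported through any stream.
#[derive(Default)]
struct StreamProgressReporter {
    bytes_reported: u64,
}

impl StreamProgressReporter {
    fn report(&mut self, n: u64) {
        self.bytes_reported += n;
    }
}

// Upload side: only bytes sent on the wire matter.
#[derive(Default)]
struct UploadProgressStream {
    reporter: StreamProgressReporter,
}

impl UploadProgressStream {
    fn on_bytes_sent(&mut self, n: u64) {
        self.reporter.report(n);
    }
}

// Download side: transfer bytes (network) and decompressed bytes (disk) are
// tracked separately, matching the distinction this PR draws.
#[derive(Default)]
struct DownloadProgressStream {
    transfer: StreamProgressReporter,
    written: StreamProgressReporter,
}

impl DownloadProgressStream {
    fn on_transfer(&mut self, n: u64) {
        self.transfer.report(n);
    }
    fn on_bytes_written(&mut self, n: u64) {
        self.written.report(n);
    }
}

fn main() {
    let mut dl = DownloadProgressStream::default();
    dl.on_transfer(512);       // compressed bytes off the network
    dl.on_bytes_written(2048); // decompressed bytes to disk
    println!("{} {}", dl.transfer.bytes_reported, dl.written.bytes_reported);
}
```

Factoring the counting into one core type is what lets the retry wrapper take a single reporter interface instead of upload- and download-specific parameters.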