Skip to content

Scale download buffer memory limit by number of active downloads#666

Merged
hoytak merged 1 commit intomainfrom
hoytak/260219-adjust-download-totals
Feb 27, 2026
Merged

Scale download buffer memory limit by number of active downloads#666
hoytak merged 1 commit intomainfrom
hoytak/260219-adjust-download-totals

Conversation

@hoytak
Copy link
Collaborator

@hoytak hoytak commented Feb 25, 2026

Currently, the maximum number of downloaded files is fixed, regardless of the number of downloads currently in flight. However, as the number of downloads increases, a fixed size total could lead to waiting on individual segments that download out-of-order or don't have enough turnaround time to saturate the output. While writing to disk or the download itself often becomes the bottleneck before these effects, planned features such as streaming files and caching could be affected by this limit. The default formula for the download buffer size now is (2GB + 512MB * number of concurrent downloads) up to a maximum of 8GB (these are adjustable).

This PR alleviates this by allocating an additional 512MB buffer allocation per file, prioritized to the specific download, releasing that capacity when the file finishes downloading. This is done using the AdjustableSemaphore class, first introduced for the concurrent scaling, which allows the number of total permits in a semaphore to be incremented or decremented; on decrement, permits are discarded upon return until the total permits is at the target number.

@rajatarya
Copy link
Collaborator

(just need to point out this is PR 666 in xet-core.)

Copy link
Collaborator

@rajatarya rajatarya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Scale download buffer memory limit by number of active downloads

Well-designed PR that solves a real problem — fixed buffer sizes limiting throughput as concurrent downloads scale. The AdjustableSemaphore rewrite is thorough and the virtual permit pattern for bypassing the FIFO queue is elegant. Clean removal of ResourceSemaphore with its functionality properly absorbed. Test coverage is strong.

Suggestions

  • Virtual permit Drop: add debug_assert! to guard against underflow in the virtual-permit branch
  • split() with n=0: returns a semantically odd zero-permit with None inner — consider returning None
  • Typo: "bipasses" → "bypasses" in doc comment
  • Discarded result: let _ = decrement_total_permits(1) — brief comment on why result is intentionally unused
  • Relaxed ordering on active_downloads: concurrent download starts can see stale counts, computing lower targets — seed permit may not be granted to second concurrent starter (benign but worth documenting)
  • Breaking env var removal: HF_XET_RECONSTRUCTION_DOWNLOAD_BUFFER_PERMIT_BASIS is gone — worth mentioning in PR description
  • use_vectored_write default → true: unrelated behavioral change — should be called out in PR description or split out

} else {
// Virtual permit (from increment_total_permits): release non-consumed
// permits into the semaphore.
let to_return = (num_permits - decreases_resolved) as usize;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential underflow if decreases_resolved > num_permits. While attempt_sub should clamp, a debug_assert!(decreases_resolved <= num_permits) here would make the invariant explicit and catch bugs early.

/// Returns a permit holding the newly added capacity, or `None`
/// if no permits could be added (already at max). The permits enter the
/// semaphore when the returned permit is dropped. This allows a user to
/// acquire a permit immediately that bipasses the FIFO queue so the caller
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "bipasses" → "bypasses"

}
let physical_n = physical_n as u32;

self.num_permits -= physical_n;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When n == 0, this returns Some(AdjustableSemaphorePermit { permit: None, num_permits: 0, .. }) — a zero-permit virtual permit. On drop it harmlessly calls attempt_sub(0) + add_permits(0). Consider returning None for n == 0 to avoid the odd semantics.

if state_lg.last_adjustment_time.elapsed() > self.min_concurrency_decrease_delay {
let old_concurrency = self.concurrency_semaphore.total_permits();
self.concurrency_semaphore.decrement_total_permits();
let _ = self.concurrency_semaphore.decrement_total_permits(1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the old code also silently ignored the bool return, but now that it returns Option<u64> with richer info, a brief // Already at min; ok to ignore comment would help readers.

@rajatarya
Copy link
Collaborator

I had Claude Code actually to the PR review and then add the comments as if I would have - by adding them to line numbers in files as separate comments. This way we can discuss more easily in thread in the PR. Most of these comments are minor nits/typos, but I think the relaxed ordering comment is worth considering.

@hoytak
Copy link
Collaborator Author

hoytak commented Feb 27, 2026

I had Claude Code actually to the PR review and then add the comments as if I would have - by adding them to line numbers in files as separate comments. This way we can discuss more easily in thread in the PR. Most of these comments are minor nits/typos, but I think the relaxed ordering comment is worth considering.

Fixed the small stuff. I think the Relaxed ordering comment is actually wrong -- you won't see that behavior with atomics, even with Relaxed ordering. See my comment above.

@hoytak hoytak force-pushed the hoytak/260219-adjust-download-totals branch from 4651ea1 to 6e97bac Compare February 27, 2026 01:07
Copy link
Collaborator

@rajatarya rajatarya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for quickly addressing the feedback! And for explaining how relaxed atomics work to me.

@hoytak hoytak merged commit 543914d into main Feb 27, 2026
7 checks passed
@hoytak hoytak deleted the hoytak/260219-adjust-download-totals branch February 27, 2026 19:35
hoytak added a commit that referenced this pull request Feb 28, 2026
Resolves conflicts between V2 multi-range reconstruction and main's
streaming data writer (#656) and dynamic buffer scaling (#666):
- Keeps main's RunState, cancellation, DownloadStream, and dynamic
  buffer scaling infrastructure intact.
- Applies V2 multi-range XorbBlockData format (chunk_offsets with
  chunk indices for disjoint ranges) on top.
- Preserves both V2 tests (V1 fallback, max ranges, disjoint ranges)
  and main's cancellation tests.

Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants