Skip to content

Streaming data writer#656

Merged
hoytak merged 2 commits intomainfrom
hoytak/260211-streaming-data-writer
Feb 27, 2026
Merged

Streaming data writer#656
hoytak merged 2 commits intomainfrom
hoytak/260211-streaming-data-writer

Conversation

@hoytak
Copy link
Collaborator

@hoytak hoytak commented Feb 13, 2026

This PR adds an integrated API for streaming downloads, exposing a DownloadStream object that is integrated with the file reconstructor. It also uses the same memory management buffer limiting process to work with the stream object.

It also introduces cancellation support to the FileReconstructor to ensure that tasks waiting on a long running download or semaphore wait don't cause things to hang when an error is reported or the user drops the stream.

Base automatically changed from hoytak/260211-download-group to main February 20, 2026 01:43
@hoytak hoytak force-pushed the hoytak/260211-streaming-data-writer branch from 1afdd5c to 4ac2baa Compare February 23, 2026 22:54
@hoytak hoytak marked this pull request as ready for review February 23, 2026 23:01
/// The provided `source_range` is interpreted against the original file; output
/// starts at the writer's current position.
///
/// This path does not acquire the session-level file download semaphore.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why not?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, and I'm willing to change this. First, currently our Box interface doesn't use it -- which may be an oversight. Second, it strikes me that the current parallel limit is mainly due to how resources have been previously handled, and those are controlled using other methods now. Inefficient disk access, however, is a valid reason, so it makes some sense in the file downloads, especially since the typical behavior there is to give it a large group of files and have those files be downloaded as group as efficiently as possible.

With streams, it's usually that the user is controlling the group behavior above this API, so if someone is streaming something then the max parallel limit is just another unneeded constraint and would likely just serve to delay the start of the download stream.

However, I am definitely open to changing this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this makes sense -- but also there is no way for this function to know if the write: W is writing to a file or a stream. So I'm actually thinking in another direction: how about we move the all semaphore controls (introduced to download_file and download_file_with_updater) to a higher level? The reason is that the new XetSession can report accurately if a task is in a queued state or a running state by managing the semaphore control itself.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocking issue to merge this PR

@hoytak hoytak force-pushed the hoytak/260211-streaming-data-writer branch from f3db814 to 8b9fb3c Compare February 27, 2026 16:12
@hoytak hoytak merged commit 9b3278a into main Feb 27, 2026
7 checks passed
@hoytak hoytak deleted the hoytak/260211-streaming-data-writer branch February 27, 2026 23:08
hoytak added a commit that referenced this pull request Feb 28, 2026
Resolves conflicts between V2 multi-range reconstruction and main's
streaming data writer (#656) and dynamic buffer scaling (#666):
- Keeps main's RunState, cancellation, DownloadStream, and dynamic
  buffer scaling infrastructure intact.
- Applies V2 multi-range XorbBlockData format (chunk_offsets with
  chunk indices for disjoint ranges) on top.
- Preserves both V2 tests (V1 fallback, max ranges, disjoint ranges)
  and main's cancellation tests.

Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants