Skip to content

Conversation

@badeend
Copy link
Member

@badeend badeend commented Aug 14, 2024

Fixes #9058


All input streams are implemented using the HostInputStream trait. Except files. They got special treatment. This can be seen in the definition of InputStream:

pub enum InputStream {
Host(Box<dyn HostInputStream>),
File(FileInputStream),
}

The special case was introduced to work around the fact that OS'es don't actually provide any true async APIs for files. A more detailed explanation can be read in the PR that introduced this setup: #6556


This PR properly implements the HostInputStream & HostOutputStream traits for files. And with "properly" I mean:

  • methods that shouldn't block (e.g. read & write) now really don't block.
  • methods that are allowed to block (e.g. blocking_read & blocking_write_and_flush) continue to block and still take advantage of allow_blocking_current_thread.

Primary changes:

  • InputStream has been changed from an enum to a type alias, just like Outputstream already is.
  • The Host(Input/Output)Stream traits have been extended to allow implementors to provide specialized blocking_* implementations. In this PR, only the File streams actually override these.
  • Added an async fn cancel(&mut self) {} method to the Host(Input/Output)Stream traits. This is called right before the stream is dropped and can be used to wait for asynchronous cleanup.
    • On File streams, this is used to wait on any in flight read or write. Operating systems don't provide real asynchronous APIs for files, so we end up spawning uncancellable blocking background tasks. In the new cancel method we wait for that read/write to run to completion. From the guest's point of view, input/output-stream::drop then appears to block. Certainly less than ideal, but arguably still better than letting the guest rack up an unbounded number of background tasks. Also, the guest is only blocked if the stream was dropped mid-read or mid-write, which is not expected to occur frequently.
    • I also implemented it for various other stream types. Unlike Files, the background tasks of these stream types are truly async. So aborting without awaiting them was probably already good enough in practice. Nonetheless, waiting for the background task to actually shut down feels more "hygienic" to me in terms of resource management.

Slightly related changes:

  • In Preview1 adapter: ignore BlockingMode and always perform blocking I/O. That's what preview1 did anyway.
  • Remove [method]output-stream.forward from bindgen config. That method does not exist.
  • Refactor blocking_splice to take advantage of specialized blocking_read & blocking_write_and_flush implementations
  • Rename filesystem's spawn_blocking -> run_blocking

badeend added 13 commits August 8, 2024 11:40
In the component model, `resource.drop` is a canonical built-in without a proper name. So I invented a custom naming scheme for the component bindgen config. I went with:
`"[drop]{resource-name}"` where `{resource-name}` is the name as defined in WIT. e.g. `"[drop]input-stream"`.

This shouldn't conflict with anything existing in the wild as WIT identifiers are not allowed to contain square brackets.
…f background tasks through the FileOutputStream
Unlike FileOutputStream, the background tasks of these stream types are truly async. So aborting _without_ awaiting them was probably already good enough in practice. Nonetheless, waiting for the background to actually shut down just seems like good resource management "hygiene" to me.
- `write` now always spawns the syscall on a background task, regardless of `allow_blocking_current_thread`.
- `blocking_write_and_flush` is newly overridden and continues to do the `allow_blocking_current_thread` trickery that `write` used to do.
- `read` always spawns the syscall on a background task, regardless of `allow_blocking_current_thread`.
- `blocking_read` performs the `run_blocking`/`allow_blocking_current_thread` trickery.
…_read` & `blocking_write_and_flush` implementations
@badeend badeend requested a review from a team as a code owner August 14, 2024 16:10
@badeend badeend requested review from pchickey and removed request for a team August 14, 2024 16:10
@github-actions github-actions bot added the wasi Issues pertaining to WASI label Aug 14, 2024
Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking on this refactoring, I really like how it's turned out! I think that this will work quite well.

One thing I can note which I was originally worried about but am no longer is that it's still possible for guests to queue up a lot of tasks. For example if a guest queues a write and then cancels it that'll block the guest waiting to cancel that write. The host may then cancel the host guest which will cancel-the-cancel in a way meaning that the write's just sitting in the ether on a blocking thread waiting to be completed. That initially sounded bad to me but I think this is ok because Tokio has a fixed size of blocking calls which can be "in the ether" and otherwise cancellation of a job in the queue that hasn't actually run yet is successful. This means that it's not possible to get an unbounded number of writes pending at any point with this cancellation.

I've got some minor stylistic thoughts below but otherwise this looks good-to-go to me 👍

Comment on lines +411 to +420
if let ReadState::Idle = self.state {
// The guest hasn't initiated any read, but is nonetheless waiting
// for data to be available. We'll start a read for them:

const DEFAULT_READ_SIZE: usize = 4096;
let p = self.position;
self.state = ReadState::Waiting(
self.file
.spawn_blocking(move |f| Self::blocking_read(f, p, DEFAULT_READ_SIZE)),
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by this, is it required? I would expect that reads are only initiated from the read method above (or blocking-read) and the ready bits here only wait for completion of an already pending read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it required?

Not sure, but the current wasmtime-wasi implementation would push me towards a hesitant "Yes?"
I kinda stole the idea from the stdin implementation that also initiates a new read if there's no data available:

async fn ready(&mut self) {
let g = GlobalStdin::get();
// Scope the synchronous `state.lock()` to this block which does not
// `.await` inside of it.
let notified = {
let mut locked = g.state.lock().unwrap();
match *locked {
// If a read isn't requested yet
StdinState::ReadNotRequested => {
g.read_requested.notify_one();
*locked = StdinState::ReadRequested;
g.read_completed.notified()
}
StdinState::ReadRequested => g.read_completed.notified(),
StdinState::Data(_) | StdinState::Closed | StdinState::Error(_) => return,
}
};
notified.await;

This, combined with the fact that there exist multiple places in the code that blindly assume that calling .ready().await will yield readable data:

self.table().get_mut(&dest)?.ready().await;
self.table().get_mut(&src)?.ready().await;
self.splice(dest, src, len).await

if let InputStream::Host(s) = self.table().get_mut(&stream)? {
s.ready().await;
}
self.read(stream, len).await

if let InputStream::Host(s) = self.table().get_mut(&stream)? {
s.ready().await;
}
self.skip(stream, len).await

self.ready().await;
self.check_write()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ok, I'm not sure if this will survive the test of time but following suit with stdin doesn't seem unreasonable. This shouldn't get hit most of the time for the time being as WASIp1 shouldn't use this. Let's go ahead and land it 👍

@alexcrichton alexcrichton added this pull request to the merge queue Aug 15, 2024
Merged via the queue into bytecodealliance:main with commit 28b3cb1 Aug 15, 2024
@pchickey
Copy link
Contributor

@badeend sorry I didn't get time to review this before it landed, but thanks very much for working on it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

wasi Issues pertaining to WASI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove special case for InputStream::File ?

3 participants