-
Notifications
You must be signed in to change notification settings - Fork 1.6k
wasi-filesystem: Implement HostInputStream & HostOutputStream
#9129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wasi-filesystem: Implement HostInputStream & HostOutputStream
#9129
Conversation
In the component model, `resource.drop` is a canonical built-in without a proper name. So I invented a custom naming scheme for the component bindgen config. I went with:
`"[drop]{resource-name}"` where `{resource-name}` is the name as defined in WIT. e.g. `"[drop]input-stream"`.
This shouldn't conflict with anything existing in the wild as WIT identifiers are not allowed to contain square brackets.
…f background tasks through the FileOutputStream
Unlike FileOutputStream, the background tasks of these stream types are truly async. So aborting _without_ awaiting them was probably already good enough in practice. Nonetheless, waiting for the background to actually shut down just seems like good resource management "hygiene" to me.
- `write` now always spawns the syscall on a background task, regardless of `allow_blocking_current_thread`. - `blocking_write_and_flush` is newly overridden and continues to do the `allow_blocking_current_thread` trickery that `write` used to do.
- `read` always spawns the syscall on a background task, regardless of `allow_blocking_current_thread`. - `blocking_read` performs the `run_blocking`/`allow_blocking_current_thread` trickery.
…I/O, as that's what preview1 did.
…o be a type alias, just like OutputStream
…_read` & `blocking_write_and_flush` implementations
…nto file-hostinputstream
alexcrichton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking on this refactoring, I really like how it's turned out! I think that this will work quite well.
One thing I can note which I was originally worried about but am no longer is that it's still possible for guests to queue up a lot of tasks. For example if a guest queues a write and then cancels it that'll block the guest waiting to cancel that write. The host may then cancel the host guest which will cancel-the-cancel in a way meaning that the write's just sitting in the ether on a blocking thread waiting to be completed. That initially sounded bad to me but I think this is ok because Tokio has a fixed size of blocking calls which can be "in the ether" and otherwise cancellation of a job in the queue that hasn't actually run yet is successful. This means that it's not possible to get an unbounded number of writes pending at any point with this cancellation.
I've got some minor stylistic thoughts below but otherwise this looks good-to-go to me 👍
| if let ReadState::Idle = self.state { | ||
| // The guest hasn't initiated any read, but is nonetheless waiting | ||
| // for data to be available. We'll start a read for them: | ||
|
|
||
| const DEFAULT_READ_SIZE: usize = 4096; | ||
| let p = self.position; | ||
| self.state = ReadState::Waiting( | ||
| self.file | ||
| .spawn_blocking(move |f| Self::blocking_read(f, p, DEFAULT_READ_SIZE)), | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused by this, is it required? I would expect that reads are only initiated from the read method above (or blocking-read) and the ready bits here only wait for completion of an already pending read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it required?
Not sure, but the current wasmtime-wasi implementation would push me towards a hesitant "Yes?"
I kinda stole the idea from the stdin implementation that also initiates a new read if there's no data available:
wasmtime/crates/wasi/src/stdio/worker_thread_stdin.rs
Lines 156 to 175 in c7756bd
| async fn ready(&mut self) { | |
| let g = GlobalStdin::get(); | |
| // Scope the synchronous `state.lock()` to this block which does not | |
| // `.await` inside of it. | |
| let notified = { | |
| let mut locked = g.state.lock().unwrap(); | |
| match *locked { | |
| // If a read isn't requested yet | |
| StdinState::ReadNotRequested => { | |
| g.read_requested.notify_one(); | |
| *locked = StdinState::ReadRequested; | |
| g.read_completed.notified() | |
| } | |
| StdinState::ReadRequested => g.read_completed.notified(), | |
| StdinState::Data(_) | StdinState::Closed | StdinState::Error(_) => return, | |
| } | |
| }; | |
| notified.await; |
This, combined with the fact that there exist multiple places in the code that blindly assume that calling .ready().await will yield readable data:
wasmtime/crates/wasi/src/host/io.rs
Lines 178 to 182 in c7756bd
| self.table().get_mut(&dest)?.ready().await; | |
| self.table().get_mut(&src)?.ready().await; | |
| self.splice(dest, src, len).await |
wasmtime/crates/wasi/src/host/io.rs
Lines 211 to 214 in c7756bd
| if let InputStream::Host(s) = self.table().get_mut(&stream)? { | |
| s.ready().await; | |
| } | |
| self.read(stream, len).await |
wasmtime/crates/wasi/src/host/io.rs
Lines 231 to 234 in c7756bd
| if let InputStream::Host(s) = self.table().get_mut(&stream)? { | |
| s.ready().await; | |
| } | |
| self.skip(stream, len).await |
wasmtime/crates/wasi/src/stream.rs
Lines 153 to 154 in c7756bd
| self.ready().await; | |
| self.check_write() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm ok, I'm not sure if this will survive the test of time but following suit with stdin doesn't seem unreasonable. This shouldn't get hit most of the time for the time being as WASIp1 shouldn't use this. Let's go ahead and land it 👍
|
@badeend sorry I didn't get time to review this before it landed, but thanks very much for working on it! |
Fixes #9058
All input streams are implemented using the
HostInputStreamtrait. Except files. They got special treatment. This can be seen in the definition ofInputStream:wasmtime/crates/wasi/src/stream.rs
Lines 165 to 168 in ba864e9
The special case was introduced to work around the fact that OS'es don't actually provide any true async APIs for files. A more detailed explanation can be read in the PR that introduced this setup: #6556
This PR properly implements the
HostInputStream&HostOutputStreamtraits for files. And with "properly" I mean:read&write) now really don't block.blocking_read&blocking_write_and_flush) continue to block and still take advantage ofallow_blocking_current_thread.Primary changes:
InputStreamhas been changed from an enum to a type alias, just likeOutputstreamalready is.blocking_*implementations. In this PR, only the File streams actually override these.async fn cancel(&mut self) {}method to the Host(Input/Output)Stream traits. This is called right before the stream is dropped and can be used to wait for asynchronous cleanup.cancelmethod we wait for that read/write to run to completion. From the guest's point of view,input/output-stream::dropthen appears to block. Certainly less than ideal, but arguably still better than letting the guest rack up an unbounded number of background tasks. Also, the guest is only blocked if the stream was dropped mid-read or mid-write, which is not expected to occur frequently.Slightly related changes:
[method]output-stream.forwardfrom bindgen config. That method does not exist.blocking_spliceto take advantage of specializedblocking_read&blocking_write_and_flushimplementationsspawn_blocking->run_blocking