Conversation

@pchickey (Contributor) commented Aug 8, 2023

This adds a method that flushes output and gracefully terminates background tasks. Background tasks still terminate as part of Drop!

The problem with the current implementation is that there is no way to wait for output buffered in host background tasks to flush before those tasks are aborted as part of dropping the Store/Table. For example, for a trivial component that prints "hello world\n" to stdout and returns, if the Store and Table drop immediately after the component finishes executing, there is a race and the output may never appear at all.
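
As a hedged illustration of the race (plain tokio, not the actual wasmtime host code; all names here are made up), the shape of the problem looks like this: the background task owns the buffered bytes, and aborting it on drop can discard them before they reach stdout.

```rust
use tokio::{io::AsyncWriteExt, sync::mpsc};

struct WriterHandle {
    tx: mpsc::Sender<Vec<u8>>,
    task: tokio::task::JoinHandle<()>,
}

impl Drop for WriterHandle {
    fn drop(&mut self) {
        // Abort-on-drop: any bytes still queued in the channel, or mid-write
        // in the background task, may never reach stdout.
        self.task.abort();
    }
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<Vec<u8>>(16);
    let task = tokio::spawn(async move {
        let mut stdout = tokio::io::stdout();
        while let Some(buf) = rx.recv().await {
            let _ = stdout.write_all(&buf).await;
            let _ = stdout.flush().await;
        }
    });
    let handle = WriterHandle { tx, task };
    handle.tx.send(b"hello world\n".to_vec()).await.unwrap();
    // `handle` drops here: whether "hello world" ever prints is a race
    // between the background task and the abort in Drop.
}
```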

I don't really love this design, but we got backed into a corner because all of the alternatives we could think up are worse:

  • We can't get rid of abort-on-drop entirely ("daemonize" the tasks), because then streams connected to e.g. a stalled client connection would consume resources forever, which is not acceptable in some embeddings.
  • We can't flush on drop of a table/store, because that requires an await, and Rust does not have async drop.
  • We can't add an explicit destructor to a table/store that terminates tasks, with tasks "daemonizing" whenever the destructor is not called, because then cancelling the future executing a component before the explicit destructor runs would end up daemonizing its tasks.
  • We could configure every AsyncWriteStream (and any other stream impl that spawns a task) at creation, or at insertion into the table, with whether it should daemonize on drop. But that would mean plumbing a bunch of configuration into places it currently does not reach.

Therefore, the only behavior we could come up with was to keep the abort-on-drop behavior for background tasks and add methods that ensure background tasks are joined (finished) gracefully. This means that both sync and async users of WasiCtx will need to call the appropriate method to wait on background tasks. That is easy enough for users to miss, but we decided that the alternatives are worse.
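
A minimal sketch of the intended calling pattern, bounding the graceful join with a timeout since a stalled peer can otherwise hold it open forever (the stand-in future below is hypothetical; the real method lives on WasiCtx):

```rust
use std::time::Duration;
use tokio::time::timeout;

// Hypothetical stand-in for the join/flush method on WasiCtx; any future
// that resolves once all background writers have drained fits here.
async fn join_background_tasks() {
    tokio::task::yield_now().await;
}

#[tokio::main]
async fn main() {
    // Bound the graceful join so a misbehaving client can't hang shutdown.
    match timeout(Duration::from_secs(5), join_background_tasks()).await {
        Ok(()) => println!("all output flushed"),
        Err(_) => eprintln!("timed out; remaining tasks will abort on drop"),
    }
}
```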

Closes #6811

pchickey requested a review from a team as a code owner (Aug 8, 2023)
pchickey requested review from fitzgen and removed the request for a team (Aug 8, 2023)
@pchickey (Contributor, Author) commented Aug 8, 2023

Waiting to merge until

  • @elliottt and I add a test to show that waiting for join_background_tasks flushes all output.
  • all uses of wasi in tests and examples use this new method to wait for output to flush, so that when people copy-paste existing code they don't miss this important detail

pchickey requested review from alexcrichton and removed the request for fitzgen (Aug 8, 2023)
github-actions bot added the wasi label (Issues pertaining to WASI) on Aug 9, 2023
@rylev (Contributor) commented Aug 9, 2023

I don't see a discussion of the alternative approach where background tasks are daemonized by default, but there is a method that returns a future which completes when all background tasks are done, or cancels the background tasks on Drop. This would allow opting into cancellation (say, after a timeout) rather than opting out of it (which is easy to forget to do).

Such an approach seems ideal, since it doesn't risk I/O operations not flushing and easily lets users opt into ensuring that resources are not leaked. If this is really just a question of which default is better (best-effort I/O flushing vs. protecting against potential resource leaks), I would personally pick I/O flushing. The approach in this PR seems like a footgun that a lot of users will run into, even if the docs are all updated to mention it.
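
A hedged sketch of that alternative (illustrative names, not wasmtime API): Drop does nothing, so tasks daemonize by default, and callers opt into cancellation by awaiting a shutdown future with a deadline.

```rust
use std::time::Duration;
use tokio::task::JoinHandle;

// Dropping a tokio JoinHandle detaches the task, so doing nothing in Drop
// gives daemonize-by-default; `shutdown` is the explicit opt-in to cancel.
struct BackgroundTasks {
    handles: Vec<JoinHandle<()>>,
}

impl BackgroundTasks {
    // Join every task, aborting whatever is still running at the deadline.
    async fn shutdown(self, deadline: Duration) {
        let mut handles = self.handles;
        let joined = tokio::time::timeout(deadline, async {
            for h in handles.iter_mut() {
                let _ = h.await;
            }
        })
        .await;
        if joined.is_err() {
            // Deadline hit: cancel whatever is still running.
            for h in &handles {
                h.abort();
            }
        }
    }
}
```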

@alexcrichton (Member) commented:

Oh, that's discussed a bit in @pchickey's points above, but I can try to help clarify some more. I do realize that yesterday in our sync we concluded daemonize-by-default was the right option, and both @pchickey and I feel the same way, but the code ended up not working out too well. One of the main problems is that for embeddings where cancel-by-default is desired, that option needs to be active for the entire lifetime of the WasiCtx, since cancellation may happen not only at the end when tasks are being joined but at any time during the execution of a guest. This means that daemonize-or-cancel on drop is an up-front configuration option rather than a when-you-make-the-join-future option.

One idea would be to make this an option on WasiCtxBuilder, but that required a degree of configuration plumbing which we basically weren't ready for (not that it's impossible, I think). One issue with daemonize-by-default, though, is that I'm not sure how it would work in the sync case: it's not clear to me when the Wasmtime CLI, for example, would perform the "wait" for background tasks to finish before the CLI exits the process.

Given all that, the conclusion we reached was to go the opposite way: cancel-on-drop rather than daemonize, and require these methods to be invoked to ensure output reaches its destination. I don't disagree that it's a footgun, but I think fixing that may need to happen at a different level rather than via daemonize-or-cancel. One example is what I mentioned yesterday, where blocking-write does something different than what it does today, but as we talked about, it's not clear what this would be exactly or whether it would actually work.

@rylev (Contributor) commented Aug 10, 2023

That's fine. I do think it would go a long way to make blocking writes return only once wasmtime has handed them off to the underlying host kernel.

I believe we discussed the possibility of adding a call to ready before this line to ensure we don't exit blocking_write until we're sure that the previous write has gone all the way through. Would that be possible?

@alexcrichton (Member) commented:

@pchickey I think there's one more block to handle as well (same as input streams)

@pchickey (Contributor, Author) commented:

> I believe we discussed the possibility of adding a call to ready before this line to ensure we don't exit blocking_write until we're sure that the previous write has gone all the way through. Would that be possible?

That would make blocking write into blocking write + flush, but it would only work for the one implementation today where ready also means flush. I don't think we want to make every blocking write include a flush, since the flush could hang if the other end of the pipe stops listening, and I also don't want to make ready mean flush on all writers, which would limit the flexibility of future implementations.
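
To illustrate the distinction (a sketch only, with illustrative signatures rather than the exact HostOutputStream trait in the tree): ready, write, and flush stay separate operations, so ready need not imply a flush.

```rust
use anyhow::Result;
use bytes::Bytes;

// Sketch: `ready` signals capacity for more writes, while `flush` is a
// distinct request that queued output reach its destination. Names and
// signatures here are illustrative, not the actual wasmtime trait.
#[async_trait::async_trait]
trait HostOutputStream: Send {
    /// Queue bytes for writing without blocking on the destination.
    fn write(&mut self, bytes: Bytes) -> Result<()>;
    /// Ask that everything queued so far be written through.
    fn flush(&mut self) -> Result<()>;
    /// Resolve once the stream can accept more bytes; this need not
    /// imply that earlier writes were flushed.
    async fn ready(&mut self) -> Result<()>;
}
```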

This commit adds a new operation to `AsyncWriteStream` which flushes the
output of the underlying writer. This will invoke the underlying
`.flush()` to ensure that the flush request is propagated all the way
through the host implementation and Tokio bits.

The `join_background_tasks` method is then updated to call `flush` and
discard its result. I've opted to do a bit of refactoring to avoid a lot
of duplication in the new `flush` method with the preexisting methods,
so there's a bit of movement here too.
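
One plausible shape for threading that flush through a background-writer adapter like AsyncWriteStream (a sketch with illustrative names, not the actual diff): the host side sends a flush message, and the worker propagates it to the underlying tokio writer.

```rust
use tokio::{
    io::{AsyncWrite, AsyncWriteExt},
    sync::{mpsc, oneshot},
};

// Illustrative message type; the PR's own worker-message enum differs.
enum WriteMessage {
    Write(Vec<u8>),
    Flush(oneshot::Sender<std::io::Result<()>>),
}

fn spawn_writer<W>(mut writer: W) -> mpsc::Sender<WriteMessage>
where
    W: AsyncWrite + Unpin + Send + 'static,
{
    let (tx, mut rx) = mpsc::channel(16);
    tokio::spawn(async move {
        while let Some(msg) = rx.recv().await {
            match msg {
                WriteMessage::Write(buf) => {
                    let _ = writer.write_all(&buf).await;
                }
                // Propagate the flush all the way to the underlying
                // writer, then report completion back to the host side.
                WriteMessage::Flush(done) => {
                    let _ = done.send(writer.flush().await);
                }
            }
        }
    });
    tx
}
```

Under this shape, flushing everything reduces to sending Flush to each registered writer and awaiting the acknowledgements.
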
This builds on the prior commit to redefine the `join_background_tasks`
as purely a "flush" operation that flushes all output streams. This no
longer attempts to gracefully join output tasks one-by-one since that
should no longer be necessary and everything will get cleaned up during
drop when `abort` calls are invoked.
@alexcrichton (Member) commented:

My previous comment was incorrect: that's related to input streams, not output streams, which are what this PR cares about.

I've additionally pushed up some more commits after some more discussion with @pchickey. The thinking is that it's pretty likely at this point that WASI will gain a "flush" on output-stream, so we may as well start heading in that direction. Furthermore, in local testing I found that join_background_tasks as it was before my commits wasn't actually sufficient; we needed to literally call .flush() on the output stream before returning from join_background_tasks anyway.

So I've updated HostOutputStream to have a flush() method and updated it for the AsyncWriteStream adapter. Additionally I've removed join_background_tasks from HostInputStream and renamed the WasiCtx method to flush_output. This way it's now clear that the final operation to do is to flush all output.

The thinking afterwards, to address your concern @rylev, is that WASI will gain flush as an inherent method on all output streams. That will get wired up to the flush I just wrote, and libraries like wasi-libc would additionally be updated to flush at appropriate points. Furthermore, the preview1-to-preview2 adapter will probably flush after all writes to stdout, since preview1 has no notion of "flushing", which would doubly fix the issue you're seeing, @rylev.

/// This will block the current thread up to the `timeout` specified, or
/// forever if `None` is specified, until all wasm output has been flushed
/// out.
pub fn sync_join_background_tasks<'a>(
Review comment (Contributor):

Should this be sync_flush_output?

}

// Make sure the background task does not outlive the AsyncWriteStream handle.
// In order to make sure output is flushed before dropping, use `join_background_tasks`.
Review comment (Contributor):

Suggested change:
- // In order to make sure output is flushed before dropping, use `join_background_tasks`.
+ // In order to make sure output is flushed before dropping, use `flush_output`.

/// explicitly wait for this.
///
/// In some embeddings, a misbehaving client might cause this graceful exit to await for an
/// unbounded amount of time, so we recommend bounding this with a timeout or other mechanism.
Review comment (Contributor):

I do wonder if this should also take a timeout, just like the sync version. That might encourage users to do the right thing, and it has the added benefit of being symmetrical with the sync version.

fn send(&mut self, bytes: Bytes) -> anyhow::Result<(usize, StreamState)> {
use tokio::sync::mpsc::error::TrySendError;
fn send(&mut self, msg: WriteMesage) -> anyhow::Result<bool> {
assert!(!self.has_pending_op());
Review comment (Contributor):

Nit: is this necessary if you have to check for the pending op anyway down below?

@pchickey (Contributor, Author) commented:

I know that various Bytecode Alliance people are on the same page, but for onlookers: this is just too sticky of an issue to hack a fix into the host like this, and we identified other issues with output-stream, so we are going to redesign the way output-stream does backpressure so that this is no longer a problem.

pchickey closed this on Aug 18, 2023
pchickey deleted the pch/wasi_join_background_tasks branch on September 27, 2023
Labels: wasi (Issues pertaining to WASI)

Linked issue: #6811, "Wasi Preview 2 Async Race Condition"