Add support for the Component Model Async ABI to wasm-tools, wit-bindgen, and wasmtime.
Although WASIp1 and WASIp2 both support multiplexing concurrent operations via
`poll_oneoff` and `wasi:io/poll`, respectively, neither supports composable
concurrency. For example, composing two WASIp2 components, each of which needs
to do concurrent I/O, is not practical in general, since only one of them can
call `wasi:io/poll#poll` at a time.
To address that limitation, WASIp3 is expected to rely on ABI extensions to the
Component Model supporting async imports and exports, as well as built-in
`stream`, `future`, and `error` types. These extensions make asynchronous
composition as easy for application developers as synchronous composition. In
addition, they neatly avoid the "function coloring problem" by allowing
async-lowered imports to call sync-lifted exports and vice versa.
In addition to solving the problem of composable concurrency, this approach
removes the need for WASIp2's wasi:io package, which has proven problematic
for virtualization. In practice, virtualizing any interface that uses wasi:io
requires virtualizing all of them. In the new design, I/O is handled via
built-in component model types which need not (and cannot) be virtualized,
thereby avoiding the issue.
Finally, the new async lower and lift ABI allows guest binding generators such
as those in wit-bindgen to produce more idiomatic code -- especially for
languages with async/await-style concurrency like JavaScript/TypeScript,
Python, C#, and Rust. Application developers can simply import and export
async functions, leaving the details of inter-component task management to the
host. Languages like Go, which rely on stackful coroutines, will be able to
support concurrency natively, without requiring an Asyncify-style
transformation.
Relationship with the async-imports RFC
The earlier async-imports RFC proposed adding native async support to
Wasmtime, allowing embedders to call Wasm functions asynchronously such that a
Wasm call to an async host function may suspend the calling fiber and allow the
embedder to do other work concurrently. That proposal has since been
implemented and has proven quite valuable for interoperating with
e.g. tokio-based servers.
However, the scope of that RFC is limited to making host->Wasm and Wasm->host
calls asynchronous only from the host's perspective; from the guest's
perspective those calls are purely synchronous. This proposal will build upon
that work to support calls which are asynchronous and concurrent from both the
host's and guest's perspectives. This means introducing a third way to call
Wasm functions, as well as a third way to add host functions to a
wasmtime::component::LinkerInstance, described below.
I propose adding support to wasm-tools (e.g. `wasmparser`, `wasm-encoder`,
`wit-component`, etc.), wit-bindgen, and wasmtime for the new canonical ABI
options and built-ins described at a high level here and in more detail here
(look for the 🔀 icon, which indicates the new async features). Note that, as
of this writing, the `stream`, `future`, and `error` features are not yet
included in the spec, but that work is in progress and expected to land soon.
Initially, this support will be gated by a disabled-by-default, experimental
feature flag (e.g. component-async). Once the specification and
implementation have reached an appropriate level of maturity and test coverage,
we can consider enabling the feature by default.
For reference, I've created a
prototype based on temporary
forks of wasm-tools, wit-bindgen, and wasmtime, demonstrating the
feasibility of supporting these features within the existing architectures of
each project. That repository includes high-level, end-to-end tests of each
major feature, along with a proof-of-concept implementation of the
wasi:http@0.3.0-draft proposal. Note that, as of this writing, the prototype
mostly predates the specification in the component-model repo and thus differs
from it regarding various details. I'll be reconciling the implementation with
the spec as part of the process of upstreaming this work. Also note that the
prototype uses a slightly different public API than is proposed here; the latter
represents my latest thinking and may be further modified based on feedback.
Regarding testing: I propose starting with a minimal suite of smoke tests as part of each initial PR, given the experimental nature of this feature and the likelihood of spec changes as we gather feedback on the design and implementation. As things stabilize, we will add more tests (including fuzz tests) and make sure we're fully covered before enabling the feature by default.
The changes required to wasm-tools are generally straightforward additions:
- Encoding, parsing, and validating the new canonical ABI options (`async` and
  `callback`) and functions (`task.backpressure`, `task.return`, `task.wait`,
  `stream.new`, etc.)
- `wit-component` support for "componentizing" a module which uses the above
- `wit-parser` additions to represent `stream`s, `future`s, and `error`s. Note
  that `wit-parser` already has minimal support for representing `stream`s and
  `future`s, but those representations reflect an older, now-obsolete
  proposal, so we'll need to make some updates.
- `wasmprinter` and `wat` support for WAT<->binary conversion
- `wasm-smith`/`wit-smith` support for test case generation (although this
  might not be covered in the initial PR)
A couple items to note for wit-component and wit-parser:
- As of this writing, BuildTargets.md does not yet specify name mangling
  conventions for async features. We'll need to add new functions to the
  `NameMangling` trait defined in `validation.rs` to support parsing the new
  async function names, but since there's no standard for them yet, we'll
  provide only stub implementations for the `Standard` implementation, meaning
  users will need to use the `Legacy` implementation until a standard is
  defined.
- From a subtyping perspective, the `stream<T>` and `future<T>` types are
  invariant with respect to `T`. However, since they are anonymous types which
  can appear anywhere in a WIT file, we need some way for bindings generators
  to refer unambiguously to each such type when generating calls to
  e.g. `stream.new`, `stream.drop`, etc. I propose to handle this by
  identifying such types using a name mangling that includes the name of a WIT
  function in which the type appears (either as part of a parameter type or as
  part of the return type) along with an ordinal index indicating where the
  type appears when walking the types recursively left-to-right in depth-first
  order.
  - For example, given the WIT function
    `foo: func(x: future<future<u32>>, y: u32) -> stream<u8>`, the import
    `[future-new-0]foo` would indicate a call to `future.new` for the type
    `future<u32>`. Likewise, `[future-new-1]foo` would indicate a call to
    `future.new` for `future<future<u32>>`, and `[stream-new-2]foo` would
    indicate a call to `stream.new` for `stream<u8>`.
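The ordinal-assignment rule above can be sketched in Rust. The `Type` enum below is a toy stand-in for wit-parser's actual type representation, and the post-order walk (payload before its containing `stream`/`future`) is an assumption chosen to reproduce the numbering in the example:

```rust
// Toy stand-in for wit-parser's type representation (hypothetical).
#[derive(Clone)]
enum Type {
    U8,
    U32,
    Future(Box<Type>),
    Stream(Box<Type>),
}

// Walk a type depth-first, left to right, assigning each stream/future an
// ordinal and emitting the corresponding mangled built-in name. The payload
// is visited before its container so that `future<u32>` inside
// `future<future<u32>>` receives the lower ordinal, matching the example.
fn walk(ty: &Type, func: &str, ordinal: &mut u32, out: &mut Vec<String>) {
    match ty {
        Type::U8 | Type::U32 => {}
        Type::Future(payload) => {
            walk(payload, func, ordinal, out);
            out.push(format!("[future-new-{}]{}", *ordinal, func));
            *ordinal += 1;
        }
        Type::Stream(payload) => {
            walk(payload, func, ordinal, out);
            out.push(format!("[stream-new-{}]{}", *ordinal, func));
            *ordinal += 1;
        }
    }
}

// Mangled names for every stream/future in a function's signature:
// parameters first (left to right), then the return type.
fn mangle(func: &str, params: &[Type], result: Option<&Type>) -> Vec<String> {
    let mut out = Vec::new();
    let mut ordinal = 0;
    for p in params {
        walk(p, func, &mut ordinal, &mut out);
    }
    if let Some(r) = result {
        walk(r, func, &mut ordinal, &mut out);
    }
    out
}

fn main() {
    // foo: func(x: future<future<u32>>, y: u32) -> stream<u8>
    let names = mangle(
        "foo",
        &[
            Type::Future(Box::new(Type::Future(Box::new(Type::U32)))),
            Type::U32,
        ],
        Some(&Type::Stream(Box::new(Type::U8))),
    );
    assert_eq!(
        names,
        ["[future-new-0]foo", "[future-new-1]foo", "[stream-new-2]foo"]
    );
}
```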
As a first step, we'll add support for these new features to the Rust binding
generator (macro and CLI) only, with todo!() stubs for other languages. Based
on user interest and resources, we can incrementally add support for other
languages over time.
For the Rust generator, we'll add an `async` option to the `generate!` macro
(and a corresponding CLI option) to specify which imports and/or exports should
be generated as async functions, e.g.:
```rust
/// Specifies which imports and/or exports should be async.
///
/// Functions not explicitly listed here will be generated as synchronous
/// bindings.
#[derive(Default, Debug, Clone)]
pub struct AsyncConfig {
    /// List of fully-qualified import names
    /// (e.g. `wasi:http/types@0.3.0-draft#[static]body.finish`)
    imports: HashSet<String>,
    /// List of fully-qualified export names
    /// (e.g. `wasi:http/handler@0.3.0-draft#handle`)
    exports: HashSet<String>,
}
```

Note that the nonblocking function attribute proposal should allow the user to
avoid configuring this explicitly.
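As a sketch of how an embedder might populate this configuration, here is the struct mirrored locally (the real type would live in wit-bindgen and be driven by the `generate!` macro's `async` option or the corresponding CLI flag; the field names are taken from the proposal):

```rust
use std::collections::HashSet;

// Local mirror of the proposed `AsyncConfig`, for illustration only.
#[derive(Default, Debug, Clone)]
struct AsyncConfig {
    imports: HashSet<String>,
    exports: HashSet<String>,
}

fn main() {
    let mut config = AsyncConfig::default();
    // Fully-qualified names select exactly which functions become async;
    // everything else keeps synchronous bindings.
    config
        .imports
        .insert("wasi:http/types@0.3.0-draft#[static]body.finish".to_string());
    config
        .exports
        .insert("wasi:http/handler@0.3.0-draft#handle".to_string());
    assert_eq!(config.imports.len(), 1);
    assert!(config.exports.contains("wasi:http/handler@0.3.0-draft#handle"));
}
```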
In addition, we'll bundle an `async_support` module containing the following:

- Low-level (`#[doc(hidden)]`) support functions and types for managing tasks,
  callbacks, etc., invoked from generated async code
- Helper types, impls, and functions for creating `future`s and `stream`s,
  spawning tasks, blocking via `task.wait`, and interoperating with the
  `futures` crate (e.g. `Stream<Vec<T>>` and `Sink<Vec<T>>` impls for the read
  and write ends of a `stream<T>`, respectively)
🎉 This is the fun one! 🎉 To summarize, I propose to add the following:
- A new `wasmtime::component::LinkerInstance::func_wrap_concurrent` function
  for registering host functions which are concurrent from both the host and
  guest perspectives. Cf. the existing `func_wrap` (synchronous for both the
  host and guest) and `func_wrap_async` (synchronous for the guest;
  asynchronous for the host) functions.
- A new `wasmtime::component::Func::call_concurrent` function (and likewise
  for `TypedFunc`) for calling Wasm functions concurrently on the same
  instance. Cf. the existing `call` (synchronous; requires a unique `Store`
  reference) and `call_async` (asynchronous, but returns a future which
  requires a unique `Store` reference, so it can't be run concurrently with
  other calls to the same instance).
- `wasmtime::component::bindgen!` support for generating bindings based on
  `func_wrap_concurrent` and `call_concurrent`.
- Support for the new canonical options (`async`, `callback`) and built-in
  functions (`task.return`, `task.wait`, `stream.new`, etc.)
  - This includes fused adapter support for `async->async`, `async->sync`, and
    `sync->async` inter-component calls, extending the existing `sync->sync`
    support.
  - This also includes support for creating, suspending, and resuming fibers
    as necessary to support `async->sync` calls correctly, as well as
    providing backpressure as appropriate.
- New public APIs for creating and using `stream`s, `future`s, and `error`s,
  including passing them to and from Wasm guests.
Note that, with the `component-async` feature enabled, Wasmtime must be
prepared to handle async-lifted exports and async-lowered imports even if the
embedder is not using the new `func_wrap_concurrent` or `call_concurrent`
APIs. In other words, components using any combination of ABI options must run
correctly regardless of the call style the embedder uses. For example, an
async-lifted Wasm export called using the synchronous `Func::call` function
must work. Likewise, an async-lowered Wasm import implemented via
`Linker::func_wrap` or `Linker::func_wrap_async` must also work; note,
however, that neither of those methods allows guest-level concurrency, so they
only make sense in cases where the host can complete the task without blocking
or awaiting.
For the above reasons, I propose that, once the `component-async` feature is
enabled by default, we deprecate `func_wrap_async` and modify the `bindgen!`
macro to stop using it by default (but still allow opting in to it via an
option during the deprecation period). Rationale: a host function which needs
to await must account for the possibility that it has been called via an
async-lowered import and thus should not block the host or the guest.
Conversely, a host function which is always non-blocking might as well use
plain old `func_wrap`.
On the other hand, there's no particular reason to deprecate call_async, which
may be more convenient than call_concurrent when the embedder only needs to
call one guest export at a time. Unlike func_wrap_async, call_async cannot
inhibit the guest's concurrency, so it's fine to use regardless of whether the
guest export is lifted sync or async.
The following sections provide more detailed descriptions of the items summarized above.
This allows the caller to register host functions with the LinkerInstance such
that multiple calls to such functions can run concurrently. This isn't possible
with the existing func_wrap_async method because it takes a function which
returns a future that owns a unique reference to the Store, meaning the
Store can't be used for anything else until the future resolves.
Ideally, we'd have a way to thread a StoreContextMut<T> through an arbitrary
Future such that it has access to the Store only while being polled
(i.e. between, but not across, await points). However, there's currently no
way to express that in async Rust, so we make do with a more awkward scheme:
```rust
/// Accepts a function which receives a `StoreContextMut<T>` and the
/// parameters received from the guest and returns a `Future` which does
/// _not_ have access to the store. The output of that `Future` is another
/// function which gets access to the store again before producing the final
/// result.
pub fn func_wrap_concurrent<Params, Return, F, N, FN>(&mut self, name: &str, f: F) -> Result<()>
where
    N: FnOnce(StoreContextMut<T>) -> Result<Return> + 'static,
    FN: Future<Output = N> + Send + Sync + 'static,
    F: Fn(StoreContextMut<T>, Params) -> FN + Send + Sync + 'static,
    Params: ComponentNamedList + Lift + 'static,
    Return: ComponentNamedList + Lower + Send + Sync + 'static,
{ ... }
```

In short, each function registered using `func_wrap_concurrent` gets access to
the Store twice: once before doing any concurrent operations (i.e. before
awaiting) and once afterward. This allows multiple calls to proceed
concurrently without any one of them monopolizing the store. In practice, this
results in some ugly signatures and bodies when implementing such functions,
unfortunately:
```rust
/// Implements `wasi:http@0.3.0-draft/types#[static]body.finish`
fn finish(
    mut store: StoreContextMut<'_, Self::BodyData>,
    this: Resource<Body>,
) -> impl Future<
    Output = impl FnOnce(
        StoreContextMut<'_, Self::BodyData>,
    )
        -> wasmtime::Result<Result<Option<Resource<Fields>>, ErrorCode>>
        + 'static,
> + Send
    + Sync
    + 'static {
    let trailers = (|| {
        let trailers = store.data_mut().table().delete(this)?.trailers;
        trailers
            .map(|v| v.receive(store.as_context_mut()))
            .transpose()
    })();
    async move {
        let trailers = match trailers {
            Ok(Some(trailers)) => Ok(if let Ok(trailers) = trailers.await {
                trailers
            } else {
                None
            }),
            Ok(None) => Ok(None),
            Err(e) => Err(e),
        };
        component::for_any(move |_| Ok(Ok(trailers?)))
    }
}
```

This allows the host to call more than one Wasm export from the same instance
concurrently. Analogously to func_wrap_concurrent, it returns a Promise
object which does not require exclusive access to the Store in order to make
progress. Instead, either a single Promise or a collection of them may be
converted into a Future which, when polled, allows any outstanding Promise
for that Store (whether specified as input to the Future or not) to make
progress.
```rust
impl Func {
    ...
    /// Invoke this function with the specified `params`, returning a `Promise`
    /// that resolves to a tuple of the results and another `Promise`
    /// representing the completion of the task.
    pub fn call_concurrent(
        &self,
        params: &[Val],
    ) -> Promise<Result<(Vec<Val>, Promise<Result<()>>)>> { ... }
    ...
}

/// Represents the result of a `Func::call_concurrent` invocation.
///
/// Unlike a `Future`, instances of this type may make progress without being
/// polled directly.
struct Promise<T> { ... }

impl<T> Promise<T> {
    /// Convert this instance into a `Future` which may be `await`ed until the
    /// `Promise` resolves.
    ///
    /// Note that this may allow other, unrelated `Promise`s to also make
    /// progress.
    fn into_future<'a, U>(
        self,
        store: StoreContextMut<'a, U>,
    ) -> impl Future<Output = T> + Send + 'a
    where
        T: Send,
    { ... }

    /// Convert the specified `Promise`s into a `Future` which may be `await`ed
    /// until one or more of the `Promise`s resolve.
    ///
    /// Note that this may allow other, unrelated `Promise`s to also make
    /// progress.
    fn select<'a, U>(
        store: StoreContextMut<'a, U>,
        promises: Vec<Promise<T>>,
    ) -> impl Future<Output = (Vec<T>, Vec<Promise<T>>)> + Send + 'a
    where
        T: Send,
    { ... }

    /// Map the output of the specified promise to another type.
    fn map<V>(self, fun: impl FnOnce(T) -> V) -> Promise<V> { ... }
}
```

I propose to modify the host binding generator to optionally generate bindings
that use func_wrap_concurrent rather than func_wrap_async for async imports.
This mode of operation would be added in three stages:
1. Add a boolean `concurrent_imports` option to the
   `wasmtime::component::bindgen!` macro which defaults to `false`.
2. Once the `component-async` feature is enabled by default, change the
   default to `true`.
3. Remove the option entirely once `func_wrap_async` support is removed.
Likewise, we can add a concurrent_exports option which, when set to true,
tells the generator to generate call_concurrent calls rather than call_async
calls. As noted earlier, there's no reason to deprecate call_async, though,
and we can choose a default value for concurrent_exports based on user
feedback.
A lot of this work is mechanical: plumbing the options and built-in calls through various layers of Wasmtime code and processing them according to the specification:
- Translating the `wasmparser` representations into several stages of
  `wasmtime-environ` representations
- Generating fused adapters as necessary for each combination of import/export
  used (sync/sync, sync/async, async/sync, async/async)
- Generating trampolines in `wasmtime-cranelift`
- Handling built-in function calls at runtime in the `wasmtime` crate
The most interesting part is the last one. Wasmtime must now manage a top-level event loop kicked off by the first call to an async-lifted export, alternating between waiting for host tasks to make progress and delivering events to guest tasks (either by calling callback functions or resuming fibers).
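The shape of that event loop can be sketched with a std-only toy (no real fibers or Wasm here; guest tasks are modeled as callback-style closures, and "waiting for host tasks" is modeled as draining a queue of already-completed host work):

```rust
use std::collections::VecDeque;

// Event delivered to a guest task's callback (toy model).
enum Event {
    HostTaskDone(u32),
}

// A callback-style guest task: handle an event, return true while still
// expecting more events, false once finished.
type GuestTask = Box<dyn FnMut(Event) -> bool>;

// Toy event loop: wait for the next completed host task (here: pop a queue),
// then deliver the event to the guest task waiting on it. Returns which
// tasks finished.
fn run(mut tasks: Vec<GuestTask>, mut host_work: VecDeque<(usize, u32)>) -> Vec<bool> {
    let mut done = vec![false; tasks.len()];
    while let Some((task, value)) = host_work.pop_front() {
        if !done[task] {
            done[task] = !tasks[task](Event::HostTaskDone(value));
        }
    }
    done
}

fn main() {
    let tasks: Vec<GuestTask> = vec![
        // Task 0 waits for two host results before completing.
        {
            let mut remaining = 2;
            Box::new(move |_ev| {
                remaining -= 1;
                remaining > 0
            })
        },
        // Task 1 completes after a single host result.
        Box::new(|_ev| false),
    ];
    let done = run(tasks, VecDeque::from([(0, 42), (1, 7), (0, 9)]));
    assert_eq!(done, [true, true]);
}
```

The real loop must also interleave actual host futures, backpressure, and fiber resumption, but the alternation between "host work completes" and "guest task receives an event" is the same.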
async->sync calls are particularly interesting, since the host must either
call the sync export in a new fiber or queue the call for later if one is
already in progress for that instance. If and when that fiber makes a blocking
host call, it must be suspended so that the caller can continue to execute.
Fortunately, wasmtime-fiber already provides a robust API for managing fibers,
and Wasmtime's existing async support demonstrates how to use it effectively
in the context of an async runtime. The main innovation required here is to
support multiple fibers per top-level component instance and safely pass access
to the Store from one to another as appropriate.
Another interesting case is connecting a {stream|future}.send call with a
corresponding {stream|future}.receive call. Ideally, we'd handle this with a
fused adapter, except in this case we can't know statically which component
instance will be the receiver (and thus what string encoding it is using, etc.).
When a component sends data to the write end of a stream, the receiver end
could be held by any other component instance under the same top-level instance.
In general, if there are N components and M distinct payload types, we don't
want to generate (N^2)*M fused adapters and select among them dynamically at
runtime; nor do we want to do a bunch of static analysis to whittle down the
number of possible cases. Instead, we analyze the payload type ahead of time,
using an optimized, memcpy-based host call for "flat" payloads (i.e. those
without pointers or handles) and a slower host call for other types based on
the `wasmtime::component::Val` dynamic API.
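The ahead-of-time analysis can be sketched as a recursive predicate over a hypothetical payload-type enum (the real check would run over Wasmtime's internal type representation):

```rust
// Hypothetical payload-type representation for illustration.
enum PayloadType {
    U8,
    U32,
    F64,
    Record(Vec<PayloadType>),
    String,                 // carries a pointer into linear memory
    List(Box<PayloadType>), // likewise
    Handle,                 // resource/stream/future: needs ownership transfer
}

// A payload is "flat" if its canonical-ABI bytes can be memcpy'd between
// linear memories verbatim: no pointers to rewrite, no handles to transfer.
fn is_flat(ty: &PayloadType) -> bool {
    match ty {
        PayloadType::U8 | PayloadType::U32 | PayloadType::F64 => true,
        PayloadType::Record(fields) => fields.iter().all(is_flat),
        PayloadType::String | PayloadType::List(_) | PayloadType::Handle => false,
    }
}

fn main() {
    assert!(is_flat(&PayloadType::Record(vec![PayloadType::U8, PayloadType::F64])));
    assert!(!is_flat(&PayloadType::Record(vec![
        PayloadType::U32,
        PayloadType::String,
    ])));
}
```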
Finally, note that streams and futures can generally be treated as owned
resource handles, except that they are generic with respect to their payload
type and cannot be borrowed. errors, on the other hand, have
reference-counted, immutable value semantics, meaning a given error can be
passed to another component but remain available in the original component until
explicitly dropped.
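Those `error` semantics can be modeled with `std::rc::Rc`: cloning shares the same immutable value, which lives until the last copy is dropped. This is a toy model only; the real `error` is a host-managed handle:

```rust
use std::rc::Rc;

// Toy model of `error`'s reference-counted, immutable value semantics.
#[derive(Clone)]
struct ErrorValue(Rc<String>);

// Simulates passing the error to another component, which keeps its own copy.
fn pass_to_component(err: ErrorValue) -> usize {
    Rc::strong_count(&err.0)
}

fn main() {
    let err = ErrorValue(Rc::new("connection reset".to_string()));
    // Both the caller and the callee hold the error at this point.
    assert_eq!(pass_to_component(err.clone()), 2);
    // The original remains available until explicitly dropped.
    assert_eq!(err.0.as_str(), "connection reset");
}
```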
Ideally, the public API for streams and futures would look a lot like the
futures::channel::mpsc
and
futures::channel::oneshot
APIs, respectively. Or perhaps
AsyncRead and
AsyncWrite
APIs for the special case of `stream<u8>`. However, we have the same challenge
here as with `call_concurrent`: each of these types needs a
`StoreContextMut<T>` to do its job, i.e. lifting from and lowering to Wasm
linear memory (which may involve `cabi_realloc` calls, etc.). Therefore, as
with `call_concurrent`, we use the `Promise` type described above to achieve
`Store`-friendly concurrency.
```rust
pub struct FutureSender<T> { ... }

impl<T> FutureSender<T> {
    pub fn send(self, value: T) -> Promise<Result<()>>
    where
        T: func::Lower + Send + Sync + 'static,
    { ... }

    pub fn close<U, S: AsContextMut<Data = U>>(self, store: S) -> Result<()> { ... }
}

pub struct FutureReceiver<T> { ... }

impl<T> FutureReceiver<T> {
    pub fn receive(self) -> Promise<Result<Option<T>>>
    where
        T: func::Lift + Sync + Send + 'static,
    { ... }

    pub fn close<U, S: AsContextMut<Data = U>>(self, store: S) -> Result<()> { ... }
}

pub fn future<T, U, S: AsContextMut<Data = U>>(
    store: S,
) -> Result<(FutureSender<T>, FutureReceiver<T>)> { ... }

pub struct StreamSender<T> { ... }

impl<T> StreamSender<T> {
    pub fn send(&mut self, values: Vec<T>) -> Promise<Result<()>>
    where
        T: func::Lower + Send + Sync + 'static,
    { ... }

    pub fn close<U, S: AsContextMut<Data = U>>(self, store: S) -> Result<()> { ... }
}

pub struct StreamReceiver<T> { ... }

impl<T> StreamReceiver<T> {
    pub fn receive(&mut self) -> Promise<Result<Option<Vec<T>>>>
    where
        T: func::Lift + Sync + Send + 'static,
    { ... }

    pub fn close<U, S: AsContextMut<Data = U>>(self, store: S) -> Result<()> { ... }
}

pub fn stream<T, U, S: AsContextMut<Data = U>>(
    store: S,
) -> Result<(StreamSender<T>, StreamReceiver<T>)> { ... }
```

The `error` API, on the other hand, is quite simple, since you can't do anything
with it except pass it around:
```rust
pub struct Error { ... }

pub fn error<U, S: AsContextMut<Data = U>>(store: S) -> Result<Error> { ... }
```

We expect that `error`s will eventually have `anyhow`-style functionality
(diagnostic messages, backtraces, downcast-able payloads, etc.), but for now
they're just opaque handles.
- As noted above, the problem of interleaving unique `Store` references with
  `Future`s and the rest of Rust's `async` machinery results in some pretty
  awkward APIs. I've tried to minimize the changes to Wasmtime as much as
  possible, but perhaps this calls for a refactor of the `wasmtime` crate and
  API with both guest-level concurrency and thread-based parallelism in
  mind -- one which enables a more ergonomic async/stream/future API.
- The host APIs for `stream`s described above deal with `Vec<T>` types. We
  could optimize that API and implementation by using slices instead to avoid
  allocation overhead while still minimizing copies. Ideally, we'd provide an
  API for true zero-copy I/O via e.g. `io_uring` such that bytes are delivered
  directly to a guest's linear memory, but that will require more thought and
  can always be added later.
In general, the details of the implementation can always be refined once we've established a sane public API.
- Is there a more ergonomic approach to interleaving unique `Store` references
  with Rust's `async` machinery without sacrificing type and memory safety?
- While implementing the prototype, I found that calling
  `func_wrap_concurrent` required using `Box<dyn ...>` trait objects to make
  the borrow checker happy, and that in turn required adding a `'static` bound
  to the `T` type argument to `StoreContextMut<T>`, which is unfortunate. I
  suspect this is a limitation in the Rust type system that could be fixed,
  but we should find out for sure.
- What's the most expedient way to test callback-less, async-lifted exports?
  Go would be the first likely candidate to support this, but I'm not sure how
  much work that would be. LLVM-based toolchains use a shadow stack and will
  need to be modified to support multiple shadow stacks. Meanwhile, we can
  always use handwritten WAT.
- Async function calls and calls to certain canonical built-in functions (such
  as `task.return`) require converting between `*mut dyn VMStore` and
  `StoreContextMut<T>`, which is highly unsafe for multiple reasons. One of
  those reasons is that converting from a (non-aliasable) `&mut` to an
  (aliasable) `*mut` and then back to a (non-aliasable) `&mut` invalidates the
  original `&mut` -- i.e. it must no longer exist or be used. In my prototype,
  I tried to address this by making every function that takes a
  `StoreContextMut<T>` and possibly does such a `&mut` -> `*mut` -> `&mut`
  conversion also return a `StoreContextMut<T>` (in addition to anything else
  it needs to return). The caller is then required to use the returned
  `StoreContextMut<T>` rather than the original one (i.e. no reborrow tricks).
  Is that sufficient to ensure soundness? Are there more robust options?