Debugging: add an async debug-step-result interface, and catch traps with it. #11826
cfallin wants to merge 1 commit into bytecodealliance:main
Conversation
This now has support for all our native architectures, but not macOS or Windows; integrating with the separate exception-handling thread on macOS is proving to be unexpectedly interesting, and I think the form it may take is that the call-injection machinery I've built here will subsume the existing (non-state-preserving, for-unwinding-traps-only) call injection on macOS. I haven't looked in detail at Windows yet, as I'll have to dust off my Windows VM (for the first time since implementing fastcall in 2021!), but I hope the only tricky bit there will be adding a fastcall variant of the x86-64 stub.

One interesting bit that might be good to discuss (cc @alexcrichton / @fitzgen) is the actual API for the "debug step" protocol. I'm relatively happy with the protocol itself; the thing that I am finding interesting is how to enter a debug session. Right now I have an entry point whose idea is that the session wraps an inner arbitrary future that runs with the store. I was tripped up before by the store's dynamic ownership-passing protocol, but the idea above that debug-steps are morally like hostcalls, so a debug yield passes ownership back, seems to free us from that question. What do you think?

(In the current implementation, nested debug sessions are forbidden dynamically, and the debug session sees only one Wasm activation deep, i.e. from Wasm entry to Wasm exit, and any hostcall is an atomic step; these simplifying restrictions are important to the coherency of the above too, IMHO.)
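As a concrete (purely illustrative) sketch of the coroutine-style shape discussed here, one can imagine a session object that owns the store and is polled for its next debug-step result. All names (`DebugSession`, `DebugStepResult`, `next_step`) and the toy executor are assumptions for illustration, not Wasmtime's actual API:

```rust
// Hypothetical sketch of the "debug session" API shape; nothing here is
// Wasmtime's real interface. A real session would own the Store and resume
// a fiber to produce the next event; the Vec stands in for that.

use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Events the debugger observes; the session yields one per step.
#[derive(Debug, PartialEq)]
enum DebugStepResult {
    /// A trap occurred at the given (hypothetical) program counter.
    Trap { pc: usize },
    /// The wrapped future ran to completion.
    Complete,
}

/// Toy session: a queue of pending events stands in for the suspended
/// Wasm activation.
struct DebugSession {
    pending: Vec<DebugStepResult>,
}

impl DebugSession {
    async fn next_step(&mut self) -> DebugStepResult {
        self.pending.pop().unwrap_or(DebugStepResult::Complete)
    }
}

/// Minimal single-threaded executor so the example is self-contained.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    struct NoopWake;
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    // Safety: `fut` is not moved after being pinned here.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let mut session = DebugSession {
        pending: vec![DebugStepResult::Trap { pc: 0x1234 }],
    };
    // Debugger main loop: poll the program-under-test for its next event.
    println!("{:?}", block_on(session.next_step()));
    println!("{:?}", block_on(session.next_step()));
}
```

The key property being discussed: while the returned future is suspended at a debug step, ownership of the store has been passed back to the caller, exactly as during a hostcall.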
One idea to work with this is that, for all platforms, when the signal handler updates state to redirect to the trampoline that calls out to the host, anything clobbered is pushed to the stack instead of being saved in the store. For example, the stack pointer would be decremented by 32, the first 16 bytes being the saved return address/frame pointer (pretending it's a called frame) and the next 16 bytes being two clobbered registers, or something like that. That would work on macOS and all other platforms as well, and means that the store isn't necessary in the signal handler routine at least. Also, somewhat orthogonally, I don't think that the asm stubs need to save all registers, only the caller-saved ones according to the native ABI, right?
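The proposed stack-save scheme can be modeled with plain data structures (the `RegState` and `SavedFrame` types and all addresses here are hypothetical; real code would rewrite the `ucontext` register state inside the signal handler):

```rust
// Illustrative model of the proposal above: instead of spilling clobbered
// state to the Store, the signal handler carves out 32 bytes below the
// guest SP and conceptually stores the interrupted PC (as a fake return
// address), the frame pointer, and two clobbered registers there, as if a
// frame had been pushed.

#[derive(Clone, Copy, Debug)]
struct RegState {
    pc: u64,
    sp: u64,
    fp: u64,
    scratch: [u64; 2], // two registers the stub will clobber
}

/// The saved area laid out at the new SP.
#[derive(Debug, PartialEq)]
struct SavedFrame {
    ret_addr: u64, // interrupted PC, pretending it's a return address
    fp: u64,
    clobbered: [u64; 2],
}

fn redirect_to_stub(regs: &mut RegState, stub_pc: u64) -> SavedFrame {
    let saved = SavedFrame {
        ret_addr: regs.pc,
        fp: regs.fp,
        clobbered: regs.scratch,
    };
    regs.sp -= 32;     // room for 4 * 8 bytes of saved state
    regs.pc = stub_pc; // resume at the stub, not the faulting instruction
    saved
}

fn main() {
    let mut regs = RegState { pc: 0x4000, sp: 0x8000, fp: 0x8100, scratch: [1, 2] };
    let saved = redirect_to_stub(&mut regs, 0x9000);
    println!("redirected: {regs:?}\nsaved: {saved:?}");
}
```

Because everything clobbered lives on the guest stack rather than in the store, the same scheme would compose with platforms (like macOS) where the store isn't reachable from the handler.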
I'm not sure of a way other than what you've described you've done in this PR already. What might work best is to go ahead and sketch that out.
Ah, in this case we do actually need to save everything: we're interrupting guest code and we don't have regalloc clobbers on the trap-causing instruction so we need to effectively do a full context switch. (Including vector registers, so this is somewhat heavyweight.) More is in this comment in this PR.
What do you think about the
I guess this is what I'm trying to get at with
and also restated over in this comment; a hostcall is effectively an interrupt to a call, and so if one sees a debug-step yield that occurs at a trapping instruction as a fancy way of that instruction "calling" back to the host, I think this should actually work. Very important is the way that the lifetimes are tied together.
This API makes sense to me, modulo bikeshedding the exact naming and such. We could alternatively, if we wanted to rearrange some deck chairs, make the API a callback instead. But these two approaches are basically the same at the end of the day, and we should be able to make either work if we can make one of them work.
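For concreteness, here is a toy sketch of the callback alternative (all names here, such as `ToyStore` and `set_debug_handler`, are hypothetical, not Wasmtime's API): a handler is registered on a store-like object and invoked synchronously on each debug event.

```rust
// Hypothetical callback-style registration; execution is conceptually
// paused for the duration of the handler call.

use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug, Clone, PartialEq)]
enum DebugEvent {
    Trap(String),
}

struct ToyStore {
    debug_handler: Option<Box<dyn FnMut(&DebugEvent)>>,
}

impl ToyStore {
    fn new() -> Self {
        ToyStore { debug_handler: None }
    }

    fn set_debug_handler(&mut self, f: impl FnMut(&DebugEvent) + 'static) {
        self.debug_handler = Some(Box::new(f));
    }

    /// Called by the runtime when a debug-relevant event occurs.
    fn on_debug_event(&mut self, event: DebugEvent) {
        if let Some(handler) = &mut self.debug_handler {
            handler(&event);
        }
    }
}

fn main() {
    let seen = Rc::new(RefCell::new(Vec::new()));
    let seen2 = Rc::clone(&seen);
    let mut store = ToyStore::new();
    store.set_debug_handler(move |ev| seen2.borrow_mut().push(ev.clone()));
    store.on_debug_event(DebugEvent::Trap("unreachable".into()));
    println!("handler saw: {:?}", seen.borrow());
}
```

Either shape (callback or coroutine) delivers the same events; the difference is only who drives the control flow.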
Callbacks, rather than coroutines, should also Just Work for multiple activations, I think.
That's fair, yeah; the thing I am trying to aim for is a nice API for the debugger main event loop. A callback-based approach would have to timeslice debugger and debuggee at the top level and use a channel to push events from the callback, then pause while waiting for a "continue" token; this is also more awkward in a world where we have the debugger component using all of this from behind a WIT interface. Whereas the async coroutine approach unfolds this in a way that can work in a single thread without channels; the program-under-test is "just" a thing that one can poll for the next output. But either could work.
Debugging: add an async debug-step-result interface, and catch traps with it.

As part of the new guest-debugging API, we want to allow the host to execute the debugged guest code asynchronously, receiving its "debug step" results each time a debugging-relevant event occurs. In the fullness of time, this will include: traps, thrown exceptions, breakpoints and watchpoints hit, single-steps, etc.

As a first step, this PR adds:

- A notion of running an asynchronous function call in a "debug session";
- An API on that debug session object (which owns the store while the function is running) that provides an async method to get the next `DebugStepResult`;
- An implementation that transmutes traps into a debug-step result, allowing introspection of the guest state before the trap tears down its stack;
- Access to the stack introspection API provided by bytecodealliance#11769.

The implementation works by performing *call injection* from the signal handler. The basic idea is that rather than perform an exception resume from the signal handler, directly rewriting register state to unwind all Wasm frames and return the error code to the host, we rewrite register state to redirect to a handwritten assembly stub. This stub cannot assume anything about register state (because we don't enforce any constraints on register state at all the points where trapping signals could occur); thus, it has to save every register. To allow this trampoline to do anything at all, we inject a few parameters into it; the original values of the parameter registers, as well as the original PC (location of the trap), are saved to the store so they can be restored into the register-save frame before the injected stub returns (if it does). The injected stub can then call into the runtime to perform a fiber-suspend, setting a "debug yield" value that indicates that a trap occurred.
A few notes on design constraints that forced my hand in several ways:

- We need to inject a call by rewriting only register state, not pushing a new frame from within the signal handler, because it appears that Windows vectored exception handlers run on the same stack as the guest and so there is no room to push an additional frame.
- We need access to the store from the signal context now; we can get this from TLS if we add a raw backpointer from VMStoreContext to StoreOpaque. I *believe* we aren't committing any serious pointer provenance or aliasing-rules crimes here, because dynamically we are taking ownership of the store back when we're running within the signal context (it's as if it was passed as an argument, via a very circuitous route), but I could very well be wrong. I hope we can find another working approach if so!
- The trap suspend protocol looks a little like a resumable trap, but only because we need to properly tear down the future (otherwise we get a panic on drop). Basically we resume back, and if the trap was a non-resumable trap, the assembly stub returns not to the original PC but to the PC of *another* stub that does the original resume-to-entry-handler action.

Everything is set up here for resumable traps (e.g. for breakpoints) to also work, but I haven't implemented that yet; that's the next PR (and requires some other machinery, most notably a private copy of code memory and the ability to edit and re-publish it; metadata to indicate where to patch in breaks; and a `pc += BREAK_SIZE` somewhere to skip over on resume).

This is a draft that works on Linux; I still need to implement Windows and macOS updates to trap handlers, but I wanted to post it now to communicate the current direction and get any early feedback.
This PR adds a notion of "debug events", and a mechanism in Wasmtime to associate a "debug handler" with a store such that the handler is invoked as if it were an async hostcall on each event. The async handler owns the store while its future exists, so the whole "world" (within the store) is frozen and the handler can examine any state it likes with a `StoreContextMut`.

Note that this callback-based scheme is a compromise: eventually, we would like to have a native async API that produces a stream of events, as sketched in bytecodealliance#11826 and in [this branch]. However, the async approach implemented naively (that is, with manual fiber suspends and with state passed on the store) suffers from unsoundness in the presence of dropped futures. Alex, Nick and I discussed this extensively and agreed that the `Accessor` mechanism is the right way to allow a debugger to have "timesliced"/"shared" access to a store (only when polled/when an event is delivered), but we will defer that for now, because it requires additional work (mainly, converting existing async yield points in the runtime to "give up" the store with the `run_concurrent` mechanism). I'll file a followup issue to track that. The idea is that we can eventually build that when ready, but the API we provide to a debugger component can remain unchanged; only this plumbing and the glue to the debugger component will be reworked.

With this scheme based on callbacks, we expect that one should be able to implement a debugger using async channels to communicate with the callback. The idea is that there would be a protocol where the callback sends a debug event to the debugger main loop elsewhere in the executor (e.g., over a Tokio channel or other async channel mechanism), and when the debugger wants to allow execution to continue, it sends a "continue" message back.

In the meantime, while the world is paused, the debugger can send messages to the callback to query the `StoreContextMut` it has and read out state. This indirection/proxying of Store access is necessary for soundness: again, teleporting the Store out may look like it almost works ("it is like a mutable reborrow on a hostcall") except in the presence of dropped futures with sandwiched Wasm->host->Wasm situations.

This PR implements debug events for a few cases that can be caught directly in the runtime, e.g., exceptions and traps raised just before re-entry to Wasm. Other kinds of traps, such as those normally implemented by host signals, require additional work (as in bytecodealliance#11826) to implement "hostcall injection" on signal reception; and breakpoints will be built on top of that. The point of this PR is only to get the initial plumbing in place for events.

[this branch]: https://github.com/cfallin/wasmtime/tree/wasmtime-debug-async
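The event/continue protocol described above can be sketched as follows, with `std::sync::mpsc` and a thread standing in for an async (e.g. Tokio) channel and the suspended handler; `DebugEvent` and `DebuggerReply` are illustrative names, not Wasmtime's API:

```rust
// Sketch of the proposed protocol: the debug handler callback sends an
// event to the debugger main loop and blocks until the debugger replies
// "continue". While the handler is blocked, the store-side world is
// conceptually frozen, so this is the window for state queries.

use std::sync::mpsc;
use std::thread;

#[derive(Debug, PartialEq)]
enum DebugEvent {
    Trap(String),
}

#[derive(Debug, PartialEq)]
enum DebuggerReply {
    Continue,
}

fn main() {
    let (event_tx, event_rx) = mpsc::channel::<DebugEvent>();
    let (reply_tx, reply_rx) = mpsc::channel::<DebuggerReply>();

    // Stands in for the debug handler invoked while the store is frozen.
    let handler = thread::spawn(move || {
        event_tx
            .send(DebugEvent::Trap("integer divide by zero".into()))
            .unwrap();
        // Block until the debugger allows execution to continue.
        let reply = reply_rx.recv().unwrap();
        assert_eq!(reply, DebuggerReply::Continue);
    });

    // Debugger main loop: receive an event; the world stays paused until
    // we reply, so this is where StoreContextMut queries would be proxied.
    let event = event_rx.recv().unwrap();
    println!("debugger saw: {event:?}");
    reply_tx.send(DebuggerReply::Continue).unwrap();
    handler.join().unwrap();
}
```

A real implementation would extend `DebuggerReply` with query messages (read a frame, read a local, etc.) that the callback services against its `StoreContextMut` before the final "continue".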
…11895)

* Debugging: add a debugger callback mechanism to handle debug events (see the full description above).
* Add some more tests.
* Review feedback: comment updates, and make `debug` feature depend on `async`.
* Review feedback: debug-hook setter requires guest debugging to be enabled.
* Review feedback: ThrownException event; handle block_on errors; explicitly list UnwindState cases.
* Add comment about load-bearing Send requirement.
* Fix no-unwind build.
* Review feedback: pass in hostcall error messages while keeping the trait object-safe.
* Ignore divide-trapping test on Pulley for now.

Co-authored-by: Alex Crichton <alex@alexcrichton.com>
…signal-based traps. This repurposes the code from bytecodealliance#11826 to "inject calls": when in a signal handler, we can update the register state to redirect execution, upon signal-handler return, to a special hand-written trampoline; this trampoline can save all registers and enter the host, just as if a hostcall had occurred.
I'm closing this for now, but I'll keep the branch around -- I'm going to write up an issue describing a simpler path, and we can keep the call-injection stubs around for future performance work one day, if we need them.
(Stacked on top of #11769.)
As part of the new guest-debugging API, we want to allow the host to
execute the debugged guest code asynchronously, receiving its "debug
step" results each time a debugging-relevant event occurs. In the
fullness of time, this will include: traps, thrown exceptions,
breakpoints and watchpoints hit, single-steps, etc.
As a first step, this PR adds:

- A notion of running an asynchronous function call in a "debug session";
- An API on that debug session object (which owns the store while the function is running) that provides an async method to get the next `DebugStepResult`;
- An implementation that transmutes traps into a debug-step result, allowing introspection of the guest state before the trap tears down its stack;
- Access to the stack introspection API provided by #11769.
The implementation works by performing call injection from the signal
handler. The basic idea is that rather than perform an exception resume
from the signal handler, directly rewriting register state to unwind all
Wasm frames and return the error code to the host, we rewrite register
state to redirect to a handwritten assembly stub. This stub cannot
assume anything about register state (because we don't enforce any
constraints on register state at all the points that trapping signals
could occur); thus, it has to save every register. To allow this
trampoline to do anything at all, we inject a few parameters to it; the
original values of the parameter registers, as well as the original PC
(location of the trap), are saved to the store so they can be restored
into the register-save frame before the injected stub returns (if it
does).
The injected stub can then call into the runtime to perform a
fiber-suspend, setting a "debug yield" value that indicates that a trap
occurred.
A few notes on design constraints that forced my hand in several ways:

- We need to inject a call by rewriting only register state, not pushing a new frame from within the signal handler, because it appears that Windows vectored exception handlers run on the same stack as the guest and so there is no room to push an additional frame.
- We need access to the store from the signal context now; we can get this from TLS if we add a raw backpointer from VMStoreContext to StoreOpaque. I *believe* we aren't committing any serious pointer provenance or aliasing-rules crimes here, because dynamically we are taking ownership of the store back when we're running within the signal context (it's as if it was passed as an argument, via a very circuitous route), but I could very well be wrong. I hope we can find another working approach if so!
- The trap suspend protocol looks a little like a resumable trap, but only because we need to properly tear down the future (otherwise we get a panic on drop). Basically we resume back, and if the trap was a non-resumable trap, the assembly stub returns not to the original PC but to the PC of *another* stub that does the original resume-to-entry-handler action.
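The resume-target choice just described can be stated as a tiny decision function (purely illustrative; the names and addresses are hypothetical, and in the real design this choice is encoded in which PC the assembly stub returns to):

```rust
// Toy model of the resume protocol: after the debug yield resumes, a
// resumable trap goes back to the trap site, while a non-resumable trap
// is redirected to a second stub that performs the original
// unwind-to-entry-handler action.

#[derive(Debug, PartialEq)]
enum TrapKind {
    Resumable,    // e.g. a breakpoint: continue at/after the trap site
    NonResumable, // e.g. divide-by-zero: must unwind to the entry handler
}

fn resume_target(kind: TrapKind, trap_pc: u64, unwind_stub_pc: u64) -> u64 {
    match kind {
        TrapKind::Resumable => trap_pc,
        TrapKind::NonResumable => unwind_stub_pc,
    }
}

fn main() {
    assert_eq!(resume_target(TrapKind::Resumable, 0x4000, 0x9000), 0x4000);
    assert_eq!(resume_target(TrapKind::NonResumable, 0x4000, 0x9000), 0x9000);
    println!("resume targets check out");
}
```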
Everything is set up here for resumable traps (e.g. for breakpoints) to
also work, but I haven't implemented that yet; that's the next PR (and
requires some other machinery, most notably a private copy of code
memory and the ability to edit and re-publish it; and metadata to
indicate where to patch in breaks; and a
`pc += BREAK_SIZE` somewhere to skip over on resume).
This is a draft that works on Linux on x86-64; I still need to implement the other platforms, and catch traps from the `raise` libcall too, not just signal-based traps, but I wanted to post it now to communicate the current direction and get any early feedback.