Debugging: add a debugger callback mechanism to handle debug events. #11895
cfallin merged 10 commits into bytecodealliance:main
Conversation
This PR adds a notion of "debug events", and a mechanism in Wasmtime to associate a "debug handler" with a store such that the handler is invoked as if it were an async hostcall on each event. The async handler owns the store while its future exists, so the whole "world" (within the store) is frozen and the handler can examine any state it likes with a `StoreContextMut`.

Note that this callback-based scheme is a compromise: eventually, we would like to have a native async API that produces a stream of events, as sketched in bytecodealliance#11826 and in [this branch]. However, the async approach implemented naively (that is, with manual fiber suspends and with state passed on the store) suffers from unsoundness in the presence of dropped futures. Alex, Nick and I discussed this extensively and agreed that the `Accessor` mechanism is the right way to allow a debugger to have "timesliced"/"shared" access to a store (only when polled/when an event is delivered), but we will defer that for now, because it requires additional work (mainly, converting existing async yield points in the runtime to "give up" the store with the `run_concurrent` mechanism). I'll file a followup issue to track that. The idea is that we can eventually build that when ready, and the API we provide to a debugger component can remain unchanged; only this plumbing and the glue to the debugger component will be reworked.

With this callback-based scheme, we expect that one should be able to implement a debugger using async channels to communicate with the callback. The idea is that there would be a protocol where the callback sends a debug event to the debugger main loop elsewhere in the executor (e.g., over a Tokio channel or other async channel mechanism), and when the debugger wants to allow execution to continue, it sends a "continue" message back.

In the meantime, while the world is paused, the debugger can send messages to the callback to query the `StoreContextMut` it has and read out state. This indirection/proxying of Store access is necessary for soundness: again, teleporting the Store out may look like it almost works ("it is like a mutable reborrow on a hostcall") except in the presence of dropped futures with sandwiched Wasm->host->Wasm situations.

This PR implements debug events for a few cases that can be caught directly in the runtime, e.g., exceptions and traps raised just before re-entry to Wasm. Other kinds of traps, such as those normally implemented by host signals, require additional work (as in bytecodealliance#11826) to implement "hostcall injection" on signal reception; breakpoints will be built on top of that. The point of this PR is only to get the initial plumbing in place for events.

[this branch]: https://github.com/cfallin/wasmtime/tree/wasmtime-debug-async
fitzgen
left a comment
LGTM! Will hold off on adding to the merge queue until @alexcrichton has a chance to look at this.
alexcrichton
left a comment
Would it make sense to also start something like a mask of events for a debug handler? For example if a HostcallError isn't desired then that could get masked out while other events would still be processed.
…citly list UnwindState cases.
Maybe eventually, yeah -- that's a good idea. I might defer that to the point where we're building out the top half, to see how we want to use it.
OK, updated based on feedback -- final look before I merge? Thanks!
With respect to clones, object-safety, lifetimes, and …
Ah, neat, an internal adapter that gets monomorphized in the top layer where we have `T` -- I'll incorporate that, thanks!
…ait object-safe. Co-authored-by: Alex Crichton <alex@alexcrichton.com>
All of that worked fine, thanks! I pulled in your patch and then added some more comments to the trait to indicate that it should be cheap to clone, i.e., is recommended to be an …
I pushed a change to avoid this failure, where execution in Pulley did not emit a debug event for a divide-by-zero; this is because the trap-handling path is ever so slightly different in Pulley. I'll address it in a followup (we're very close, and honestly it'd be easier to keep Pulley working throughout this process than to exclude it in config); the …
…roperly in Pulley.

This is a followup to bytecodealliance#11895 where I had disabled a test that failed to emit a debug event for a hostcall-generated trap on a divide-by-zero in Pulley. This PR allows that test to pass, and brings Pulley back to parity with native Cranelift in current debug support.

This was a bit of a "start to pull the thread and the entire finished mechanism materializes" PR; happy to consider ways to split it up if needed. In short, disabling signal-based traps in a Pulley configuration still relies on Pulley opcodes (e.g., divide) actually trapping, in a way that looks more like a "native ISA trap", so I had to start to build out the actual trap-handling mechanisms. In any case, this will all be needed for followup work soon that will handle traps on native platforms (redirecting from signals by injecting calls), so this is not a distraction.

This PR includes, ranked in decreasing order of "may scare other Wasmtime maintainers" score:

- A raw `NonNull<dyn VMStore>` in the `CallThreadState`, with a long comment about provenance and mut-borrow exclusivity. This is needed right now to allow the interpreter to invoke the debug event handler, but will soon be needed when injecting hostcalls on signals, because a signal context also has no state available from the Wasm code other than what is in TLS. Hence, we need a way to get the store back from the Wasm when we do something that is "morally a hostcall" at a trapping instruction. I do believe this is sound, or at least close to it if not (please scrutinize carefully!); the basic idea is that the Wasm acts as an opaque blob in the middle, and the pointer comes out of it one way or another (the normal way, as the first arg to a hostcall, or the weird way, via TLS and the `CallThreadState` during a trap). Exclusive ownership is still clear at any given point, and only one `&mut` ever exists in the current frame at a time. That said, I haven't tested with miri yet.

  This does require careful thought about the Wasm compilation, too; we need the moral equivalent of a `&mut self` reborrow as if we were making a hostcall on each trapping instruction. It turns out that we already treat trapping instructions as memory fences, so nothing loaded from the store can be moved or cached across them, and I've added a comment now about how this is load-bearing.

- Updates to `CallThreadState`'s "exit state", normally set by the exit trampoline, that we now also set when we invoke a debug event handler during a trap context[^1] so that `Store::debug_frames` properly sees the current activation. This is a little more awkward than it could be because we store the *trampoline* FP, not the last Wasm FP, and there is no trampoline frame in this case, so I've added a flag and some conditionals. I'm happy to refactor instead to go (back) to storing the last Wasm FP, with the extra load in the exit trampoline to compute it.

- A whole bunch of plumbing, creating a large but mechanical diff, in the code translator to actually add debug tags on all traps and calls to `raise`. It turns out that once I got all of the above working in Pulley, the test disagreed about the current Wasm PC between native and Pulley, and Pulley was right; native was getting it wrong because the `raise` libcall was sunk to the bottom in a cold block and, without tags, we scanned backward to pick up the last Wasm PC in the function. This new plumbing and addition of tags in all the appropriate places fixes that.

[^1]: I keep saying "during a trap context" here, but to avoid any signal-safety scares, note that when this is done for native signals in a followup PR, we will inject a hostcall by modifying stack/register state and returning from the actual signal context, so it really is as if we did a hostcall from a trapping instruction.
Thanks to Alex for help on working out how to keep this from requiring `T: Send` throughout Wasmtime!