Debugging: add an async debug-step-result interface, and catch traps with it. #11826
cfallin wants to merge 1 commit into bytecodealliance:main
Conversation
This now has support for all our native architectures, but not macOS or Windows; integrating with the separate exception-handling thread on macOS is proving to be unexpectedly interesting, and I think the form it may take is that the call-injection machinery I've built here will subsume the existing (non-state-preserving, for-unwinding-traps-only) call injection on macOS. I haven't looked in detail at Windows yet, as I'll have to dust off my Windows VM (for the first time since implementing fastcall in 2021!), but I hope the only tricky bit there will be adding a fastcall variant of the x86-64 stub.

One interesting bit that might be good to discuss (cc @alexcrichton / @fitzgen) is the actual API for the "debug step" protocol. I'm relatively happy with the protocol itself; the thing that I am finding interesting is how to enter a debug session. Right now I have an entry point whose idea is that the session wraps an inner arbitrary future that runs with the store. I was tripped up before by the store's dynamic ownership-passing protocol, but the idea above that debug-steps are morally like hostcalls, so a debug yield passes ownership back, seems to free us from that question. What do you think?

(In the current implementation, nested debug sessions are forbidden dynamically, and the debug session sees only one Wasm activation deep, i.e. from Wasm entry to Wasm exit, and any hostcall is an atomic step; these simplifying restrictions are important to the coherency of the above too, IMHO.)
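As a concrete (purely illustrative) sketch of the coroutine-style shape discussed here, one can imagine a session object that owns the store and is polled for its next debug-step result. All names (`DebugSession`, `DebugStepResult`, `next_step`) and the toy executor are assumptions for illustration, not Wasmtime's actual API:

```rust
// Hypothetical sketch of the "debug session" API shape; nothing here is
// Wasmtime's real interface. A real session would own the Store and resume
// a fiber to produce the next event; the Vec stands in for that.

use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Events the debugger observes; the session yields one per step.
#[derive(Debug, PartialEq)]
enum DebugStepResult {
    /// A trap occurred at the given (hypothetical) program counter.
    Trap { pc: usize },
    /// The wrapped future ran to completion.
    Complete,
}

/// Toy session: a queue of pending events stands in for the suspended
/// Wasm activation.
struct DebugSession {
    pending: Vec<DebugStepResult>,
}

impl DebugSession {
    async fn next_step(&mut self) -> DebugStepResult {
        self.pending.pop().unwrap_or(DebugStepResult::Complete)
    }
}

/// Minimal single-threaded executor so the example is self-contained.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    struct NoopWake;
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    // Safety: `fut` is not moved after being pinned here.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let mut session = DebugSession {
        pending: vec![DebugStepResult::Trap { pc: 0x1234 }],
    };
    // Debugger main loop: poll the program-under-test for its next event.
    println!("{:?}", block_on(session.next_step()));
    println!("{:?}", block_on(session.next_step()));
}
```

The key property being discussed: while the returned future is suspended at a debug step, ownership of the store has been passed back to the caller, exactly as during a hostcall.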
One idea to work with this is that, for all platforms, when the signal handler updates state to redirect to the trampoline that calls out to the host, anything clobbered is pushed to the stack instead of being saved in the store. For example, the stack pointer would be decremented by 32, the first 16 bytes being the saved return address/frame pointer (pretending it's a called frame) and the next 16 bytes being two clobbered registers, or something like that. That would work on macOS and all other platforms as well, and means that the store isn't necessary in the signal handler routine at least. Also, somewhat orthogonally, I don't think that the asm stubs need to save all registers, only the caller-saved ones according to the native ABI, right?
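The proposed stack-save scheme can be modeled with plain data structures (the `RegState` and `SavedFrame` types and all addresses here are hypothetical; real code would rewrite the `ucontext` register state inside the signal handler):

```rust
// Illustrative model of the proposal above: instead of spilling clobbered
// state to the Store, the signal handler carves out 32 bytes below the
// guest SP and conceptually stores the interrupted PC (as a fake return
// address), the frame pointer, and two clobbered registers there, as if a
// frame had been pushed.

#[derive(Clone, Copy, Debug)]
struct RegState {
    pc: u64,
    sp: u64,
    fp: u64,
    scratch: [u64; 2], // two registers the stub will clobber
}

/// The saved area laid out at the new SP.
#[derive(Debug, PartialEq)]
struct SavedFrame {
    ret_addr: u64, // interrupted PC, pretending it's a return address
    fp: u64,
    clobbered: [u64; 2],
}

fn redirect_to_stub(regs: &mut RegState, stub_pc: u64) -> SavedFrame {
    let saved = SavedFrame {
        ret_addr: regs.pc,
        fp: regs.fp,
        clobbered: regs.scratch,
    };
    regs.sp -= 32;     // room for 4 * 8 bytes of saved state
    regs.pc = stub_pc; // resume at the stub, not the faulting instruction
    saved
}

fn main() {
    let mut regs = RegState { pc: 0x4000, sp: 0x8000, fp: 0x8100, scratch: [1, 2] };
    let saved = redirect_to_stub(&mut regs, 0x9000);
    println!("redirected: {regs:?}\nsaved: {saved:?}");
}
```

Because everything clobbered lives on the guest stack rather than in the store, the same scheme would compose with platforms (like macOS) where the store isn't reachable from the handler.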
I'm not sure of a way other than what you've described you've done in this PR already. What might work best is to go ahead and sketch that out.
Ah, in this case we do actually need to save everything: we're interrupting guest code and we don't have regalloc clobbers on the trap-causing instruction so we need to effectively do a full context switch. (Including vector registers, so this is somewhat heavyweight.) More is in this comment in this PR.
What do you think about the
I guess this is what I'm trying to get at with
and also restated over in this comment; a hostcall is effectively an interrupt to a call, and so if one sees a debug-step yield that occurs at a trapping instruction as a fancy way of that instruction "calling" back to the host, I think this should actually work. Very important is the way that the lifetimes are tied together.
This API makes sense to me, modulo bikeshedding the exact naming and such. We could alternatively, if we wanted to rearrange some deck chairs, make the API a callback instead. But these two approaches are basically the same at the end of the day, and we should be able to make either work if we can make one of them work.
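For concreteness, here is a toy sketch of the callback alternative (all names here, such as `ToyStore` and `set_debug_handler`, are hypothetical, not Wasmtime's API): a handler is registered on a store-like object and invoked synchronously on each debug event.

```rust
// Hypothetical callback-style registration; execution is conceptually
// paused for the duration of the handler call.

use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug, Clone, PartialEq)]
enum DebugEvent {
    Trap(String),
}

struct ToyStore {
    debug_handler: Option<Box<dyn FnMut(&DebugEvent)>>,
}

impl ToyStore {
    fn new() -> Self {
        ToyStore { debug_handler: None }
    }

    fn set_debug_handler(&mut self, f: impl FnMut(&DebugEvent) + 'static) {
        self.debug_handler = Some(Box::new(f));
    }

    /// Called by the runtime when a debug-relevant event occurs.
    fn on_debug_event(&mut self, event: DebugEvent) {
        if let Some(handler) = &mut self.debug_handler {
            handler(&event);
        }
    }
}

fn main() {
    let seen = Rc::new(RefCell::new(Vec::new()));
    let seen2 = Rc::clone(&seen);
    let mut store = ToyStore::new();
    store.set_debug_handler(move |ev| seen2.borrow_mut().push(ev.clone()));
    store.on_debug_event(DebugEvent::Trap("unreachable".into()));
    println!("handler saw: {:?}", seen.borrow());
}
```

Either shape (callback or coroutine) delivers the same events; the difference is only who drives the control flow.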
Callbacks, rather than coroutines, should also Just Work for multiple activations, I think.
That's fair, yeah; the thing I am trying to aim for is a nice API for the debugger main event loop. A callback-based approach would have to timeslice debugger and debuggee at the top level and use a channel to push events from the callback, then pause while waiting for a "continue" token; this is also more awkward in a world where we have the debugger component using all of this from behind a WIT interface. Whereas the async coroutine approach unfolds this in a way that can work in a single thread without channels; the program-under-test is "just" a thing that one can poll for the next output. But either could work.
Debugging: add an async debug-step-result interface, and catch traps with it.

As part of the new guest-debugging API, we want to allow the host to execute the debugged guest code asynchronously, receiving its "debug step" results each time a debugging-relevant event occurs. In the fullness of time, this will include: traps, thrown exceptions, breakpoints and watchpoints hit, single-steps, etc.

As a first step, this PR adds:

- A notion of running an asynchronous function call in a "debug session";
- An API on that debug session object (which owns the store while the function is running) that provides an async method to get the next `DebugStepResult`;
- An implementation that transmutes traps into a debug-step result, allowing introspection of the guest state before the trap tears down its stack;
- Access to the stack introspection API provided by bytecodealliance#11769.

The implementation works by performing *call injection* from the signal handler. The basic idea is that rather than perform an exception resume from the signal handler, directly rewriting register state to unwind all Wasm frames and return the error code to the host, we rewrite register state to redirect to a handwritten assembly stub. This stub cannot assume anything about register state (because we don't enforce any constraints on register state at all the points where trapping signals could occur); thus, it has to save every register. To allow this trampoline to do anything at all, we inject a few parameters into it; the original values of the parameter registers, as well as the original PC (location of the trap), are saved to the store so they can be restored into the register-save frame before the injected stub returns (if it does). The injected stub can then call into the runtime to perform a fiber-suspend, setting a "debug yield" value that indicates that a trap occurred.
A few notes on design constraints that forced my hand in several ways:

- We need to inject a call by rewriting only register state, not pushing a new frame from within the signal handler, because it appears that Windows vectored exception handlers run on the same stack as the guest and so there is no room to push an additional frame.
- We need access to the store from the signal context now; we can get this from TLS if we add a raw backpointer from VMStoreContext to StoreOpaque. I *believe* we aren't committing any serious pointer provenance or aliasing-rules crimes here, because dynamically we are taking ownership of the store back when we're running within the signal context (it's as if it was passed as an argument, via a very circuitous route), but I could very well be wrong. I hope we can find another working approach if so!
- The trap suspend protocol looks a little like a resumable trap, but only because we need to properly tear down the future (otherwise we get a panic on drop). Basically we resume back, and if the trap was a non-resumable trap, the assembly stub returns not to the original PC but to the PC of *another* stub that does the original resume-to-entry-handler action.

Everything is set up here for resumable traps (e.g. for breakpoints) to also work, but I haven't implemented that yet; that's the next PR (and requires some other machinery, most notably a private copy of code memory and the ability to edit and re-publish it; metadata to indicate where to patch in breaks; and a `pc += BREAK_SIZE` somewhere to skip over on resume).

This is a draft that works on Linux; I still need to implement Windows and macOS updates to trap handlers, but I wanted to post it now to communicate the current direction and get any early feedback.
This PR adds a notion of "debug events", and a mechanism in Wasmtime to associate a "debug handler" with a store such that the handler is invoked as if it were an async hostcall on each event. The async handler owns the store while its future exists, so the whole "world" (within the store) is frozen and the handler can examine any state it likes with a `StoreContextMut`.

Note that this callback-based scheme is a compromise: eventually, we would like to have a native async API that produces a stream of events, as sketched in bytecodealliance#11826 and in [this branch]. However, the async approach implemented naively (that is, with manual fiber suspends and with state passed on the store) suffers from unsoundness in the presence of dropped futures. Alex, Nick and I discussed this extensively and agreed that the `Accessor` mechanism is the right way to allow a debugger to have "timesliced"/"shared" access to a store (only when polled/when an event is delivered), but we will defer that for now, because it requires additional work (mainly, converting existing async yield points in the runtime to "give up" the store with the `run_concurrent` mechanism). I'll file a followup issue to track that. The idea is that we can eventually build that when ready, but the API we provide to a debugger component can remain unchanged; only this plumbing and the glue to the debugger component will be reworked.

With this scheme based on callbacks, we expect that one should be able to implement a debugger using async channels to communicate with the callback. The idea is that there would be a protocol where the callback sends a debug event to the debugger main loop elsewhere in the executor (e.g., over a Tokio channel or other async channel mechanism), and when the debugger wants to allow execution to continue, it sends a "continue" message back.

In the meantime, while the world is paused, the debugger can send messages to the callback to query the `StoreContextMut` it has and read out state. This indirection/proxying of Store access is necessary for soundness: again, teleporting the Store out may look like it almost works ("it is like a mutable reborrow on a hostcall") except in the presence of dropped futures with sandwiched Wasm->host->Wasm situations.

This PR implements debug events for a few cases that can be caught directly in the runtime, e.g., exceptions and traps raised just before re-entry to Wasm. Other kinds of traps, such as those normally implemented by host signals, require additional work (as in bytecodealliance#11826) to implement "hostcall injection" on signal reception; and breakpoints will be built on top of that. The point of this PR is only to get the initial plumbing in place for events.

[this branch]: https://github.com/cfallin/wasmtime/tree/wasmtime-debug-async
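The event/continue protocol described above can be sketched as follows, with `std::sync::mpsc` and a thread standing in for an async (e.g. Tokio) channel and the suspended handler; `DebugEvent` and `DebuggerReply` are illustrative names, not Wasmtime's API:

```rust
// Sketch of the proposed protocol: the debug handler callback sends an
// event to the debugger main loop and blocks until the debugger replies
// "continue". While the handler is blocked, the store-side world is
// conceptually frozen, so this is the window for state queries.

use std::sync::mpsc;
use std::thread;

#[derive(Debug, PartialEq)]
enum DebugEvent {
    Trap(String),
}

#[derive(Debug, PartialEq)]
enum DebuggerReply {
    Continue,
}

fn main() {
    let (event_tx, event_rx) = mpsc::channel::<DebugEvent>();
    let (reply_tx, reply_rx) = mpsc::channel::<DebuggerReply>();

    // Stands in for the debug handler invoked while the store is frozen.
    let handler = thread::spawn(move || {
        event_tx
            .send(DebugEvent::Trap("integer divide by zero".into()))
            .unwrap();
        // Block until the debugger allows execution to continue.
        let reply = reply_rx.recv().unwrap();
        assert_eq!(reply, DebuggerReply::Continue);
    });

    // Debugger main loop: receive an event; the world stays paused until
    // we reply, so this is where StoreContextMut queries would be proxied.
    let event = event_rx.recv().unwrap();
    println!("debugger saw: {event:?}");
    reply_tx.send(DebuggerReply::Continue).unwrap();
    handler.join().unwrap();
}
```

A real implementation would extend `DebuggerReply` with query messages (read a frame, read a local, etc.) that the callback services against its `StoreContextMut` before the final "continue".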
…11895)

* Debugging: add a debugger callback mechanism to handle debug events (see the full description above).
* Add some more tests.
* Review feedback: comment updates, and make `debug` feature depend on `async`.
* Review feedback: debug-hook setter requires guest debugging to be enabled.
* Review feedback: ThrownException event; handle block_on errors; explicitly list UnwindState cases.
* Add comment about load-bearing Send requirement.
* Fix no-unwind build.
* Review feedback: pass in hostcall error messages while keeping the trait object-safe.
* Ignore divide-trapping test on Pulley for now.

Co-authored-by: Alex Crichton <alex@alexcrichton.com>
…signal-based traps. This repurposes the code from bytecodealliance#11826 to "inject calls": when in a signal handler, we can update the register state to redirect execution, upon signal-handler return, to a special hand-written trampoline; this trampoline can save all registers and enter the host, just as if a hostcall had occurred.
I'm closing this for now, but I'll keep the branch around -- I'm going to write up an issue describing a simpler path, and we can keep the call-injection stubs around for future performance work one day, if we need them.
(Stacked on top of #11769.)
As part of the new guest-debugging API, we want to allow the host to
execute the debugged guest code asynchronously, receiving its "debug
step" results each time a debugging-relevant event occurs. In the
fullness of time, this will include: traps, thrown exceptions,
breakpoints and watchpoints hit, single-steps, etc.
As a first step, this PR adds:

- A notion of running an asynchronous function call in a "debug session";
- An API on that debug session object (which owns the store while the function is running) that provides an async method to get the next `DebugStepResult`;
- An implementation that transmutes traps into a debug-step result, allowing introspection of the guest state before the trap tears down its stack;
- Access to the stack introspection API provided by #11769.
The implementation works by performing call injection from the signal
handler. The basic idea is that rather than perform an exception resume
from the signal handler, directly rewriting register state to unwind all
Wasm frames and return the error code to the host, we rewrite register
state to redirect to a handwritten assembly stub. This stub cannot
assume anything about register state (because we don't enforce any
constraints on register state at all the points that trapping signals
could occur); thus, it has to save every register. To allow this
trampoline to do anything at all, we inject a few parameters to it; the
original values of the parameter registers, as well as the original PC
(location of the trap), are saved to the store so they can be restored
into the register-save frame before the injected stub returns (if it
does).
The injected stub can then call into the runtime to perform a
fiber-suspend, setting a "debug yield" value that indicates that a trap
occurred.
A few notes on design constraints that forced my hand in several ways:

- We need to inject a call by rewriting only register state, not pushing a new frame from within the signal handler, because it appears that Windows vectored exception handlers run on the same stack as the guest and so there is no room to push an additional frame.
- We need access to the store from the signal context now; we can get this from TLS if we add a raw backpointer from VMStoreContext to StoreOpaque. I *believe* we aren't committing any serious pointer provenance or aliasing-rules crimes here, because dynamically we are taking ownership of the store back when we're running within the signal context (it's as if it was passed as an argument, via a very circuitous route), but I could very well be wrong. I hope we can find another working approach if so!
- The trap suspend protocol looks a little like a resumable trap, but only because we need to properly tear down the future (otherwise we get a panic on drop). Basically we resume back, and if the trap was a non-resumable trap, the assembly stub returns not to the original PC but to the PC of *another* stub that does the original resume-to-entry-handler action.
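The resume-target choice just described can be stated as a tiny decision function (purely illustrative; the names and addresses are hypothetical, and in the real design this choice is encoded in which PC the assembly stub returns to):

```rust
// Toy model of the resume protocol: after the debug yield resumes, a
// resumable trap goes back to the trap site, while a non-resumable trap
// is redirected to a second stub that performs the original
// unwind-to-entry-handler action.

#[derive(Debug, PartialEq)]
enum TrapKind {
    Resumable,    // e.g. a breakpoint: continue at/after the trap site
    NonResumable, // e.g. divide-by-zero: must unwind to the entry handler
}

fn resume_target(kind: TrapKind, trap_pc: u64, unwind_stub_pc: u64) -> u64 {
    match kind {
        TrapKind::Resumable => trap_pc,
        TrapKind::NonResumable => unwind_stub_pc,
    }
}

fn main() {
    assert_eq!(resume_target(TrapKind::Resumable, 0x4000, 0x9000), 0x4000);
    assert_eq!(resume_target(TrapKind::NonResumable, 0x4000, 0x9000), 0x9000);
    println!("resume targets check out");
}
```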
Everything is set up here for resumable traps (e.g. for breakpoints) to
also work, but I haven't implemented that yet; that's the next PR (and
requires some other machinery, most notably a private copy of code
memory and the ability to edit and re-publish it; and metadata to
indicate where to patch in breaks; and a
`pc += BREAK_SIZE` somewhere to skip over on resume).
This is a draft that works on Linux on x86-64; I still need to implement the other platforms, and catch traps from the `raise` libcall too, not just signal-based traps, but I wanted to post it now to communicate the current direction and get any early feedback.