Debugging: add builtin gdbstub component.#12771
Debugging: add builtin gdbstub component.#12771cfallin merged 9 commits intobytecodealliance:mainfrom
Conversation
|
This is stacked on top of #12756 until that one lands; only the last commit is new. I haven't added end-to-end tests that spawn/interact with LLDB yet; depending on how that goes I might be able to include that here or might defer to another PR if that's OK. |
d7959df to
fc1f75a
Compare
2719201 to
71bd19d
Compare
34e9d51 to
c0c1f02
Compare
|
Rebased out #12756; should be good to review now. |
Subscribe to Label Actioncc @fitzgen DetailsThis issue or pull request has been labeled: "wizer"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
|
Also, to clarify, @cfallin what depth would you like me to review the gdbstub component code itself? I'm happy more-or-less not reviewing it at all in the sense that it's well-sequestered, low-risk, and we'll likely iterate a lot on it in-tree. If you'd prefer though I could give it a closer look in any particular areas of interest. |
I guess my default answer is "to whatever extent allows us to fulfill policy and be comfortable having this code in-repo" :-) I agree that since it's sandboxed, the bar could be lower than for core runtime code. I guess the spirit of our code-review policies is still that someone should give it a once-over -- but up to you how deep you take that! |
…g forward to first opcode. LLDB, when instructed to `break main`, looks at the DWARF metadata for `main` and finds its PC range, then sets a breakpoint at the first PC. This is reasonable behavior for native ISAs! That PC better be a real instruction! On Wasm, however, (i) toolchains typically emit the PC range as *including* the *locals count*, a leb128 value that precedes the first opcode and any types of locals; (ii) our gdbstub component that bridges LLDB to our debug APIs (bytecodealliance#12771) only supports *exact* PCs for breakpoints, so when presented with a PC that does not actually point to an opcode, setting the breakpoint is effectively a no-op. There will always be a difference of at least 1 byte between the start-of-function offset and first-opcode offset (for a leb128 of `0` for no locals), so a breakpoint "on" a function will never work. I initially prototyped a fix that adds a sequence point at the start of every function (which, again, is *guaranteed* to be distinct from the first opcode), and the branch is [here], but I didn't like the developer experience: this meant that when a breakpoint at a function start fired, LLDB had a weird interstitial state where no line-number applied. The behavior that would be closer in line with "native" debug expectations is that we add a bit of fuzzy-ish matching: setting a breakpoint at function start should break at the first opcode, even if that's a few (or many) bytes later. There are two options here: special-case function start, or generally change the semantics of our breakpoint API so that "add breakpoint at `pc`" means "add breakpoint at next opcode at or after `pc`". I opted for the latter in this PR because it's more consistent. The logic is a little subtle because we're effectively defining an n-to-1 mapping with this "snap-to-next" behavior, so we have to refcount each breakpoint (consider setting a breakpoint at function start *and* at the first opcode, then deleting them, one at a time). I believe the result is self-consistent, even if a little more complicated now. And, importantly, with bytecodealliance#12771 on top of this change, it produces the expected behavior for the (very simple!) debug script "`b main`; `continue`". [here]: https://github.com/cfallin/wasmtime/tree/breakpoint-at-func-start
…et PCs on traps. This was not exposed earlier by (i) lack of handling of trap events in the initial version of the gdbstub component in bytecodealliance#12771, and (ii) lack of asserting some value for the PC on the top frame in the debug-event test for traps. We got the PC for the last opcode in the function body previously because, with no debug tags on the trapping path that calls raise() (sunk to the bottom of the machine code body as cold code), we scanned backward for the last tag metadata and found that instead. Adding metadata according to the current source location when emitting traps fixes this for all trapping events.
e73119a to
2c56975
Compare
…g forward to first opcode. (#12791) LLDB, when instructed to `break main`, looks at the DWARF metadata for `main` and finds its PC range, then sets a breakpoint at the first PC. This is reasonable behavior for native ISAs! That PC better be a real instruction! On Wasm, however, (i) toolchains typically emit the PC range as *including* the *locals count*, a leb128 value that precedes the first opcode and any types of locals; (ii) our gdbstub component that bridges LLDB to our debug APIs (#12771) only supports *exact* PCs for breakpoints, so when presented with a PC that does not actually point to an opcode, setting the breakpoint is effectively a no-op. There will always be a difference of at least 1 byte between the start-of-function offset and first-opcode offset (for a leb128 of `0` for no locals), so a breakpoint "on" a function will never work. I initially prototyped a fix that adds a sequence point at the start of every function (which, again, is *guaranteed* to be distinct from the first opcode), and the branch is [here], but I didn't like the developer experience: this meant that when a breakpoint at a function start fired, LLDB had a weird interstitial state where no line-number applied. The behavior that would be closer in line with "native" debug expectations is that we add a bit of fuzzy-ish matching: setting a breakpoint at function start should break at the first opcode, even if that's a few (or many) bytes later. There are two options here: special-case function start, or generally change the semantics of our breakpoint API so that "add breakpoint at `pc`" means "add breakpoint at next opcode at or after `pc`". I opted for the latter in this PR because it's more consistent. The logic is a little subtle because we're effectively defining an n-to-1 mapping with this "snap-to-next" behavior, so we have to refcount each breakpoint (consider setting a breakpoint at function start *and* at the first opcode, then deleting them, one at a time). I believe the result is self-consistent, even if a little more complicated now. And, importantly, with #12771 on top of this change, it produces the expected behavior for the (very simple!) debug script "`b main`; `continue`". [here]: https://github.com/cfallin/wasmtime/tree/breakpoint-at-func-start
…et PCs on traps. (#12802) This was not exposed earlier by (i) lack of handling of trap events in the initial version of the gdbstub component in #12771, and (ii) lack of asserting some value for the PC on the top frame in the debug-event test for traps. We got the PC for the last opcode in the function body previously because, with no debug tags on the trapping path that calls raise() (sunk to the bottom of the machine code body as cold code), we scanned backward for the last tag metadata and found that instead. Adding metadata according to the current source location when emitting traps fixes this for all trapping events.
This adds a debug component that makes use of the debug-main world defined in bytecodealliance#12756 and serves the gdbstub protocol, with Wasm extensions, compatible with LLDB. This component is built and included inside the Wasmtime binary, and is loaded using the lower-level `-D debugger=...` debug-main option; the user doesn't need to specify the `.wasm` adapter component. Instead, the user simply runs `wasmtime run -g <PORT> program.wasm ...` and Wasmtime will load and prepare to run `program.wasm` as the debuggee, waiting for a gdbstub connection on the given TCP port before continuing. The workflow is: ``` $ wasmtime run -g 1234 program.wasm [ wasmtime starts and waits for connection ] $ /opt/wasi-sdk/bin/lldb # use LLDB from wasi-sdk release 32 or later (lldb) process connect --plugin wasm connect://localhost:1234 Process 1 stopped * thread #1, stop reason = signal SIGTRAP frame #0: 0x40000000000001cc -> 0x40000000000001cc: unreachable 0x40000000000001cd: end 0x40000000000001ce: local.get 0 0x40000000000001d0: call 13 (lldb) si Process 1 stopped * thread #1, stop reason = instruction step into frame #0: 0x4000000000000184 -> 0x4000000000000184: block 0x4000000000000186: block 0x4000000000000188: global.get 1 0x400000000000018e: i32.const 3664 [ ... ] ``` This makes use of the `gdbstub` third-party crate, into which I've upstreamed support for the Wasm extensions in daniel5151/gdbstub#188, daniel5151/gdbstub#189, daniel5151/gdbstub#190, and daniel5151/gdbstub#192. (I'll add vets as part of this PR.)
2c56975 to
5aef60e
Compare
alexcrichton
left a comment
There was a problem hiding this comment.
I'm realizing that I'm going to be gone for awhile after wasm.io and I don't want to leave this languishing. With the various comments I've left I think this is fine to land and iterate in-tree, and if you'd prefer feel free to defer anything to an issue and/or a follow-up PR.
|
Merged failed due to crate publish checks seeing that |
|
It seems our publish script checks that crates compile as-published, so the |
|
Another merge queue failure: the check here for MSRV 1.91.0 fails because the gdbstub adapter uses wstd 0.6.6 which requires rustc 1.91.1. Do we technically still fit in the "N-2" policy if we require a patch release? If so should we bump to 1.91.1? (This seems reasonable to me because in general someone stuck on 1.91 should be upgrading patch releases to fix bugs, but I'm curious if anyone has an objection) |
|
Ah actually I think we're just out-of-date with our MSRV -- #12828 to bump. |
This adds a debug component that makes use of the debug-main world defined in #12756 and serves the gdbstub protocol, with Wasm extensions, compatible with LLDB.
This component is built and included inside the Wasmtime binary, and is loaded using the lower-level
-D debugger=...debug-main option; the user doesn't need to specify the.wasmadapter component. Instead, the user simply runswasmtime run -g <PORT> program.wasm ...and Wasmtime will load and prepare to runprogram.wasmas the debuggee, waiting for a gdbstub connection on the given TCP port before continuing.The workflow is:
This makes use of the
gdbstubthird-party crate, into which I've upstreamed support for the Wasm extensions in daniel5151/gdbstub#188, daniel5151/gdbstub#189, daniel5151/gdbstub#190, and daniel5151/gdbstub#192. (I'll add vets as part of this PR.)