Debugging: add builtin gdbstub component.#12771

Merged
cfallin merged 9 commits into bytecodealliance:main from cfallin:gdbstub-component
Mar 24, 2026

@cfallin
Member

@cfallin cfallin commented Mar 12, 2026

This adds a debug component that makes use of the debug-main world defined in #12756 and serves the gdbstub protocol, with Wasm extensions, compatible with LLDB.

This component is built and included inside the Wasmtime binary, and is loaded using the lower-level -D debugger=... debug-main option; the user doesn't need to specify the .wasm adapter component. Instead, the user simply runs wasmtime run -g <PORT> program.wasm ... and Wasmtime will load and prepare to run program.wasm as the debuggee, waiting for a gdbstub connection on the given TCP port before continuing.

The workflow is:

$ wasmtime run -g 1234 program.wasm
[ wasmtime starts and waits for connection ]

$ /opt/wasi-sdk/bin/lldb  # use LLDB from wasi-sdk release 32 or later
(lldb) process connect --plugin wasm connect://localhost:1234
Process 1 stopped
* thread #1, stop reason = signal SIGTRAP
    frame #0: 0x40000000000001cc
->  0x40000000000001cc: unreachable
    0x40000000000001cd: end
    0x40000000000001ce: local.get 0
    0x40000000000001d0: call   13
(lldb) si
Process 1 stopped
* thread #1, stop reason = instruction step into
    frame #0: 0x4000000000000184
->  0x4000000000000184: block
    0x4000000000000186: block
    0x4000000000000188: global.get 1
    0x400000000000018e: i32.const 3664
[ ... ]

This makes use of the gdbstub third-party crate, into which I've upstreamed support for the Wasm extensions in daniel5151/gdbstub#188, daniel5151/gdbstub#189, daniel5151/gdbstub#190, and daniel5151/gdbstub#192. (I'll add vets as part of this PR.)

@cfallin cfallin requested review from a team as code owners March 12, 2026 22:45
@cfallin cfallin requested review from dicej and removed request for a team March 12, 2026 22:45
@cfallin
Member Author

cfallin commented Mar 12, 2026

This is stacked on top of #12756 until that one lands; only the last commit is new.

I haven't added end-to-end tests that spawn/interact with LLDB yet; depending on how that goes I might be able to include that here or might defer to another PR if that's OK.

@cfallin cfallin requested review from alexcrichton and removed request for dicej March 12, 2026 22:46
@cfallin cfallin force-pushed the gdbstub-component branch from d7959df to fc1f75a Compare March 12, 2026 22:47
@cfallin cfallin force-pushed the gdbstub-component branch 6 times, most recently from 2719201 to 71bd19d Compare March 13, 2026 08:01
@cfallin cfallin force-pushed the gdbstub-component branch 2 times, most recently from 34e9d51 to c0c1f02 Compare March 13, 2026 19:23
@cfallin
Member Author

cfallin commented Mar 13, 2026

Rebased out #12756; should be good to review now.

@github-actions github-actions bot added the wizer label (Issues related to Wizer snapshotting, pre-initialization, and the `wasmtime wizer` subcommand) Mar 13, 2026
@github-actions

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "wizer"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: wizer

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

@alexcrichton
Member

Also, to clarify, @cfallin what depth would you like me to review the gdbstub component code itself? I'm happy more-or-less not reviewing it at all in the sense that it's well-sequestered, low-risk, and we'll likely iterate a lot on it in-tree. If you'd prefer though I could give it a closer look in any particular areas of interest.

@cfallin
Member Author

cfallin commented Mar 16, 2026

> Also, to clarify, @cfallin what depth would you like me to review the gdbstub component code itself? I'm happy more-or-less not reviewing it at all in the sense that it's well-sequestered, low-risk, and we'll likely iterate a lot on it in-tree. If you'd prefer though I could give it a closer look in any particular areas of interest.

I guess my default answer is "to whatever extent allows us to fulfill policy and be comfortable having this code in-repo" :-) I agree that since it's sandboxed, the bar could be lower than for core runtime code. I guess the spirit of our code-review policies is still that someone should give it a once-over -- but up to you how deep you take that!

cfallin added a commit to cfallin/wasmtime that referenced this pull request Mar 17, 2026
…g forward to first opcode.

LLDB, when instructed to `break main`, looks at the DWARF metadata for
`main` and finds its PC range, then sets a breakpoint at the first
PC. This is reasonable behavior for native ISAs! That PC better be a
real instruction!

On Wasm, however, (i) toolchains typically emit the PC range as
*including* the *locals count*, a leb128 value that precedes the first
opcode and any types of locals; (ii) our gdbstub component that
bridges LLDB to our debug APIs (bytecodealliance#12771) only supports *exact* PCs for
breakpoints, so when presented with a PC that does not actually point
to an opcode, setting the breakpoint is effectively a no-op. There
will always be a difference of at least 1 byte between the
start-of-function offset and first-opcode offset (for a leb128 of `0`
for no locals), so a breakpoint "on" a function will never work.
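
To make the offset gap concrete, here is a minimal sketch (not Wasmtime
code) that computes where the first opcode sits relative to the start of
a function body. It assumes the standard binary layout of a leb128 count
of local groups followed by (count, type) pairs and, as a
simplification, that every value type is encoded in one byte. The names
`read_uleb128` and `first_opcode_offset` are hypothetical.

```rust
/// Decode an unsigned LEB128 value; returns (value, bytes consumed).
fn read_uleb128(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in bytes.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated or over-long encoding
}

/// Offset of the first opcode, relative to the start of the function
/// body (the byte where the locals count begins).
/// Assumption: each value type occupies exactly one byte.
fn first_opcode_offset(body: &[u8]) -> Option<usize> {
    let (groups, mut off) = read_uleb128(body)?;
    for _ in 0..groups {
        let (_count, n) = read_uleb128(body.get(off..)?)?;
        off += n + 1; // the count, then a one-byte value type
    }
    (off < body.len()).then_some(off)
}

fn main() {
    // No locals: count `0`, then `unreachable` (0x00), `end` (0x0b).
    assert_eq!(first_opcode_offset(&[0x00, 0x00, 0x0b]), Some(1));
    // One group of two i32 locals (0x7f), then `nop` (0x01), `end`.
    assert_eq!(first_opcode_offset(&[0x01, 0x02, 0x7f, 0x01, 0x0b]), Some(3));
}
```

Even in the no-locals case the result is 1 rather than 0, which is the
one-byte gap described above.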

I initially prototyped a fix that adds a sequence point at the start
of every function (which, again, is *guaranteed* to be distinct from
the first opcode), and the branch is [here], but I didn't like the
developer experience: this meant that when a breakpoint at a function
start fired, LLDB had a weird interstitial state where no line-number
applied.

The behavior that would be closer in line with "native" debug
expectations is that we add a bit of fuzzy-ish matching: setting a
breakpoint at function start should break at the first opcode, even if
that's a few (or many) bytes later. There are two options here:
special-case function start, or generally change the semantics of our
breakpoint API so that "add breakpoint at `pc`" means "add breakpoint
at next opcode at or after `pc`". I opted for the latter in this PR
because it's more consistent.

The logic is a little subtle because we're effectively defining an
n-to-1 mapping with this "snap-to-next" behavior, so we have to
refcount each breakpoint (consider setting a breakpoint at function
start *and* at the first opcode, then deleting them, one at a time). I
believe the result is self-consistent, even if a little more
complicated now. And, importantly, with bytecodealliance#12771 on top of this change,
it produces the expected behavior for the (very simple!) debug script
"`b main`; `continue`".

[here]: https://github.com/cfallin/wasmtime/tree/breakpoint-at-func-start
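
The "snap-to-next" semantics and the refcounting it requires can be
sketched roughly as follows. This is an illustration under assumptions,
not the actual Wasmtime breakpoint API: the `Breakpoints` type, its
methods, and the idea of a precomputed set of opcode offsets are all
hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical breakpoint table (not Wasmtime's real API).
struct Breakpoints {
    /// PCs at which an opcode begins (assumed known in advance).
    opcode_pcs: BTreeSet<u64>,
    /// Snapped PC -> number of outstanding requests that map to it.
    refcounts: BTreeMap<u64, usize>,
}

impl Breakpoints {
    fn new(opcode_pcs: impl IntoIterator<Item = u64>) -> Self {
        Self {
            opcode_pcs: opcode_pcs.into_iter().collect(),
            refcounts: BTreeMap::new(),
        }
    }

    /// Snap `pc` forward to the next opcode at or after it.
    fn snap(&self, pc: u64) -> Option<u64> {
        self.opcode_pcs.range(pc..).next().copied()
    }

    /// Returns the PC actually armed, or `None` if no opcode follows.
    fn add(&mut self, pc: u64) -> Option<u64> {
        let target = self.snap(pc)?;
        *self.refcounts.entry(target).or_insert(0) += 1;
        Some(target)
    }

    /// Returns `true` only when the last request for the snapped PC
    /// is removed, i.e. when the breakpoint is really disarmed.
    fn remove(&mut self, pc: u64) -> bool {
        let Some(target) = self.snap(pc) else { return false };
        let Some(count) = self.refcounts.get_mut(&target) else { return false };
        *count -= 1;
        if *count == 0 {
            self.refcounts.remove(&target);
            return true;
        }
        false
    }

    fn is_armed(&self, pc: u64) -> bool {
        self.refcounts.contains_key(&pc)
    }
}

fn main() {
    // Function body starts at 0x10 (locals count); first opcode at 0x11.
    let mut bps = Breakpoints::new([0x11, 0x13, 0x15]);
    assert_eq!(bps.add(0x10), Some(0x11)); // "function start" snaps forward
    assert_eq!(bps.add(0x11), Some(0x11)); // exact PC, same target
    assert!(!bps.remove(0x10)); // one request still outstanding
    assert!(bps.is_armed(0x11));
    assert!(bps.remove(0x11)); // last request gone, now disarmed
    assert!(!bps.is_armed(0x11));
}
```

Without the refcount, the first `remove` would disarm a breakpoint that
the second request still expects to be set.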
cfallin added a commit to cfallin/wasmtime that referenced this pull request Mar 19, 2026
…et PCs on traps.

This was not exposed earlier by (i) lack of handling of trap events in
the initial version of the gdbstub component in bytecodealliance#12771, and (ii) lack
of asserting some value for the PC on the top frame in the debug-event
test for traps. We got the PC for the last opcode in the function body
previously because, with no debug tags on the trapping path that calls
raise() (sunk to the bottom of the machine code body as cold code), we
scanned backward for the last tag metadata and found that
instead. Adding metadata according to the current source location when
emitting traps fixes this for all trapping events.
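
A toy model of the lookup may help show why the wrong PC came back. It
assumes tags are simple (native offset, Wasm PC) pairs sorted by native
offset and that lookup picks the last tag at or before the queried
offset; the real metadata format is not shown in this thread, and
`wasm_pc_for` is a hypothetical name.

```rust
/// Hypothetical lookup: find the Wasm PC of the last tag at or before
/// `native_off`. `tags` must be sorted by native offset.
fn wasm_pc_for(tags: &[(u32, u64)], native_off: u32) -> Option<u64> {
    let idx = tags.partition_point(|&(off, _)| off <= native_off);
    idx.checked_sub(1).map(|i| tags[i].1)
}

fn main() {
    // Three opcodes on the hot path. The trap for the opcode at Wasm PC
    // 0x21 is sunk to native offset 0x40 as cold code.
    let before: [(u32, u64); 3] = [(0x00, 0x20), (0x08, 0x21), (0x10, 0x23)];
    // With no tag on the cold path, the scan lands on the last opcode.
    assert_eq!(wasm_pc_for(&before, 0x40), Some(0x23));

    // With a tag emitted for the trapping path, the lookup is exact.
    let after: [(u32, u64); 4] =
        [(0x00, 0x20), (0x08, 0x21), (0x10, 0x23), (0x40, 0x21)];
    assert_eq!(wasm_pc_for(&after, 0x40), Some(0x21));
}
```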
@cfallin cfallin force-pushed the gdbstub-component branch from e73119a to 2c56975 Compare March 19, 2026 01:23
github-merge-queue bot pushed a commit that referenced this pull request Mar 19, 2026
…g forward to first opcode. (#12791)

github-merge-queue bot pushed a commit that referenced this pull request Mar 19, 2026
…et PCs on traps. (#12802)

cfallin added 3 commits March 19, 2026 16:54
@cfallin cfallin force-pushed the gdbstub-component branch from 2c56975 to 5aef60e Compare March 19, 2026 23:55
Member

@alexcrichton alexcrichton left a comment

I'm realizing that I'm going to be gone for a while after wasm.io and I don't want to leave this languishing. With the various comments I've left I think this is fine to land and iterate in-tree, and if you'd prefer feel free to defer anything to an issue and/or a follow-up PR.

@cfallin cfallin enabled auto-merge March 23, 2026 21:16
@cfallin cfallin added this pull request to the merge queue Mar 23, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 23, 2026
@cfallin
Member Author

cfallin commented Mar 23, 2026

Merge failed due to crate publish checks seeing that wasmtime-internal-gdbstub-component-artifact doesn't exist on crates.io yet; I'm working through the runbook here at the moment and am blocked on getting someone with the right access to accept the crate ownership.

@cfallin cfallin added this pull request to the merge queue Mar 23, 2026
@cfallin cfallin removed this pull request from the merge queue due to a manual request Mar 23, 2026
@cfallin cfallin enabled auto-merge March 23, 2026 22:57
@cfallin
Member Author

cfallin commented Mar 23, 2026

It seems our publish script checks that crates compile as-published, so the compile_error! in the artifact crate when built in isolation without the component crate is a no-go; for now I've altered build.rs to generate an empty array instead. Since the feature is off by default and our published release artifacts will be built in a way that includes the actual component, the risk of unexpected behavior seems small enough to me for now, but I'm happy to iterate on this if anyone has better ideas!
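
For readers unfamiliar with the pattern, a rough sketch of the "empty array instead of `compile_error!`" fallback might look like the following. It is purely illustrative: the helper `artifact_source`, the static name `COMPONENT`, and the example path are assumptions, and the real build.rs in this PR may be structured quite differently.

```rust
use std::path::Path;

/// Hypothetical helper a build script could call to produce the Rust
/// source for the embedded artifact. When the component was not built
/// (for example, the crate is compiled in isolation), it falls back to
/// an empty byte array so the crate still compiles.
fn artifact_source(component: Option<&Path>) -> String {
    match component {
        Some(path) => format!(
            "pub static COMPONENT: &[u8] = include_bytes!({:?});\n",
            path.display().to_string()
        ),
        None => "pub static COMPONENT: &[u8] = &[];\n".to_string(),
    }
}

fn main() {
    // A build script would write this text into a file under OUT_DIR.
    print!("{}", artifact_source(Some(Path::new("/tmp/gdbstub.wasm"))));
    print!("{}", artifact_source(None));
}
```

The trade-off is the one noted above: a consumer of the isolated crate gets an empty artifact at runtime rather than a build-time error.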

@cfallin cfallin added this pull request to the merge queue Mar 23, 2026
@cfallin
Member Author

cfallin commented Mar 23, 2026

Another merge queue failure: the check here for MSRV 1.91.0 fails because the gdbstub adapter uses wstd 0.6.6 which requires rustc 1.91.1. Do we technically still fit in the "N-2" policy if we require a patch release? If so should we bump to 1.91.1?

(This seems reasonable to me because in general someone stuck on 1.91 should be upgrading patch releases to fix bugs, but I'm curious if anyone has an objection)

cc @pchickey @alexcrichton @fitzgen

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 23, 2026
@cfallin
Member Author

cfallin commented Mar 23, 2026

Ah actually I think we're just out-of-date with our MSRV -- #12828 to bump.

@cfallin cfallin added this pull request to the merge queue Mar 24, 2026
Merged via the queue into bytecodealliance:main with commit dbaaa92 Mar 24, 2026
46 checks passed
@cfallin cfallin deleted the gdbstub-component branch March 24, 2026 18:50