Linux: SA_ONSTACK lost on some Wails 2.12 apps; one-shot g_idle_add mitigation doesn't catch every case #5506

@Adam-D-Lewis

Follow-up to #3965. The v2.12.0 fix prevents the crash for many apps, but not for ours. v2.11.0 has the same behaviour — this isn't a regression, just a case the v2.12 mitigation doesn't cover.

What we saw

App: nebari-dev/nebi at v0.10.5. Linux desktop build via Wails. Crash within ~400 ms of launch, every run:

Overriding existing handler for signal 10. Set JSC_SIGNAL_FOR_GC if you want WebKit to use a different signal
signal 11 received but handler not on signal stack
mp.gsignal stack [...], mp.g0 stack [...]
fatal error: non-Go code set up signal handler without SA_ONSTACK flag

runtime stack:
runtime.throw(...)            runtime/panic.go:1096
runtime.sigNotOnStack(0xb, …) runtime/signal_unix.go:1116
runtime.adjustSignalStack2(…) runtime/signal_unix.go:601
runtime.adjustSignalStack(…)  runtime/signal_unix.go:588
runtime.sigtrampgo(0xb, …)    runtime/signal_unix.go:480
runtime.sigtramp()            runtime/sys_linux_amd64.s:352

Top user-code frame is always a goroutine handling an asset-server request just after /api/v1/version returned 200. The SIGSEGV is not from user Go code — there's no nil deref, no Go-side fault.

Reproducer:

git clone https://github.com/nebari-dev/nebi
cd nebi && git checkout v0.10.5
sudo apt-get install -y libgtk-3-dev libwebkit2gtk-4.1-dev
go install github.com/wailsapp/wails/v2/cmd/wails@v2.12.0
make build-desktop
./build/bin/Nebi

Crashes in ~1 s. Bumping go.mod to wailsapp/wails/v2@v2.12.0 does not change this.

Bisect inside nebi (16 commits between two known-good/-bad tags):

| nebi commit | crashes? |
| --- | --- |
| v0.10.4 | no |
| v0.10.5 (and 1546d97, first bad) | yes |

1546d97 skips a casbin/gorm bootstrap call (rbac.InitEnforcer) when the app runs in local-desktop mode. Two confirming patches at 1546d97:

| patch | crashes? |
| --- | --- |
| revert the skip (always run InitEnforcer) | no |
| keep skip, replace with time.Sleep(2 * time.Second) | yes |

So elapsed wall-clock time isn't the variable: a 2 s sleep in place of InitEnforcer still crashes, so it is something InitEnforcer actually does, not the delay it introduces, that prevents the crash.

What we know fixes it

Dropping this file into the nebi build (Linux only) prevents the crash 100% of the time, with Wails 2.11.0 or 2.12.0:

//go:build linux

package main

/*
#include <signal.h>

// Re-add SA_ONSTACK to the current handler for signal s if it is missing.
static void fix(int s) {
    struct sigaction st;
    if (sigaction(s, NULL, &st) < 0) return;
    if (!(st.sa_flags & SA_ONSTACK)) {
        st.sa_flags |= SA_ONSTACK;
        sigaction(s, &st, NULL);
    }
}

// Cover the synchronous fault signals plus SIGABRT.
static void fix_all(void) {
    fix(SIGSEGV); fix(SIGBUS); fix(SIGFPE); fix(SIGILL); fix(SIGABRT);
}
*/
import "C"
import "time"

// Re-check the handlers every 50 ms for the lifetime of the process.
func init() {
    go func() {
        t := time.NewTicker(50 * time.Millisecond)
        defer t.Stop()
        for range t.C { C.fix_all() }
    }()
}

With this file in place, the same nebi build runs indefinitely.

What we think is happening

Best guess, consistent with the evidence above:

  • Something WebKit/JSC does on the WebView side installs signal handlers (at least for SIGUSR1 — we see the explicit "Overriding existing handler for signal 10" warning every launch, suggesting JSC's GC-signal install runs) without preserving any SA_ONSTACK flag Go's runtime had set.
  • Go's runtime calls sigaction with SA_ONSTACK from runtime.minit every time it spawns a new M. So apps whose Go side does enough thread churn at startup (a DB pool warming up, regexes compiling, etc.) keep re-applying SA_ONSTACK over WebKit's clobber, and Go never sees a stray SIGSEGV on an M with the bad handler.
  • v2.12.0's g_idle_add(install_signal_handlers_idle, NULL) is a one-shot; if WebKit/JSC clobbers handlers after that idle pass (e.g. when the first JS context is actually created, which happens after gtk_init returns), there's nothing to re-fix them.
  • The 50 ms ticker workaround above papers over this by doing what Go's runtime.minit does, periodically, regardless of whether new Ms are being spawned.

This is a guess at mechanism — we have indirect evidence (the bisect, the workaround, the SIGUSR1 warning) but haven't instrumented WebKit's signal-installing path directly. Happy to dig further if useful.

Suggested fixes (in increasing order of involvement)

  1. Recurring re-application. Swap g_idle_add for g_timeout_add on a low-frequency interval (e.g. 200 ms). Smallest change. The cost of a periodic sigaction is negligible.
  2. Hook the WebView's load-changed signal. Connect to load-changed and, when the event is WEBKIT_LOAD_FINISHED (or when notify::is-loading reports false), call install_signal_handlers() from the callback. By that point JSC's handlers should already be installed, so a one-shot re-fix is timed correctly.
  3. Both. load-changed for the typical case, plus a low-frequency timer as belt-and-suspenders for sub-frames / service workers / on-demand JSC re-init.

We'd lean toward option 2 alone for correctness, optionally with option 1 added if there's concern about WebKit re-installing handlers after later events.

Environment

Wails:          v2.12.0 (also v2.11.0)
OS:             Ubuntu 26.04 LTS
libwebkit2gtk:  libwebkit2gtk-4.1-0 2.52.3-0ubuntu0.26.04.2
Go:             1.24
Build tags:     webkit2_41

Same crash signature appears in older issues #1570 and #2134, but those have user-code Go panics. This one's SIGSEGV is WebKit-side; no Go panic precedes it.
