Linux: SA_ONSTACK lost on some Wails 2.12 apps; one-shot g_idle_add mitigation doesn't catch every case
Follow-up to #3965. The v2.12.0 fix prevents the crash for many apps, but not for ours. v2.11.0 has the same behaviour — this isn't a regression, just a case the v2.12 mitigation doesn't cover.
What we saw
App: nebari-dev/nebi at v0.10.5. Linux desktop build via Wails. Crash within ~400 ms of launch, every run:
Overriding existing handler for signal 10. Set JSC_SIGNAL_FOR_GC if you want WebKit to use a different signal
signal 11 received but handler not on signal stack
mp.gsignal stack [...], mp.g0 stack [...]
fatal error: non-Go code set up signal handler without SA_ONSTACK flag
runtime stack:
runtime.throw(...) runtime/panic.go:1096
runtime.sigNotOnStack(0xb, …) runtime/signal_unix.go:1116
runtime.adjustSignalStack2(…) runtime/signal_unix.go:601
runtime.adjustSignalStack(…) runtime/signal_unix.go:588
runtime.sigtrampgo(0xb, …) runtime/signal_unix.go:480
runtime.sigtramp() runtime/sys_linux_amd64.s:352
Top user-code frame is always a goroutine handling an asset-server request just after /api/v1/version returned 200. The SIGSEGV is not from user Go code — there's no nil deref, no Go-side fault.
Reproducer:
git clone https://github.com/nebari-dev/nebi
cd nebi && git checkout v0.10.5
sudo apt-get install -y libgtk-3-dev libwebkit2gtk-4.1-dev
go install github.com/wailsapp/wails/v2/cmd/wails@v2.12.0
make build-desktop
./build/bin/Nebi
Crashes in ~1 s. Bumping go.mod to wailsapp/wails/v2@v2.12.0 does not change this.
Bisect inside nebi (16 commits between two known-good/-bad tags):
| nebi commit | crashes? |
| --- | --- |
| v0.10.4 | no |
| v0.10.5 (and 1546d97, first bad) | yes |
1546d97 skips a casbin/gorm bootstrap call (rbac.InitEnforcer) when the app runs in local-desktop mode. Two confirming patches at 1546d97:
| patch | crashes? |
| --- | --- |
| revert the skip (always run InitEnforcer) | no |
| keep skip, replace with time.Sleep(2 * time.Second) | yes |
So wall-clock time isn't the variable: something about InitEnforcer actually executing, rather than the delay it adds to startup, prevents the crash.
What we know fixes it
Dropping this file into the nebi build (Linux only) prevents the crash 100% of the time, with Wails 2.11.0 or 2.12.0:
//go:build linux

package main

/*
#include <signal.h>

// Re-add SA_ONSTACK to the current handler for signal s if it is missing.
static void fix(int s) {
    struct sigaction st;
    if (sigaction(s, NULL, &st) < 0) return;
    if (!(st.sa_flags & SA_ONSTACK)) {
        st.sa_flags |= SA_ONSTACK;
        sigaction(s, &st, NULL);
    }
}

static void fix_all(void) {
    fix(SIGSEGV); fix(SIGBUS); fix(SIGFPE); fix(SIGILL); fix(SIGABRT);
}
*/
import "C"

import "time"

func init() {
	// Re-check the handlers every 50 ms for the lifetime of the process.
	go func() {
		t := time.NewTicker(50 * time.Millisecond)
		defer t.Stop()
		for range t.C {
			C.fix_all()
		}
	}()
}
With this file in place, the same nebi build runs indefinitely.
What we think is happening
Best guess, consistent with the evidence above:
- Something WebKit/JSC does on the WebView side installs signal handlers (at least for SIGUSR1; we see the explicit "Overriding existing handler for signal 10" warning every launch, suggesting JSC's GC-signal install runs) without preserving any SA_ONSTACK flag Go's runtime had set.
- Go's runtime calls sigaction with SA_ONSTACK from runtime.minit every time it spawns a new M. So apps whose Go side does enough thread churn at startup (a DB pool warming up, regexes compiling, etc.) keep re-applying SA_ONSTACK over WebKit's clobber, and Go never sees a stray SIGSEGV on an M with the bad handler.
- v2.12.0's g_idle_add(install_signal_handlers_idle, NULL) is a one-shot; if WebKit/JSC clobbers handlers after that idle pass (e.g. when the first JS context is actually created, which happens after gtk_init returns), there's nothing to re-fix them.
- The 50 ms ticker workaround above papers over this by doing what Go's runtime.minit does, periodically, regardless of whether new Ms are being spawned.
This is a guess at mechanism — we have indirect evidence (the bisect, the workaround, the SIGUSR1 warning) but haven't instrumented WebKit's signal-installing path directly. Happy to dig further if useful.
Suggested fixes (in increasing order of involvement)
- Recurring re-application. Swap g_idle_add for g_timeout_add on a low-frequency interval (e.g. 200 ms). Smallest change. The cost of a periodic sigaction is negligible.
- Hook the WebView's load-finished event. Connect to load-changed and act on WEBKIT_LOAD_FINISHED (or watch notify::is-loading reaching false), then call install_signal_handlers() from the callback. By that point JSC's handlers are installed, so a one-shot re-fix is timed correctly.
- Both. load-changed for the typical case, plus a low-frequency timer as belt-and-suspenders for sub-frames / service workers / on-demand JSC re-init.
We'd lean toward the second option (the load-finished hook) alone for correctness, optionally with the first (the recurring timer) added if there's concern about WebKit re-installing handlers after later events.
Environment
Wails: v2.12.0 (also v2.11.0)
OS: Ubuntu 26.04 LTS
libwebkit2gtk: libwebkit2gtk-4.1-0 2.52.3-0ubuntu0.26.04.2
Go: 1.24
Build tags: webkit2_41
Same crash signature appears in older issues #1570 and #2134, but those have user-code Go panics. This one's SIGSEGV is WebKit-side; no Go panic precedes it.