fix(net): surface gvproxy bind errors through FFI#612
Conversation
47aefa2 to
0872494
Compare
|
Where the error is produced — go func() {
vn, err := virtualnetwork.New(tapConfig) // :440 the bind happens here (EADDRINUSE, EACCES, …)
if err != nil {
initErr <- err // :443 hand the error out of the goroutine
return
}
initErr <- nil // :446 success
// ...
}()Where it's captured — if err := <-initErr; err != nil { // :534 wait for the bind result
setErr(err) // copy the reason into *errOut
cancel(); delete(instances, id); listener.Close(); os.Remove(socketPath)
return -1 // :547 ← turn the failure into -1
}Pre-fix there was no The Rust side ( |
| os.Remove(socketPath) | ||
| }() | ||
|
|
||
| // Wait for virtualnetwork.New to complete before returning a valid id. |
There was a problem hiding this comment.
will this hurt performances in normal cases?
There was a problem hiding this comment.
gvproxy_create_latency_test integrated for normal case (virtual network.New wait < 0.5ms threshold), usually 0.1~0.3ms on Amazon c8i.xlarge ubuntu 22.04
A/B test (latency increase because of this waiting) also in this range
thx for the instruction
`virtualnetwork.New` failures (e.g. host port EADDRINUSE when another
process holds the port a user passed to `-p HOST:GUEST`) used to die
inside a goroutine at gvproxy-bridge/main.go:417, while the surrounding
`gvproxy_create` had already returned a valid id. The Rust runtime
shipped the broken socket downstream, the guest booted, and the
failure surfaced ~20s later as "DNS lookup … i/o timeout" from inside
the guest — multiple layers from the root cause.
Add an `initErr` channel so `gvproxy_create` waits for the
`virtualnetwork.New` result before returning. On failure it tears
down the instance and returns -1; the Rust runtime maps that to
`Network("gvproxy_create failed")` so the user sees a clear, named,
fail-fast error in well under a second.
Regression guard: `src/cli/tests/gvproxy_port_conflict.rs` —
non-boxlite `TcpListener` holds an ephemeral host port, then
`boxlite run -p PORT:80 alpine:latest true` must exit non-zero with
stderr containing "gvproxy_create failed".
The previous commit surfaced bind failures as `gvproxy_create failed`,
but that string alone doesn't tell the user *why* — they still have to
read box debug files to find "address already in use".
Extend `gvproxy_create` with an `errOut **C.char` out-parameter; on
each -1 return path the Go side writes the underlying `err.Error()` as
a heap-allocated C string. The Rust caller reads, frees, and folds
the detail into `BoxliteError::Network`, so `boxlite run` stderr now
reads:
Error: network error: gvproxy_create failed:
cannot add network services: listen tcp 0.0.0.0:27499:
bind: address already in use
instead of an opaque "gvproxy_create failed".
Regression guard: `gvproxy_port_conflict_fails_fast_with_named_error`
now also asserts stderr contains "address already in use" — locks down
the FFI plumbing against future regressions that would re-collapse
the message.
The 3 assertions guard distinct regression classes — boxlite's exit code, the named gvproxy error, and the underlying OS-level bind detail. With `assert!` short-circuiting, a regression that breaks all three only shows the first failure on the test panic, forcing iterative debugging. Collect each check into a Vec and emit a single combined panic at the end so a maintainer sees every level that regressed in one test run. Failure mode on plain main (verified): "3 of 3 checks failed: - L1 [boxlite rc != 0]: ... - L2 [stderr names gvproxy]: ... - L3 [stderr carries OS detail]: ..."
Companion to gvproxy_port_conflict_fails_fast_with_named_error, exercising a different `virtualnetwork.New` bind failure mode: EACCES when a non-root caller maps a privileged host port (e.g. -p 80:80), instead of EADDRINUSE when the port is busy. Same FFI/error plumbing, different kernel error string. Confirms the fix surfaces whatever the kernel returned at bind time rather than hard-coding a port-conflict shortcut. Verified on main worktree: 3 of 3 soft-assertions fire (boxlite rc=0, stderr contains neither "gvproxy_create failed" nor "permission denied" — just qcow2 warns and "Auto-stopping non-detached box"). On the fix branch, all three pass. Skipped when the runner has permission to bind <1024 (root / CAP_NET_BIND_SERVICE / lowered ip_unprivileged_port_start) — the test premise doesn't hold there.
0872494 to
93cdbc3
Compare
The fix makes gvproxy_create block on `<-initErr` until virtualnetwork.New finishes, prompting the review question of normal-case cost. virtualnetwork.New is the upper bound on that wait; this pins its median (over N runs, warmed, robust to one-off GC/scheduler spikes) under a 0.5ms budget — observed ~120µs. Two-side verified: a 600µs injection pushes the median to ~1.2ms and fails. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ix wins `buildAllowNetDNSZones` had three independent bugs that combined to silently sinkhole allow-listed hostnames inside the guest, returning 0.0.0.0 for the lifetime of the box: 1. **No retry/timeout on host-side resolution.** Each allow-listed hostname was resolved exactly once with `context.Background()` and no timeout. A momentary host hiccup (VPN flap, slow corp resolver, mDNSResponder churn) silently dropped the zone, and the symptom `allow_zones=N-1 total_zones=N` was the only signal — a warning most callers never noticed. Now: 4 attempts with exponential backoff (100/300/900/2700 ms) and a 2 s per-attempt deadline. Successful retries log `INFO allowNet: DNS resolution succeeded after retry`. 2. **Silent half-baked sinkhole.** When resolution definitively failed, the zone was just dropped and box.create proceeded as if nothing was wrong — producing a sinkhole that would 0.0.0.0 exactly the host the caller said they needed. Now: the resolver error propagates through `buildAllowNetDNSZones` → `buildDNSZones` → `buildTapConfig` → `gvproxy_create`, surfaced to the Rust caller via the new `errOut` channel from boxlite-ai#612 so the user-visible BoxliteError says `allowNet: resolve "iapi.example.com": ...` instead of "krun_start_enter returned: -22". 3. **Map-iteration zone order vs first-suffix-wins matcher.** gvisor-tap-vsock's DNS handler matches zones first-suffix-wins with no most-specific-match preference (services/dns/dns.go:51). We were emitting zones in Go map iteration order — random per process. With an allow-list of `[github.com, api.github.com, iapi.merck.com]`, an `iapi.merck.com.` query would sometimes hit the `com.` zone first (because `github.com` lives there), find no matching record name, and return that zone's `DefaultIP=0.0.0.0`. The same query hit `merck.com.` first on other runs and resolved correctly — explaining the "intermittent" reports. Now: zones are sorted longest-name-first before the slice is built, so the most-specific suffix always wins, and the catch-all root zone stays last. Also adds two pieces of operability that made this debuggable: - A build-identity log line on the first `gvproxy_create` of a shim process: `INFO gvproxy-bridge: build identity build_id=… vcs_revision=… vcs_modified=… go_version=…`. Lets a user confirm whether a freshly built shim is actually what's loaded — without it, "did the rebuild take effect?" is answered by guesswork. - An `INFO allowNet: host resolved` line per allow-listed hostname, with the resolved IP list and v4 count. Visible at INFO so zero-IPv4 / IPv6-only hosts no longer hide. Tests added (in `dns_filter_test.go`): - `RetriesTransientResolverFailure` — flaky resolver that fails the first 2 attempts then succeeds; asserts retry recovers and bakes the host's records. - `FailsClosedAfterRetries` — always-fail resolver; asserts the N-attempts-then-error contract (no zone with `Default=0.0.0.0` silently shipped). - `HonorsPerAttemptTimeout` — hanging resolver that blocks until ctx cancellation; asserts the per-attempt deadline fires (without this, an unresponsive corp DNS server hangs box.create indefinitely). - `PerHostFailureFailsAggregate` — multi-host allow-list where one host permafails; asserts the whole sinkhole build aborts and the failing host is named in the error. - `LongestSuffixWinsBeforeRoot` — pins the bug-3 fix: with `[github.com, api.github.com, raw.githubusercontent.com, codeload.github.com, iapi.merck.com]`, asserts every per-TLD zone precedes the `com.` zone in the output slice, so the first-match-wins matcher sees `merck.com.` before `com.`. The retry-loop timing constants are exposed as package-level `var`s so tests can override them without slowing the suite (the timing contract is what's pinned, not the specific durations). Two-side verification: with the production changes reverted but the new tests kept, the four reproducer tests fail with the expected signals (`expected retry to recover`, `expected 4 attempts got 1`, hangs to the 60s suite timeout, and `LongestSuffixWinsBeforeRoot`'s ordering assertion); restore the changes and all pass. Docs updated: - `docs/faq.md` — new Networking entry "An allow_net host resolves to 0.0.0.0 inside the box" with diagnostic steps (where the gvproxy log lives, what `allow_zones=N-1 total_zones=N` looks like vs the new retry/fail warnings, why the sinkhole is baked once at create time and recreating the box — not restarting it — is what propagates a fix). - `docs/reference/README.md` — bullet under the `network` notes describing host-side pre-resolution, retries, and fail-closed behavior so callers know `allow_net` entries must be resolvable on the host before `box.create`.
boxlite-ai#612 surfaced `virtualnetwork.New` bind errors out of the gvproxy goroutine via an ad-hoc `initErr` channel. Four sibling failure sites still fell into `logrus.Error + return` and went unobserved by the Rust runtime: main.go:454 OverrideTCPHandler — allow_net / MITM silently disabled main.go:475 transport.AcceptVfkit — VFKit transport silently dead (macOS) main.go:484 vn.AcceptVfkit — VFKit protocol silently dead main.go:496 listener.Accept — Qemu transport silently dead (Linux) main.go:510 vn.AcceptQemu — Qemu protocol silently dead OverrideTCPHandler in particular is a security-shaped bug: when allow_net is configured but OverrideTCPHandler fails, the box appears healthy AND traffic bypasses the filter. The Accept failures are boxlite-ai#612's symptom on a different code path: the VM "connects" but the protocol pump never runs, so 20s later the guest reports "DNS lookup … i/o timeout" or every TCP RST. This change introduces `ErrSink` as the single propagation entry point for all five sites: error_sink.go (new): - ErrSink with two channels: initErr (cap 1) + runtimeErr (cap 16) - Init(source, err) / WaitInit() pair — replaces boxlite-ai#612's ad-hoc channel - Runtime(source, err) / PollRuntime() pair — non-blocking on the producer side (queue-full drops with `runtime_err_dropped` warn + atomic counter); the Rust runtime polls via the new FFI export main.go: - GvproxyInstance gains `errSink *ErrSink` (init at instance creation) - The init goroutine now: sink.Init("virtualnetwork.New", err) if New fails sink.Init("OverrideTCPHandler", err) if override fails ← MOVED sink.Init("", nil) when all init succeeds OverrideTCPHandler MOVED into init phase: a failure here used to be silent and post-init; now it blocks gvproxy_create from returning, so the box.create call fails-fast instead of producing a half-baked sinkhole. - The four runtime-phase sites now also push to sink.Runtime(...) *in addition* to logging — log preserved for backwards-compat with existing field grep filters, sink call adds the structured surface. gvproxy_poll_runtime_error (new //export): - Returns the oldest unread runtime error as a heap-allocated C string (`gvproxy_free_string` to release), or nil if queue empty. - Intended for a Rust-side background task polling ~250ms; out of scope for this PR (Go-side framework only — Rust consumer is follow-up). Tests (error_sink_test.go, 9 tests): - Init contract: nil success, error wrapping with source, WaitInit blocks until Init is called - Runtime contract: nil no-op, FIFO drain via PollRuntime, queue-full drop counts, producer-never-blocks under backpressure, concurrent producers safe (observed + dropped = sent) - Render contract: RuntimeError.String contains source + cause + RFC3339 timestamp (so the operator-visible format is pinned) Two-sided context: the sink-wiring sites in main.go can't be unit-tested without invoking the cgo `gvproxy_create` (real Unix-socket path + VM transport). That integration belongs in the Rust-side consumer PR. The ErrSink framework itself is fully two-sided in the unit tests above — each one trivially red'd by reverting the corresponding ErrSink line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…routines
Completes the "5/5 silent sites have unit-level two-side proof" claim
on top of the previous Layer-3 demo (which only covered listener.Accept).
Extraction (3 new helper files):
- qemu_accept_loop.go now takes `acceptQemu qemuProtocolFn` (function
value) instead of `*virtualnetwork.VirtualNetwork`, so tests can mock
the protocol layer without standing up real virtualnetwork.New.
- vfkit_accept_loop.go (new) — same shape for macOS: `acceptVfkitTransport
vfkitTransportFn` + `acceptVfkit vfkitProtocolFn`. Compiles on Linux too;
only invoked on GOOS=="darwin" in production but tests run everywhere.
- install_tcp_override.go (new) — wraps the OverrideTCPHandler call
so the init goroutine in main.go (which still runs inline) can route
failures through a single point. `installFn func() error` injection
lets tests force the failure shape.
main.go now reads:
go runQemuAcceptLoop(ctx, id, listener, vn.AcceptQemu, sink)
go runVfkitAcceptLoop(ctx, id, conn.(*net.UnixConn),
transport.AcceptVfkit, vn.AcceptVfkit, sink)
err := installTCPOverride(hasTCPFilter, installTCPHandler)
Same behavior end-to-end; the inline goroutines are 30 lines smaller.
Test coverage delta (10 new test functions, +400 lines):
qemu_accept_loop_test.go (+1):
- ProtocolErrorSurfacesViaSink: real listener + real dialer (so Accept
succeeds), mock acceptQemu returns error → sink.PollRuntime gets
source="vn.AcceptQemu" + errors.Is(re.Err, mockErr).
vfkit_accept_loop_test.go (new file, 3 tests):
- TransportErrorSurfacesViaSink: mock acceptVfkitTransport returns
error → sink gets source="transport.AcceptVfkit".
- ProtocolErrorSurfacesViaSink: net.Pipe + successful mock transport
+ failing mock acceptVfkit → sink gets source="vn.AcceptVfkit".
- CancelDoesNotSinkRuntime: planned shutdown via ctx cancel, mock
transport blocks on ctx.Done then errors → sink stays empty.
install_tcp_override_test.go (new file, 4 tests):
- SkipsWhenNoFilterConfigured: hasTCPFilter=false → installFn NOT
called (the production guard pin).
- CallsWhenFilterConfigured: hasTCPFilter=true → installFn IS called.
- PropagatesError: installFn returns mockErr → installTCPOverride
returns the same wrapped error (the security-shaped guarantee).
- EmptyAllowNetReturnsNil: newTCPFilterFromConfig with empty/nil
AllowNet → nil (MITM-only mode).
Two-sided verification (toggle red'd):
| toggle | red message |
|---|---|
| qemu_accept_loop.go: remove `sink.Runtime("vn.AcceptQemu", err)` | `ProtocolErrorSurfacesViaSink`: "expected vn.AcceptQemu runtime error in sink, got nil — pre-fix silent return: protocol pump errored but no signal reached anyone (Rust runtime, operator, ops dashboard)" |
| vfkit_accept_loop.go: remove `sink.Runtime("transport.AcceptVfkit", err)` | `TransportErrorSurfacesViaSink`: "expected transport.AcceptVfkit runtime error in sink, got nil — pre-fix silent return: gvproxy alive but VFKit transport dead, guest fails 20s later as DNS/network unreachable (same shape as boxlite-ai#612)" |
| vfkit_accept_loop.go: remove `sink.Runtime("vn.AcceptVfkit", err)` | `ProtocolErrorSurfacesViaSink` (Vfkit): "expected vn.AcceptVfkit runtime error in sink, got nil — pre-fix silent return: VFKit protocol pump died, guest TCP connections RST silently" |
| install_tcp_override.go: swallow `installFn()` return | `PropagatesError`: "expected error to propagate; got nil — pre-fix silent install: OverrideTCPHandler error swallowed, allow_net/MITM silently disabled, box.create reports SUCCESS" |
All 4 reds restore to green when the toggle is reverted. Combined with the
prior listener.Accept toggle (in the previous commit), this gives 5/5
unit-level proof that each silent site, pre-fix, would have stayed silent.
Test totals after this commit (Go side):
- 9 error_sink_test.go (framework contracts)
- 3 qemu_accept_loop_test.go (listener.Accept + vn.AcceptQemu + cancel control)
- 3 vfkit_accept_loop_test.go (transport + protocol + cancel control)
- 4 install_tcp_override_test.go
--- ---
19 Go tests (+ 3 Rust FFI integration tests = 22 total)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ix wins `buildAllowNetDNSZones` had three independent bugs that combined to silently sinkhole allow-listed hostnames inside the guest, returning 0.0.0.0 for the lifetime of the box: 1. **No retry/timeout on host-side resolution.** Each allow-listed hostname was resolved exactly once with `context.Background()` and no timeout. A momentary host hiccup (VPN flap, slow corp resolver, mDNSResponder churn) silently dropped the zone, and the symptom `allow_zones=N-1 total_zones=N` was the only signal — a warning most callers never noticed. Now: 4 attempts with exponential backoff (100/300/900/2700 ms) and a 2 s per-attempt deadline. Successful retries log `INFO allowNet: DNS resolution succeeded after retry`. 2. **Silent half-baked sinkhole.** When resolution definitively failed, the zone was just dropped and box.create proceeded as if nothing was wrong — producing a sinkhole that would 0.0.0.0 exactly the host the caller said they needed. Now: the resolver error propagates through `buildAllowNetDNSZones` → `buildDNSZones` → `buildTapConfig` → `gvproxy_create`, surfaced to the Rust caller via the new `errOut` channel from boxlite-ai#612 so the user-visible BoxliteError says `allowNet: resolve "iapi.example.com": ...` instead of "krun_start_enter returned: -22". 3. **Map-iteration zone order vs first-suffix-wins matcher.** gvisor-tap-vsock's DNS handler matches zones first-suffix-wins with no most-specific-match preference (services/dns/dns.go:51). We were emitting zones in Go map iteration order — random per process. With an allow-list of `[github.com, api.github.com, iapi.merck.com]`, an `iapi.merck.com.` query would sometimes hit the `com.` zone first (because `github.com` lives there), find no matching record name, and return that zone's `DefaultIP=0.0.0.0`. The same query hit `merck.com.` first on other runs and resolved correctly — explaining the "intermittent" reports. Now: zones are sorted longest-name-first before the slice is built, so the most-specific suffix always wins, and the catch-all root zone stays last. Also adds two pieces of operability that made this debuggable: - A build-identity log line on the first `gvproxy_create` of a shim process: `INFO gvproxy-bridge: build identity build_id=… vcs_revision=… vcs_modified=… go_version=…`. Lets a user confirm whether a freshly built shim is actually what's loaded — without it, "did the rebuild take effect?" is answered by guesswork. - An `INFO allowNet: host resolved` line per allow-listed hostname, with the resolved IP list and v4 count. Visible at INFO so zero-IPv4 / IPv6-only hosts no longer hide. Tests added (in `dns_filter_test.go`): - `RetriesTransientResolverFailure` — flaky resolver that fails the first 2 attempts then succeeds; asserts retry recovers and bakes the host's records. - `FailsClosedAfterRetries` — always-fail resolver; asserts the N-attempts-then-error contract (no zone with `Default=0.0.0.0` silently shipped). - `HonorsPerAttemptTimeout` — hanging resolver that blocks until ctx cancellation; asserts the per-attempt deadline fires (without this, an unresponsive corp DNS server hangs box.create indefinitely). - `PerHostFailureFailsAggregate` — multi-host allow-list where one host permafails; asserts the whole sinkhole build aborts and the failing host is named in the error. - `LongestSuffixWinsBeforeRoot` — pins the bug-3 fix: with `[github.com, api.github.com, raw.githubusercontent.com, codeload.github.com, iapi.merck.com]`, asserts every per-TLD zone precedes the `com.` zone in the output slice, so the first-match-wins matcher sees `merck.com.` before `com.`. The retry-loop timing constants are exposed as package-level `var`s so tests can override them without slowing the suite (the timing contract is what's pinned, not the specific durations). Two-side verification: with the production changes reverted but the new tests kept, the four reproducer tests fail with the expected signals (`expected retry to recover`, `expected 4 attempts got 1`, hangs to the 60s suite timeout, and `LongestSuffixWinsBeforeRoot`'s ordering assertion); restore the changes and all pass. Docs updated: - `docs/faq.md` — new Networking entry "An allow_net host resolves to 0.0.0.0 inside the box" with diagnostic steps (where the gvproxy log lives, what `allow_zones=N-1 total_zones=N` looks like vs the new retry/fail warnings, why the sinkhole is baked once at create time and recreating the box — not restarting it — is what propagates a fix). - `docs/reference/README.md` — bullet under the `network` notes describing host-side pre-resolution, retries, and fail-closed behavior so callers know `allow_net` entries must be resolvable on the host before `box.create`.
Surface gvproxy
virtualnetwork.Newbind failures (port-in-use, privileged-port, …) through the GoinitErrchannel → FFIerrOut→BoxliteError::Networkinstead of swallowing them into an opaque box failureTest plan
Two regression tests, each verified two-sided (fix reverted vs applied):
gvproxy_port_conflict_fails_fast_with_named_error— host pre-binds the port, thenboxlite run -pcollides (EADDRINUSE).gvproxy_privileged_port_fails_fast_with_named_error— non-rootboxlite run -p 80:80(EACCES).rc=0— boots despite the bind failurerc≠0— fails fastgvproxy_create failed, no OS reasongvproxy_create failed: … address already in use/… permission deniedPre-fix the bind error was swallowed in a goroutine
logrusline and the box booted with broken networking; post-fix it interrupts startup with the named OS reason.