Skip to content

fix(net): surface gvproxy bind errors through FFI#612

Merged
DorianZheng merged 5 commits into
boxlite-ai:mainfrom
G4614:fix/gvproxy-init-error-passthrough
Jun 1, 2026
Merged

fix(net): surface gvproxy bind errors through FFI#612
DorianZheng merged 5 commits into
boxlite-ai:mainfrom
G4614:fix/gvproxy-init-error-passthrough

Conversation

@G4614

@G4614 G4614 commented May 28, 2026

Copy link
Copy Markdown
Contributor

Surface gvproxy virtualnetwork.New bind failures (port-in-use, privileged-port, …) through the Go initErr channel → FFI errOutBoxliteError::Network instead of swallowing them into an opaque box failure

Test plan

Two regression tests, each verified two-sided (fix reverted vs applied):

  • gvproxy_port_conflict_fails_fast_with_named_error — host pre-binds the port, then boxlite run -p collides (EADDRINUSE).
  • gvproxy_privileged_port_fails_fast_with_named_error — non-root boxlite run -p 80:80 (EACCES).
observed pre-fix (main.go/ffi.rs reverted) post-fix
box exit code rc=0 — boots despite the bind failure rc≠0 — fails fast
stderr no gvproxy_create failed, no OS reason gvproxy_create failed: … address already in use / … permission denied
test result both FAIL (all 3 checks) both PASS

Pre-fix the bind error was swallowed in a goroutine logrus line and the box booted with broken networking; post-fix it interrupts startup with the named OS reason.

@G4614 G4614 force-pushed the fix/gvproxy-init-error-passthrough branch from 47aefa2 to 0872494 Compare May 28, 2026 08:29
@G4614 G4614 changed the title fix(net): gvproxy bind-error passthrough + EXPOSE auto-remap fix(net): surface gvproxy bind errors through FFI May 28, 2026
@G4614 G4614 marked this pull request as ready for review May 28, 2026 08:46
@G4614

G4614 commented May 28, 2026

Copy link
Copy Markdown
Contributor Author

Where the error is producedgvproxy-bridge/main.go:440, inside the network goroutine:

go func() {
    vn, err := virtualnetwork.New(tapConfig)   // :440  the bind happens here (EADDRINUSE, EACCES, …)
    if err != nil {
        initErr <- err                          // :443  hand the error out of the goroutine
        return
    }
    initErr <- nil                              // :446  success
    // ...
}()

Where it's capturedmain.go:534, gvproxy_create blocks until that goroutine reports back, and a non-nil result becomes a hard stop:

if err := <-initErr; err != nil {   // :534  wait for the bind result
    setErr(err)                     //        copy the reason into *errOut
    cancel(); delete(instances, id); listener.Close(); os.Remove(socketPath)
    return -1                       // :547  ← turn the failure into -1
}

Pre-fix there was no initErr: the goroutine only logrus.Error'd the bind failure and returned, and gvproxy_create fell straight through to return C.longlong(id) without waiting — so the error never reached the return value and the box booted with broken networking (the test's rc=0).

The Rust side (net/gvproxy/ffi.rs, if id < 0 => return Err(...)) already aborted box startup on -1; it simply never received one until this fix produced it.

os.Remove(socketPath)
}()

// Wait for virtualnetwork.New to complete before returning a valid id.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this hurt performances in normal cases?

@G4614 G4614 May 28, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gvproxy_create_latency_test integrated for normal case (virtual network.New wait < 0.5ms threshold), usually 0.1~0.3ms on Amazon c8i.xlarge ubuntu 22.04

A/B test (latency increase because of this waiting) also in this range

thx for the instruction

G4614 added 4 commits May 28, 2026 10:31
`virtualnetwork.New` failures (e.g. host port EADDRINUSE when another
process holds the port a user passed to `-p HOST:GUEST`) used to die
inside a goroutine at gvproxy-bridge/main.go:417, while the surrounding
`gvproxy_create` had already returned a valid id. The Rust runtime
shipped the broken socket downstream, the guest booted, and the
failure surfaced ~20s later as "DNS lookup … i/o timeout" from inside
the guest — multiple layers from the root cause.

Add an `initErr` channel so `gvproxy_create` waits for the
`virtualnetwork.New` result before returning. On failure it tears
down the instance and returns -1; the Rust runtime maps that to
`Network("gvproxy_create failed")` so the user sees a clear, named,
fail-fast error in well under a second.

Regression guard: `src/cli/tests/gvproxy_port_conflict.rs` —
non-boxlite `TcpListener` holds an ephemeral host port, then
`boxlite run -p PORT:80 alpine:latest true` must exit non-zero with
stderr containing "gvproxy_create failed".
The previous commit surfaced bind failures as `gvproxy_create failed`,
but that string alone doesn't tell the user *why* — they still have to
read box debug files to find "address already in use".

Extend `gvproxy_create` with an `errOut **C.char` out-parameter; on
each -1 return path the Go side writes the underlying `err.Error()` as
a heap-allocated C string. The Rust caller reads, frees, and folds
the detail into `BoxliteError::Network`, so `boxlite run` stderr now
reads:

    Error: network error: gvproxy_create failed:
      cannot add network services: listen tcp 0.0.0.0:27499:
      bind: address already in use

instead of an opaque "gvproxy_create failed".

Regression guard: `gvproxy_port_conflict_fails_fast_with_named_error`
now also asserts stderr contains "address already in use" — locks down
the FFI plumbing against future regressions that would re-collapse
the message.
The 3 assertions guard distinct regression classes — boxlite's exit
code, the named gvproxy error, and the underlying OS-level bind detail.
With `assert!` short-circuiting, a regression that breaks all three
only shows the first failure on the test panic, forcing iterative
debugging. Collect each check into a Vec and emit a single combined
panic at the end so a maintainer sees every level that regressed
in one test run.

Failure mode on plain main (verified): "3 of 3 checks failed:
  - L1 [boxlite rc != 0]: ...
  - L2 [stderr names gvproxy]: ...
  - L3 [stderr carries OS detail]: ..."
Companion to gvproxy_port_conflict_fails_fast_with_named_error,
exercising a different `virtualnetwork.New` bind failure mode: EACCES
when a non-root caller maps a privileged host port (e.g. -p 80:80),
instead of EADDRINUSE when the port is busy.

Same FFI/error plumbing, different kernel error string. Confirms the
fix surfaces whatever the kernel returned at bind time rather than
hard-coding a port-conflict shortcut.

Verified on main worktree: 3 of 3 soft-assertions fire (boxlite rc=0,
stderr contains neither "gvproxy_create failed" nor "permission denied"
— just qcow2 warns and "Auto-stopping non-detached box"). On the fix
branch, all three pass.

Skipped when the runner has permission to bind <1024 (root /
CAP_NET_BIND_SERVICE / lowered ip_unprivileged_port_start) — the test
premise doesn't hold there.
@G4614 G4614 marked this pull request as draft May 28, 2026 10:35
@G4614 G4614 force-pushed the fix/gvproxy-init-error-passthrough branch from 0872494 to 93cdbc3 Compare May 28, 2026 10:36
The fix makes gvproxy_create block on `<-initErr` until virtualnetwork.New
finishes, prompting the review question of normal-case cost. virtualnetwork.New
is the upper bound on that wait; this pins its median (over N runs, warmed,
robust to one-off GC/scheduler spikes) under a 0.5ms budget — observed ~120µs.
Two-side verified: a 600µs injection pushes the median to ~1.2ms and fails.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 marked this pull request as ready for review May 28, 2026 11:06
@DorianZheng DorianZheng merged commit 694b4e2 into boxlite-ai:main Jun 1, 2026
25 checks passed
G4614 pushed a commit to G4614/boxlite that referenced this pull request Jun 3, 2026
…ix wins

`buildAllowNetDNSZones` had three independent bugs that combined to
silently sinkhole allow-listed hostnames inside the guest, returning
0.0.0.0 for the lifetime of the box:

1. **No retry/timeout on host-side resolution.** Each allow-listed
   hostname was resolved exactly once with `context.Background()` and
   no timeout. A momentary host hiccup (VPN flap, slow corp resolver,
   mDNSResponder churn) silently dropped the zone, and the symptom
   `allow_zones=N-1 total_zones=N` was the only signal — a warning
   most callers never noticed. Now: 4 attempts with exponential
   backoff (100/300/900/2700 ms) and a 2 s per-attempt deadline.
   Successful retries log `INFO allowNet: DNS resolution succeeded
   after retry`.

2. **Silent half-baked sinkhole.** When resolution definitively
   failed, the zone was just dropped and box.create proceeded as if
   nothing was wrong — producing a sinkhole that would 0.0.0.0
   exactly the host the caller said they needed. Now: the resolver
   error propagates through `buildAllowNetDNSZones` →
   `buildDNSZones` → `buildTapConfig` → `gvproxy_create`, surfaced
   to the Rust caller via the new `errOut` channel from boxlite-ai#612 so the
   user-visible BoxliteError says
   `allowNet: resolve "iapi.example.com": ...` instead of
   "krun_start_enter returned: -22".

3. **Map-iteration zone order vs first-suffix-wins matcher.**
   gvisor-tap-vsock's DNS handler matches zones first-suffix-wins
   with no most-specific-match preference (services/dns/dns.go:51).
   We were emitting zones in Go map iteration order — random per
   process. With an allow-list of `[github.com, api.github.com,
   iapi.merck.com]`, an `iapi.merck.com.` query would sometimes hit
   the `com.` zone first (because `github.com` lives there), find no
   matching record name, and return that zone's
   `DefaultIP=0.0.0.0`. The same query hit `merck.com.` first on
   other runs and resolved correctly — explaining the "intermittent"
   reports. Now: zones are sorted longest-name-first before the
   slice is built, so the most-specific suffix always wins, and the
   catch-all root zone stays last.

Also adds two pieces of operability that made this debuggable:

- A build-identity log line on the first `gvproxy_create` of a shim
  process: `INFO gvproxy-bridge: build identity build_id=… vcs_revision=…
  vcs_modified=… go_version=…`. Lets a user confirm whether a freshly
  built shim is actually what's loaded — without it, "did the rebuild
  take effect?" is answered by guesswork.
- An `INFO allowNet: host resolved` line per allow-listed hostname,
  with the resolved IP list and v4 count. Visible at INFO so
  zero-IPv4 / IPv6-only hosts no longer hide.

Tests added (in `dns_filter_test.go`):

- `RetriesTransientResolverFailure` — flaky resolver that fails the
  first 2 attempts then succeeds; asserts retry recovers and bakes
  the host's records.
- `FailsClosedAfterRetries` — always-fail resolver; asserts the
  N-attempts-then-error contract (no zone with `Default=0.0.0.0`
  silently shipped).
- `HonorsPerAttemptTimeout` — hanging resolver that blocks until
  ctx cancellation; asserts the per-attempt deadline fires (without
  this, an unresponsive corp DNS server hangs box.create
  indefinitely).
- `PerHostFailureFailsAggregate` — multi-host allow-list where one
  host permafails; asserts the whole sinkhole build aborts and the
  failing host is named in the error.
- `LongestSuffixWinsBeforeRoot` — pins the bug-3 fix: with
  `[github.com, api.github.com, raw.githubusercontent.com,
  codeload.github.com, iapi.merck.com]`, asserts every per-TLD zone
  precedes the `com.` zone in the output slice, so the
  first-match-wins matcher sees `merck.com.` before `com.`.

The retry-loop timing constants are exposed as package-level `var`s
so tests can override them without slowing the suite (the timing
contract is what's pinned, not the specific durations).

Two-side verification: with the production changes reverted but the
new tests kept, the four reproducer tests fail with the expected
signals (`expected retry to recover`, `expected 4 attempts got 1`,
hangs to the 60s suite timeout, and `LongestSuffixWinsBeforeRoot`'s
ordering assertion); restore the changes and all pass.

Docs updated:

- `docs/faq.md` — new Networking entry
  "An allow_net host resolves to 0.0.0.0 inside the box" with
  diagnostic steps (where the gvproxy log lives, what
  `allow_zones=N-1 total_zones=N` looks like vs the new retry/fail
  warnings, why the sinkhole is baked once at create time and
  recreating the box — not restarting it — is what propagates a
  fix).
- `docs/reference/README.md` — bullet under the `network` notes
  describing host-side pre-resolution, retries, and fail-closed
  behavior so callers know `allow_net` entries must be
  resolvable on the host before `box.create`.
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 3, 2026
boxlite-ai#612 surfaced `virtualnetwork.New` bind errors out of the gvproxy goroutine
via an ad-hoc `initErr` channel. Four sibling failure sites still fell into
`logrus.Error + return` and went unobserved by the Rust runtime:

  main.go:454  OverrideTCPHandler           — allow_net / MITM silently disabled
  main.go:475  transport.AcceptVfkit        — VFKit transport silently dead (macOS)
  main.go:484  vn.AcceptVfkit               — VFKit protocol silently dead
  main.go:496  listener.Accept              — Qemu transport silently dead (Linux)
  main.go:510  vn.AcceptQemu                — Qemu protocol silently dead

OverrideTCPHandler in particular is a security-shaped bug: when allow_net is
configured but OverrideTCPHandler fails, the box appears healthy AND traffic
bypasses the filter. The Accept failures are boxlite-ai#612's symptom on a different
code path: the VM "connects" but the protocol pump never runs, so 20s later
the guest reports "DNS lookup … i/o timeout" or every TCP RST.

This change introduces `ErrSink` as the single propagation entry point for
all five sites:

  error_sink.go (new):
    - ErrSink with two channels: initErr (cap 1) + runtimeErr (cap 16)
    - Init(source, err) / WaitInit() pair — replaces boxlite-ai#612's ad-hoc channel
    - Runtime(source, err) / PollRuntime() pair — non-blocking on the
      producer side (queue-full drops with `runtime_err_dropped` warn +
      atomic counter); the Rust runtime polls via the new FFI export

  main.go:
    - GvproxyInstance gains `errSink *ErrSink` (init at instance creation)
    - The init goroutine now:
        sink.Init("virtualnetwork.New", err)    if New fails
        sink.Init("OverrideTCPHandler", err)    if override fails  ← MOVED
        sink.Init("", nil)                      when all init succeeds
      OverrideTCPHandler MOVED into init phase: a failure here used to be
      silent and post-init; now it blocks gvproxy_create from returning,
      so the box.create call fails-fast instead of producing a half-baked
      sinkhole.
    - The four runtime-phase sites now also push to sink.Runtime(...)
      *in addition* to logging — log preserved for backwards-compat with
      existing field grep filters, sink call adds the structured surface.

  gvproxy_poll_runtime_error (new //export):
    - Returns the oldest unread runtime error as a heap-allocated C string
      (`gvproxy_free_string` to release), or nil if queue empty.
    - Intended for a Rust-side background task polling ~250ms; out of scope
      for this PR (Go-side framework only — Rust consumer is follow-up).

Tests (error_sink_test.go, 9 tests):
  - Init contract: nil success, error wrapping with source, WaitInit blocks
    until Init is called
  - Runtime contract: nil no-op, FIFO drain via PollRuntime, queue-full
    drop counts, producer-never-blocks under backpressure, concurrent
    producers safe (observed + dropped = sent)
  - Render contract: RuntimeError.String contains source + cause +
    RFC3339 timestamp (so the operator-visible format is pinned)

Two-sided context: the sink-wiring sites in main.go can't be unit-tested
without invoking the cgo `gvproxy_create` (real Unix-socket path + VM
transport). That integration belongs in the Rust-side consumer PR. The
ErrSink framework itself is fully two-sided in the unit tests above —
each one trivially red'd by reverting the corresponding ErrSink line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 3, 2026
…routines

Completes the "5/5 silent sites have unit-level two-side proof" claim
on top of the previous Layer-3 demo (which only covered listener.Accept).

Extraction (3 new helper files):
  - qemu_accept_loop.go now takes `acceptQemu qemuProtocolFn` (function
    value) instead of `*virtualnetwork.VirtualNetwork`, so tests can mock
    the protocol layer without standing up real virtualnetwork.New.
  - vfkit_accept_loop.go (new) — same shape for macOS: `acceptVfkitTransport
    vfkitTransportFn` + `acceptVfkit vfkitProtocolFn`. Compiles on Linux too;
    only invoked on GOOS=="darwin" in production but tests run everywhere.
  - install_tcp_override.go (new) — wraps the OverrideTCPHandler call
    so the init goroutine in main.go (which still runs inline) can route
    failures through a single point. `installFn func() error` injection
    lets tests force the failure shape.

main.go now reads:
    go runQemuAcceptLoop(ctx, id, listener, vn.AcceptQemu, sink)
    go runVfkitAcceptLoop(ctx, id, conn.(*net.UnixConn),
                          transport.AcceptVfkit, vn.AcceptVfkit, sink)
    err := installTCPOverride(hasTCPFilter, installTCPHandler)
Same behavior end-to-end; the inline goroutines are 30 lines smaller.

Test coverage delta (10 new test functions, +400 lines):

  qemu_accept_loop_test.go (+1):
    - ProtocolErrorSurfacesViaSink: real listener + real dialer (so Accept
      succeeds), mock acceptQemu returns error → sink.PollRuntime gets
      source="vn.AcceptQemu" + errors.Is(re.Err, mockErr).

  vfkit_accept_loop_test.go (new file, 3 tests):
    - TransportErrorSurfacesViaSink: mock acceptVfkitTransport returns
      error → sink gets source="transport.AcceptVfkit".
    - ProtocolErrorSurfacesViaSink: net.Pipe + successful mock transport
      + failing mock acceptVfkit → sink gets source="vn.AcceptVfkit".
    - CancelDoesNotSinkRuntime: planned shutdown via ctx cancel, mock
      transport blocks on ctx.Done then errors → sink stays empty.

  install_tcp_override_test.go (new file, 4 tests):
    - SkipsWhenNoFilterConfigured: hasTCPFilter=false → installFn NOT
      called (the production guard pin).
    - CallsWhenFilterConfigured: hasTCPFilter=true → installFn IS called.
    - PropagatesError: installFn returns mockErr → installTCPOverride
      returns the same wrapped error (the security-shaped guarantee).
    - EmptyAllowNetReturnsNil: newTCPFilterFromConfig with empty/nil
      AllowNet → nil (MITM-only mode).

Two-sided verification (toggle red'd):

  | toggle | red message |
  |---|---|
  | qemu_accept_loop.go: remove `sink.Runtime("vn.AcceptQemu", err)` | `ProtocolErrorSurfacesViaSink`: "expected vn.AcceptQemu runtime error in sink, got nil — pre-fix silent return: protocol pump errored but no signal reached anyone (Rust runtime, operator, ops dashboard)" |
  | vfkit_accept_loop.go: remove `sink.Runtime("transport.AcceptVfkit", err)` | `TransportErrorSurfacesViaSink`: "expected transport.AcceptVfkit runtime error in sink, got nil — pre-fix silent return: gvproxy alive but VFKit transport dead, guest fails 20s later as DNS/network unreachable (same shape as boxlite-ai#612)" |
  | vfkit_accept_loop.go: remove `sink.Runtime("vn.AcceptVfkit", err)` | `ProtocolErrorSurfacesViaSink` (Vfkit): "expected vn.AcceptVfkit runtime error in sink, got nil — pre-fix silent return: VFKit protocol pump died, guest TCP connections RST silently" |
  | install_tcp_override.go: swallow `installFn()` return | `PropagatesError`: "expected error to propagate; got nil — pre-fix silent install: OverrideTCPHandler error swallowed, allow_net/MITM silently disabled, box.create reports SUCCESS" |

All 4 reds restore to green when the toggle is reverted. Combined with the
prior listener.Accept toggle (in the previous commit), this gives 5/5
unit-level proof that each silent site, pre-fix, would have stayed silent.

Test totals after this commit (Go side):
  - 9  error_sink_test.go     (framework contracts)
  - 3  qemu_accept_loop_test.go (listener.Accept + vn.AcceptQemu + cancel control)
  - 3  vfkit_accept_loop_test.go (transport + protocol + cancel control)
  - 4  install_tcp_override_test.go
  --- ---
  19  Go tests (+ 3 Rust FFI integration tests = 22 total)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ldarren added a commit to ldarren/boxlite that referenced this pull request Jun 7, 2026
…ix wins

`buildAllowNetDNSZones` had three independent bugs that combined to
silently sinkhole allow-listed hostnames inside the guest, returning
0.0.0.0 for the lifetime of the box:

1. **No retry/timeout on host-side resolution.** Each allow-listed
   hostname was resolved exactly once with `context.Background()` and
   no timeout. A momentary host hiccup (VPN flap, slow corp resolver,
   mDNSResponder churn) silently dropped the zone, and the symptom
   `allow_zones=N-1 total_zones=N` was the only signal — a warning
   most callers never noticed. Now: 4 attempts with exponential
   backoff (100/300/900/2700 ms) and a 2 s per-attempt deadline.
   Successful retries log `INFO allowNet: DNS resolution succeeded
   after retry`.

2. **Silent half-baked sinkhole.** When resolution definitively
   failed, the zone was just dropped and box.create proceeded as if
   nothing was wrong — producing a sinkhole that would 0.0.0.0
   exactly the host the caller said they needed. Now: the resolver
   error propagates through `buildAllowNetDNSZones` →
   `buildDNSZones` → `buildTapConfig` → `gvproxy_create`, surfaced
   to the Rust caller via the new `errOut` channel from boxlite-ai#612 so the
   user-visible BoxliteError says
   `allowNet: resolve "iapi.example.com": ...` instead of
   "krun_start_enter returned: -22".

3. **Map-iteration zone order vs first-suffix-wins matcher.**
   gvisor-tap-vsock's DNS handler matches zones first-suffix-wins
   with no most-specific-match preference (services/dns/dns.go:51).
   We were emitting zones in Go map iteration order — random per
   process. With an allow-list of `[github.com, api.github.com,
   iapi.merck.com]`, an `iapi.merck.com.` query would sometimes hit
   the `com.` zone first (because `github.com` lives there), find no
   matching record name, and return that zone's
   `DefaultIP=0.0.0.0`. The same query hit `merck.com.` first on
   other runs and resolved correctly — explaining the "intermittent"
   reports. Now: zones are sorted longest-name-first before the
   slice is built, so the most-specific suffix always wins, and the
   catch-all root zone stays last.

Also adds two pieces of operability that made this debuggable:

- A build-identity log line on the first `gvproxy_create` of a shim
  process: `INFO gvproxy-bridge: build identity build_id=… vcs_revision=…
  vcs_modified=… go_version=…`. Lets a user confirm whether a freshly
  built shim is actually what's loaded — without it, "did the rebuild
  take effect?" is answered by guesswork.
- An `INFO allowNet: host resolved` line per allow-listed hostname,
  with the resolved IP list and v4 count. Visible at INFO so
  zero-IPv4 / IPv6-only hosts no longer hide.

Tests added (in `dns_filter_test.go`):

- `RetriesTransientResolverFailure` — flaky resolver that fails the
  first 2 attempts then succeeds; asserts retry recovers and bakes
  the host's records.
- `FailsClosedAfterRetries` — always-fail resolver; asserts the
  N-attempts-then-error contract (no zone with `Default=0.0.0.0`
  silently shipped).
- `HonorsPerAttemptTimeout` — hanging resolver that blocks until
  ctx cancellation; asserts the per-attempt deadline fires (without
  this, an unresponsive corp DNS server hangs box.create
  indefinitely).
- `PerHostFailureFailsAggregate` — multi-host allow-list where one
  host permafails; asserts the whole sinkhole build aborts and the
  failing host is named in the error.
- `LongestSuffixWinsBeforeRoot` — pins the bug-3 fix: with
  `[github.com, api.github.com, raw.githubusercontent.com,
  codeload.github.com, iapi.merck.com]`, asserts every per-TLD zone
  precedes the `com.` zone in the output slice, so the
  first-match-wins matcher sees `merck.com.` before `com.`.

The retry-loop timing constants are exposed as package-level `var`s
so tests can override them without slowing the suite (the timing
contract is what's pinned, not the specific durations).

Two-side verification: with the production changes reverted but the
new tests kept, the four reproducer tests fail with the expected
signals (`expected retry to recover`, `expected 4 attempts got 1`,
hangs to the 60s suite timeout, and `LongestSuffixWinsBeforeRoot`'s
ordering assertion); restore the changes and all pass.

Docs updated:

- `docs/faq.md` — new Networking entry
  "An allow_net host resolves to 0.0.0.0 inside the box" with
  diagnostic steps (where the gvproxy log lives, what
  `allow_zones=N-1 total_zones=N` looks like vs the new retry/fail
  warnings, why the sinkhole is baked once at create time and
  recreating the box — not restarting it — is what propagates a
  fix).
- `docs/reference/README.md` — bullet under the `network` notes
  describing host-side pre-resolution, retries, and fail-closed
  behavior so callers know `allow_net` entries must be
  resolvable on the host before `box.create`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants