Summary
The native-smoke tape init-wizard-reverse-proxy flakes on the macOS runner: it intermittently times out waiting for the wizard to reach a terminal state, failing the Native Smoke (macOS) job. The same tape passes on Linux and passes on most macOS runs — it is non-deterministic, not a real regression.
Evidence it is flaky (not a code regression)
Observed on PR #1306 (docker/container-hardening-1287-1303), whose diff is Docker-only (Dockerfile comment + scripts/docker/test-daemon-lifecycle.sh) and touches no wizard / TUI / tape / native-smoke / macOS code:
| Commit | smoke run | macOS Native Smoke |
| --- | --- | --- |
| b43604fe (first push) | 26867175059 | ✅ pass |
| 4f5b99a7 (amend; Docker-only delta) | 26868137909 | ❌ fail |
The delta between those two commits cannot affect a wizard tape, so the difference is non-determinism. The same tape passed on Linux in the failing run, and passes on dev and other PRs (e.g. #1307).
Failure signature
File: …/init-wizard-reverse-proxy.tape
…
failed to execute command: timeout waiting for
"Screen (Ready \| qwen2:0\.5b|TAPE\$ netclaw init)" to match;
last value was: ╭─Netclaw Setup──────────…────╮ (still on the wizard)
recording failed
FAIL: vhs exited with status 1 for tape init-wizard-reverse-proxy
The timed-out step is the final terminal-state wait in tests/smoke/tapes/init-wizard-reverse-proxy.tape:
# ─── Step 10: Health Check ──────────────────────────────────────────
Wait+Screen@10s /Press Enter to run health checks/
Enter
…
Wait+Screen@180s /(Ready \| qwen2:0\.5b|TAPE\$ netclaw init)/
After pressing Enter to run the health checks, the wizard never reached either terminal state within 180 s — neither the chat-ready bar nor the post-exit shell prompt — and the last captured screen was still Netclaw Setup. So the Health Check step itself ran long on this macOS runner.
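How long the step actually takes on a given runner can be measured rather than inferred. The following is a minimal sketch, not code from the repository: `time_until_ready` is a hypothetical helper, and `probe` stands in for whatever readiness check is available (for example an HTTP GET against the daemon's `/api/health/ready` endpoint, whose host and port are not stated in this issue). The clock and sleep functions are injectable so the helper can be exercised without waiting.

```python
import time
from typing import Callable, Optional


def time_until_ready(
    probe: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll probe() until it returns True.

    Returns the elapsed seconds at the first successful probe, or None if
    timeout_s elapses first. `probe` is an assumption: any zero-argument
    callable that reports readiness.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

Logging the returned value on both Linux and macOS runners would show how close typical and worst-case startups come to the tape's 180 s wait.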
Root-cause hypothesis
On a fresh reverse-proxy init, Step 10 (HealthCheckStepViewModel) writes the config, starts the daemon, then polls /api/health/ready until the daemon reports ready (ReloadReadyTimeout = 90s, OverallHealthCheckTimeout = 5min). On a slow or loaded macOS GitHub runner, first daemon startup (JIT warm-up, SQLite provisioning, identity-file writes, provider/model checks) plus the readiness poll can exceed the tape's @180s budget, so vhs times out before the wizard reaches a terminal screen. This is a runner-speed/timeout-margin problem, not a logic bug: the wizard would eventually complete (its 5-minute overall budget is larger than the tape's 180 s wait).
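The margin argument can be made concrete with a little arithmetic. This sketch is illustrative only: the constants mirror the values quoted in this issue, while `wait_margin_s`, `suggested_wait_s`, and the 60 s headroom are hypothetical and not taken from the codebase or from any measurement.

```python
# Values quoted in this issue, in seconds.
TAPE_WAIT_S = 180                     # Wait+Screen@180s in the tape
RELOAD_READY_TIMEOUT_S = 90           # ReloadReadyTimeout
OVERALL_HEALTH_CHECK_TIMEOUT_S = 300  # OverallHealthCheckTimeout (5 min)


def wait_margin_s(tape_wait_s: int, wizard_budget_s: int) -> int:
    """Seconds the tape keeps waiting after the wizard's own budget expires.

    A negative result means vhs can give up while the wizard is still
    legitimately inside its health-check budget.
    """
    return tape_wait_s - wizard_budget_s


def suggested_wait_s(wizard_budget_s: int, headroom_s: int = 60) -> int:
    """Hypothetical rule of thumb: wizard budget plus a fixed headroom."""
    return wizard_budget_s + headroom_s


if __name__ == "__main__":
    print(wait_margin_s(TAPE_WAIT_S, OVERALL_HEALTH_CHECK_TIMEOUT_S))  # -120
    print(suggested_wait_s(OVERALL_HEALTH_CHECK_TIMEOUT_S))            # 360
```

With the quoted values the margin is negative by 120 s. That is the window in which the tape can fail even though the wizard has not yet exhausted its own budget.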
Impact
- Native Smoke (macOS) failures that block/interrupt PRs unrelated to the wizard (this one was a Docker-only PR).
Suggested directions (for investigation)
- Raise the tape's terminal-state wait (Wait+Screen@180s …) for this tape, and/or scope a larger budget on the macOS runner specifically, so it comfortably exceeds the worst-case macOS health-check time (bounded by OverallHealthCheckTimeout = 5min).
- Measure first-daemon-startup latency on the macOS runner (capture timing in netclaw-home-logs) to confirm the hypothesis and pick a margin.
- Consider an automatic single retry for native-smoke tapes (not scenarios) on macOS, with the retry logged loudly, since vhs/terminal capture is inherently timing-sensitive.
- The init-wizard (Local) tape shares the same Step-10 health-check wait; check whether it is similarly at risk on macOS and fix both together.
Notes
- … (fix/wizard-readiness-1302-1304) reworks this readiness path (monotonic restart generation + per-probe endpoint re-resolution) but does not address macOS startup latency, so it neither causes nor fixes this flake.
- … tests/smoke/tapes/README.md. The relevant timeouts live in src/Netclaw.Cli/Tui/Wizard/Steps/HealthCheckStepViewModel.cs:17,22.
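The single-retry direction suggested above could look roughly like the sketch below. It is an assumption-laden illustration, not the project's harness: `run_vhs` and `run_with_single_retry` are hypothetical names, the harness is assumed to invoke `vhs <tape>` directly, and the `::warning::` / `::error::` prefixes assume the job runs under GitHub Actions so that the retry shows up as an annotation.

```python
import subprocess
from typing import Callable


def run_vhs(tape_path: str) -> int:
    """Run one tape with vhs and return its exit status."""
    return subprocess.run(["vhs", tape_path]).returncode


def run_with_single_retry(
    tape_path: str,
    run: Callable[[str], int] = run_vhs,
    log: Callable[[str], None] = print,
) -> int:
    """Run a tape; on failure, retry exactly once and log the retry loudly.

    Returns the exit status of the last attempt, so a second failure still
    fails the job.
    """
    status = run(tape_path)
    if status == 0:
        return 0
    log(f"::warning::FLAKY RETRY: {tape_path} exited with status {status}; retrying once")
    status = run(tape_path)
    if status != 0:
        log(f"::error::{tape_path} failed again after retry (status {status})")
    return status
```

A call such as `run_with_single_retry("tests/smoke/tapes/init-wizard-reverse-proxy.tape")` would then mask a one-off timeout while leaving a visible warning. Because a retry can also hide a real regression that fails only intermittently, counting the warnings over time would be a sensible companion to it.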