Skip to content

Flaky CI: init-wizard-reverse-proxy native-smoke tape times out on macOS (wizard never reaches terminal state within 180s) #1308

@Aaronontheweb

Description

@Aaronontheweb

Summary

The native-smoke tape init-wizard-reverse-proxy flakes on the macOS runner: it intermittently times out waiting for the wizard to reach a terminal state, failing the Native Smoke (macOS) job. The same tape passes on Linux and passes on most macOS runs — it is non-deterministic, not a real regression.

Evidence it is flaky (not a code regression)

Observed on PR #1306 (docker/container-hardening-1287-1303), whose diff is Docker-only (Dockerfile comment + scripts/docker/test-daemon-lifecycle.sh) and touches no wizard / TUI / tape / native-smoke / macOS code:

Commit smoke run macOS Native Smoke
b43604fe (first push) 26867175059 ✅ pass
4f5b99a7 (amend — Docker-only delta) 26868137909 ❌ fail

The delta between those two commits cannot affect a wizard tape, so the difference is non-determinism. The same tape passed on Linux in the failing run, and passes on dev and other PRs (e.g. #1307).

Failure signature

File: …/init-wizard-reverse-proxy.tape
…
failed to execute command: timeout waiting for
  "Screen (Ready  \|  qwen2:0\.5b|TAPE\$ netclaw init)" to match;
  last value was: ╭─Netclaw Setup──────────…────╮   (still on the wizard)
recording failed
FAIL: vhs exited with status 1 for tape init-wizard-reverse-proxy

The timed-out step is the final terminal-state wait in tests/smoke/tapes/init-wizard-reverse-proxy.tape:

# ─── Step 10: Health Check ──────────────────────────────────────────
Wait+Screen@10s /Press Enter to run health checks/
Enter
…
Wait+Screen@180s /(Ready  \|  qwen2:0\.5b|TAPE\$ netclaw init)/

After pressing Enter to run the health checks, the wizard never reached either terminal state within 180 s — neither the chat-ready bar nor the post-exit shell prompt — and the last captured screen was still Netclaw Setup. So the Health Check step itself ran long on this macOS runner.

Root-cause hypothesis

Step 10 (HealthCheckStepViewModel) on a fresh reverse-proxy init: writes config → starts the daemon → polls /api/health/ready until ready (ReloadReadyTimeout = 90s, OverallHealthCheckTimeout = 5min). On a slow/loaded macOS GitHub runner, first daemon startup (JIT warm-up, SQLite provisioning, identity-file writes, provider/model checks) plus the readiness poll can exceed the tape's @180s budget, so vhs times out before the wizard reaches a terminal screen. This is a runner-speed/timeout-margin problem, not a logic bug — the wizard would eventually complete (the 5-minute overall budget is larger than the tape's 180 s wait).

Impact

Suggested directions (for investigation)

  • Raise the tape's terminal-state wait (Wait+Screen@180s …) for this tape, and/or scope a larger budget on the macOS runner specifically, so it comfortably exceeds the worst-case macOS health-check time (bounded by OverallHealthCheckTimeout = 5min).
  • Measure first-daemon-startup latency on the macOS runner (capture timing in netclaw-home-logs) to confirm the hypothesis and pick a margin.
  • Consider an automatic single retry for native-smoke tapes (not scenarios) on macOS, with the retry logged loudly, since vhs/terminal capture is inherently timing-sensitive.
  • The init-wizard (Local) tape shares the same Step-10 health-check wait; check whether it is similarly at-risk on macOS and fix both together.

Notes

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions