Flaky CI: init-wizard-reverse-proxy native-smoke tape times out on macOS (wizard never reaches terminal state within 180s) #1308

## Summary

The native-smoke tape **`init-wizard-reverse-proxy`** flakes on the **macOS** runner: it intermittently times out waiting for the wizard to reach a terminal state, failing the `Native Smoke (macOS)` job. The same tape passes on Linux and passes on most macOS runs — it is non-deterministic, not a real regression.

## Evidence it is flaky (not a code regression)

Observed on PR #1306 (`docker/container-hardening-1287-1303`), whose diff is **Docker-only** (Dockerfile comment + `scripts/docker/test-daemon-lifecycle.sh`) and touches no wizard / TUI / tape / native-smoke / macOS code:

| Commit | `smoke` run | macOS Native Smoke |
|--------|-------------|--------------------|
| `b43604fe` (first push) | [26867175059](https://github.com/netclaw-dev/netclaw/actions/runs/26867175059) | ✅ pass |
| `4f5b99a7` (amend — Docker-only delta) | [26868137909](https://github.com/netclaw-dev/netclaw/actions/runs/26868137909/job/79236731629) | ❌ fail |

The delta between those two commits cannot affect a wizard tape, so the difference is non-determinism. The same tape **passed on Linux in the failing run**, and passes on `dev` and other PRs (e.g. #1307).

## Failure signature

```
File: …/init-wizard-reverse-proxy.tape
…
failed to execute command: timeout waiting for
  "Screen (Ready  \|  qwen2:0\.5b|TAPE\$ netclaw init)" to match;
  last value was: ╭─Netclaw Setup──────────…────╮   (still on the wizard)
recording failed
FAIL: vhs exited with status 1 for tape init-wizard-reverse-proxy
```

The timed-out step is the final terminal-state wait in `tests/smoke/tapes/init-wizard-reverse-proxy.tape`:

```
# ─── Step 10: Health Check ──────────────────────────────────────────
Wait+Screen@10s /Press Enter to run health checks/
Enter
…
Wait+Screen@180s /(Ready  \|  qwen2:0\.5b|TAPE\$ netclaw init)/
```

After Enter was pressed to run the health checks, the wizard reached **neither** terminal state (the chat-ready bar or the post-exit shell prompt) within **180 s**, and the last captured screen was still `Netclaw Setup`. Because the preceding `Press Enter to run health checks` wait had already matched, the wizard was still inside the **Health Check step** when the tape's budget ran out.
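To make the terminal-state condition concrete, the sketch below checks the tape's final pattern against the two expected end screens and against the last captured screen from the failing run. It uses Python's `re` as a stand-in for the vhs matcher (assumed to be Go's `regexp`, which treats this pattern the same way); the sample screen strings are illustrative, not captured output.

```python
import re

# Pattern copied from the tape's final Wait+Screen@180s step.
TERMINAL_STATE = re.compile(r"(Ready  \|  qwen2:0\.5b|TAPE\$ netclaw init)")


def reached_terminal_state(screen: str) -> bool:
    """True if the screen shows the chat-ready bar or the post-exit shell prompt."""
    return TERMINAL_STATE.search(screen) is not None


chat_ready = "Ready  |  qwen2:0.5b"          # illustrative chat-ready status bar
shell_prompt = "TAPE$ netclaw init"          # illustrative post-exit prompt line
still_wizard = "╭─Netclaw Setup──────────╮"  # abbreviated last value from the failing run

for screen in (chat_ready, shell_prompt, still_wizard):
    print(reached_terminal_state(screen))
```

The pattern requires exactly two spaces on each side of the `|` in the status bar, so only the exact chat-ready layout satisfies the first alternative.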

## Root-cause hypothesis

Step 10 (`HealthCheckStepViewModel`) on a fresh reverse-proxy init writes config → starts the daemon → polls `/api/health/ready` until ready (`ReloadReadyTimeout = 90s`, `OverallHealthCheckTimeout = 5min`). On a slow or loaded macOS GitHub runner, first daemon startup (JIT warm-up, SQLite provisioning, identity-file writes, provider/model checks) plus the readiness poll can exceed the tape's `@180s` budget, so `vhs` times out before the wizard reaches a terminal screen. If this holds, it is a runner-speed / timeout-margin problem rather than a logic bug: the wizard's own 5-minute overall budget is larger than the tape's 180 s wait, so the wizard would likely have completed had the tape waited longer.
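The timing argument can be made concrete with a small model. This is a hypothetical sketch, not the wizard's C# implementation: `wait_until_ready` stands in for the readiness poll (with an injectable probe, clock, and sleep so it runs without a daemon), and `classify` places an observed Step-10 duration relative to the tape's 180 s wait and the 5-minute `OverallHealthCheckTimeout`. It assumes the tape's timer and the health check start at roughly the same moment (the Enter keypress), and it does not model how `ReloadReadyTimeout` and the overall timeout interact in the real code.

```python
import time
from typing import Callable, Optional

TAPE_WAIT_S = 180.0              # Wait+Screen@180s in the tape
RELOAD_READY_TIMEOUT_S = 90.0    # ReloadReadyTimeout = 90s
OVERALL_HEALTH_CHECK_S = 300.0   # OverallHealthCheckTimeout = 5min


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = RELOAD_READY_TIMEOUT_S,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` until it returns True; return seconds elapsed, or None on timeout."""
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)


def classify(health_check_s: float) -> str:
    """Where an observed Step-10 duration lands relative to the two budgets."""
    if health_check_s <= TAPE_WAIT_S:
        return "pass"
    if health_check_s <= OVERALL_HEALTH_CHECK_S:
        # The wizard is still inside its own budget, but vhs has already given up.
        return "tape-timeout"
    return "beyond-wizard-budget"
```

For example, a Step-10 run that needs 200 s falls in the `tape-timeout` band: the wizard is still within its 5-minute budget, yet the tape has already failed. Any timeout margin chosen for the tape should push the first threshold above the worst duration measured on macOS.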

## Impact

- False `Native Smoke (macOS)` failures that block or interrupt PRs unrelated to the wizard (this one was a Docker-only PR).
- Required a manual re-run / merge-override to land #1306.

## Suggested directions (for investigation)

- Raise the tape's terminal-state wait (`Wait+Screen@180s …`) for this tape, and/or scope a larger budget on the macOS runner specifically, so it comfortably exceeds the worst-case macOS health-check time (bounded by `OverallHealthCheckTimeout = 5min`).
- Measure first-daemon-startup latency on the macOS runner (capture timing in `netclaw-home-logs`) to confirm the hypothesis and pick a margin.
- Consider an automatic single retry for native-smoke **tapes** (not scenarios) on macOS, with the retry logged loudly, since vhs/terminal capture is inherently timing-sensitive.
- The `init-wizard` (Local) tape shares the same Step-10 health-check wait; check whether it is similarly at-risk on macOS and fix both together.
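The single-retry suggestion could look roughly like the sketch below. It is hypothetical: this issue does not show how the native-smoke harness invokes `vhs`, so `run_tape`, `run_with_single_retry`, and the macOS-only default are assumptions. The retry is announced with a GitHub Actions `::warning::` workflow command (printed to stdout) so it appears as an annotation on the run, and the first attempt's duration is included in the message, which gives coarse, whole-tape latency data alongside the finer measurements suggested above.

```python
import subprocess
import sys
import time
from typing import Callable


def run_tape(tape: str) -> int:
    """Hypothetical runner: render one tape with vhs and return its exit status."""
    return subprocess.run(["vhs", tape]).returncode


def run_with_single_retry(
    tape: str,
    runner: Callable[[str], int] = run_tape,
    log: Callable[[str], None] = lambda msg: print(msg, flush=True),
    allow_retry: bool = sys.platform == "darwin",  # retry only on macOS by default
) -> int:
    """Run a native-smoke tape (not a scenario); on failure, retry once and say so loudly."""
    start = time.monotonic()
    status = runner(tape)
    elapsed = time.monotonic() - start
    if status == 0 or not allow_retry:
        return status
    # GitHub Actions turns this stdout line into a visible warning annotation on the run.
    log(f"::warning::tape {tape} failed with status {status} after {elapsed:.0f}s; retrying once")
    return runner(tape)
```

A retry that passes hides the flake from the job result, which is why the warning matters: it keeps each retried failure visible in the run annotations so the flake rate can still be tracked.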

## Notes

- PR #1307 (`fix/wizard-readiness-1302-1304`) reworks this readiness path (monotonic restart generation + per-probe endpoint re-resolution) but does **not** address macOS startup latency, so it neither causes nor fixes this flake.
- Tape authoring conventions: `tests/smoke/tapes/README.md`. The relevant timeouts live in `src/Netclaw.Cli/Tui/Wizard/Steps/HealthCheckStepViewModel.cs:17,22`.