
fix(wizard): gate init readiness on daemon restart generation + re-resolved endpoint (#1302, #1304) #1307

Merged
Aaronontheweb merged 1 commit into netclaw-dev:dev from Aaronontheweb:fix/wizard-readiness-1302-1304 on Jun 3, 2026



@Aaronontheweb

Collaborator

Summary

Two follow-ups from the #1279 wizard-readiness rework (#1282). The init wizard
writes config and polls /api/health/ready to confirm the daemon restarted onto
the new config. Both fixes harden that poll:

  • Restart signal (#1302: "Wizard readiness: use authoritative daemon start-time/generation, not the PID-file timestamp proxy").
    The poll required the daemon's PID-file start time to advance (current > before).
    A wall-clock step-back (NTP correction, VM resume, container host clock
    adjustment) between capturing the baseline and the restart makes that
    comparison false forever, so the wizard polls the full 90s and falsely reports
    "Daemon did not become ready" on a daemon that reloaded fine. Coarse/equal
    ticks and torn PID-file reads have the same effect. Replaced with a monotonic
    restart generation: DaemonRestartSignal.Generation advances once per
    restart-loop iteration and is surfaced as an X-Netclaw-Generation header on
    the anonymous /api/health/ready endpoint. A counter is immune to clock
    movement, and serving it on the anonymous endpoint means first-init needs no
    auth token.

  • Frozen endpoint (#1304: "DaemonApi endpoint is frozen at construction; a Daemon-section port change makes the init wizard poll the wrong port").
    DaemonApi resolved its endpoint once at construction. A Daemon-section port
    change brings the daemon back on the new port while the frozen endpoint keeps
    polling the dead old one until timeout. ProbeReadinessAsync now re-resolves
    the endpoint on every probe, picking up the just-written Daemon.Port (and
    still honoring an explicit NETCLAW_DAEMON_ENDPOINT / paired-client endpoint
    when set).
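
The restart-generation idea from #1302 can be sketched as a counter that only moves forward. This is a minimal illustration, assuming an `int` generation advanced with `Interlocked.Increment`; the PR names `DaemonRestartSignal.Generation`, `AdvanceGeneration()` and `Volatile.Read`, but the class below (`DaemonRestartSignalSketch`) is hypothetical, not the project's code.

```csharp
using System.Threading;

// Hypothetical sketch of a monotonic restart generation; the class name and
// body are illustrative, not the Netclaw DaemonRestartSignal source.
public sealed class DaemonRestartSignalSketch
{
    // Starts at 0, so the first restart-loop iteration reports generation 1.
    private int _generation;

    // Volatile.Read gives request threads the latest value written by the
    // restart loop without taking a lock.
    public int Generation => Volatile.Read(ref _generation);

    // Called once per restart-loop iteration. Interlocked.Increment is atomic
    // and never consults the wall clock, so an NTP step-back or VM resume
    // cannot make a newer generation compare as older.
    public int AdvanceGeneration() => Interlocked.Increment(ref _generation);
}
```

Because the gate compares two counter values rather than two timestamps, clock corrections and coarse ticks drop out of the problem; in this sketch the only way to observe a larger generation is for the restart loop to have run again.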

Changes

  • DaemonRestartSignal: add a monotonic Generation (advanced via AdvanceGeneration(),
    read via Volatile.Read).
  • Program.cs: advance the generation each restart-loop iteration; emit
    X-Netclaw-Generation on /api/health/ready.
  • DaemonApi: store _paths; replace IsHealthyAsync with ProbeReadinessAsync, which
    returns (bool Healthy, int? Generation), re-resolves the endpoint per probe, and
    parses the X-Netclaw-Generation header.
  • HealthCheckStepViewModel: capture the pre-write generation via a probe; gate
    readiness on IsRestartedGeneration(before, current) (now a pure static helper)
    using the daemon-reported generation.
  • DaemonManager: remove the now-unused TryGetRecordedStartTime.
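
On the client side, turning a readiness response into the `(bool Healthy, int? Generation)` pair that `ProbeReadinessAsync` returns might look like the following. It is a sketch with a hypothetical `ReadinessResponseParser` helper; it only covers a received response, and how connection failures are reported (for example as not healthy) is left to the caller and is an assumption here.

```csharp
using System.Linq;
using System.Net.Http;

// Hypothetical helper, not the Netclaw DaemonApi code: maps a
// /api/health/ready response to a (Healthy, Generation) pair.
public static class ReadinessResponseParser
{
    public const string GenerationHeader = "X-Netclaw-Generation";

    public static (bool Healthy, int? Generation) Parse(HttpResponseMessage response)
    {
        // Any 2xx counts as healthy, the same bar the Docker HEALTHCHECK uses.
        bool healthy = response.IsSuccessStatusCode;

        // A missing or non-numeric header yields null instead of a guess, so the
        // caller can treat "no generation reported" as its own case.
        int? generation = null;
        if (response.Headers.TryGetValues(GenerationHeader, out var values)
            && int.TryParse(values.FirstOrDefault(), out var parsed))
        {
            generation = parsed;
        }

        return (healthy, generation);
    }
}
```

Keeping health and generation as separate fields lets the wizard distinguish "daemon is down" from "daemon is up but has not restarted yet".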

Validation

  • New/updated unit tests pass: DaemonRestartSignalTests (monotonic generation),
    DaemonApiAuthenticationTests (header parsing, plus re-resolution to a changed
    port, covering the #1304 fix), HealthCheckStepViewModelTests (generation gate;
    stale-same / newer / down / no-generation cases).
  • End-to-end: ran the built daemon and confirmed GET /api/health/ready returns
    200 with X-Netclaw-Generation: 1 and body "healthy" (backward compatible —
    the Docker HEALTHCHECK and smoke poll only check 2xx).
  • dotnet slopwatch analyze → 0 issues; copyright headers verified.
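
The four gate cases exercised by HealthCheckStepViewModelTests (stale-same, newer, down, no-generation) fit naturally into a pure function, which is what makes them cheap to unit-test. A minimal sketch under stated assumptions: the class name `ReadinessGateSketch`, the `int?` parameters, the separate `IsReady` wrapper and the rule that a missing baseline counts as restarted are illustrative choices, not taken from the PR.

```csharp
// Hypothetical sketch of the readiness gate. The PR names
// IsRestartedGeneration(before, current) as a pure static helper; the
// int? signature and the null-baseline rule below are assumptions.
public static class ReadinessGateSketch
{
    public static bool IsRestartedGeneration(int? before, int? current)
    {
        // No-generation case: the latest probe carried no header, so a restart
        // cannot be proven.
        if (current is not int now) return false;

        // Assumption: no baseline means the daemon was not answering before the
        // config write, so whatever answers now started after it.
        if (before is not int baseline) return true;

        // Stale-same case fails; only a strictly newer generation passes.
        return now > baseline;
    }

    // Down case: an unhealthy probe never counts, whatever generation it reports.
    public static bool IsReady(bool healthy, int? before, int? current) =>
        healthy && IsRestartedGeneration(before, current);
}
```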

Closes #1302
Closes #1304

… re-resolved endpoint

The init wizard confirms a config-reload restart by polling /api/health/ready
after writing config. Two fragilities in that poll (follow-ups from netclaw-dev#1282):

netclaw-dev#1302: it required the daemon's PID-file start TIME to advance, which a
wall-clock step-back (NTP correction, VM resume, container host clock
adjustment) could make impossible. That froze the gate, so a healthy reloaded
daemon was polled until the 90s timeout and then falsely reported as "did not become ready".
Replace the timestamp proxy with a monotonic restart generation: the daemon
advances DaemonRestartSignal.Generation once per restart-loop iteration and
reports it via an X-Netclaw-Generation header on the anonymous
/api/health/ready (anonymous, so first-init needs no auth token). The wizard
gates readiness on that generation advancing past the value captured before
the config write.
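
The wizard-side flow described above (capture a baseline, write config, poll for a newer generation) can be sketched as one async method. Everything here is hypothetical: `RestartReadinessPoller`, the delegate-based `probe` and `writeConfig` parameters, and the assumption that `probe` reports an unreachable daemon as `Healthy = false` instead of throwing. Passing 90 seconds as `timeout` matches the limit the paragraph describes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the wizard's readiness wait; not the Netclaw
// HealthCheckStepViewModel code.
public static class RestartReadinessPoller
{
    public static async Task<bool> WriteConfigAndWaitAsync(
        Func<CancellationToken, Task<(bool Healthy, int? Generation)>> probe,
        Func<CancellationToken, Task> writeConfig,
        TimeSpan timeout,
        TimeSpan pollInterval,
        CancellationToken ct = default)
    {
        // 1. Capture the baseline before the write so a fast restart is not missed.
        var (_, before) = await probe(ct);

        // 2. Write config; the daemon's restart loop advances its generation on reload.
        await writeConfig(ct);

        // 3. Poll until a healthy probe reports a strictly newer generation.
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        timeoutCts.CancelAfter(timeout);
        try
        {
            while (true)
            {
                var (healthy, current) = await probe(timeoutCts.Token);
                if (healthy && current is int now && (before is not int baseline || now > baseline))
                    return true;
                await Task.Delay(pollInterval, timeoutCts.Token);
            }
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // The timeout elapsed (not a caller cancellation): report not ready.
            return false;
        }
    }
}
```

A stale daemon that keeps answering with the pre-write generation never satisfies the loop, so the method returns `false` at the timeout instead of declaring success on the old process.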

netclaw-dev#1304: DaemonApi resolved its endpoint once at construction, so a
Daemon-section port change would leave the post-restart readiness poll hitting
the dead old port until timeout. Re-resolve the endpoint on every probe
(ProbeReadinessAsync), picking up the just-written Daemon.Port via the
daemon-config resolver while still honoring an explicit NETCLAW_DAEMON_ENDPOINT
or paired-client endpoint when one is set.
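
Re-resolving per probe amounts to calling a resolver each time instead of caching a `Uri` in the constructor. A sketch, assuming a hypothetical `DaemonEndpointResolver.Resolve` that takes the explicit endpoint, the paired-client endpoint and a delegate that reads `Daemon.Port` from the freshly written config; the `http://localhost` fallback scheme and host are also assumptions.

```csharp
#nullable enable
using System;

// Hypothetical sketch of per-probe endpoint resolution. The precedence
// (explicit endpoint, then paired-client endpoint, then configured
// Daemon.Port) follows the PR text; names, scheme and host are assumed.
public static class DaemonEndpointResolver
{
    public static Uri Resolve(string? explicitEndpoint, string? pairedClientEndpoint, Func<int> readConfiguredPort)
    {
        // An explicit override (e.g. NETCLAW_DAEMON_ENDPOINT) always wins.
        if (!string.IsNullOrWhiteSpace(explicitEndpoint))
            return new Uri(explicitEndpoint);

        // Next, a paired-client endpoint if one is configured.
        if (!string.IsNullOrWhiteSpace(pairedClientEndpoint))
            return new Uri(pairedClientEndpoint);

        // Otherwise read the port from config on every call, so a Daemon.Port
        // written between probes is used on the next attempt.
        return new UriBuilder("http", "localhost", readConfiguredPort()).Uri;
    }
}
```

A probe would call `Resolve` at the start of every request (for instance passing `Environment.GetEnvironmentVariable("NETCLAW_DAEMON_ENDPOINT")` as the first argument), which is similar in spirit to the review excerpt below that asserts the captured request went to port 5300.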

Removes the now-unused DaemonManager.TryGetRecordedStartTime.
Aaronontheweb force-pushed the fix/wizard-readiness-1302-1304 branch from b9d1821 to 705ae03 on June 3, 2026 06:47
Aaronontheweb added the bug (Something isn't working) and reliability (Retries, resilience, graceful degradation) labels on Jun 3, 2026

Aaronontheweb left a comment

Collaborator Author


LGTM


await api.ProbeReadinessAsync(TestContext.Current.CancellationToken);

Assert.Equal(5300, captured!.RequestUri!.Port);

Collaborator Author


lgtm

Aaronontheweb enabled auto-merge (squash) on June 3, 2026 06:54
Aaronontheweb merged commit 7aafe7b into netclaw-dev:dev on Jun 3, 2026
14 checks passed

Labels: bug (Something isn't working), reliability (Retries, resilience, graceful degradation)


1 participant