fix(wizard): gate init readiness on daemon restart generation + re-resolved endpoint (#1302, #1304) #1307
Merged
Aaronontheweb merged 1 commit on Jun 3, 2026
… re-resolved endpoint

The init wizard confirms a config-reload restart by polling /api/health/ready after writing config. Two fragilities in that poll (follow-ups from netclaw-dev#1282):

netclaw-dev#1302: the poll required the daemon's PID-file start time to advance, which a wall-clock step-back (NTP correction, VM resume, container host clock adjustment) could make impossible. That froze the gate, so a healthy reloaded daemon was falsely reported as "did not become ready" until the 90s timeout. Replace the timestamp proxy with a monotonic restart generation: the daemon advances DaemonRestartSignal.Generation once per restart-loop iteration and reports it via an X-Netclaw-Generation header on /api/health/ready (anonymous, so first-init needs no auth token). The wizard gates readiness on that generation advancing past the value captured before the config write.

netclaw-dev#1304: DaemonApi resolved its endpoint once at construction, so a Daemon-section port change would leave the post-restart readiness poll hitting the dead old port until timeout. Re-resolve the endpoint on every probe (ProbeReadinessAsync), picking up the just-written Daemon.Port via the daemon-config resolver while still honoring an explicit NETCLAW_DAEMON_ENDPOINT or paired-client endpoint when one is set.

Removes the now-unused DaemonManager.TryGetRecordedStartTime.
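To make the generation gate concrete, here is a minimal sketch of the idea described in the commit message. The class names `RestartSignalSketch` and `ReadinessGateSketch` are hypothetical stand-ins, and the member shapes are assumed rather than copied from the PR. How a missing baseline is treated (the daemon was down before the config write) is also an assumption of this sketch.

```csharp
#nullable enable
using System.Threading;

// Hypothetical stand-in for DaemonRestartSignal. Only the idea of a
// monotonic Generation comes from the PR; the member shapes are assumed.
public static class RestartSignalSketch
{
    private static int _generation;

    // Current generation, read without caching in a register.
    public static int Generation => Volatile.Read(ref _generation);

    // Intended to be called once per restart-loop iteration; returns the new value.
    public static int AdvanceGeneration() => Interlocked.Increment(ref _generation);
}

public static class ReadinessGateSketch
{
    // Ready only when the daemon reports a generation strictly newer than
    // the baseline captured before the config write.
    public static bool IsRestartedGeneration(int? before, int? current)
    {
        // No generation reported: cannot prove a restart happened.
        if (current is null) return false;

        // ASSUMPTION: with no baseline, any reported generation is accepted.
        if (before is null) return true;

        return current.Value > before.Value;
    }
}
```

Because the counter only ever increments inside the daemon process, the comparison does not depend on wall-clock time, which is the property the timestamp proxy lacked.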
b9d1821 to 705ae03
Aaronontheweb commented Jun 3, 2026

    await api.ProbeReadinessAsync(TestContext.Current.CancellationToken);

    Assert.Equal(5300, captured!.RequestUri!.Port);
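The two test lines above check that a probe targets the port in effect at probe time rather than at construction time. A minimal sketch of that behavior follows, assuming a hypothetical `Func<Uri>` resolver in place of the real daemon-config resolver. `ReadinessProbeSketch` and `CapturingHandler` are illustrative names, not the PR's types. Only the `/api/health/ready` path, the `X-Netclaw-Generation` header, and the `(bool Healthy, int? Generation)` result shape come from the PR description.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public sealed class ReadinessProbeSketch
{
    private readonly HttpClient _http;
    private readonly Func<Uri> _resolveEndpoint; // hypothetical resolver, invoked on every probe

    public ReadinessProbeSketch(HttpClient http, Func<Uri> resolveEndpoint)
    {
        _http = http;
        _resolveEndpoint = resolveEndpoint;
    }

    public async Task<(bool Healthy, int? Generation)> ProbeReadinessAsync(CancellationToken ct)
    {
        // Resolve per call so a just-written port is picked up.
        var uri = new Uri(_resolveEndpoint(), "/api/health/ready");
        try
        {
            using var response = await _http.GetAsync(uri, ct);
            int? generation = null;
            if (response.Headers.TryGetValues("X-Netclaw-Generation", out var values)
                && int.TryParse(values.FirstOrDefault(), out var parsed))
            {
                generation = parsed;
            }
            return (response.IsSuccessStatusCode, generation);
        }
        catch (HttpRequestException)
        {
            // Connection-level failure: treat the daemon as down or still restarting.
            return (false, null);
        }
    }
}

// Test double: records the last request and answers 200 with a generation header.
public sealed class CapturingHandler : HttpMessageHandler
{
    public HttpRequestMessage? Captured { get; private set; }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        Captured = request;
        var response = new HttpResponseMessage(HttpStatusCode.OK);
        response.Headers.Add("X-Netclaw-Generation", "2");
        return Task.FromResult(response);
    }
}
```

In a test, the resolver can close over a mutable port variable: construct the probe while the variable holds the old port, change it to 5300, call `ProbeReadinessAsync`, and assert on `handler.Captured!.RequestUri!.Port`.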
Summary
Two follow-ups from the #1279 wizard-readiness rework (#1282). The init wizard writes config and polls `/api/health/ready` to confirm the daemon restarted onto the new config. Both fixes harden that poll:

- Wizard readiness: use authoritative daemon start-time/generation, not the PID-file timestamp proxy #1302 (restart signal). The poll required the daemon's PID-file start time to advance (`current > before`). A wall-clock step-back (NTP correction, VM resume, container host clock adjustment) between capturing the baseline and the restart makes that comparison false forever, so the wizard polls for the full 90s and falsely reports "Daemon did not become ready" on a daemon that reloaded fine. Coarse/equal ticks and torn PID-file reads have the same effect. Replaced with a monotonic restart generation: `DaemonRestartSignal.Generation` advances once per restart-loop iteration and is surfaced as an `X-Netclaw-Generation` header on the anonymous `/api/health/ready`. The generation is immune to clock movement, and because the endpoint is anonymous, first-init needs no auth token.
- DaemonApi endpoint is frozen at construction; a Daemon-section port change makes the init wizard poll the wrong port #1304 (frozen endpoint). `DaemonApi` resolved its endpoint once at construction. A `Daemon`-section port change brings the daemon back on the new port while the frozen endpoint keeps polling the dead old one until timeout. `ProbeReadinessAsync` now re-resolves the endpoint on every probe, picking up the just-written `Daemon.Port` (and still honoring an explicit `NETCLAW_DAEMON_ENDPOINT` / paired-client endpoint when set).

Changes
- `DaemonRestartSignal`: add a monotonic `Generation` (`AdvanceGeneration()` / `Volatile.Read`).
- `Program.cs`: advance the generation each restart-loop iteration; emit `X-Netclaw-Generation` on `/api/health/ready`.
- `DaemonApi`: store `_paths`; replace `IsHealthyAsync` with `ProbeReadinessAsync` → `(bool Healthy, int? Generation)`, re-resolving the endpoint per probe and parsing the header.
- `HealthCheckStepViewModel`: capture the pre-write generation via a probe; gate readiness on `IsRestartedGeneration(before, current)` (now a pure static helper) using the daemon-reported generation.
- `DaemonManager`: remove the now-unused `TryGetRecordedStartTime`.

Validation
- `DaemonRestartSignalTests` (monotonic generation), `DaemonApiAuthenticationTests` (header parse + re-resolution to a changed port, proving #1304), `HealthCheckStepViewModelTests` (generation gate; stale-same / newer / down / no-generation cases).
- `GET /api/health/ready` returns `200` with `X-Netclaw-Generation: 1` and body `"healthy"` (backward compatible: the Docker HEALTHCHECK and smoke poll only check 2xx).
- `dotnet slopwatch analyze` → 0 issues; copyright headers verified.

Closes #1302
Closes #1304
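
For readers who want to see how the two fixes combine on the wizard side, the following is a rough sketch of a readiness poll that waits for the generation to advance past a captured baseline. `ReadinessPollSketch`, `WaitForRestartAsync`, and the injected `probe` delegate are hypothetical; the actual `HealthCheckStepViewModel` logic may differ. The 90s timeout mentioned in the description would be passed in as `timeout`, and the poll interval is an assumption left to the caller.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ReadinessPollSketch
{
    // Polls until the probe reports healthy with a generation newer than
    // `before`, or until `timeout` elapses. Returns true when ready.
    public static async Task<bool> WaitForRestartAsync(
        Func<CancellationToken, Task<(bool Healthy, int? Generation)>> probe,
        int? before,
        TimeSpan timeout,
        TimeSpan interval,
        CancellationToken ct)
    {
        // Stopwatch measures elapsed time independently of wall-clock changes.
        var stopwatch = Stopwatch.StartNew();
        while (stopwatch.Elapsed < timeout)
        {
            var (healthy, current) = await probe(ct);

            // ASSUMPTION: a null baseline accepts any reported generation.
            if (healthy && current is int c && (before is null || c > before.Value))
            {
                return true;
            }

            await Task.Delay(interval, ct);
        }
        return false;
    }
}
```

The probe is passed as a delegate so the loop can be exercised in a unit test with a canned sequence of results, without a running daemon.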