[Bug]: Windows Scheduled Task gateway restart/health becomes inconsistent after ready; mixes known probe false negatives with cron/session stale state and post-ready HTTP loss
Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
On Windows with the gateway installed as a Scheduled Task, `openclaw gateway restart` can repeatedly time out with:
- `Timed out after 60s waiting for gateway port 18789 to become healthy`
- `Service runtime: status=unknown`
- `Port 18789 is already in use`
This environment appears to hit more than one problem at once:
- A known local loopback probe false negative on restart (`ws ... code=1008 reason=connect failed` / `device-required`)
- Cron/job/session state corruption after restart (stale `runningAtMs` / stale cron session state)
- An additional post-`ready` instability where the gateway can log `ready (...)` and even bind 18789, but `/health` and `/` later stop responding or the port becomes free again
I am filing this because the first two have close neighbors in existing issues/PRs, but I have not found a single Windows issue that covers the full combined behavior end-to-end.
OpenClaw version
OpenClaw 2026.4.8 (9ece252)
Operating system
Windows (PowerShell 5.1.22621.4249)
Install method
npm global install + `openclaw gateway install` / Scheduled Task
Model
bailian/qwen3.5-plus
Provider / routing chain
Ali / Bailian
Additional provider/model setup details
- Node.js upgraded to `v22.22.2`
- Repro observed both before and after the upgrade from `2026.4.5` to `2026.4.8`
- Repro observed with normal config and also with external channels/providers largely disabled during bisecting
Steps to reproduce
- Install/run the gateway on Windows via Scheduled Task
- Have existing cron jobs in `~/.openclaw/cron/jobs.json`
- Run `openclaw gateway restart`
- Observe one or more of the following sequences:
Sequence A:
- CLI waits 60s and prints the timeout
- log shows the local WS probe closed with `1008 / connect failed`
- the gateway may actually already be alive
Sequence B:
- gateway reaches `starting HTTP server...`, `ready (... plugins, ...s)`, `cron: started`
- but `http://127.0.0.1:18789/health` and `/` later time out, or the port becomes free again
Sequence C:
- cron jobs recover from UI edits/restart into stale state
- previously seen local failures included `TypeError: Cannot read properties of undefined (reading 'runningAtMs')`
- stale `runningAtMs` / stale cron session state prevented clean recovery without manual intervention
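For Sequence C, the manual intervention amounted to clearing the impossible running markers. A minimal cleanup sketch is below; note the schema is an assumption inferred from the `runningAtMs` TypeError above (a JSON array of job objects with an optional `state` dict), not the documented `jobs.json` format:

```python
import json
from pathlib import Path

def clear_stale_running_state(jobs_path: Path) -> int:
    """Null out runningAtMs markers that cannot be valid after a restart.

    Assumes jobs.json is a JSON array of job objects carrying an optional
    "state" dict with "runningAtMs" (hypothetical schema inferred from the
    observed TypeError). Returns the number of jobs cleaned.
    """
    jobs = json.loads(jobs_path.read_text(encoding="utf-8"))
    cleaned = 0
    for job in jobs:
        state = job.get("state")
        if isinstance(state, dict) and state.get("runningAtMs") is not None:
            # no job can still be "running" across a gateway restart
            state["runningAtMs"] = None
            cleaned += 1
    jobs_path.write_text(json.dumps(jobs, indent=2), encoding="utf-8")
    return cleaned
```

Run it only with the gateway stopped, so the process does not rewrite the file concurrently.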
Expected behavior
- `openclaw gateway restart` should succeed when the restarted local gateway is already healthy enough to reject unauthenticated loopback probes
- Scheduled Task runtime and port ownership should stay consistent
- Cron startup should not preserve impossible stale running state
- Once the gateway logs `ready (...)`, `/health` and `/` should remain responsive instead of later hanging or disappearing
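The first expectation can be stated as a probe policy: any HTTP answer on the loopback port, including an auth rejection, proves the gateway is serving, and only connection-level failures are genuine negatives. A hypothetical sketch of that policy (not the actual CLI probe, which uses a WebSocket):

```python
import http.client

def probe_gateway(host: str = "127.0.0.1", port: int = 18789,
                  timeout: float = 2.0) -> bool:
    """Return True if anything HTTP-speaking answers on the gateway port.

    A 401/403 (or any other status) from an auth-protected gateway still
    proves the process is up; only a refused or timed-out connection is a
    genuine negative. Sketch only, under the assumptions in the lead-in.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/health")
        conn.getresponse()  # any status code counts as alive
        return True
    except OSError:
        return False
    finally:
        conn.close()
```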
Actual behavior
Observed across repeated runs on 2026-04-08 and 2026-04-09:
- `openclaw gateway restart` times out after 60s
- logs show the loopback WS probe closing with `code=1008 reason=connect failed`, `"cause":"device-required"`
- sometimes port 18789 is reported busy while runtime status is `unknown`
- sometimes the gateway logs `ready (...)` and port 18789 later becomes free again
- sometimes `/health` is briefly reachable, then later times out
- cron previously failed with missing or stale `runningAtMs`-related state
Representative log lines:
2026-04-09T09:55:37.924+08:00 [gateway/ws] closed before connect ... code=1008 reason=connect failed
2026-04-09T10:02:48.014+08:00 Timed out after 60s waiting for gateway port 18789 to become healthy.
2026-04-09T10:02:48.045+08:00 Service runtime: status=unknown
2026-04-09T10:02:48.049+08:00 Gateway port 18789 status: free.
2026-04-09T10:05:23.293+08:00 [gateway] ready (0 plugins, 27.5s)
2026-04-09T10:05:28.021+08:00 [cron] cron: started
Related issues / likely overlap
- #48771 and PR #48801: Windows/local restart false negative when the loopback WS probe is closed with `1008` / `connect failed` / `device-required`
- #44920: stale cron `runningAtMs` after restart
- #59511: local `http://127.0.0.1:18789/health` not usable after a gateway run
- #60295: different OS, but a similar "restart times out while service state/port ownership is inconsistent"
What I found during local debugging
I did substantial local debugging because the machine was stuck in production use:
- upgraded OpenClaw from `2026.4.5` to `2026.4.8`
- upgraded Node.js to `22.22.2`
- isolated/remediated several local issues:
  - old incompatible channel config fields after the upgrade
  - untracked local plugin auto-loading
  - stale cron job/session state
- after that cleanup, the remaining issue was still reproducible:
  - the gateway reaches `ready (...)`
  - HTTP health/UI later become unreachable or unstable
I also locally patched the CLI to treat a loopback HTTP `/health` response and a local `1008` policy close as healthy enough for restart probing. That reduced one class of false negatives, but did not eliminate the post-`ready` instability.
This suggests there may still be a deeper Windows gateway/runtime bug after startup, beyond the already-known restart probe issue.
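To timestamp the post-`ready` instability as evidence, a small watchdog sketch can poll `/health` and record state transitions (the URL, interval, and round count are assumptions to adjust to the real setup):

```python
import time
import urllib.request
import urllib.error

def watch_health(url: str = "http://127.0.0.1:18789/health",
                 interval: float = 5.0, rounds: int = 720):
    """Poll the health endpoint and yield (timestamp, state) transitions.

    Any HTTP answer (even 401) counts as "up"; connection errors and
    timeouts count as "down". Useful for pinning down when a gateway
    that already logged ready later stops answering.
    """
    last = None
    for _ in range(rounds):
        try:
            urllib.request.urlopen(url, timeout=2)
            state = "up"
        except urllib.error.HTTPError:
            state = "up"  # it served a status code, so the process is alive
        except (urllib.error.URLError, OSError):
            state = "down"
        if state != last:
            yield (time.time(), state)
            last = state
        time.sleep(interval)
```

With the default 5s interval this covers about an hour; an up-then-down transition shortly after a `ready (...)` log line would isolate the Sequence B behavior from the probe false negative.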
Impact and severity
High for Windows users relying on Scheduled Task mode:
- restart automation becomes unreliable
- control UI availability becomes inconsistent
- cron jobs can be left in broken/stale state after restart cycles
- users may see a mixture of “service is up”, “service is unknown”, and “port is free” across the same debugging session
Logs, screenshots, and evidence
I can provide:
- full `openclaw-2026-04-08.log` / `openclaw-2026-04-09.log`
- `openclaw gateway restart` terminal output
- `openclaw gateway status --json` output from both healthy and unhealthy moments
- details of the stale cron/session state observed in `~/.openclaw/cron/jobs.json` and the session index cleanup
Additional information
If helpful, I can also open follow-up issues, each with a narrower repro focused only on:
- Windows Scheduled Task + restart probe false negative
- stale cron `runningAtMs` / session state after restart
- post-`ready` HTTP hang / port disappearance
because on this machine they appeared stacked together.