fix: integrate OpenClaw Gateway health signals #957
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ce4a1d2b16
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```typescript
task();
cache.tasks[taskName] = {
  key: cacheKey,
  updatedAt: new Date().toISOString(),
};
writeCache(cachePath, cache);
```
Only persist cache when maintenance task succeeds
This marks a task as cached immediately after invoking `task()`, but several wrapped maintenance routines (e.g., the plugin install/upgrade and cleanup paths in `config-sync.ts`) catch and log their own failures instead of throwing. In that case a transient failure is still recorded as a completed run, so the next startup is a cache hit that skips retrying the maintenance. Plugins or runtime-deps state then stay stale or missing until some unrelated change to the cache key.
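A minimal sketch of the suggested fix, assuming each wrapped maintenance routine can be changed to report its outcome (returning `false` on a handled failure) instead of only logging it. `runCachedTask`, `TaskCache`, `TaskOutcome`, and the recording `writeCache` stand-in are hypothetical names for illustration, not the PR's actual helpers.

```typescript
// Hypothetical cache shape: one entry per maintenance task.
interface TaskCache {
  tasks: Record<string, { key: string; updatedAt: string }>;
}

// Stand-in for the real writeCache(cachePath, cache); it only records writes here.
const cacheWrites: string[] = [];
function writeCache(cachePath: string, cache: TaskCache): void {
  cacheWrites.push(`${cachePath}:${JSON.stringify(cache)}`);
}

type TaskOutcome = 'hit' | 'ran' | 'failed';

// Assumption: a task returns true on success and false when it caught and
// logged its own failure. A thrown error propagates and is never cached either.
function runCachedTask(
  cache: TaskCache,
  cachePath: string,
  taskName: string,
  cacheKey: string,
  task: () => boolean,
): TaskOutcome {
  if (cache.tasks[taskName]?.key === cacheKey) {
    return 'hit'; // Same inputs as the last successful run: skip it.
  }
  if (!task()) {
    return 'failed'; // No cache entry, so the next startup retries.
  }
  cache.tasks[taskName] = { key: cacheKey, updatedAt: new Date().toISOString() };
  writeCache(cachePath, cache);
  return 'ran';
}
```

If the real routines are async, the same shape works with `await task()`; the point is that the cache write sits only on the success path.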
💡 Codex Review
Reviewed commit: a82cafcc4c
```typescript
rpcRouter: params.status.gatewayReady === true || this.lastCoreProbe?.ok === true
  ? 'ready'
  : this.lastCoreProbe?.ok === false
    ? 'blocked'
```
Let failed core probes override stale ready status
`rpcRouter` is currently derived from `gatewayReady === true || lastCoreProbe?.ok === true`, so once `gatewayReady` has been set to `true`, later failed core probes (`lastCoreProbe.ok === false`, e.g. repeated `system-presence` timeouts) can no longer surface as `blocked`. In that state, diagnostics endpoints keep reporting `core.rpcRouter: "ready"` even when the router is actually failing, which makes incident triage and restart decisions inaccurate.
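A minimal sketch of the suggested precedence, in which a failed core probe wins over a previously set `gatewayReady` flag. `deriveRpcRouterStatus`, the `CoreProbe` shape, and the `'unknown'` fallback are assumptions for illustration; the snippet under review is cut off before its final branch, so the real fallback value may differ.

```typescript
type RouterStatus = 'ready' | 'blocked' | 'unknown';

// Hypothetical probe record; only `ok` matters for this decision.
interface CoreProbe {
  ok: boolean;
}

function deriveRpcRouterStatus(
  gatewayReady: boolean | undefined,
  lastCoreProbe: CoreProbe | undefined,
): RouterStatus {
  // Assumes lastCoreProbe is newer than the gatewayReady flag,
  // so a failed probe overrides a stale "ready".
  if (lastCoreProbe?.ok === false) return 'blocked';
  if (gatewayReady === true || lastCoreProbe?.ok === true) return 'ready';
  return 'unknown'; // Assumed fallback; the original branch is not shown.
}
```

If an old failed probe could outlive a later successful startup, comparing probe and readiness timestamps would be a safer tie-breaker than fixed precedence.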
💡 Codex Review
Reviewed commit: 20280f4a06
```typescript
));
unsubscribers.push(subscribeHostEvent('gateway:health', (payload) => {
  const current = get().health;
  set({ health: { ...(current ?? { ok: true }), ok: true, openclawHealth: payload } });
```
Preserve health failure state on gateway health events
This handler unconditionally sets `health.ok` to `true` whenever a `gateway:health` event arrives. That can overwrite a real unhealthy state (for example, after a failed `/api/gateway/health` check) and make downstream UI and diagnostics read the gateway as healthy until another explicit poll runs. Because the payload is stored as `openclawHealth` but never consulted for `ok`, any degraded or failed health signal from the event stream is effectively masked.
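A minimal sketch of a handler that derives `ok` from the event instead of forcing it to `true`. It assumes the `gateway:health` payload carries a boolean `ok` field, which the review does not confirm; `GatewayHealthPayload`, `HealthState`, and `applyGatewayHealthEvent` are hypothetical names. When the payload has no usable `ok`, the previous verdict is kept rather than flipped to healthy.

```typescript
// Assumed payload shape: OpenClaw's real health payload may differ.
interface GatewayHealthPayload {
  ok?: unknown;
  [key: string]: unknown;
}

interface HealthState {
  ok: boolean;
  openclawHealth?: GatewayHealthPayload;
}

// Pure reducer: returns the next health state for a gateway:health event.
function applyGatewayHealthEvent(
  current: HealthState | null | undefined,
  payload: GatewayHealthPayload,
): HealthState {
  // Trust the event only when it explicitly reports ok; otherwise keep the
  // prior verdict, defaulting to false (not true) when nothing is known yet.
  const ok = typeof payload.ok === 'boolean' ? payload.ok : (current?.ok ?? false);
  return { ...(current ?? { ok }), ok, openclawHealth: payload };
}
```

With this helper, the subscription body could reduce to `set({ health: applyGatewayHealthEvent(get().health, payload) })`, keeping the merge logic testable outside the store.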
Summary
- `system-presence`, `health`, `status`, `channels.status`, and `doctor.memory.*` signals over stderr string matching.
- `health` and `presence` events through dedicated Gateway events, host events, preload allowlists, and the renderer Gateway store.

Test plan
- `pnpm exec vitest run tests/unit/gateway-event-dispatch.test.ts tests/unit/gateway-events.test.ts tests/unit/gateway-manager-diagnostics.test.ts tests/unit/gateway-ready-fallback.test.ts tests/unit/dreams-page.test.tsx tests/unit/channel-routes.test.ts tests/unit/harness-specs.test.ts`
- `pnpm run typecheck`
- `pnpm run lint:check` (passes with the existing `src/pages/Chat/ChatInput.tsx:selectedSkill` warning)
- `pnpm run build:vite`
- `pnpm run comms:replay`
- `pnpm run comms:compare`
- `pnpm exec playwright test tests/e2e/openclaw-dreams.spec.ts tests/e2e/channels-health-diagnostics.spec.ts`