Rescue: add watchdog core service and cron engine#46502
Rescue: add watchdog core service and cron engine#46502shichangs wants to merge 60 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR introduces the core service layer for a rescue watchdog feature: a new Key changes and findings:
Confidence Score: 3/5
|
|
@codex review |
|
Addressed the remaining actionable Greptile point in shichangs@f3c79d7. Changes in this push:
Local verification after the change:
|
…odex/rescue-watchdog-core
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e3c47fc9af
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7f1a209602
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 668ea0c796
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Revalidated the current PR head Current status from this pass:
Environment-only blockers I re-confirmed in that local clone:
I did not find any new actionable source issues on top of the current PR head. @joshavant @tyler6204 could you take another look when you have a chance, especially at the stale/conflicting mergeability signal? |
|
All current P1/P2 review threads on #46502 are resolved on the latest PR head I re-checked the final head locally against the current PR diff:
Local validation in this worktree is still partially blocked by environment limits:
So I could not rerun @joshavant @tyler6204 could you take another look when you have a chance? |
|
Re-checked #46502 on the current PR head Current status from this pass:
No source change was needed in this pass. @joshavant @tyler6204 could one of you take another look when convenient, especially at the stale mergeability signal? |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2d9bc46033
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…-core # Conflicts: # src/agents/model-compat.test.ts # src/daemon/program-args.test.ts # src/gateway/probe.test.ts # src/gateway/probe.ts # src/gateway/server/ws-connection/message-handler.ts
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f0d8679e57
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
All current P1/P2 review threads on #46502 are resolved again. Latest follow-up commit: What changed in this pass:
Local verification on this push:
Broader validation in this environment is still partially blocked by missing optional/shared dependencies:
@joshavant @tyler6204 could you take another look when convenient? |
|
Re-checked the current PR head Current status from this pass:
Targeted local verification on the same commit in the existing branch worktree:
The only remaining local blocker I hit here was CLI smoke from @joshavant @tyler6204 could one of you take another pass when convenient? |
|
Re-checked the current PR head Current status from this pass:
Local verification from the current branch worktree:
The current red CI also looks unrelated to this PR diff. The failing jobs are dying during
Those repos are not introduced by this PR diff, and the latest head only touches watchdog / daemon / gateway paths. @joshavant @tyler6204 could one of you take another pass when convenient, especially with the current CI install/auth failure in mind? |
|
Re-checked #46502 on the current head Current state:
Local verification on this head:
Notes:
@joshavant @tyler6204 could you take another pass when convenient? |
|
Follow-up from local self-review: I pushed shichangs@1ba5b62 to fix a test typing mismatch on the current PR head. What changed:
Local verification on this head:
Notes:
@joshavant @tyler6204 could you take another pass when convenient? |
|
Re-checked the current PR head Current state:
Local verification in this workspace is still blocked because @joshavant @tyler6204 could you take another pass when convenient? |
|
Revalidated the current PR head Current validation on this head:
Environment notes from this run:
@joshavant @tyler6204 could you take a look when you have a chance? |
|
Codex review: needs changes before merge. Summary Reproducibility: yes. for the concrete review findings: source comparison shows the PR head omits current main launchd enable/recovery behavior, and a PR-head changelog search shows no rescueWatchdog entry. The broader feature itself has unit-level evidence in the PR discussion but no live end-to-end repair reproduction. Next step before merge Security Review findings
Review detailsBest possible solution: Rebase the branch onto current main, preserve launchd restart semantics including enable-before-kickstart and kickstart-failure recovery, add the active-release changelog entry, then continue maintainer/security review of the rescueWatchdog contract. Do we have a high-confidence way to reproduce the issue? Yes for the concrete review findings: source comparison shows the PR head omits current main launchd enable/recovery behavior, and a PR-head changelog search shows no rescueWatchdog entry. The broader feature itself has unit-level evidence in the PR discussion but no live end-to-end repair reproduction. Is this the best way to solve the issue? No, not as-is. The isolated cron/service approach is plausible, but the branch must first preserve current launchd restart behavior and satisfy changelog policy before it is a maintainable merge candidate. Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 443f7035a2e5. |
Summary
rescueWatchdogcron payload + runner, and wires the gateway cron service to execute that watchdog job against a monitored profile.openclaw onboard --rescue-watchdogUX, and no docs changes in this PR.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
rescueWatchdog, for watchdog jobs that monitor a target profile and repair it when unhealthy.0600plist permissions.AbortSignal, so watchdog restart attempts can be canceled cleanly instead of hanging until command timeouts.Security Impact (required)
Yes/No): YesYes/No): NoYes/No): YesYes/No): YesYes/No): NoYes, explain risk + mitigation:The new capability is intentionally narrow: an isolated cron job can probe and repair a monitored local profile. Risk is bounded by restricting the payload shape, disallowing delivery targets for
rescueWatchdog, resolving the monitored profile explicitly, and hardening the underlying managed-service restart and launchd plist write paths. The repair fallback uses exact argv viaprocess.execPathand avoids shell-based PATH resolution.Repro + Verification
Environment
Steps
Expected
rescueWatchdogcron job can be normalized, stored, validated, and executed as an isolated job.Actual
Evidence
Human Verification (required)
What you personally verified (not just CI), and how:
HOMEoverride, rejects symlink targets, and writes plist files with0600permissions via temp-file + rename.AbortSignalnow propagates through launchd/systemd/schtasks restart helpers into the process exec layer.rescueWatchdogcron payloads normalize correctly, are forced to isolated session targets, and reject delivery/failureDestination config.doctor --repair --non-interactive, and reports success/error summaries correctly.os.userInfo()failure falls back toos.homedir(), and unresolved trusted home fails closed.launchdtemp plist files are cleaned up if rename fails.Review Conversations
Compatibility / Migration
Yes/No): YesYes/No): NoYes/No): NoFailure Recovery (if this breaks)
src/daemon/*,src/process/exec.ts,src/cron/*,src/rescue/watchdog-shared.ts,src/gateway/server-cron.tsagentTurnjobs after the new payload kind was added.Risks and Mitigations
rescueWatchdogis explicitly restricted to isolated jobs, delivery is rejected, and schema/service/normalization paths have dedicated tests.