Onboarding: add opt-in rescue watchdog for self-healing gateway recovery #40658
shichangs wants to merge 8 commits into openclaw:main from
🔒 Aisle Security Analysis
We found 3 potential security issue(s) in this PR:
1. 🟠 Rescue watchdog onboarding can leak rescue gateway token via weak loopback port ownership verification (TOCTOU/port hijack)
Description
Key issues:
Impact:
Vulnerable code:

    const ownershipMetadataUnavailable =
      runtime.status !== "stopped" &&
      (runtime.status === "unknown" || runtime.pid == null) &&
      (portUsage.status !== "busy" || portUsage.listeners.length === 0);
    const probeLooksLikeGateway =
      probe?.ok === true || looksLikeAuthClose(probe?.close?.code, probe?.close?.reason);
    if (probeLooksLikeGateway && (listenerOwnedByRuntime || ownershipMetadataUnavailable)) {
      return;
    }

Recommendation
Tighten the port ownership check so the onboarding flow never sends the rescue token unless the listener is strongly attributable to the managed service. Recommended changes:
Example safer logic (fail closed when you cannot attribute the port):

    const probe = await probeGateway({ url: wsUrl, timeoutMs: 1_000 }).catch(() => null);
    if (!probe?.ok) {
      await sleep(POLL);
      continue;
    }
    if (
      runtime.status !== "running" ||
      typeof runtime.pid !== "number" ||
      portUsage.status !== "busy" ||
      portUsage.listeners.length === 0 ||
      !portUsage.listeners.some(l => l.pid === runtime.pid || l.ppid === runtime.pid)
    ) {
      throw new Error("Refusing to send rescue token: could not prove managed service owns the port");
    }

This ensures the onboarding process does not disclose credentials to an untrusted local listener.
2. 🟡 TOCTOU symlink swap allows LaunchAgent plist write outside intended directory (parent directory not protected)
Description
The new
Impact (local attack): if an attacker can rename/replace
Vulnerable code (symlink protection is not anchored to a directory file descriptor; parent components can be swapped):

    await ensureSecureDirectory(path.dirname(plistPath));
    ...
    await writeLaunchAgentPlistSecure(plistPath, plist);

    const dir = path.dirname(plistPath);
    const tempPath = path.join(dir, `.${path.basename(plistPath)}.${process.pid}.tmp`);
    const handle = await fs.open(tempPath, O_NOFOLLOW | ...);

Recommendation
Anchor operations to a verified, non-symlink directory and reduce TOCTOU windows. Practical hardening options (best to combine):
Example of additional directory verification (defense-in-depth):

    async function assertSecureDir(dir: string) {
      const st = await fs.lstat(dir);
      if (!st.isDirectory() || st.isSymbolicLink()) throw new Error(`Insecure dir: ${dir}`);
      const mode = st.mode & 0o777;
      if ((mode & 0o022) !== 0) throw new Error(`Dir is writable by group/other: ${dir}`);
    }
    await assertSecureDir(path.dirname(plistPath));
    await writeLaunchAgentPlistSecure(plistPath, plist);
    await assertSecureDir(path.dirname(plistPath)); // again before launchctl bootstrap

This does not fully eliminate TOCTOU without
3. 🟡 Symlink-following, non-atomic write of auth-profiles.json can overwrite arbitrary files and risk secret exposure
Description
Because
This becomes higher impact with rescue watchdog because it intentionally mirrors credentials into the rescue agent directory.
Vulnerable code:

    // src/infra/json-file.ts
    fs.writeFileSync(pathname, `${JSON.stringify(data, null, 2)}\n`, "utf8");
    fs.chmodSync(pathname, 0o600);

Call chain in the rescue flow:
Recommendation
Harden credential/config writes against symlink attacks and make them atomic. Recommended changes:
Example secure implementation (POSIX-focused; best-effort on Windows):

    import { constants as fsConstants } from "node:fs";
    import fs from "node:fs/promises";
    import path from "node:path";

    export async function saveJsonFileAtomicNoSymlink(pathname: string, data: unknown) {
      const dir = path.dirname(pathname);
      await fs.mkdir(dir, { recursive: true, mode: 0o700 });
      // Refuse writing through an existing symlink
      try {
        const st = await fs.lstat(pathname);
        if (st.isSymbolicLink()) throw new Error("refusing to write to symlink");
      } catch (e: any) {
        if (e?.code !== "ENOENT") throw e;
      }
      const tmp = path.join(dir, `.${path.basename(pathname)}.${process.pid}.tmp`);
      const nofollow = typeof fsConstants.O_NOFOLLOW === "number" ? fsConstants.O_NOFOLLOW : 0;
      const fh = await fs.open(tmp, fsConstants.O_WRONLY | fsConstants.O_CREAT | fsConstants.O_EXCL | nofollow, 0o600);
      try {
        await fh.writeFile(`${JSON.stringify(data, null, 2)}\n`, { encoding: "utf8" });
        await fh.sync();
      } finally {
        await fh.close();
      }
      await fs.rename(tmp, pathname);
    }

Then update
Analyzed PR: #40658 at commit
Last updated on: 2026-03-12T13:54:27Z
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 813e755556
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Greptile Summary
This PR adds opt-in rescue watchdog onboarding: a second isolated gateway profile that monitors and auto-restarts the primary profile. The implementation is well-scoped, correctly isolates the rescue profile env/state from the primary, preserves rescue-only credentials on re-runs, and includes solid unit + integration test coverage. Key findings:
Confidence Score: 3/5
Last reviewed commit: 813e755
Addressed the review feedback in follow-up commit
What changed:
Local verification after the follow-up:
A new push is up now, so CI should rerun on the updated commits.
💡 Codex Review
Reviewed commit: e0e57e7f2a
Addressed the remaining onboarding/rescue review items in follow-up commit 93b132d78. What changed:
Local verification on this head:
💡 Codex Review
Reviewed commit: 93b132d78e
Follow-up is in 27f3d63. This addresses the remaining review items on this PR:
Validation:
Local |
💡 Codex Review
Reviewed commit: 27f3d6376e
Follow-up is in 1a2d69e. This closes out the remaining onboarding/CI issues on the current head:
Validation on this head:
💡 Codex Review
Reviewed commit: 1a2d69e8dd
Follow-up is in 1807d4e. This fixes the remaining long-profile collision in rescue onboarding:
Validation on this head:
💡 Codex Review
Reviewed commit: 1c455b7524
Follow-up is in 22cc27d. This fixes the two remaining rescue onboarding regressions on the current head:
Validation on this head:
💡 Codex Review
Reviewed commit: 22cc27d532
Follow-up is in b1a4d4249. This tightens the remaining rescue/launchd hardening on the current head:
Validation on this head:
Follow-up in 05cd77e:
Local verification:
Follow-up in 413b857:
Local verification on this head:
💡 Codex Review
Reviewed commit: 413b85725e
Follow-up in e8c8286:
Local verification:
💡 Codex Review
Reviewed commit: e8c8286195
💡 Codex Review
Reviewed commit: 621cb9eb23
621cb9e to f59be9e
💡 Codex Review
Reviewed commit: 4f441a685c
Follow-up pushed in 1ebacdb. What changed:
Validation:
💡 Codex Review
Reviewed commit: 1ebacdbad8
e9f43ae to 5d17041
All review threads have been resolved and CI is green. This PR adds an opt-in rescue watchdog for self-healing gateway recovery during onboarding. Would appreciate a review when you have a moment; cc @joshavant @tyler6204
5d17041 to 7e4d128
💡 Codex Review
Reviewed commit: 7e4d128950
a2bb059 to 706d1a4
💡 Codex Review
Reviewed commit: 706d1a44aa
706d1a4 to b08b7e2
💡 Codex Review
Reviewed commit: 0772829039
Extend config/cron/daemon type definitions with rescue watchdog fields. Add watchdog-shared module for building isolated rescue profile environments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…emd, and schtasks Support installing, starting, and stopping isolated rescue gateway services across macOS (launchd), Linux (systemd), and Windows (Task Scheduler). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement the rescue watchdog cron job that probes the primary gateway health and triggers doctor-based recovery when the primary is unreachable. Includes timeout budgeting, probe URL normalization, and job validation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire rescue watchdog setup into both interactive and non-interactive onboarding paths. Add --rescue-watchdog CLI flag, rescue profile provisioning, daemon service installation, and wizard UX integration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document the --rescue-watchdog flag, wizard flow, and automation examples. Fix unrelated test mock typings exposed during development. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2d57ccd to c645a23
Commit history cleanup: squashed 44 incremental commits into 5 logical commits:
Also updated the PR description to fill in all required template sections (Human Verification, Compatibility, Failure Recovery, Risks). This is the final force push for history cleanup; incremental commits only going forward. Ready for review.
💡 Codex Review
Reviewed commit: c645a2316b
…off) Integrate main's kickstart-based restart with detached handoff logic while preserving the rescue watchdog signal parameter threading. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5d5227c to 65804e6
💡 Codex Review
Reviewed commit: 5d5227c9ad
    if (rescueWatchdogEnabled && !primaryManagedServiceReady) {
      await prompter.note(
        "Rescue watchdog requires a healthy primary managed service. Gateway service install failed during onboarding, so rescue watchdog was skipped.",
        "Rescue watchdog",
      );
Fail onboarding when explicit rescue setup gets skipped
When --rescue-watchdog is explicitly requested, this branch only shows a note and continues after primary managed-service install fails, so interactive onboarding can exit successfully without provisioning the watchdog the user asked for. This is inconsistent with the explicit-error behavior already used for other rescue setup failures and with the non-interactive path, and it can silently break automation/checklists that treat command success as confirmation that rescue protection is active.
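The distinction this comment asks for could be sketched roughly as follows. This is a minimal illustration, not the PR's code: `RescueRequest` and `decideRescueSkip` are hypothetical names, and treating a wizard opt-in as a soft note (rather than a failure) is an assumption.

```typescript
// Hypothetical sketch: none of these names come from the PR.
type RescueRequest = "explicit-flag" | "wizard-opt-in" | "not-requested";

interface SkipDecision {
  action: "continue" | "note" | "fail";
  message?: string;
}

// Decide what onboarding should do when the primary managed service is not ready.
function decideRescueSkip(
  request: RescueRequest,
  primaryManagedServiceReady: boolean,
): SkipDecision {
  if (request === "not-requested" || primaryManagedServiceReady) {
    return { action: "continue" };
  }
  const message = "Rescue watchdog requires a healthy primary managed service.";
  // An explicit --rescue-watchdog request fails closed so that command success
  // can be read as "rescue protection is active". A wizard opt-in only gets a
  // note (assumption made for this sketch).
  return request === "explicit-flag"
    ? { action: "fail", message }
    : { action: "note", message };
}

const decision = decideRescueSkip("explicit-flag", false);
if (decision.action === "fail") {
  console.error(`Rescue watchdog setup failed: ${decision.message}`);
}
```

With this shape, the caller would throw (or exit non-zero) on `fail` and keep the existing `prompter.note(...)` path for `note`.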
    try {
      return io.loadConfig();
    } catch (error) {
      throw new Error(
        `Rescue watchdog setup failed: existing rescue profile config at "${configPath}" could not be loaded: ${error instanceof Error ? error.message : String(error)}`,
Handle non-ENOENT errors when checking rescue config
This catch turns every fs.access failure into "config missing", so permission/I/O errors (for example EACCES) are treated the same as ENOENT. In that case onboarding skips ownership validation and proceeds as if no rescue profile exists, which can lead to overwriting an existing unmanaged rescue profile or producing confusing follow-on failures instead of failing fast with the real access error.
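A minimal sketch of the fail-fast behavior described above, assuming the existence check is a plain `fs.access` call. The function name `rescueConfigExists` and the injectable `accessFn` parameter are hypothetical and exist only to keep the example self-contained.

```typescript
import { access } from "node:fs/promises";

// Hypothetical helper: returns false only when the config is truly missing
// (ENOENT) and surfaces every other failure (EACCES, EIO, ...) as an error.
async function rescueConfigExists(
  configPath: string,
  accessFn: (path: string) => Promise<void> = access,
): Promise<boolean> {
  try {
    await accessFn(configPath);
    return true;
  } catch (error) {
    const code = (error as { code?: string } | null)?.code;
    if (code === "ENOENT") {
      return false; // genuinely absent: safe to treat as "no rescue profile yet"
    }
    // Anything else means we cannot tell whether a profile exists, so fail fast
    // instead of proceeding as if the path were free.
    throw new Error(
      `Rescue watchdog setup failed: rescue profile config at "${configPath}" could not be accessed (${code ?? "unknown error"})`,
    );
  }
}
```

Only the `ENOENT` branch skips ownership validation; a permission or I/O error stops onboarding with the real cause in the message.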
Addressed two remaining review items in 78a057e:
Also added regression tests in:
Validation in this sandbox is blocked by dependency install DNS failures ( |
💡 Codex Review
Reviewed commit: 78a057ea6e
        env[key] = value;
      }
    }
    applyCliProfileEnv({ profile, env });
Avoid dev port override in cross-profile watchdog env
buildRescueProfileEnv drops service-identity overrides for cross-profile runs, but then always calls applyCliProfileEnv; for target profile dev, that helper reintroduces OPENCLAW_GATEWAY_PORT=19001 when the variable is absent. In rescue watchdog runs monitoring a dev profile with a non-default configured gateway.port, resolveGatewayPort(cfg, env) will now prefer this injected env value and probe the wrong port, which can misclassify a healthy gateway as down and trigger unnecessary restart/repair actions.
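The precedence problem can be reproduced with a small stand-in. Everything below is a hypothetical simplification: `applyProfileEnv` and `resolvePort` only mimic the behavior the comment attributes to `applyCliProfileEnv` and `resolveGatewayPort`, and `FALLBACK_PORT` is a placeholder value, not the project's default.

```typescript
// Hypothetical stand-ins for the helpers named in the review comment.
type Env = Record<string, string | undefined>;
interface GatewayConfig {
  gateway?: { port?: number };
}

const DEV_PROFILE_PORT = "19001"; // value quoted in the review comment
const FALLBACK_PORT = 18000; // placeholder for illustration only

// Injects the dev default port unless the env is built for a cross-profile run.
function applyProfileEnv(profile: string, env: Env, isCrossProfile: boolean): Env {
  const next: Env = { ...env };
  if (profile === "dev" && !isCrossProfile && next.OPENCLAW_GATEWAY_PORT === undefined) {
    next.OPENCLAW_GATEWAY_PORT = DEV_PROFILE_PORT;
  }
  return next;
}

// Env wins over config, mirroring the precedence described in the comment.
function resolvePort(cfg: GatewayConfig, env: Env): number {
  const fromEnv = Number.parseInt(env.OPENCLAW_GATEWAY_PORT ?? "", 10);
  if (Number.isInteger(fromEnv) && fromEnv > 0) return fromEnv;
  return cfg.gateway?.port ?? FALLBACK_PORT;
}

const cfg: GatewayConfig = { gateway: { port: 20500 } };
const withInjection = resolvePort(cfg, applyProfileEnv("dev", {}, false));
const viaCrossProfile = resolvePort(cfg, applyProfileEnv("dev", {}, true));
console.log(withInjection, viaCrossProfile); // prints "19001 20500"
```

In the first call the injected default shadows the configured port, which is the misprobe the comment warns about; skipping the injection for cross-profile watchdog runs lets the configured `gateway.port` win.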
AI-assisted: Yes (Codex). Fully tested locally.
Summary
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
- openclaw onboard can now offer an opt-in rescue watchdog during local onboarding.
- openclaw onboard --rescue-watchdog provisions a second isolated rescue profile, managed gateway service, and rescue cron job.
- rescueWatchdog details when that option is enabled.
Security Impact (required)
- (Yes/No): Yes
- (Yes/No): Yes
- (Yes/No): Yes
- (Yes/No): Yes
- (Yes/No): No
- Yes, explain risk + mitigation:
This is an explicit opt-in feature that provisions a second local gateway profile with its own token, workspace, service, and cron job. The rescue profile does not clone channel/web config from the primary profile, does not deliver messages externally, and runs isolated cron checks against the primary profile only. Rescue auth-profile syncing now preserves rescue-local credentials while merging inherited primary credentials.
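As a rough illustration of the auth-profile syncing described above, the merge might look like the sketch below. The `AuthProfile` shape and `syncRescueAuthProfiles` are hypothetical, and letting the primary value win for ids present on both sides is an assumption; the description only states that rescue-local credentials are preserved.

```typescript
// Hypothetical shapes; the real auth-profiles.json layout is not shown in this PR text.
interface AuthProfile {
  provider: string;
  token: string;
}
type AuthProfiles = Record<string, AuthProfile>;

// Merge inherited primary credentials into the rescue store without dropping
// entries that exist only in the rescue profile.
function syncRescueAuthProfiles(primary: AuthProfiles, rescue: AuthProfiles): AuthProfiles {
  // Rescue-only ids survive; ids present in both are refreshed from primary
  // (assumption made for this sketch).
  return { ...rescue, ...primary };
}

const merged = syncRescueAuthProfiles(
  { shared: { provider: "p1", token: "primary-new" } },
  {
    shared: { provider: "p1", token: "primary-old" },
    rescueOnly: { provider: "p2", token: "rescue-local" },
  },
);
console.log(Object.keys(merged).sort().join(","));
```

Rerunning onboarding with this merge keeps `rescueOnly` in place while `shared` picks up the current primary value.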
Repro + Verification
Environment
Steps
- openclaw onboard without passing --rescue-watchdog.
- openclaw onboard --non-interactive --rescue-watchdog ....
Expected
Actual
Evidence
All rescue-related tests (onboard-rescue.test.ts, onboard-rescue.setup.test.ts, rescue-watchdog.test.ts, watchdog-shared.test.ts) pass. Unrelated test mock stabilization fixes are included.
Human Verification (required)
What you personally verified (not just CI), and how:
- --rescue-watchdog is forwarded in CLI mode and remains undefined when not explicitly passed.
- -rescue remain unsupported.
- cause.
Review Conversations
Compatibility / Migration
- (Yes/No): Yes
- (Yes/No): Yes (new rescueWatchdog config section, opt-in only)
- (Yes/No): No
Failure Recovery (if this breaks)
Re-run onboarding without rescue, or remove the rescue profile/service and its cron job.
Rescue profile state dir (e.g. ~/.openclaw-rescue*) and its managed gateway service definition.
Rescue config being written into the primary profile path, missing rescue prompt in interactive onboarding, or rescue auth profiles disappearing after onboarding reruns.
Risks and Mitigations
undefined for the unset case, and regression tests cover the prompt path.