fix(infra): actively kickstart launchd on supervised gateway restart #29078

Closed
cathrynlavery wants to merge 1 commit into openclaw:main from cathrynlavery:fix/gateway-supervised-restart-kickstart-v2

Conversation

@cathrynlavery
Contributor

@cathrynlavery cathrynlavery commented Feb 27, 2026

Summary

  • Problem: When an agent triggers a gateway restart in supervised (launchd) mode, the process exits and expects KeepAlive: true to respawn it. But launchd's ThrottleInterval (60s since #27650, "fix(gateway): add ThrottleInterval to prevent launchd restart loop") delays the respawn, leaving the gateway unresponsive for a full minute on every intentional restart.
  • Why it matters: Agent-triggered restarts (via /restart or gateway-tool) should be near-instant. The 60s ThrottleInterval is correct for crash-loop prevention (#27650), but it penalizes deliberate restarts unnecessarily.
  • What changed: process-respawn.ts now calls triggerOpenClawRestart() (explicit launchctl kickstart -k) on macOS when OPENCLAW_LAUNCHD_LABEL is set, before returning mode: "supervised". Falls back to mode: "failed" on kickstart error so run-loop.ts can do in-process restart.
  • What did NOT change (scope boundary): No changes to restart.ts, run-loop.ts, plist generation, or ThrottleInterval value. No new shell commands — reuses existing triggerOpenClawRestart().
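
The decision described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in process-respawn.ts: the names `RespawnResult`, `RespawnDeps`, and `decideSupervisedRestart` are hypothetical, the `kickstart` callback stands in for triggerOpenClawRestart() (whose real signature is not shown in this PR), and the sketch assumes the caller has already established that the process is supervised.

```typescript
// Hypothetical shapes for illustration; the real types in process-respawn.ts may differ.
interface RespawnResult {
  mode: "supervised" | "failed";
  detail?: string;
}

interface KickstartOutcome {
  ok: boolean;
  detail?: string;
}

interface RespawnDeps {
  platform: string; // e.g. the value of process.platform
  env: Record<string, string | undefined>;
  // Stand-in for triggerOpenClawRestart(); assumed to report success or failure.
  kickstart: () => KickstartOutcome;
}

function decideSupervisedRestart(deps: RespawnDeps): RespawnResult {
  const label = deps.env["OPENCLAW_LAUNCHD_LABEL"];

  // Non-macOS platforms, or a missing label, skip the kickstart entirely.
  if (deps.platform !== "darwin" || !label) {
    return { mode: "supervised" };
  }

  // On macOS with a label, ask launchd to restart the job right away.
  const outcome = deps.kickstart();
  if (!outcome.ok) {
    // Report failure so the caller can fall back to an in-process restart.
    return { mode: "failed", detail: outcome.detail ?? "kickstart failed" };
  }
  return { mode: "supervised" };
}
```

Injecting the platform, environment, and kickstart function keeps the branch logic testable without calling launchctl.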

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

Agent-triggered restarts on macOS now come back within seconds instead of waiting 60s (ThrottleInterval). No config changes required.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No — reuses existing triggerOpenClawRestart() which already calls launchctl kickstart
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS (Apple Silicon, Sequoia 26.2)
  • Runtime/container: Node.js via launchd LaunchAgent
  • Model/provider: Any
  • Integration/channel (if any): Any — triggered by agent /restart command
  • Relevant config (redacted): LaunchAgent plist with KeepAlive: true, ThrottleInterval: 60

Steps

  1. Gateway running via launchd LaunchAgent with KeepAlive: true and ThrottleInterval: 60
  2. Agent triggers /restart or gateway-tool restart
  3. process-respawn.ts detects supervised mode, exits with code 0

Expected

  • Gateway restarts within seconds via explicit launchctl kickstart -k

Actual (before fix)

  • Gateway exits and waits 60s for launchd ThrottleInterval before restarting

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

pnpm test:fast src/infra/process-respawn.test.ts — 9/9 passing (including 2 new kickstart tests + updated stock-plist test).

Human Verification (required)

  • Verified scenarios: Patched build installed locally, gateway running with patched code, OPENCLAW_LAUNCHD_LABEL confirmed in plist
  • Edge cases checked: Kickstart failure path (bad label → falls back to mode: "failed"); non-macOS platforms skip kickstart; missing OPENCLAW_LAUNCHD_LABEL skips kickstart
  • What you did not verify: Live agent-triggered /restart end-to-end (ready for testing now — patched build is installed and gateway is running)

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No. OPENCLAW_LAUNCHD_LABEL is already set by the launchd plist
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Revert single commit; or unset OPENCLAW_LAUNCHD_LABEL env to skip the new code path entirely
  • Files/config to restore: src/infra/process-respawn.ts only
  • Known bad symptoms reviewers should watch for: Gateway not restarting at all (kickstart returns error but mode: "failed" fallback not triggering in run-loop.ts)

Risks and Mitigations

  • Risk: triggerOpenClawRestart() could fail silently on edge-case launchd configs
    • Mitigation: Failure returns mode: "failed" with detail string; run-loop.ts already handles this with in-process restart fallback + warning log
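
A rough sketch of how a caller might act on that result is shown below. It is an assumption about the general shape of the fallback, not the contents of run-loop.ts: `RestartOutcome`, `RunLoopActions`, and `handleRestartOutcome` are hypothetical names, and the warning text is invented for illustration.

```typescript
// Hypothetical result type mirroring the two modes discussed in this PR.
type RestartOutcome =
  | { mode: "supervised" }
  | { mode: "failed"; detail: string };

// Side effects are injected so the branch logic can be exercised in isolation.
interface RunLoopActions {
  exitProcess: (code: number) => void;
  restartInProcess: () => void;
  warn: (message: string) => void;
}

function handleRestartOutcome(
  outcome: RestartOutcome,
  actions: RunLoopActions,
): "exited" | "in-process" {
  if (outcome.mode === "failed") {
    // The supervisor could not be asked to restart us: log and restart in-process.
    actions.warn(`supervised restart failed (${outcome.detail}); restarting in-process`);
    actions.restartInProcess();
    return "in-process";
  }
  // The supervisor owns the respawn, so a clean exit is enough.
  actions.exitProcess(0);
  return "exited";
}
```

The "known bad symptom" above corresponds to the first branch never running: a failed kickstart followed by an exit would leave nothing to bring the gateway back promptly.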

🤖 Generated with Claude Code

@greptile-apps
Contributor

greptile-apps bot commented Feb 27, 2026

Greptile Summary

Actively triggers launchctl kickstart -k before exiting on macOS supervised restarts to ensure immediate gateway restart instead of waiting for launchd's ThrottleInterval (10-60s delay).

Key changes:

  • src/infra/process-respawn.ts: When OPENCLAW_LAUNCHD_LABEL is set on macOS, explicitly calls triggerOpenClawRestart() before returning mode: "supervised"
  • Falls back to mode: "failed" if kickstart fails, allowing run-loop.ts to perform in-process restart instead of leaving gateway dead
  • Test coverage includes success path, failure fallback, and platform-specific behavior

How it works:
The supervised restart flow now actively kicks launchd (launchctl kickstart -k) rather than passively relying on KeepAlive: true, which was subject to throttle delays. If kickstart fails (bad label, permission issues), the system gracefully falls back to in-process restart.
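
For reference, a LaunchAgent is addressed in launchctl by a service target of the form `gui/<uid>/<label>`, and `-k` tells `kickstart` to kill a running instance before restarting it. The helper below only builds the argument list; `buildKickstartArgs` is a hypothetical name, the label in the usage note is made up, and how triggerOpenClawRestart() actually assembles its command is not shown in this PR.

```typescript
// Build the argument vector for `launchctl kickstart -k gui/<uid>/<label>`.
// Hypothetical helper for illustration; not taken from the OpenClaw codebase.
function buildKickstartArgs(label: string, uid: number): string[] {
  const trimmed = label.trim();
  if (trimmed.length === 0) {
    throw new Error("launchd label must be non-empty");
  }
  if (!Number.isInteger(uid) || uid < 0) {
    throw new Error("uid must be a non-negative integer");
  }
  // gui/<uid> is the per-user GUI domain in which LaunchAgents are loaded.
  return ["kickstart", "-k", `gui/${uid}/${trimmed}`];
}
```

With a made-up label, `buildKickstartArgs("com.example.gateway", 501)` yields the arguments for `launchctl kickstart -k gui/501/com.example.gateway`, which could then be handed to something like `execFile` from `node:child_process`.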

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is well-isolated to macOS launchd restart behavior, has comprehensive test coverage including failure paths, proper fallback handling when kickstart fails, and reuses existing triggerOpenClawRestart() infrastructure. The implementation only activates when OPENCLAW_LAUNCHD_LABEL is explicitly set, leaving other platforms and scenarios unaffected. The PR solves a real user pain point (gateways staying dead after restart) without introducing new risk vectors.
  • No files require special attention

Last reviewed commit: c673e2a

@cathrynlavery
Contributor Author

CI note: The check job failure is a pre-existing tsgo type error in src/agents/pi-embedded-runner-extraparams.test.ts:869 — same error reproduces on a clean origin/main checkout locally. Not caused by this PR's changes (which only touch src/infra/process-respawn.ts and its test file).

src/agents/pi-embedded-runner-extraparams.test.ts(869,14): error TS2352: Conversion of type '...' to type 'Model<"openai-responses">' may be a mistake...

When an agent triggers a gateway restart in supervised mode, the process
exits expecting launchd KeepAlive to respawn it. But ThrottleInterval
(default 10s, or 60s on older installs) can delay or prevent restart.

Now calls triggerOpenClawRestart() to issue an explicit launchctl
kickstart before exiting, ensuring immediate respawn. Falls back to
in-process restart if kickstart fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cathrynlavery cathrynlavery force-pushed the fix/gateway-supervised-restart-kickstart-v2 branch from c673e2a to 0f85d17 Compare February 27, 2026 19:53
steipete added a commit that referenced this pull request Feb 27, 2026
Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
@steipete steipete closed this Feb 27, 2026
@steipete
Contributor

Landed on main and closed.

What we changed on landing:

  • Kept the launchd kickstart behavior for supervised macOS restarts.
  • Fixed fallback semantics: if kickstart cannot be confirmed, restartGatewayProcessWithFreshPid() now returns mode: "failed" so run-loop performs in-process restart fallback.
  • Updated process-respawn tests for the failure path.
  • Added changelog entry with contributor + PR reference.

Validation run before commit:

  • pnpm lint
  • pnpm build
  • pnpm test

SHAs:

  • Original PR head: 0f85d1710e943d638e79a80d4fc9511832ab06f9
  • Landed on main: 4aa2dc685

Thanks @cathrynlavery for the contribution.

jalehman pushed a commit to rodrigouroz/openclaw that referenced this pull request Feb 27, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
velvet-shark pushed a commit to lailoo/openclaw that referenced this pull request Feb 27, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
r4jiv007 pushed a commit to r4jiv007/openclaw that referenced this pull request Feb 28, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
xiexikang pushed a commit to cclawd007/cclawd that referenced this pull request Feb 28, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
mylukin pushed a commit to mylukin/openclaw that referenced this pull request Feb 28, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
wanjizheng pushed a commit to wanjizheng/openclaw that referenced this pull request Feb 28, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
wanjizheng pushed a commit to wanjizheng/openclaw that referenced this pull request Feb 28, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
(cherry picked from commit bc28f15)
vincentkoc pushed a commit to Sid-Qin/openclaw that referenced this pull request Feb 28, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
vincentkoc pushed a commit to rylena/rylen-openclaw that referenced this pull request Feb 28, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
newtontech pushed a commit to newtontech/openclaw-fork that referenced this pull request Feb 28, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
wanjizheng pushed a commit to wanjizheng/openclaw that referenced this pull request Mar 1, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
steipete added a commit to Sid-Qin/openclaw that referenced this pull request Mar 2, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
safzanpirani pushed a commit to safzanpirani/clawdbot that referenced this pull request Mar 2, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
venjiang pushed a commit to venjiang/openclaw that referenced this pull request Mar 2, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
robertchang-ga pushed a commit to robertchang-ga/openclaw that referenced this pull request Mar 2, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
execute008 pushed a commit to execute008/openclaw that referenced this pull request Mar 2, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
dorgonman pushed a commit to kanohorizonia/openclaw that referenced this pull request Mar 3, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
sachinkundu pushed a commit to sachinkundu/openclaw that referenced this pull request Mar 6, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>
zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026
…back

Co-authored-by: Cathryn Lavery <cathryn@littlemight.com>

Development

Successfully merging this pull request may close these issues.

Bug: gateway restart race condition causes launchd restart loop (macOS)