fix(gateway): ignore inherited launchd env for respawn #85295

Merged
steipete merged 1 commit into openclaw:main from YUHAO-corn:yuhao/fix-launchd-xpc-supervisor-85224 on May 25, 2026

Conversation

@YUHAO-corn YUHAO-corn commented May 22, 2026

Contributor

Summary

  • Problem: macOS gateway update respawn could treat inherited generic launchd env such as XPC_SERVICE_NAME as proof that the gateway itself is launchd-supervised.
  • Solution: narrow Darwin respawn-supervisor detection to the OpenClaw-owned OPENCLAW_LAUNCHD_LABEL.
  • What changed: generic launchd env vars remain in supervisor cleanup lists, but only OPENCLAW_LAUNCHD_LABEL now triggers Darwin launchd respawn supervision.
  • What did NOT change: systemd and Windows Scheduled Task detection are unchanged.
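
The narrowing described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/infra/supervisor-markers.ts: the function name `detectDarwinSupervisorSketch` and the `SketchEnv` type are hypothetical, and only the environment variable names and sample values come from this PR.

```typescript
// Hypothetical sketch; not the actual OpenClaw implementation.
type SketchEnv = Record<string, string | undefined>;

// Assumption: only the OpenClaw-owned marker counts as proof of launchd supervision.
const OPENCLAW_LAUNCHD_MARKER = "OPENCLAW_LAUNCHD_LABEL";

function detectDarwinSupervisorSketch(env: SketchEnv): "launchd" | null {
  // Generic variables such as XPC_SERVICE_NAME are deliberately not consulted here.
  const label = env[OPENCLAW_LAUNCHD_MARKER]?.trim();
  return label ? "launchd" : null;
}

const sketchCases: Array<[string, SketchEnv]> = [
  ["inherited XPC only", { XPC_SERVICE_NAME: "ai.openclaw.mac" }],
  ["generic launch job only", { LAUNCH_JOB_LABEL: "ai.openclaw.gateway" }],
  ["openclaw launchd marker", { OPENCLAW_LAUNCHD_LABEL: "ai.openclaw.gateway" }],
];

for (const [name, env] of sketchCases) {
  console.log(`${name}: ${detectDarwinSupervisorSketch(env) ?? "null"}`);
}
```

Under these assumptions the first two cases print null and the third prints launchd.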

Motivation

  • Fixes a gateway lifecycle failure where a GUI-spawned gateway can exit expecting launchd to restart it, even though only the parent app's inherited launchd/XPC environment was present.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: Darwin respawn-supervisor detection no longer classifies inherited generic launchd env (XPC_SERVICE_NAME / LAUNCH_JOB_LABEL) as OpenClaw gateway supervision, while the OpenClaw-owned launchd marker still works.
  • Real environment tested: macOS local OpenClaw source checkout on darwin, branch yuhao/fix-launchd-xpc-supervisor-85224, commit 041c52846e611dd882328ae2a83bbd12538a7603.
  • Exact steps or command run after this patch:
node --import tsx -e 'const { detectRespawnSupervisor } = await import("./src/infra/supervisor-markers.ts"); const cases = [{name:"inherited XPC only", env:{XPC_SERVICE_NAME:"ai.openclaw.mac"}}, {name:"generic launch job only", env:{LAUNCH_JOB_LABEL:"ai.openclaw.gateway"}}, {name:"openclaw launchd marker", env:{OPENCLAW_LAUNCHD_LABEL:"ai.openclaw.gateway"}}]; console.log(`platform=${process.platform}`); for (const item of cases) console.log(`${item.name}: ${detectRespawnSupervisor(item.env, "darwin") ?? "null"}`);'
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output):

Terminal capture from the patched checkout:

platform=darwin
inherited XPC only: null
generic launch job only: null
openclaw launchd marker: launchd
  • Observed result after fix: inherited XPC_SERVICE_NAME and LAUNCH_JOB_LABEL return null, so update respawn will not take the supervised-exit path from inherited parent launchd state. OPENCLAW_LAUNCHD_LABEL still returns launchd for real OpenClaw LaunchAgent services.
  • What was not tested: I did not run a full GUI-triggered update/restart against a live installed LaunchAgent; the proof validates the production supervisor detector behavior in the local macOS source checkout.
  • Before evidence (optional but encouraged): The previous implementation used all launchd hints for Darwin detection, so XPC_SERVICE_NAME alone matched the launchd hint list and returned launchd.

Root Cause (if applicable)

  • Root cause: detectRespawnSupervisor treated generic launchd/XPC environment variables as evidence that the current gateway process was launchd-supervised.
  • Missing detection / guardrail: There was no regression test for a gateway child process inheriting XPC_SERVICE_NAME from a launchd-managed parent app.
  • Contributing context (if known): Generated OpenClaw service environments already set the more specific OPENCLAW_LAUNCHD_LABEL, which is the safer marker for respawn supervision.
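
To make the root cause concrete, here is a rough sketch of a hint-list detector of the kind described above. The names `LEGACY_LAUNCHD_HINTS` and `legacyDetectSketch` are hypothetical, and the list contents are an assumption built from the three variables named in this PR; the previous implementation may have differed in detail.

```typescript
// Hypothetical reconstruction of the pre-fix behavior; not the actual OpenClaw code.
type LegacyEnv = Record<string, string | undefined>;

// Assumption: every launchd-related variable was treated as an equal hint.
const LEGACY_LAUNCHD_HINTS = [
  "LAUNCH_JOB_LABEL",
  "XPC_SERVICE_NAME",
  "OPENCLAW_LAUNCHD_LABEL",
];

function legacyDetectSketch(env: LegacyEnv): "launchd" | null {
  // Any non-empty hint is enough, which is where the false positive comes from.
  const matched = LEGACY_LAUNCHD_HINTS.some((name) => Boolean(env[name]?.trim()));
  return matched ? "launchd" : null;
}

// A gateway child started by a launchd-managed GUI app inherits the parent's XPC name.
const inheritedFromGuiParent: LegacyEnv = { XPC_SERVICE_NAME: "ai.openclaw.mac" };
console.log(legacyDetectSketch(inheritedFromGuiParent)); // prints "launchd"
```

The variable describes the parent app rather than the gateway, yet the sketch still reports launchd supervision.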

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/infra/supervisor-markers.test.ts, src/infra/process-respawn.test.ts, src/daemon/service-env.test.ts.
  • Scenario the test should lock in: inherited macOS XPC_SERVICE_NAME and generic launch job env do not trigger supervised restart; OpenClaw-owned launchd labels still do.
  • Why this is the smallest reliable guardrail: the bug is in the supervisor detector and update respawn branch selection, so source-level tests cover the exact boundary without requiring a live 30-second restart cycle.
  • Existing test that already covers this (if any): existing launchd-positive tests covered real markers; this PR adds the inherited-env negative cases.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

macOS gateways spawned from a launchd-managed parent no longer exit under the false assumption that launchd will restart them unless OpenClaw's own launchd service marker is present.

Diagram (if applicable)

Before:
GUI parent launchd env -> gateway child sees XPC_SERVICE_NAME -> supervised exit -> no gateway restart

After:
GUI parent launchd env -> gateway child sees XPC_SERVICE_NAME -> unmanaged update respawn path -> gateway returns
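
The two paths in the diagram can be sketched as a small branch on the detector result. The `RespawnPlanSketch` type, the `planRespawnSketch` function, and the action strings are hypothetical and only illustrate the branch selection; the real logic lives in the process-respawn code and may be structured differently.

```typescript
// Hypothetical sketch of the branch selection; not the actual OpenClaw code.
type RespawnPlanSketch =
  | { mode: "supervised"; action: "exit-and-wait-for-supervisor" }
  | { mode: "unmanaged"; action: "respawn-in-process" };

function planRespawnSketch(supervisor: "launchd" | null): RespawnPlanSketch {
  if (supervisor === "launchd") {
    // Only safe when launchd really owns this gateway process and will restart it.
    return { mode: "supervised", action: "exit-and-wait-for-supervisor" };
  }
  // No trusted supervisor: the gateway has to bring itself back.
  return { mode: "unmanaged", action: "respawn-in-process" };
}

console.log(planRespawnSketch(null).mode); // prints "unmanaged"
console.log(planRespawnSketch("launchd").mode); // prints "supervised"
```

With the narrowed detector, an inherited XPC_SERVICE_NAME yields null and therefore the unmanaged path.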

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS (darwin)
  • Runtime/container: local Node/tsx source checkout
  • Model/provider: N/A
  • Integration/channel (if any): N/A
  • Relevant config (redacted): XPC_SERVICE_NAME=ai.openclaw.mac, LAUNCH_JOB_LABEL=ai.openclaw.gateway, OPENCLAW_LAUNCHD_LABEL=ai.openclaw.gateway

Steps

  1. Run the terminal capture command in the Real behavior proof section.
  2. Run the regression and changed-check commands listed below.

Expected

  • Generic inherited launchd env returns null.
  • OPENCLAW_LAUNCHD_LABEL returns launchd.

Actual

  • The patched checkout produced the terminal output shown above.

Evidence

  • node scripts/run-vitest.mjs src/cli/gateway-cli/run-loop.test.ts src/infra/supervisor-markers.test.ts src/infra/process-respawn.test.ts src/gateway/server-methods/update-managed-service-handoff.test.ts src/daemon/service-env.test.ts
  • pnpm check:changed
  • git diff --check

@openclaw-barnacle openclaw-barnacle Bot added the labels gateway (Gateway runtime), size: S, and triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) on May 22, 2026

clawsweeper Bot commented May 22, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed May 25, 2026, 12:49 PM ET / 16:49 UTC.

Summary
The branch updates gateway lifecycle tests to use OPENCLAW_LAUNCHD_LABEL for launchd-supervised handoff cases and adds macOS service-environment assertions for generated launchd markers.

PR surface: Tests +29. Total +29 across 2 files.

Reproducibility: no. No high-confidence current-main reproduction remains. Source inspection shows current main ignores inherited XPC_SERVICE_NAME=ai.openclaw.mac, while the linked issue was source-reproducible against earlier shipped builds.

Review metrics: 1 noteworthy metric.

  • Runtime/config surface: 0 runtime/config files changed. The branch is currently a test-only guardrail despite a runtime-fix title, so maintainers should judge it as coverage for an already-fixed gateway path.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Refresh the real behavior proof against 480be425b373470033709be73d58bac8c6fe8968 with redacted terminal/log output.
  • Align the PR summary with current behavior: inherited app launchd/XPC env is ignored, while exact OpenClaw gateway launchd labels remain supervised.

Proof guidance:
Needs stronger real behavior proof before merge: The terminal proof is from older commit 041c52846e611dd882328ae2a83bbd12538a7603, while latest head is 480be425b373470033709be73d58bac8c6fe8968, and its LAUNCH_JOB_LABEL output no longer matches current source; refresh the PR body with redacted latest-head terminal/log output or use a maintainer proof override. After updating the PR body, ClawSweeper should re-review automatically; otherwise ask a maintainer to comment @clawsweeper re-review.

Risk before merge

  • The PR body and copied terminal proof are stale: they were captured at 041c52846e611dd882328ae2a83bbd12538a7603, while latest head is 480be425b373470033709be73d58bac8c6fe8968 and current source treats exact ai.openclaw.gateway launchd labels as supervised.
  • No full GUI-triggered update/restart against an installed LaunchAgent is shown; focused source proof may be enough for maintainers, but contributor proof currently needs refresh.

Maintainer options:

  1. Decide the mitigation before merge
    Land the narrow test guardrail once the PR body and latest-head proof match current behavior, or close it if maintainers consider existing main coverage sufficient.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge
Needs refreshed latest-head proof or maintainer override; no automated code repair is needed for the current diff.

Security
Cleared: The diff changes only tests and does not alter dependencies, workflows, secrets handling, install scripts, or runtime execution paths.

Review details

Best possible solution:

Land the narrow test guardrail once the PR body and latest-head proof match current behavior, or close it if maintainers consider existing main coverage sufficient.

Do we have a high-confidence way to reproduce the issue?

No high-confidence current-main reproduction remains. Source inspection shows current main ignores inherited XPC_SERVICE_NAME=ai.openclaw.mac, while the linked issue was source-reproducible against earlier shipped builds.

Is this the best way to solve the issue?

Yes for a test-only guardrail: the patch is narrow and targets relevant gateway lifecycle coverage. The PR body should be refreshed because the runtime fix is already on current main and exact gateway launchd labels are intentionally still supervised.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5cfa577778df.

Label changes

Label justifications:

  • P1: The underlying macOS gateway respawn bug can leave users without a restarted gateway after update or restart.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The terminal proof is from older commit 041c52846e611dd882328ae2a83bbd12538a7603, while latest head is 480be425b373470033709be73d58bac8c6fe8968, and its LAUNCH_JOB_LABEL output no longer matches current source; refresh the PR body with redacted latest-head terminal/log output or use a maintainer proof override. After updating the PR body, ClawSweeper should re-review automatically; otherwise ask a maintainer to comment @clawsweeper re-review.

Evidence reviewed

PR surface:

Tests +29. Total +29 across 2 files.

PR surface stats:

Area       Files  Added  Removed  Net
Source     0      0      0        0
Tests      2      33     4        +29
Docs       0      0      0        0
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      2      33     4        +29

What I checked:

  • PR diff is test-only: Live pull-files API for head 480be425b373470033709be73d58bac8c6fe8968 shows only src/cli/gateway-cli/run-loop.test.ts and src/daemon/service-env.test.ts changed; no runtime gateway code changes are in the current diff. (480be425b373)
  • Current detector behavior: Current main detects Darwin launchd supervision from OPENCLAW_LAUNCHD_LABEL or the exact OpenClaw gateway launchd job, rather than any inherited generic launchd variable. (src/infra/supervisor-markers.ts:43, 5cfa577778df)
  • Current inherited-env guardrail: Current main already tests that inherited XPC_SERVICE_NAME=ai.openclaw.mac returns null while exact gateway labels still return launchd. (src/infra/supervisor-markers.test.ts:16, 5cfa577778df)
  • Current respawn guardrail: Current process-respawn tests cover the inherited XPC_SERVICE_NAME=ai.openclaw.mac case taking the unmanaged in-process path instead of supervised launchd exit. (src/infra/process-respawn.test.ts:150, 5cfa577778df)
  • Service environment marker source: Current service environment generation sets OPENCLAW_LAUNCHD_LABEL for macOS gateway and node services, which is the marker the PR’s added tests lock down. (src/daemon/service-env.ts:412, 5cfa577778df)
  • Real proof mismatch: The PR body’s terminal proof is still from commit 041c52846e611dd882328ae2a83bbd12538a7603 and says LAUNCH_JOB_LABEL=ai.openclaw.gateway returns null, but latest head is 480be425b373470033709be73d58bac8c6fe8968 and current source intentionally treats the exact gateway label as supervised. (480be425b373)
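
The current-main behavior described in the checks above differs from the original PR body: an exact gateway label is still treated as supervised. A rough sketch under stated assumptions follows. The function name `detectWithExactLabelSketch` is hypothetical, and which generic variables are compared against the gateway label is an assumption; the review only confirms that OPENCLAW_LAUNCHD_LABEL and the exact ai.openclaw.gateway job count, and that XPC_SERVICE_NAME=ai.openclaw.mac does not.

```typescript
// Hypothetical sketch of the current-main rule; not the code at src/infra/supervisor-markers.ts.
type MarkerEnv = Record<string, string | undefined>;

// Label value taken from the PR text.
const GATEWAY_LAUNCHD_LABEL = "ai.openclaw.gateway";
// Assumption: these generic variables are compared for an exact label match.
const GENERIC_LAUNCHD_VARS = ["LAUNCH_JOB_LABEL", "XPC_SERVICE_NAME"];

function detectWithExactLabelSketch(env: MarkerEnv): "launchd" | null {
  // The OpenClaw-owned marker is always trusted.
  if (env["OPENCLAW_LAUNCHD_LABEL"]?.trim()) return "launchd";
  // Generic variables count only when they name the gateway job exactly.
  const exact = GENERIC_LAUNCHD_VARS.some(
    (name) => env[name]?.trim() === GATEWAY_LAUNCHD_LABEL,
  );
  return exact ? "launchd" : null;
}

console.log(detectWithExactLabelSketch({ XPC_SERVICE_NAME: "ai.openclaw.mac" })); // prints null
console.log(detectWithExactLabelSketch({ LAUNCH_JOB_LABEL: "ai.openclaw.gateway" })); // prints "launchd"
```

This is why the older terminal proof, where LAUNCH_JOB_LABEL=ai.openclaw.gateway returned null, no longer matches the latest head.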

Likely related people:

  • steipete: Current-main blame ties the supervisor-marker and service-env implementation to Peter Steinberger, and the latest PR head was committed by steipete while touching the same gateway lifecycle test surface. (role: recent area contributor and PR-head committer; confidence: high; commits: d63e8d4b4fe3, 480be425b373; files: src/infra/supervisor-markers.ts, src/daemon/service-env.ts, src/cli/gateway-cli/run-loop.test.ts)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@openclaw-barnacle openclaw-barnacle Bot added the label proof: supplied (External PR includes structured after-fix real behavior proof.) and removed the label triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) on May 22, 2026
@clawsweeper clawsweeper Bot added the labels proof: sufficient (ClawSweeper judged the real behavior proof convincing.), rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.), and status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.) on May 22, 2026
@YUHAO-corn YUHAO-corn force-pushed the yuhao/fix-launchd-xpc-supervisor-85224 branch from f4949de to 041c528 on May 22, 2026 09:43
@clawsweeper clawsweeper Bot added the labels P1 (High-priority user-facing bug, regression, or broken workflow.), merge-risk: 🚨 compatibility 🚨 (May break existing users, config, migrations, defaults, or upgrade paths.), and merge-risk: 🚨 availability 🚨 (May cause crashes, hangs, restart loops, stalls, or process outages.) on May 22, 2026

clawsweeper Bot commented May 22, 2026

Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@openclaw-barnacle openclaw-barnacle Bot added the label cli (CLI command changes) and removed the label proof: sufficient (ClawSweeper judged the real behavior proof convincing.) on May 22, 2026
@clawsweeper clawsweeper Bot added the label proof: sufficient (ClawSweeper judged the real behavior proof convincing.) on May 22, 2026
@YUHAO-corn YUHAO-corn force-pushed the yuhao/fix-launchd-xpc-supervisor-85224 branch from 041c528 to 5cc0933 on May 25, 2026 15:40
@clawsweeper clawsweeper Bot removed the labels merge-risk: 🚨 compatibility 🚨 (May break existing users, config, migrations, defaults, or upgrade paths.) and merge-risk: 🚨 availability 🚨 (May cause crashes, hangs, restart loops, stalls, or process outages.) on May 25, 2026
@steipete steipete force-pushed the yuhao/fix-launchd-xpc-supervisor-85224 branch from 5cc0933 to 2030665 on May 25, 2026 16:14
@openclaw-barnacle openclaw-barnacle Bot removed the label proof: sufficient (ClawSweeper judged the real behavior proof convincing.) on May 25, 2026
@clawsweeper clawsweeper Bot added the labels rating: 🦪 silver shellfish (Thin PR readiness signal; proof, validation, or implementation needs work.) and status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.), and removed the labels rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.) and status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.) on May 25, 2026
@steipete steipete force-pushed the yuhao/fix-launchd-xpc-supervisor-85224 branch from 2030665 to 480be42 on May 25, 2026 16:44
@steipete

Contributor

Verification before landing:

  • Rebased head: 480be42
  • Focused proof: node scripts/run-vitest.mjs src/cli/gateway-cli/run-loop.test.ts src/daemon/service-env.test.ts --reporter=dot (121 tests passed)
  • Auto Review: clean, no accepted/actionable findings
  • GitHub CI: green on the exact pushed head
  • Current main merge sanity: clean merge-tree against origin/main

Thanks @YUHAO-corn!

@steipete steipete merged commit 177ebdc into openclaw:main May 25, 2026
98 checks passed

Labels

  • cli (CLI command changes)
  • gateway (Gateway runtime)
  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • rating: 🦪 silver shellfish (Thin PR readiness signal; proof, validation, or implementation needs work.)
  • size: XS
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)

Development

Successfully merging this pull request may close these issues.

respawnGatewayProcessForUpdate falsely reports mode=supervised on macOS when XPC_SERVICE_NAME is inherited from a launchd-managed parent

2 participants