fix: resolve gateway infinite restart loop (zombie PID + lock race) by jeffwnli · Pull Request #23416 · openclaw/openclaw

jeffwnli · 2026-02-22T09:16:39Z

Problem

When a gateway restart is triggered via SIGUSR1, the process spawns a detached child and calls process.exit(0). This bypasses the outer finally block and leaves a stale lock file on disk that still points to the now-exiting parent PID. The spawned child tries to acquire the lock and calls isPidAlive(), which returns true for zombie processes (kill(pid, 0) succeeds on zombies), so the child times out after 5 seconds. Meanwhile, systemd's Restart=always starts a competing process, creating a self-sustaining loop.
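The 5-second timeout follows from how a PID-file lock typically decides whether an existing owner is stale. Below is a minimal TypeScript sketch, assuming a lock that records its owner's PID and polls until that owner is gone. `acquireLock`, `LockOwner`, and the injected `readOwner`/`writeOwner`/`isAlive` callbacks are hypothetical names, not the openclaw lock API. If `isAlive` keeps answering true for a zombie, the only way out of the loop is the timeout.

```typescript
// Hypothetical record of who holds a PID-file lock.
interface LockOwner {
  pid: number;
}

type AcquireResult = { ok: true } | { ok: false; reason: "timeout" };

// Sketch only: poll until the recorded owner is gone or dead, then claim the lock.
async function acquireLock(
  readOwner: () => LockOwner | null,
  writeOwner: (owner: LockOwner) => void,
  isAlive: (pid: number) => boolean,
  selfPid: number,
  timeoutMs: number,
  pollMs = 50,
): Promise<AcquireResult> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const owner = readOwner();
    // No owner, or a stale owner that is no longer running: take over.
    if (owner === null || !isAlive(owner.pid)) {
      writeOwner({ pid: selfPid });
      return { ok: true };
    }
    // An owner misreported as alive (e.g. a zombie) keeps us polling until the deadline.
    if (Date.now() >= deadline) {
      return { ok: false, reason: "timeout" };
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
}
```

In a real lock, `readOwner`/`writeOwner` would read and atomically write the lock file; the sketch only shows that a liveness check which cannot recognise zombies turns a stale lock into a guaranteed timeout.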

Fixes #21685
Fixes #7999

Root Causes & Fixes

Four independent, layered fixes. Each breaks the loop on its own; all four together provide defense in depth.

1. Zombie process detection in `isPidAlive` (`src/shared/pid-alive.ts`)

kill(pid, 0) succeeds for zombie processes, so the lock treats a zombie lock owner as alive. Added a /proc/<pid>/status check on Linux that detects the Z (zombie) state and returns false.
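A minimal sketch of the idea, assuming Node's `process.kill(pid, 0)` as the existence probe and the `State:` line of `/proc/<pid>/status` for the zombie check. The helper names (`parseProcState`, `isZombie`) and the choice to treat `EPERM` as "alive" are assumptions for illustration, not necessarily what `src/shared/pid-alive.ts` does.

```typescript
import { readFileSync } from "node:fs";

// Pull the one-letter state code out of /proc/<pid>/status text,
// e.g. "State:\tZ (zombie)" -> "Z". Returns null when no State line exists.
function parseProcState(status: string): string | null {
  const match = /^State:\s+(\S)/m.exec(status);
  return match?.[1] ?? null;
}

// Linux only: true when /proc reports state Z. Any read failure (process
// already reaped, /proc not mounted) counts as "not a zombie", so the caller
// keeps the kill(pid, 0) answer. On other platforms this is always false.
function isZombie(pid: number): boolean {
  if (process.platform !== "linux") return false;
  try {
    return parseProcState(readFileSync(`/proc/${pid}/status`, "utf8")) === "Z";
  } catch {
    return false;
  }
}

function isPidAlive(pid: number): boolean {
  // In kill(2), 0 and negative values address process groups (or every
  // process, for -1), so they are rejected here.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    // Signal 0 only checks that the PID exists and may be signalled.
    process.kill(pid, 0);
  } catch (err) {
    // EPERM: the process exists but belongs to another user (assumed alive).
    return (err as { code?: string }).code === "EPERM";
  }
  // kill(pid, 0) also succeeds for zombies, so consult /proc as well.
  return !isZombie(pid);
}
```

Keeping the `/proc` parsing in a pure function makes the zombie case testable with a canned status string, without having to create a real zombie process.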

2. Release lock before `process.exit()` (`src/cli/gateway-cli/run-loop.ts`)

process.exit() inside an async IIFE bypasses the outer try/finally that releases the gateway lock. Added an explicit await lock?.release() before calling exit(0) in both the restart-spawned and stop code paths.
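The ordering can be sketched as a small helper. This is hypothetical, assuming a lock handle with an async `release()` and an injectable `exit` function so the ordering is testable; `GatewayLock` and `releaseThenExit` are illustrative names, not the code in `run-loop.ts`. Because `process.exit()` ends the process synchronously, a `finally` still waiting further up the async call chain never runs, so the release has to complete before the exit call itself.

```typescript
// Hypothetical lock handle; the real gateway lock type may differ.
interface GatewayLock {
  release(): Promise<void>;
}

// Release first, then exit. A failed release is logged rather than rethrown
// so shutdown still proceeds; the next process then relies on its stale-lock check.
async function releaseThenExit(
  lock: GatewayLock | null | undefined,
  code: number,
  exit: (code: number) => void = (c) => process.exit(c),
): Promise<void> {
  try {
    await lock?.release();
  } catch (err) {
    console.error("gateway lock release failed:", err);
  }
  exit(code);
}
```

Swallowing a release error is a design choice of this sketch: it favors a clean exit over surfacing the error, on the assumption that the next process can still detect a stale lock.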

3. Release lock before spawning the child (`src/cli/gateway-cli/run-loop.ts`)

Moved lock.release() ahead of restartGatewayProcessWithFreshPid() so the child can acquire the lock immediately, with no race window against the parent.
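A sketch of the restart ordering, assuming an async lock release and a synchronous spawn hook. `restartWithFreshPid`, `spawnChild`, and `GatewayLock` are illustrative names; the PR's actual function is `restartGatewayProcessWithFreshPid()`, whose internals are not shown here. Releasing first means that by the time the child starts, no lock file names the parent's PID.

```typescript
// Hypothetical lock handle; the real gateway lock type may differ.
interface GatewayLock {
  release(): Promise<void>;
}

// Sketch of the restart path: release, then spawn, then exit. In a real process
// spawnChild would start a detached replacement (e.g. via child_process.spawn).
async function restartWithFreshPid(
  lock: GatewayLock | null,
  spawnChild: () => void,
  exit: (code: number) => void,
): Promise<void> {
  await lock?.release(); // 1. the lock file no longer names this PID
  spawnChild(); // 2. the child can take the lock on its first attempt
  exit(0); // 3. the parent leaves; nothing is left for the child to wait on
}
```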

4. Guard `entry.ts` top-level code with `isMainModule` (`src/entry.ts`)

The bundler exports shared symbols from dist/entry.js, so other chunks import it as a dependency. When dist/index.js is the real entry point, a lazy import of entry.js runs its unguarded top-level runCli(process.argv) call a second time. That starts a duplicate gateway, which fails on lock/port contention and exits with code 1, triggering another restart cycle.
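One common way to write such a guard in an ES module is to compare the module's own file path with the script Node was started with (`process.argv[1]`). The sketch below assumes that approach; the PR's `isMainModule()` helper may be implemented differently, and `canonical` is a hypothetical helper. Both sides go through `realpathSync` because Node resolves symlinks for the main module by default, so a symlinked launcher (for example a global npm bin shim) would otherwise never match.

```typescript
import { realpathSync } from "node:fs";
import { resolve } from "node:path";
import { fileURLToPath } from "node:url";

// Absolute, symlink-resolved form of a path; falls back to the plain absolute
// path when the file does not exist (realpathSync throws in that case).
function canonical(p: string): string {
  const abs = resolve(p);
  try {
    return realpathSync(abs);
  } catch {
    return abs;
  }
}

// True when the module at `moduleUrl` is the script Node was launched with.
function isMainModule(moduleUrl: string, argv1: string | undefined): boolean {
  if (!argv1) return false;
  return canonical(fileURLToPath(moduleUrl)) === canonical(argv1);
}

// Intended use at the bottom of entry.ts (sketch):
//   if (isMainModule(import.meta.url, process.argv[1])) {
//     void runCli(process.argv);
//   }
```

When systemd launches dist/index.js, `argv[1]` names index.js, so a lazy import of entry.js sees `false` and skips `runCli`.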

Impact

| Fix | Scope |
| --- | --- |
| Zombie detection | `isPidAlive` (all lock consumers) |
| Lock release before exit | `run-loop.ts` (restart and stop paths) |
| Lock release before spawn | `run-loop.ts` (restart path only) |
| isMainModule guard | `entry.ts` (prevents a duplicate gateway on import) |

Tests

New: src/shared/pid-alive.test.ts — covers zombie detection, invalid PIDs, running processes
Updated: src/cli/gateway-cli/run-loop.test.ts — lock release assertions for restart and stop paths

Greptile Summary

This PR fixes a critical infinite restart loop caused by three interconnected race conditions during gateway restart. The fix implements defense-in-depth with four independent changes:

- Zombie PID detection (pid-alive.ts): Added a Linux /proc/<pid>/status check to detect zombie processes, preventing isPidAlive from incorrectly returning true for zombies.
- Lock release before exit (run-loop.ts): Explicitly releases the gateway lock before process.exit(0) in both restart and stop paths, preventing the lock file from persisting after process exit.
- Lock release before spawn (run-loop.ts): Releases the lock before spawning the restart child process, eliminating the race window where the child waits for the parent's zombie to be reaped.
- Entry point guard (entry.ts): Wraps top-level code with an isMainModule check to prevent duplicate gateway startup when entry.js is imported as a shared dependency.

All fixes are well-tested with comprehensive unit tests covering zombie detection, lock release ordering, and edge cases. The implementation is clean, follows the codebase patterns, and each fix independently breaks the restart loop while together providing robust defense.

Confidence Score: 5/5

This PR is safe to merge with minimal risk
All four fixes are independently sound and work together for defense-in-depth. The zombie check is Linux-specific; on other platforms it is skipped and isPidAlive falls back to the kill(pid, 0) result. Lock release ordering is explicitly tested and prevents the race condition. The isMainModule guard prevents duplicate gateway startup without affecting normal execution. Comprehensive test coverage validates all critical paths, including zombie detection, lock release ordering, and edge cases.
No files require special attention

Last reviewed commit: aad980c


kill(pid, 0) succeeds for zombie processes, causing the gateway lock to treat a zombie lock owner as alive. Read /proc/<pid>/status on Linux to check for 'Z' (zombie) state before reporting the process as alive. This prevents the lock from being held indefinitely by a zombie process during gateway restart. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

process.exit() called from inside an async IIFE bypasses the outer try/finally block that releases the gateway lock. This leaves a stale lock file pointing to a zombie PID, preventing the spawned child or systemctl restart from acquiring the lock. Release the lock explicitly before calling exit in both the restart-spawned and stop code paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move lock.release() before restartGatewayProcessWithFreshPid() so the spawned child can immediately acquire the lock without racing against a zombie parent. This eliminates the root cause of the restart loop where the child times out waiting for a lock held by its now-dead parent. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…cate gateway start The bundler exports shared symbols from dist/entry.js, so other chunks import it as a dependency. When dist/index.js is the actual entry point (e.g. systemd service), lazy module loading eventually imports entry.js, triggering its unguarded top-level code which calls runCli(process.argv) a second time. This starts a duplicate gateway that fails on lock/port contention and crashes the process with exit(1), causing a restart loop. Wrap all top-level executable code in an isMainModule() check so it only runs when entry.ts is the actual main module, not when imported as a shared dependency by the bundler.


steipete · 2026-02-22T09:38:46Z

Landed via temp rebase onto main.

Gate: pnpm check && pnpm build && pnpm test && pnpm check:docs (plus post-rebase pnpm check and focused vitest)
Land commit: 5573517
Merge commit: PLACEHOLDER_MERGE_SHA

Thanks @jeffwnli!

steipete · 2026-02-22T09:38:54Z

Correction with exact SHAs:

Land commit: 5573517
Merge commit: dd07c06

Thanks @jeffwnli!


…jeffwnli) (cherry picked from commit dd07c06)
# Conflicts:
# CHANGELOG.md
# src/cli/gateway-cli/run-loop.test.ts
# src/cli/gateway-cli/run-loop.ts
# src/infra/infra-parsing.test.ts


openclaw-barnacle bot added cli CLI command changes size: M labels Feb 22, 2026

steipete self-assigned this Feb 22, 2026

jeffr and others added 5 commits February 22, 2026 10:36

fix: tighten gateway restart loop handling (openclaw#23416) (thanks @jeffwnli), commit 5573517

steipete force-pushed the fix/gateway-restart-loop branch from aad980c to 5573517 on February 22, 2026 09:38

steipete merged commit dd07c06 into openclaw:main Feb 22, 2026
3 checks passed

github-actions bot mentioned this pull request Feb 22, 2026

📡 Upstream Digest — 2026-02-22 10:17 UTC curtismercier/openclaw-mods#97 (open)

gemini-code-assist bot mentioned this pull request Feb 23, 2026

Sync with remote: 2/23/2026 ArchitectVS7/OpenClaw#18 (merged, 18 tasks)

steipete mentioned this pull request Feb 23, 2026

Config hot-reload spawns orphan gateway process when running under systemd #7421 (closed)

gabrielkoo pushed a commit to gabrielkoo/openclaw that referenced this pull request Feb 23, 2026

fix: tighten gateway restart loop handling (openclaw#23416) (thanks @jeffwnli), commit ec57118

mreedr pushed a commit to mreedr/openclaw-custom that referenced this pull request Feb 24, 2026

fix: tighten gateway restart loop handling (openclaw#23416) (thanks @jeffwnli), commit d5f4641

mylukin pushed a commit to mylukin/openclaw that referenced this pull request Feb 26, 2026

fix: tighten gateway restart loop handling (openclaw#23416) (thanks @jeffwnli), commit 966a68d

github-actions bot mentioned this pull request Mar 1, 2026

cherry-pick: upstream bugfix commits (2026-03-01-0443) hughdidit/DAISy-Agency#140 (closed, 6 tasks)

zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026

fix: tighten gateway restart loop handling (openclaw#23416) (thanks @jeffwnli), commit ec16b63

steipete merged 5 commits into openclaw:main from jeffwnli:fix/gateway-restart-loop
