fix(agents): bound plugin-owned context-engine compaction with a safety timeout #84083
Codex review: needs real behavior proof before merge.
Do we have a high-confidence way to reproduce the issue? Unclear. The review failed before ClawSweeper could establish a reproduction path.
Is this the best way to solve the issue? Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.
Best possible solution: Retry the Codex review after fixing the execution failure.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 3bc728eaa993.
Companion plugin-side PR: Martian-Engineering/lossless-claw#712, which bounds …
A plugin context engine that advertises `ownsCompaction` had its
`ContextEngine.compact()` awaited by the host with no timeout, no
watchdog, and no abort signal on all three engine-owned lanes (queued
`/compact`, overflow recovery, timeout recovery). The 900s
`compactWithSafetyTimeout` that protects native pi-agent-core
compaction was only ever applied to the native `activeSession.compact()`
call, so a slow or hung plugin `compact()` would hang the agent turn
indefinitely.
Wrap all three engine-owned `contextEngine.compact()` call sites in a
new `compactContextEngineWithSafetyTimeout` helper that applies the same
host-resolved finite timeout used for native compaction, and add an
optional `abortSignal` to the `ContextEngine.compact()` contract. The
run's abort signal is threaded into the helper (raced against the call
so a non-cooperating engine is still bounded) and into the `compact()`
params (so cooperating engines can cancel in-flight work). On timeout or
abort the queued lane surfaces a clean `{ ok: false }` result — matching
how the overflow/timeout run lanes already convert a thrown
`compact()` — instead of hanging or throwing a raw rejection.
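The wrapping described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the merged implementation: the `ContextEngine`, `CompactParams`, and `CompactResult` shapes, the four-argument signature given to `compactContextEngineWithSafetyTimeout`, and the `"Compaction aborted"` message are all hypothetical; only the `"Compaction timed out"` reason is taken from the PR's proof output. It shows the two mechanisms the commit names: racing the call (bounds an engine that ignores the signal) and passing an abort signal into `compact()` (lets a cooperating engine stop).

```typescript
// Hypothetical shapes for illustration; OpenClaw's real ContextEngine types may differ.
interface CompactParams {
  abortSignal?: AbortSignal;
}

interface CompactResult {
  ok: boolean;
  compacted: boolean;
  reason?: string;
}

interface ContextEngine {
  compact(params: CompactParams): Promise<CompactResult>;
}

// Sketch: bound engine.compact() by a timeout and by the run's abort signal.
async function compactContextEngineWithSafetyTimeout(
  engine: ContextEngine,
  params: CompactParams,
  timeoutMs: number,
  runSignal?: AbortSignal,
): Promise<CompactResult> {
  // Aborted on timeout or run abort; passed into compact() so cooperating engines can stop early.
  const controller = new AbortController();
  let rejectGuard: (reason: Error) => void = () => {};
  const guard = new Promise<never>((_, reject) => {
    rejectGuard = reject;
  });
  // Avoid an unhandled rejection if compact() throws synchronously after the guard has fired.
  guard.catch(() => {});

  const fail = (reason: Error): void => {
    controller.abort(reason);
    rejectGuard(reason);
  };
  const onRunAbort = (): void => fail(new Error("Compaction aborted"));
  const timer = setTimeout(() => fail(new Error("Compaction timed out")), timeoutMs);
  if (runSignal?.aborted) onRunAbort();
  else runSignal?.addEventListener("abort", onRunAbort, { once: true });

  try {
    // Racing, not only passing the signal, is what bounds an engine that ignores abortSignal.
    return await Promise.race([
      engine.compact({ ...params, abortSignal: controller.signal }),
      guard,
    ]);
  } finally {
    // Clean up so a fast compact() leaves no pending timer or listener behind.
    clearTimeout(timer);
    runSignal?.removeEventListener("abort", onRunAbort);
  }
}
```

In this sketch the helper rejects on timeout or abort; a caller such as the queued `/compact` lane would catch that rejection and return `{ ok: false }` rather than let it propagate.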
…ty timeout
The owns-compaction safety timeout was wired into the three pi-embedded
`contextEngine.compact()` call sites but missed the two codex-harness
lanes a codex-harness agent with an `ownsCompaction` plugin actually
uses:
- `compactOwningContextEngine` (`app-server/compact.ts`) — the
context-engine-owned Codex compaction lane.
- `forceContextEngineCompactionForCodexOverflow`
(`app-server/run-attempt.ts`) — the Codex force-compaction-on-overflow
recovery path.
Both sites had a `try/catch` that converts a *thrown* `compact()` error
to a clean `{ ok: false }` result, but a hung `compact()` promise that
never settles was not caught and would stall the agent turn
indefinitely.
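Why the existing `try/catch` was not enough can be shown in a few lines. In this hypothetical sketch, `compactWithCatchOnly` mimics the pre-fix call-site pattern and `settlesWithin` is only a test probe; neither is an OpenClaw function, and the `reason` string format is borrowed from the PR's proof output.

```typescript
// Hypothetical result shape; OpenClaw's real type may differ.
type CompactOutcome = { ok: boolean; compacted: boolean; reason?: string };

// Pre-fix call-site pattern: only a *rejected* compact() reaches the catch block.
async function compactWithCatchOnly(
  compact: () => Promise<CompactOutcome>,
): Promise<CompactOutcome> {
  try {
    return await compact();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, compacted: false, reason: `context engine compaction failed: ${message}` };
  }
}

// Test probe: report whether `promise` settles within `probeMs` instead of awaiting it forever.
async function settlesWithin(promise: Promise<unknown>, probeMs: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const probe = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), probeMs);
  });
  try {
    return await Promise.race([promise.then(() => true, () => true), probe]);
  } finally {
    clearTimeout(timer);
  }
}
```

A thrown error converts cleanly to `{ ok: false }`, but a promise that never settles leaves the `await` parked indefinitely, and the `catch` block never runs.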
Wrap both with `compactContextEngineWithSafetyTimeout`, resolving the
timeout via the same host-resolved `resolveCompactionTimeoutMs(config)`
the pi-embedded sites use. On timeout/abort the helper rejects and the
existing `try/catch` converts it to `{ ok: false }` — no new failure
handling needed. `compact.ts` threads `params.abortSignal`;
`run-attempt.ts` threads the run-level `runAbortController.signal`.
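The timeout resolution and the unchanged `try/catch` conversion can be sketched together. Everything here is hypothetical: `resolveCompactionTimeoutMsSketch` only approximates `resolveCompactionTimeoutMs(config)` by reading `agents.defaults.compaction.timeoutSeconds` (the key used in the PR's proof) and falling back to the 900 s default described for `EMBEDDED_COMPACTION_TIMEOUT_MS`; the real resolver may validate or clamp differently. The abort signal is left out to keep the example short.

```typescript
// Assumed default: the PR text describes a 900 s EMBEDDED_COMPACTION_TIMEOUT_MS.
const EMBEDDED_COMPACTION_TIMEOUT_MS = 900_000;

// Hypothetical config slice; the key path matches the one set in the PR's proof.
interface ConfigLike {
  agents?: { defaults?: { compaction?: { timeoutSeconds?: number } } };
}

type CompactOutcome = { ok: boolean; compacted: boolean; reason?: string };

// Hypothetical resolver: use a finite positive timeoutSeconds, else the 900 s default.
function resolveCompactionTimeoutMsSketch(config?: ConfigLike): number {
  const seconds = config?.agents?.defaults?.compaction?.timeoutSeconds;
  return typeof seconds === "number" && Number.isFinite(seconds) && seconds > 0
    ? seconds * 1000
    : EMBEDDED_COMPACTION_TIMEOUT_MS;
}

// Reject with "Compaction timed out" if `promise` does not settle within `ms`.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Compaction timed out")), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Codex-style lane sketch: the existing catch now also receives the timeout rejection.
async function compactOwningLaneSketch(
  compact: () => Promise<CompactOutcome>,
  config?: ConfigLike,
): Promise<CompactOutcome> {
  try {
    return await withTimeout(compact(), resolveCompactionTimeoutMsSketch(config));
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, compacted: false, reason: `context engine compaction failed: ${message}` };
  }
}
```

The design point is that no new failure handling is needed: once the await is bounded, the existing `catch` block already produces the `{ ok: false }` result.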
To keep a single shared implementation, export
`compactContextEngineWithSafetyTimeout` and `resolveCompactionTimeoutMs`
through the existing `openclaw/plugin-sdk/agent-harness-runtime` surface
the codex extension already imports — no copy-pasted watchdog.
The fix now covers all five `ownsCompaction` `compact()` call sites:
3 pi-embedded + 2 codex-harness.
Branch updated from 631363a to 9121a1a.
Merged via squash.
Thanks @100yenadmin!
Referenced from eleboucher/homelab#615, a Renovate update of `ghcr.io/openclaw/openclaw` from `2026.5.19` to `2026.5.20`, whose v2026.5.20 release notes list this change under Fixes: "Agents/compaction: bound plugin-owned CLI transcript compaction with the host safety timeout so a hung context engine can no longer stall post-turn cleanup. (#84083) Thanks @100yenadmin."
…ty timeout (openclaw#84083)
Merged via squash.
Prepared head SHA: 9121a1a
Co-authored-by: 100yenadmin <239388517+100yenadmin@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
Summary
A plugin context engine that advertises `ownsCompaction: true` (e.g. the `lossless-claw` LCM plugin) had its `ContextEngine.compact()` awaited by the host with no timeout, no watchdog, and no abort signal on every engine-owned compaction lane, in both the built-in pi-embedded runner and the codex agent harness:
- queued `/compact` (`compact.queued.ts`)
- context-overflow recovery (`run.ts`)
- timeout recovery (`run.ts`)
- the context-engine-owned Codex compaction lane (`extensions/codex/.../compact.ts`)
- the Codex force-compaction-on-overflow recovery path (`extensions/codex/.../run-attempt.ts`)

The 900 s `compactWithSafetyTimeout` safety net (`EMBEDDED_COMPACTION_TIMEOUT_MS`) that protects native pi-agent-core compaction was only ever applied to the native `activeSession.compact()` call. A slow or hung plugin `compact()` would hang the agent turn indefinitely: the only backstop was the 48 h default run timeout, which could not even interrupt the plugin call. The two codex-harness sites already had a `try/catch` that converts a thrown `compact()` error to a clean `{ ok: false }` result, but a hung promise that never settles was not caught.
Changes
- Add a `compactContextEngineWithSafetyTimeout` helper that wraps a plugin `ContextEngine.compact()` in the same finite, host-resolved safety timeout used for native compaction.
- Wrap the three pi-embedded `contextEngine.compact()` call sites: queued `/compact` (`compact.queued.ts`), context-overflow recovery and timeout recovery (`run.ts`).
- Wrap the two codex-harness call sites: `compactOwningContextEngine` (`extensions/codex/src/app-server/compact.ts`) and `forceContextEngineCompactionForCodexOverflow` (`extensions/codex/src/app-server/run-attempt.ts`). These are the lanes a codex-harness agent with an `ownsCompaction` plugin actually uses.
- Export `compactContextEngineWithSafetyTimeout` (+ `resolveCompactionTimeoutMs`) through the existing `openclaw/plugin-sdk/agent-harness-runtime` surface that the codex extension already imports, so the host and the codex harness share one implementation rather than copy-pasting the watchdog across the package boundary.
- Add an optional `abortSignal` to the `ContextEngine.compact()` contract. The run's abort signal is both raced against the call (so a non-cooperating engine is still bounded) and threaded into the `compact()` params (so cooperating engines can cancel their own in-flight work). The codex sites thread `params.abortSignal` (`compact.ts`) and the run-level `runAbortController.signal` (`run-attempt.ts`).
- On timeout or abort, the queued lane surfaces a clean `{ ok: false }` result, matching how the overflow/timeout run lanes already convert a thrown `compact()`, instead of hanging or throwing a raw rejection at callers.

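On the plugin side, honoring the optional `abortSignal` might look like the following. `ChunkedCompactionEngine`, its chunked loop, and the `sleep` helper are hypothetical stand-ins for real compaction work (the PR does not describe any engine's internals); the only point illustrated is that a cooperating engine checks the signal between units of work and rejects with the signal's reason instead of finishing a result nobody will read.

```typescript
// Hypothetical contract shapes; only the optional abortSignal field comes from the PR.
interface CompactParams {
  abortSignal?: AbortSignal;
}

interface CompactResult {
  ok: boolean;
  compacted: boolean;
}

// Wait `ms`, but reject early with the signal's reason if it aborts.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(signal.reason);
      return;
    }
    const onAbort = (): void => {
      clearTimeout(timer);
      reject(signal?.reason);
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

// Hypothetical cooperating engine: compacts in chunks and stops once the host aborts.
class ChunkedCompactionEngine {
  processedChunks = 0;
  private readonly chunks: number;
  private readonly chunkDelayMs: number;

  constructor(chunks: number, chunkDelayMs: number) {
    this.chunks = chunks;
    this.chunkDelayMs = chunkDelayMs;
  }

  async compact(params: CompactParams): Promise<CompactResult> {
    const signal = params.abortSignal;
    for (let i = 0; i < this.chunks; i++) {
      // Check between units of work so no new chunk starts after an abort.
      if (signal?.aborted) throw signal.reason;
      await sleep(this.chunkDelayMs, signal); // rejects mid-chunk on abort
      this.processedChunks++;
    }
    return { ok: true, compacted: true };
  }
}
```

Without a signal the engine runs to completion, so the parameter stays optional for callers that do not pass one.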
This is additive to `feat/context-engine-intercept-compaction`, which covers the separate codex `session_before_compact` intercept lane (`interceptsCompaction`); this PR covers the `ownsCompaction` queued/overflow/timeout lanes across both the pi-embedded runner and the codex harness.
Fixes #84077
Test plan
- `pnpm tsgo:core` + `pnpm tsgo:core:test` clean
- `pnpm tsgo:extensions` + `pnpm tsgo:extensions:test` clean
- `oxlint` clean on all changed files; plugin-sdk boundary checks pass
- `compact()` exceeding the timeout is bounded and rejects with a timeout error; the abort signal is threaded into `compact()` params; abort fires before timeout and rejects promptly
- … `compact()` to a clean `ok:false`; run abort signal reaches engine-owned overflow compaction
- `compact()` in the Codex compaction lane is bounded and surfaces as `ok:false`; a hung `compact()` during Codex overflow recovery is bounded so the run still proceeds; the caller / run-level abort signal is threaded into the codex-harness `compact()`

Real behavior proof
Behavior addressed: hung `ownsCompaction` context-engine compaction no longer waits forever. The host timeout now both bounds the await and aborts the signal passed into `contextEngine.compact()`, then returns a clean failed compaction result.

Real environment tested: local OpenClaw checkout `/Volumes/LEXAR/repos/openclaw-fix-owns-compaction-timeout`, branch `fix/owns-compaction-safety-timeout`, head `b5a35b16b55`, Node via `node --import tsx`, actual `extensions/codex/src/app-server/compact.ts` and `src/agents/pi-embedded-runner/compaction-safety-timeout.ts`.

Exact steps or command run after the patch: created a real Codex app-server session binding under `/Volumes/LEXAR/Codex/pr-84083-proof`, supplied an owning context engine whose `compact()` never settles, configured `agents.defaults.compaction.timeoutSeconds=1`, invoked `maybeCompactCodexAppServerSession()`, and captured whether the signal delivered to `compact()` aborted when the safety timeout fired.

Evidence after fix:
{ "branch": "fix/owns-compaction-safety-timeout", "head": "b5a35b16b55", "behavior": "hung ownsCompaction contextEngine.compact() receives a host timeout abort signal and returns bounded failure", "elapsedMs": 1088, "compactCalls": 1, "receivedAbortSignal": true, "signalAborted": true, "signalReason": "Compaction timed out", "result": { "ok": false, "compacted": false, "reason": "context engine compaction failed: Compaction timed out" } }

Observed result after fix: the hung owning engine returned through the host safety timeout in about 1.1 seconds, the engine saw an `AbortSignal`, that signal aborted with `Compaction timed out`, and the caller received `{ ok: false, compacted: false }` instead of hanging.

What was not tested: a real third-party plugin process wedged in production; this proof uses the in-process context-engine interface with an intentionally never-settling `compact()` implementation.
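The shape of this proof can be approximated in-process without a Codex app-server session. This is a hypothetical sketch: it uses a 100 ms timeout instead of `timeoutSeconds=1`, a stand-in `compactWithTimeoutSketch` instead of `maybeCompactCodexAppServerSession()`, and it reproduces only the evidence fields that do not depend on the local checkout (branch and head are omitted).

```typescript
// Hypothetical shapes; OpenClaw's real types may differ.
interface CompactParams {
  abortSignal?: AbortSignal;
}

interface CompactResult {
  ok: boolean;
  compacted: boolean;
  reason?: string;
}

// Stand-in for the host path: race compact() against a timer that also aborts the engine's
// signal, then convert any rejection to a failed result as the codex lanes' try/catch does.
async function compactWithTimeoutSketch(
  compact: (params: CompactParams) => Promise<CompactResult>,
  timeoutMs: number,
): Promise<CompactResult> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      const reason = new Error("Compaction timed out");
      controller.abort(reason);
      reject(reason);
    }, timeoutMs);
  });
  try {
    return await Promise.race([compact({ abortSignal: controller.signal }), timeout]);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, compacted: false, reason: `context engine compaction failed: ${message}` };
  } finally {
    clearTimeout(timer);
  }
}

// Collect the proof's observable fields using an engine whose compact() never settles.
async function runProof(timeoutMs: number) {
  let compactCalls = 0;
  const seen: { signal?: AbortSignal } = {};
  const started = Date.now();
  const result = await compactWithTimeoutSketch((params) => {
    compactCalls++;
    seen.signal = params.abortSignal;
    return new Promise<CompactResult>(() => {}); // never settles
  }, timeoutMs);
  const reason: unknown = seen.signal?.reason;
  return {
    elapsedMs: Date.now() - started,
    compactCalls,
    receivedAbortSignal: seen.signal !== undefined,
    signalAborted: seen.signal?.aborted === true,
    signalReason: reason instanceof Error ? reason.message : undefined,
    result,
  };
}
```

Running `runProof(100)` should report one `compact()` call, an aborted signal with reason `Compaction timed out`, and a `{ ok: false, compacted: false }` result after roughly the configured timeout.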