Filed early as a focused report; expanded with the full investigation — all five call sites, the native-vs-plugin asymmetry, the related compaction-hook gap, diagrams, and the fix mapping.
Summary
When a plugin-owned context engine — one that advertises ownsCompaction: true (e.g. the lossless-claw / LCM plugin) — handles compaction, OpenClaw awaits the engine's ContextEngine.compact() with no timeout, no watchdog, and no abort signal, across five distinct call sites. The 900 s safety timeout (compactWithSafetyTimeout / EMBEDDED_COMPACTION_TIMEOUT_MS) that protects native pi-agent-core compaction was never applied to the plugin-owned lanes.
If a plugin engine's compact() is slow or hangs, the agent turn is unresponsive for the entire duration. The only backstop is the default 48-hour agent run timeout — and even that cannot interrupt the plugin call, because ContextEngine.compact() has no abortSignal in its contract.
User-visible symptom: "the agent stopped responding right after it started compacting."
A second, independent host-side gap in the same area — unbounded before_compaction / after_compaction hooks — is documented under Related host-side gap below.
The compaction-stall surface
flowchart TD
subgraph PE["pi-embedded runner"]
L1["compact.queued.ts<br/>queued /compact"]
L2["run.ts<br/>context-overflow recovery"]
L3["run.ts<br/>run-timeout recovery"]
end
subgraph CX["codex agent harness"]
L4["compact.ts<br/>compactOwningContextEngine"]
L5["run-attempt.ts<br/>forceContextEngineCompactionForCodexOverflow"]
end
L1 --> CE
L2 --> CE
L3 --> CE
L4 --> CE
L5 --> CE
CE["await contextEngine.compact()"]
CE --> X["BEFORE FIX: no timeout / no abort / no watchdog<br/>a hung plugin compact() is awaited forever"]
NAT["native pi-agent-core compaction"] --> NT["compactWithSafetyTimeout — 900 s bound"]
NT --> NOTE["the 5 plugin lanes above never received this wrapper"]
classDef bad fill:#ffe3e3,stroke:#d00000
classDef good fill:#e3ffe3,stroke:#1a7f1a
class CE,X bad
class NT,NOTE good
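The five ownsCompaction compact() call sites
1. src/agents/pi-embedded-runner/compact.queued.ts: queued /compact
2. src/agents/pi-embedded-runner/run.ts: context-overflow recovery
3. src/agents/pi-embedded-runner/run.ts: run-timeout recovery
4. extensions/codex/src/app-server/compact.ts → compactOwningContextEngine
5. extensions/codex/src/app-server/run-attempt.ts → forceContextEngineCompactionForCodexOverflow
Sites 1–3 are in the pi-embedded runner; sites 4–5 are in the codex agent harness. compact.queued.ts early-returns to maybeCompactCodexAppServerSession before its own lane-queued compact() is reached, so for a codex-harness agent with an ownsCompaction plugin (the common production setup), sites 4–5 are the lanes that actually run.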
Each site has a try/catch that converts a thrown compact() error into a clean { ok: false } result. A hang (a promise that never settles) is not a throw, however, so the catch does nothing for it.
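The difference between a throw and a hang can be shown in isolation. The sketch below is illustrative only: hangingCompact, compactWithCatchOnly and settlesWithin are hypothetical names, not OpenClaw code, and the result shape is simplified to { ok, reason }.

```typescript
type CompactResult = { ok: boolean; reason?: string };

// Hypothetical stand-in for a plugin compact() that never settles:
// the promise neither resolves nor rejects.
function hangingCompact(): Promise<CompactResult> {
  return new Promise<CompactResult>(() => {});
}

// Mirrors the pattern described above: a thrown error becomes { ok: false }.
async function compactWithCatchOnly(
  compact: () => Promise<CompactResult>,
): Promise<CompactResult> {
  try {
    return await compact();
  } catch (err) {
    return { ok: false, reason: String(err) };
  }
}

// Probe helper: reports whether `p` settles (resolves or rejects) within `ms`.
async function settlesWithin<T>(p: Promise<T>, ms: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  const settled = p.then(
    () => "settled" as const,
    () => "settled" as const,
  );
  try {
    return (await Promise.race([settled, timedOut])) === "settled";
  } finally {
    clearTimeout(timer);
  }
}
```

With these definitions, compactWithCatchOnly turns a rejecting compact() into { ok: false }, while settlesWithin(compactWithCatchOnly(hangingCompact), ms) reports false for any finite ms: the catch block is never entered because nothing is ever thrown.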
Root cause
compactWithSafetyTimeout (src/agents/pi-embedded-runner/compaction-safety-timeout.ts, EMBEDDED_COMPACTION_TIMEOUT_MS = 900_000) does exist and is applied — but only around the native activeSession.compact() call inside compactEmbeddedPiSessionDirectOnce. A plugin engine that owns compaction bypasses delegateCompactionToRuntime and therefore never reaches that wrapper.
Verified: the installed 2026.5.18 dist chunk containing the contextEngine.compact() call sites has zero references to compactWithSafetyTimeout.
Aggravators:
ContextEngine.compact() (src/context-engine/types.ts) has no abortSignal parameter. A run-level abort (abortRun → activeSession.abortCompaction()) only aborts the native session — a plugin compact() in flight receives no cancellation.
waitForCompactionRetryWithAggregateTimeout re-arms indefinitely while isCompactionStillInFlight() is true, so its 60 s aggregate timeout degrades to an infinite wait if a compaction_start is never matched by a compaction_end.
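One way to keep a re-arming wait from degrading into an infinite one is to pair the per-window timer with a hard overall deadline. The following is a minimal sketch of that idea under stated assumptions: waitWithHardCap and its parameters are hypothetical and do not reflect the actual signature of waitForCompactionRetryWithAggregateTimeout.

```typescript
type WaitOutcome = "done" | "timeout";

// Waits for `done`, re-arming a `windowMs` timer while `stillInFlight()` is
// true, but never waiting longer than `hardCapMs` in total.
// A rejection of `done` also counts as "done" here, since the work has settled.
async function waitWithHardCap(
  done: Promise<unknown>,
  windowMs: number,
  hardCapMs: number,
  stillInFlight: () => boolean,
): Promise<WaitOutcome> {
  const deadline = Date.now() + hardCapMs;
  const settled = done.then(
    () => "done" as const,
    () => "done" as const,
  );
  for (;;) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) return "timeout"; // hard cap reached: stop re-arming
    let timer: ReturnType<typeof setTimeout> | undefined;
    const windowElapsed = new Promise<"window">((resolve) => {
      timer = setTimeout(() => resolve("window"), Math.min(windowMs, remaining));
    });
    const winner = await Promise.race([settled, windowElapsed]);
    clearTimeout(timer);
    if (winner === "done") return "done";
    // The window elapsed. Re-arm only while work is reported in flight;
    // the deadline check at the top of the loop bounds the total wait.
    if (!stillInFlight()) return "timeout";
  }
}
```

Even if stillInFlight() returns true forever (for example because a compaction_start is never matched by a compaction_end), the loop exits once hardCapMs has passed.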
The stall
sequenceDiagram
participant U as User
participant H as OpenClaw host
participant E as plugin compact
participant P as Summarizer provider
U->>H: message (new turn)
H->>E: await contextEngine.compact()
E->>P: summarizer call
P--xE: rate-limited / unreachable
E->>P: retry / fallback provider
P--xE: still failing
Note over E: minutes of retries and per-call timeouts
Note over H: no safety timeout — awaits indefinitely
Note over U: agent is unresponsive
The plugin-side reasons a compact() can run this long (unbounded sweep loops, a rate-limited summarizer) are tracked at Martian-Engineering/lossless-claw#711. This issue is the host-side half: regardless of why a plugin compact() is slow, OpenClaw must not await it unbounded.
Related host-side gap: compaction hooks
A second, independent host-side gap sits in the same area. The before_compaction and after_compaction plugin hooks have no default timeout: DEFAULT_VOID_HOOK_TIMEOUT_MS_BY_HOOK (src/plugins/hooks.ts) lists only agent_end, and runVoidHook applies a timeout only when one is resolved — so with no table entry and no plugin-supplied hook.timeoutMs, these hooks run fully unbounded.
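The timeout resolution described above can be sketched as a table lookup with a plugin-supplied override. This is an assumption-laden illustration, not the code in src/plugins/hooks.ts: resolveTimeoutMs and runVoidHookSketch are hypothetical names, the precedence of hook.timeoutMs over the table is assumed, and the agent_end value is a placeholder because the real default is not stated in this report.

```typescript
type HookName = "agent_end" | "before_compaction" | "after_compaction";
type TimeoutTable = Partial<Record<HookName, number>>;

interface VoidHook {
  timeoutMs?: number; // optional plugin-supplied timeout
  run: () => Promise<void>;
}

// Table as described: only agent_end has a default.
// 5_000 is a placeholder value, not the real default.
const TABLE_BEFORE: TimeoutTable = { agent_end: 5_000 };

// Table with 30 s default entries for the two compaction hooks.
const TABLE_AFTER: TimeoutTable = {
  ...TABLE_BEFORE,
  before_compaction: 30_000,
  after_compaction: 30_000,
};

// Assumed precedence: plugin-supplied value first, then the table.
// `undefined` means "no timeout is applied".
function resolveTimeoutMs(
  name: HookName,
  hook: VoidHook,
  table: TimeoutTable,
): number | undefined {
  return hook.timeoutMs ?? table[name];
}

async function runVoidHookSketch(
  name: HookName,
  hook: VoidHook,
  table: TimeoutTable,
): Promise<"ok" | "timeout"> {
  const ms = resolveTimeoutMs(name, hook, table);
  if (ms === undefined) {
    await hook.run(); // unbounded: a hung hook is awaited forever
    return "ok";
  }
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  try {
    return await Promise.race([hook.run().then(() => "ok" as const), timedOut]);
  } finally {
    clearTimeout(timer);
  }
}
```

With TABLE_BEFORE, a before_compaction hook that supplies no timeoutMs resolves to undefined and takes the unbounded branch; with TABLE_AFTER the same hook resolves to 30 000 ms.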
In the codex agent harness these hooks fire on the strictly serialized notification queue:
flowchart LR
NQ["codex notification queue<br/>(strictly serialized)"]
NQ --> E1["item/started: contextCompaction"]
E1 --> BC["await before_compaction hook"]
BC --> HUNG["hook hangs — no default timeout"]
HUNG --> BLOCK["notification queue frozen"]
BLOCK --> TC["turn/completed never processed"]
TC --> HANG["the whole agent turn hangs"]
classDef bad fill:#ffe3e3,stroke:#d00000
class HUNG,BLOCK,TC,HANG bad
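The freeze shown in the diagram follows directly from strict serialization: each queued handler starts only after the previous one settles. Below is a minimal sketch of such a queue, assuming a simple promise-chain design; SerialQueue and bounded are hypothetical names and are not taken from the codex harness.

```typescript
// Strictly serialized queue: each task starts only after the previous
// task has settled (resolved or rejected).
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue(task: () => Promise<void>): Promise<void> {
    const next = this.tail.then(task);
    // Keep the chain alive even if a task rejects.
    this.tail = next.catch(() => {});
    return next;
  }
}

// Wraps a task so that it rejects after `ms` instead of hanging forever.
function bounded(task: () => Promise<void>, ms: number): () => Promise<void> {
  return () =>
    new Promise<void>((resolve, reject) => {
      const timer = setTimeout(
        () => reject(new Error(`timed out after ${ms} ms`)),
        ms,
      );
      task().then(
        () => {
          clearTimeout(timer);
          resolve();
        },
        (err) => {
          clearTimeout(timer);
          reject(err);
        },
      );
    });
}
```

If a hung handler is enqueued without bounded, every later task, including the one that would process turn/completed, waits behind it indefinitely. Wrapping the handler with bounded lets the queue move on once the timeout fires.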
Environment
OpenClaw 2026.5.18 (traced in the installed npm dist; the structure is present across 2026.5.x).
A plugin context engine with ownsCompaction: true installed (reproduced with lossless-claw / LCM).
Reproduction
Install a context-engine plugin that sets ownsCompaction: true (e.g. lossless-claw).
Configure its summarizer to a model/provider that is slow or rate-limited.
Drive a long session past the compaction threshold, or hit max-token overflow to force compaction.
The plugin's compact() runs slowly; the host awaits it with no timeout; the agent turn never returns.
Expected vs actual
Expected: plugin-owned compaction is bounded by the same safety timeout as native compaction; on timeout the host fails the compaction cleanly (surfacing an error and/or falling back) instead of hanging the turn.
Actual: no timeout on any of the five plugin lanes; the turn hangs until the 48 h run timeout, which itself cannot abort the plugin call.
Fix
compact() no-timeout (this issue): PR #84083, "fix(agents): bound plugin-owned context-engine compaction with a safety timeout". It adds a compactContextEngineWithSafetyTimeout helper and wraps all five ownsCompaction compact() call sites. It also adds an optional abortSignal to the ContextEngine.compact() contract and threads the run abort signal through. The helper is shared with the codex extension via the openclaw/plugin-sdk/agent-harness-runtime surface (one implementation, no copy-paste).
before_compaction / after_compaction hooks: add default entries (30 s) to DEFAULT_VOID_HOOK_TIMEOUT_MS_BY_HOOK.
Related
#81164, "feat(context-engine): add interceptCompaction contract for context-engine plugins": the in-flight interceptCompaction work wires an abort-aware path for the separate codex session_before_compact intercept lane (the interceptsCompaction capability). This issue concerns the ownsCompaction queued/overflow/timeout lanes, which that work does not cover.
Martian-Engineering/lossless-claw#711: the plugin-side half (compactFullSweep / compactUntilUnder).
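For readers unfamiliar with the pattern, a safety-timeout wrapper with abort propagation might look roughly like the sketch below. It is not the compactContextEngineWithSafetyTimeout helper from the PR: compactWithDeadline, CompactFn and the { ok, reason } result shape are hypothetical, chosen only to illustrate racing compact() against a timer while forwarding the run's abort signal.

```typescript
type CompactResult = { ok: boolean; reason?: string };
type CompactFn = (opts: { abortSignal: AbortSignal }) => Promise<CompactResult>;

// Races compact() against a timer and a run-level abort. On timeout or
// abort the caller gets a clean { ok: false } result, and the engine is
// signalled through `abortSignal` (which it is free to ignore).
async function compactWithDeadline(
  compact: CompactFn,
  timeoutMs: number,
  runSignal?: AbortSignal,
): Promise<CompactResult> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const stopped = new Promise<CompactResult>((resolve) => {
    controller.signal.addEventListener(
      "abort",
      () => resolve({ ok: false, reason: "aborted" }),
      { once: true },
    );
    timer = setTimeout(() => {
      // Resolve first so the timeout reason wins over the abort reason.
      resolve({ ok: false, reason: `timeout after ${timeoutMs} ms` });
      controller.abort();
    }, timeoutMs);
  });

  // Forward a run-level abort to the engine's signal.
  const forwardAbort = () => controller.abort();
  if (runSignal?.aborted) forwardAbort();
  else runSignal?.addEventListener("abort", forwardAbort, { once: true });

  try {
    const attempt = compact({ abortSignal: controller.signal }).catch(
      (err: unknown): CompactResult => ({ ok: false, reason: String(err) }),
    );
    return await Promise.race([attempt, stopped]);
  } finally {
    clearTimeout(timer);
    runSignal?.removeEventListener("abort", forwardAbort);
  }
}
```

A wrapper like this bounds how long the host waits, but it cannot force a plugin to stop working; that still depends on the engine honouring the signal it receives.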