Plugin-owned context engines (ownsCompaction) run compaction with no safety timeout — host can hang the agent indefinitely #84077

@100yenadmin

Filed early as a focused report; expanded with the full investigation — all five call sites, the native-vs-plugin asymmetry, the related compaction-hook gap, diagrams, and the fix mapping.

Summary

When a plugin-owned context engine — one that advertises ownsCompaction: true (e.g. the lossless-claw / LCM plugin) — handles compaction, OpenClaw awaits the engine's ContextEngine.compact() with no timeout, no watchdog, and no abort signal, across five distinct call sites. The 900 s safety timeout (compactWithSafetyTimeout / EMBEDDED_COMPACTION_TIMEOUT_MS) that protects native pi-agent-core compaction was never applied to the plugin-owned lanes.

If a plugin engine's compact() is slow or hangs, the agent turn is unresponsive for the entire duration. The only backstop is the default 48-hour agent run timeout — and even that cannot interrupt the plugin call, because ContextEngine.compact() has no abortSignal in its contract.
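
As a rough sketch of what a cancellable contract could look like: the types below are hypothetical (`AbortableContextEngine`, `CompactParams` and `compactWithAbort` are illustration names, not the current `ContextEngine` interface). They only show how an optional `abortSignal` would let a cooperating plugin stop early.

```typescript
// Hypothetical shapes for illustration; NOT the actual OpenClaw types.
interface CompactResult { ok: boolean; reason?: string }
interface CompactParams { sessionId: string; abortSignal?: AbortSignal }
interface AbortableContextEngine {
  compact(params: CompactParams): Promise<CompactResult>;
}

// A cooperating engine checks the signal between units of work.
const cooperativeEngine: AbortableContextEngine = {
  async compact({ abortSignal }) {
    for (let step = 0; step < 1000; step++) {
      if (abortSignal?.aborted) return { ok: false, reason: "aborted" };
      // Stand-in for one summarizer call.
      await new Promise<void>((resolve) => setTimeout(resolve, 5));
    }
    return { ok: true };
  },
};

// Host side: abort the signal once the budget is spent.
async function compactWithAbort(
  engine: AbortableContextEngine,
  timeoutMs: number,
): Promise<CompactResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await engine.compact({ sessionId: "demo", abortSignal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}
```

A signal alone only helps with engines that honour it. A plugin that ignores the signal would still need a host-side timeout around the awaited call.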

User-visible symptom: "the agent stopped responding right after it started compacting."

A second, independent host-side gap in the same area — unbounded before_compaction / after_compaction hooks — is documented under Related host-side gap below.

The compaction-stall surface

flowchart TD
    subgraph PE["pi-embedded runner"]
        L1["compact.queued.ts<br/>queued /compact"]
        L2["run.ts<br/>context-overflow recovery"]
        L3["run.ts<br/>run-timeout recovery"]
    end
    subgraph CX["codex agent harness"]
        L4["compact.ts<br/>compactOwningContextEngine"]
        L5["run-attempt.ts<br/>forceContextEngineCompactionForCodexOverflow"]
    end
    L1 --> CE
    L2 --> CE
    L3 --> CE
    L4 --> CE
    L5 --> CE
    CE["await contextEngine.compact()"]
    CE --> X["BEFORE FIX: no timeout / no abort / no watchdog<br/>a hung plugin compact() is awaited forever"]
    NAT["native pi-agent-core compaction"] --> NT["compactWithSafetyTimeout — 900 s bound"]
    NT --> NOTE["the 5 plugin lanes above never received this wrapper"]
    classDef bad fill:#ffe3e3,stroke:#d00000
    classDef good fill:#e3ffe3,stroke:#1a7f1a
    class CE,X bad
    class NT,NOTE good

The five ownsCompaction compact() call sites

| # | Lane | File / function | Trigger |
|---|------|-----------------|---------|
| 1 | queued /compact | src/agents/pi-embedded-runner/compact.queued.ts | manual /compact |
| 2 | overflow recovery | src/agents/pi-embedded-runner/run.ts | context-overflow / force-on-max-token |
| 3 | timeout recovery | src/agents/pi-embedded-runner/run.ts | run-timeout recovery |
| 4 | codex owned compaction | extensions/codex/src/app-server/compact.ts → compactOwningContextEngine | codex budget / manual compaction |
| 5 | codex overflow | extensions/codex/src/app-server/run-attempt.ts → forceContextEngineCompactionForCodexOverflow | codex turn overflow |

Sites 1–3 are the pi-embedded runner. Sites 4–5 are the codex agent harness. compact.queued.ts early-returns to maybeCompactCodexAppServerSession before its own lane-queued compact() is reached, so for a codex-harness agent with an ownsCompaction plugin (the common production setup), sites 4–5 are the lanes that actually run.

Each site has a try/catch that converts a thrown compact() error into a clean { ok: false } result — but a hang (a promise that never settles) is not a throw, so the catch does nothing for it.
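
A minimal sketch of the difference between a throw and a hang, and of how a `Promise.race` timeout turns the hang back into something the existing `catch` can handle. The function names here (`compactUnbounded`, `compactBounded`) are illustrative only and are not taken from the OpenClaw source.

```typescript
type CompactResult = { ok: true } | { ok: false; reason: string };

// Mirrors the pattern described above: a throw becomes { ok: false }.
// A promise that never settles never reaches either branch.
async function compactUnbounded(compact: () => Promise<void>): Promise<CompactResult> {
  try {
    await compact();
    return { ok: true };
  } catch (err) {
    return { ok: false, reason: String(err) };
  }
}

// Same shape, but the awaited call races against a timer that rejects.
async function compactBounded(
  compact: () => Promise<void>,
  timeoutMs: number,
): Promise<CompactResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`compaction timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    await Promise.race([compact(), timeout]);
    return { ok: true };
  } catch (err) {
    return { ok: false, reason: String(err) };
  } finally {
    clearTimeout(timer); // do not leave the timer armed after a normal return
  }
}

// Two failure modes a plugin compact() can exhibit.
const throwing = async (): Promise<void> => {
  throw new Error("summarizer failed");
};
const hanging = (): Promise<void> => new Promise<void>(() => {}); // never settles
```

`compactUnbounded(throwing)` resolves to `{ ok: false, ... }`, while `compactUnbounded(hanging)` never resolves at all. `compactBounded(hanging, 900_000)` would resolve to `{ ok: false, ... }` once the timer fires. The race stops the host from waiting, but it does not stop the plugin's work.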

Root cause

compactWithSafetyTimeout (src/agents/pi-embedded-runner/compaction-safety-timeout.ts, EMBEDDED_COMPACTION_TIMEOUT_MS = 900_000) does exist and is applied — but only around the native activeSession.compact() call inside compactEmbeddedPiSessionDirectOnce. A plugin engine that owns compaction bypasses delegateCompactionToRuntime and therefore never reaches that wrapper.

Verified: the installed 2026.5.18 dist chunk containing the contextEngine.compact() call sites has zero references to compactWithSafetyTimeout.

Aggravators:

  • ContextEngine.compact() (src/context-engine/types.ts) has no abortSignal parameter. A run-level abort (abortRun → activeSession.abortCompaction()) only aborts the native session — a plugin compact() in flight receives no cancellation.
  • waitForCompactionRetryWithAggregateTimeout re-arms indefinitely while isCompactionStillInFlight() is true, so its 60 s aggregate timeout degrades to an infinite wait if a compaction_start is never matched by a compaction_end.
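
The re-arming behaviour in the second bullet can be illustrated with a simplified, hypothetical wait loop. This is not the body of waitForCompactionRetryWithAggregateTimeout. `waitWithRearm` and `hardCapMs` are assumed names, and the hard cap is one possible mitigation rather than a confirmed fix.

```typescript
// Hypothetical simplification for illustration only.
interface WaitOptions {
  aggregateTimeoutMs: number;      // per-window timeout that gets re-armed
  hardCapMs?: number;              // assumed addition: ceiling across all re-arms
  isStillInFlight: () => boolean;  // true between compaction_start and compaction_end
}

async function waitWithRearm(
  done: Promise<void>,
  opts: WaitOptions,
): Promise<"done" | "timeout"> {
  const startedAt = Date.now();
  for (;;) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const windowElapsed = new Promise<"window">((resolve) => {
      timer = setTimeout(() => resolve("window"), opts.aggregateTimeoutMs);
    });
    const outcome = await Promise.race([
      done.then(() => "done" as const),
      windowElapsed,
    ]);
    clearTimeout(timer);
    if (outcome === "done") return "done";

    const overCap =
      opts.hardCapMs !== undefined && Date.now() - startedAt >= opts.hardCapMs;
    // Without hardCapMs, a compaction that never reports its end keeps
    // isStillInFlight() true, so this loop re-arms forever.
    if (!opts.isStillInFlight() || overCap) return "timeout";
  }
}
```

With `hardCapMs` left undefined and `isStillInFlight` permanently true, the function never returns, which is the degradation described above.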

The stall

sequenceDiagram
    participant U as User
    participant H as OpenClaw host
    participant E as plugin compact
    participant P as Summarizer provider
    U->>H: message (new turn)
    H->>E: await contextEngine.compact()
    E->>P: summarizer call
    P--xE: rate-limited / unreachable
    E->>P: retry / fallback provider
    P--xE: still failing
    Note over E: minutes of retries and per-call timeouts
    Note over H: no safety timeout — awaits indefinitely
    Note over U: agent is unresponsive

The plugin-side reasons a compact() can run this long (unbounded sweep loops, a rate-limited summarizer) are tracked at Martian-Engineering/lossless-claw#711. This issue is the host-side half: regardless of why a plugin compact() is slow, OpenClaw must not await it unbounded.

Related host-side gap: compaction hooks

A second, independent host-side gap sits in the same area. The before_compaction and after_compaction plugin hooks have no default timeout: DEFAULT_VOID_HOOK_TIMEOUT_MS_BY_HOOK (src/plugins/hooks.ts) lists only agent_end, and runVoidHook applies a timeout only when one is resolved — so with no table entry and no plugin-supplied hook.timeoutMs, these hooks run fully unbounded.
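
A small sketch of the resolution logic as described above, assuming a lookup of the form "plugin-supplied value, else table entry". The helper names and the `30_000` value for `agent_end` are placeholders for illustration, not values read from src/plugins/hooks.ts.

```typescript
type VoidHookName = "agent_end" | "before_compaction" | "after_compaction";
type TimeoutTable = Partial<Record<VoidHookName, number>>;

// Per the description, the default table lists only agent_end.
// The number is a placeholder, not the real default.
const defaultTimeouts: TimeoutTable = { agent_end: 30_000 };

// Hypothetical resolver: plugin-supplied timeout first, then the table.
function resolveHookTimeoutMs(
  hook: VoidHookName,
  pluginTimeoutMs?: number,
  table: TimeoutTable = defaultTimeouts,
): number | undefined {
  return pluginTimeoutMs ?? table[hook];
}

// Hypothetical runner: a timeout is applied only when one was resolved.
async function runVoidHookSketch(
  handler: () => Promise<void>,
  timeoutMs: number | undefined,
): Promise<"ok" | "timeout"> {
  if (timeoutMs === undefined) {
    await handler(); // unbounded path: a hung handler blocks here forever
    return "ok";
  }
  let timer: ReturnType<typeof setTimeout> | undefined;
  const expired = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    return await Promise.race([handler().then(() => "ok" as const), expired]);
  } finally {
    clearTimeout(timer);
  }
}
```

Under these assumptions `resolveHookTimeoutMs("before_compaction")` is `undefined`, so the hook takes the unbounded path. Adding `before_compaction` and `after_compaction` entries to the table would be enough to route both hooks through the bounded path.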

In the codex agent harness these hooks fire on the strictly serialized notification queue:

flowchart LR
    NQ["codex notification queue<br/>(strictly serialized)"]
    NQ --> E1["item/started: contextCompaction"]
    E1 --> BC["await before_compaction hook"]
    BC --> HUNG["hook hangs — no default timeout"]
    HUNG --> BLOCK["notification queue frozen"]
    BLOCK --> TC["turn/completed never processed"]
    TC --> HANG["the whole agent turn hangs"]
    classDef bad fill:#ffe3e3,stroke:#d00000
    class HUNG,BLOCK,TC,HANG bad

Environment

  • OpenClaw 2026.5.18 (traced in the installed npm dist; the structure is present across 2026.5.x).
  • A plugin context engine with ownsCompaction: true installed (reproduced with lossless-claw / LCM).

Reproduction

  1. Install a context-engine plugin that sets ownsCompaction: true (e.g. lossless-claw).
  2. Configure its summarizer to a model/provider that is slow or rate-limited.
  3. Drive a long session past the compaction threshold, or hit max-token overflow to force compaction.
  4. The plugin's compact() runs slowly; the host awaits it with no timeout; the agent turn never returns.

Expected vs actual

  • Expected: plugin-owned compaction is bounded by the same safety timeout as native compaction; on timeout the host fails the compaction cleanly (surfacing an error and/or falling back) instead of hanging the turn.
  • Actual: no timeout on any of the five plugin lanes; the turn hangs until the 48 h run timeout, which itself cannot abort the plugin call.

Fix

Related
