[Bug] Compaction preflight throws MissingAgentHarnessError("claude-cli") for claude-cli runtime sessions over contextThreshold #84857

@kingchurch

Summary

maybeCompactAgentHarnessSession in dist/pi-embedded-dfmy3LtH.js calls selectAgentHarness(runtime) on the agent's primary model runtime without the isCliRuntimeAlias bypass that the main dispatch path in dist/attempt-execution-CU4DLDLC.js:283 uses (isCliProvider(cliExecutionProvider, params.cfg)).

When the agent's primary model is configured with agentRuntime.id: "claude-cli" (e.g., anthropic/claude-opus-4-7 with model-scoped Claude CLI runtime policy — the form OpenClaw itself writes when Claude CLI auth is reused, per #82344 in 2026.5.17), the compaction preflight throws:

MissingAgentHarnessError: Requested agent harness "claude-cli" is not registered.

claude-cli is a CLI backend (registered by the anthropic plugin), not a registered agent harness. Every code path that acts on a claude-cli runtime should branch via isCliRuntimeAlias and dispatch through runCliAgent (or skip embedded-harness setup entirely for preflight-only checks).

This is a direct sibling of #84222 ("route Anthropic model refs selected with Claude CLI auth through the Claude CLI runtime"), which fixed the same regression class on the main dispatch path in 2026.5.20-beta.1. The compaction preflight is a parallel call site that wasn't covered.

User-visible impact

When the bug fires, the inbound channel turn fails before any model dispatch happens. The error propagates up through the reply pipeline and is delivered as the assistant's reply text through the channel:

Claw User   [10:24 PM]
Yo
Claw Agent  [10:24 PM]
⚠️ Requested agent harness "claude-cli" is not registered.

It also blocks compaction itself: the session is never actually compacted, so it keeps growing toward hard context overflow, and every subsequent inbound message hits the same failure path until the user runs /new on the session.

Affected version

  • OpenClaw 2026.5.20-beta.1 (abac0c5) — confirmed
  • Earlier versions are presumably affected since the introduction of agentRuntime.id as a canonical config key (around 2026.5.4 per the changelog: "Agents/runtime: add agentRuntime.id as the canonical config key"). The doctor migration in #83491 ("2026.5.12: WhatsApp runtime silently rebinds, agent loses all tools") actively writes this config shape, so any user re-running openclaw doctor --fix after upgrading inherits the bug.

Configuration that triggers:

"models": {
  "anthropic/claude-opus-4-7": {
    "agentRuntime": { "id": "claude-cli" }
  }
}

Plus a Claude CLI auth profile (anthropic:claude-cli) and agents.defaults.cliBackends.claude-cli.command pointing at the claude binary — the canonical setup for Claude Max subscription routing.
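
To check whether a given config is exposed, the models map can be scanned for entries pinned to the claude-cli runtime. The following is a minimal sketch, assuming the models map has the shape shown above; findModelsWithRuntime and the "example/other-model" entry are hypothetical and not part of OpenClaw:

```javascript
// Hypothetical helper (not an OpenClaw API): list model refs whose
// model-scoped runtime policy is pinned to the given runtime id.
function findModelsWithRuntime(models, runtimeId) {
  return Object.entries(models ?? {})
    .filter(([, entry]) => entry?.agentRuntime?.id === runtimeId)
    .map(([modelRef]) => modelRef);
}

// The "models" block from this report, plus one placeholder entry for contrast.
const models = {
  "anthropic/claude-opus-4-7": { agentRuntime: { id: "claude-cli" } },
  "example/other-model": {}, // assumed entry with no runtime override
};

const affected = findModelsWithRuntime(models, "claude-cli");
console.log(affected); // [ 'anthropic/claude-opus-4-7' ]
```

Any model ref this returns would route the compaction preflight into the failing harness lookup once its session crosses the threshold.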

Deterministic repro

  1. OpenClaw 2026.5.20-beta.1 (or any version with the #82344 / #83491 migrations applied), with an Anthropic agent using the config block above.
  2. Drive any session (Slack DM, dashboard chat, TUI, etc.) past lossless-claw.contextThreshold × the agent's effective context window. With contextThreshold: 0.5 and a 1M Opus context, that's any session >500K tokens. With LCM defaults that auto-derive from the summary model, the threshold can be much lower (~80K), so a few-turns session quickly triggers it.
  3. Send any inbound message.
  4. Expected: real model reply. Actual: ⚠️ Requested agent harness "claude-cli" is not registered. delivered to the channel within ~300ms.

Reproduces 100% deterministically once the session is over threshold.
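
The threshold arithmetic from step 2 can be sketched as follows. This assumes the preflight triggers at contextThreshold × the effective context window, as described above; compactionTriggerTokens is a hypothetical helper name, not an OpenClaw function:

```javascript
// Hypothetical helper: token count above which the compaction preflight runs,
// assuming trigger = contextThreshold × effective context window.
function compactionTriggerTokens(contextThreshold, contextWindowTokens) {
  return Math.round(contextThreshold * contextWindowTokens);
}

const ONE_MILLION_TOKENS = 1_000_000;

// contextThreshold: 0.5 with a 1M context window, as in step 2.
console.log(compactionTriggerTokens(0.5, ONE_MILLION_TOKENS)); // 500000
// Raising the threshold to 0.7 only moves the trigger point further out.
console.log(compactionTriggerTokens(0.7, ONE_MILLION_TOKENS)); // 700000
```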

Log fingerprint

Two simultaneous lane-task errors within ~1ms of each other, then channel delivery of the warning:

[diagnostic] lane task error: lane=session:agent:<id>:<channel>:direct:<peer> durationMs=N error="MissingAgentHarnessError: Requested agent harness \"claude-cli\" is not registered."
[diagnostic] lane task error: lane=main                                       durationMs=N error="MissingAgentHarnessError: Requested agent harness \"claude-cli\" is not registered."
[slack] delivered reply to channel:<id>     ← warning text

The dual-lane pattern is the double-enqueue in pi-embedded-dfmy3LtH.js:155:

return enqueueCommandInLane(sessionLane, () => enqueueGlobal(async () => { ... }));

The outer lane wraps the inner one, so both surface the inner throw.
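
A self-contained sketch of why one throw produces two lane errors. The lane runner below is a stand-in written for illustration, not the real queue implementation, and it assumes the global queue is the lane that logs as "main". It makes no claim about the order in which the real log lines appear:

```javascript
// Stand-in lane runner (assumption): runs the task, records any failure
// against its lane, then rethrows so the enclosing lane sees it too.
const laneErrors = [];

async function enqueueCommandInLane(lane, task) {
  try {
    return await task();
  } catch (err) {
    laneErrors.push({ lane, message: err.message });
    throw err;
  }
}

// Assumption: the global queue is the lane reported as "main".
const enqueueGlobal = (task) => enqueueCommandInLane("main", task);

function runTurn(sessionLane) {
  // Same nesting as the quoted line: session lane wraps the global lane.
  return enqueueCommandInLane(sessionLane, () =>
    enqueueGlobal(async () => {
      throw new Error('Requested agent harness "claude-cli" is not registered.');
    })
  );
}

const turn = runTurn("session:example").catch(() => {
  // A single throw has been recorded once per wrapping lane.
  console.log(laneErrors.map((e) => e.lane));
});
```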

Root cause — code references

In dist/pi-embedded-dfmy3LtH.js:

// line 117-123
const ceHarnessPolicy = resolveAgentHarnessPolicy({
  provider: ceProvider,
  modelId: ceModelId,
  config: params.config,
  agentId: agentIds.sessionAgentId,
  sessionKey: params.sessionKey
});
// ...
// line 142
const harnessResult = await maybeCompactAgentHarnessSession({
  ...params,
  contextEngine,
  contextTokenBudget,
  contextEngineRuntimeContext
});

maybeCompactAgentHarnessSession calls into dist/selection-CwzX-GXW.js and reaches line 15325:

// dist/selection-CwzX-GXW.js
throw new MissingAgentHarnessError(runtime);

The branch that throws is hit because runtime === "claude-cli" and pluginHarnesses.find(h => h.id === runtime) returns undefined.
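
The failing lookup can be reproduced in isolation. This is a reconstruction from the description above, not the actual contents of selection-CwzX-GXW.js; the registry entry "example-embedded-harness" is a placeholder, and the error class is a stand-in that mirrors the message seen in the logs:

```javascript
// Stand-in for the error class named in this report.
class MissingAgentHarnessError extends Error {
  constructor(runtime) {
    super(`Requested agent harness "${runtime}" is not registered.`);
    this.name = "MissingAgentHarnessError";
  }
}

// Assumed registry contents: only embedded harnesses are listed here,
// so a CLI backend id such as "claude-cli" never matches.
const pluginHarnesses = [{ id: "example-embedded-harness" }];

function selectAgentHarness(runtime) {
  const harness = pluginHarnesses.find((h) => h.id === runtime);
  if (!harness) throw new MissingAgentHarnessError(runtime);
  return harness;
}

let thrown;
try {
  selectAgentHarness("claude-cli");
} catch (err) {
  thrown = err;
}
console.log(`${thrown.name}: ${thrown.message}`);
```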

The correct pattern lives in dist/attempt-execution-CU4DLDLC.js:

// line 283
if (!isRawModelRun && isCliProvider(cliExecutionProvider, params.cfg)) {
  // ... runCliAgent dispatch, bypasses embedded harness lookup entirely
}

The compaction preflight needs the same bypass — for CLI runtimes, harness selection is not the right abstraction; the session needs to be flagged as CLI-managed and the compaction either skipped (CLI owns context) or dispatched through a CLI-runtime-aware code path.

Suggested fix

In pi-embedded maybeCompactAgentHarnessSession (or the function it delegates harness resolution to), short-circuit when isCliRuntimeAlias(harnessPolicy.runtime) === true:

import { n as isCliRuntimeAlias } from "./model-runtime-aliases-D0zjgBCZ.js";

// before calling selectAgentHarness, mirror the attempt-execution bypass:
if (isCliRuntimeAlias(harnessPolicy.runtime)) {
  // CLI-managed sessions: skip embedded-harness compaction preflight,
  // let the CLI dispatch path handle the session (or delegate compaction
  // to the LCM context engine which can operate independently of harness
  // selection — it just needs a summary model).
  return undefined; // continue to non-harness compaction path
}

The same guard pattern was applied in #84222 for the main dispatch path; this issue is asking for the parallel patch on the compaction preflight path.
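
A regression check for the guard could look roughly like this. Everything here is stubbed: isCliRuntimeAlias only knows the single alias from this report, selectAgentHarness always fails the way the logs show, and compactionPreflight is a hypothetical wrapper that marks where the guard would sit:

```javascript
// Stub (assumption): the real isCliRuntimeAlias lives in the runtime-aliases
// module; this version only recognises the alias from this report.
const isCliRuntimeAlias = (runtime) => runtime === "claude-cli";

// Stub harness lookup that fails the way the report describes.
function selectAgentHarness(runtime) {
  throw new Error(`Requested agent harness "${runtime}" is not registered.`);
}

// Hypothetical preflight wrapper showing where the guard sits.
function compactionPreflight(harnessPolicy) {
  if (isCliRuntimeAlias(harnessPolicy.runtime)) {
    return undefined; // CLI-managed: skip embedded-harness selection
  }
  return selectAgentHarness(harnessPolicy.runtime);
}

// With the guard in place, a claude-cli policy no longer reaches the lookup.
console.log(compactionPreflight({ runtime: "claude-cli" })); // undefined
```

Non-CLI runtimes still go through harness selection unchanged, so the guard only affects the path that currently throws.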

Workaround

User-side: run /new (or /reset) on the affected session to drop it back under the compaction threshold. This destroys live conversation state (the Claude CLI --session-id is rotated, the prompt cache is invalidated, in-flight tool context is lost) and only delays recurrence until the new session also grows past the threshold.

Config-side: raise plugins.entries.lossless-claw.config.contextThreshold to a higher fraction (e.g., 0.7 instead of 0.5) to reduce how often the preflight fires. This is a frequency reduction, not a fix — the bug still surfaces eventually.

There is no config-side workaround that avoids the bug entirely while keeping a claude-cli runtime configured.
