Claude CLI sessions reset on every gateway restart due to ephemeral loopback port in mcpConfigHash #64386

@zomars

Summary

Every gateway restart silently wipes all persisted Claude CLI session memory. The first turn after a restart for any session logs cli session reset: provider=claude-cli reason=mcp and starts a fresh claude -p instead of claude --resume <id>. Users experience this as "the agent suddenly forgot everything."

Reproducible on main. Affects any deployment using the claude-cli backend with the loopback MCP bridge (i.e. the default configuration since #35676).

Root cause

Two independently-correct commits interact badly:

  1. 12100719b8 — "fix: preserve cli sessions across model changes" introduced mcpConfigHash as a CLI-session-reuse invalidation key in src/agents/cli-session.ts. At that time the hashed mergedConfig contained only user-authored MCP state (plugin .mcp.json, inline mcpServers from bundle manifests) — stable across restarts, so hashing was safe.

  2. 3de09fbe74 — "fix: restore claude cli loopback mcp bridge (#35676)" merged the in-gateway loopback server into the same mergedConfig via additionalConfig in prepareCliBundleMcpConfig (src/agents/cli-runner/prepare.ts:116-138). The loopback URL is constructed in src/gateway/mcp-http.loopback-runtime.ts:22-38:

    url: `http://127.0.0.1:${port}/mcp`,

    The port comes from src/gateway/mcp-http.ts:26 which calls startMcpLoopbackServer(port = 0) — OS-assigned ephemeral port, different on every gateway start. That literal port ends up in the JSON that src/agents/cli-runner/bundle-mcp.ts:301-302 hashes:

    const serializedConfig = `${JSON.stringify(params.mergedConfig, null, 2)}\n`;
    const mcpConfigHash = crypto.createHash("sha256").update(serializedConfig).digest("hex");

Result: mcpConfigHash changes on every gateway start, so resolveCliSessionReuse (src/agents/cli-session.ts:148-151) returns { invalidatedReason: "mcp" } on the first turn of every previously-persisted session after every restart.

The auth token is not the culprit — it's referenced as ${OPENCLAW_MCP_TOKEN} and resolved via env, so it never enters the hashed bytes. The port is the sole offender because it is a literal in the URL.
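
The effect can be reproduced in isolation. The following is a minimal sketch, not the repository's code: the `McpConfig` shape, the server name `openclaw`, the `Authorization` header, and the second port `51234` are assumptions made for illustration. Only the serialize-then-SHA-256 step mirrors the snippet quoted above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, simplified stand-in for the merged MCP config; the real
// BundleMcpConfig type in the repository may carry more fields.
type McpServerEntry = { url: string; headers?: Record<string, string> };
type McpConfig = { mcpServers: Record<string, McpServerEntry> };

// Same shape as the hashing step quoted above: pretty-print, then SHA-256.
function hashConfig(config: McpConfig): string {
  const serialized = `${JSON.stringify(config, null, 2)}\n`;
  return createHash("sha256").update(serialized).digest("hex");
}

// Builds a config holding only a loopback entry bound to the given port.
// The server name and header are assumed; the token stays an unresolved
// placeholder, so it contributes the same bytes on every start.
function loopbackConfig(port: number): McpConfig {
  return {
    mcpServers: {
      openclaw: {
        url: `http://127.0.0.1:${port}/mcp`,
        headers: { Authorization: "Bearer ${OPENCLAW_MCP_TOKEN}" },
      },
    },
  };
}

const beforeRestart = hashConfig(loopbackConfig(62949));
const afterRestart = hashConfig(loopbackConfig(51234)); // new ephemeral port
const samePortAgain = hashConfig(loopbackConfig(62949));

console.log(beforeRestart === afterRestart); // false
console.log(beforeRestart === samePortAgain); // true
```

The only byte difference between the two serialized configs is the port number in the URL, which is enough to produce an unrelated digest.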

Layering concern

Beyond the immediate bug, there is a layering issue: bundle-mcp.ts treats all entries in mcpServers as equivalent user-authored config, but the loopback entry is gateway-internal runtime state. Its contribution to session identity should be "is OpenClaw's own tool surface attached" (boolean), not "which ephemeral port did we bind today." Any future ephemeral value merged into mergedConfig (PID, tempdir path, per-start identifier) will reintroduce the same class of bug.
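
One way to make that separation explicit is to give session identity its own type, so that runtime values have no field to leak into. This is a sketch of the idea only: `SessionIdentity`, `gatewayToolsAttached`, and `sessionIdentityHash` are hypothetical names that do not exist in the repository, and the example server `notes` is made up.

```typescript
import { createHash } from "node:crypto";

type ServerEntry = { url?: string; command?: string };
type UserMcpConfig = { mcpServers: Record<string, ServerEntry> };

// Hypothetical session-identity input: user-authored servers plus a boolean
// for the gateway's own tool surface. There is no place for a port, PID, or
// tempdir path, so ephemeral state cannot reach the hashed bytes by accident.
type SessionIdentity = {
  userMcp: UserMcpConfig;
  gatewayToolsAttached: boolean;
};

function sessionIdentityHash(identity: SessionIdentity): string {
  const serialized = `${JSON.stringify(identity, null, 2)}\n`;
  return createHash("sha256").update(serialized).digest("hex");
}

const userMcp: UserMcpConfig = {
  mcpServers: { notes: { command: "notes-mcp" } },
};

const withTools = sessionIdentityHash({ userMcp, gatewayToolsAttached: true });
const withToolsAgain = sessionIdentityHash({ userMcp, gatewayToolsAttached: true });
const withoutTools = sessionIdentityHash({ userMcp, gatewayToolsAttached: false });

console.log(withTools === withToolsAgain); // true
console.log(withTools === withoutTools); // false
```

Because the hashed bytes would differ from what is stored today, adopting a shape like this would likely invalidate existing sessions once, on the first run after the change.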

Evidence from a live gateway

2026-04-10T07:03:25 [gateway] signal SIGTERM received
2026-04-10T07:03:58 [gateway] MCP loopback server listening on http://127.0.0.1:62949/mcp
2026-04-10T07:03:58 [gateway] ready
2026-04-10T07:40:43 [agent] cli session reset: provider=claude-cli reason=mcp   <- session A, first turn after restart
2026-04-10T07:56:47 [agent] cli session reset: provider=claude-cli reason=mcp   <- session B, first turn after restart

One reset per distinct session, on its first post-restart turn — exactly the pattern expected when the stored hash was computed against a previous ephemeral port.

Why tests did not catch it

  • src/agents/cli-runner/bundle-mcp.test.ts asserts hash presence and format (/^[0-9a-f]{64}$/) but never asserts stability under loopback port churn.
  • src/agents/cli-session.test.ts tests resolveCliSessionReuse with hand-crafted hashes; there is no end-to-end test that the hash computed at run N+1 equals the hash persisted at run N across a simulated gateway restart.

Suggested fix

Compute mcpConfigHash from the user-authored mergedConfig before additionalConfig (the loopback) is merged in, then merge the loopback on top only for writing the actual mcp.json / CLI args. Rough shape in src/agents/cli-runner/bundle-mcp.ts#prepareCliBundleMcpConfig:

const hashableConfig = mergedConfig; // user-authored only
if (params.additionalConfig) {
  mergedConfig = applyMergePatch(mergedConfig, params.additionalConfig) as BundleMcpConfig;
}
return await prepareModeSpecificBundleMcpConfig({
  mode,
  backend: params.backend,
  mergedConfig,                // includes loopback — used to write mcp.json
  hashSource: hashableConfig,  // excludes loopback — used for session identity
  env: params.env,
});

prepareModeSpecificBundleMcpConfig then hashes params.hashSource ?? params.mergedConfig.
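
A minimal sketch of that fallback, assuming a simplified `BundleMcpConfig` and a standalone helper named `computeMcpConfigHash` (both hypothetical; the real function takes more parameters and also writes the config out):

```typescript
import { createHash } from "node:crypto";

type ServerEntry = { url?: string; command?: string };
type BundleMcpConfig = { mcpServers: Record<string, ServerEntry> };

type HashParams = {
  mergedConfig: BundleMcpConfig; // includes loopback, written to mcp.json
  hashSource?: BundleMcpConfig; // excludes loopback, used for session identity
};

// Hashes hashSource when provided and falls back to mergedConfig otherwise,
// so callers that do not pass hashSource keep their current behavior.
function computeMcpConfigHash(params: HashParams): string {
  const source = params.hashSource ?? params.mergedConfig;
  const serialized = `${JSON.stringify(source, null, 2)}\n`;
  return createHash("sha256").update(serialized).digest("hex");
}

const userOnly: BundleMcpConfig = {
  mcpServers: { notes: { command: "notes-mcp" } },
};

// Assumed loopback entry name "openclaw"; the ports are illustrative.
const withLoopback = (port: number): BundleMcpConfig => ({
  mcpServers: {
    ...userOnly.mcpServers,
    openclaw: { url: `http://127.0.0.1:${port}/mcp` },
  },
});

const fixedA = computeMcpConfigHash({ mergedConfig: withLoopback(62949), hashSource: userOnly });
const fixedB = computeMcpConfigHash({ mergedConfig: withLoopback(51234), hashSource: userOnly });
const legacyA = computeMcpConfigHash({ mergedConfig: withLoopback(62949) });
const legacyB = computeMcpConfigHash({ mergedConfig: withLoopback(51234) });

console.log(fixedA === fixedB); // true
console.log(legacyA === legacyB); // false
```

When no `additionalConfig` is merged, `hashSource` and `mergedConfig` serialize to the same bytes, so hashes stored by deployments without the loopback bridge should remain valid.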

Regression tests to add

  1. Run prepareCliBundleMcpConfig twice with identical user MCP state but two different loopback ports; assert prepared1.mcpConfigHash === prepared2.mcpConfigHash.
  2. Add or modify a real plugin MCP server between the two runs; assert the hash does change (so the fix is not over-corrective).
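
The two cases could look roughly like the following. Since the real `prepareCliBundleMcpConfig` needs the repository's fixtures, this sketch substitutes a hypothetical `prepareForTest` stand-in that applies the proposed fix; the server names and ports are likewise assumptions.

```typescript
import { notStrictEqual, strictEqual } from "node:assert";
import { createHash } from "node:crypto";

type ServerEntry = { url?: string; command?: string };
type BundleMcpConfig = { mcpServers: Record<string, ServerEntry> };

// Hypothetical stand-in for prepareCliBundleMcpConfig with the fix applied:
// hash the user-authored servers first, then merge the loopback entry.
function prepareForTest(userConfig: BundleMcpConfig, loopbackPort: number) {
  const serialized = `${JSON.stringify(userConfig, null, 2)}\n`;
  const mcpConfigHash = createHash("sha256").update(serialized).digest("hex");
  const mergedConfig: BundleMcpConfig = {
    mcpServers: {
      ...userConfig.mcpServers,
      openclaw: { url: `http://127.0.0.1:${loopbackPort}/mcp` },
    },
  };
  return { mergedConfig, mcpConfigHash };
}

const userConfig: BundleMcpConfig = {
  mcpServers: { notes: { command: "notes-mcp" } },
};

// Case 1: identical user MCP state, two different loopback ports.
const prepared1 = prepareForTest(userConfig, 62949);
const prepared2 = prepareForTest(userConfig, 51234);
strictEqual(prepared1.mcpConfigHash, prepared2.mcpConfigHash);
// The written config must still carry the live port.
notStrictEqual(
  prepared1.mergedConfig.mcpServers["openclaw"]?.url,
  prepared2.mergedConfig.mcpServers["openclaw"]?.url,
);

// Case 2: a plugin MCP server is added between runs, so the hash must change.
const changedConfig: BundleMcpConfig = {
  mcpServers: { ...userConfig.mcpServers, search: { command: "search-mcp" } },
};
const prepared3 = prepareForTest(changedConfig, 62949);
notStrictEqual(prepared1.mcpConfigHash, prepared3.mcpConfigHash);
```

In the repository these assertions would call the real function and live alongside the existing cases in src/agents/cli-runner/bundle-mcp.test.ts.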

Regression window

2026-04-04 (merge of #35676) → present. Any release that ships both 12100719b8 and 3de09fbe74 is affected.
