WebChat can silently rotate agent:main:main after gateway restart, hiding prior session/checkpoints #70330

@Squirbie

Summary

After a gateway service reinstall/restart, a WebChat reconnect followed by a normal visible user message caused the active agent:main:main WebChat session to be reset/rotated. The previous transcript was archived to *.jsonl.reset.*, the active session store started pointing at a new sessionId, and the Control UI showed only the new short session and its checkpoint.

From the user's perspective this looked like data loss:

  • prior upper chat history disappeared from the main WebChat view
  • previous compaction checkpoints were no longer shown for the active session
  • the new session started at the first post-reconnect message
  • the agent behaved as if it had no prior context

The transcript was recoverable on disk, but only with manual file/session-store repair.

This appears related to:

This report adds a narrower failure mode: the reset/rotation happened immediately after gateway restart/reconnect even though the user did not intentionally request a new/reset session in the visible message.

Environment

  • OpenClaw CLI: 2026.4.21 (f788c88)
  • Control UI / WebChat client reported in logs: webchat v2026.4.15
  • OS: macOS 26.4.1 arm64
  • Gateway service: macOS LaunchAgent
  • Gateway mode: local loopback, token auth
  • Default WebChat session key: agent:main:main
  • Model: openai-codex/gpt-5.4

What happened

  1. The main WebChat session had been running under:

    key: agent:main:main
    old sessionId: 315349e0-36d8-4edb-a0de-2ebadbffb471
    
  2. Another agent was fixing a gateway warning and ran:

    openclaw gateway install --force && openclaw gateway restart
    
  3. The gateway received SIGTERM, and the WebChat socket disconnected.

  4. The gateway came back up. The Control UI reconnected and fetched chat.history, sessions.list, models, etc.

  5. The user sent a normal visible recovery message in WebChat to continue the previous task.

  6. The prior active transcript was archived:

    ~/.openclaw/agents/main/sessions/315349e0-36d8-4edb-a0de-2ebadbffb471.jsonl.reset.2026-04-22T19-07-46.300Z
    
  7. sessions.json then pointed agent:main:main at a new session:

    key: agent:main:main
    new sessionId: 600f198c-7155-4ed7-b4f8-7eb6eeaf20bb
    sessionFile: ~/.openclaw/agents/main/sessions/600f198c-7155-4ed7-b4f8-7eb6eeaf20bb.jsonl
    
  8. The new transcript's first user message was the post-reconnect recovery message. The stored message did not visibly contain /new or /reset.

  9. The UI now showed the new short session and no longer showed the old session/checkpoints under the active WebChat row.

  10. The new session later compacted with a summary beginning with "No prior history", which made the agent behave as if it had lost context.

Sanitized log evidence

Approximate local timeline:

03:51:52  exec: openclaw gateway install --force && openclaw gateway restart
03:51:55  gateway: SIGTERM received; shutting down
03:51:55  ws: webchat disconnected
03:58:34  gateway ready
04:01:43  ws: webchat connected
04:01:44  ws: chat.history OK
04:01:44  ws: sessions.list OK
04:02:56  ws: second webchat connected
04:06:52  ws: chat.history OK
04:07:46  ws: chat.send OK
04:07:46  previous transcript archived to .jsonl.reset.*
04:16:08  context overflow on new sessionId 600f198c...
04:17:42  auto-compaction succeeded

Disk/session-store evidence:

old live file missing:
  ~/.openclaw/agents/main/sessions/315349e0-36d8-4edb-a0de-2ebadbffb471.jsonl

old archive present:
  ~/.openclaw/agents/main/sessions/315349e0-36d8-4edb-a0de-2ebadbffb471.jsonl.reset.2026-04-22T19-07-46.300Z

old checkpoint files still present:
  315349e0-...checkpoint.14f08e4d-....jsonl
  315349e0-...checkpoint.4de9839f-....jsonl
  315349e0-...checkpoint.b7376126-....jsonl
  ...

new active session file:
  ~/.openclaw/agents/main/sessions/600f198c-7155-4ed7-b4f8-7eb6eeaf20bb.jsonl

sessions.json after the incident:

{
  "agent:main:main": {
    "sessionId": "600f198c-7155-4ed7-b4f8-7eb6eeaf20bb",
    "sessionFile": "~/.openclaw/agents/main/sessions/600f198c-7155-4ed7-b4f8-7eb6eeaf20bb.jsonl",
    "origin": {
      "provider": "webchat",
      "surface": "webchat",
      "chatType": "direct"
    },
    "compactionCount": 1
  }
}

The prior archive had about 1000 JSONL entries, while the new active session had under 200 at the time of inspection.
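
The disk evidence above can be checked mechanically. The following is a minimal sketch, assuming reset archives are named `<sessionId>.jsonl.reset.<timestamp>` with the colons of the timestamp replaced by dashes, as observed in this incident. `parseResetArchive` and `findRotatedSessions` are hypothetical helper names, not OpenClaw APIs. The functions take a plain directory listing and the parsed contents of sessions.json, so nothing here touches the disk.

```javascript
// Hypothetical helpers for illustration; not part of OpenClaw.
// Assumed archive name shape: <uuid>.jsonl.reset.YYYY-MM-DDTHH-MM-SS.mmmZ
const RESET_ARCHIVE_RE =
  /^([0-9a-f-]{36})\.jsonl\.reset\.(\d{4}-\d{2}-\d{2})T(\d{2})-(\d{2})-(\d{2})\.(\d{3})Z$/;

// Split an archive file name into the session id and an ISO 8601 reset time.
function parseResetArchive(fileName) {
  const m = RESET_ARCHIVE_RE.exec(fileName);
  if (!m) return null;
  const [, sessionId, date, hh, mm, ss, ms] = m;
  return { sessionId, fileName, resetAt: `${date}T${hh}:${mm}:${ss}.${ms}Z` };
}

// Report archived sessions that no key in the session store points at any more.
function findRotatedSessions(fileNames, sessionStore) {
  const activeIds = new Set(Object.values(sessionStore).map((e) => e.sessionId));
  const present = new Set(fileNames);
  return fileNames
    .map(parseResetArchive)
    .filter((a) => a !== null && !activeIds.has(a.sessionId))
    .map((a) => ({ ...a, liveFileMissing: !present.has(`${a.sessionId}.jsonl`) }));
}

// Data taken from this report.
const files = [
  "315349e0-36d8-4edb-a0de-2ebadbffb471.jsonl.reset.2026-04-22T19-07-46.300Z",
  "600f198c-7155-4ed7-b4f8-7eb6eeaf20bb.jsonl",
];
const store = {
  "agent:main:main": { sessionId: "600f198c-7155-4ed7-b4f8-7eb6eeaf20bb" },
};
const rotated = findRotatedSessions(files, store);
console.log(rotated);
```

For the listing above, the result contains one entry for 315349e0-36d8-4edb-a0de-2ebadbffb471 with `resetAt` equal to 2026-04-22T19:07:46.300Z and `liveFileMissing` set to true. The archive timestamp is in UTC, whereas the log timeline is in local time.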

Expected behavior

After gateway restart/reconnect:

  • WebChat should continue using the existing active session unless the user explicitly confirms a new/reset action.
  • A normal visible chat message should not silently rotate agent:main:main.
  • If a reset/new action happens, the UI should make it clear:
    • old session id
    • new session id
    • whether this was triggered by UI button, slash command, RPC, or internal recovery
    • where the archived transcript can be found
  • Prior reset archives/checkpoints should remain discoverable from the Control UI.

Actual behavior

  • The active WebChat session key continued to be agent:main:main, but its sessionId rotated from 315349e0... to 600f198c....
  • The old transcript/checkpoints were still on disk, but not visible as the active WebChat history/checkpoints.
  • The visible first message in the new session did not contain /new or /reset, so the user had no obvious explanation for why the session rotated.
  • Manual recovery required:
    • backing up sessions.json and both transcript files
    • copying the .jsonl.reset.* archive back to the old .jsonl name
    • patching sessions.json so agent:main:main pointed back to the old sessionId
    • reattaching checkpoint metadata
    • restarting the gateway

Why this is serious

This is not just a cosmetic history-list issue. It breaks operational continuity for long-running WebChat agents:

  • the agent may lose the actual task context
  • checkpoint UI no longer corresponds to the user's expected session
  • the user may continue in a fresh context and get wrong or nonsensical actions
  • recovery requires low-level edits in ~/.openclaw/agents/*/sessions

For write-capable/local-admin agents, this can create risky follow-up behavior because the user thinks they are continuing the prior state while the agent is actually in a fresh or reconstructed state.

Possible root cause area

From inspecting the installed dist, WebChat reset/new can be triggered through the same chat.send path:

const RESET_COMMAND_RE = /^\/(new|reset)(?:\s+([\s\S]*))?$/i;
...
const resetCommandMatch = message.match(RESET_COMMAND_RE);
if (resetCommandMatch && requestedSessionKey) {
  ...
  const resetResult = await runSessionResetFromAgent(...);
  ...
  const postResetMessage = normalizeOptionalString(resetCommandMatch[2]) ?? "";
  if (postResetMessage) message = postResetMessage;
}

Because the reset command is stripped and only the post-reset message is stored, the resulting transcript can look like a normal first message even though a reset/new command was processed.

That makes this class of bug difficult to diagnose after the fact.
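
To make the stripping effect concrete, the sketch below reuses the regular expression quoted above and reduces the surrounding logic to a hypothetical `storedMessageFor` helper. It is an illustration only: the real handler also performs the reset, and `normalizeOptionalString` is approximated here by trimming and treating an empty string as absent, which is an assumption about its behavior.

```javascript
// Regular expression as quoted from the installed dist.
const RESET_COMMAND_RE = /^\/(new|reset)(?:\s+([\s\S]*))?$/i;

// Hypothetical reduction of the chat.send logic: returns the text that would
// end up in the transcript and whether a reset command was matched.
function storedMessageFor(message) {
  const match = message.match(RESET_COMMAND_RE);
  if (!match) return { resetTriggered: false, stored: message };
  // Assumption: normalizeOptionalString trims and maps "" to undefined.
  const postResetMessage = (match[2] ?? "").trim();
  // Mirrors `if (postResetMessage) message = postResetMessage;`
  return { resetTriggered: true, stored: postResetMessage || message };
}

console.log(storedMessageFor("/reset please continue the migration"));
// { resetTriggered: true, stored: 'please continue the migration' }
console.log(storedMessageFor("please continue the migration"));
// { resetTriggered: false, stored: 'please continue the migration' }
```

Both inputs leave the same text in the transcript, so the stored message alone cannot show whether a reset ran.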

Suggested fixes

Minimum product safety changes:

  1. Do not implement the WebChat "New session" button as handleSendChat("/new"). Use a dedicated RPC such as sessions.reset / sessions.create and include an explicit trigger: "ui-button" or similar audit field.

  2. For WebChat, reject /new and /reset inside generic chat.send unless the request includes an explicit allowSessionControl: true flag generated by a confirmed UI action.

  3. Persist a reset audit event in the session store or transcript:

    {
      "type": "session_reset",
      "trigger": "slash-command | ui-button | rpc | recovery",
      "oldSessionId": "...",
      "newSessionId": "...",
      "sessionKey": "agent:main:main",
      "timestamp": "...",
      "postResetMessagePresent": true
    }

  4. Do not silently strip "/reset some message" down to "some message" without preserving a visible/auditable reset marker.

  5. After gateway restart/reconnect, clear stale pending "new/reset" UI state and stale drafts/idempotency state so a reconnect cannot accidentally replay a reset command.

  6. Surface reset archives/checkpoints in Control UI. Even if agent:main:main intentionally remains one logical row, users need a first-class way to reopen or restore prior reset archives.

  7. Add a confirmation dialog for reset/new on WebChat that shows the current session key and session id. This is especially important for long-running direct WebChat sessions.
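
Suggestions 2 and 3 above could look roughly like the sketch below. All names (`checkSessionControl`, `buildResetAuditEvent`, and the request fields `allowSessionControl` and `trigger`) are hypothetical and follow the proposal in this report, not an existing OpenClaw API. The sketch only decides what should happen and builds the audit record. Performing the reset and persisting the event are out of scope.

```javascript
const RESET_COMMAND_RE = /^\/(new|reset)(?:\s+([\s\S]*))?$/i;

// Hypothetical gate (suggestion 2): slash commands inside chat.send only
// count as session control when the UI explicitly confirmed the action.
function checkSessionControl(request) {
  const match = request.message.match(RESET_COMMAND_RE);
  if (!match) return { action: "send", message: request.message };
  if (request.allowSessionControl !== true) {
    return { action: "reject", reason: "session control requires a confirmed UI action" };
  }
  return {
    action: "reset",
    trigger: request.trigger ?? "slash-command",
    postResetMessage: (match[2] ?? "").trim(),
  };
}

// Hypothetical audit record (suggestion 3), shaped like the JSON proposed above.
function buildResetAuditEvent({ sessionKey, oldSessionId, newSessionId, trigger, postResetMessage, now }) {
  return {
    type: "session_reset",
    trigger,
    oldSessionId,
    newSessionId,
    sessionKey,
    timestamp: now.toISOString(),
    postResetMessagePresent: postResetMessage.length > 0,
  };
}

// A replayed "/reset ..." without the confirmation flag is rejected.
console.log(checkSessionControl({ message: "/reset continue the task" }));

// A confirmed reset yields a decision that can be turned into an audit event.
const decision = checkSessionControl({
  message: "/reset continue the task",
  allowSessionControl: true,
  trigger: "ui-button",
});
const auditEvent = buildResetAuditEvent({
  sessionKey: "agent:main:main",
  oldSessionId: "315349e0-36d8-4edb-a0de-2ebadbffb471",
  newSessionId: "600f198c-7155-4ed7-b4f8-7eb6eeaf20bb",
  trigger: decision.trigger,
  postResetMessage: decision.postResetMessage,
  now: new Date("2026-04-22T19:07:46.300Z"),
});
console.log(auditEvent);
```

Keeping the gate separate from the audit builder means a rejected request never produces a reset record, while every accepted reset carries its trigger and both session ids.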

Workaround used locally

Manual recovery was possible because the old transcript/checkpoints were still on disk:

  1. back up sessions.json and all relevant transcript files
  2. copy 315349e0...jsonl.reset... back to 315349e0...jsonl
  3. patch sessions.json so agent:main:main points back to 315349e0...
  4. reattach checkpoint metadata for existing 315349e0...checkpoint.*.jsonl files
  5. restart the gateway

This restored the old WebChat view, but it is too risky for normal users.
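
Steps 1 to 3 of the workaround can be planned before anything is changed on disk. The following is a minimal sketch with a hypothetical `planSessionRestore` helper. It assumes the sessions.json shape shown earlier in this report and the `.jsonl.reset.` archive naming. It only computes the backups, the copy, and the patched store. It does not cover checkpoint metadata (step 4), because this report does not describe that format, and it does not restart the gateway.

```javascript
// Hypothetical planner for illustration; it never touches the file system.
function planSessionRestore(store, sessionKey, archiveFileName, sessionsDir) {
  const marker = ".jsonl.reset.";
  const idx = archiveFileName.indexOf(marker);
  if (idx <= 0) throw new Error(`not a reset archive: ${archiveFileName}`);
  const current = store[sessionKey];
  if (!current) throw new Error(`unknown session key: ${sessionKey}`);

  const oldSessionId = archiveFileName.slice(0, idx);
  const restoredFile = `${sessionsDir}/${oldSessionId}.jsonl`;
  return {
    // Step 1: files to back up before changing anything.
    backups: ["sessions.json", archiveFileName, `${current.sessionId}.jsonl`],
    // Step 2: copy the archive back to the live transcript name.
    copy: { from: `${sessionsDir}/${archiveFileName}`, to: restoredFile },
    // Step 3: session store with the key pointing at the old session again.
    // Other fields (for example compactionCount) are carried over unchanged
    // and may need manual correction.
    patchedStore: {
      ...store,
      [sessionKey]: { ...current, sessionId: oldSessionId, sessionFile: restoredFile },
    },
  };
}

const dir = "~/.openclaw/agents/main/sessions";
const currentStore = {
  "agent:main:main": {
    sessionId: "600f198c-7155-4ed7-b4f8-7eb6eeaf20bb",
    sessionFile: `${dir}/600f198c-7155-4ed7-b4f8-7eb6eeaf20bb.jsonl`,
    compactionCount: 1,
  },
};
const plan = planSessionRestore(
  currentStore,
  "agent:main:main",
  "315349e0-36d8-4edb-a0de-2ebadbffb471.jsonl.reset.2026-04-22T19-07-46.300Z",
  dir,
);
console.log(JSON.stringify(plan, null, 2));
```

Because the planner returns a new store object instead of mutating its input, the plan can be reviewed or diffed against the current sessions.json before any file is written.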
