[Bug]: Context overflow reset can map sessionFile to nonexistent transcript, orphaning real session history #75151

@MickeMG

Summary

A long-lived Telegram direct agent session hit context-overflow / compaction failure, then the active sessions.json mapping was rotated to a new sessionId whose transcript file was never created. As a result, chat.history / session lookup returned an empty history even though the real prior transcript still existed on disk under the previous session id.

From the user perspective this looked like the agent was "cut off without compaction" and had forgotten all recent work. Manual forensic recovery showed the prior transcript was intact, but orphaned from the active session mapping.

This appears related to the existing context overflow / compaction / session mapping issue family, but this case adds a specific failure mode:

  • reset/rotation chooses a new sessionId
  • sessions.json points the direct session key at that new id and sessionFile
  • the new .jsonl transcript file does not exist
  • the real active/recent work remains in the previous large .jsonl
  • history lookup for the session key returns empty or stale data

Environment

  • OpenClaw: 2026.4.26
  • Runtime: Node v22.22.0
  • OS: Darwin arm64
  • Channel: Telegram direct agent session
  • Provider/model: openai-codex/gpt-5.5
  • Affected pattern: long-lived tool-heavy agent session with many compactions / summaries

Observed state

The session index entry for the Telegram direct agent key pointed to a new session id similar to:

{
  "sessionId": "6855a9b8-...",
  "sessionFile": ".../agents/<agent>/sessions/6855a9b8-....jsonl",
  "compactionCount": 10,
  "authProfileOverrideCompactionCount": 10,
  "systemSent": false,
  "abortedLastRun": false
}

But the referenced file did not exist:

.../agents/<agent>/sessions/6855a9b8-....jsonl -> ENOENT / exists false

Meanwhile, the real recent transcript existed and contained the latest user/assistant/tool work:

.../agents/<agent>/sessions/59eb5fe8-....jsonl
size: ~14 MB
rows: ~5356
latest content: recent successful user/assistant/tool turns

Trajectory evidence for the real transcript showed repeated context/precheck failures:

promptError: "Context overflow: prompt too large for the model (precheck)"
promptErrorSource: "precheck"
context.compiled events truncated at trajectory-event-size-limit
originalBytes often ~273K-515K+

Some runs succeeded, but several subsequent runs failed before assistant output. This created brittle behavior around compaction/recovery and eventually produced the bad mapping to a missing transcript.

Actual behavior

  1. Long-lived session grows very large.
  2. Context precheck overflows repeatedly.
  3. Compaction/truncation attempts occur, but the active transcript is not reliably rotated or shrunk enough to stop the precheck overflow.
  4. Runtime resets/rotates the active session mapping to a new session id.
  5. The new transcript file is not created / prewarmed.
  6. sessions.json points at the missing file.
  7. chat.history / session lookup returns no useful messages.
  8. The agent appears to have lost continuity even though the old transcript still exists.
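To illustrate steps 6 and 7, here is a minimal sketch of a transcript reader that reports a missing file explicitly instead of collapsing it into "empty history". This is not OpenClaw's actual lookup code; `readTranscript` and `HistoryResult` are hypothetical names, and the only assumption about the transcript is that it is a JSONL file with one JSON object per line.

```typescript
import * as fs from "node:fs";

// Hypothetical result type: "missing" is kept distinct from an empty transcript.
type HistoryResult =
  | { status: "ok"; rows: unknown[] }
  | { status: "missing"; sessionFile: string };

// Read a JSONL transcript. ENOENT is surfaced as its own status so a caller
// can trigger recovery instead of silently returning no messages.
function readTranscript(sessionFile: string): HistoryResult {
  let raw: string;
  try {
    raw = fs.readFileSync(sessionFile, "utf8");
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") {
      return { status: "missing", sessionFile };
    }
    throw err; // other I/O errors are not masked
  }
  const rows = raw
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
  return { status: "ok", rows };
}
```

With a distinction like this, a caller could log a recovery warning or fall back to the last valid transcript when it sees `status: "missing"`, rather than presenting the user with a blank history.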

Expected behavior

If OpenClaw rotates/resets a session after compaction/precheck failure, it should guarantee one of these safe outcomes:

  1. The session key points to a real transcript containing a compacted continuity summary, or
  2. The new transcript file is created/prewarmed before the session store is updated, or
  3. The old transcript remains mapped until the new transcript is durable, or
  4. Recovery uses the last valid transcript/checkpoint instead of returning empty history.

At minimum, sessions.json should not point sessionFile at a nonexistent .jsonl.
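A rough sketch of the ordering described in outcomes 2 and 3, to make the request concrete. It is not based on OpenClaw's source: `rotateSession`, `writeDurable`, the `SessionStore` shape (a JSON object keyed by session key) and the `{ type: "summary" }` row format are all assumptions for illustration, modeled only on the entry shown in this report. The idea is to write and fsync the new transcript first, then swap sessions.json via temp file plus rename, and keep the old mapping if anything fails.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical entry shape, modeled on the sessions.json entry in this report.
interface SessionEntry {
  sessionId: string;
  sessionFile: string;
  [key: string]: unknown;
}
type SessionStore = Record<string, SessionEntry>;

// Write a file and fsync it so the content is on disk before it is referenced.
function writeDurable(filePath: string, data: string): void {
  const fd = fs.openSync(filePath, "w");
  try {
    fs.writeSync(fd, data);
    fs.fsyncSync(fd);
  } finally {
    fs.closeSync(fd);
  }
}

// Rotate sessionKey to newSessionId. The store is rewritten only after the
// new transcript exists; on any failure the previous mapping is left intact.
function rotateSession(
  storePath: string,
  sessionKey: string,
  newSessionId: string,
  continuitySummary: string,
): { rotated: boolean; entry: SessionEntry | undefined; warning?: string } {
  const store = JSON.parse(fs.readFileSync(storePath, "utf8")) as SessionStore;
  const previous = store[sessionKey];
  const sessionsDir = previous
    ? path.dirname(previous.sessionFile)
    : path.dirname(storePath);
  const newFile = path.join(sessionsDir, `${newSessionId}.jsonl`);
  try {
    // 1. Create the new transcript, seeded with a continuity summary row.
    const firstRow = JSON.stringify({ type: "summary", text: continuitySummary });
    writeDurable(newFile, firstRow + "\n");
    // 2. Only then swap the mapping: write a temp store and rename it over
    //    the real one, so readers never see a half-written sessions.json.
    const entry: SessionEntry = {
      ...(previous ?? {}),
      sessionId: newSessionId,
      sessionFile: newFile,
    };
    const next: SessionStore = { ...store, [sessionKey]: entry };
    const tmp = `${storePath}.tmp`;
    writeDurable(tmp, JSON.stringify(next, null, 2));
    fs.renameSync(tmp, storePath);
    return { rotated: true, entry };
  } catch (err) {
    // 3. Keep the old mapping and surface an explicit warning.
    return {
      rotated: false,
      entry: previous,
      warning: `rotation failed, kept previous transcript: ${String(err)}`,
    };
  }
}
```

If the store write fails after the transcript was created, this sketch leaves an unreferenced new .jsonl behind, which seems much less harmful than a mapping that points at a file that does not exist.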

Why this matters

For long-lived agents, this becomes a practical "memory wipe" even though the data is still present on disk. It is especially painful for tool-heavy business/ops agents where compaction is supposed to preserve working state.

Related issues

Possibly related / same bug family:

Suspected root cause

Based on local forensics, likely a combination of:

  • oversized append-only transcript
  • compaction target / precheck budget mismatch
  • no active transcript rotation/shrink after compaction in this path
  • reset path updating session metadata before the new transcript exists
  • queued/nested followups possibly retaining stale sessionFile references

Suggested fixes / mitigations

  • Ensure the compaction-success path can rotate or shrink the active transcript when it remains too large.
  • Add a guard: never persist a sessionFile path in sessions.json unless the file exists or has just been durably created.
  • In reset/rotation path, prewarm/write the new transcript before switching the session key mapping.
  • If new transcript creation fails, keep mapping to the previous valid transcript and surface an explicit recovery warning.
  • Consider a max active transcript byte cap for individual .jsonl files, not only session index maintenance.
  • Add a repair command/check that detects sessions.json entries pointing at missing transcript files and either suggests or remaps to the most recent valid transcript/checkpoint.
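As a starting point for the repair check in the last bullet, here is a small report-only sketch. It assumes sessions.json is a JSON object keyed by session key with `sessionId` / `sessionFile` fields as in the entry above; `findOrphanedSessions`, `newestTranscript` and `Finding` are hypothetical names, not existing OpenClaw commands. Picking the most recently modified non-empty .jsonl in the same directory is only a heuristic, since that directory may hold transcripts for other session keys, so the sketch suggests a candidate and never remaps on its own.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical shapes, modeled on the sessions.json entry in this report.
interface SessionEntry {
  sessionId: string;
  sessionFile: string;
  [key: string]: unknown;
}
type SessionStore = Record<string, SessionEntry>;

interface Finding {
  sessionKey: string;
  missingFile: string;
  candidate: string | null; // suggested transcript, or null if none found
}

// Most recently modified, non-empty .jsonl file in dir (heuristic only).
function newestTranscript(dir: string): string | null {
  if (!fs.existsSync(dir)) return null;
  let best: { file: string; mtimeMs: number } | null = null;
  for (const name of fs.readdirSync(dir)) {
    if (!name.endsWith(".jsonl")) continue;
    const file = path.join(dir, name);
    const stat = fs.statSync(file);
    if (!stat.isFile() || stat.size === 0) continue;
    if (best === null || stat.mtimeMs > best.mtimeMs) {
      best = { file, mtimeMs: stat.mtimeMs };
    }
  }
  return best ? best.file : null;
}

// Report every session key whose sessionFile does not exist on disk.
function findOrphanedSessions(storePath: string): Finding[] {
  const store = JSON.parse(fs.readFileSync(storePath, "utf8")) as SessionStore;
  const findings: Finding[] = [];
  for (const [sessionKey, entry] of Object.entries(store)) {
    if (fs.existsSync(entry.sessionFile)) continue;
    findings.push({
      sessionKey,
      missingFile: entry.sessionFile,
      candidate: newestTranscript(path.dirname(entry.sessionFile)),
    });
  }
  return findings;
}
```

A real repair command would probably want a stronger link between session key and transcript than mtime, for example a checkpoint record or the previous sessionId, before offering an automatic remap.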
