Heartbeat isolatedSession=true replays prior heartbeat context, causing deterministic overflow/restart loop #84218

@reboost-openclaw-team

Summary

Heartbeat runs configured with isolatedSession=true and lightContext=true can still receive a large replay of prior heartbeat context. The docs describe isolatedSession: true as a "fresh session each run (no conversation history)", but the compiled prompt can include context-engine summaries and prior assistant/tool heartbeat outputs associated with the stable heartbeat session key.

On our production VPS this became a deterministic loop:

  • heartbeat session key: agent:trent:main:heartbeat
  • configured model before mitigation: ollama/nemotron-3-nano:30b
  • estimated prompt: ~124,349 tokens
  • prompt budget before reserve: ~111,616 tokens
  • overflow: ~12,733 tokens
  • messages reported by overflow diagnostic: 70
  • auto-compaction attempts succeeded or were retried, but the same precheck then failed again
  • after attempt 3/3, OpenClaw restarted the heartbeat session with a new session id, and the cycle repeated on the next tick

In one 6-hour window on this deployment alone, we observed approximately 280 overflow prechecks and 70 restart cycles. The first visible crossing of the nemotron context limit was no later than 2026-05-17T22:15:27Z, and the pattern continued afterward until we mitigated locally by moving the heartbeat lane to a larger-context model and reducing cadence.
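The figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the reported numbers. The tick count assumes the 5-minute cadence from the repro config was in effect for the whole window, and the variable names are illustrative.

```python
# Figures copied from the report; counts for the window are approximate.
estimated_prompt_tokens = 124_349
prompt_budget_tokens = 111_616

overflow_tokens = estimated_prompt_tokens - prompt_budget_tokens
overflow_ratio = overflow_tokens / prompt_budget_tokens

overflow_prechecks = 280
restart_cycles = 70
prechecks_per_cycle = overflow_prechecks / restart_cycles

# Assumption: heartbeat cadence of 5 minutes across the 6-hour window.
ticks_in_window = (6 * 60) // 5

print(overflow_tokens)           # 12733
print(f"{overflow_ratio:.1%}")   # 11.4%
print(prechecks_per_cycle)       # 4.0
print(ticks_in_window)           # 72
```

Four prechecks per restart cycle would be consistent with one initial precheck plus one after each of the three compaction attempts, and 70 restart cycles against roughly 72 ticks suggests that nearly every tick in the window failed.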

Repro config shape

A single agent heartbeat is enough when the model context window is smaller than the accumulated replay:

{
  "agents": {
    "list": [
      {
        "id": "trent",
        "heartbeat": {
          "every": "5m",
          "model": "ollama/nemotron-3-nano:30b",
          "isolatedSession": true,
          "lightContext": true,
          "target": "none"
        }
      }
    ]
  }
}

The same class of failure should reproduce with any model whose usable prompt window is about 112K tokens or smaller, once enough heartbeat output has accumulated.
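To give a rough sense of how quickly a given window could be exhausted, here is a minimal sketch that assumes the replayed context grows linearly per tick. The base size and per-tick growth below are hypothetical placeholders, not measurements from this deployment, and real growth is likely uneven because of compaction and summaries.

```python
def ticks_until_overflow(budget_tokens: int, base_tokens: int, tokens_per_tick: int) -> int:
    """Smallest n such that base_tokens + n * tokens_per_tick > budget_tokens.

    Assumes simple linear growth of the replayed heartbeat context.
    """
    if tokens_per_tick <= 0:
        raise ValueError("tokens_per_tick must be positive")
    if base_tokens > budget_tokens:
        return 0
    return (budget_tokens - base_tokens) // tokens_per_tick + 1


# Budget is the reported figure; the other two values are hypothetical.
ticks = ticks_until_overflow(budget_tokens=111_616, base_tokens=14_000, tokens_per_tick=1_500)
hours = ticks * 5 / 60  # with the "every": "5m" cadence from the config above

print(ticks)  # 66
print(hours)  # 5.5
```

Under these placeholder numbers the budget would be crossed within a few hours of uptime, so the model choice mostly changes when the loop starts, not whether it starts.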

Documentation vs observed behavior

Docs say:

  • isolatedSession: true = "fresh session each run (no conversation history)"
  • lightContext: true = "only inject HEARTBEAT.md from bootstrap files"

Observed behavior:

  • isolatedSession=true creates a new session id, but not a fresh model context.
  • lightContext=true trims bootstrap files only; it does not stop context-engine/session replay of prior heartbeat summaries, assistant outputs, or tool results.
  • Prior heartbeat no-change outputs can be promoted into future heartbeat context, increasing each future prompt.

Source-read root cause

From reading the installed 2026.5.18 dist source, the substrate appears to:

  • derive a stable isolated heartbeat session key like <base>:heartbeat
  • call resolveCronSession(... forceNew: true ...) to create a new session id
  • then pass SessionKey: runSessionKey, where runSessionKey is the stable isolated heartbeat session key
  • pass bootstrapContextMode="lightweight" for lightContext=true

So the session id is fresh, but context is still rebuilt against a stable heartbeat session key that can hydrate old heartbeat activity.
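The mismatch can be illustrated with a toy model. This is not OpenClaw's code: the in-memory store and the `run_heartbeat_tick` function are hypothetical stand-ins that only mirror the shape described above, a fresh session id per tick combined with history looked up by a stable session key.

```python
import uuid
from collections import defaultdict

# Hypothetical stand-in for whatever store hydrates context by session key.
history_by_key = defaultdict(list)


def run_heartbeat_tick(base_key: str, tick: int) -> dict:
    session_key = f"{base_key}:heartbeat"  # stable across ticks
    session_id = str(uuid.uuid4())         # fresh on every tick (like forceNew)
    # History is looked up by key, so the fresh id does not isolate anything.
    replayed = list(history_by_key[session_key])
    history_by_key[session_key].append(f"tick {tick}: no change")
    return {"session_id": session_id, "session_key": session_key, "replayed": replayed}


first = run_heartbeat_tick("agent:trent:main", 1)
second = run_heartbeat_tick("agent:trent:main", 2)

print(first["session_id"] != second["session_id"])  # True
print(second["session_key"])                        # agent:trent:main:heartbeat
print(second["replayed"])                           # ['tick 1: no change']
```

In this model the second tick has a different session id but still sees the first tick's output, which is the behavior observed in the compiled context.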

Evidence from compiled context

A failing heartbeat trajectory context.compiled event showed:

  • system prompt original chars: 55,152
  • messages retained in trajectory: 65
  • original message array length: 70
  • visible retained messages included:
    • 6 user/context summaries
    • 54 assistant heartbeat/no-change outputs
    • 4 heartbeat tool results
    • a truncation marker
  • the replayed messages were prior heartbeat summaries and noisy no-change heartbeat replies, not current tick data

The actual heartbeat transcript .jsonl was absent after precheck failure; the evidence was in the trajectory file.
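The retained-message breakdown above adds up as follows. This is a small tally of the reported counts only, with the single truncation marker counted as one retained message.

```python
# Counts copied from the context.compiled evidence above.
retained_by_kind = {
    "user/context summaries": 6,
    "assistant heartbeat/no-change outputs": 54,
    "heartbeat tool results": 4,
    "truncation marker": 1,
}
original_message_count = 70

retained_total = sum(retained_by_kind.values())
not_retained = original_message_count - retained_total
assistant_share = retained_by_kind["assistant heartbeat/no-change outputs"] / retained_total

print(retained_total)            # 65
print(not_retained)              # 5
print(f"{assistant_share:.1%}")  # 83.1%
```

Most of the retained context is therefore prior assistant heartbeat output rather than anything specific to the current tick.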

Expected behavior

When isolatedSession=true, a heartbeat tick should be bounded and fresh by default:

  • no prior heartbeat assistant replies
  • no prior heartbeat tool results
  • no context-engine replay of previous heartbeat ticks
  • include only current HEARTBEAT.md, current time, pending system events/commitments, and explicitly configured bounded context

If preserving some heartbeat history is desired, it should be opt-in and bounded.

Actual behavior

Prior heartbeat activity is replayed into the next heartbeat prompt despite a fresh session id. Once the replay exceeds the model context window, reactive compaction/restart does not solve it because the same oversized context is regenerated on retry.
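A minimal sketch of why the retries cannot converge, under the assumption that every attempt rebuilds the prompt from the same stored history for the stable session key. The function names and the retry structure are hypothetical; only the budget, the prompt estimate, and the 3-attempt limit come from the report.

```python
PROMPT_BUDGET_TOKENS = 111_616
MAX_COMPACTION_ATTEMPTS = 3


def rebuild_prompt_tokens(stored_history_tokens: int) -> int:
    # Assumption: the prompt is regenerated from history stored under the
    # stable key, so compacting the in-flight session does not shrink it.
    return stored_history_tokens


def run_tick(stored_history_tokens: int) -> dict:
    prechecks = 0
    # One initial precheck, plus one after each compaction attempt.
    for _attempt in range(MAX_COMPACTION_ATTEMPTS + 1):
        prechecks += 1
        if rebuild_prompt_tokens(stored_history_tokens) <= PROMPT_BUDGET_TOKENS:
            return {"ok": True, "prechecks": prechecks}
    # Attempts exhausted: the session id is restarted, the history is not.
    return {"ok": False, "prechecks": prechecks}


results = [run_tick(124_349) for _ in range(3)]  # three consecutive ticks
total_prechecks = sum(r["prechecks"] for r in results)
restarts = sum(1 for r in results if not r["ok"])

print(total_prechecks)  # 12
print(restarts)         # 3
```

Because nothing in the loop changes the input that produces the oversized prompt, every tick fails the same way, which matches the deterministic pattern described above.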

Impact

A quiet maintenance feature can become a load loop:

  • repeated context-overflow prechecks
  • repeated auto-compaction attempts
  • repeated heartbeat session restarts
  • degraded /readyz from event-loop delay/utilization
  • no useful heartbeat maintenance work completed

Suggested fix

Please do one of the following:

  1. A real ephemeral heartbeat mode that prevents any prior heartbeat output/context-engine replay from entering the next tick.
  2. Make isolatedSession=true enforce no prior heartbeat history by default.
  3. Add an explicit bound knob such as heartbeat.maxHistoryMessages, heartbeat.maxContextMessages, or heartbeat.replayHistory=false.

Also consider preventing notify=false / no-change heartbeat outputs from being promoted into future heartbeat context.
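As an illustration of option 3 combined with the no-change filter, a history selector might look like the sketch below. The parameter names echo the suggested `heartbeat.replayHistory` and `heartbeat.maxHistoryMessages` knobs, which do not exist today, and the `Message` type and `no_change` flag are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str
    no_change: bool = False  # hypothetical flag for no-change heartbeat replies


def select_heartbeat_history(
    messages: list[Message],
    replay_history: bool = False,
    max_history_messages: int = 0,
) -> list[Message]:
    """Return the bounded slice of prior heartbeat messages to replay."""
    # Default is fully ephemeral: nothing from earlier ticks is replayed.
    if not replay_history or max_history_messages <= 0:
        return []
    # Never promote no-change outputs into future heartbeat context.
    useful = [m for m in messages if not m.no_change]
    return useful[-max_history_messages:]


prior = [
    Message("assistant", "disk usage crossed 80%"),
    Message("assistant", "no change", no_change=True),
    Message("tool", "backup completed"),
    Message("assistant", "no change", no_change=True),
    Message("assistant", "certificate renewed"),
]

print(select_heartbeat_history(prior))  # []
bounded = select_heartbeat_history(prior, replay_history=True, max_history_messages=2)
print([m.text for m in bounded])  # ['backup completed', 'certificate renewed']
```

Making the empty result the default keeps isolatedSession=true aligned with the documented "no conversation history" behavior, while the bound gives an opt-in path for deployments that want some continuity.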

Metadata

    Labels

      • P1 = High-priority user-facing bug, regression, or broken workflow.
      • clawsweeper:needs-live-repro = ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
      • impact:crash-loop = Crash, hang, restart loop, or process-level availability failure.
      • impact:session-state = Session, memory, transcript, context, or agent state can drift or corrupt.
      • issue-rating: 🐚 platinum hermit = Good issue quality with a plausible reproduction path needing some confirmation.
