[Bug]: Lost 2 days of agent context to silent compaction — no warning, no save, no recovery #5429

@EmpireCreator

Summary

Lost ~45 hours of agent work and context to compaction, with no automatic save mechanism. The agent accumulated significant context (skills installed, integrations configured, user priorities discussed) but wasn't writing it to disk in real time. When compaction cleared the history, all of that context was lost.

The loss was discovered when the agent reported having no memory of the prior 45 hours and asked what had been discussed. The user had to re-explain priorities, decisions, and completed work from scratch. The agent couldn't self-recover — it didn't know what it didn't know.

The memoryFlush config option exists but isn't enabled by default, and even when enabled it still relies on the agent actually complying with the prompt. I'm suggesting automatic checkpointing before compaction that doesn't depend on agent behavior.

Steps to reproduce

  1. Start a long-running main session (multi-day)
  2. Do significant work: install skills, configure integrations, discuss priorities, make decisions
  3. Don't explicitly write context to workspace files (rely on in-context memory)
  4. Let context reach the compaction threshold (roughly 90% usage or higher)
  5. Compaction clears history silently
  6. Ask agent about prior work — it has no memory of it

Expected behavior

Either:

  • Agent is prompted/forced to save important context before compaction (memoryFlush on by default), OR
  • OpenClaw automatically checkpoints critical session state to disk before compaction, OR
  • Clear warning appears when context is nearing limit so user/agent can act
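
To make the third option concrete, here is a minimal sketch of the threshold check such a warning could use. The function name `context_warning`, its arguments, and the 80% default are all hypothetical; nothing here is an existing OpenClaw API, and how the token counts would be obtained is left open.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of OpenClaw.
# context_warning USED_TOKENS LIMIT_TOKENS [THRESHOLD_PERCENT]
# Prints a warning and returns 0 when usage is at or above the threshold;
# prints nothing and returns 1 when usage is below it.
context_warning() {
  local used=$1 limit=$2 threshold=${3:-80}
  if (( limit <= 0 )); then
    echo "context_warning: limit must be positive" >&2
    return 2
  fi
  # Integer percentage, rounded down.
  local pct=$(( used * 100 / limit ))
  if (( pct >= threshold )); then
    echo "WARNING: context at ${pct}% (${used}/${limit} tokens); save state before compaction"
    return 0
  fi
  return 1
}
```

For example, `context_warning 170000 200000` reports 85% usage, while `context_warning 100000 200000` stays silent and returns 1.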

Actual behavior

  • Compaction happens silently with no warning
  • Nothing is saved to disk automatically
  • Agent loses all accumulated context
  • User must re-explain everything
  • No indication that compaction occurred or what was lost

Environment

  • OpenClaw version: 2026.1.29
  • OS: macOS Tahoe 26.2 (25C56)
  • Install method: npm (global)
  • Model: anthropic/claude-opus-4-5-20251101
  • Session type: Main session, long-running
  • Initial install: January 28, 2026 (version 2026.1.24-3)
  • Updated mid-session: January 30, 2026 → version 2026.1.29
  • Note: An update and restart occurred during the ~45-hour context gap; it is unclear whether the restart contributed to the context loss or whether compaction alone caused it

Workaround implemented

We implemented a multi-layer protection scheme, though its effectiveness and impact on performance are untested:

1. Enabled compaction.memoryFlush in config

"compaction": {
  "mode": "safeguard",
  "memoryFlush": {
    "enabled": true,
    "softThresholdTokens": 8000,
    "prompt": "Write everything important from this session to memory/YYYY-MM-DD.md immediately. Use memory-log or edit directly. Include: decisions made, tasks completed, context needed for continuity. Reply NO_REPLY when done.",
    "systemPrompt": "CRITICAL: Session nearing compaction. You MUST save all important context to memory/YYYY-MM-DD.md NOW or it will be lost forever."
  }
}

2. Created a memory-log skill for frictionless logging

Install:

mkdir -p ~/.openclaw/skills/memory-log
# Create the script below at ~/.openclaw/skills/memory-log/memory-log
chmod +x ~/.openclaw/skills/memory-log/memory-log
# Assumes ~/bin is on your PATH; otherwise add it or call the script by its full path
mkdir -p ~/bin && ln -sf ~/.openclaw/skills/memory-log/memory-log ~/bin/memory-log

Script (~/.openclaw/skills/memory-log/memory-log):

#!/bin/bash
MEMORY_DIR="${CLAWD_WORKSPACE:-$HOME/<workspace>}/memory"
TODAY=$(date +%Y-%m-%d)
FILE="$MEMORY_DIR/$TODAY.md"
TIME=$(date +%H:%M)

mkdir -p "$MEMORY_DIR"

# Health check mode
if [[ "$1" == "--check" ]]; then
  [[ ! -f "$FILE" ]] && echo "⚠️ MISSING" && exit 1
  SIZE=$(wc -c < "$FILE" | tr -d ' ')
  [[ $SIZE -lt 100 ]] && echo "⚠️ SPARSE ($SIZE bytes)" && exit 1
  echo "✅ OK ($SIZE bytes)"
  exit 0
fi

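# Section mode: memory-log -s "Section Name" "entry text"
# Note: the entry is appended to the end of the file, so it lands under the
# named section only when that section is the last one in the file.
if [[ "$1" == "-s" ]]; then
  [[ -z "$2" || -z "$3" ]] && echo 'Usage: memory-log -s "Section Name" "entry text"' >&2 && exit 2
  [[ ! -f "$FILE" ]] && echo "# $TODAY" > "$FILE"
  # -x -F: match the heading as a whole literal line, not as a regex prefix
  grep -qxF -- "## $2" "$FILE" || printf '\n## %s\n' "$2" >> "$FILE"
  echo "- [$TIME] $3" >> "$FILE"
  exit 0
fi

# Default: append timestamped entry (refuse to log an empty one)
[[ -z "$1" ]] && echo 'Usage: memory-log "entry text"' >&2 && exit 2
[[ ! -f "$FILE" ]] && echo "# $TODAY" > "$FILE"
echo "- [$TIME] $1" >> "$FILE"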

Usage:

memory-log "installed coding-agent skill"
memory-log -s "Config Changes" "enabled memoryFlush"
memory-log --check  # Returns exit code 1 if file missing/sparse

3. Added HEARTBEAT.md memory check

## Memory Health Check (EVERY heartbeat - do this first!)
1. Check if `memory/YYYY-MM-DD.md` exists for today
2. If missing or <100 bytes and there's been session activity → Alert user immediately
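
Because the heartbeat check above still depends on the agent running it, the same test could also be run outside the agent, for instance from cron. A minimal sketch, assuming the memory file layout used by the script in step 2; the function name `memory_health` and its arguments are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: an agent-independent version of the heartbeat check.
# memory_health FILE [MIN_BYTES]
# Prints MISSING, SPARSE, or OK; returns non-zero for MISSING and SPARSE.
memory_health() {
  local file=$1 min_bytes=${2:-100} size
  if [[ ! -f "$file" ]]; then
    echo "MISSING"
    return 1
  fi
  # wc -c pads its output on some platforms (e.g. macOS); strip the spaces.
  size=$(wc -c < "$file" | tr -d ' ')
  if (( size < min_bytes )); then
    echo "SPARSE"
    return 1
  fi
  echo "OK"
}

# Example call (assumes CLAWD_WORKSPACE is set, as in the memory-log script):
#   memory_health "$CLAWD_WORKSPACE/memory/$(date +%Y-%m-%d).md" || echo "alert the user"
```

This only detects a missing or near-empty log; it cannot tell whether the entries that were written are the important ones.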

4. Added rule to AGENTS.md

### ⚠️ REAL-TIME LOGGING RULE (Non-Negotiable)
After completing ANY significant work item, **immediately** append it to `memory/YYYY-MM-DD.md`. 
Do not batch. Do not wait until end of session. Compaction can happen at any time and will erase 
everything not written to disk.

Why this workaround isn't ideal

  • Complex setup — New users won't know to do any of this until they've lost data
  • Still relies on agent discipline — If agent ignores the memoryFlush prompt, data is still lost
  • Documentation gap — Memory persistence strategies aren't prominently documented
  • Default behavior is surprising — Out-of-box experience leads to silent data loss for long-running sessions
  • Uncertain impact — Unclear if this workaround negatively affects OpenClaw's performance or capabilities; would appreciate guidance on whether this approach is sound or if there's a better pattern

Suggested improvements

  1. Default memoryFlush to enabled — With a sensible default prompt
  2. Automatic session checkpoint — Serialize critical state to disk before compaction (agent-independent)
  3. Pre-compaction warning — Inject visible warning at 80-90% context usage
  4. Post-compaction summary — Inject "Session was compacted. Previous context summary: ..." so agent knows what was lost
  5. Documentation — Prominent "Memory & Persistence" guide explaining compaction behavior and best practices
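
As a rough illustration of item 2, an agent-independent checkpoint can be as simple as copying a file to a timestamped backup before compaction runs. This is a sketch only: the function name `checkpoint_file` is hypothetical, and where OpenClaw keeps session state on disk (if anywhere) is an assumption this issue does not establish, so the source path is left as a parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of OpenClaw.
# checkpoint_file SRC DEST_DIR
# Copies SRC to DEST_DIR/<basename>.<timestamp> and prints the new path.
checkpoint_file() {
  local src=$1 dest_dir=$2 stamp dest
  if [[ ! -f "$src" ]]; then
    echo "checkpoint_file: no such file: $src" >&2
    return 1
  fi
  mkdir -p "$dest_dir" || return 1
  # Second-level timestamp: two checkpoints of the same file within one
  # second would overwrite each other.
  stamp=$(date +%Y%m%dT%H%M%S)
  dest="$dest_dir/$(basename "$src").$stamp"
  # -p preserves the modification time and permissions of the original.
  cp -p "$src" "$dest" && echo "$dest"
}
```

The point of the design is that the copy happens unconditionally before compaction, so nothing depends on the agent choosing to comply with a prompt.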

Logs or screenshots

No sensitive logs to share. The issue is behavioral — compaction works as designed, but the default behavior leads to unexpected data loss for users expecting session continuity.


Labels: bug, stale
