memory(action=replace) silently clobbers external writes to MEMORY.md (data-loss race with patch tool / shell appends / concurrent sessions)

## Summary

`memory(action="replace")` flushes the memory tool's full internal state to `~/.hermes/memories/MEMORY.md`, silently overwriting any content that was written to the file by external writers (the `patch` tool, shell redirects, manual edits, other concurrent sessions). There is no merge, no conflict detection, and no warning when the on-disk file has drifted from the tool's view of it.

In practice, any agent that mixes `memory(action=replace)` with file-level edits to `MEMORY.md` has a latent data-loss bug. Two concurrent sessions on the same agent will hit it every time one session writes to the file directly and the other then calls `replace` from its stale state.

## Reproduction

1. Agent has MEMORY.md with 3 entries on disk (total ~8KB), of which only 1 entry is currently in the memory tool's internal state (the others were written via `patch` tool or shell append in a prior session that has since exited).
2. New session starts. System-prompt-injected memory shows only the 1 known entry.
3. Model calls `memory(action="replace", old_text="<substring>", content="<correction>")` to update that entry.
4. Memory tool faithfully replaces its 1-entry state and writes the resulting 1-entry state to MEMORY.md.
5. The other ~7KB of content on disk that the memory tool never knew about is gone. No error, no warning, no .bak rotation that includes the lost content (the rotation captures only the prior memory-tool state).

We reproduced this on a production agent ("Jason", running v0.13.0 / v2026.5.7) on 2026-05-14 at ~21:35 ET. The vendor master, standing orders, and open-orders sections that had been built via the `patch` tool in an earlier session were lost. Our backups predate that work, so they could not restore it.

Two concurrent sessions on the same agent made the race almost inevitable: Session A patched the file via the `patch` tool; Session B (started afterward, with its system prompt seeded only from the old memory state) called `memory(action="replace")`; and B's flush clobbered A's writes.
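The sequence above can be simulated without Hermes at all. The sketch below is a toy model, not the real memory tool: `ToyMemoryStore`, `ENTRY_SEP`, and `simulate` are hypothetical names, and the entry delimiter is an assumption made only so the example has something to join on. It shows how a full-state flush from a stale view drops content that another writer added to the file.

```python
import tempfile
from pathlib import Path

ENTRY_SEP = "\n---\n"  # hypothetical delimiter, NOT the real on-disk format


class ToyMemoryStore:
    """Hypothetical stand-in for the memory tool: holds a view, flushes all of it."""

    def __init__(self, path: Path, known_entries: list[str]):
        self.path = path
        # The view is passed in explicitly to model a session seeded with stale state.
        self.entries = list(known_entries)

    def replace(self, old_text: str, content: str) -> None:
        # Swap any entry containing old_text; other entries are left as they are.
        self.entries = [content if old_text in e else e for e in self.entries]
        # Full-state flush: the file is overwritten without being re-read first.
        self.path.write_text(ENTRY_SEP.join(self.entries), encoding="utf-8")


def simulate() -> tuple[str, str]:
    """Return the file contents just before and just after the stale flush."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "MEMORY.md"
        path.write_text("entry one: old fact", encoding="utf-8")

        # Session B holds a view containing only the one known entry.
        session_b = ToyMemoryStore(path, ["entry one: old fact"])

        # An external writer (patch tool, shell append, Session A) adds content.
        with path.open("a", encoding="utf-8") as f:
            f.write(ENTRY_SEP + "vendor master: built via external write")

        before = path.read_text(encoding="utf-8")
        session_b.replace("old fact", "entry one: corrected fact")
        after = path.read_text(encoding="utf-8")
        return before, after


if __name__ == "__main__":
    before, after = simulate()
    print("before:", repr(before))
    print("after: ", repr(after))
```

In this toy model the externally appended section is present in `before` and absent from `after`, with no exception raised at any point, which mirrors steps 4 and 5 of the reproduction.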

## Root cause (suspected)

The memory tool treats `MEMORY.md` as canonical-from-tool — i.e., the file is just a serialized view of the tool's internal entry list. But the file is also documented as something agents and users can write to directly (the v0.13 install runbook even uses `cat >> ~/.hermes/memories/MEMORY.md <<EOF` in onboarding). Those two contracts are incompatible without merge or locking.

Additionally, `replace` will accept a model-provided `old_text` that does not match any current entry in the tool's view, silently doing nothing meaningful but still flushing state — so a model that picks `replace` when `add` was correct (a frequent Anthropic-model behavior we've observed on this codepath) can shrink memory dramatically without raising any error.
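As a rough illustration of the stricter behavior this implies, the sketch below rejects a `replace` whose `old_text` does not identify exactly one entry, and does so before anything would be written. It assumes the tool's state can be modeled as a plain list of strings; `strict_replace` and `ReplaceError` are hypothetical names, not part of the Hermes codebase.

```python
class ReplaceError(ValueError):
    """Raised when old_text does not identify exactly one entry."""


def strict_replace(entries: list[str], old_text: str, content: str) -> list[str]:
    """Return a new entry list with the single matching entry replaced.

    Raises ReplaceError on zero or multiple matches, so the caller never
    reaches the flush step with an unchanged or ambiguous state.
    """
    if not old_text:
        raise ReplaceError("old_text must be non-empty")

    matches = [i for i, entry in enumerate(entries) if old_text in entry]
    if len(matches) != 1:
        raise ReplaceError(
            f"old_text matched {len(matches)} entries; expected exactly 1, "
            "nothing was written"
        )

    updated = list(entries)  # copy, so the caller's state is untouched on error
    updated[matches[0]] = content
    return updated
```

Returning a new list rather than mutating in place means a failed call leaves the tool's state exactly as it was, so the model can be told to retry with `add` instead.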

## Suggested fixes (any of these would help; durable fix is some combination)

1. **Merge-on-write.** Before writing, re-read MEMORY.md from disk. Merge external entries (anything not in tool state) into the new write. Raise an error if a merge conflict is unresolvable.
2. **Guardrail on shrinkage.** Refuse a write that would reduce file size by more than some threshold (e.g. 50%) without an explicit `--force` or model acknowledgment.
3. **Conflict detection via mtime/hash.** Read mtime+hash at session start; before write, re-check. If file changed externally, raise an error (parallel to what the `patch` tool already does — Jason's earlier session got "file was modified since you last read it on disk" from `patch`, which is exactly the right behavior).
4. **Split the file.** `MEMORY.md` for tool-managed entries, `MEMORY_user.md` for file-level/external content. Tool never touches the user file.
5. **Tighten `replace` semantics.** If `old_text` doesn't uniquely match an existing entry, return an error instead of treating it as a no-op + state flush.
6. **Prompt-level mitigation (interim).** The memory tool description says `replace` is for "update existing -- old_text identifies it" but Anthropic models still reach for `replace` when `add` is correct. Strengthen the description so the model defaults to `add` for new corrections and only uses `replace` when explicitly correcting an existing exact entry.
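Fixes 2 and 3 can be combined in a single write path. The following is a minimal sketch under stated assumptions, not a patch against the actual tool: `file_digest`, `guarded_write`, the two exception classes, and the `max_shrink` and `force` parameters are all hypothetical. It records a SHA-256 digest when the file is loaded, refuses to write if the on-disk digest has changed, and refuses a write that would shrink the file past a threshold unless forced.

```python
import hashlib
import os
import tempfile
from pathlib import Path


class ExternalModificationError(RuntimeError):
    """The file changed on disk after the tool last read it."""


class ShrinkGuardError(RuntimeError):
    """The write would shrink the file by more than the allowed fraction."""


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes (of empty bytes if the file is missing)."""
    data = path.read_bytes() if path.exists() else b""
    return hashlib.sha256(data).hexdigest()


def guarded_write(path: Path, new_text: str, digest_at_load: str,
                  max_shrink: float = 0.5, force: bool = False) -> str:
    """Write new_text only if the file is unchanged since digest_at_load.

    Returns the digest of the newly written content for the next call.
    """
    current = path.read_bytes() if path.exists() else b""
    if hashlib.sha256(current).hexdigest() != digest_at_load:
        raise ExternalModificationError(
            f"{path.name} was modified since it was last read; re-read and merge"
        )

    new_bytes = new_text.encode("utf-8")
    if current and not force and len(new_bytes) < len(current) * (1 - max_shrink):
        raise ShrinkGuardError(
            f"write would shrink {path.name} from {len(current)} "
            f"to {len(new_bytes)} bytes"
        )

    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_bytes)
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return hashlib.sha256(new_bytes).hexdigest()
```

A digest check alone still leaves a small window between the check and the swap, so a real implementation would likely pair it with a file lock. Even without one, this check would have turned the incident described here into an error instead of a silent overwrite.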

## Forensic evidence

From the affected agent's `agent.log`:

```
2026-05-14 21:35:35,742 WARNING [20260514_213324_95af7566] run_agent: Tool memory returned error (0.00s): {"error": "content is required for 'replace' action.", "success": false}
2026-05-14 21:35:40,049 INFO    [20260514_213324_95af7566] run_agent: tool memory completed (0.00s, 469 chars)
```

(Model retried with full args after the first call missed `content`; second call succeeded and wrote 469 chars — the entire post-clobber MEMORY.md.)

Session JSONL shows the tool call:
```json
{
  "action": "replace",
  "target": "memory",
  "old_text": "**BB inbound emits duplicate webhook events per iMessage**",
  "content": "**BB typing indicator gets stuck \"on\"...**"
}
```

The model's intent was to correct a single prior entry. The effect was to flush the tool's 1-entry state over a file containing ~8KB of externally written content that the tool didn't know about.

## Operational impact

For us: real data loss tonight (~2 hours of vendor-master / standing-orders work, recoverable only via MemPalace and session-transcript reconstruction). Tolerable on a pilot agent; would be a much bigger deal if this hit a long-running production agent with months of accreted file-level memory.

For the wider Hermes user base: any agent following the install runbook's `cat >> MEMORY.md` pattern is exposed.

## Workaround we're adopting

Interim sentinel at top of MEMORY.md documenting the hazard:
```

```

Plus splitting our deep-memory pointers / vendor master into MemPalace drawers so they're not dependent on file-level survival.

Neither workaround is durable. The fix needs to land upstream.

---

Filed by an agent on the affected fleet. Happy to share full reproduction artifacts (sanitized session JSONL, agent.log slice, before/after file states) if useful.

memory(action=replace) silently clobbers external writes to MEMORY.md (data-loss race with patch tool / shell appends / concurrent sessions) #26045
