fix codex memory flush tool surface#85220
Conversation
|
Codex review: needs maintainer review before merge. Latest ClawSweeper review: 2026-05-22 06:02 UTC / May 22, 2026, 2:02 AM ET. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
Summary Reproducibility: yes. from source inspection rather than an executed repro. Current main forwards memoryFlushWritePath into tool construction but still lets the Codex app-server path reach native/sandbox shell tooling while the shared Pi contract restricts memory runs to read plus append-only write. PR rating Rank-up moves:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. Real behavior proof Risk before merge
Maintainer options:
Next step before merge Security Review detailsBest possible solution: Land the targeted Codex/memory fix after a maintainer explicitly accepts the read/write-only memory-flush contract and required checks pass. Do we have a high-confidence way to reproduce the issue? Yes, from source inspection rather than an executed repro. Current main forwards memoryFlushWritePath into tool construction but still lets the Codex app-server path reach native/sandbox shell tooling while the shared Pi contract restricts memory runs to read plus append-only write. Is this the best way to solve the issue? Yes. The branch fixes both source-backed causes by pre-creating the canonical file before the maintenance run and aligning Codex dynamic/native/sandbox tool selection with the existing memory-flush contract, subject to maintainer acceptance of the compatibility change. Label justifications:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 03125c8e132d. |
|
ClawSweeper PR egg ✨ Hatched: 🥚 common Frosted Proofling Hatch commandComment Hatchability rules:
Rarity: 🥚 common. What is this egg doing here?
|
79dec79 to
c4d747f
Compare
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
What changed
readand append-onlywrite.Root cause
The Telegram failure was directly caused by a missing canonical memory file.
The session-memory hook can write timestamped captures such as
memory/YYYY-MM-DD-HHMM.md, while the pre-compaction memory flush plan targets the canonical daily filememory/YYYY-MM-DD.md. On a first flush for the day, that canonical file may not exist yet. In the observed session, the maintenance turn ran a command equivalent totail -80 memory/YYYY-MM-DD.md; because the canonical file did not exist, the shell command failed withNo such file or directoryand the failure surfaced in Telegram.The available Codex tools were not the direct cause of that filesystem error. They were the reason the Codex memory flush was able to take a shell-based path at all.
Harness behavior mismatch
OpenClaw already treats memory-triggered flush runs as a constrained maintenance mode in the shared/Pi harness path. For those runs, the shared tool factory exposes only:
readwrite, wrapped so it can append only to the planned canonical memory fileThat prevents a memory flush from using general shell or arbitrary file-edit tools. It also means the model is guided into the intended operation: inspect relevant context, then append durable memory to the one target file chosen by the flush plan.
Codex memory-triggered runs were still getting the app-server native Code Mode shell/file surface. That bypassed the same restriction the Pi harness path already gets, so Codex could use
bash/native file tools instead of the guarded append-only write tool. This PR makes Codex memory flush behavior match the Pi/shared harness behavior.Why this fixes it
This fix addresses both parts of the failure chain:
memory/YYYY-MM-DD.mdtarget is created before the flush run, so first flushes of the day have a valid target path.Pre-creating the file fixes the observed
No such file or directoryfailure. Restricting the Codex tool surface fixes the harness-contract bug that let a memory flush use general shell/file tools instead of the intended append-only memory path.Validation
node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-memory.test.ts -t "runs a memory flush turn"node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts -t "limits Codex memory flush|points yielded sandbox_exec"pnpm tsgo:testpnpm exec oxlint extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts src/auto-reply/reply/agent-runner-memory.ts src/auto-reply/reply/agent-runner-memory.test.tsgit diff --checkKnown unrelated test drift
A full run of
extensions/codex/src/app-server/run-attempt.test.tscurrently fails two image-cleanup tests because current behavior emits an extrathread/unsubscriberequest while the tests expect onlythread/start|resumeandturn/start. Those failures reproduce when selected by themselves and do not touch the memory flush path.