fix codex memory flush tool surface by amknight · Pull Request #85220 · openclaw/openclaw

amknight · 2026-05-22T04:32:36Z

What changed

Pre-create the canonical memory flush target file before the maintenance agent run starts.
Disable the Codex app-server native Code Mode tool surface for memory-triggered flush runs.
Keep Codex memory flush dynamic tools narrowed to OpenClaw-managed read and append-only write.
Prevent sandbox shell dynamic tools from being added back during memory flush runs.
Add regression coverage for both the core memory runner and the Codex dynamic-tool surface.

Root cause

The Telegram failure was directly caused by a missing canonical memory file.

The session-memory hook can write timestamped captures such as memory/YYYY-MM-DD-HHMM.md, while the pre-compaction memory flush plan targets the canonical daily file memory/YYYY-MM-DD.md. On a first flush for the day, that canonical file may not exist yet. In the observed session, the maintenance turn ran a command equivalent to tail -80 memory/YYYY-MM-DD.md; because the canonical file did not exist, the shell command failed with No such file or directory and the failure surfaced in Telegram.

The available Codex tools were not the direct cause of that filesystem error. They were the reason the Codex memory flush was able to take a shell-based path at all.

Harness behavior mismatch

OpenClaw already treats memory-triggered flush runs as a constrained maintenance mode in the shared/Pi harness path. For those runs, the shared tool factory exposes only:

read
write, wrapped so it can append only to the planned canonical memory file

That prevents a memory flush from using general shell or arbitrary file-edit tools. It also means the model is guided into the intended operation: inspect relevant context, then append durable memory to the one target file chosen by the flush plan.

Codex memory-triggered runs were still getting the app-server native Code Mode shell/file surface. That bypassed the same restriction the Pi harness path already gets, so Codex could use bash/native file tools instead of the guarded append-only write tool. This PR makes Codex memory flush behavior match the Pi/shared harness behavior.

Why this fixes it

This fix addresses both parts of the failure chain:

The canonical memory/YYYY-MM-DD.md target is created before the flush run, so first flushes of the day have a valid target path.
Codex memory flushes no longer get native shell/file Code Mode access; they use OpenClaw's restricted read plus append-only write surface, matching the Pi harness behavior.

Pre-creating the file fixes the observed No such file or directory failure. Restricting the Codex tool surface fixes the harness-contract bug that let a memory flush use general shell/file tools instead of the intended append-only memory path.

Validation

node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-memory.test.ts -t "runs a memory flush turn"
node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts -t "limits Codex memory flush|points yielded sandbox_exec"
pnpm tsgo:test
pnpm exec oxlint extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts src/auto-reply/reply/agent-runner-memory.ts src/auto-reply/reply/agent-runner-memory.test.ts
git diff --check

Known unrelated test drift

A full run of extensions/codex/src/app-server/run-attempt.test.ts currently fails two image-cleanup tests because current behavior emits an extra thread/unsubscribe request while the tests expect only thread/start|resume and turn/start. Those failures reproduce when selected by themselves and do not touch the memory flush path.

clawsweeper · 2026-05-22T04:34:13Z

Codex review: needs maintainer review before merge.

Latest ClawSweeper review: 2026-05-22 06:02 UTC / May 22, 2026, 2:02 AM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR pre-creates the canonical memory flush target file and restricts Codex memory-triggered runs to managed read/write tools with regression coverage.

Reproducibility: yes. from source inspection rather than an executed repro. Current main forwards memoryFlushWritePath into tool construction but still lets the Codex app-server path reach native/sandbox shell tooling while the shared Pi contract restricts memory runs to read plus append-only write.

PR rating
Overall: 🐚 platinum hermit
Proof: 🌊 off-meta tidepool
Patch quality: 🐚 platinum hermit
Summary: Good normal bug-fix PR with focused tests and no blocking findings; merge readiness depends on maintainer acceptance of the compatibility change.

Rank-up moves:

Record maintainer acceptance of the read/write-only Codex memory-flush contract before merge.

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Not applicable: The external contributor proof gate does not apply because this is a MEMBER PR with the protected maintainer label; the PR body lists targeted regression, type, lint, and diff checks.

Risk before merge

Merging intentionally removes Codex native shell/file Code Mode and sandbox_exec/sandbox_process from memory-triggered runs, so custom memory-flush prompts that relied on that unintended shell surface will fail closed under the read/write-only contract.
The PR body reports unrelated drift in the full Codex app-server test file, so final merge confidence should come from the focused regression checks plus required CI until that separate drift is resolved.

Maintainer options:

Accept the restricted memory-flush contract (recommended)
Land the PR with the explicit decision that Codex memory flushes no longer expose native shell/file Code Mode or sandbox_exec and instead match the shared read/write-only memory contract.
Pause for a compatibility exception
If maintainers want custom Codex memory-flush prompts to retain shell access, pause this PR and define that exception or migration path before changing the tool surface.

Next step before merge
Protected maintainer-labeled PR needs human acceptance of the intentional compatibility change; there is no narrow automated repair to queue.

Security
Cleared: The diff narrows memory-flush tool access, validates the target as a relative workspace path before pre-creating it, and adds no dependency, workflow, secret, or supply-chain surface.

Review details

Best possible solution:

Land the targeted Codex/memory fix after a maintainer explicitly accepts the read/write-only memory-flush contract and required checks pass.

Do we have a high-confidence way to reproduce the issue?

Yes, from source inspection rather than an executed repro. Current main forwards memoryFlushWritePath into tool construction but still lets the Codex app-server path reach native/sandbox shell tooling while the shared Pi contract restricts memory runs to read plus append-only write.

Is this the best way to solve the issue?

Yes. The branch fixes both source-backed causes by pre-creating the canonical file before the maintenance run and aligning Codex dynamic/native/sandbox tool selection with the existing memory-flush contract, subject to maintainer acceptance of the compatibility change.

Label justifications:

P1: The PR addresses an observed Codex memory flush failure during compaction that surfaced through a real channel workflow.
merge-risk: 🚨 compatibility: The diff deliberately removes native shell/file and sandbox shell tools from Codex memory-triggered runs, which can break existing custom prompts that relied on that surface.
rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🌊 off-meta tidepool, patch quality is 🐚 platinum hermit, and Good normal bug-fix PR with focused tests and no blocking findings; merge readiness depends on maintainer acceptance of the compatibility change.
status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external contributor proof gate does not apply because this is a MEMBER PR with the protected maintainer label; the PR body lists targeted regression, type, lint, and diff checks.

What I checked:

Current main Codex tool surface: Current main builds Codex dynamic tools, filters app-server-owned tools, then can re-add sandbox_exec/sandbox_process whenever sandboxing is active and the native surface is disabled; the current code has no memory-trigger special case. (extensions/codex/src/app-server/run-attempt.ts:3603, 03125c8e132d)
Shared memory flush contract: The shared OpenClaw tool factory already treats trigger=memory plus memoryFlushWritePath as a constrained run, requiring the write path and exposing only read plus append-only write before later policy filters. (src/agents/pi-tools.ts:475, 03125c8e132d)
Canonical memory target: The memory-core flush plan directs writes to the daily canonical memory/YYYY-MM-DD.md path and tells the agent to create memory/ if needed and append rather than overwrite. (extensions/memory-core/src/flush-plan.ts:13, 03125c8e132d)
PR implementation: The PR adds ensureMemoryFlushTargetFile before the maintenance run, disables Codex native tool surface for memory flushes, filters memory flush dynamic tools to read/write, and blocks sandbox_exec from being re-added for memory flushes. (extensions/codex/src/app-server/run-attempt.ts:3581, c4d747f6d816)
Target pre-create implementation: The PR validates the memory flush relative path lexically under the workspace, creates the parent directory, opens the target with append mode, and calls that before runWithModelFallback starts the flush turn. (src/auto-reply/reply/agent-runner-memory.ts:89, c4d747f6d816)
Regression coverage: The PR adds tests that assert the target file is ensured before the embedded agent runs and that Codex memory flush dynamic tools resolve to only read/write with native surface disabled. (extensions/codex/src/app-server/run-attempt.test.ts:763, c4d747f6d816)

Likely related people:

@TurboTheTurtle: Blame shows c2004fe owns the current memory flush runner handoff and much of the Codex app-server dynamic tool build path that this PR adjusts. (role: recent area contributor; confidence: high; commits: c2004fe6622d; files: src/auto-reply/reply/agent-runner-memory.ts, extensions/codex/src/app-server/run-attempt.ts)
@joshavant: Commit ba06376 recently changed the Codex native/sandbox tool-surface checks that control whether sandbox_exec is exposed. (role: recent sandbox/tool-surface contributor; confidence: high; commits: ba06376c7955; files: extensions/codex/src/app-server/run-attempt.ts)
@frankekn: History shows prior memory-flush contract work forwarding memoryFlushWritePath and protecting bootstrap files during memory flush runs. (role: memory flush contract contributor; confidence: medium; commits: 8306eabf85ea, 96e4975922de; files: src/agents/pi-tools.ts, src/auto-reply/reply/agent-runner-memory.ts, src/agents/pi-embedded-runner/run.ts)
@amknight: The author also has recent merged current-main work in the same Codex app-server run-attempt surface, beyond this proposed branch. (role: recent adjacent contributor; confidence: medium; commits: cc4e30b3d9c2; files: extensions/codex/src/app-server/run-attempt.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 03125c8e132d.

clawsweeper · 2026-05-22T04:39:12Z

ClawSweeper PR egg

✨ Hatched: 🥚 common Frosted Proofling

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

Merged PRs are hatchable.
Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: keeps receipts.
Image traits: location merge queue dock; accessory miniature diff map; palette pearl, teal, and neon green; mood patient; pose holding its accessory up for inspection; shell paper lantern shell; lighting subtle sparkle highlights; background small review tokens.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Frosted Proofling in ClawSweeper.

What is this egg doing here?

Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>

openclaw-barnacle Bot added extensions: codex size: S maintainer Maintainer-authored PR labels May 22, 2026

fix codex memory flush tool surface

c4d747f

amknight force-pushed the fix/codex-memory-flush-tools branch from 79dec79 to c4d747f Compare May 22, 2026 04:42

amknight marked this pull request as ready for review May 22, 2026 04:46

clawsweeper Bot added the merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. label May 22, 2026

amknight changed the title ~~[codex] fix codex memory flush tool surface~~ fix codex memory flush tool surface May 22, 2026

kevinslin self-assigned this May 22, 2026

kevinslin approved these changes May 22, 2026

View reviewed changes

amknight merged commit d0a74db into main May 22, 2026
170 checks passed

amknight deleted the fix/codex-memory-flush-tools branch May 22, 2026 06:23

github-actions Bot mentioned this pull request May 22, 2026

📡 Upstream Digest — 2026-05-22 07:37 UTC curtismercier/openclaw-mods#914

Open

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026

fix codex memory flush tool surface (openclaw#85220)

3084cfb

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026

fix codex memory flush tool surface (openclaw#85220)

ebcd75f

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026

fix codex memory flush tool surface (openclaw#85220)

e73e7d5

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>

github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026

fix codex memory flush tool surface (openclaw#85220)

981332e

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>

galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026

fix codex memory flush tool surface (openclaw#85220)

34ca719

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026

fix codex memory flush tool surface (openclaw#85220)

4b98b56

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026

fix codex memory flush tool surface (openclaw#85220)

f8e72d0

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026

fix codex memory flush tool surface (openclaw#85220)

253999e

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>

jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026

fix codex memory flush tool surface (openclaw#85220)

90a5818

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>

SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026

fix codex memory flush tool surface (openclaw#85220)

fec4e1c

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>

sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026

fix codex memory flush tool surface (openclaw#85220)

18f48e8

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix codex memory flush tool surface#85220

fix codex memory flush tool surface#85220
amknight merged 1 commit into
mainfrom
fix/codex-memory-flush-tools

amknight commented May 22, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 22, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

amknight commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed

Root cause

Harness behavior mismatch

Why this fixes it

Validation

Known unrelated test drift

Uh oh!

clawsweeper Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clawsweeper Bot commented May 22, 2026

Hatch command

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

amknight commented May 22, 2026 •

edited

Loading

clawsweeper Bot commented May 22, 2026 •

edited

Loading