Skip to content

fix codex memory flush tool surface#85220

Merged
amknight merged 1 commit into
mainfrom
fix/codex-memory-flush-tools
May 22, 2026
Merged

fix codex memory flush tool surface#85220
amknight merged 1 commit into
mainfrom
fix/codex-memory-flush-tools

Conversation

@amknight

@amknight amknight commented May 22, 2026

Copy link
Copy Markdown
Member

What changed

  • Pre-create the canonical memory flush target file before the maintenance agent run starts.
  • Disable the Codex app-server native Code Mode tool surface for memory-triggered flush runs.
  • Keep Codex memory flush dynamic tools narrowed to OpenClaw-managed read and append-only write.
  • Prevent sandbox shell dynamic tools from being added back during memory flush runs.
  • Add regression coverage for both the core memory runner and the Codex dynamic-tool surface.

Root cause

The Telegram failure was directly caused by a missing canonical memory file.

The session-memory hook can write timestamped captures such as memory/YYYY-MM-DD-HHMM.md, while the pre-compaction memory flush plan targets the canonical daily file memory/YYYY-MM-DD.md. On a first flush for the day, that canonical file may not exist yet. In the observed session, the maintenance turn ran a command equivalent to tail -80 memory/YYYY-MM-DD.md; because the canonical file did not exist, the shell command failed with No such file or directory and the failure surfaced in Telegram.

The available Codex tools were not the direct cause of that filesystem error. They were the reason the Codex memory flush was able to take a shell-based path at all.

Harness behavior mismatch

OpenClaw already treats memory-triggered flush runs as a constrained maintenance mode in the shared/Pi harness path. For those runs, the shared tool factory exposes only:

  • read
  • write, wrapped so it can append only to the planned canonical memory file

That prevents a memory flush from using general shell or arbitrary file-edit tools. It also means the model is guided into the intended operation: inspect relevant context, then append durable memory to the one target file chosen by the flush plan.

Codex memory-triggered runs were still getting the app-server native Code Mode shell/file surface. That bypassed the same restriction the Pi harness path already gets, so Codex could use bash/native file tools instead of the guarded append-only write tool. This PR makes Codex memory flush behavior match the Pi/shared harness behavior.

Why this fixes it

This fix addresses both parts of the failure chain:

  1. The canonical memory/YYYY-MM-DD.md target is created before the flush run, so first flushes of the day have a valid target path.
  2. Codex memory flushes no longer get native shell/file Code Mode access; they use OpenClaw's restricted read plus append-only write surface, matching the Pi harness behavior.

Pre-creating the file fixes the observed No such file or directory failure. Restricting the Codex tool surface fixes the harness-contract bug that let a memory flush use general shell/file tools instead of the intended append-only memory path.

Validation

  • node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-memory.test.ts -t "runs a memory flush turn"
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts -t "limits Codex memory flush|points yielded sandbox_exec"
  • pnpm tsgo:test
  • pnpm exec oxlint extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts src/auto-reply/reply/agent-runner-memory.ts src/auto-reply/reply/agent-runner-memory.test.ts
  • git diff --check

Known unrelated test drift

A full run of extensions/codex/src/app-server/run-attempt.test.ts currently fails two image-cleanup tests because current behavior emits an extra thread/unsubscribe request while the tests expect only thread/start|resume and turn/start. Those failures reproduce when selected by themselves and do not touch the memory flush path.

@clawsweeper

clawsweeper Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge.

Latest ClawSweeper review: 2026-05-22 06:02 UTC / May 22, 2026, 2:02 AM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR pre-creates the canonical memory flush target file and restricts Codex memory-triggered runs to managed read/write tools with regression coverage.

Reproducibility: yes. from source inspection rather than an executed repro. Current main forwards memoryFlushWritePath into tool construction but still lets the Codex app-server path reach native/sandbox shell tooling while the shared Pi contract restricts memory runs to read plus append-only write.

PR rating
Overall: 🐚 platinum hermit
Proof: 🌊 off-meta tidepool
Patch quality: 🐚 platinum hermit
Summary: Good normal bug-fix PR with focused tests and no blocking findings; merge readiness depends on maintainer acceptance of the compatibility change.

Rank-up moves:

  • Record maintainer acceptance of the read/write-only Codex memory-flush contract before merge.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Not applicable: The external contributor proof gate does not apply because this is a MEMBER PR with the protected maintainer label; the PR body lists targeted regression, type, lint, and diff checks.

Risk before merge

  • Merging intentionally removes Codex native shell/file Code Mode and sandbox_exec/sandbox_process from memory-triggered runs, so custom memory-flush prompts that relied on that unintended shell surface will fail closed under the read/write-only contract.
  • The PR body reports unrelated drift in the full Codex app-server test file, so final merge confidence should come from the focused regression checks plus required CI until that separate drift is resolved.

Maintainer options:

  1. Accept the restricted memory-flush contract (recommended)
    Land the PR with the explicit decision that Codex memory flushes no longer expose native shell/file Code Mode or sandbox_exec and instead match the shared read/write-only memory contract.
  2. Pause for a compatibility exception
    If maintainers want custom Codex memory-flush prompts to retain shell access, pause this PR and define that exception or migration path before changing the tool surface.

Next step before merge
Protected maintainer-labeled PR needs human acceptance of the intentional compatibility change; there is no narrow automated repair to queue.

Security
Cleared: The diff narrows memory-flush tool access, validates the target as a relative workspace path before pre-creating it, and adds no dependency, workflow, secret, or supply-chain surface.

Review details

Best possible solution:

Land the targeted Codex/memory fix after a maintainer explicitly accepts the read/write-only memory-flush contract and required checks pass.

Do we have a high-confidence way to reproduce the issue?

Yes, from source inspection rather than an executed repro. Current main forwards memoryFlushWritePath into tool construction but still lets the Codex app-server path reach native/sandbox shell tooling while the shared Pi contract restricts memory runs to read plus append-only write.

Is this the best way to solve the issue?

Yes. The branch fixes both source-backed causes by pre-creating the canonical file before the maintenance run and aligning Codex dynamic/native/sandbox tool selection with the existing memory-flush contract, subject to maintainer acceptance of the compatibility change.

Label justifications:

  • P1: The PR addresses an observed Codex memory flush failure during compaction that surfaced through a real channel workflow.
  • merge-risk: 🚨 compatibility: The diff deliberately removes native shell/file and sandbox shell tools from Codex memory-triggered runs, which can break existing custom prompts that relied on that surface.
  • rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🌊 off-meta tidepool, patch quality is 🐚 platinum hermit, and Good normal bug-fix PR with focused tests and no blocking findings; merge readiness depends on maintainer acceptance of the compatibility change.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external contributor proof gate does not apply because this is a MEMBER PR with the protected maintainer label; the PR body lists targeted regression, type, lint, and diff checks.

What I checked:

  • Current main Codex tool surface: Current main builds Codex dynamic tools, filters app-server-owned tools, then can re-add sandbox_exec/sandbox_process whenever sandboxing is active and the native surface is disabled; the current code has no memory-trigger special case. (extensions/codex/src/app-server/run-attempt.ts:3603, 03125c8e132d)
  • Shared memory flush contract: The shared OpenClaw tool factory already treats trigger=memory plus memoryFlushWritePath as a constrained run, requiring the write path and exposing only read plus append-only write before later policy filters. (src/agents/pi-tools.ts:475, 03125c8e132d)
  • Canonical memory target: The memory-core flush plan directs writes to the daily canonical memory/YYYY-MM-DD.md path and tells the agent to create memory/ if needed and append rather than overwrite. (extensions/memory-core/src/flush-plan.ts:13, 03125c8e132d)
  • PR implementation: The PR adds ensureMemoryFlushTargetFile before the maintenance run, disables Codex native tool surface for memory flushes, filters memory flush dynamic tools to read/write, and blocks sandbox_exec from being re-added for memory flushes. (extensions/codex/src/app-server/run-attempt.ts:3581, c4d747f6d816)
  • Target pre-create implementation: The PR validates the memory flush relative path lexically under the workspace, creates the parent directory, opens the target with append mode, and calls that before runWithModelFallback starts the flush turn. (src/auto-reply/reply/agent-runner-memory.ts:89, c4d747f6d816)
  • Regression coverage: The PR adds tests that assert the target file is ensured before the embedded agent runs and that Codex memory flush dynamic tools resolve to only read/write with native surface disabled. (extensions/codex/src/app-server/run-attempt.test.ts:763, c4d747f6d816)

Likely related people:

  • @TurboTheTurtle: Blame shows c2004fe owns the current memory flush runner handoff and much of the Codex app-server dynamic tool build path that this PR adjusts. (role: recent area contributor; confidence: high; commits: c2004fe6622d; files: src/auto-reply/reply/agent-runner-memory.ts, extensions/codex/src/app-server/run-attempt.ts)
  • @joshavant: Commit ba06376 recently changed the Codex native/sandbox tool-surface checks that control whether sandbox_exec is exposed. (role: recent sandbox/tool-surface contributor; confidence: high; commits: ba06376c7955; files: extensions/codex/src/app-server/run-attempt.ts)
  • @frankekn: History shows prior memory-flush contract work forwarding memoryFlushWritePath and protecting bootstrap files during memory flush runs. (role: memory flush contract contributor; confidence: medium; commits: 8306eabf85ea, 96e4975922de; files: src/agents/pi-tools.ts, src/auto-reply/reply/agent-runner-memory.ts, src/agents/pi-embedded-runner/run.ts)
  • @amknight: The author also has recent merged current-main work in the same Codex app-server run-attempt surface, beyond this proposed branch. (role: recent adjacent contributor; confidence: medium; commits: cc4e30b3d9c2; files: extensions/codex/src/app-server/run-attempt.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 03125c8e132d.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 High-priority user-facing bug, regression, or broken workflow. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Frosted Proofling

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: keeps receipts.
Image traits: location merge queue dock; accessory miniature diff map; palette pearl, teal, and neon green; mood patient; pose holding its accessory up for inspection; shell paper lantern shell; lighting subtle sparkle highlights; background small review tokens.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Frosted Proofling in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@amknight amknight force-pushed the fix/codex-memory-flush-tools branch from 79dec79 to c4d747f Compare May 22, 2026 04:42
@amknight amknight marked this pull request as ready for review May 22, 2026 04:46
@clawsweeper clawsweeper Bot added the merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. label May 22, 2026
@amknight amknight changed the title [codex] fix codex memory flush tool surface fix codex memory flush tool surface May 22, 2026
@kevinslin kevinslin self-assigned this May 22, 2026
@amknight amknight merged commit d0a74db into main May 22, 2026
170 checks passed
@amknight amknight deleted the fix/codex-memory-flush-tools branch May 22, 2026 06:23
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: codex maintainer Maintainer-authored PR merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: S status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants