Skip to content

[codex] fix Codex budget native compaction trigger#87158

Draft
Steady-ai wants to merge 1 commit into
openclaw:mainfrom
Steady-ai:codex/codex-budget-compaction-trigger
Draft

[codex] fix Codex budget native compaction trigger#87158
Steady-ai wants to merge 1 commit into
openclaw:mainfrom
Steady-ai:codex/codex-budget-compaction-trigger

Conversation

@Steady-ai

@Steady-ai Steady-ai commented May 27, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Start Codex native app-server compaction for the budget preflight trigger instead of treating it like an unrelated automatic trigger.
  • Keep omitted and overflow triggers on the existing Codex-owned skip path.
  • Document that agents.defaults.compaction.maxActiveTranscriptBytes starts native Codex compaction for Codex harness sessions.

Duplicate check

I checked current origin/main plus open PRs/issues before opening this PR. Current main still skipped trigger: "budget" in extensions/codex/src/app-server/compact.ts and had a test pinning that skip in compact.test.ts. I did not find an open PR from this fork for this compaction trigger; the only open Steady-ai PR was #86716 for Discord delivery accounting. I also inspected nearby compaction/Codex PRs including #81916, #63636, #54585, #86160, #83229, #82856, #86094, and #73092; none made budget start native Codex compaction.

Real behavior proof

Behavior addressed: Codex harness sessions with agents.defaults.compaction.maxActiveTranscriptBytes call compaction with trigger: "budget". Current source skipped that trigger before sending thread/compact/start, so the local transcript-byte guard could leave long-lived channel mirrors above the configured cleanup threshold.

Real environment tested: OpenClaw source worktree rebased on origin/main at d88681662b, PR head 47f33748d4, on WSL Ubuntu 24.04 / Node 22.22.0. The source-runtime probe used the real maybeCompactCodexAppServerSession entry point and real Codex session-binding file I/O, with the app-server client boundary instrumented only to capture the outgoing app-server request.

Exact steps or command run after this patch: Ran a node --import tsx - source-runtime probe from the PR worktree. The probe wrote a temporary Codex app-server binding, invoked maybeCompactCodexAppServerSession once with trigger: "budget" and once with trigger: "overflow", then printed the request/result shape. I then ran the focused checks listed in Verification.

Evidence after fix: Copied terminal output from the source-runtime probe:

[agent/embedded] started codex app-server compaction
[agent/embedded] skipping codex app-server compaction for non-manual trigger
{
  "head": "47f33748d4",
  "behavior": "codex budget compaction trigger starts native app-server compaction",
  "budget": {
    "request": {
      "method": "thread/compact/start",
      "params": {
        "threadId": "thread-proof-budget"
      }
    },
    "ok": true,
    "compacted": false,
    "reason": null,
    "signal": "thread/compact/start",
    "pending": true,
    "tokensBefore": 456
  },
  "overflow": {
    "additionalRequests": 0,
    "ok": true,
    "compacted": false,
    "reason": "codex app-server owns automatic compaction",
    "skippedReason": "non_manual_trigger",
    "trigger": "overflow"
  }
}

Observed result after fix: Budget preflight compaction now sends thread/compact/start and returns the existing pending native-compaction result shape. The overflow trigger sends no additional app-server request and still returns the existing non_manual_trigger skip result.

What was not tested: I did not run a live Discord or Telegram gateway replay from this source branch. The live failure was observed in an installed runtime; this PR ports the source-level behavior and locks it with focused Codex extension tests.

Risk

Low. The change is limited to the Codex app-server compaction trigger gate. It does not wait for native compaction completion, add credentials, change network targets, or replace Codex compaction with OpenClaw summarization.

Verification

  • OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/compact.test.ts -> 1 file / 19 tests passed
  • node_modules/.bin/oxfmt --check extensions/codex/src/app-server/compact.ts extensions/codex/src/app-server/compact.test.ts docs/plugins/codex-harness-runtime.md docs/reference/session-management-compaction.md -> passed
  • git diff --check -> passed
  • pnpm docs:list -> passed
  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.extensions.json extensions/codex/src/app-server/compact.ts extensions/codex/src/app-server/compact.test.ts -> passed
  • node scripts/check-docs-mdx.mjs docs/plugins/codex-harness-runtime.md docs/reference/session-management-compaction.md -> passed

@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation extensions: codex size: XS labels May 27, 2026
@openclaw-barnacle openclaw-barnacle Bot added the triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. label May 27, 2026
@Steady-ai Steady-ai force-pushed the codex/codex-budget-compaction-trigger branch from 4a4bb8f to 47f3374 Compare May 27, 2026 04:35
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 27, 2026
@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 1:17 AM ET / 05:17 UTC.

Summary
The PR changes Codex app-server compaction so trigger: "budget" sends thread/compact/start, keeps omitted and overflow triggers on the skip path, and documents the intended config behavior.

PR surface: Source +5, Tests +29, Docs +10. Total +44 across 4 files.

Reproducibility: yes. source inspection gives a high-confidence reproduction path: current channel preflight returns before the Codex byte-budget path, and current CLI budget compaction throws on the helper's pending native result shape. I did not execute tests because this review was read-only.

Review metrics: 1 noteworthy metric.

  • Existing config behavior documented: 1 key behavior expanded. The PR changes what docs promise for agents.defaults.compaction.maxActiveTranscriptBytes, so runtime parity matters before merge.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🦪 silver shellfish
Patch quality: 🧂 unranked krab
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Fix or remove the documented maxActiveTranscriptBytes Codex behavior claim by wiring the actual caller path or narrowing the docs.
  • [P1] Add caller handling for pending/native-owned budget compaction or coordinate with fix(codex): recover context overflow and budget skip paths #87879 if maintainers prefer the skip contract.
  • Provide redacted real runtime proof through the CLI/channel budget compaction path after the caller behavior is fixed.

Proof guidance:

  • [P1] Needs stronger real behavior proof before merge: The PR includes terminal output from a source-runtime probe, but the app-server client boundary is instrumented and the real caller path is not exercised, so contributor proof should be updated with redacted installed/runtime output after the caller fix. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] Merging this PR as-is can start native Codex compaction from the helper while the CLI lifecycle still treats the pending { ok: true, compacted: false } result as a failed turn, leaving native thread state changed while OpenClaw reports failure.
  • [P1] The new docs promise agents.defaults.compaction.maxActiveTranscriptBytes starts native Codex compaction, but current channel preflight returns before that budget check for Codex sessions.

Maintainer options:

  1. Wire the full budget lifecycle before merge (recommended)
    Update the caller paths so the documented budget trigger is actually reached and pending native thread/compact/start results are handled without failing the turn, with focused regression and real runtime proof.
  2. Prefer the skip contract instead
    If maintainers want Codex app-server to own automatic budget compaction, pause or close this PR and use the open skip-handling path in fix(codex): recover context overflow and budget skip paths #87879 as the canonical fix.

Next step before merge

  • [P1] Maintainers should choose the desired Codex budget-compaction contract and coordinate this draft with the open skip-handling PR before any automated repair work.

Security
Cleared: The diff does not change dependencies, credentials, permissions, workflows, package resolution, or new external code execution paths.

Review findings

  • [P1] Wire the documented budget path through real callers — docs/plugins/codex-harness-runtime.md:217-221
  • [P1] Handle pending native compaction before enabling budget — extensions/codex/src/app-server/compact.ts:134
Review details

Best possible solution:

Decide the Codex budget contract first, then either wire and prove the full caller lifecycle for budget-started native compaction or keep the native-owned skip path and land the narrower caller fix in the canonical related PR.

Do we have a high-confidence way to reproduce the issue?

Yes, source inspection gives a high-confidence reproduction path: current channel preflight returns before the Codex byte-budget path, and current CLI budget compaction throws on the helper's pending native result shape. I did not execute tests because this review was read-only.

Is this the best way to solve the issue?

No. The helper change is too narrow unless the real caller paths also accept or exercise budget-started native compaction; the safer alternative is to coordinate with the open skip-handling PR if maintainers prefer Codex-owned automatic compaction.

Full review comments:

  • [P1] Wire the documented budget path through real callers — docs/plugins/codex-harness-runtime.md:217-221
    The docs now say agents.defaults.compaction.maxActiveTranscriptBytes starts Codex native compaction, but the current channel preflight returns at the Codex runtime guard before it reads maxActiveTranscriptBytes or calls compactEmbeddedAgentSession with trigger: "budget". This PR only changes the harness helper, so the documented channel transcript cleanup path still never runs for Codex sessions.
    Confidence: 0.9
  • [P1] Handle pending native compaction before enabling budget — extensions/codex/src/app-server/compact.ts:134
    Allowing trigger: "budget" through this helper makes it send thread/compact/start and then return the existing pending shape (ok: true, compacted: false). The current CLI budget lifecycle sends trigger: "budget" and throws for non-compacted native outcomes without fallback, so an over-budget Codex CLI turn can still fail immediately after starting native compaction unless the caller learns to treat this pending native result as non-fatal.
    Confidence: 0.87

Overall correctness: patch is incorrect
Overall confidence: 0.88

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 309fdd95dad5.

Label changes

Label justifications:

  • P2: This is a focused Codex compaction bugfix with limited blast radius, but it needs caller-path fixes before merge.
  • merge-risk: 🚨 session-state: The patch can start native Codex compaction without the OpenClaw caller accepting the pending result, creating session/native-thread state drift around failed budget compaction turns.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🦪 silver shellfish and patch quality is 🧂 unranked krab.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR includes terminal output from a source-runtime probe, but the app-server client boundary is instrumented and the real caller path is not exercised, so contributor proof should be updated with redacted installed/runtime output after the caller fix. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +5, Tests +29, Docs +10. Total +44 across 4 files.

View PR surface stats
Area Files Added Removed Net
Source 1 6 1 +5
Tests 1 32 3 +29
Docs 2 11 1 +10
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 4 49 5 +44

What I checked:

  • Root AGENTS policy read: The root policy and scoped docs/extensions guides were read; the relevant rules require whole-surface review for runtime/session paths and docs/runtime alignment for user-visible config behavior. (AGENTS.md:1, 309fdd95dad5)
  • PR diff changes only the helper and docs: The patch allows budget through compactCodexNativeThread, adds logging, changes one helper test, and documents maxActiveTranscriptBytes as starting Codex native compaction. (extensions/codex/src/app-server/compact.ts:127, 47f33748d4b6)
  • Current channel preflight skips Codex before byte-budget checks: Current main returns early for Codex runtime sessions before resolving maxActiveTranscriptBytes and before the later compactEmbeddedAgentSession({ trigger: "budget" }) call, so the documented channel transcript-byte trigger is not wired by this PR. (src/auto-reply/reply/agent-runner-memory.ts:706, 309fdd95dad5)
  • Current CLI caller treats pending native compaction as failure: Current main sends trigger: "budget" into the native harness, but compactNativeHarnessCliTranscript returns failure for any !result?.compacted outcome unless it is an unsupported or recoverable-binding fallback; runCliTurnCompactionLifecycle then throws when fallback is false. (src/agents/command/cli-compaction.ts:341, 309fdd95dad5)
  • Helper pending result remains compacted: false: Current helper success for thread/compact/start returns { ok: true, compacted: false } with details.pending: true; the PR's budget path would now feed that same pending shape to callers that expect actual compaction or an explicit skip. (extensions/codex/src/app-server/compact.ts:187, 309fdd95dad5)
  • Related open path exists for the CLI skip behavior: Live GitHub search found fix(codex): recover context overflow and budget skip paths #87879, which is open and separately handles the CLI budget skip by treating codex app-server owns automatic compaction as non-fatal rather than starting a native compact request. (fff0954d0836)

Likely related people:

  • Steady-ai: GitHub commit history for extensions/codex/src/app-server/compact.ts shows Steady-ai authored the prior merged fix(codex): avoid native compaction on budget triggers (#86772), and this PR edits that same trigger gate. (role: introduced current skip behavior and current proposer; confidence: high; commits: eb8f9b46da39, 47f33748d4b6; files: extensions/codex/src/app-server/compact.ts, extensions/codex/src/app-server/compact.test.ts)
  • pfrederiksen: Recent live history for src/agents/command/cli-compaction.ts shows pfrederiksen authored the merged fix(codex): recover raw missing-thread compaction failures (#87738), adjacent to the caller that currently rejects pending native compaction results. (role: recent adjacent owner; confidence: medium; commits: e69855e68c04; files: src/agents/command/cli-compaction.ts)
  • joshavant: GitHub history for the Codex helper shows joshavant authored fix(codex): surface native compaction failures (#85160), which shaped how this helper reports native compaction failures and pending outcomes. (role: adjacent feature owner; confidence: medium; commits: b8e9ab9385c0; files: extensions/codex/src/app-server/compact.ts)
  • steipete: The current files were heavily carried through the refactor: internalize OpenClaw agent runtime (#85341) merge, which touched the agent runtime and Codex compaction surface around this review area. (role: recent major refactor owner; confidence: medium; commits: bb46b79d3c14; files: extensions/codex/src/app-server/compact.ts, src/agents/command/cli-compaction.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels May 27, 2026
@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Cosmic Shellbean

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: stacks clean commits.
Image traits: location status garden; accessory CI status badge; palette moonlit blue and soft silver; mood mischievous; pose waving from a small platform; shell soft speckled shell; lighting subtle sparkle highlights; background subtle branch markers.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Cosmic Shellbean in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 29, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Improvements or additions to documentation extensions: codex merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P2 Normal backlog priority with limited blast radius. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: XS status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants