feat(codex): surface pre-turn projection accounting (#80765) by aiZKP · Pull Request #80778 · openclaw/openclaw

aiZKP · 2026-05-11T21:19:41Z

Summary

Codex's context-engine projection previously sized the rendered prompt with the
generic 4 chars/token heuristic and exposed nothing about that estimate
downstream. As a result, Status/LCM diagnostics could not tell apart three
quantities: frontier tokens selected by the context engine, rendered Codex
projection chars/tokens before send, and provider-observed usage after the turn.

This PR adds a small pre-turn accounting snapshot to the projection and routes
it into agent telemetry:

- `projectContextEngineAssemblyForCodex` now returns a `stats` block:
  - `projectedPromptChars`: length of the rendered Codex prompt
  - `promptTokens`: tokenizer-backed when supplied, heuristic otherwise
  - `accounting: "estimated" | "exact"`: explicit marker
  - `capChars`: active rendered-context cap (currently 24_000)
  - `reserveTokens`: surfaced when the caller routes the configured
    `agents.defaults.compaction.reserveTokens` /
    `reserveTokensFloor` through
- An optional `tokenize?: (text: string) => number | undefined` parameter
  lets a future Codex app-server / provider tokenizer flip the marker to
  `exact` without changing call sites. Throwing or non-finite returns fall
  back to the heuristic.
- `run-attempt.ts` resolves `agents.defaults.compaction.reserveTokens`
  (falling back to `reserveTokensFloor`) and emits a new
  `codex_app_server.context_projection` agent event before `turn/start`
  on both the context-engine and mirrored-history projection paths.
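
For reviewers who want a concrete picture of the tokenizer seam, here is a minimal sketch of how the stats block could be computed. It is not the code in this PR: `computeProjectionStats`, `estimateTokens`, and the round-up in the heuristic are assumptions made for illustration. Only the field names, the `tokenize` signature, and the fallback rule (throwing or non-finite results fall back to the heuristic) come from the description above.

```typescript
// Sketch only: helper names and the rounding choice are hypothetical.
type ProjectionAccounting = "estimated" | "exact";

interface ProjectionStats {
  projectedPromptChars: number;
  promptTokens: number;
  accounting: ProjectionAccounting;
  capChars: number;
  reserveTokens?: number;
}

type Tokenize = (text: string) => number | undefined;

// Generic 4 chars/token heuristic; rounding up is an assumption.
const HEURISTIC_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / HEURISTIC_CHARS_PER_TOKEN);
}

function computeProjectionStats(params: {
  renderedPrompt: string;
  capChars: number;
  reserveTokens?: number;
  tokenize?: Tokenize;
}): ProjectionStats {
  const { renderedPrompt, capChars, reserveTokens, tokenize } = params;

  // Start from the heuristic so every failure path is already covered.
  let promptTokens = estimateTokens(renderedPrompt);
  let accounting: ProjectionAccounting = "estimated";

  if (tokenize) {
    try {
      const counted = tokenize(renderedPrompt);
      // Only a finite number flips the marker to "exact".
      if (typeof counted === "number" && Number.isFinite(counted)) {
        promptTokens = counted;
        accounting = "exact";
      }
    } catch {
      // A throwing tokenizer keeps the heuristic estimate.
    }
  }

  return {
    projectedPromptChars: renderedPrompt.length,
    promptTokens,
    accounting,
    capChars,
    // Omit the field entirely when no reserve was routed through.
    ...(reserveTokens !== undefined ? { reserveTokens } : {}),
  };
}
```

Starting from the heuristic and upgrading only on a valid tokenizer result means a caller never has to handle a missing `promptTokens`; the `accounting` marker alone says how much to trust the number.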

Existing behavior (24k char cap, prompt rendering, duplicate trailing-prompt
trim, developer-instruction addition, prePromptMessageCount) is unchanged.

Acceptance criteria

- Native Codex projection reports pre-turn exact tokens when a tokenizer
  is supplied; otherwise marks accounting as `estimated`.
- Diagnostics can distinguish:
  - LCM/frontier tokens selected by the context engine
    (`frontierTokens` on the emitted event, equal to `contextTokenBudget`)
  - rendered Codex projection chars/tokens before send
    (`projectedPromptChars` / `promptTokens` / `accounting`)
  - provider-observed prompt/input tokens after the turn
    (existing afterTurn `runtimeContext.lastCallUsage` / `promptCache`)
- Tests cover the estimate-vs-exact marker and ensure configured reserve
  fields surface through projection stats.
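
To illustrate how the criteria above fit together, the sketch below resolves the reserve setting and assembles a pre-turn event payload. The helper names (`resolveReserveTokens`, `buildContextProjectionEvent`), the `{ stream, data }` envelope, the config interfaces, the non-negative check, and the presence of `capChars` / `reserveTokens` on the event are all assumptions; the stream name, `frontierTokens`, and the stats field names are taken from this description.

```typescript
// Sketch only: config shape and helper names are hypothetical.
interface CompactionConfig {
  reserveTokens?: number;
  reserveTokensFloor?: number;
}

interface AgentsConfig {
  agents?: { defaults?: { compaction?: CompactionConfig } };
}

// Prefer reserveTokens, then fall back to reserveTokensFloor.
// Treating only finite, non-negative numbers as valid is an assumption.
function resolveReserveTokens(config: AgentsConfig): number | undefined {
  const compaction = config.agents?.defaults?.compaction;
  const candidates = [compaction?.reserveTokens, compaction?.reserveTokensFloor];
  return candidates.find(
    (value): value is number =>
      typeof value === "number" && Number.isFinite(value) && value >= 0,
  );
}

interface PreTurnStats {
  projectedPromptChars: number;
  promptTokens: number;
  accounting: "estimated" | "exact";
  capChars: number;
}

interface ContextProjectionEvent {
  stream: "codex_app_server.context_projection";
  data: Record<string, unknown>;
}

function buildContextProjectionEvent(input: {
  contextTokenBudget: number;
  stats: PreTurnStats;
  reserveTokens?: number;
}): ContextProjectionEvent {
  const { contextTokenBudget, stats, reserveTokens } = input;
  return {
    stream: "codex_app_server.context_projection",
    data: {
      // Frontier sizing chosen by the context engine.
      frontierTokens: contextTokenBudget,
      // Rendered Codex projection, measured before send.
      projectedPromptChars: stats.projectedPromptChars,
      promptTokens: stats.promptTokens,
      accounting: stats.accounting,
      capChars: stats.capChars,
      ...(reserveTokens !== undefined ? { reserveTokens } : {}),
    },
  };
}
```

Provider-observed usage is deliberately absent from this payload: it only exists after the turn, so it stays on the existing afterTurn signals and the two can be compared side by side.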

Files touched

| File | Lines | Purpose |
| --- | --- | --- |
| `extensions/codex/src/app-server/context-engine-projection.ts` | +95 | Stats type, tokenizer seam, accounting marker |
| `extensions/codex/src/app-server/run-attempt.ts` | +43 | Reserve resolver, projection event emit |
| `extensions/codex/src/app-server/context-engine-projection.test.ts` | +79 / -1 | 5 new tests for stats / marker / reserve |

No SDK contract, no public manifest, no docs/changelog surface changed.

Test plan

- `pnpm test extensions/codex/src/app-server/context-engine-projection.test.ts`: 10 passed (5 new + 5 existing)
- `pnpm test extensions/codex/src/app-server/run-attempt.context-engine.test.ts`: 6 passed
- `pnpm check:changed` (extension prod + extension test lanes): typecheck, oxlint, format, runtime sidecar guard, import-cycle check all green
- One unrelated test (`run-attempt.test.ts > does not expose OpenClaw Tool Search controls through Codex dynamic tools`) times out. It fails identically on main without these changes, so it is pre-existing and unrelated to this PR.

Notes for reviewers

- The projection cap (`MAX_RENDERED_CONTEXT_CHARS = 24_000`) is intentionally
  unchanged here. Making it budget-aware via `contextTokenBudget` /
  `reserveTokens` is tracked by fix(codex): scale context engine projection #80761; this PR is the accounting
  follow-up only.
- The new `tokenize` parameter is a no-op until a Codex/provider tokenizer
  is wired in. The acceptance criterion ("exact when the runtime/tokenizer
  surface supports it") is satisfied by the seam plus the explicit
  `estimated` marker; no behavior change for current callers.
- The emitted event uses `Record<string, unknown>`; consumers that already
  subscribe to `onAgentEvent` see a new stream value but the existing
  envelope shape is preserved.
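
Because the payload is an untyped record, a subscriber has to narrow it before use. The sketch below shows one way a consumer might do that. The `AgentEvent` shape (`stream` plus `data`) and the `readContextProjection` helper are assumptions for illustration; only the stream name and the `promptTokens` / `accounting` fields come from this PR.

```typescript
// Sketch only: the envelope shape and helper name are hypothetical.
interface AgentEvent {
  stream: string;
  data: Record<string, unknown>;
}

interface ProjectionReading {
  promptTokens: number;
  accounting: "estimated" | "exact";
}

function readContextProjection(event: AgentEvent): ProjectionReading | undefined {
  // Ignore every other stream so existing handlers are unaffected.
  if (event.stream !== "codex_app_server.context_projection") {
    return undefined;
  }
  const { promptTokens, accounting } = event.data;
  if (typeof promptTokens !== "number" || !Number.isFinite(promptTokens)) {
    return undefined;
  }
  // Accept only the two documented marker values.
  if (accounting === "estimated" || accounting === "exact") {
    return { promptTokens, accounting };
  }
  return undefined;
}
```

A handler written this way returns `undefined` for streams it does not recognize, so adding the new stream value should not disturb subscribers that never look at it.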

Refs: #80765

Adds a `stats` block to the Codex context-engine projection so callers can distinguish LCM/frontier sizing from the rendered Codex prompt and from post-turn provider-observed usage. The block carries `projectedPromptChars`, `promptTokens`, an `accounting: "estimated" | "exact"` marker, the active `capChars`, and (when routed through) the configured compaction `reserveTokens` knob. The projection accepts an optional `tokenize` callback so a provider/runtime tokenizer can flip stats to `exact` when available; without one the existing 4-chars/token heuristic is used and accounting is explicitly marked `estimated`. The Codex app-server run-attempt now resolves `agents.defaults.compaction.reserveTokens` (falling back to `reserveTokensFloor`) and emits a `codex_app_server.context_projection` telemetry event alongside the existing post-turn usage signals. Closes openclaw#80765

clawsweeper · 2026-05-11T21:23:00Z

Codex review: needs real behavior proof before merge. Reviewed May 27, 2026, 12:59 AM ET / 04:59 UTC.

Summary
The PR adds Codex context projection stats plus a codex_app_server.context_projection telemetry event for pre-turn rendered-prompt accounting.

PR surface: Source +138, Tests +78. Total +216 across 3 files.

Reproducibility: yes, from source, but not from a live run: current main returns no projection stats and has no codex_app_server.context_projection event, so the observability gap is source-reproducible.

Review metrics: 1 noteworthy metric.

Mergeability state: dirty at head 304379c. The PR must be rebased before maintainers can evaluate or land the accounting change against current projection code.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🦐 gold shrimp
Result: blocked until real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
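
The "weaker of the two" rule can be written down as a small comparison. This is only an illustration of the stated rule, not ClawSweeper's implementation: the `RANKS` array and `overallRank` function are hypothetical, and the ordering is read off the rank legend later in this comment (off-meta tidepool is left out because it means the rating does not apply).

```typescript
// Sketch only: names are hypothetical; order is weakest to strongest.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is whichever of the two inputs sits lower in RANKS.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}
```

With proof at unranked krab and patch quality at gold shrimp, the function returns unranked krab, which matches the overall rating reported above.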

Rank-up moves:

- Rebase onto current main and preserve the existing budget-aware projection helpers instead of the old fixed 24k cap path.
- Add real behavior proof from a Codex app-server run showing the emitted codex_app_server.context_projection data with private details redacted.
- After updating the PR body, let ClawSweeper re-review automatically or ask a maintainer to comment @clawsweeper re-review if it does not rerun.

Proof guidance:
Needs real behavior proof before merge: The PR provides test output only; it needs redacted terminal output, logs, or another real setup artifact showing the new Codex app-server projection event after the patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

- The branch is currently dirty/not mergeable and was written against an older projection shape; resolving that conflict incorrectly could regress or misreport the budget-aware large-window projection path.
- No after-fix real behavior proof shows the new telemetry in a real Codex app-server run; tests alone do not satisfy the external PR proof gate.

Maintainer options:

- Decide the mitigation before merge: Rebase onto current main, layer stats and the telemetry event onto the existing budget-aware projection/reserve helpers, and add redacted real Codex app-server output showing the emitted event values.
- Pause or close: Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge
The next action is contributor/maintainer rebase plus real behavior proof; automation should not attempt repair while the external PR is dirty and proof is missing.

Security
Cleared: The diff only changes Codex plugin telemetry/accounting code and tests; no dependency, workflow, credential, or code-download security concern was found.

Review findings

[P2] Report the active projection cap — extensions/codex/src/app-server/context-engine-projection.ts:122

Review details

Best possible solution:

Rebase onto current main, layer stats and the telemetry event onto the existing budget-aware projection/reserve helpers, and add redacted real Codex app-server output showing the emitted event values.

Do we have a high-confidence way to reproduce the issue?

Yes, from source, but not from a live run: current main returns no projection stats and has no codex_app_server.context_projection event, so the observability gap is source-reproducible.

Is this the best way to solve the issue?

No, not as currently submitted. Adding the accounting seam is the useful approach, but it must be rebased onto the current budget-aware projection path and report the actual active cap/reserve values.

Full review comments:

[P2] Report the active projection cap — extensions/codex/src/app-server/context-engine-projection.ts:122
Current main sizes Codex context-engine projection with maxRenderedContextChars from resolveCodexContextEngineProjectionMaxChars(...), but this patch reports capChars from the old fixed MAX_RENDERED_CONTEXT_CHARS constant and its branch does not pass the active cap into projection. For large Codex context windows the new telemetry would say the cap is 24,000 even when the runtime is using a much larger budget-aware cap, so the accounting diagnostic would be wrong. Rebase onto current main and derive capChars from the normalized cap used for rendering, with the existing reserve resolver.
Confidence: 0.9
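
One way to address this finding is to have the renderer return the cap it actually applied, so stats cannot drift from rendering. The sketch below is an assumption-laden illustration, not current main's code: `normalizeMaxRenderedContextChars`, `renderWithCap`, the keep-the-tail truncation, and the default constant name are hypothetical stand-ins for whatever the existing helpers do.

```typescript
// Sketch only: names and the truncation strategy are hypothetical.
const DEFAULT_MAX_RENDERED_CONTEXT_CHARS = 24_000;

// Fall back to the default when the caller passes nothing usable.
function normalizeMaxRenderedContextChars(value: number | undefined): number {
  if (typeof value !== "number" || !Number.isFinite(value) || value <= 0) {
    return DEFAULT_MAX_RENDERED_CONTEXT_CHARS;
  }
  return Math.floor(value);
}

function renderWithCap(
  fullText: string,
  maxRenderedContextChars?: number,
): { prompt: string; capChars: number } {
  // Normalize once, then use the same value for both trimming and reporting.
  const capChars = normalizeMaxRenderedContextChars(maxRenderedContextChars);
  // Keeping the most recent tail of the text is an assumption of this sketch.
  const prompt =
    fullText.length > capChars ? fullText.slice(fullText.length - capChars) : fullText;
  return { prompt, capChars };
}
```

Returning `capChars` from the same function that trims the text means a budget-aware caller passing a larger cap would see that larger value in telemetry, rather than the fixed 24,000.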

Overall correctness: patch is incorrect
Overall confidence: 0.86

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 44c1cc8285c8.

Label changes

Label changes:

add P2: This is a normal Codex diagnostics improvement with a real remaining gap, but it is blocked by rebase/proof work rather than an urgent runtime regression.
add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦐 gold shrimp.
add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR provides test output only; it needs redacted terminal output, logs, or another real setup artifact showing the new Codex app-server projection event after the patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
remove rating: 🌊 off-meta tidepool: Current PR rating is rating: 🧂 unranked krab, so this older rating label is no longer current.

Label justifications:

P2: This is a normal Codex diagnostics improvement with a real remaining gap, but it is blocked by rebase/proof work rather than an urgent runtime regression.
rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦐 gold shrimp.
status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR provides test output only; it needs redacted terminal output, logs, or another real setup artifact showing the new Codex app-server projection event after the patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Evidence reviewed

PR surface:

Source +138, Tests +78. Total +216 across 3 files.

View PR surface stats

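| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 2 | 138 | 0 | +138 |
| Tests | 1 | 79 | 1 | +78 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 3 | 217 | 1 | +216 |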

What I checked:

Root policy read: Read the full root AGENTS.md and applied its ClawSweeper proof, scoped-policy, and extension-boundary review rules. (AGENTS.md:1, 44c1cc8285c8)
Scoped extension policy read: Read extensions/AGENTS.md; the touched Codex plugin files are under the extension boundary and should keep behavior local to the plugin without adding core SDK surface. (extensions/AGENTS.md:1, 44c1cc8285c8)
Current main has budget-aware projection sizing: Current main normalizes maxRenderedContextChars, exposes resolveCodexContextEngineProjectionMaxChars, and resolves reserve tokens through the shared projection helper. (extensions/codex/src/app-server/context-engine-projection.ts:29, 44c1cc8285c8)
Current main passes the active cap into Codex projection: The active context-engine path passes maxRenderedContextChars: resolveCodexContextEngineProjectionMaxChars(...) with the shared reserve resolver before rendering the Codex prompt. (extensions/codex/src/app-server/run-attempt.ts:1384, 44c1cc8285c8)
Current main lacks the requested telemetry stream: Repository search found no existing codex_app_server.context_projection stream on current main, so the central requested accounting/event is not already implemented. (extensions/codex/src/app-server/run-attempt.ts:1408, 44c1cc8285c8)
PR head reports the stale fixed cap: The PR head builds stats with capChars: MAX_RENDERED_CONTEXT_CHARS, where that branch constant is still 24,000, instead of reporting the normalized active cap used by current main. (extensions/codex/src/app-server/context-engine-projection.ts:122, 304379c78965)

Likely related people:

@jalehman: GitHub commit metadata for the merged budget-aware projection PR shows Josh Lehman commits on the same Codex projection surface that this PR must preserve. (role: related projection-scaling contributor; confidence: high; commits: 33fa4665ffdd, af605c8e6bbd, f7ab8c26b1e7; files: extensions/codex/src/app-server/context-engine-projection.ts, extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/thread-lifecycle.ts)
@vincentkoc: Local blame on current main attributes the active projection sizing and reserve resolver lines to Vincent Koc in the shallow checkout boundary commit. (role: current projection code contributor; confidence: medium; commits: c965b3a1ae61; files: extensions/codex/src/app-server/context-engine-projection.ts)
@steipete: Recent local history shows Peter Steinberger touching the large Codex run-attempt surface adjacent to the projection event path. (role: recent adjacent contributor; confidence: low; commits: d2711c900d7b; files: extensions/codex/src/app-server/run-attempt.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

openclaw-barnacle · 2026-05-26T04:47:13Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

clawsweeper · 2026-05-26T04:52:00Z

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?

The egg game starts only after the PR passes the real-behavior proof check.
Before that, no creature or rarity is rolled. The treat waits for real proof.
This is still just collectible flavor: proof affects review readiness, not creature quality.

clawsweeper · 2026-06-04T11:11:35Z

@aiZKP thanks for the PR. ClawSweeper is still waiting on real behavior proof before this can move forward.

Useful proof can be a screenshot, short video, terminal output, copied live output, linked artifact, or redacted logs that show the changed behavior after the fix. Please redact private tokens, phone numbers, private endpoints, customer data, and anything else sensitive.

Once proof is added to the PR body or a comment, ClawSweeper or a maintainer can re-check it.

clawsweeper · 2026-06-11T11:32:26Z

@aiZKP thanks for the PR. ClawSweeper is still waiting on real behavior proof before this can move forward.

Useful proof can be a screenshot, short video, terminal output, copied live output, linked artifact, or redacted logs that show the changed behavior after the fix. Please redact private tokens, phone numbers, private endpoints, customer data, and anything else sensitive.

Once proof is added to the PR body or a comment, ClawSweeper or a maintainer can re-check it.

openclaw-barnacle Bot added the labels `extensions: codex`, `size: M`, and `triage: needs-real-behavior-proof` (Candidate: external PR needs after-fix proof from a real setup.) on May 11, 2026

clawsweeper Bot mentioned this pull request on May 15, 2026: Codex context-engine projection lacks exact pre-turn token accounting #80765 (Open)

openclaw-barnacle Bot added the `stale` label (Marked as stale due to inactivity) on May 26, 2026

clawsweeper Bot added the `rating: 🌊 off-meta tidepool` label (PR readiness rating does not apply to this item.) on May 26, 2026

openclaw-barnacle Bot removed the `stale` label (Marked as stale due to inactivity) on May 27, 2026

feat(codex): surface pre-turn projection accounting (#80765) #80778
aiZKP wants to merge 1 commit into openclaw:main from aiZKP:fix/codex-projection-accounting-80765
