feat(codex): surface pre-turn projection accounting (#80765) #80778

Open
aiZKP wants to merge 1 commit into openclaw:main from aiZKP:fix/codex-projection-accounting-80765
@aiZKP aiZKP commented May 11, 2026

Summary

Closes #80765.

Codex's context-engine projection previously sized the rendered prompt with the
generic 4 chars/token heuristic and exposed nothing about that estimate
downstream. Status/LCM diagnostics could not separate frontier tokens
selected by the context engine, rendered Codex projection chars/tokens
before send, and provider-observed usage after the turn.

This PR adds a small pre-turn accounting snapshot to the projection and routes
it into agent telemetry:

  • projectContextEngineAssemblyForCodex now returns a stats block:
    • projectedPromptChars — length of the rendered Codex prompt
    • promptTokens — tokenizer-backed when supplied, heuristic otherwise
    • accounting: "estimated" | "exact" — explicit marker
    • capChars — active rendered-context cap (currently 24_000)
    • reserveTokens — surfaced when the caller routes the configured
      agents.defaults.compaction.reserveTokens /
      reserveTokensFloor through
  • An optional tokenize?: (text: string) => number | undefined parameter
    lets a future Codex app-server / provider tokenizer flip the marker to
    exact without changing call sites. Throwing or non-finite returns fall
    back to the heuristic.
  • run-attempt.ts resolves agents.defaults.compaction.reserveTokens
    (falling back to reserveTokensFloor) and emits a new
    codex_app_server.context_projection agent event before turn/start
    on both the context-engine and mirrored-history projection paths.
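
A minimal sketch of the stats block and tokenizer seam described in the list above. The field names come from this PR description; the helper names `countPromptTokens` and `buildProjectionStats`, and the choice of `Math.ceil` for the heuristic, are assumptions made for illustration and may not match the actual code in context-engine-projection.ts.

```typescript
// Sketch only: field names follow the PR description, helpers are hypothetical.
type ProjectionAccounting = "estimated" | "exact";

interface ProjectionStats {
  projectedPromptChars: number;
  promptTokens: number;
  accounting: ProjectionAccounting;
  capChars: number;
  reserveTokens?: number;
}

type Tokenize = (text: string) => number | undefined;

// The generic 4 chars/token heuristic named in the summary.
const CHARS_PER_TOKEN = 4;

function countPromptTokens(
  prompt: string,
  tokenize?: Tokenize,
): { promptTokens: number; accounting: ProjectionAccounting } {
  if (tokenize) {
    try {
      const exact = tokenize(prompt);
      // Only a finite number counts as exact; undefined/NaN/Infinity fall through.
      if (typeof exact === "number" && Number.isFinite(exact)) {
        return { promptTokens: exact, accounting: "exact" };
      }
    } catch {
      // A throwing tokenizer falls back to the heuristic below.
    }
  }
  return {
    // Rounding up is an assumption; the PR does not state the rounding rule.
    promptTokens: Math.ceil(prompt.length / CHARS_PER_TOKEN),
    accounting: "estimated",
  };
}

function buildProjectionStats(
  prompt: string,
  capChars: number,
  opts: { tokenize?: Tokenize; reserveTokens?: number } = {},
): ProjectionStats {
  const { promptTokens, accounting } = countPromptTokens(prompt, opts.tokenize);
  return {
    projectedPromptChars: prompt.length,
    promptTokens,
    accounting,
    capChars,
    // reserveTokens appears only when the caller routes it through.
    ...(opts.reserveTokens !== undefined ? { reserveTokens: opts.reserveTokens } : {}),
  };
}
```

Because the fallback lives inside `countPromptTokens`, a caller that later supplies a tokenizer would not need to change how it reads `accounting`.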

Existing behavior (24k char cap, prompt rendering, duplicate trailing-prompt
trim, developer-instruction addition, prePromptMessageCount) is unchanged.

Acceptance criteria

  • Native Codex projection reports pre-turn exact tokens when a tokenizer
    is supplied; otherwise marks accounting as estimated.
  • Diagnostics can distinguish:
    • LCM/frontier tokens selected by the context engine
      (frontierTokens on the emitted event, equal to contextTokenBudget)
    • rendered Codex projection chars/tokens before send
      (projectedPromptChars / promptTokens / accounting)
    • provider-observed prompt/input tokens after the turn
      (existing afterTurn runtimeContext.lastCallUsage / promptCache)
  • Tests cover the estimate-vs-exact marker and ensure configured reserve
    fields surface through projection stats.
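
To illustrate how a diagnostic could keep the three quantities apart, here is a small sketch. `TurnAccounting`, `observedInputTokens`, and `describeTurnAccounting` are hypothetical names; only `frontierTokens`, `promptTokens`, and `accounting` come from this PR. Provider-observed input can include content outside the rendered projection, so the drift value is a rough signal rather than an error measure.

```typescript
// Hypothetical diagnostic view that combines pre-turn and post-turn numbers.
interface TurnAccounting {
  frontierTokens: number; // selected by the context engine
  promptTokens: number; // rendered projection, before send
  accounting: "estimated" | "exact";
  observedInputTokens?: number; // provider-reported, after the turn
}

function describeTurnAccounting(a: TurnAccounting): string {
  const parts = [
    `frontier=${a.frontierTokens}`,
    `projected=${a.promptTokens} (${a.accounting})`,
  ];
  if (a.observedInputTokens !== undefined) {
    // Positive drift means the provider saw more input than was projected.
    const drift = a.observedInputTokens - a.promptTokens;
    parts.push(
      `observed=${a.observedInputTokens}`,
      `drift=${drift >= 0 ? "+" : ""}${drift}`,
    );
  }
  return parts.join(" ");
}
```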

Files touched

| File | Lines | Purpose |
| --- | --- | --- |
| extensions/codex/src/app-server/context-engine-projection.ts | +95 | Stats type, tokenizer seam, accounting marker |
| extensions/codex/src/app-server/run-attempt.ts | +43 | Reserve resolver, projection event emit |
| extensions/codex/src/app-server/context-engine-projection.test.ts | +79 / -1 | 5 new tests for stats / marker / reserve |

No SDK contract, no public manifest, no docs/changelog surface changed.

Test plan

  • pnpm test extensions/codex/src/app-server/context-engine-projection.test.ts: 10 passed (5 new + 5 existing)
  • pnpm test extensions/codex/src/app-server/run-attempt.context-engine.test.ts: 6 passed
  • pnpm check:changed (extension prod + extension test lanes) — typecheck, oxlint, format, runtime sidecar guard, import-cycle check all green
  • One unrelated test (run-attempt.test.ts > does not expose OpenClaw Tool Search controls through Codex dynamic tools) times out — verified to fail identically on main without these changes, so it is pre-existing and unrelated to this PR.

Notes for reviewers

  • The projection cap (MAX_RENDERED_CONTEXT_CHARS = 24_000) is intentionally
    unchanged here. Making it budget-aware via contextTokenBudget /
    reserveTokens is tracked by #80761 (fix(codex): scale context engine
    projection); this PR is the accounting follow-up only.
  • The new tokenize parameter is a no-op until a Codex/provider tokenizer
    is wired in. The acceptance criterion ("exact when the runtime/tokenizer
    surface supports it") is satisfied by the seam plus the explicit
    estimated marker; no behavior change for current callers.
  • The emitted event uses Record<string, unknown> — consumers that already
    subscribe to onAgentEvent see a new stream value but the existing
    envelope shape is preserved.
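
Since the payload is typed as `Record<string, unknown>`, a consumer has to narrow it before use. The sketch below shows one way to do that. The envelope shape (`stream` plus `data`) and the name `readContextProjection` are assumptions for illustration; the PR only states that the existing envelope is preserved and that a new stream value appears.

```typescript
// Hypothetical envelope: field names are assumed, not taken from the SDK.
interface AgentEvent {
  stream: string;
  data: Record<string, unknown>;
}

interface ContextProjectionView {
  projectedPromptChars: number;
  promptTokens: number;
  accounting: "estimated" | "exact";
}

function readContextProjection(event: AgentEvent): ContextProjectionView | undefined {
  // Ignore every stream except the one this PR introduces.
  if (event.stream !== "codex_app_server.context_projection") return undefined;
  const { projectedPromptChars, promptTokens, accounting } = event.data;
  // Validate each field, because the payload is untyped at the boundary.
  if (
    typeof projectedPromptChars !== "number" ||
    typeof promptTokens !== "number" ||
    (accounting !== "estimated" && accounting !== "exact")
  ) {
    return undefined;
  }
  return { projectedPromptChars, promptTokens, accounting };
}
```

Consumers that do not know the new stream simply never match it, which is consistent with the note that existing subscribers are unaffected.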

Refs: #80765

Adds a `stats` block to the Codex context-engine projection so callers can
distinguish LCM/frontier sizing from the rendered Codex prompt and from
post-turn provider-observed usage. The block carries `projectedPromptChars`,
`promptTokens`, an `accounting: "estimated" | "exact"` marker, the active
`capChars`, and (when routed through) the configured compaction
`reserveTokens` knob.

The projection accepts an optional `tokenize` callback so a provider/runtime
tokenizer can flip stats to `exact` when available; without one the existing
4-chars/token heuristic is used and accounting is explicitly marked
`estimated`. The Codex app-server run-attempt now resolves
`agents.defaults.compaction.reserveTokens` (falling back to
`reserveTokensFloor`) and emits a `codex_app_server.context_projection`
telemetry event alongside the existing post-turn usage signals.
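
The reserve resolution and event payload described above could look roughly like this. The config path and key names come from this PR; `ConfigLike`, `resolveReserveTokens`, `buildProjectionEventData`, and the rule that only finite, non-negative numbers are accepted are assumptions for illustration, not the code in run-attempt.ts.

```typescript
// Hypothetical config shape limited to the two keys named in the PR.
interface CompactionConfig {
  reserveTokens?: number;
  reserveTokensFloor?: number;
}

interface ConfigLike {
  agents?: { defaults?: { compaction?: CompactionConfig } };
}

// Assumed validity rule: finite and non-negative.
function isUsableCount(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value) && value >= 0;
}

function resolveReserveTokens(config: ConfigLike): number | undefined {
  const compaction = config.agents?.defaults?.compaction;
  const primary = compaction?.reserveTokens;
  if (isUsableCount(primary)) return primary;
  // Fall back to the floor when reserveTokens is not configured.
  const floor = compaction?.reserveTokensFloor;
  if (isUsableCount(floor)) return floor;
  return undefined;
}

function buildProjectionEventData(
  stats: {
    projectedPromptChars: number;
    promptTokens: number;
    accounting: "estimated" | "exact";
    capChars: number;
  },
  frontierTokens: number,
  reserveTokens: number | undefined,
): Record<string, unknown> {
  return {
    ...stats,
    frontierTokens,
    // Omit the key entirely when no reserve is configured.
    ...(reserveTokens !== undefined ? { reserveTokens } : {}),
  };
}
```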

Closes openclaw#80765
@openclaw-barnacle openclaw-barnacle Bot added extensions: codex size: M triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 11, 2026
clawsweeper Bot commented May 11, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed May 27, 2026, 12:59 AM ET / 04:59 UTC.

Summary
The PR adds Codex context projection stats plus a codex_app_server.context_projection telemetry event for pre-turn rendered-prompt accounting.

PR surface: Source +138, Tests +78. Total +216 across 3 files.

Reproducibility: yes, from source but not from a live run. Current main returns no projection stats and has no codex_app_server.context_projection event, so the observability gap is source-reproducible.

Review metrics: 1 noteworthy metric.

  • Mergeability state: dirty at head 304379c. The PR must be rebased before maintainers can evaluate or land the accounting change against current projection code.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🦐 gold shrimp
Result: blocked until real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Rebase onto current main and preserve the existing budget-aware projection helpers instead of the old fixed 24k cap path.
  • Add real behavior proof from a Codex app-server run showing the emitted codex_app_server.context_projection data with private details redacted.
  • After updating the PR body, let ClawSweeper re-review automatically or ask a maintainer to comment @clawsweeper re-review if it does not rerun.

Proof guidance:
Needs real behavior proof before merge: The PR provides test output only; it needs redacted terminal output, logs, or another real setup artifact showing the new Codex app-server projection event after the patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • The branch is currently dirty/not mergeable and was written against an older projection shape; resolving that conflict incorrectly could regress or misreport the budget-aware large-window projection path.
  • No after-fix real behavior proof shows the new telemetry in a real Codex app-server run; tests alone do not satisfy the external PR proof gate.

Maintainer options:

  1. Decide the mitigation before merge
    Rebase onto current main, layer stats and the telemetry event onto the existing budget-aware projection/reserve helpers, and add redacted real Codex app-server output showing the emitted event values.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge
The next action is contributor/maintainer rebase plus real behavior proof; automation should not attempt repair while the external PR is dirty and proof is missing.

Security
Cleared: The diff only changes Codex plugin telemetry/accounting code and tests; no dependency, workflow, credential, or code-download security concern was found.

Review findings

  • [P2] Report the active projection cap — extensions/codex/src/app-server/context-engine-projection.ts:122
Review details

Best possible solution:

Rebase onto current main, layer stats and the telemetry event onto the existing budget-aware projection/reserve helpers, and add redacted real Codex app-server output showing the emitted event values.

Do we have a high-confidence way to reproduce the issue?

Yes, from source but not from a live run: current main returns no projection stats and has no codex_app_server.context_projection event, so the observability gap is source-reproducible.

Is this the best way to solve the issue?

No, not as currently submitted. Adding the accounting seam is the right approach, but it must be rebased onto the current budget-aware projection path and report the actual active cap/reserve values.

Full review comments:

  • [P2] Report the active projection cap — extensions/codex/src/app-server/context-engine-projection.ts:122
    Current main sizes Codex context-engine projection with maxRenderedContextChars from resolveCodexContextEngineProjectionMaxChars(...), but this patch reports capChars from the old fixed MAX_RENDERED_CONTEXT_CHARS constant and its branch does not pass the active cap into projection. For large Codex context windows the new telemetry would say the cap is 24,000 even when the runtime is using a much larger budget-aware cap, so the accounting diagnostic would be wrong. Rebase onto current main and derive capChars from the normalized cap used for rendering, with the existing reserve resolver.
    Confidence: 0.9
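
A sketch of the fix this finding asks for: the cap reported in stats is the same normalized value used for rendering, not a separate constant. `normalizeCapChars` and `renderWithCap` are hypothetical helpers, the 24_000 fallback and the tail-keeping truncation are placeholder assumptions, and the real resolveCodexContextEngineProjectionMaxChars(...) is deliberately not called here because its signature is not shown in this thread.

```typescript
// Assumed fallback when no budget-aware cap is supplied.
const MAX_RENDERED_CONTEXT_CHARS = 24_000;

// Hypothetical normalizer: accept only a finite positive cap.
function normalizeCapChars(activeCap: number | undefined): number {
  if (typeof activeCap === "number" && Number.isFinite(activeCap) && activeCap > 0) {
    return Math.floor(activeCap);
  }
  return MAX_RENDERED_CONTEXT_CHARS;
}

function renderWithCap(
  context: string,
  activeCap?: number,
): { rendered: string; capChars: number } {
  const capChars = normalizeCapChars(activeCap);
  // Placeholder truncation: keep the most recent tail of the context.
  const rendered =
    context.length > capChars ? context.slice(context.length - capChars) : context;
  // Report the exact cap that was applied, so telemetry cannot drift from rendering.
  return { rendered, capChars };
}
```

With a budget-aware cap of 120_000 passed in, `capChars` would read 120000 rather than 24000, which is the mismatch the finding describes.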

Overall correctness: patch is incorrect
Overall confidence: 0.86

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 44c1cc8285c8.

Label changes

Label changes:

  • add P2: This is a normal Codex diagnostics improvement with a real remaining gap, but it is blocked by rebase/proof work rather than an urgent runtime regression.
  • add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦐 gold shrimp.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR provides test output only; it needs redacted terminal output, logs, or another real setup artifact showing the new Codex app-server projection event after the patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • remove rating: 🌊 off-meta tidepool: Current PR rating is rating: 🧂 unranked krab, so this older rating label is no longer current.

Label justifications:

  • P2: This is a normal Codex diagnostics improvement with a real remaining gap, but it is blocked by rebase/proof work rather than an urgent runtime regression.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦐 gold shrimp.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR provides test output only; it needs redacted terminal output, logs, or another real setup artifact showing the new Codex app-server projection event after the patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +138, Tests +78. Total +216 across 3 files.

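PR surface stats:

| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 2 | 138 | 0 | +138 |
| Tests | 1 | 79 | 1 | +78 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 3 | 217 | 1 | +216 |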

What I checked:

Likely related people:

  • @jalehman: GitHub commit metadata for the merged budget-aware projection PR shows Josh Lehman commits on the same Codex projection surface that this PR must preserve. (role: related projection-scaling contributor; confidence: high; commits: 33fa4665ffdd, af605c8e6bbd, f7ab8c26b1e7; files: extensions/codex/src/app-server/context-engine-projection.ts, extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/thread-lifecycle.ts)
  • @vincentkoc: Local blame on current main attributes the active projection sizing and reserve resolver lines to Vincent Koc in the shallow checkout boundary commit. (role: current projection code contributor; confidence: medium; commits: c965b3a1ae61; files: extensions/codex/src/app-server/context-engine-projection.ts)
  • @steipete: Recent local history shows Peter Steinberger touching the large Codex run-attempt surface adjacent to the projection event path. (role: recent adjacent contributor; confidence: low; commits: d2711c900d7b; files: extensions/codex/src/app-server/run-attempt.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label May 26, 2026
@clawsweeper clawsweeper Bot added the rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. label May 26, 2026
clawsweeper Bot commented May 26, 2026

Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@openclaw-barnacle openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label May 27, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 27, 2026
clawsweeper Bot commented Jun 4, 2026

Contributor

@aiZKP thanks for the PR. ClawSweeper is still waiting on real behavior proof before this can move forward.

Useful proof can be a screenshot, short video, terminal output, copied live output, linked artifact, or redacted logs that show the changed behavior after the fix. Please redact private tokens, phone numbers, private endpoints, customer data, and anything else sensitive.

Once proof is added to the PR body or a comment, ClawSweeper or a maintainer can re-check it.

clawsweeper Bot commented Jun 11, 2026

Contributor

@aiZKP thanks for the PR. ClawSweeper is still waiting on real behavior proof before this can move forward.

Useful proof can be a screenshot, short video, terminal output, copied live output, linked artifact, or redacted logs that show the changed behavior after the fix. Please redact private tokens, phone numbers, private endpoints, customer data, and anything else sensitive.

Once proof is added to the PR body or a comment, ClawSweeper or a maintainer can re-check it.

Labels

extensions: codex P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: M status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Development

Successfully merging this pull request may close these issues.

Codex context-engine projection lacks exact pre-turn token accounting

1 participant