fix: [codex-harness] avoid treating cumulative app-server usage as current context#64669

Merged: steipete merged 1 commit into openclaw:main from cyrusaf:codex/app-server-token-usage-projection on Apr 18, 2026
@cyrusaf cyrusaf commented Apr 11, 2026

Summary

  • Problem: Codex app-server token usage projection used cumulative tokenUsage.total as assistant/attempt usage.
  • Why it matters: downstream session accounting treats assistant/attempt usage as fresh current context-window usage, so cumulative Codex totals can inflate totalTokens and make /status show values like 999%.
  • What changed: Codex app-server projection now uses explicit current/last usage fields when available and normalizes both camelCase app-server fields and snake_case Codex-native usage fields.
  • What did NOT change (scope boundary): this does not change generic session/status accounting, native compaction behavior, transcript fallback, billing aggregation, reset behavior, or Codex app-server startup.
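The field normalization mentioned in the "What changed" bullet can be sketched roughly as follows. This is an illustration only, not the PR's code: `readNumberAlias` is named in the review excerpt on this page, but its body here is an assumption, and the `UsageRecord` type and sample payloads are hypothetical.

```typescript
// Hypothetical record type for a loosely typed usage payload.
type UsageRecord = Record<string, unknown>;

// Assumed behavior: return the first finite number found under any alias,
// or undefined when none of the aliases holds a usable number.
function readNumberAlias(
  record: UsageRecord,
  aliases: readonly string[],
): number | undefined {
  for (const key of aliases) {
    const value = record[key];
    if (typeof value === "number" && Number.isFinite(value)) {
      return value;
    }
  }
  return undefined;
}

// Alias list copied from the review excerpt on this page.
const inputAliases = ["inputTokens", "input_tokens", "input", "promptTokens"] as const;

// Hypothetical payloads: app-server style (camelCase) and Codex-native style (snake_case).
const camelCasePayload: UsageRecord = { inputTokens: 1200, outputTokens: 80 };
const snakeCasePayload: UsageRecord = { input_tokens: 1200, output_tokens: 80 };

const fromCamel = readNumberAlias(camelCasePayload, inputAliases);
const fromSnake = readNumberAlias(snakeCasePayload, inputAliases);
console.log(fromCamel, fromSnake); // both spellings resolve to the same value
```

Trying aliases in a fixed order keeps the projection tolerant of either naming style without the caller needing to know which producer emitted the payload.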

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: the Codex app-server event projector treated tokenUsage.total as if it were the current turn/current call usage snapshot.
  • Missing detection / guardrail: existing projector tests only covered a tokenUsage.total fixture and did not assert that cumulative totals must not populate assistant/attempt usage.
  • Contributing context: OpenClaw downstream accounting intentionally uses assistant usage, attemptUsage, lastCallUsage, and promptTokens as current context-window snapshots, not cumulative thread/billing totals.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/codex/src/app-server/event-projector.test.ts
  • Scenario the test should lock in: when a token usage notification has both cumulative and current usage, assistant/attempt usage uses the current usage; when only a huge cumulative total is present, it is not projected as fresh context usage.
  • Why this is the smallest reliable guardrail: the bug occurs at the Codex app-server projection boundary before downstream session/status accounting sees the usage, so this test catches the bad value at the source.
  • Existing test that already covers this (if any): existing projector tests continue to cover assistant text, plan, compaction, and tool metadata projection.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

Codex-backed sessions should no longer persist cumulative app-server token totals as fresh context-window usage. This prevents inflated session context percentages in status output when only cumulative usage is available.
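To make the inflation concrete, here is a small worked example of the arithmetic. All numbers and the `contextPercent` helper are hypothetical; the actual formula used by `/status` is not shown in this PR.

```typescript
// Hypothetical helper: percentage of the context window in use.
function contextPercent(usedTokens: number, contextWindow: number): number {
  return Math.round((usedTokens / contextWindow) * 100);
}

// Hypothetical values chosen only to illustrate the failure mode.
const contextWindow = 200_000; // model context window
const currentCallTokens = 48_000; // tokens in the latest call
const cumulativeThreadTokens = 1_998_000; // running total across the whole thread

// Using the current call gives a plausible reading.
const healthy = contextPercent(currentCallTokens, contextWindow);

// Using the cumulative total as if it were current usage gives nonsense.
const inflated = contextPercent(cumulativeThreadTokens, contextWindow);

console.log(`${healthy}% vs ${inflated}%`); // prints "24% vs 999%"
```

A cumulative total keeps growing over the life of a thread, so any ratio against a fixed context window will eventually exceed 100% even when each individual call fits comfortably.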

Diagram (if applicable)

Before:
thread/tokenUsage/updated
  -> tokenUsage.total
  -> assistant usage / attemptUsage
  -> lastCallUsage / promptTokens
  -> SessionEntry.totalTokens
  -> /status inflated context

After:
thread/tokenUsage/updated
  -> explicit current/last usage
  -> assistant usage / attemptUsage
  -> lastCallUsage / promptTokens
  -> SessionEntry.totalTokens

cumulative-only total
  -> ignored for fresh context accounting

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local repo test runner
  • Model/provider: Codex app-server projection path
  • Integration/channel (if any): Codex bundled plugin
  • Relevant config (redacted): N/A

Steps

  1. Project a thread/tokenUsage/updated notification containing a large cumulative tokenUsage.total and a smaller current/last usage object.
  2. Build the embedded attempt result.
  3. Inspect attemptUsage and the final assistant message usage.

Expected

  • Current/last usage is projected into assistant/attempt usage.
  • Cumulative-only totals are not treated as fresh context-window usage.

Actual

  • Before this fix, tokenUsage.total was projected into assistant/attempt usage.
  • After this fix, cumulative-only totals are ignored for context accounting unless an explicit current/last usage field is present.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What I personally verified:

  • Verified scenarios:
    • pnpm test extensions/codex/src/app-server/event-projector.test.ts
    • pnpm test extensions/codex
    • pre-commit pnpm check
  • Edge cases checked:
    • both cumulative and current usage are present
    • cumulative-only usage is huge
    • assistant text projection still works
    • plan/tool/compaction projection tests still pass
  • What I did not verify:
    • live Codex app-server session against Codex Desktop
    • full repo test suite

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: if an older app-server only emits cumulative usage and no current/last usage field, OpenClaw will no longer show token context usage from that notification.
    • Mitigation: unknown context usage is safer than persisting cumulative billing/thread totals as fresh context-window usage; downstream already supports missing usage.
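The mitigation relies on downstream code tolerating missing usage. A minimal sketch of what that tolerance might look like is below; `SessionUsageSnapshot` and `formatContextUsage` are hypothetical names, and the real status formatting in OpenClaw may differ.

```typescript
// Hypothetical shape: totalTokens is optional because usage may be unknown.
interface SessionUsageSnapshot {
  totalTokens?: number;
}

// Assumed behavior: report "unknown" rather than a misleading number
// when no current usage snapshot is available.
function formatContextUsage(
  snapshot: SessionUsageSnapshot,
  contextWindow: number,
): string {
  if (snapshot.totalTokens === undefined) {
    return "context: unknown";
  }
  const percent = Math.round((snapshot.totalTokens / contextWindow) * 100);
  return `context: ${snapshot.totalTokens}/${contextWindow} (${percent}%)`;
}

const knownUsage = formatContextUsage({ totalTokens: 48_000 }, 200_000);
const unknownUsage = formatContextUsage({}, 200_000);
console.log(knownUsage); // prints "context: 48000/200000 (24%)"
console.log(unknownUsage); // prints "context: unknown"
```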

@cyrusaf cyrusaf changed the title from "Codex: avoid treating cumulative app-server usage as current context" to "fix: Codex: avoid treating cumulative app-server usage as current context" Apr 11, 2026
@cyrusaf cyrusaf changed the title from "fix: Codex: avoid treating cumulative app-server usage as current context" to "fix: [codex-harness] avoid treating cumulative app-server usage as current context" Apr 11, 2026
@steipete steipete force-pushed the codex/app-server-token-usage-projection branch from 9cdfd51 to 8cf38e3 April 18, 2026 22:01
@steipete steipete marked this pull request as ready for review April 18, 2026 22:01
greptile-apps Bot commented Apr 18, 2026

Contributor

Greptile Summary

This PR fixes a bug in the Codex app-server event projector where tokenUsage.total (a cumulative thread-level billing total) was being projected into attemptUsage/lastAssistant.usage as if it were the current turn's context-window usage, causing /status to report inflated context percentages (e.g. 999%). The fix introduces CURRENT_TOKEN_USAGE_KEYS to look for explicit current/last-call usage fields (last, current, lastCall, lastCallUsage, lastTokenUsage, last_token_usage) and ignores the notification entirely when only a cumulative total is present. Three new test scenarios are added to lock in the correct behavior.
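The selection rule summarized above can be sketched as follows. The key names come from the summary; their lookup order, the `JsonRecord` type, and the `isRecord` and `pickCurrentUsage` helpers are assumptions made for illustration and may not match the merged code.

```typescript
type JsonRecord = Record<string, unknown>;

// Key names as listed in the summary; the order used here is an assumption.
const CURRENT_TOKEN_USAGE_KEYS = [
  "last",
  "current",
  "lastCall",
  "lastCallUsage",
  "lastTokenUsage",
  "last_token_usage",
] as const;

// Hypothetical type guard for plain object values.
function isRecord(value: unknown): value is JsonRecord {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical selector: return the first explicit current/last usage object.
// There is deliberately no fallback to `total`, so a cumulative-only
// notification yields undefined and is not projected as fresh usage.
function pickCurrentUsage(tokenUsage: unknown): JsonRecord | undefined {
  if (!isRecord(tokenUsage)) {
    return undefined;
  }
  for (const key of CURRENT_TOKEN_USAGE_KEYS) {
    const candidate = tokenUsage[key];
    if (isRecord(candidate)) {
      return candidate;
    }
  }
  return undefined;
}

// Hypothetical notification payloads.
const withBoth = pickCurrentUsage({
  total: { inputTokens: 900_000 },
  last: { inputTokens: 1_200 },
});
const cumulativeOnly = pickCurrentUsage({ total: { inputTokens: 900_000 } });

console.log(withBoth); // the `last` object
console.log(cumulativeOnly); // undefined
```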

Confidence Score: 5/5

Safe to merge — targeted bug fix with clear test coverage and no regressions to existing behavior.

All three changed files are correct. The fix is precisely scoped to the Codex app-server projection boundary, does not touch generic session accounting, and is backed by three new test cases that directly prove the fix works. No P0 or P1 findings.

No files require special attention.

Reviews (1): Last reviewed commit: "fix: avoid cumulative codex usage as con..."

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8cf38e36b7

Comment on lines +557 to +560
  return normalizeUsage({
    input: readNumberAlias(record, ["inputTokens", "input_tokens", "input", "promptTokens"]),
    output: readNumberAlias(record, ["outputTokens", "output_tokens", "output"]),
    cacheRead: readNumberAlias(record, [

P2 Badge Preserve cache-aware normalization for current token usage

normalizeCodexTokenUsage pre-extracts fields into { input, output, cacheRead, ... } before calling normalizeUsage, which removes raw keys that the normalizer uses to correct OpenAI-style prompt totals (for example cached-token metadata). In thread/tokenUsage/updated payloads that report input_tokens plus cached-token counts, this can double-count cache reads in prompt/context math, so attemptUsage and downstream status percentages can still be inflated even though this change is meant to fix that class of issue.
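The double-counting concern can be illustrated with a small sketch. It assumes an OpenAI-style payload in which `input_tokens` already includes cached tokens; the `RawUsage` shape, the `cached_input_tokens` field name, and both helper functions are hypothetical and are not taken from `normalizeUsage`.

```typescript
// Hypothetical raw usage shape. Assumption: input_tokens is inclusive of
// cached_input_tokens, as in OpenAI-style usage reporting.
interface RawUsage {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
}

// Naive projection: treats input and cache reads as disjoint and adds them,
// which counts the cached tokens twice under the assumption above.
function naivePromptTokens(usage: RawUsage): number {
  return usage.input_tokens + usage.cached_input_tokens;
}

// Cache-aware projection: split the inclusive input into uncached and cached
// parts first, so the cached tokens are counted once.
function cacheAwarePromptTokens(usage: RawUsage): number {
  const uncached = Math.max(0, usage.input_tokens - usage.cached_input_tokens);
  return uncached + usage.cached_input_tokens;
}

// Hypothetical numbers: 10,000 prompt tokens, of which 8,000 were cache hits.
const sample: RawUsage = {
  input_tokens: 10_000,
  cached_input_tokens: 8_000,
  output_tokens: 500,
};

const naive = naivePromptTokens(sample);
const cacheAware = cacheAwarePromptTokens(sample);
console.log(naive, cacheAware); // prints "18000 10000"
```

If the normalizer only sees pre-extracted `input` and `cacheRead` values, it has no way to tell whether `input` was inclusive, which is the information loss the review comment points at.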

@steipete steipete merged commit 9a94194 into openclaw:main Apr 18, 2026
71 checks passed
@steipete

Contributor

Landed via temp rebase onto main.

  • Local gates: pnpm check, pnpm build, pnpm test extensions/codex/src/app-server/event-projector.test.ts, pnpm test extensions/codex/src/app-server/event-projector.test.ts extensions/codex/src/app-server/run-attempt.test.ts extensions/codex/src/app-server/shared-client.test.ts
  • Docker gate: pnpm test:docker:live-codex-harness
  • GitHub PR checks: green
  • PR source commit: 8cf38e3
  • Landed commit: 9a94194

Thanks @cyrusaf!

ender-wiggin-ai pushed a commit to stroupaloop/openclaw that referenced this pull request Apr 18, 2026
Mquarmoc pushed a commit to Mquarmoc/openclaw that referenced this pull request Apr 20, 2026
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
globalcaos pushed a commit to globalcaos/tinkerclaw that referenced this pull request May 13, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
2 participants