Summary
Native Codex app-server runs can silently expose only a small slice of context-engine output because projectContextEngineAssemblyForCodex() renders assembled context into a quoted prompt block capped at 24,000 chars.
This is a real context-delivery regression for context engines such as Lossless Claw/LCM after switching an agent from the Pi embedded route to the native Codex route. LCM can assemble a large, full-fit frontier, but Codex receives only the capped projection. The visible /status or UI percent then reflects the smaller Codex runtime prompt usage, which makes healthy LCM state look like it lost context or overcompacted.
Confirmed Evidence
Environment:
- OpenClaw 2026.5.10-beta.5
- runtime: native codex harness / openai-codex/gpt-5.5
- session key: agent:main:main
- context window denominator: 258000
Local observed run after beta switch:
- LCM active conversation remained healthy:
  - conversation_id=1872
  - session_key=agent:main:main
  - 27811 messages
  - 14241378 raw message tokens
  - frontier: 247 context items, about 186764 tokens
- LCM assembled a full-fit frontier before the turn:
  - contextItems=215
  - selectionMode=full-fit
  - estimatedTokens=179134
  - rawMessageCount=93
  - summaryCount=122
- The same Codex runtime turn reported only:
  - input=590
  - output=6
  - cacheRead=76288
  - totalTokens=76884
  - visible status about 30%
There was no LCM compaction at that turn:
- shouldCompact=false
- reason: below-context-threshold-floor
So the observed drop was neither LCM DB loss nor overcompaction. It originates at the native Codex projection/accounting boundary.
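The percentages behind this evidence can be checked with a few lines of arithmetic. The sketch below only restates the figures reported above against the 258000-token denominator; the helper name `percentOfWindow` is made up for illustration and is not part of OpenClaw.

```typescript
// Figures copied from the observed run in this report.
const contextWindowTokens = 258_000;
const lcmFrontierTokens = 186_764; // LCM frontier (247 context items)
const lcmAssembledTokens = 179_134; // estimatedTokens of the full-fit assembly
const codexTotalTokens = 76_884; // totalTokens reported by the Codex runtime turn

// Hypothetical helper: share of the context window, rounded to one decimal place.
function percentOfWindow(tokens: number, windowTokens: number): number {
  return Math.round((tokens / windowTokens) * 1000) / 10;
}

const frontierPercent = percentOfWindow(lcmFrontierTokens, contextWindowTokens); // 72.4
const assembledPercent = percentOfWindow(lcmAssembledTokens, contextWindowTokens); // 69.4
const codexPercent = percentOfWindow(codexTotalTokens, contextWindowTokens); // 29.8
```

The roughly 30% shown in status matches the Codex runtime total, not the roughly 69% that LCM assembled for the same turn.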
Code Path
The native Codex route assembles context-engine output here:
extensions/codex/src/app-server/run-attempt.ts
- calls assembleHarnessContextEngine(...)
- then calls projectContextEngineAssemblyForCodex(...)
The projection cap is here:
extensions/codex/src/app-server/context-engine-projection.ts
- MAX_RENDERED_CONTEXT_CHARS = 24_000
- MAX_TEXT_PART_CHARS = 6_000
- truncateText(renderedContext, MAX_RENDERED_CONTEXT_CHARS)
This means a context engine may select hundreds of thousands of tokens, but native Codex only sees a 24k-character rendered block.
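To make the effect of a fixed cap concrete, here is a minimal sketch. It is not the actual OpenClaw source: the two constants are the values quoted above, but the bodies of `truncateText` and `projectWithFixedCap`, and the order in which the per-part and total caps are applied, are assumptions made purely for illustration.

```typescript
// Constants as quoted in this report.
const MAX_RENDERED_CONTEXT_CHARS = 24_000;
const MAX_TEXT_PART_CHARS = 6_000;

// Hypothetical stand-in for truncateText; the real helper may append a marker or differ otherwise.
function truncateText(text: string, maxChars: number): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

// Hypothetical projection: cap each text part, join the parts, then cap the whole block.
function projectWithFixedCap(parts: string[]): string {
  const rendered = parts
    .map((part) => truncateText(part, MAX_TEXT_PART_CHARS))
    .join("\n");
  return truncateText(rendered, MAX_RENDERED_CONTEXT_CHARS);
}

// 215 context items of 3,000 characters each (item count taken from the report, sizes invented).
const assembledParts = Array.from({ length: 215 }, () => "x".repeat(3_000));
const projected = projectWithFixedCap(assembledParts);
// projected.length stays at 24_000 no matter how many items the context engine assembled.
```

Under these assumptions the output size is independent of the assembled size once the cap is reached, which is consistent with the mismatch seen in the evidence above.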
By contrast, the Pi embedded route feeds assembled context-engine messages through the Pi session flow rather than this small Codex text projection:
src/agents/pi-embedded-runner/run/attempt.ts
- calls assembleAttemptContextEngine(...)
- applies assembled messages to the active Pi session
Reproduction Shape
- Use a context-engine plugin with a large existing frontier, such as Lossless Claw.
- Run the same long-lived session through native Codex app-server runtime.
- Observe LCM/context-engine assemble logs reporting a large full-fit selection.
- Observe Codex runtime usage/status reporting a much smaller prompt.
- Inspect extensions/codex/src/app-server/context-engine-projection.ts and confirm the 24k-character projection cap.
Expected Behavior
When a context engine assembles context within the model budget, native Codex should either:
- pass a model-visible projection sized from the actual model/context budget, or
- explicitly report that the host projected/truncated the context-engine output and how much was dropped.
The model-visible context should not be silently capped to 24k chars while the context engine believes it delivered a full-fit frontier.
Actual Behavior
The context engine assembles a large full-fit frontier, but native Codex receives a small rendered quoted-context block capped at 24k chars. Status then reports the smaller Codex runtime usage, making it appear as if context dropped from about 80% to about 30%.
Fix Direction
Recommended upstream fix:
- Replace the fixed 24_000 rendered-context cap with a budget derived from the active model/context budget.
- Reserve space for:
  - developer instructions / workspace bootstrap
  - current user prompt
  - tool schemas
  - model output margin
- Return and record projection stats:
  - assembled message count
  - rendered chars before cap
  - rendered chars after cap
  - truncation boolean
  - approximate dropped chars/tokens
- Include those stats in trajectory/status diagnostics so /status can distinguish:
  - context-engine frontier available
  - context-engine projected to model
  - provider-reported prompt/cache usage
Even a conservative first patch that makes the cap configurable and records truncation metadata would prevent this from being misdiagnosed as LCM data loss.
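The fix direction above could be sketched roughly as follows. Everything here is hypothetical: the type names, field names, function names, and the chars-per-token heuristic are assumptions for illustration, not existing OpenClaw or Codex APIs.

```typescript
// Hypothetical inputs for deriving a cap from the active model budget.
interface ProjectionBudgetInput {
  contextWindowTokens: number;
  reservedInstructionTokens: number; // developer instructions / workspace bootstrap
  reservedPromptTokens: number; // current user prompt
  reservedToolSchemaTokens: number;
  reservedOutputTokens: number; // model output margin
  charsPerToken: number; // rough heuristic, an assumption (4 is a common estimate)
}

// Hypothetical stats record matching the fields proposed in this report.
interface ProjectionStats {
  assembledMessageCount: number;
  renderedCharsBeforeCap: number;
  renderedCharsAfterCap: number;
  truncated: boolean;
  droppedChars: number;
  approxDroppedTokens: number;
}

// Derive the rendered-context cap (in characters) from what remains after reservations.
function deriveRenderedContextCap(input: ProjectionBudgetInput): number {
  const reservedTokens =
    input.reservedInstructionTokens +
    input.reservedPromptTokens +
    input.reservedToolSchemaTokens +
    input.reservedOutputTokens;
  const availableTokens = Math.max(0, input.contextWindowTokens - reservedTokens);
  return Math.floor(availableTokens * input.charsPerToken);
}

// Render, cap, and report how much was dropped instead of truncating silently.
function projectWithStats(
  messages: string[],
  capChars: number,
  charsPerToken: number,
): { text: string; stats: ProjectionStats } {
  const rendered = messages.join("\n");
  const text = rendered.length > capChars ? rendered.slice(0, capChars) : rendered;
  const droppedChars = rendered.length - text.length;
  return {
    text,
    stats: {
      assembledMessageCount: messages.length,
      renderedCharsBeforeCap: rendered.length,
      renderedCharsAfterCap: text.length,
      truncated: droppedChars > 0,
      droppedChars,
      approxDroppedTokens: Math.ceil(droppedChars / charsPerToken),
    },
  };
}

// Example with the 258000-token window from this report; the reservation sizes are invented.
const exampleCapChars = deriveRenderedContextCap({
  contextWindowTokens: 258_000,
  reservedInstructionTokens: 20_000,
  reservedPromptTokens: 4_000,
  reservedToolSchemaTokens: 10_000,
  reservedOutputTokens: 16_000,
  charsPerToken: 4,
});
const exampleProjection = projectWithStats(["aaaaaaaaaa", "bbbbbbbbbb", "cccccccccc"], 25, 4);
```

With stats like these attached to the trajectory, /status could show the context-engine frontier, the projected amount, and provider-reported usage as three separate numbers.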