Codex context-engine projection caps LCM output to 24k chars, hiding full-fit context #80760

@100yenadmin

Summary

Native Codex app-server runs can silently expose only a small slice of context-engine output because projectContextEngineAssemblyForCodex() renders assembled context into a quoted prompt block capped at 24,000 chars.

This is a real context-delivery regression for context engines such as Lossless Claw/LCM after switching an agent from the Pi embedded route to the native Codex route. LCM can assemble a large, full-fit frontier, but Codex receives only the capped projection. The visible /status or UI percent then reflects the smaller Codex runtime prompt usage, which makes healthy LCM state look like it lost context or overcompacted.

Confirmed Evidence

Environment:

  • OpenClaw 2026.5.10-beta.5
  • runtime: native codex harness / openai-codex/gpt-5.5
  • session key: agent:main:main
  • context window denominator: 258000

Locally observed run after the beta switch:

  • LCM active conversation remained healthy:
    • conversation_id=1872
    • session_key=agent:main:main
    • 27811 messages
    • 14241378 raw message tokens
    • frontier: 247 context items, about 186764 tokens
  • LCM assembled a full-fit frontier before the turn:
    • contextItems=215
    • selectionMode=full-fit
    • estimatedTokens=179134
    • rawMessageCount=93
    • summaryCount=122
  • The same Codex runtime turn reported only:
    • input=590
    • output=6
    • cacheRead=76288
    • totalTokens=76884
    • visible status about 30%

There was no LCM compaction on that turn:

  • shouldCompact=false
  • reason: below-context-threshold-floor

So the observed drop was neither LCM DB loss nor overcompaction. It comes from the native Codex projection/accounting boundary.
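To make the gap concrete, here is a small arithmetic sketch using only the figures quoted above. The helper name `percentOfWindow` is made up for illustration, and the real /status calculation may count other prompt parts, so treat the results as rough.

```typescript
// Figures copied from the evidence above.
const contextWindow = 258_000;       // context window denominator
const lcmFrontierTokens = 186_764;   // LCM frontier, 247 context items
const lcmAssembledTokens = 179_134;  // LCM full-fit assembly (estimatedTokens)
const codexTotalTokens = 76_884;     // Codex runtime totalTokens for the same turn

// Hypothetical helper: share of the context window, rounded to one decimal place.
function percentOfWindow(tokens: number, window: number): number {
  return Math.round((tokens / window) * 1000) / 10;
}

const frontierPercent = percentOfWindow(lcmFrontierTokens, contextWindow);
const assembledPercent = percentOfWindow(lcmAssembledTokens, contextWindow);
const codexPercent = percentOfWindow(codexTotalTokens, contextWindow);

console.log({ frontierPercent, assembledPercent, codexPercent });
```

The Codex-side figure lands at the "about 30%" shown in status, while the engine-side estimates sit far above it even though no compaction ran.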

Code Path

The native Codex route assembles context-engine output here:

  • extensions/codex/src/app-server/run-attempt.ts
    • calls assembleHarnessContextEngine(...)
    • then calls projectContextEngineAssemblyForCodex(...)

The projection cap is here:

  • extensions/codex/src/app-server/context-engine-projection.ts
    • MAX_RENDERED_CONTEXT_CHARS = 24_000
    • MAX_TEXT_PART_CHARS = 6_000
    • truncateText(renderedContext, MAX_RENDERED_CONTEXT_CHARS)

This means a context engine may select hundreds of thousands of tokens, but native Codex sees only a 24k-character rendered block.
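A rough sketch of the effect, not the actual implementation: `truncateTextSketch` is a hypothetical stand-in for `truncateText` (the real helper may cut differently or append a marker), and the 4 chars/token ratio is an assumed heuristic, not a measured value.

```typescript
const MAX_RENDERED_CONTEXT_CHARS = 24_000; // value cited in this report

// Hypothetical stand-in for truncateText: plain prefix truncation.
function truncateTextSketch(text: string, maxChars: number): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

// Assumption: roughly 4 characters per token. Real ratios vary by content.
const ASSUMED_CHARS_PER_TOKEN = 4;

// Simulate a rendered block the size of the full-fit assembly from the evidence.
const assembledTokens = 179_134;
const rendered = "x".repeat(assembledTokens * ASSUMED_CHARS_PER_TOKEN);
const projected = truncateTextSketch(rendered, MAX_RENDERED_CONTEXT_CHARS);

// Fraction of the rendered context that survives the cap.
const keptFraction = projected.length / rendered.length;
console.log(`kept ${(keptFraction * 100).toFixed(1)}% of rendered context`);
```

Under those assumptions only a few percent of the assembled context would reach the model, and nothing in the output signals that the rest was dropped.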

By contrast, the Pi embedded route feeds assembled context-engine messages through the Pi session flow rather than this small Codex text projection:

  • src/agents/pi-embedded-runner/run/attempt.ts
    • calls assembleAttemptContextEngine(...)
    • applies assembled messages to the active Pi session

Reproduction Shape

  1. Use a context-engine plugin with a large existing frontier, such as Lossless Claw.
  2. Run the same long-lived session through native Codex app-server runtime.
  3. Observe LCM/context-engine assemble logs reporting a large full-fit selection.
  4. Observe Codex runtime usage/status reporting a much smaller prompt.
  5. Inspect extensions/codex/src/app-server/context-engine-projection.ts and confirm the 24k-character projection cap.

Expected Behavior

When a context engine assembles context within the model budget, native Codex should either:

  • pass a model-visible projection sized from the actual model/context budget, or
  • explicitly report that the host projected/truncated the context-engine output and how much was dropped.

The model-visible context should not be silently capped to 24k chars while the context engine believes it delivered a full-fit frontier.

Actual Behavior

The context engine assembles a large full-fit frontier, but native Codex receives a small rendered quoted-context block capped at 24k chars. Status then reports the smaller Codex runtime usage, making it appear as if context dropped from about 80% to about 30%.

Fix Direction

Recommended upstream fix:

  1. Replace the fixed 24_000 rendered-context cap with a budget derived from the active model/context budget.
  2. Reserve space for:
    • developer instructions / workspace bootstrap
    • current user prompt
    • tool schemas
    • model output margin
  3. Return and record projection stats:
    • assembled message count
    • rendered chars before cap
    • rendered chars after cap
    • truncation boolean
    • approximate dropped chars/tokens
  4. Include those stats in trajectory/status diagnostics so /status can distinguish:
    • context-engine frontier available
    • context-engine projected to model
    • provider-reported prompt/cache usage
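Steps 1 to 3 could look roughly like the sketch below. Every name here (`deriveMaxRenderedChars`, `projectWithStats`, `ProjectionStats`, the reserve fields) is hypothetical and not taken from the OpenClaw codebase, and the chars-per-token ratio is an assumed heuristic.

```typescript
// Hypothetical reserve buckets, mirroring the list in step 2.
interface ReservedTokens {
  instructions: number; // developer instructions / workspace bootstrap
  userPrompt: number;   // current user prompt
  toolSchemas: number;  // tool schemas
  outputMargin: number; // model output margin
}

// Hypothetical stats shape, mirroring the list in step 3.
interface ProjectionStats {
  assembledMessageCount: number;
  renderedCharsBeforeCap: number;
  renderedCharsAfterCap: number;
  truncated: boolean;
  approxDroppedChars: number;
  approxDroppedTokens: number;
}

// Derive the character cap from the model budget instead of a fixed constant.
function deriveMaxRenderedChars(
  contextWindowTokens: number,
  reserved: ReservedTokens,
  charsPerToken: number,
): number {
  const reservedTotal =
    reserved.instructions + reserved.userPrompt + reserved.toolSchemas + reserved.outputMargin;
  const availableTokens = Math.max(0, contextWindowTokens - reservedTotal);
  return Math.floor(availableTokens * charsPerToken);
}

// Render messages, apply the cap, and report what happened.
// Simplification: a prefix cut by UTF-16 code units; a real implementation
// would likely cut at message boundaries instead.
function projectWithStats(
  messages: string[],
  maxChars: number,
  charsPerToken: number,
): { text: string; stats: ProjectionStats } {
  const rendered = messages.join("\n");
  const text = rendered.length > maxChars ? rendered.slice(0, maxChars) : rendered;
  const droppedChars = rendered.length - text.length;
  return {
    text,
    stats: {
      assembledMessageCount: messages.length,
      renderedCharsBeforeCap: rendered.length,
      renderedCharsAfterCap: text.length,
      truncated: droppedChars > 0,
      approxDroppedChars: droppedChars,
      approxDroppedTokens: Math.ceil(droppedChars / charsPerToken),
    },
  };
}

// Example with the 258000-token window from the report and made-up reserves.
const exampleCap = deriveMaxRenderedChars(
  258_000,
  { instructions: 20_000, userPrompt: 2_000, toolSchemas: 8_000, outputMargin: 16_000 },
  4,
);
const exampleProjection = projectWithStats(["a".repeat(30), "b".repeat(30)], 40, 4);
console.log(exampleCap, exampleProjection.stats);
```

Returning the stats alongside the text lets the caller attach them to trajectory or status output without re-measuring the rendered block.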

Even a conservative first patch that makes the cap configurable and records truncation metadata would prevent this from being misdiagnosed as LCM data loss.
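A minimal sketch of that conservative variant, assuming a hypothetical `maxRenderedContextChars` option (no such setting is confirmed to exist) and a made-up diagnostic message format:

```typescript
const DEFAULT_MAX_RENDERED_CONTEXT_CHARS = 24_000; // current fixed value per this report

// Hypothetical option shape; the name is illustrative only.
interface ProjectionOptions {
  maxRenderedContextChars?: number;
}

// Use the configured cap when it is a positive finite number, else the default.
function resolveRenderedContextCap(options: ProjectionOptions = {}): number {
  const configured = options.maxRenderedContextChars;
  if (configured === undefined || !Number.isFinite(configured) || configured <= 0) {
    return DEFAULT_MAX_RENDERED_CONTEXT_CHARS;
  }
  return Math.floor(configured);
}

// Produce a diagnostic line only when the cap actually dropped content.
function describeTruncation(beforeChars: number, capChars: number): string | undefined {
  if (beforeChars <= capChars) return undefined;
  const dropped = beforeChars - capChars;
  return `context-engine projection truncated: kept ${capChars} of ${beforeChars} chars (${dropped} dropped)`;
}

console.log(describeTruncation(716_536, resolveRenderedContextCap()));
```

Keeping the default at the existing value would preserve current behavior for anyone who does not opt in, while the diagnostic line makes the truncation visible.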
