[feat]: thread currentTokenCount into ContextEngine.assemble#81079
[feat]: thread currentTokenCount into ContextEngine.assemble#81079DatPham-6996 wants to merge 6 commits into
Conversation
OpenClaw passes only the full model context window (tokenBudget) to ContextEngine.assemble, so engines that inject a systemPromptAddition have no way to tell how much of that window is already consumed before injecting. On long sessions this can push the prompt past the model's context limit and the runtime's preemptive-compaction does not catch it because it runs before assemble. Add an optional currentTokenCount field to the assemble params that carries the runtime's pre-assembly prompt token estimate (messages + system prompt + incoming prompt). Engines can subtract this from tokenBudget to bound their injection. The field is optional so engines that don't read it continue to work as before. Wiring: - runEmbeddedAttempt computes the estimate via estimatePrePromptTokens, the same helper preemptive-compaction already uses, so engines see the same number the runtime would. - installContextEngineLoopHook reads currentTokenCount from the existing getRuntimeContext() callback when available. - assembleHarnessContextEngine passes the field through when present and omits it when absent (so engines can detect older runtimes).
|
Codex review: needs real behavior proof before merge. Reviewed June 1, 2026, 4:52 AM ET / 08:52 UTC. Summary PR surface: Source +38, Tests +147, Docs +38. Total +223 across 12 files. Reproducibility: not applicable. as a strict bug reproduction: the PR is a capability addition, and the downstream overflow scenario has not been reproduced in this review or supplied as live proof. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review detailsBest possible solution: Land this only after maintainer API-shape approval, an updated SDK API baseline hash, and redacted live long-session or downstream plugin proof showing the field prevents oversized injection. Do we have a high-confidence way to reproduce the issue? Not applicable as a strict bug reproduction: the PR is a capability addition, and the downstream overflow scenario has not been reproduced in this review or supplied as live proof. Is this the best way to solve the issue? Unclear: threading an optional field is a plausible narrow API, but it is not yet proven as the best permanent contract until maintainers settle the SDK shape and the contributor supplies live downstream proof. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 07a425aa145f. Label changesLabel justifications:
Evidence reviewedPR surface: Source +38, Tests +147, Docs +38. Total +223 across 12 files. View PR surface stats
Acceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
ClawSweeper [P2] feedback: the Codex extension's
assembleHarnessContextEngine call at run-attempt.ts:624 was the only
production caller that did not compute currentTokenCount, so engines
running under the bundled Codex harness still saw the field as
undefined after the initial PR.
Re-export estimatePrePromptTokens from openclaw/plugin-sdk/agent-harness-runtime
so extensions can reuse the same estimator the PI runner uses, then
compute the pre-assembly estimate at the Codex call site from
{ historyMessages, developerInstructions, params.prompt } and pass it
through. Defining the semantics once (via the shared helper) and using
it from every assemble caller satisfies the maintainability concern
flagged in the review.
Docs: docs/concepts/context-engine.md now mentions currentTokenCount
in the assemble lifecycle accordion, includes it in the registration
example, and adds a "Sizing systemPromptAddition" subsection with a
guard pattern and a note that preemptive-compaction runs before
assemble so sizing is the engine's responsibility.
Test: extends run-attempt.context-engine.test.ts to assert the new
field is now passed on the Codex path. 118/118 tests pass on the
touched files.
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. |
|
Hi @jalehman could you please help me review this PR ? |
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
ClawSweeper [P2] follow-up: the embedded-runner callback that feeds installContextEngineLoopHook called buildAfterTurnRuntimeContext with no token estimate, so the loop-hook pass-through added in the previous commit always saw `loopCurrentTokenCount` as undefined in production. Compute the estimate via estimatePrePromptTokens against the loop messages and the run's systemPromptText (prompt is empty mid-loop — the continuation is driven by tool results already in messages) and pass it through buildAfterTurnRuntimeContext so engines now receive currentTokenCount on the tool-loop assemble path too. Test: extends attempt.spawn-workspace.context-engine.test.ts to capture the loop-hook params and assert getRuntimeContext now returns a positive currentTokenCount. 194/194 tests pass on the touched files. EOF )"
|
Addressed [P2] in The production loop-hook install site at
Test: new test in Tally: 194/194 tests pass on the touched files. PR description updated to reflect the new bullet, target test file, and counts. Real behavior proof from a live long-session run is still pending and will be posted before merge. @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
This pull request has been automatically marked as stale due to inactivity. |
|
ClawSweeper PR egg: 🎁 locked until real behavior proof passes. Details
|
Summary
ContextEngine.assemblereceives onlytokenBudget(the full model context window) and has no signal for how many of those tokens are already consumed bymessages + systemPrompt + prompt. Engines that prepend asystemPromptAdditionhave no way to size it against actual remaining headroom.preemptive-compaction(which has full visibility) runs before assemble, so it does not catch overflow caused by the engine's injection — the next LLM call simply fails with a context-limit error.currentTokenCount?: numberfield toContextEngine.assembleparams, plumbed it throughassembleHarnessContextEngine, and wired computation at every production caller: the PI runner main assemble (runEmbeddedAttempt), the PI runner tool-loop hook (installContextEngineLoopHookvia thegetRuntimeContextcallback), and the Codex harness (extensions/codex/src/app-server/run-attempt.ts).estimatePrePromptTokensis re-exported fromopenclaw/plugin-sdk/agent-harness-runtimeso extension harnesses use the same estimator the PI runner uses — defining the semantics once. Docs updated.tokenBudget.preemptive-compactionlogic. No public type was renamed or removed. No new dependency was added.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Real behavior proof (required for external PRs)
systemPromptAdditionagainst the model's remaining headroom becauseassemblereceives onlytokenBudget, notcurrentTokenCount. After this change, engines can computeremaining = tokenBudget - currentTokenCount - <reserve>and skip injection when the curated content would not fit.pnpm exec vitest runon the touched test files. Not yet tested against a live model on a long session — verification so far is at the unit/integration-test layer. A follow-up real-session repro on a Gemini 1M-context channel where pre-this-PR overflow was observed is planned before merge.runEmbeddedAttemptcomputespreassemblyCurrentTokenCountviaestimatePrePromptTokensfrom{ activeSession.messages, systemPromptText, params.prompt }and passes it throughassembleAttemptContextEngine.installContextEngineLoopHookreadscurrentTokenCountfrom the existinggetRuntimeContext()callback. The production install site atattempt.ts:1996now computes the estimate from the loop messages andsystemPromptText(with emptyprompt— the continuation is driven by tool results already inmessages), so engines see a real value on this path too.extensions/codex/src/app-server/run-attempt.ts:624computes the same estimate from{ historyMessages, developerInstructions, params.prompt }and passes it throughassembleHarnessContextEngine.Root Cause (if applicable)
N/A — this is a capability addition, not a bug fix in OpenClaw. The existing behavior (passing only
tokenBudget) is by design.systemPromptAdditiondiscovered they had no way to size injection safely. TheContextEngineRuntimeContexttype already definedcurrentTokenCountfor other lifecycle hooks (afterTurn,maintain,compact) —assemblewas the only one that didn't receive it.Regression Test Plan (if applicable)
src/agents/harness/context-engine-lifecycle.test.ts— wrapper pass-through (present + absent)src/agents/pi-embedded-runner/tool-result-context-guard.test.ts— tool-loop hook propagation (from runtimeContext, missing-from-context, no-callback)src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-injection.test.ts— attempt-level alias plumbs the fieldsrc/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts— captures loop-hook params and assertsgetRuntimeContext()returns a positivecurrentTokenCountin productionextensions/codex/src/app-server/run-attempt.context-engine.test.ts— Codex harness path now asserts a positivecurrentTokenCountreaches the enginecurrentTokenCountreachesengine.assemblefrom every production harness path; field is omitted (not undefined-valued) when no estimate is available, so engines can detect older runtimes.preemptive-compaction.test.tsalready covers the underlyingestimatePrePromptTokenshelper that produces the value.User-visible / Behavior Changes
None for OpenClaw end users. Plugin authors implementing
ContextEngineget a new optionalcurrentTokenCount?: numberparam onassemble. Plugins that do not read it see unchanged behavior. Extension authors gain a new re-exportestimatePrePromptTokensonopenclaw/plugin-sdk/agent-harness-runtimefor use when wiring custom harnesses.Diagram (if applicable)
Security Impact (required)
Yes, explain risk + mitigation: N/A.Repro + Verification
Environment
ContextEngine)tsconfig.json+ workspacepnpm-workspace.yamlSteps
git checkout feat/context-engine-current-token-countpnpm installpnpm exec vitest run src/agents/harness/context-engine-lifecycle.test.ts src/agents/pi-embedded-runner/tool-result-context-guard.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-injection.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts extensions/codex/src/app-server/run-attempt.context-engine.test.tsExpected
git diff --stat origin/main..HEADshows only the touched source, test, and docs files.Actual
Evidence
estimatePrePromptTokensis now invoked at each main assemble call site (PI main, Codex) and once per tool-loop iteration (PI loop hook). The helper already runs at the preemptive-compaction precheck on the same turn (PI main path), so the marginal cost there is one extra encode ofmessages + systemPrompt + prompt. Loop-hook iterations add one encode per iteration. Not measured separately.Human Verification (required)
assembleHarnessContextEnginepassescurrentTokenCountto the engine when supplied.assembleHarnessContextEngineomits the field when caller does not supply it (so engines can detect older runtimes viaparams.currentTokenCount === undefined).installContextEngineLoopHookreadscurrentTokenCountfromgetRuntimeContext()when the callback returns it; omits otherwise; omits when no callback is wired.attempt.ts:1996computescurrentTokenCountand feeds it into thegetRuntimeContextruntime context (asserted by capturing the callback inattempt.spawn-workspace.context-engine.test.ts).assembleAttemptContextEnginere-export still plumbs the field.extensions/codex/src/app-server/run-attempt.ts:624computes a positivecurrentTokenCountand passes it through (asserted inrun-attempt.context-engine.test.ts).getRuntimeContextcallback in the loop hook → field omitted.getRuntimeContextreturns an object withoutcurrentTokenCount→ field omitted.currentTokenCount→ unchanged behavior.estimatePrePromptTokenscalls under stress (large transcripts, fast tool loops).Review Conversations
Compatibility / Migration
params.currentTokenCountand gate behavior on whether it is present.Risks and Mitigations
estimatePrePromptTokenscall at each main assemble call site adds one extra token-estimation pass per turn, plus one per tool-loop iteration on the PI loop-hook path.undefinedor0when not supplied (see the...(typeof x === "number" ? { x } : {})spread idiom in the wrappers). Engines that follow the documented contract (params.currentTokenCount === undefined⇒ no headroom info available) behave correctly.estimatePrePromptTokensfromopenclaw/plugin-sdk/agent-harness-runtimewidens the public extension API by one function and commits us to maintaining its signature.