[feat]: thread currentTokenCount into ContextEngine.assemble#81079

Open
DatPham-6996 wants to merge 6 commits into openclaw:main from DatPham-6996:feat/context-engine-current-token-count


Conversation

DatPham-6996 commented May 12, 2026

Summary

  • Problem: ContextEngine.assemble receives only tokenBudget (the full model context window) and has no signal for how many of those tokens are already consumed by messages + systemPrompt + prompt. Engines that prepend a systemPromptAddition have no way to size it against actual remaining headroom.
  • Why it matters: On long sessions an engine's injection can push the prompt past the model's context limit. The runtime's preemptive-compaction (which has full visibility) runs before assemble, so it does not catch overflow caused by the engine's injection — the next LLM call simply fails with a context-limit error.
  • What changed: Added an optional currentTokenCount?: number field to ContextEngine.assemble params, plumbed it through assembleHarnessContextEngine, and wired computation at every production caller: the PI runner main assemble (runEmbeddedAttempt), the PI runner tool-loop hook (installContextEngineLoopHook via the getRuntimeContext callback), and the Codex harness (extensions/codex/src/app-server/run-attempt.ts). estimatePrePromptTokens is re-exported from openclaw/plugin-sdk/agent-harness-runtime so extension harnesses use the same estimator the PI runner uses — defining the semantics once. Docs updated.
  • What did NOT change (scope boundary): Behavior of engines that don't read the new field (it's optional). Semantics or value of tokenBudget. preemptive-compaction logic. No public type was renamed or removed. No new dependency was added.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

No upstream OpenClaw issue. Motivation comes from a downstream context-engine plugin (@byterover/byterover) that hit a real overflow on long sessions and could not implement a precise guard without this field.

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: Context-engine plugins cannot precisely guard their systemPromptAddition against the model's remaining headroom because assemble receives only tokenBudget, not currentTokenCount. After this change, engines can compute remaining = tokenBudget - currentTokenCount - <reserve> and skip injection when the curated content would not fit.
  • Real environment tested: macOS (Darwin 25.2.0), local pnpm workspace, pnpm exec vitest run on the touched test files. Not yet tested against a live model on a long session — verification so far is at the unit/integration-test layer. A follow-up real-session repro on a Gemini 1M-context channel where pre-this-PR overflow was observed is planned before merge.
  • Exact steps or command run after this patch:
cd ~/dpmemories/openclawdp/openclaw
pnpm install
pnpm exec vitest run \
  src/agents/harness/context-engine-lifecycle.test.ts \
  src/agents/pi-embedded-runner/tool-result-context-guard.test.ts \
  src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-injection.test.ts \
  src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts \
  extensions/codex/src/app-server/run-attempt.context-engine.test.ts
  • Evidence after fix:
RUN  v4.1.6 /Users/datpham/dpmemories/openclawdp/openclaw

Test Files  9 passed (9)
     Tests  194 passed (194)
  Duration  19.96s
  • Observed result after fix: New field is plumbed end-to-end across every production assemble path.
    • PI runner main assemble: runEmbeddedAttempt computes preassemblyCurrentTokenCount via estimatePrePromptTokens from { activeSession.messages, systemPromptText, params.prompt } and passes it through assembleAttemptContextEngine.
    • PI runner tool-loop hook: installContextEngineLoopHook reads currentTokenCount from the existing getRuntimeContext() callback. The production install site at attempt.ts:1996 now computes the estimate from the loop messages and systemPromptText (with empty prompt — the continuation is driven by tool results already in messages), so engines see a real value on this path too.
    • Codex harness: extensions/codex/src/app-server/run-attempt.ts:624 computes the same estimate from { historyMessages, developerInstructions, params.prompt } and passes it through assembleHarnessContextEngine.
  • What was not tested: A live overflow scenario with a real LLM provider. The downstream plugin's consumption of this field (which is where the real-world payoff happens) is tracked in a separate plugin PR.
  • Before evidence (optional but encouraged): N/A — this PR adds new capability rather than fixing a behavior already present in OpenClaw.

Root Cause (if applicable)

N/A — this is a capability addition, not a bug fix in OpenClaw. The existing behavior (passing only tokenBudget) is by design.

  • Root cause: N/A
  • Missing detection / guardrail: N/A
  • Contributing context (if known): Plugin authors building context engines that inject systemPromptAddition discovered they had no way to size injection safely. The ContextEngineRuntimeContext type already defined currentTokenCount for other lifecycle hooks (afterTurn, maintain, compact) — assemble was the only one that didn't receive it.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/agents/harness/context-engine-lifecycle.test.ts — wrapper pass-through (present + absent)
    • src/agents/pi-embedded-runner/tool-result-context-guard.test.ts — tool-loop hook propagation (from runtimeContext, missing-from-context, no-callback)
    • src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-injection.test.ts — attempt-level alias plumbs the field
    • src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts — captures loop-hook params and asserts getRuntimeContext() returns a positive currentTokenCount in production
    • extensions/codex/src/app-server/run-attempt.context-engine.test.ts — Codex harness path now asserts a positive currentTokenCount reaches the engine
  • Scenario the test should lock in: currentTokenCount reaches engine.assemble from every production harness path; field is omitted (not undefined-valued) when no estimate is available, so engines can detect older runtimes.
  • Why this is the smallest reliable guardrail: Each wiring layer (wrapper, PI main computation site, PI loop-hook computation site, Codex computation site, attempt-level re-export) has its own test. Refactoring any one of them without preserving pass-through will fail an existing test.
  • Existing test that already covers this (if any): preemptive-compaction.test.ts already covers the underlying estimatePrePromptTokens helper that produces the value.
  • If no new test is added, why not: N/A (9 new tests / assertions added across 5 test files).

User-visible / Behavior Changes

None for OpenClaw end users. Plugin authors implementing ContextEngine get a new optional currentTokenCount?: number param on assemble. Plugins that do not read it see unchanged behavior. Extension authors gain a new re-export estimatePrePromptTokens on openclaw/plugin-sdk/agent-harness-runtime for use when wiring custom harnesses.

Diagram (if applicable)

Before:
runEmbeddedAttempt          (PI harness — main + loop-hook)
  └─ contextEngine.assemble({ tokenBudget, messages, prompt, ... })

extensions/codex run-attempt (Codex harness)
  └─ contextEngine.assemble({ tokenBudget, messages, prompt, ... })

  (engines have no view of pre-assembly consumed tokens)

After:
runEmbeddedAttempt          (PI harness)
  ├─ main assemble
  │   ├─ estimatePrePromptTokens(messages + systemPrompt + prompt) → N
  │   └─ contextEngine.assemble({ tokenBudget, currentTokenCount: N, ... })
  └─ tool-loop hook (getRuntimeContext callback at install site)
      ├─ estimatePrePromptTokens(loopMessages + systemPrompt + "") → N
      └─ contextEngine.assemble({ tokenBudget, currentTokenCount: N, ... })

extensions/codex run-attempt (Codex harness)
  ├─ estimatePrePromptTokens(historyMessages + developerInstructions + prompt) → N
  └─ contextEngine.assemble({ tokenBudget, currentTokenCount: N, ... })

  (engines can compute remaining = tokenBudget - currentTokenCount - reserve)
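
The "After" flow can be sketched as follows. This is a minimal illustration, not the runtime's code: `estimateTokensStandIn` is a hypothetical stand-in that uses a rough four-characters-per-token heuristic in place of `estimatePrePromptTokens`, and the `Message` type is an assumed simplification. Only the `{ messages, systemPrompt, prompt }` input shape and the empty prompt on the tool-loop path come from the PR.

```typescript
// Hypothetical message shape; the real runtime's message type is richer.
type Message = { role: "user" | "assistant" | "tool"; content: string };

interface EstimateInput {
  messages: Message[];
  systemPrompt: string;
  prompt: string;
}

// Stand-in for estimatePrePromptTokens: a rough ~4 characters-per-token
// heuristic, NOT the runtime's actual estimator.
function estimateTokensStandIn({ messages, systemPrompt, prompt }: EstimateInput): number {
  const chars =
    messages.reduce((sum, m) => sum + m.content.length, 0) +
    systemPrompt.length +
    prompt.length;
  return Math.ceil(chars / 4);
}

const systemPromptText = "You are a helpful agent.";
const history: Message[] = [
  { role: "user", content: "Summarize the repo." },
  { role: "assistant", content: "Reading files..." },
];

// Main assemble: the incoming user prompt is part of the estimate.
const mainCount = estimateTokensStandIn({
  messages: history,
  systemPrompt: systemPromptText,
  prompt: "Now list the open TODOs.",
});

// Tool-loop hook: there is no new user prompt, because the tool results
// that drive the continuation are already appended to messages.
const loopMessages: Message[] = [...history, { role: "tool", content: "TODO: fix parser" }];
const loopCount = estimateTokensStandIn({
  messages: loopMessages,
  systemPrompt: systemPromptText,
  prompt: "",
});

console.log({ mainCount, loopCount });
```

Both values would then be passed as `currentTokenCount` alongside `tokenBudget` in the corresponding `assemble` call.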

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A.

Repro + Verification

Environment

  • OS: macOS Darwin 25.2.0
  • Runtime/container: pnpm 11.x workspace, Node 22, vitest 4.1.6
  • Model/provider: N/A (tests use mocked ContextEngine)
  • Integration/channel (if any): N/A
  • Relevant config (redacted): default tsconfig.json + workspace pnpm-workspace.yaml

Steps

  1. git checkout feat/context-engine-current-token-count
  2. pnpm install
  3. pnpm exec vitest run src/agents/harness/context-engine-lifecycle.test.ts src/agents/pi-embedded-runner/tool-result-context-guard.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-injection.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts extensions/codex/src/app-server/run-attempt.context-engine.test.ts

Expected

  • All 194 tests across the 9 affected test files pass.
  • git diff --stat origin/main..HEAD shows only the touched source, test, and docs files.

Actual

  • 194/194 tests pass (see Evidence above).

Evidence

  • Failing test/log before + passing after — not applicable; this is a feature addition, no prior failing test existed.
  • Trace/log snippets — vitest summary above.
  • Screenshot/recording
  • Perf numbers (if relevant) — estimatePrePromptTokens is now invoked at each main assemble call site (PI main, Codex) and once per tool-loop iteration (PI loop hook). The helper already runs at the preemptive-compaction precheck on the same turn (PI main path), so the marginal cost there is one extra encode of messages + systemPrompt + prompt. Loop-hook iterations add one encode per iteration. Not measured separately.

Human Verification (required)

  • Verified scenarios:
    • assembleHarnessContextEngine passes currentTokenCount to the engine when supplied.
    • assembleHarnessContextEngine omits the field when caller does not supply it (so engines can detect older runtimes via params.currentTokenCount === undefined).
    • installContextEngineLoopHook reads currentTokenCount from getRuntimeContext() when the callback returns it; omits otherwise; omits when no callback is wired.
    • The production loop-hook install site at attempt.ts:1996 computes currentTokenCount and returns it in the runtime context produced by the getRuntimeContext callback (asserted by capturing the callback in attempt.spawn-workspace.context-engine.test.ts).
    • The attempt-level assembleAttemptContextEngine re-export still plumbs the field.
    • The Codex harness call at extensions/codex/src/app-server/run-attempt.ts:624 computes a positive currentTokenCount and passes it through (asserted in run-attempt.context-engine.test.ts).
  • Edge cases checked:
    • Missing getRuntimeContext callback in the loop hook → field omitted.
    • getRuntimeContext returns an object without currentTokenCount → field omitted.
    • Older callers that don't supply currentTokenCount → unchanged behavior.
  • What you did NOT verify:
    • End-to-end run against a real LLM provider where pre-this-PR overflow would have occurred. This is the most important real-world signal and will be performed before merge.
    • Performance impact of the additional estimatePrePromptTokens calls under stress (large transcripts, fast tool loops).

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A. The new field is optional. Existing context engines compile and run unchanged. Engines that want to opt in simply read params.currentTokenCount and gate behavior on whether it is present.
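
An engine that opts in might gate its injection roughly as sketched here. The interface is a trimmed, hypothetical view of the assemble params (only `tokenBudget` and `currentTokenCount` are modeled), and `RESERVE_TOKENS`, `sizeSystemPromptAddition`, and the `additionTokens` argument are illustrative names, not part of OpenClaw. The choice to inject unconditionally when the field is absent is one possible policy for older runtimes, not a documented requirement.

```typescript
// Hypothetical, trimmed-down view of the assemble params; only the two
// fields discussed in this PR are modeled.
interface AssembleParamsSketch {
  tokenBudget: number;
  currentTokenCount?: number;
}

// Assumed safety margin for estimator error and the model's reply.
// The value is illustrative; each engine would tune its own.
const RESERVE_TOKENS = 2_000;

// Returns the addition when it fits in the remaining headroom, otherwise
// undefined. additionTokens is the engine's own size estimate for the addition.
function sizeSystemPromptAddition(
  params: AssembleParamsSketch,
  addition: string,
  additionTokens: number,
): string | undefined {
  if (params.currentTokenCount === undefined) {
    // Older runtime: no headroom information is available. This sketch
    // keeps the pre-PR behavior and injects unconditionally.
    return addition;
  }
  const remaining = params.tokenBudget - params.currentTokenCount - RESERVE_TOKENS;
  return additionTokens <= remaining ? addition : undefined;
}

// 10,000 budget - 7,000 used - 2,000 reserve leaves 1,000 tokens of headroom.
const fits = sizeSystemPromptAddition({ tokenBudget: 10_000, currentTokenCount: 7_000 }, "notes", 500);
const tooBig = sizeSystemPromptAddition({ tokenBudget: 10_000, currentTokenCount: 7_000 }, "notes", 1_500);
console.log(fits, tooBig); // "notes" undefined
```

Returning `undefined` instead of a truncated string keeps the sketch simple; an engine could equally trim its curated content to fit `remaining`.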

Risks and Mitigations

  • Risk: The additional estimatePrePromptTokens call at each main assemble call site adds one extra token-estimation pass per turn, plus one per tool-loop iteration on the PI loop-hook path.
    • Mitigation: Uses the same helper the runtime already invokes at the preemptive-compaction precheck on the same turn (PI main path), so the marginal cost there is small and amortized. Loop-hook iterations add one encode per iteration; if this proves measurably costly under sustained tool loops, the result can be cached or computed lazily.
  • Risk: Engines that misread the new field (e.g. treat absent as zero) could over-restrict their injection on older runtimes that don't pass the field.
    • Mitigation: Field is explicitly omitted rather than set to undefined or 0 when not supplied (see the ...(typeof x === "number" ? { x } : {}) spread idiom in the wrappers). Engines that follow the documented contract (params.currentTokenCount === undefined ⇒ no headroom info available) behave correctly.
  • Risk: Re-exporting estimatePrePromptTokens from openclaw/plugin-sdk/agent-harness-runtime widens the public extension API by one function and commits us to maintaining its signature.
    • Mitigation: The function is small, pure, and stable — it already exists at its source location and is used internally. Re-exporting just makes the same shape available to extension harnesses. If the internal signature ever needs to change, the barrel re-export can be wrapped at that point.
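
The omit-rather-than-undefined contract mentioned in the second mitigation can be illustrated with a small sketch. `buildAssembleParams` and `AssembleParamsSketch` are hypothetical names introduced for this example; only the conditional-spread idiom itself is taken from the PR description.

```typescript
// Hypothetical, trimmed-down params type for illustration only.
interface AssembleParamsSketch {
  tokenBudget: number;
  currentTokenCount?: number;
}

// Builds params the way the wrappers are described: the key is added only
// when a numeric estimate exists, so it is never present with an
// undefined (or placeholder 0) value.
function buildAssembleParams(
  tokenBudget: number,
  currentTokenCount?: number,
): AssembleParamsSketch {
  return {
    tokenBudget,
    ...(typeof currentTokenCount === "number" ? { currentTokenCount } : {}),
  };
}

const withEstimate = buildAssembleParams(200_000, 150_000);
const withoutEstimate = buildAssembleParams(200_000);

console.log("currentTokenCount" in withEstimate); // true
console.log("currentTokenCount" in withoutEstimate); // false
console.log(withoutEstimate.currentTokenCount === undefined); // true
```

Because the key is absent, both an `in` check and a comparison against `undefined` detect a runtime that supplies no estimate, and a legitimate estimate of `0` stays distinguishable from "no information".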

OpenClaw passes only the full model context window (tokenBudget) to
ContextEngine.assemble, so engines that inject a systemPromptAddition
have no way to tell how much of that window is already consumed before
injecting. On long sessions this can push the prompt past the model's
context limit and the runtime's preemptive-compaction does not catch
it because it runs before assemble.

Add an optional currentTokenCount field to the assemble params that
carries the runtime's pre-assembly prompt token estimate (messages +
system prompt + incoming prompt). Engines can subtract this from
tokenBudget to bound their injection. The field is optional so engines
that don't read it continue to work as before.

Wiring:
- runEmbeddedAttempt computes the estimate via estimatePrePromptTokens,
  the same helper preemptive-compaction already uses, so engines see
  the same number the runtime would.
- installContextEngineLoopHook reads currentTokenCount from the
  existing getRuntimeContext() callback when available.
- assembleHarnessContextEngine passes the field through when present
  and omits it when absent (so engines can detect older runtimes).
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S proof: supplied External PR includes structured after-fix real behavior proof. labels May 12, 2026
@clawsweeper

clawsweeper Bot commented May 12, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed June 1, 2026, 4:52 AM ET / 08:52 UTC.

Summary
The PR adds optional currentTokenCount plumbing to ContextEngine.assemble across embedded and Codex harness paths, re-exports estimatePrePromptTokens from the agent-harness runtime SDK surface, and updates context-engine docs/tests.

PR surface: Source +38, Tests +147, Docs +38. Total +223 across 12 files.

Reproducibility: not applicable as a strict bug reproduction. The PR is a capability addition, and the downstream overflow scenario has not been reproduced in this review or supplied as live proof.

Review metrics: 1 noteworthy metric.

  • Public API surface: 1 assemble parameter added, 1 SDK export added. Both changes affect third-party context-engine/plugin authors and need contract, baseline, and upgrade review before merge.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🦐 gold shrimp
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Post redacted live long-session or downstream context-engine plugin proof showing currentTokenCount prevents an oversized systemPromptAddition.
  • Run pnpm plugin-sdk:api:gen and commit the updated docs/.generated/plugin-sdk-api-baseline.sha256.
  • Get maintainer confirmation that this is the intended permanent context-engine assemble/SDK export shape.

Proof guidance:

  • [P1] Needs real behavior proof before merge: Only targeted Vitest output is supplied; before merge the contributor should post redacted live long-session or downstream plugin terminal output, logs, recording, or linked artifact, then update the PR body so ClawSweeper re-reviews automatically or ask a maintainer to comment @clawsweeper re-review.

Risk before merge

  • [P1] This extends a public context-engine/plugin SDK contract with a new assemble parameter and SDK export before maintainers confirm the permanent API shape.
  • [P1] The contributor's proof is targeted Vitest output only and the PR body explicitly says no live long-session/model or downstream context-engine plugin run has been tested yet.
  • [P1] The new public SDK export is missing the tracked Plugin SDK API baseline hash update, so SDK drift checks are expected to remain unsettled.

Maintainer options:

  1. Confirm and update the SDK contract (recommended)
    Have maintainers confirm that currentTokenCount on assemble and the estimatePrePromptTokens SDK export are the intended permanent API, then regenerate and commit the Plugin SDK API baseline hash before merge.
  2. Accept the additive contract intentionally
    Maintainers may accept the additive optional API as-is once the baseline and real proof are present, while owning the new SDK contract going forward.
  3. Pause for the broader assemble RFC
    If maintainers want the broader metadata design in the RFC "Expose existing InputProvenance / AgentInternalEvent signals to ContextEngine.assemble()" (#82137) to own this surface, pause this PR until that contract direction is settled.

Next step before merge

  • [P1] Human review is needed because the remaining blockers are public SDK contract direction and contributor-owned live proof, not just a mechanical code repair.

Security
Cleared: The diff only changes TypeScript contract plumbing, docs, and tests; no dependency, workflow, secret, network, or package-resolution surface change was found.

Review findings

  • [P2] Regenerate the Plugin SDK API baseline hash — src/plugin-sdk/agent-harness-runtime.ts:294
Review details

Best possible solution:

Land this only after maintainer API-shape approval, an updated SDK API baseline hash, and redacted live long-session or downstream plugin proof showing the field prevents oversized injection.

Do we have a high-confidence way to reproduce the issue?

Not applicable as a strict bug reproduction: the PR is a capability addition, and the downstream overflow scenario has not been reproduced in this review or supplied as live proof.

Is this the best way to solve the issue?

Unclear: threading an optional field is a plausible narrow API, but it is not yet proven as the best permanent contract until maintainers settle the SDK shape and the contributor supplies live downstream proof.

Full review comments:

  • [P2] Regenerate the Plugin SDK API baseline hash — src/plugin-sdk/agent-harness-runtime.ts:294
    Adding estimatePrePromptTokens to this public SDK export changes the generated Plugin SDK API surface, but the PR does not update docs/.generated/plugin-sdk-api-baseline.sha256. Run pnpm plugin-sdk:api:gen and commit the updated hash so the tracked baseline matches the new export.
    Confidence: 0.9

Overall correctness: patch is incorrect
Overall confidence: 0.86

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 07a425aa145f.

Label changes

Label justifications:

  • P2: This is a normal-priority plugin SDK/API improvement with limited direct end-user blast radius but real maintainer contract review needed.
  • merge-risk: 🚨 compatibility: Merging would publish a new context-engine assemble parameter and Plugin SDK export that third-party plugins may depend on.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦐 gold shrimp.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: Only targeted Vitest output is supplied; before merge the contributor should post redacted live long-session or downstream plugin terminal output, logs, recording, or linked artifact, then update the PR body so ClawSweeper re-reviews automatically or ask a maintainer to comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +38, Tests +147, Docs +38. Total +223 across 12 files.

PR surface stats:

Area       Files  Added  Removed  Net
Source     6      38     0        +38
Tests      5      147    0        +147
Docs       1      41     3        +38
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      12     226    3        +223

Acceptance criteria:

  • [P1] pnpm plugin-sdk:api:check.
  • [P1] node scripts/run-vitest.mjs src/agents/harness/context-engine-lifecycle.test.ts src/agents/embedded-agent-runner/tool-result-context-guard.test.ts src/agents/embedded-agent-runner/run/attempt.spawn-workspace.context-injection.test.ts src/agents/embedded-agent-runner/run/attempt.spawn-workspace.context-engine.test.ts extensions/codex/src/app-server/run-attempt.context-engine.test.ts.
  • [P1] Contributor-supplied redacted live long-session or downstream context-engine plugin proof.

What I checked:

  • Current main lacks assemble currentTokenCount: On current main, ContextEngine.assemble accepts sessionId, sessionKey, messages, tokenBudget, availableTools, citationsMode, model, and prompt, but not currentTokenCount. (src/context-engine/types.ts:311, 07a425aa145f)
  • PR adds a public SDK export without the tracked baseline hash: The live PR diff adds estimatePrePromptTokens to src/plugin-sdk/agent-harness-runtime.ts, while the PR file list does not include docs/.generated/plugin-sdk-api-baseline.sha256. (src/plugin-sdk/agent-harness-runtime.ts:294, b7aa56b269c3)
  • SDK policy requires API baseline alignment: The scoped Plugin SDK guide treats this directory as the public plugin/core contract and says public subpath changes must keep docs, entrypoints, package exports, and API baseline/export checks aligned. (src/plugin-sdk/AGENTS.md:81, 07a425aa145f)
  • Generated baseline hash is the tracked artifact: The generated docs README says plugin-sdk-api-baseline.sha256 is tracked and should be regenerated with pnpm plugin-sdk:api:gen when the Plugin SDK API changes. Public docs: docs/.generated/README.md. (docs/.generated/README.md:6, 07a425aa145f)
  • Codex protocol source inspected: Upstream Codex thread/start and thread/resume params carry developer_instructions through thread configuration, matching the OpenClaw Codex harness path reviewed here. (../codex/codex-rs/app-server-protocol/src/protocol/v2/thread.rs:95, f1b1b64005cd)
  • Owner history sample: Current shallow history/blame for the central context-engine and Codex harness files points at recent work by Vincent Koc, including current-main context-engine contract and recent Codex run-attempt changes. (extensions/codex/src/app-server/run-attempt.ts:720, 275caeb5f522)

Likely related people:

  • vincentkoc: Current shallow blame and recent log entries for ContextEngine types, harness lifecycle, embedded runner wiring, Plugin SDK runtime export surface, and Codex run-attempt all point to recent work by Vincent Koc; deeper provenance is limited by the available shallow checkout. (role: recent area contributor; confidence: medium; commits: 275caeb5f522, c429a3c4726f, f55ff8dd1b88; files: src/context-engine/types.ts, src/agents/harness/context-engine-lifecycle.ts, src/agents/embedded-agent-runner/run/attempt.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

ClawSweeper [P2] feedback: the Codex extension's
assembleHarnessContextEngine call at run-attempt.ts:624 was the only
production caller that did not compute currentTokenCount, so engines
running under the bundled Codex harness still saw the field as
undefined after the initial PR.

Re-export estimatePrePromptTokens from openclaw/plugin-sdk/agent-harness-runtime
so extensions can reuse the same estimator the PI runner uses, then
compute the pre-assembly estimate at the Codex call site from
{ historyMessages, developerInstructions, params.prompt } and pass it
through. Defining the semantics once (via the shared helper) and using
it from every assemble caller satisfies the maintainability concern
flagged in the review.

Docs: docs/concepts/context-engine.md now mentions currentTokenCount
in the assemble lifecycle accordion, includes it in the registration
example, and adds a "Sizing systemPromptAddition" subsection with a
guard pattern and a note that preemptive-compaction runs before
assemble so sizing is the engine's responsibility.

Test: extends run-attempt.context-engine.test.ts to assert the new
field is now passed on the Codex path. 118/118 tests pass on the
touched files.
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation extensions: codex labels May 12, 2026
@DatPham-6996

Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 12, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@DatPham-6996

Author

Hi @jalehman, could you please help review this PR?

@DatPham-6996

Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 12, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.


ClawSweeper [P2] follow-up: the embedded-runner callback that feeds
installContextEngineLoopHook called buildAfterTurnRuntimeContext with
no token estimate, so the loop-hook pass-through added in the previous
commit always saw `loopCurrentTokenCount` as undefined in production.
Compute the estimate via estimatePrePromptTokens against the loop
messages and the run's systemPromptText (prompt is empty mid-loop —
the continuation is driven by tool results already in messages) and
pass it through buildAfterTurnRuntimeContext so engines now receive
currentTokenCount on the tool-loop assemble path too.

Test: extends attempt.spawn-workspace.context-engine.test.ts to capture
the loop-hook params and assert getRuntimeContext now returns a
positive currentTokenCount. 194/194 tests pass on the touched files.
@DatPham-6996

Author

Addressed [P2] in 14d6dca26d.

The production loop-hook install site at src/agents/pi-embedded-runner/run/attempt.ts:1996 now computes currentTokenCount via estimatePrePromptTokens({ messages, systemPrompt: systemPromptText, prompt: "" }) and passes it through buildAfterTurnRuntimeContext, so the loop-hook pass-through added in the previous commit now receives a real estimate in production rather than undefined.

prompt: "" mid-loop because the continuation is driven by tool results that are already appended to messages — there's no new user prompt for those iterations.

Test: new test in src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts ("populates currentTokenCount in the loop-hook getRuntimeContext callback") captures the loop-hook params, invokes the captured getRuntimeContext with synthetic loop messages, and asserts the returned runtime context contains currentTokenCount as a positive number. This locks in the wiring at the install site so a regression there fails the test.

Tally: 194/194 tests pass on the touched files. PR description updated to reflect the new bullet, target test file, and counts.

Real behavior proof from a live long-session run is still pending and will be posted before merge.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 12, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.


@openclaw-barnacle


This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label May 27, 2026
@clawsweeper clawsweeper Bot added the rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. label May 27, 2026
@clawsweeper

clawsweeper Bot commented May 27, 2026

Contributor

ClawSweeper PR egg: 🎁 locked until real behavior proof passes.

Details
  • No creature or rarity is rolled until proof passes.
  • Eggs are collectible flavor only; they do not affect labels, ratings, merge decisions, or automation.

@openclaw-barnacle openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label May 28, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 28, 2026

Labels

agents Agent runtime and tooling docs Improvements or additions to documentation extensions: codex merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P2 Normal backlog priority with limited blast radius. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: S status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
