
fix(agents): parse prompt_tokens/completion_tokens in CLI usage for llama.cpp compatibility (#77992) #78085

Open
Beandon13 wants to merge 1 commit into openclaw:main from Beandon13:fix/openclaw-77992-llamacpp-usage-tokens

Conversation

@Beandon13

Contributor

Summary

  • toCliUsage() in cli-output.ts only recognized input_tokens/output_tokens (and camelCase aliases) from CLI runner output. llama.cpp and other OpenAI-compatible local providers return prompt_tokens/completion_tokens instead, which are the standard OpenAI field names.
  • Without the fallback, usage was silently dropped and context display showed ?/131k for all llama.cpp, Ollama, and similar OpenAI-compatible users.
  • Fix: in toCliUsage(), add prompt_tokens as a fallback for totalInput and completion_tokens as a fallback for output. Both parseCliJson and parseCliJsonl route through this function, so all CLI output parsing paths are covered.
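
The fallback order described above can be sketched as follows. This is a minimal illustration, not the PR's actual diff: `CliUsage`, `pickNumber`, and `toCliUsageSketch` are hypothetical names, and the handling of `total_tokens` is an assumption.

```typescript
// Hypothetical, simplified stand-in for toCliUsage(); not the actual source.
type CliUsage = {
  input: number | undefined;
  output: number | undefined;
  total: number | undefined;
};

// Return the value of the first key that holds a finite, non-negative number.
function pickNumber(raw: Record<string, unknown>, keys: string[]): number | undefined {
  for (const key of keys) {
    const value = raw[key];
    if (typeof value === "number" && Number.isFinite(value) && value >= 0) {
      return value;
    }
  }
  return undefined;
}

function toCliUsageSketch(raw: Record<string, unknown>): CliUsage | undefined {
  // input_tokens/inputTokens are tried first; prompt_tokens/promptTokens are the fallback.
  const input = pickNumber(raw, ["input_tokens", "inputTokens", "prompt_tokens", "promptTokens"]);
  // output_tokens/outputTokens are tried first; completion_tokens/completionTokens are the fallback.
  const output = pickNumber(raw, [
    "output_tokens",
    "outputTokens",
    "completion_tokens",
    "completionTokens",
  ]);
  // Assumption: total is read from total_tokens/totalTokens when present.
  const total = pickNumber(raw, ["total_tokens", "totalTokens"]);
  if (input === undefined && output === undefined && total === undefined) {
    return undefined; // nothing recognizable, so usage is dropped
  }
  return { input, output, total };
}

// llama.cpp-style usage now maps to input 1200 and output 85.
console.log(toCliUsageSketch({ prompt_tokens: 1200, completion_tokens: 85, total_tokens: 1285 }));
```

Because the original keys stay at the front of each list, a payload that carries both naming styles keeps resolving to the input_tokens/output_tokens values.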

Closes #77992

Testing

  • pnpm vitest run src/agents/cli-output.test.ts

Real behavior proof

  • Behavior: Context display shows ?/131k with llama.cpp after upgrading to 2026.5.4 — field name mismatch causes usage to be silently dropped
  • Tested via targeted unit test added in this PR that exercises the exact llama.cpp response shape (prompt_tokens, completion_tokens, total_tokens).
  • What was not tested: live runtime. Please apply maintainer proof: override, or advise on the evidence format.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XS triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 5, 2026
@clawsweeper

clawsweeper Bot commented May 5, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed May 31, 2026, 1:17 AM ET / 05:17 UTC.

Summary
Adds CLI usage parsing fallbacks for prompt_tokens/promptTokens and completion_tokens/completionTokens, plus a JSONL regression fixture for llama.cpp-style usage.

PR surface: Source +7, Tests +37. Total +44 across 2 files.

Reproducibility: yes, at source level. Feed the CLI JSONL parser a result event with usage.prompt_tokens and usage.completion_tokens; current main cannot populate usage.input or usage.output. No live llama.cpp run is attached.
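
A source-level reproduction along these lines might look like the sketch below. The JSONL event layout (including the `type` field), the key lists, and the helper names `firstNumber` and `readUsageFromJsonl` are assumptions for illustration; they are not the repository's parser.

```typescript
// Hypothetical JSONL output from a CLI runner; the event layout is an assumption.
const jsonl = [
  JSON.stringify({ type: "message", text: "hello" }),
  JSON.stringify({
    type: "result",
    usage: { prompt_tokens: 1200, completion_tokens: 85, total_tokens: 1285 },
  }),
].join("\n");

type KeySet = { input: string[]; output: string[] };
type ParsedUsage = { input: number | undefined; output: number | undefined };

// Keys recognized before the fix, as described in the review.
const KEYS_BEFORE: KeySet = {
  input: ["input_tokens", "inputTokens"],
  output: ["output_tokens", "outputTokens"],
};
// Keys recognized after the fix: the same lists plus the OpenAI-style aliases.
const KEYS_AFTER: KeySet = {
  input: [...KEYS_BEFORE.input, "prompt_tokens", "promptTokens"],
  output: [...KEYS_BEFORE.output, "completion_tokens", "completionTokens"],
};

function firstNumber(raw: Record<string, unknown>, keys: string[]): number | undefined {
  for (const key of keys) {
    const value = raw[key];
    if (typeof value === "number" && Number.isFinite(value)) return value;
  }
  return undefined;
}

// Scan each JSONL line and read usage from the last record that carries one.
function readUsageFromJsonl(text: string, keys: KeySet): ParsedUsage {
  let result: ParsedUsage = { input: undefined, output: undefined };
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    const record = JSON.parse(line) as { usage?: Record<string, unknown> };
    if (record.usage && typeof record.usage === "object") {
      result = {
        input: firstNumber(record.usage, keys.input),
        output: firstNumber(record.usage, keys.output),
      };
    }
  }
  return result;
}

console.log(readUsageFromJsonl(jsonl, KEYS_BEFORE)); // input and output are both undefined
console.log(readUsageFromJsonl(jsonl, KEYS_AFTER)); // input is 1200, output is 85
```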

Review metrics: none identified.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted after-fix proof from a real llama.cpp/OpenAI-compatible CLI run showing the context/status usage no longer renders as unknown.
  • Use the repo validation lane for the focused test, such as node scripts/run-vitest.mjs src/agents/cli-output.test.ts in a Codex worktree or pnpm test src/agents/cli-output.test.ts in a normal checkout.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body provides a unit-test fixture only and explicitly says live runtime was not tested; contributor should add redacted terminal/log/screenshot/live output from a llama.cpp-compatible run, then update the PR body to trigger a fresh review or ask for @clawsweeper re-review.

Risk before merge

  • [P1] The PR body provides only targeted unit-test proof and explicitly says live runtime was not tested, so maintainers still lack after-fix evidence from a real llama.cpp or OpenAI-compatible CLI run.

Maintainer options:

  1. Decide the mitigation before merge
    Land the alias fallback after contributor-supplied real llama.cpp/OpenAI-compatible CLI proof, or after a maintainer records an explicit proof override with equivalent local proof.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P1] The remaining blocker is contributor real-behavior proof or an explicit maintainer proof override, not an automated code repair.

Security
Cleared: The diff only changes CLI usage parsing and a focused unit test; it does not touch secrets, dependencies, workflows, package scripts, or other supply-chain surfaces.

Review details

Best possible solution:

Land the alias fallback after contributor-supplied real llama.cpp/OpenAI-compatible CLI proof, or after a maintainer records an explicit proof override with equivalent local proof.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: feed the CLI JSONL parser a result event with usage.prompt_tokens and usage.completion_tokens, and current main cannot populate usage.input or usage.output. No live llama.cpp run is attached.

Is this the best way to solve the issue?

Yes, the PR's narrow fallback in shared CLI usage parsing is a maintainable fix for the reported field-name mismatch. A future cleanup could route CLI parsing through normalizeUsage(), but that is not required for this XS repair.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 51dee73a5d3e.

Label changes

Label changes:

  • add P2: This is a normal-priority regression fix for local OpenAI-compatible provider usage display with limited blast radius.
  • add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body provides a unit-test fixture only and explicitly says live runtime was not tested; contributor should add redacted terminal/log/screenshot/live output from a llama.cpp-compatible run, then update the PR body to trigger a fresh review or ask for @clawsweeper re-review.
  • remove rating: 🌊 off-meta tidepool: Current PR rating is rating: 🦪 silver shellfish, so this older rating label is no longer current.

Label justifications:

  • P2: This is a normal-priority regression fix for local OpenAI-compatible provider usage display with limited blast radius.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body provides a unit-test fixture only and explicitly says live runtime was not tested; contributor should add redacted terminal/log/screenshot/live output from a llama.cpp-compatible run, then update the PR body to trigger a fresh review or ask for @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +7, Tests +37. Total +44 across 2 files.

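| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 1 | 9 | 2 | +7 |
| Tests | 1 | 37 | 0 | +37 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 2 | 46 | 2 | +44 |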

What I checked:

  • Current main parser gap: toCliUsage() on current main only picks input_tokens/inputTokens for input and output_tokens/outputTokens for output, so OpenAI-compatible prompt_tokens/completion_tokens are not mapped by the CLI parser. (src/agents/cli-output.ts:147, 51dee73a5d3e)
  • CLI parsing path coverage: Both parseCliJson() and parseCliJsonl() route parsed records through readCliUsage(), so a fix in toCliUsage() covers the JSON and JSONL CLI output paths used by the runner. (src/agents/cli-output.ts:323, 51dee73a5d3e)
  • Context display depends on prompt-side usage: The CLI runner stores parsed usage as agentMeta.usage/lastCallUsage, and session storage derives totalTokens from prompt/input tokens rather than usage.total, so a payload with only total_tokens still cannot produce the context snapshot the report wants. (src/agents/command/session-store.ts:220, 51dee73a5d3e)
  • Existing usage contract: Current source and docs already treat prompt_tokens/completion_tokens as OpenAI-family usage fields, so accepting those names in CLI usage parsing is consistent with sibling usage normalization. (src/agents/usage.ts:53, 51dee73a5d3e)
  • PR implementation: The PR head adds the missing aliases in toCliUsage() and adds a focused JSONL fixture asserting llama.cpp-style prompt_tokens, completion_tokens, and total_tokens parse into CLI usage. (src/agents/cli-output.ts:147, 383946ff45ff)
  • Release/current-main check: No tag contains the PR head commit, and v2026.5.28 still shows the old totalInput/output lines without the prompt/completion aliases, so the useful change is not already shipped. (src/agents/cli-output.ts:147, e93216080aa1)
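
The dependency on prompt-side usage noted in the checks above can be illustrated with a small sketch. Everything here is hypothetical: `UsageSnapshot`, `deriveContextTokens`, `formatTokens`, and `formatContext` are illustrative names, and the rounding used for the display string is an assumption rather than the project's actual formatter.

```typescript
type UsageSnapshot = {
  input: number | undefined;
  output: number | undefined;
  total: number | undefined;
};

// Assumption: context occupancy is derived from prompt/input tokens only,
// so usage.total alone is not enough to fill the display.
function deriveContextTokens(usage: UsageSnapshot | undefined): number | undefined {
  return usage?.input;
}

// Hypothetical formatter: 131072 becomes "131k", 850 stays "850".
function formatTokens(n: number): string {
  return n >= 1000 ? `${Math.round(n / 1000)}k` : String(n);
}

function formatContext(usage: UsageSnapshot | undefined, contextWindow: number): string {
  const used = deriveContextTokens(usage);
  const usedLabel = used === undefined ? "?" : formatTokens(used);
  return `${usedLabel}/${formatTokens(contextWindow)}`;
}

// Only total is known (prompt_tokens was not mapped): the used side stays unknown.
console.log(formatContext({ input: undefined, output: undefined, total: 12085 }, 131072)); // "?/131k"
// Prompt-side tokens are mapped: the used side can be rendered.
console.log(formatContext({ input: 12000, output: 85, total: 12085 }, 131072)); // "12k/131k"
```

Under these assumptions, mapping total_tokens by itself would not change the display; the prompt-side alias is what makes the used count available.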

Likely related people:

  • @steipete: Blame on the current toCliUsage() helper points to 0be3ef5a38, and history shows 48ae976333 introduced the split CLI runner parser in the same files. (role: current parser/refactor owner; confidence: high; commits: 0be3ef5a383d, 48ae97633303, c39f061003f4; files: src/agents/cli-output.ts, src/agents/cli-output.test.ts)
  • @vincentkoc: c75f82448f added Gemini JSON response and stats parsing in the same CLI output parser/test files, including nearby usage normalization behavior. (role: recent CLI usage parser contributor; confidence: medium; commits: c75f82448fad; files: src/agents/cli-output.ts, src/agents/cli-output.test.ts)
  • @Lellansin: 2ccd1839f2 added real usage handling for OpenAI-compatible chat completions and tests around prompt_tokens/completion_tokens in the sibling usage path. (role: adjacent OpenAI-compatible usage contributor; confidence: medium; commits: 2ccd1839f212; files: src/agents/usage.ts, src/agents/usage.test.ts, src/gateway/openai-http.ts)
  • @Takhoffman: 079494aee5 recently reworked cached prompt-token accounting in src/agents/usage.ts, which is the sibling normalization path for the same field family. (role: adjacent usage normalization contributor; confidence: medium; commits: 079494aee559; files: src/agents/usage.ts, src/agents/usage.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@byungskers byungskers left a comment


This is a great compatibility fix for llama.cpp and other OpenAI-compatible local providers. The fallback chain for usage fields is well-structured — prioritizing the modern input/output fields first while gracefully falling back to prompt/completion_tokens. The test case with the detailed comment explaining the issue is especially helpful for future maintainers.

…t for llama.cpp (openclaw#77992)

llama.cpp and other OpenAI-compatible local providers return usage as
{ prompt_tokens, completion_tokens } instead of { input_tokens, output_tokens }.
The toCliUsage() function in cli-output.ts only accepted input_tokens /
output_tokens (and their camelCase aliases), so llama.cpp usage was silently
dropped and context display showed "?/131k" for all llama.cpp users.

Add prompt_tokens and completion_tokens as fallback keys for totalInput and
output respectively in toCliUsage(). Both parseCliJson and parseCliJsonl go
through this function, so the fix covers all CLI output parsing paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@openclaw-barnacle


This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label May 31, 2026
@clawsweeper clawsweeper Bot added the rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. label May 31, 2026
@barnacle-openclaw barnacle-openclaw Bot removed the stale Marked as stale due to inactivity label May 31, 2026
@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 31, 2026

Labels

agents Agent runtime and tooling P2 Normal backlog priority with limited blast radius. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: XS status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Context display shows ?/131k with llama.cpp after upgrading to 2026.5.4 — field name mismatch not resolved

2 participants