fix(agents): parse prompt_tokens/completion_tokens in CLI usage for llama.cpp compatibility (#77992) by Beandon13 · Pull Request #78085 · openclaw/openclaw

Beandon13 · 2026-05-05T21:50:26Z

Summary

toCliUsage() in cli-output.ts only recognized input_tokens/output_tokens (and camelCase aliases) from CLI runner output. llama.cpp and other OpenAI-compatible local providers return prompt_tokens/completion_tokens instead, which are the standard OpenAI field names.
Without the fallback, usage was silently dropped and context display showed ?/131k for all llama.cpp, Ollama, and similar OpenAI-compatible users.
Fix: in toCliUsage(), add prompt_tokens as a fallback for totalInput and completion_tokens as a fallback for output. Both parseCliJson and parseCliJsonl route through this function, so all CLI output parsing paths are covered.
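
A minimal sketch of the fallback order described above, not the actual code in cli-output.ts. The names CliUsageSketch, pickNumber, and toCliUsageSketch are hypothetical, and the handling of total_tokens is an assumption; the real helper may track more fields.

```typescript
// Hypothetical shape; the real usage type in cli-output.ts may carry more fields.
type CliUsageSketch = {
  input: number | undefined;
  output: number | undefined;
  total: number | undefined;
};

// Return the value of the first key that holds a finite number.
function pickNumber(raw: Record<string, unknown>, keys: string[]): number | undefined {
  for (const key of keys) {
    const value = raw[key];
    if (typeof value === "number" && Number.isFinite(value)) {
      return value;
    }
  }
  return undefined;
}

// Native names are tried first; the OpenAI-style names act only as fallbacks.
function toCliUsageSketch(raw: Record<string, unknown>): CliUsageSketch | undefined {
  const input = pickNumber(raw, ["input_tokens", "inputTokens", "prompt_tokens", "promptTokens"]);
  const output = pickNumber(raw, [
    "output_tokens",
    "outputTokens",
    "completion_tokens",
    "completionTokens",
  ]);
  const total = pickNumber(raw, ["total_tokens", "totalTokens"]); // assumed aliases
  if (input === undefined && output === undefined && total === undefined) {
    return undefined; // nothing usable, so no usage is reported
  }
  return { input, output, total };
}

// llama.cpp-style payload: only the OpenAI field names are present.
const llamaCppUsage = toCliUsageSketch({
  prompt_tokens: 1200,
  completion_tokens: 85,
  total_tokens: 1285,
});
// Mixed payload: input_tokens wins over prompt_tokens.
const nativeUsage = toCliUsageSketch({ input_tokens: 10, output_tokens: 2, prompt_tokens: 999 });
console.log(llamaCppUsage, nativeUsage);
```

Keeping the native names first in each list means providers that already report input_tokens/output_tokens see no change in behavior.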

Closes #77992

Testing

pnpm vitest run src/agents/cli-output.test.ts

Real behavior proof

Behavior: Context display shows ?/131k with llama.cpp after upgrading to 2026.5.4 — field name mismatch causes usage to be silently dropped
Tested via targeted unit test added in this PR that exercises the exact llama.cpp response shape (prompt_tokens, completion_tokens, total_tokens).
What was not tested: live runtime. Please apply a maintainer proof override or advise on the evidence format.

clawsweeper · 2026-05-05T21:53:26Z

Codex review: needs real behavior proof before merge. Reviewed May 31, 2026, 1:17 AM ET / 05:17 UTC.

Summary
Adds CLI usage parsing fallbacks for prompt_tokens/promptTokens and completion_tokens/completionTokens, plus a JSONL regression fixture for llama.cpp-style usage.

PR surface: Source +7, Tests +37. Total +44 across 2 files.

Reproducibility: yes, at source level. Feed the CLI JSONL parser a result event with usage.prompt_tokens and usage.completion_tokens; current main cannot populate usage.input or usage.output. No live llama.cpp run is attached.
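
The source-level reproduction can be sketched roughly as follows. This is an illustration under assumptions, not the repository's parser: the event shape (type: "result" with a usage object), the key lists, and the function names are all hypothetical.

```typescript
// Hypothetical key lists: "before" mirrors what the review says current main accepts.
const INPUT_KEYS_BEFORE = ["input_tokens", "inputTokens"];
const INPUT_KEYS_AFTER = [...INPUT_KEYS_BEFORE, "prompt_tokens", "promptTokens"];

// Return the first numeric value found under any of the given keys.
function firstNumber(obj: Record<string, unknown>, keys: string[]): number | undefined {
  for (const key of keys) {
    const value = obj[key];
    if (typeof value === "number") return value;
  }
  return undefined;
}

// Scan JSONL output and keep the input count from the last event that has one.
function readInputFromJsonl(jsonl: string, keys: string[]): number | undefined {
  let found: number | undefined;
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let event: unknown;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (typeof event !== "object" || event === null) continue;
    const usage = (event as Record<string, unknown>)["usage"];
    if (typeof usage === "object" && usage !== null) {
      found = firstNumber(usage as Record<string, unknown>, keys) ?? found;
    }
  }
  return found;
}

// Assumed llama.cpp-style stream: a text event followed by a result event.
const sampleJsonl = [
  JSON.stringify({ type: "text", text: "hello" }),
  JSON.stringify({
    type: "result",
    usage: { prompt_tokens: 1200, completion_tokens: 85, total_tokens: 1285 },
  }),
].join("\n");

const before = readInputFromJsonl(sampleJsonl, INPUT_KEYS_BEFORE); // undefined
const after = readInputFromJsonl(sampleJsonl, INPUT_KEYS_AFTER); // 1200
console.log({ before, after });
```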

Review metrics: none identified.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
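
As a rough illustration of the "weaker of the two" rule, assuming the ranks are ordered as in the legend later in this comment. The RANKS array and overallRank function are hypothetical names, and the off-meta tidepool rating is left out because it marks items the rating does not apply to.

```typescript
// Ranks ordered weakest to strongest, following the review's legend.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;
type Rank = (typeof RANKS)[number];

// Overall readiness is capped by whichever input ranks lower.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

// This PR: thin proof caps an otherwise good patch.
const overall = overallRank("silver shellfish", "platinum hermit");
console.log(overall); // "silver shellfish"
```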

Rank-up moves:

[P1] Add redacted after-fix proof from a real llama.cpp/OpenAI-compatible CLI run showing the context/status usage no longer renders as unknown.
Use the repo validation lane for the focused test, such as node scripts/run-vitest.mjs src/agents/cli-output.test.ts in a Codex worktree or pnpm test src/agents/cli-output.test.ts in a normal checkout.

Proof guidance:

[P1] Needs real behavior proof before merge: The PR body provides a unit-test fixture only and explicitly says live runtime was not tested; contributor should add redacted terminal/log/screenshot/live output from a llama.cpp-compatible run, then update the PR body to trigger a fresh review or ask for @clawsweeper re-review.

Risk before merge

[P1] The PR body provides only targeted unit-test proof and explicitly says live runtime was not tested, so maintainers still lack after-fix evidence from a real llama.cpp or OpenAI-compatible CLI run.

Maintainer options:

Decide the mitigation before merge
Land the alias fallback after contributor-supplied real llama.cpp/OpenAI-compatible CLI proof, or after a maintainer records an explicit proof override with equivalent local proof.
Pause or close
Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

[P1] The remaining blocker is contributor real-behavior proof or an explicit maintainer proof override, not an automated code repair.

Security
Cleared: The diff only changes CLI usage parsing and a focused unit test; it does not touch secrets, dependencies, workflows, package scripts, or other supply-chain surfaces.

Review details

Best possible solution:

Land the alias fallback after contributor-supplied real llama.cpp/OpenAI-compatible CLI proof, or after a maintainer records an explicit proof override with equivalent local proof.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: feed the CLI JSONL parser a result event with usage.prompt_tokens and usage.completion_tokens, and current main cannot populate usage.input or usage.output. No live llama.cpp run is attached.

Is this the best way to solve the issue?

Yes, the PR's narrow fallback in shared CLI usage parsing is a maintainable fix for the reported field-name mismatch. A future cleanup could route CLI parsing through normalizeUsage(), but that is not required for this XS repair.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 51dee73a5d3e.

Label changes

Label changes:

add P2: This is a normal-priority regression fix for local OpenAI-compatible provider usage display with limited blast radius.
add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. The PR body provides a unit-test fixture only and explicitly says live runtime was not tested; contributor should add redacted terminal/log/screenshot/live output from a llama.cpp-compatible run, then update the PR body to trigger a fresh review or ask for @clawsweeper re-review.
remove rating: 🌊 off-meta tidepool: Current PR rating is rating: 🦪 silver shellfish, so this older rating label is no longer current.

Evidence reviewed

PR surface:

Source +7, Tests +37. Total +44 across 2 files.

Area	Files	Added	Removed	Net
Source	1	9	2	+7
Tests	1	37	0	+37
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	2	46	2	+44

What I checked:

Current main parser gap: toCliUsage() on current main only picks input_tokens/inputTokens for input and output_tokens/outputTokens for output, so OpenAI-compatible prompt_tokens/completion_tokens are not mapped by the CLI parser. (src/agents/cli-output.ts:147, 51dee73a5d3e)
CLI parsing path coverage: Both parseCliJson() and parseCliJsonl() route parsed records through readCliUsage(), so a fix in toCliUsage() covers the JSON and JSONL CLI output paths used by the runner. (src/agents/cli-output.ts:323, 51dee73a5d3e)
Context display depends on prompt-side usage: The CLI runner stores parsed usage as agentMeta.usage/lastCallUsage, and session storage derives totalTokens from prompt/input tokens rather than usage.total, so a payload with only total_tokens still cannot produce the context snapshot the report wants. (src/agents/command/session-store.ts:220, 51dee73a5d3e)
Existing usage contract: Current source and docs already treat prompt_tokens/completion_tokens as OpenAI-family usage fields, so accepting those names in CLI usage parsing is consistent with sibling usage normalization. (src/agents/usage.ts:53, 51dee73a5d3e)
PR implementation: The PR head adds the missing aliases in toCliUsage() and adds a focused JSONL fixture asserting llama.cpp-style prompt_tokens, completion_tokens, and total_tokens parse into CLI usage. (src/agents/cli-output.ts:147, 383946ff45ff)
Release/current-main check: No tag contains the PR head commit, and v2026.5.28 still shows the old totalInput/output lines without the prompt/completion aliases, so the useful change is not already shipped. (src/agents/cli-output.ts:147, e93216080aa1)
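
To make the context-display point concrete, here is a small sketch of why a payload carrying only total_tokens still renders as unknown. It assumes the rule stated above (context size is derived from prompt/input tokens, not from the total). StoredUsage, contextTokens, formatContext, and the 131072-token window are illustrative assumptions rather than the code in session-store.ts.

```typescript
// Hypothetical stored usage record.
type StoredUsage = {
  input: number | undefined;
  output: number | undefined;
  total: number | undefined;
};

// Assumed rule from the review note: only prompt/input tokens feed the context size.
function contextTokens(usage: StoredUsage | undefined): number | undefined {
  return usage?.input;
}

// Hypothetical formatter; the real status line may round or label differently.
function formatContext(used: number | undefined, windowTokens: number): string {
  const k = (n: number): string => `${Math.round(n / 1000)}k`;
  return `${used === undefined ? "?" : k(used)}/${k(windowTokens)}`;
}

// Only a total is known, so the prompt side stays unknown.
const onlyTotal = formatContext(
  contextTokens({ input: undefined, output: undefined, total: 1285 }),
  131072,
);
// Prompt tokens were parsed, so the display can be filled in.
const withPrompt = formatContext(
  contextTokens({ input: 12000, output: 85, total: 12085 }),
  131072,
);
console.log(onlyTotal, withPrompt); // "?/131k" "12k/131k"
```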

Likely related people:

@steipete: Blame on the current toCliUsage() helper points to 0be3ef5a38, and history shows 48ae976333 introduced the split CLI runner parser in the same files. (role: current parser/refactor owner; confidence: high; commits: 0be3ef5a383d, 48ae97633303, c39f061003f4; files: src/agents/cli-output.ts, src/agents/cli-output.test.ts)
@vincentkoc: c75f82448f added Gemini JSON response and stats parsing in the same CLI output parser/test files, including nearby usage normalization behavior. (role: recent CLI usage parser contributor; confidence: medium; commits: c75f82448fad; files: src/agents/cli-output.ts, src/agents/cli-output.test.ts)
@Lellansin: 2ccd1839f2 added real usage handling for OpenAI-compatible chat completions and tests around prompt_tokens/completion_tokens in the sibling usage path. (role: adjacent OpenAI-compatible usage contributor; confidence: medium; commits: 2ccd1839f212; files: src/agents/usage.ts, src/agents/usage.test.ts, src/gateway/openai-http.ts)
@Takhoffman: 079494aee5 recently reworked cached prompt-token accounting in src/agents/usage.ts, which is the sibling normalization path for the same field family. (role: adjacent usage normalization contributor; confidence: medium; commits: 079494aee559; files: src/agents/usage.ts, src/agents/usage.test.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

byungskers

This is a great compatibility fix for llama.cpp and other OpenAI-compatible local providers. The fallback chain for usage fields is well-structured — prioritizing the modern input/output fields first while gracefully falling back to prompt/completion_tokens. The test case with the detailed comment explaining the issue is especially helpful for future maintainers.

…t for llama.cpp (openclaw#77992)

llama.cpp and other OpenAI-compatible local providers return usage as { prompt_tokens, completion_tokens } instead of { input_tokens, output_tokens }. The toCliUsage() function in cli-output.ts only accepted input_tokens / output_tokens (and their camelCase aliases), so llama.cpp usage was silently dropped and context display showed "?/131k" for all llama.cpp users.

Add prompt_tokens and completion_tokens as fallback keys for totalInput and output respectively in toCliUsage(). Both parseCliJson and parseCliJsonl go through this function, so the fix covers all CLI output parsing paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

openclaw-barnacle · 2026-05-31T05:03:44Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

openclaw-barnacle Bot added the agents (Agent runtime and tooling), size: XS, and triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) labels May 5, 2026

byungskers reviewed May 5, 2026

Beandon13 force-pushed the fix/openclaw-77992-llamacpp-usage-tokens branch from 1eae9bc to 383946f Compare May 6, 2026 12:46

This was referenced May 10, 2026

fix: robust token usage normalization for OpenAI-compatible providers #45535

Closed

[Bug] Context display shows ?/131k with llama.cpp after upgrading to 2026.5.4 — field name mismatch not resolved #77992

Closed

clawsweeper Bot mentioned this pull request May 22, 2026

[Bug]: token counting broken with local LLM, only user prompts counted #85165

Closed

openclaw-barnacle Bot added the stale Marked as stale due to inactivity label May 31, 2026

clawsweeper Bot added the rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. label May 31, 2026

barnacle-openclaw Bot removed the stale Marked as stale due to inactivity label May 31, 2026

Beandon13 wants to merge 1 commit into openclaw:main from Beandon13:fix/openclaw-77992-llamacpp-usage-tokens
