Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
After upgrading from OpenClaw 2026.2.26 to 2026.5.4, the context display shows '?/131k' instead of actual token usage when using llama.cpp as the model provider. OpenClaw expects 'input_tokens' and 'output_tokens' fields but llama.cpp returns 'prompt_tokens' and 'completion_tokens'.
Steps to reproduce
- Run OpenClaw 2026.5.4 with llama.cpp server as model backend (running locally on port 8080)
- Send a message through the Telegram channel
- Check the session status display - context shows '?/131k' instead of actual token count
- Verify the llama.cpp server returns usage with 'prompt_tokens' and 'completion_tokens' fields (OpenAI-compatible format)
Expected behavior
In OpenClaw 2026.2.26, the context display showed actual token usage (e.g., '45/131k'). The system should parse llama.cpp's 'prompt_tokens' and 'completion_tokens' fields and display the current token usage.
Actual behavior
Context display shows '?/131k' (question mark instead of actual token count). OpenClaw fails to find the expected 'input_tokens' and 'output_tokens' fields because llama.cpp returns 'prompt_tokens' and 'completion_tokens' instead. This is the same issue reported in #53448 but still unfixed in 2026.5.4.
OpenClaw version
2026.5.4
Operating system
Linux Mint 22.1 (based on Ubuntu 24.04) - Linux 6.14.0-37-generic (x64)
Install method
No response
Model
llamacpp/Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf
Provider / routing chain
openclaw -> llamacpp (local llama-server on http://127.0.0.1:8080)
Additional provider/model setup details
llama.cpp server running locally on port 8080 with OpenAI-compatible API format. Model: Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf (131k context window). Configured in openclaw.json under models.providers.llamacpp.
Logs, screenshots, and evidence
Related issue: #53448 (reported March 24, 2026, still unfixed in 2026.5.4)
llama.cpp server returns usage in OpenAI-compatible format:
{
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 1,
    "total_tokens": 12
  }
}
OpenClaw expects 'input_tokens' and 'output_tokens' which don't exist in llama.cpp's response.
Impact and severity
Affected: All self-hosted OpenClaw users running llama.cpp or Ollama as a local model provider
Severity: High - prevents accurate context monitoring and may cause context overflow without warning
Frequency: Always (100% of sessions with llama.cpp)
Consequence: LCM auto-compression may not trigger, context window can overflow silently, user cannot monitor token usage
Additional information
Last known good version: 2026.2.26
First known bad version: 2026.5.4
This is a regression that broke context tracking for llama.cpp users. The fix suggested in #53448 is straightforward - add fallback field name support:
input: response.usage?.prompt_tokens ?? response.usage?.input_tokens ?? 0,
output: response.usage?.completion_tokens ?? response.usage?.output_tokens ?? 0,
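For illustration, the fallback above could be wrapped in a small helper. This is only a sketch: `RawUsage`, `NormalizedUsage`, and `normalizeUsage` are hypothetical names and are not taken from the OpenClaw codebase. Unlike the `?? 0` form above, it returns `undefined` when no usage fields are present, on the assumption that the display should keep showing '?' rather than a misleading 0.

```typescript
// Hypothetical types for illustration; not OpenClaw's actual internals.
interface RawUsage {
  prompt_tokens?: number;     // OpenAI-compatible naming (llama.cpp)
  completion_tokens?: number;
  input_tokens?: number;      // naming OpenClaw currently looks for
  output_tokens?: number;
  total_tokens?: number;
}

interface NormalizedUsage {
  input: number;
  output: number;
  total: number;
}

// Accept either naming scheme and return one consistent shape.
// Returns undefined when the provider reported no usage at all.
function normalizeUsage(usage?: RawUsage | null): NormalizedUsage | undefined {
  if (!usage) return undefined;

  const input = usage.prompt_tokens ?? usage.input_tokens;
  const output = usage.completion_tokens ?? usage.output_tokens;
  if (input === undefined && output === undefined) return undefined;

  const inputCount = input ?? 0;
  const outputCount = output ?? 0;
  return {
    input: inputCount,
    output: outputCount,
    // Prefer the provider's total; otherwise derive it from the parts.
    total: usage.total_tokens ?? inputCount + outputCount,
  };
}
```

With the llama.cpp response shown under "Logs, screenshots, and evidence", `normalizeUsage(response.usage)` would yield an input of 11, an output of 1, and a total of 12.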