[Bug] Context display shows ?/131k with llama.cpp after upgrading to 2026.5.4 — field name mismatch not resolved #77992

@MarTT79

Description

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

After upgrading from OpenClaw 2026.2.26 to 2026.5.4, the context display shows '?/131k' instead of actual token usage when using llama.cpp as the model provider. OpenClaw expects 'input_tokens' and 'output_tokens' fields but llama.cpp returns 'prompt_tokens' and 'completion_tokens'.

Steps to reproduce

  1. Run OpenClaw 2026.5.4 with llama.cpp server as model backend (running locally on port 8080)
  2. Send a message through the Telegram channel
  3. Check the session status display - context shows '?/131k' instead of actual token count
  4. Verify the llama.cpp server returns usage with 'prompt_tokens' and 'completion_tokens' fields (OpenAI-compatible format)
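Step 4 can be checked mechanically. The following is a minimal sketch, not OpenClaw code: `detectUsageNaming` and the `UsageNaming` labels are hypothetical names introduced here for illustration. It takes an already-parsed JSON response body and reports which pair of usage field names it carries.

```typescript
// Hypothetical helper for illustration only; not part of OpenClaw or llama.cpp.
type UsageNaming = "prompt/completion" | "input/output" | "none";

// Inspect a parsed response body and report which usage field names are present.
function detectUsageNaming(body: unknown): UsageNaming {
  if (typeof body !== "object" || body === null) return "none";

  const usage = (body as { usage?: unknown }).usage;
  if (typeof usage !== "object" || usage === null) return "none";

  const u = usage as Record<string, unknown>;
  if (typeof u.prompt_tokens === "number" && typeof u.completion_tokens === "number") {
    return "prompt/completion";
  }
  if (typeof u.input_tokens === "number" && typeof u.output_tokens === "number") {
    return "input/output";
  }
  return "none";
}
```

Feeding it the JSON body returned by the local llama-server should yield "prompt/completion" if the server behaves as described in this report.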

Expected behavior

In OpenClaw 2026.2.26, the context display showed actual token usage (e.g., '45/131k'). The system should correctly parse llama.cpp's 'prompt_tokens' and 'completion_tokens' fields and display the current token usage.

Actual behavior

Context display shows '?/131k' (question mark instead of actual token count). OpenClaw fails to find the expected 'input_tokens' and 'output_tokens' fields because llama.cpp returns 'prompt_tokens' and 'completion_tokens' instead. This is the same issue reported in #53448 but still unfixed in 2026.5.4.

OpenClaw version

2026.5.4

Operating system

Linux Mint 22.1 (based on Ubuntu 24.04) - Linux 6.14.0-37-generic (x64)

Install method

No response

Model

llamacpp/Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf

Provider / routing chain

openclaw -> llamacpp (local llama-server on http://127.0.0.1:8080)

Additional provider/model setup details

llama.cpp server running locally on port 8080 with OpenAI-compatible API format. Model: Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf (131k context window). Configured in openclaw.json under models.providers.llamacpp.

Logs, screenshots, and evidence

Related issue: #53448 (reported March 24, 2026, still unfixed in 2026.5.4)

llama.cpp server returns usage in OpenAI-compatible format:
{
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 1,
    "total_tokens": 12
  }
}

OpenClaw expects 'input_tokens' and 'output_tokens' which don't exist in llama.cpp's response.

Impact and severity

Affected: All self-hosted OpenClaw users running llama.cpp or Ollama as a local model provider
Severity: High - prevents accurate context monitoring and may cause context overflow without warning
Frequency: Always (100% of sessions with llama.cpp)
Consequence: LCM auto-compression may not trigger, context window can overflow silently, user cannot monitor token usage

Additional information

Last known good version: 2026.2.26
First known bad version: 2026.5.4

This is a regression that broke context tracking for llama.cpp users. The fix suggested in #53448 is straightforward - add fallback field name support:

input: response.usage?.prompt_tokens ?? response.usage?.input_tokens ?? 0,
output: response.usage?.completion_tokens ?? response.usage?.output_tokens ?? 0,
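As a self-contained variant of the fallback above, here is a hedged sketch of a normalizer that accepts either naming convention. The names `RawUsage`, `NormalizedUsage`, `pickCount`, and `normalizeUsage` are hypothetical and do not come from the OpenClaw codebase. Unlike the two-line snippet, it returns `undefined` rather than `0` when no count is present, on the assumption that the display should be able to tell "no data" apart from "zero tokens".

```typescript
// Hypothetical types for illustration; field names match the two conventions in this report.
interface RawUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
}

interface NormalizedUsage {
  input: number | undefined;
  output: number | undefined;
  total: number | undefined;
}

// Return the first candidate that is a finite, non-negative number.
function pickCount(...candidates: Array<number | undefined | null>): number | undefined {
  for (const c of candidates) {
    if (typeof c === "number" && Number.isFinite(c) && c >= 0) return c;
  }
  return undefined;
}

// Accept prompt_tokens/completion_tokens first, then input_tokens/output_tokens.
function normalizeUsage(usage?: RawUsage | null): NormalizedUsage {
  const input = pickCount(usage?.prompt_tokens, usage?.input_tokens);
  const output = pickCount(usage?.completion_tokens, usage?.output_tokens);
  // Prefer the reported total; otherwise derive it when both parts are known.
  const total = pickCount(
    usage?.total_tokens,
    input !== undefined && output !== undefined ? input + output : undefined,
  );
  return { input, output, total };
}
```

With the llama.cpp payload shown under "Logs, screenshots, and evidence", this yields an input of 11, an output of 1, and a total of 12. Whether a missing count should map to `0` or stay `undefined` is a design choice for the maintainers.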

Metadata

Labels

bug, regression
