Bug type
Behavior bug (incorrect output/state without crash)
Summary
OpenClaw GitHub Issue - Context Usage Bug
Issue Title
[bug] llama-cpp and Ollama providers return incorrect context usage due to field name mismatch
Issue Content
Problem Description
OpenClaw fails to accurately track token usage due to mismatched field names between expected and actual API responses, causing context usage to display as 0/80k (0%) even when the model is actively consuming significant tokens.
Environment:
- 🦞 OpenClaw: 2026.3.23-1
- 🧠 Model: llama-cpp/qwen35b-local
- 📚 Context Display: 0/80k (0%)
- 🧵 Session: agent:main:main
- 🪢 Runtime: direct
Affected Frameworks
| Framework | Status | Notes |
| ------------------ | ------------ | ------------------------------------- |
| ❌ llama.cpp server | AFFECTED | Most common local deployment solution |
| ❌ Ollama | AFFECTED | Popular model management service |
| ✅ vLLM | NOT AFFECTED | Compatible (OpenAI format) |
| ✅ HuggingFace TGI | NOT AFFECTED | Compatible (OpenAI format) |
| ✅ OpenAI API | NOT AFFECTED | Compatible (OpenAI format) |
Root Cause
OpenClaw expects these field names at line ~181675:
input: response.usage?.input_tokens ?? 0,
output: response.usage?.output_tokens ?? 0,
However, different frameworks return different field names:
llama.cpp server (OpenAI-compatible format)
{
"usage": {
"prompt_tokens": 11,
"completion_tokens": 1,
"total_tokens": 12
}
}
Ollama (custom format)
{
"prompt_eval_count": 26,
"eval_count": 259
}
vLLM / TGI / OpenAI (OpenAI standard format)
{
"usage": {
"prompt_tokens": 100,
"completion_tokens": 50,
"total_tokens": 150
}
}
Real-World Case
User Configuration:
- OpenClaw Display: 0/80k (0%)
- Remote llama-server (192.168.3.77:8080) Actual Usage: 43250/80000 (54%)
Cause: llama.cpp server returns prompt_tokens, but OpenClaw expects input_tokens.
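The silent zero comes from how optional chaining and nullish coalescing interact: a field that does not exist reads as `undefined`, and `?? 0` turns it into a plausible-looking count instead of an error. The sketch below reproduces this with the llama.cpp payload shown above. `readUsageAsExpected` is a hypothetical stand-in for the field access quoted in this issue, not OpenClaw's actual function.

```javascript
// Hypothetical stand-in for the field access quoted in this issue.
function readUsageAsExpected(response) {
  return {
    input: response.usage?.input_tokens ?? 0,
    output: response.usage?.output_tokens ?? 0,
  };
}

// Payload shape returned by llama.cpp server (copied from the example above).
const llamaCppResponse = {
  usage: { prompt_tokens: 11, completion_tokens: 1, total_tokens: 12 },
};

// Neither input_tokens nor output_tokens exists, so both counts fall back to 0.
const usageSeenByClient = readUsageAsExpected(llamaCppResponse);
console.log(usageSeenByClient);
```

No exception is raised at any point, which is why the problem shows up only as a display that stays at 0.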
Chain Reactions from Failed Context Statistics
1. Context Window Overflow Risk
Due to inability to accurately track token usage:
Chain Reactions:
- User cannot see real-time token usage rate
- Cannot determine if conversation is approaching the 80k context limit
- May lead to:
  - Model truncation: Ultra-long conversations are forcibly truncated
  - Quality degradation: Context overflow causes model to forget early conversation
  - Session crash: API returns errors after exceeding limits
Actual Impact:
- In long conversation scenarios, users may encounter context overflow without warning
- Important conversation content may be lost
2. Conversation Management Failure
OpenClaw's conversation management mechanisms rely on accurate token counting:
Chain Reactions:
- Auto-compression mechanism fails:
  - OpenClaw may decide to compress historical messages based on token usage rate
  - If count is 0, compression never triggers
  - Leads to unlimited accumulation of historical messages, eventually causing memory overflow
- Session reset strategy fails:
  - Under some configurations, sessions automatically reset when token usage reaches a threshold
  - Due to count being 0, reset never triggers
  - Leads to uncontrolled session length
- Resource waste:
  - Cannot accurately evaluate token cost per session
  - May lead to unnecessarily long conversations
3. Cost Monitoring Failure
Even with free local models, token statistics are important performance metrics:
Chain Reactions:
- Performance analysis difficulty:
  - Cannot analyze token consumption across different conversations
  - Cannot identify abnormally high token usage patterns
  - Difficult to optimize conversation strategies
- Multi-model comparison fails:
  - If multiple model backends exist, cannot fairly compare token efficiency
  - Cannot make model switching decisions based on token usage
- API quota monitoring fails (if using paid APIs):
  - Cannot accurately track API quota usage
  - May unexpectedly exceed quota causing service interruption
4. LCM (Lossless Context Management) Function Abnormalities
OpenClaw's LCM system relies on token statistics to manage conversation history:
Chain Reactions:
- Historical message compression strategy fails:
  - LCM decides whether to compress history based on token usage rate
  - When count is 0, compression never triggers
  - Leads to uncontrolled memory usage
- Context optimization fails:
  - LCM cannot intelligently retain important conversations
  - May lead to important information being discarded too early
- Search and retrieval functionality affected:
  - LCM's search function may rely on token statistics
  - May lead to inaccurate search results
5. User Experience Degradation
Chain Reactions:
- User confusion:
  - User sees a 0/80k (0%) display
  - User cannot determine conversation status
  - May mistakenly think system is malfunctioning
- Trust reduction:
  - Key metrics display incorrectly
  - User may question the reliability of the entire system
- Cannot optimize conversation strategy:
  - User cannot adjust conversation methods based on token usage
  - Cannot learn how to efficiently use the context window
6. Diagnosis and Debugging Difficulty
Chain Reactions:
- Problem troubleshooting difficulty:
  - If conversation anomalies occur, cannot locate issues through token statistics
  - Increases troubleshooting time costs
- Performance optimization blocked:
  - Cannot perform performance optimization based on token statistics
  - Difficult to identify performance bottlenecks
- Automated testing fails:
  - Automated tests may rely on token statistics as success metrics
  - May lead to inaccurate test results
7. Resource Allocation Issues in Multi-User/Multi-Session Scenarios
If multiple users or concurrent sessions exist:
Chain Reactions:
- Unequal resource allocation:
  - Cannot accurately track token usage per session
  - Leads to some sessions consuming excessive resources
- Service quality degradation:
  - Some sessions may respond slowly due to resource exhaustion
  - Affects overall user experience
- Quota management difficult to implement:
  - Cannot fairly allocate token quotas
  - May lead to certain users monopolizing resources
Problem Severity Assessment
| Issue | Severity | Affected Scope | Probability |
| ------------------------------- | --------- | ------------------------- | ----------- |
| Context window overflow | 🔴 High | All long conversations | High |
| Conversation management failure | 🟡 Medium | LCM users | Medium |
| Cost monitoring failure | 🟡 Medium | All users | High |
| LCM function abnormality | 🔴 High | LCM users | High |
| User experience degradation | 🟢 Low | All users | High |
| Diagnosis difficulty | 🟡 Medium | Developers/Advanced users | Medium |
| Resource allocation issues | 🟡 Medium | Multi-user scenarios | Medium |
Overall Severity: 🔴 High
Case Study 1: Long Conversation Leading to Content Loss
User Scenario:
- Conducting a 50+ turn technical discussion
- OpenClaw Display: 0/80k (0%)
- Actual llama-server Usage: 65000/80000 (81%)
Result:
- User thought there was still ample context available
- Continued conversation until model started truncating early content
- Key information from technical discussion was forgotten
- Conversation quality deteriorated rapidly
Case Study 2: LCM Compression Mechanism Failure
User Scenario:
- Configured automatic compression of historical messages
- Expected compression to trigger when token usage reached 70%
Result:
- Due to count being 0, compression never triggered
- Historical messages accumulated infinitely
- Eventually led to excessive memory usage and slow system response
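To make the failure mode concrete, here is a minimal sketch of a threshold check of the kind this scenario describes. `shouldCompress` and its parameters are hypothetical names used for illustration; OpenClaw's real compression logic may be structured differently. The 70% threshold is the one from the scenario above, and the 65000-token figure is taken from Case Study 1.

```javascript
// Hypothetical threshold check: compress once usage reaches the given ratio.
function shouldCompress(usedTokens, limitTokens, thresholdRatio = 0.7) {
  return limitTokens > 0 && usedTokens / limitTokens >= thresholdRatio;
}

const contextLimit = 80000;
const reportedByClient = 0;     // what the mismatched field lookup yields
const actualOnServer = 65000;   // server-side figure from Case Study 1

// With a count stuck at 0, the check can never pass, whatever the threshold.
console.log(shouldCompress(reportedByClient, contextLimit)); // false
// With the real count (about 81% of the window), it would have triggered.
console.log(shouldCompress(actualOnServer, contextLimit));   // true
```

Any positive threshold behaves the same way, so the feature is effectively disabled for as long as the reported count stays at 0.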
Code Location
File: ~/.npm-global/lib/node_modules/openclaw/dist/pi-embedded-CwMQzdKD.js
Line: ~181675 (exact line may vary by version)
Test Steps
- Configure llama.cpp server as model backend
- Send a test message
- Check if context display updates
Expected Result:
- Display actual token usage rate
- Example: 12k/80k (15%) instead of 0/80k (0%)
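For reference, 15% of an 80k window corresponds to about 12,000 tokens. The sketch below shows one way a `used/limit (percent)` string like the one in this issue could be derived from raw counts. `formatTokens` and `formatContextUsage` are hypothetical helpers, and rounding to whole thousands and whole percent is an assumption about the display, not OpenClaw's confirmed behavior.

```javascript
// Hypothetical helper: abbreviate counts of 1000 or more as whole thousands.
function formatTokens(count) {
  return count >= 1000 ? `${Math.round(count / 1000)}k` : String(count);
}

// Hypothetical helper: build a "used/limit (percent)" display string.
function formatContextUsage(usedTokens, limitTokens) {
  const percent =
    limitTokens > 0 ? Math.round((usedTokens / limitTokens) * 100) : 0;
  return `${formatTokens(usedTokens)}/${formatTokens(limitTokens)} (${percent}%)`;
}

console.log(formatContextUsage(0, 80000));     // 0/80k (0%)
console.log(formatContextUsage(12000, 80000)); // 12k/80k (15%)
console.log(formatContextUsage(43250, 80000)); // 43k/80k (54%)
```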
Environment Information
| Item | Value |
| ------------------- | ------------------------------------------------------------ |
| OpenClaw Version | 2026.3.23-1 |
| Remote llama-server | 192.168.3.77:8080 |
| Model | Qwen3.5-35B-A3B-GGUF |
| Operating System | macOS (user) / Ubuntu 24.04 (server) |
| llama.cpp Version | 8419 (commit: 509a31d00) |
| Model File | unsloth_Qwen3.5-35B-A3B-GGUF_Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf |
| GPU | NVIDIA GeForce RTX 3090 (24GB) |
Recommended Solution
Modify OpenClaw code to support multiple field name formats:
// Before
input: response.usage?.input_tokens ?? 0,
output: response.usage?.output_tokens ?? 0,
// After: accept all three formats.
// Ollama reports its counts at the top level of the response, not under usage.
input: response.usage?.prompt_tokens ??
  response.usage?.input_tokens ??
  response.prompt_eval_count ?? 0,
output: response.usage?.completion_tokens ??
  response.usage?.output_tokens ??
  response.eval_count ?? 0,
This solution:
- ✅ Backward compatible with all existing configurations
- ✅ Supports llama.cpp server, Ollama, vLLM, and other frameworks
- ✅ Zero configuration, works out of the box
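One detail matters when implementing the fallback: in the Ollama sample earlier in this issue, `prompt_eval_count` and `eval_count` sit at the top level of the response rather than under `usage`, so the lookup has to check both places. The following is a minimal, self-contained sketch of such a normalizer. `firstNumber` and `normalizeUsage` are hypothetical names, and the returned `{ input, output, total }` shape is an assumption modeled on the snippet above.

```javascript
// Returns the first candidate that is a finite number, otherwise 0.
function firstNumber(...candidates) {
  for (const value of candidates) {
    if (typeof value === "number" && Number.isFinite(value)) return value;
  }
  return 0;
}

// Hypothetical helper: maps the response shapes listed in this issue to one form.
function normalizeUsage(response) {
  const usage = response?.usage ?? {};
  const input = firstNumber(
    usage.prompt_tokens,         // OpenAI-compatible (llama.cpp, vLLM, TGI)
    usage.input_tokens,          // field names currently expected
    response?.prompt_eval_count  // Ollama, top level
  );
  const output = firstNumber(
    usage.completion_tokens,
    usage.output_tokens,
    response?.eval_count
  );
  // Prefer the server's total when present, else derive it.
  const total = firstNumber(usage.total_tokens, input + output);
  return { input, output, total };
}

console.log(normalizeUsage({ usage: { prompt_tokens: 11, completion_tokens: 1, total_tokens: 12 } }));
console.log(normalizeUsage({ prompt_eval_count: 26, eval_count: 259 }));
console.log(normalizeUsage({ usage: { input_tokens: 5, output_tokens: 7 } }));
console.log(normalizeUsage({}));
```

Checking the value's type instead of relying on `??` alone also guards against a backend that sends `null` or a string for a count, at the cost of a few extra lines.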
Expected Fix Priority
Recommended: HIGH
This issue has wide-ranging impact and may cause severe user experience problems.
Server Information
192.168.3.77 Server Details
Basic Information:
- Hostname: vllm-server
- IP Address: 192.168.3.77
- OS: Ubuntu 24.04 (Linux 6.8.0-106-generic)
- Architecture: x86_64
- Uptime: 3 days 13 hours
Hardware:
- GPU: NVIDIA GeForce RTX 3090 (24GB VRAM)
- System Memory: 62GB
- Disk: 836GB (138GB used, 656GB available)
Software:
- llama.cpp Version: 8419 (commit: 509a31d00)
- GCC Version: 13.3.0
- NVIDIA Driver: 580.126.09
- CUDA Support: Enabled
llama-server Configuration:
/home/XXX/llama.cpp/build/bin/llama-server \
-m /home/XXX/.cache/llama.cpp/unsloth_Qwen3.5-35B-A3B-GGUF_Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf \
--mmproj /home/XXX/.cache/llama.cpp/mmproj-F16.gguf \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
-ngl 99 \
-np 1 \
-fa on \
--ctx-size 96000 \
--image-min-tokens 1024 \
--image-max-tokens 4096 \
--host 0.0.0.0 \
--port 8080
Key Configuration Notes:
- Context window: 96,000 tokens (configured)
- Model size: 21 GB (Q4_K_XL quantized)
- GPU layers: 99 (all layers on GPU)
- Flash Attention: Enabled
References
Steps to reproduce
Send /status via Telegram
Expected behavior
- 📚 Context Display: 10k/100k (10%)
Actual behavior
- 📚 Context Display: 0/100k (0%)
OpenClaw version
2026.3.8~2026.3.23
Operating system
macOS 12.7; llama.cpp 8419 (commit: 509a31d00)
Install method
npm
Model
unsloth_Qwen3.5-35B-A3B-GGUF_Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf
Provider / routing chain
openclaw -> llama-server
Additional provider/model setup details
llama.cpp server (OpenAI-compatible format)
{
"usage": {
"prompt_tokens": 11,
"completion_tokens": 1,
"total_tokens": 12
}
}
Ollama (custom format)
{
"prompt_eval_count": 26,
"eval_count": 259
}
vLLM / TGI / OpenAI (OpenAI standard format)
{
"usage": {
"prompt_tokens": 100,
"completion_tokens": 50,
"total_tokens": 150
}
}
| Framework | Status | Notes |
| ------------------ | ------------ | ------------------------------------- |
| ❌ llama.cpp server | AFFECTED | Most common local deployment solution |
| ❌ Ollama | AFFECTED | Popular model management service |
| ✅ vLLM | NOT AFFECTED | Compatible (OpenAI format) |
| ✅ HuggingFace TGI | NOT AFFECTED | Compatible (OpenAI format) |
| ✅ OpenAI API | NOT AFFECTED | Compatible (OpenAI format) |
Logs, screenshots, and evidence
Root Cause
OpenClaw expects these field names at line ~181675:
input: response.usage?.input_tokens ?? 0,
output: response.usage?.output_tokens ?? 0,
However, different frameworks return different field names:
llama.cpp server (OpenAI-compatible format)
{
"usage": {
"prompt_tokens": 11,
"completion_tokens": 1,
"total_tokens": 12
}
}
Ollama (custom format)
{
"prompt_eval_count": 26,
"eval_count": 259
}
vLLM / TGI / OpenAI (OpenAI standard format)
{
"usage": {
"prompt_tokens": 100,
"completion_tokens": 50,
"total_tokens": 150
}
}
Real-World Case
User Configuration:
• OpenClaw Display: 0/80k (0%)
• Remote llama-server (192.168.3.77:8080) Actual Usage: 43250/80000 (54%)
Cause: llama.cpp server returns prompt_tokens, but OpenClaw expects input_tokens.
Impact and severity
- Context Window Overflow Risk
Due to inability to accurately track token usage:
- User cannot see real-time token usage rate
- Cannot determine if conversation is approaching the 80k context limit
- May lead to:
• Model truncation: Ultra-long conversations are forcibly truncated
• Quality degradation: Context overflow causes model to forget early conversation
• Session crash: API returns errors after exceeding limits
Actual Impact:
• In long conversation scenarios, users may encounter context overflow without warning
• Important conversation content may be lost
───
- Conversation Management Failure
OpenClaw's conversation management mechanisms rely on accurate token counting:
- Auto-compression mechanism fails:
• OpenClaw may decide to compress historical messages based on token usage rate
• If count is 0, compression never triggers
• Leads to unlimited accumulation of historical messages, eventually causing memory overflow
- Session reset strategy fails:
• Under some configurations, sessions automatically reset when token usage reaches a threshold
• Due to count being 0, reset never triggers
• Leads to uncontrolled session length
- Resource waste:
• Cannot accurately evaluate token cost per session
• May lead to unnecessary long conversations
───
- Cost Monitoring Failure
Even with free local models, token statistics are important performance metrics:
- Performance analysis difficulty:
• Cannot analyze token consumption across different conversations
• Cannot identify abnormally high token usage patterns
• Difficult to optimize conversation strategies
- Multi-model comparison fails:
• If multiple model backends exist, cannot fairly compare token efficiency
• Cannot make model switching decisions based on token usage
- API quota monitoring fails (if using paid APIs):
• Cannot accurately track API quota usage
• May unexpectedly exceed quota causing service interruption
- LCM (Lossless Context Management) Function Abnormalities
OpenClaw's LCM system relies on token statistics to manage conversation history:
- Historical message compression strategy fails:
• LCM decides whether to compress history based on token usage rate
• When count is 0, compression never triggers
• Leads to uncontrolled memory usage
- Context optimization fails:
• LCM cannot intelligently retain important conversations
• May lead to important information being discarded too early
- Search and retrieval functionality affected:
• LCM's search function may rely on token statistics
• Leads to inaccurate search results
───
- User Experience Degradation
- User confusion:
• User sees a 0/80k (0%) display
• User cannot determine conversation status
• May mistakenly think system is malfunctioning
- Trust reduction:
• Key metrics display incorrectly
• User may question the reliability of the entire system
- Cannot optimize conversation strategy:
• User cannot adjust conversation methods based on token usage
• Cannot learn how to efficiently use the context window
───
- Diagnosis and Debugging Difficulty
- Problem troubleshooting difficulty:
• If conversation anomalies occur, cannot locate issues through token statistics
• Increases troubleshooting time costs
- Performance optimization blocked:
• Cannot perform performance optimization based on token statistics
• Difficult to identify performance bottlenecks
- Automated testing fails:
• Automated tests may rely on token statistics as success metrics
• May lead to inaccurate test results
───
- Resource Allocation Issues in Multi-User/Multi-Session Scenarios
If multiple users or concurrent sessions exist:
- Unequal resource allocation:
• Cannot accurately track token usage per session
• Leads to some sessions consuming excessive resources
- Service quality degradation:
• Some sessions may respond slowly due to resource exhaustion
• Affects overall user experience
- Quota management difficult to implement:
• Cannot fairly allocate token quotas
• May lead to certain users monopolizing resources
| Issue | Severity | Affected Scope | Probability |
| ------------------------------- | --------- | ------------------------- | ----------- |
| Context window overflow | 🔴 High | All long conversations | High |
| Conversation management failure | 🟡 Medium | LCM users | Medium |
| Cost monitoring failure | 🟡 Medium | All users | High |
| LCM function abnormality | 🔴 High | LCM users | High |
| User experience degradation | 🟢 Low | All users | High |
| Diagnosis difficulty | 🟡 Medium | Developers/Advanced users | Medium |
| Resource allocation issues | 🟡 Medium | Multi-user scenarios | Medium |
Additional information
No response