Description
With Kimi K2.6 (moonshotai/kimi-k2.6) served via NVIDIA Endpoints, when the
agent attempts a tool call that fails (e.g. web search without BRAVE_API_KEY),
the model's internal reasoning (chain-of-thought) text leaks into the
user-visible output. The user sees paragraphs of internal deliberation such as
"But wait, the user might be testing..." instead of a clean response.
This only happens with complex prompts that trigger reasoning. Simple
prompts (math, factual Q&A) do not leak.
Environment
Device: MacBook Pro (Apple M4 Pro, 48 GB)
OS: macOS 26.0.1
NemoClaw: v0.0.36
OpenClaw: 2026.4.24
Model: moonshotai/kimi-k2.6
Provider: nvidia-prod (NVIDIA Endpoints)
Plugin: kimi-inference-compat v0.1.0 (auto-installed)
Steps to Reproduce
1. Onboard with Kimi K2.6: NEMOCLAW_MODEL=moonshotai/kimi-k2.6
2. Do NOT configure BRAVE_API_KEY
3. Run: openclaw agent -m "Search the web: What is the latest NVIDIA GPU?"
4. Observe the output
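Step 4 can be partly automated. The following is a minimal sketch, not part of NemoClaw or OpenClaw: it assumes the agent output from step 3 has already been captured as a string, and it only looks for the marker phrases quoted in this report. The names `LEAK_MARKERS` and `find_leaked_reasoning` are hypothetical.

```python
# Heuristic check for leaked reasoning in captured agent output.
# The marker phrases are taken from the "Actual Result" section of this
# report; they are examples, not an exhaustive list.
LEAK_MARKERS = (
    "But wait, the user might be",
    "Actually, re-reading",
    "I need to be careful",
)


def find_leaked_reasoning(output: str) -> list[str]:
    """Return the marker phrases that occur in the output (case-insensitive)."""
    lowered = output.lower()
    return [marker for marker in LEAK_MARKERS if marker.lower() in lowered]
```

An empty list means none of the known phrases appeared; it does not prove that no reasoning text leaked.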
Expected Result
Agent responds with a clean answer, noting that web search is unavailable.
Internal reasoning tokens are filtered before display.
Actual Result
Output includes internal chain-of-thought:
"But wait, the user might be testing the web search functionality..."
"Actually, re-reading: this means they want to know..."
"I need to be careful. The user might be asking about..."
Multiple paragraphs of reasoning appear before the actual answer.
PR #3046 (support reasoning models in OpenClaw harness) may need
to filter thinking tokens from the kimi-inference-compat stream.
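One possible shape for such a filter is sketched below. This is an illustration only, not the approach taken in PR #3046 or in kimi-inference-compat. It assumes the reasoning text arrives inline in the streamed text wrapped in `<think>` and `</think>` tags; this report does not confirm how Kimi K2.6 delimits its reasoning, so the tags and the function names are hypothetical.

```python
from typing import Iterable, Iterator

# Assumed delimiters; not confirmed for Kimi K2.6 on NVIDIA Endpoints.
OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


def _partial_tag_suffix(text: str, tag: str) -> int:
    """Length of the longest suffix of text that is a proper prefix of tag."""
    for size in range(min(len(text), len(tag) - 1), 0, -1):
        if text.endswith(tag[:size]):
            return size
    return 0


def strip_reasoning(chunks: Iterable[str]) -> Iterator[str]:
    """Yield only the text that lies outside OPEN_TAG ... CLOSE_TAG spans."""
    buffer = ""
    in_reasoning = False
    for chunk in chunks:
        buffer += chunk
        while True:
            tag = CLOSE_TAG if in_reasoning else OPEN_TAG
            index = buffer.find(tag)
            if index != -1:
                # Emit visible text before an opening tag, then switch mode.
                if not in_reasoning and index > 0:
                    yield buffer[:index]
                buffer = buffer[index + len(tag):]
                in_reasoning = not in_reasoning
                continue
            # No complete tag: hold back only a tail that could be the
            # beginning of a tag split across two chunks.
            keep = _partial_tag_suffix(buffer, tag)
            visible = buffer[:len(buffer) - keep]
            if visible and not in_reasoning:
                yield visible
            buffer = buffer[len(buffer) - keep:]
            break
    # End of stream: flush held-back visible text, drop unterminated reasoning.
    if buffer and not in_reasoning:
        yield buffer
```

The buffer is needed because a tag can be split across stream chunks (for example `<th` followed by `ink>`), so a per-chunk string replace would miss it. If the provider instead returns reasoning in a separate response field, the filter would reduce to not rendering that field.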
Bug Details
| Field | Value |
| --- | --- |
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | NemoClaw, NemoClaw_Agent&Skills, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Inference |
[NVB#6154911]