fix(inference): suppress kimi reasoning output by yimoj · Pull Request #3244 · NVIDIA/NemoClaw

yimoj · 2026-05-08T07:29:14Z

Summary

Suppresses Kimi K2.6 managed inference thinking/reasoning output so failed tool-call turns do not leak provider reasoning into user-visible agent output. The runtime request patch now disables Kimi thinking and the Kimi OpenClaw compatibility plugin defensively strips reasoning fields/events.

Related Issue

Fixes #3177

Changes

- Add Kimi K2.6 to the managed inference request preload and merge `chat_template_kwargs.thinking=false`.
- Filter Kimi reasoning/thinking fields, blocks, and stream events in the Kimi compatibility plugin while preserving text and tool-call deltas.
- Align Kimi onboarding probes with the runtime thinking suppression payload.
- Add regression tests for non-streaming and streaming failed-tool-call responses with reasoning fields.
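
For readers who want to see the shape of the first change, here is a minimal sketch of merging `thinking: false` into a request body. It is not the code from `nemotron-inference-fix.js`; the function name `mergeThinkingDisabled` and the sample model id are hypothetical, and replacing a non-object (or array) `chat_template_kwargs` with a fresh object is an assumption based on the walkthrough's "preserving existing non-array objects" wording.

```javascript
// Hypothetical helper: returns a copy of the request body with
// chat_template_kwargs.thinking forced to false.
function mergeThinkingDisabled(body) {
  const existing = body.chat_template_kwargs;
  // Only plain (non-array) objects are reused; anything else is replaced.
  const reusable =
    existing !== null && typeof existing === "object" && !Array.isArray(existing);
  return {
    ...body,
    chat_template_kwargs: { ...(reusable ? existing : {}), thinking: false },
  };
}

// Example request body; the model id and custom_flag are illustrative only.
const patched = mergeThinkingDisabled({
  model: "kimi-k2.6",
  messages: [{ role: "user", content: "hi" }],
  chat_template_kwargs: { custom_flag: true },
});
console.log(JSON.stringify(patched.chat_template_kwargs));
// prints {"custom_flag":true,"thinking":false}
```

Building a new object instead of mutating the caller's body keeps existing template kwargs intact while making the override explicit.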

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

Targeted tests and the Kimi compatibility E2E passed. Full hook/test suites were run but are not marked passing because unrelated environment/test failures remain (test/fetch-guard-patch-regression.test.ts host permission issue; earlier installer-preflight failures in the full run).

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Additional verification run:

npm test -- test/kimi-inference-compat-plugin.test.ts test/nemotron-inference-fix.test.ts src/lib/inference/onboard-probes.test.ts test/generate-openclaw-config.test.ts test/validate-config-schemas.test.ts
npm run typecheck:cli
NEMOCLAW_SANDBOX_NAME=e2e-kimi-3177 NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 NEMOCLAW_YES=1 bash test/e2e/test-kimi-inference-compat.sh

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

Summary by CodeRabbit

New Features
- Added reasoning and thinking content filtering for Kimi K2.6 model outputs to ensure cleaner responses.
Bug Fixes
- Disabled thinking mode for Kimi K2.6 to prevent thinking-only outputs and improve response quality.
- Enhanced compatibility for Kimi K2.6 managed inference integration.
Tests
- Updated test coverage for Kimi K2.6 payload validation and reasoning output handling.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

coderabbitai · 2026-05-08T07:29:26Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6a3ec1a8-3ddf-4985-a367-a2d04daa66fa

📥 Commits

Reviewing files that changed from the base of the PR and between b1320d5 and bf6c389.

📒 Files selected for processing (8)

nemoclaw-blueprint/model-specific-setup/openclaw/kimi-k2.6-managed-inference.json
nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js
nemoclaw-blueprint/scripts/nemotron-inference-fix.js
scripts/nemoclaw-start.sh
src/lib/inference/onboard-probes.test.ts
src/lib/inference/onboard-probes.ts
test/kimi-inference-compat-plugin.test.ts
test/nemotron-inference-fix.test.ts

📝 Walkthrough

This PR addresses Kimi K2.6 reasoning output leakage by implementing two coordinated suppression mechanisms: request-time injection of thinking: false to disable reasoning generation at the inference API, and stream-time filtering to remove any reasoning content that still appears in responses or failures.

Changes

Kimi K2.6 Reasoning Output Suppression

| Layer | File(s) | Summary |
|---|---|---|
| Model Configuration & Probe Setup | `nemoclaw-blueprint/model-specific-setup/openclaw/kimi-k2.6-managed-inference.json`, `src/lib/inference/onboard-probes.ts`, `src/lib/inference/onboard-probes.test.ts`, `scripts/nemoclaw-start.sh` | Kimi K2.6 model manifest updated to document reasoning-output normalization. Probe payload for Kimi now includes `chat_template_kwargs: { thinking: false }` alongside existing `max_tokens` configuration. Probe test expectations and startup documentation both updated to reflect multi-model (DeepSeek V4 Pro + Kimi K2.6) inference fix scope. |
| Request Interception & Template Injection | `nemoclaw-blueprint/scripts/nemotron-inference-fix.js` | Preload intercepts `POST /v1/chat/completions` requests to Kimi K2.6 and conditionally initializes `chat_template_kwargs` (preserving existing non-array objects), then sets `thinking: false` alongside existing Nemotron model handling. Updated documentation clarifies the expanded model set and request shape. |
| Stream & Output Reasoning Filtering | `nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js` | Kimi inference compatibility plugin now defines reasoning-field detection (constants for field names, event types, content block types) and adds utilities to recursively strip reasoning content from messages, choices, and streamed events. Stream chunk processing updated to apply reasoning filtering in addition to existing safe exec tool-call rewriting. Provider description and `__testing` exports expanded. |
| Integration Tests & Validation | `test/kimi-inference-compat-plugin.test.ts`, `test/nemotron-inference-fix.test.ts` | New test helpers construct failed tool-result scenarios with Kimi-specific reasoning fields. Tests verify final-message filtering removes reasoning while preserving visible content, streaming filtering drops reasoning deltas while preserving content/tool-call deltas, and request injection correctly appends `thinking: false` without overwriting existing template kwargs. Serialization checks confirm no "PRIVATE" reasoning fields leak into outputs. |
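
The filtering row above can be illustrated with a small recursive stripper. This is a sketch under stated assumptions, not the plugin's implementation: the field names in `REASONING_FIELDS`, the block types in `REASONING_BLOCK_TYPES`, and the function name `stripReasoning` are all hypothetical, since the PR text does not list the plugin's actual constants.

```javascript
// Assumed names; the real constants in kimi-inference-compat/index.js may differ.
const REASONING_FIELDS = new Set(["reasoning", "reasoning_content", "thinking"]);
const REASONING_BLOCK_TYPES = new Set(["reasoning", "thinking"]);

// Recursively drops reasoning keys and reasoning-typed content blocks,
// leaving every other value (text, tool-call deltas) untouched.
function stripReasoning(value) {
  if (Array.isArray(value)) {
    return value
      .filter(
        (item) =>
          !(item && typeof item === "object" && REASONING_BLOCK_TYPES.has(item.type)),
      )
      .map(stripReasoning);
  }
  if (value && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      if (!REASONING_FIELDS.has(key)) out[key] = stripReasoning(child);
    }
    return out;
  }
  return value;
}

// A streamed chunk carrying a reasoning delta next to visible output.
const chunk = {
  choices: [
    {
      delta: {
        reasoning_content: "PRIVATE plan",
        content: "Done.",
        tool_calls: [{ index: 0, function: { name: "exec", arguments: "{}" } }],
      },
    },
  ],
};
const cleaned = stripReasoning(chunk);
console.log(JSON.stringify(cleaned).includes("PRIVATE")); // prints false
console.log(cleaned.choices[0].delta.content); // prints Done.
```

Checking the serialized output for a marker string mirrors the "PRIVATE" serialization checks described in the tests row. A key-based filter like this would also drop a legitimately named `thinking` key anywhere in the payload, so a production filter would likely need to be scoped more narrowly.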

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A cottontail's ode to thinking suppressed
When Kimi mused too loud, we passed the test,
With thinking: false whispered soft and low,
And filters that catch what the streams bestow,
The reasoning thoughts stay private, hushed—
User-facing answers, pristine and flushed! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: suppressing Kimi reasoning output, which directly addresses the primary objective of preventing thinking/reasoning content leakage. |
| Linked Issues check | ✅ Passed | Changes comprehensively address `#3177` requirements: runtime thinking suppression via chat_template_kwargs.thinking=false injection, reasoning field filtering in the Kimi plugin, and aligned onboarding probes with regression tests for failed tool calls. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to Kimi K2.6 reasoning suppression: managed inference config, Kimi compatibility plugin filtering, onboarding probe alignment, nemotron fix script, and corresponding tests. |

ericksoa

Approved. I verified the Kimi path disables thinking in the runtime preload and onboarding probe payload, and the OpenClaw Kimi compat plugin strips reasoning/thinking fields while preserving normal content and tool-call deltas. Also checked the broader preload remains scoped to known affected models so non-Kimi/Nemotron/DeepSeek requests pass through unchanged.

Validation on a local no-commit merge with current main: npm run build:cli; npm run typecheck:cli; npm test -- test/kimi-inference-compat-plugin.test.ts test/nemotron-inference-fix.test.ts src/lib/inference/onboard-probes.test.ts test/generate-openclaw-config.test.ts test/validate-config-schemas.test.ts; bash -n scripts/nemoclaw-start.sh; git diff --check.
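
The scoping point in the review can be pictured as a guard that runs before any request patching. The sketch below is an assumption-laden illustration, not the preload's actual logic: `shouldPatchRequest`, the `AFFECTED_MODEL_PATTERNS` list, and the regular expressions used to recognise model ids are all hypothetical, and it assumes the request path carries no query string.

```javascript
// Illustrative allowlist; how the real preload matches model ids is not shown here.
const AFFECTED_MODEL_PATTERNS = [/kimi-k2\.6/i, /nemotron/i, /deepseek-v4-pro/i];

// Returns true only for POST /v1/chat/completions requests to a listed model.
function shouldPatchRequest({ method, path, model }) {
  if (String(method).toUpperCase() !== "POST") return false;
  if (!String(path).endsWith("/v1/chat/completions")) return false;
  return AFFECTED_MODEL_PATTERNS.some((pattern) => pattern.test(String(model ?? "")));
}

console.log(
  shouldPatchRequest({ method: "POST", path: "/v1/chat/completions", model: "kimi-k2.6" }),
); // prints true
console.log(
  shouldPatchRequest({ method: "POST", path: "/v1/chat/completions", model: "some-other-model" }),
); // prints false
```

Keeping the guard separate from the body rewrite makes the pass-through behaviour for unaffected models easy to test on its own.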

From #3289 (ci: add E2E Advisor recommendations):

## Summary

- Add an E2E Advisor workflow that recommends required/optional E2E tests for PRs.
- Add a semantic Pi analysis layer using the NVIDIA inference-style Pi config template.
- Add sticky PR comments with E2E recommendations and dispatch hints.
- Add E2E manifest/rules/schema/config validation for advisor metadata.

## Safety model

- Static analysis only; does not execute PR-provided scripts/tests/package managers.
- Pi runs with read-only tools only: read, grep, find, ls.
- Pi runs with no session, extensions, skills, prompt templates, or context files.
- Generated Pi credential config is written under /tmp, outside uploaded artifacts.
- PR commenting is best-effort and can use E2E_ADVISOR_GITHUB_TOKEN if github.token lacks write permissions.

## Secrets

- PI_E2E_ADVISOR_API_KEY: required for Pi semantic analysis; otherwise semantic analysis skips and the deterministic baseline is used.
- E2E_ADVISOR_GITHUB_TOKEN: optional repo-write token for PR comments when github.token cannot comment.

## Validation

- npm run checks
- Manually ran advisor workflow from fork against #3244 and received a successful semantic recommendation.

## Follow-up

- Add an E2E Recommendation Gate required check that verifies recommended E2E jobs passed for the same PR head SHA before merge.

## Summary by CodeRabbit

* **New Features**
  * Added an E2E Advisor workflow that analyzes PR changes with a Pi-powered advisor, produces structured recommendations, and can post a sticky PR comment with test suggestions.
* **Documentation**
  * Added a guide describing workflow usage, required secrets, outputs, and manual invocation.
* **Chores**
  * Emits stable artifacts (prompt, raw output, parsed JSON, final result, markdown summary) and a machine-readable schema for advisor results.

From #3375 (docs: refresh 0.0.39 release prep):

## Summary

Refreshes the release-prep docs for v0.0.39 based on changes merged since the Friday 4pm doc refresh. Updates the source docs, bumps the docs version metadata, and regenerates the NemoClaw user skills from the refreshed docs.

## Changes

- #3314 -> `docs/get-started/prerequisites.md`, `docs/get-started/quickstart.md`, `docs/reference/troubleshooting.md`: Documents installer Docker setup, Docker group activation, and retry guidance.
- #3317 -> `docs/get-started/quickstart.md`, `docs/reference/commands.md`: Documents the DGX Spark and DGX Station express install prompt and `NEMOCLAW_NO_EXPRESS`.
- #3328 and #3329 -> `docs/security/best-practices.md`, `docs/deployment/sandbox-hardening.md`: Updates sandbox capability hardening docs for the stricter bounding-set and `setpriv` step-down behavior.
- #3330, #3335, and #3346 -> `docs/inference/use-local-inference.md`: Documents Windows-host Ollama relaunch behavior, NIM key passthrough, early health-fail diagnostics, and mixed-GPU preflight detail.
- #2406, #2883, #3001, #3244, #3267, #3318, #3320, and #3354 -> `docs/about/release-notes.md`: Adds the v0.0.39 release-prep section while keeping the v0.0.38 release notes intact.
- Advances the release-prep docs metadata from v0.0.38 to v0.0.39.
- Regenerates `.agents/skills/nemoclaw-user-*` from the updated source docs.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

## Summary by CodeRabbit

## Release Notes v0.0.39

* **New Features**
  * Host alias management commands for easier configuration
  * Sandbox GPU control options during onboarding
  * Update command with check and confirmation modes
* **Documentation**
  * Enhanced Linux installer guidance with Docker and group membership handling
  * Expanded troubleshooting for permission and connectivity issues
  * Improved capability-dropping security documentation
  * Updated inference model switching commands
  * Brev environment-specific troubleshooting
* **Improvements**
  * DGX Spark/Station express install flow
  * Windows Ollama relay and health-check enhancements
  * NVIDIA NIM preflight GPU reporting

fix(inference): suppress kimi reasoning output

bf6c389

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

yimoj self-assigned this May 8, 2026

yimoj added the v0.0.38 label May 8, 2026

jyaunches mentioned this pull request May 8, 2026

ci: add E2E Advisor recommendations #3289

Merged

ericksoa approved these changes May 9, 2026

ericksoa merged commit f012dd8 into NVIDIA:main May 9, 2026
22 checks passed

miyoungc mentioned this pull request May 12, 2026

docs: refresh 0.0.39 release prep #3375

Merged

12 tasks

wscurran added the bug-fix label (PR fixes a bug or regression) Jun 8, 2026

ericksoa merged 1 commit into NVIDIA:main from yimoj:fix/3177-kimi-reasoning-output

3 participants
