fix(inference): suppress kimi reasoning output #3244
Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
ericksoa
left a comment
Approved. I verified the Kimi path disables thinking in the runtime preload and onboarding probe payload, and the OpenClaw Kimi compat plugin strips reasoning/thinking fields while preserving normal content and tool-call deltas. Also checked the broader preload remains scoped to known affected models so non-Kimi/Nemotron/DeepSeek requests pass through unchanged.

Validation on a local no-commit merge with current main:

- `npm run build:cli`
- `npm run typecheck:cli`
- `npm test -- test/kimi-inference-compat-plugin.test.ts test/nemotron-inference-fix.test.ts src/lib/inference/onboard-probes.test.ts test/generate-openclaw-config.test.ts test/validate-config-schemas.test.ts`
- `bash -n scripts/nemoclaw-start.sh`
- `git diff --check`
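The scoping described in the review (patch Kimi requests, pass everything else through unchanged) could look roughly like the sketch below. This is an illustration only, not the code from the PR: `ChatRequest`, `isKimiModel`, and `disableKimiThinking` are hypothetical names, and the model matcher is an assumption. Only the `chat_template_kwargs.thinking=false` setting comes from the PR description.

```typescript
// Hypothetical request shape; the real preload's types are not shown in the PR.
type ChatRequest = {
  model: string;
  messages: unknown[];
  chat_template_kwargs?: Record<string, unknown>;
  [key: string]: unknown;
};

// Assumed matcher: treats any model id containing "kimi" as affected.
function isKimiModel(model: string): boolean {
  return /kimi/i.test(model);
}

// Returns a patched copy for Kimi models and the original object otherwise,
// so unaffected requests pass through untouched.
function disableKimiThinking(body: ChatRequest): ChatRequest {
  if (!isKimiModel(body.model)) {
    return body;
  }
  return {
    ...body,
    // Keep any existing template kwargs and force thinking off.
    chat_template_kwargs: { ...body.chat_template_kwargs, thinking: false },
  };
}
```

Returning the same object for non-matching models keeps the pass-through case easy to assert in a test: identity comparison is enough.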
## Summary

- Add an E2E Advisor workflow that recommends required/optional E2E tests for PRs.
- Add a semantic Pi analysis layer using the NVIDIA inference-style Pi config template.
- Add sticky PR comments with E2E recommendations and dispatch hints.
- Add E2E manifest/rules/schema/config validation for advisor metadata.

## Safety model

- Static analysis only; does not execute PR-provided scripts/tests/package managers.
- Pi runs with read-only tools only: read, grep, find, ls.
- Pi runs with no session, extensions, skills, prompt templates, or context files.
- Generated Pi credential config is written under /tmp, outside uploaded artifacts.
- PR commenting is best-effort and can use E2E_ADVISOR_GITHUB_TOKEN if github.token lacks write permissions.

## Secrets

- PI_E2E_ADVISOR_API_KEY: required for Pi semantic analysis; otherwise semantic analysis skips and the deterministic baseline is used.
- E2E_ADVISOR_GITHUB_TOKEN: optional repo-write token for PR comments when github.token cannot comment.

## Validation

- npm run checks
- Manually ran advisor workflow from fork against #3244 and received a successful semantic recommendation.

## Follow-up

- Add an E2E Recommendation Gate required check that verifies recommended E2E jobs passed for the same PR head SHA before merge.

## Summary by CodeRabbit

* **New Features**
  * Added an E2E Advisor workflow that analyzes PR changes with a Pi-powered advisor, produces structured recommendations, and can post a sticky PR comment with test suggestions.
* **Documentation**
  * Added a guide describing workflow usage, required secrets, outputs, and manual invocation.
* **Chores**
  * Emits stable artifacts (prompt, raw output, parsed JSON, final result, markdown summary) and a machine-readable schema for advisor results.

[](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3289)
## Summary

Refreshes the release-prep docs for v0.0.39 based on changes merged since the Friday 4pm doc refresh. Updates the source docs, bumps the docs version metadata, and regenerates the NemoClaw user skills from the refreshed docs.

## Changes

- #3314 -> `docs/get-started/prerequisites.md`, `docs/get-started/quickstart.md`, `docs/reference/troubleshooting.md`: Documents installer Docker setup, Docker group activation, and retry guidance.
- #3317 -> `docs/get-started/quickstart.md`, `docs/reference/commands.md`: Documents the DGX Spark and DGX Station express install prompt and `NEMOCLAW_NO_EXPRESS`.
- #3328 and #3329 -> `docs/security/best-practices.md`, `docs/deployment/sandbox-hardening.md`: Updates sandbox capability hardening docs for the stricter bounding-set and `setpriv` step-down behavior.
- #3330, #3335, and #3346 -> `docs/inference/use-local-inference.md`: Documents Windows-host Ollama relaunch behavior, NIM key passthrough, early health-fail diagnostics, and mixed-GPU preflight detail.
- #2406, #2883, #3001, #3244, #3267, #3318, #3320, and #3354 -> `docs/about/release-notes.md`: Adds the v0.0.39 release-prep section while keeping the v0.0.38 release notes intact.
- Advances the release-prep docs metadata from v0.0.38 to v0.0.39.
- Regenerates `.agents/skills/nemoclaw-user-*` from the updated source docs.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

## Summary by CodeRabbit

## Release Notes v0.0.39

* **New Features**
  * Host alias management commands for easier configuration
  * Sandbox GPU control options during onboarding
  * Update command with check and confirmation modes
* **Documentation**
  * Enhanced Linux installer guidance with Docker and group membership handling
  * Expanded troubleshooting for permission and connectivity issues
  * Improved capability-dropping security documentation
  * Updated inference model switching commands
  * Brev environment-specific troubleshooting
* **Improvements**
  * DGX Spark/Station express install flow
  * Windows Ollama relay and health-check enhancements
  * NVIDIA NIM preflight GPU reporting

[](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3375)
Summary
Suppresses Kimi K2.6 managed inference thinking/reasoning output so failed tool-call turns do not leak provider reasoning into user-visible agent output. The runtime request patch now disables Kimi thinking and the Kimi OpenClaw compatibility plugin defensively strips reasoning fields/events.
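As a rough illustration of the defensive stripping mentioned above, the sketch below removes reasoning-style keys from a streamed delta while leaving content and tool-call fields alone. It is not the plugin's actual code: `stripReasoningFields` and the key names in `REASONING_KEYS` are assumptions chosen for the example, since the PR text does not list the exact fields.

```typescript
// Assumed field names; the real plugin may match a different set.
const REASONING_KEYS: string[] = ["reasoning", "reasoning_content", "thinking"];

// Returns a shallow copy of the delta without any reasoning-style keys.
// All other keys (for example content and tool_calls) are copied as-is.
function stripReasoningFields(
  delta: Record<string, unknown>
): Record<string, unknown> {
  const cleaned: Record<string, unknown> = {};
  for (const key of Object.keys(delta)) {
    if (REASONING_KEYS.indexOf(key) === -1) {
      cleaned[key] = delta[key];
    }
  }
  return cleaned;
}
```

Filtering on a fixed key list keeps the behavior predictable: anything not explicitly named as reasoning output is preserved, which matches the stated goal of not disturbing normal content or tool-call deltas.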
Related Issue
Fixes #3177
Changes
chat_template_kwargs.thinking=false.
Type of Change
Verification
Targeted tests and the Kimi compatibility E2E passed. Full hook/test suites were run but are not marked passing because unrelated environment/test failures remain (`test/fetch-guard-patch-regression.test.ts` host permission issue; earlier installer-preflight failures in the full run).

- `npx prek run --all-files` passes
- `npm test` passes
- `make docs` builds without warnings (doc changes only)

Additional verification run:

- `npm test -- test/kimi-inference-compat-plugin.test.ts test/nemotron-inference-fix.test.ts src/lib/inference/onboard-probes.test.ts test/generate-openclaw-config.test.ts test/validate-config-schemas.test.ts`
- `npm run typecheck:cli`
- `NEMOCLAW_SANDBOX_NAME=e2e-kimi-3177 NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 NEMOCLAW_YES=1 bash test/e2e/test-kimi-inference-compat.sh`

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
Summary by CodeRabbit
New Features
Bug Fixes
Tests