fix: support reasoning models in the OpenClaw harness#3046
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
📝 WalkthroughWalkthroughAdds support for MoonshotAI Kimi K2.6 managed inference: a new OpenClaw plugin, manifest, packaging changes, config detection/wiring, probe and onboarding adjustments, unit and E2E tests, an E2E script, and a nightly CI job for Kimi inference compatibility. No public API removals. ChangesKimi K2.6 Inference Compatibility Integration
Sequence Diagram(s)sequenceDiagram
participant CI as CI/Developer
participant Sandbox as Nemo Sandbox
participant Plugin as Kimi Compat Plugin
participant Inference as inference.local (Kimi mock)
CI->>Sandbox: Start sandbox & onboard (with plugin path)
Sandbox->>Plugin: Register provider / wrap stream
Sandbox->>Inference: Send combined exec tool call (single exec with semicolons)
Plugin->>Plugin: Parse & split into N tool calls, assign stable IDs
Plugin->>Inference: Emit N separate tool-call events (streamed deltas + final)
Inference-->>Plugin: Streamed tool results / final result
Plugin-->>Sandbox: Rewrite events/messages to reflect split calls
CI->>Sandbox: Run E2E script to validate logs, trajectories, and behavior
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
Add a managed Kimi stream wrapper that rewrites only safe combined exec diagnostics into separate exec tool calls before OpenClaw's tool loop sees them. Restore the Kimi provider-plugin wiring on the live PR branch and add replay compat for tool-result names.
Selective E2E Results — ❌ Some jobs failedRun: 25416324666
|
# Conflicts: # Dockerfile
Selective E2E Results — ✅ All requested jobs passedRun: 25416404884
|
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
test/onboard-selection.test.ts (1)
323-323: 🏗️ Heavy liftThese tests are still too coupled to menu order.
Hard-coding
"7"and"8"here means every curated-model insertion forces unrelated test edits. A label-driven selection helper would make these cases resilient to future menu churn.Also applies to: 424-424, 520-520
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/onboard-selection.test.ts` at line 323, Tests currently hard-code numeric menu choices (e.g., const answers = ["1", "7"]) which couples them to menu order; replace these with a label-driven selection helper: add a small utility (e.g., findMenuIndexByLabel or chooseByLabel) used by the tests in test/onboard-selection.test.ts to inspect the menu options array and return the 1-based index string for a given label, then build answers using that helper instead of literal numbers (apply same change where answers are hard-coded around the other occurrences noted). Locate and update the const answers declarations and any code that feeds menu prompts (e.g., the test helpers that call the prompt) to derive choices via the helper so tests pick menu entries by label rather than by fixed index.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 315-361: The new GitHub Actions job kimi-inference-compat-e2e is
missing a corresponding entry in .coderabbit.yaml; add a
reviews.path_instructions mapping that references kimi-inference-compat-e2e and
points to the relevant test file (test/e2e/test-kimi-inference-compat.sh) or the
plugin sources (openclaw-plugins/kimi-inference-compat) following the existing
E2E entry pattern so the validation in test/validate-e2e-coverage.test.ts passes
and reviewers know when to trigger this job.
In `@scripts/generate-openclaw-config.py`:
- Around line 141-145: The merge assumes inference_compat is a dict but can be
None/falsey; guard before spreading by ensuring you merge into a dict (e.g., use
an empty dict when inference_compat is None) or only perform the spread when
inference_compat is truthy so spreading KIMI_K26_MANAGED_INFERENCE_COMPAT into
inference_compat cannot raise; update the branch that references
kimi_managed_inference, inference_compat, and KIMI_K26_MANAGED_INFERENCE_COMPAT
to coalesce inference_compat to {} or check its truthiness before the merge.
In `@src/lib/onboard-providers.ts`:
- Around line 345-354: The function getSandboxInferenceConfig currently has
untyped parameters; update its signature to explicitly type the parameters and
return type as: model: string, provider: string | null = null,
preferredInferenceApi: string | null = null and return type
SandboxInferenceConfig (i.e., function getSandboxInferenceConfig(model: string,
provider: string | null = null, preferredInferenceApi: string | null = null):
SandboxInferenceConfig) so callers are statically guaranteed to pass a string
for model and the function contract is clear; leave the function body (including
logic that uses model.trim().toLowerCase() and constants like KIMI_K26_MODEL_ID
and KIMI_K26_MANAGED_INFERENCE_COMPAT) unchanged.
In `@test/e2e/test-kimi-inference-compat.sh`:
- Around line 321-345: The prepare_source_cli() function currently skips running
npm ci and npm run build:cli if "$REPO/dist/nemoclaw.js" already exists, which
can cause stale artifacts or missing runtime deps; change it to always run the
npm install/build steps unconditionally (remove the if [ ! -f
"$REPO/dist/nemoclaw.js" ] check) so the block that cds into "$REPO" and runs
npm ci --ignore-scripts && npm run build:cli always executes, preserving the
existing logging to "$BUILD_LOG" and rc handling so failures still return the
captured exit code.
---
Nitpick comments:
In `@test/onboard-selection.test.ts`:
- Line 323: Tests currently hard-code numeric menu choices (e.g., const answers
= ["1", "7"]) which couples them to menu order; replace these with a
label-driven selection helper: add a small utility (e.g., findMenuIndexByLabel
or chooseByLabel) used by the tests in test/onboard-selection.test.ts to inspect
the menu options array and return the 1-based index string for a given label,
then build answers using that helper instead of literal numbers (apply same
change where answers are hard-coded around the other occurrences noted). Locate
and update the const answers declarations and any code that feeds menu prompts
(e.g., the test helpers that call the prompt) to derive choices via the helper
so tests pick menu entries by label rather than by fixed index.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 453fecfa-cc0b-4191-bfe2-5b7b003afbe0
📒 Files selected for processing (18)
.github/workflows/nightly-e2e.yamlDockerfilenemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.jsnemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/openclaw.plugin.jsonscripts/generate-openclaw-config.pysrc/lib/inference-config.test.tssrc/lib/inference-config.tssrc/lib/model-prompts.test.tssrc/lib/onboard-inference-probes.test.tssrc/lib/onboard-inference-probes.tssrc/lib/onboard-providers.tssrc/lib/sandbox-build-context.tstest/e2e/test-kimi-inference-compat.shtest/generate-openclaw-config.test.tstest/kimi-inference-compat-plugin.test.tstest/onboard-selection.test.tstest/onboard.test.tstest/sandbox-build-context.test.ts
There was a problem hiding this comment.
🧹 Nitpick comments (1)
Dockerfile (1)
241-241: ⚡ Quick winMake file permission hardening type-safe.
At Line 241,
chmod 644 .../*is fragile if this plugin later contains subdirectories (they would lose+xand become non-traversable). Prefer file-only chmod.Proposed diff
- && chmod 644 /usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/* + && find /usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat -type f -exec chmod 644 {} +🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Dockerfile` at line 241, The current chmod 644 invocation on /usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/* will break traversal if subdirectories are added; replace that single glob chmod with a file-only and directory-specific permission change using find on the kimi-inference-compat tree: set regular files to 644 and set directories (and executable items that must be traversable) to 755. Update the Dockerfile by replacing the chmod 644 /usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/* line with a find-based approach that targets -type f for 644 and -type d for 755 so subdirectories remain traversable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@Dockerfile`:
- Line 241: The current chmod 644 invocation on
/usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/* will break
traversal if subdirectories are added; replace that single glob chmod with a
file-only and directory-specific permission change using find on the
kimi-inference-compat tree: set regular files to 644 and set directories (and
executable items that must be traversable) to 755. Update the Dockerfile by
replacing the chmod 644
/usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/* line with a
find-based approach that targets -type f for 644 and -type d for 755 so
subdirectories remain traversable.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e9cc4904-cc5a-4b0f-a501-99fb56418cae
📒 Files selected for processing (4)
.github/workflows/nightly-e2e.yamlDockerfiletest/onboard-selection.test.tstest/onboard.test.ts
Selective E2E Results — ❌ Some jobs failedRun: 25416652703
|
Selective E2E Results — ✅ All requested jobs passedRun: 25417103670
|
Selective E2E Results — ✅ All requested jobs passedRun: 25417998847
|
## Context Follow-up to issue #2620: #2620 PR #3046 fixed the immediate OpenClaw/Kimi K2.6 managed `inference.local` compatibility path: #3046 That fix left the compatibility behavior embedded directly in `scripts/generate-openclaw-config.py`. This PR moves that behavior into an agent-aware registry so future model/provider compatibility work has an explicit home and cannot accidentally apply across OpenClaw and Hermes. ## Where the architecture landed The model-specific setup architecture now has three explicit layers: 1. **Declarative registry:** `nemoclaw-blueprint/model-specific-setup/<agent>/*.json` is the source of truth for deciding when model/provider compatibility setup applies. Every manifest declares exactly one `agent`, a route match (`modelIds`, `providerKey`, `inferenceApi`, `baseUrl`), and declarative effects. 2. **Agent-owned config readers:** OpenClaw and Hermes read the same registry boundary, but each agent only accepts its own effects. OpenClaw consumes `openclawCompat` and `openclawPlugins`; Hermes validates `hermesCompat` shape and ignores OpenClaw manifests. The Hermes build-time generator is now a thin entrypoint over `agents/hermes/config/`, so future Hermes env parsing, config construction, registry handling, and serialization have explicit module homes instead of landing in one monolithic script. 3. **Agent-owned executable surfaces:** executable compatibility remains outside the registry. OpenClaw wrappers/plugins live under `nemoclaw-blueprint/openclaw-plugins/`; Hermes runtime code belongs under `agents/hermes/`. The registry decides activation, while agent-specific code performs runtime behavior. For this PR, the first registry entry is OpenClaw-only: `model-specific-setup/openclaw/kimi-k2.6-managed-inference.json`. It preserves the Kimi K2.6 managed `inference.local` behavior from PR #3046 by applying the same OpenClaw compat flags and loading the same Kimi OpenClaw plugin path. Hermes now has the architectural lane and validation path, but no Hermes Kimi behavior is added without a Hermes-specific repro and acceptance test. ## What this PR does - Adds `nemoclaw-blueprint/model-specific-setup/` as the declarative registry for model/provider compatibility setup. - Makes `agent` first-class in every manifest. v1 manifests target exactly one agent, for example `openclaw` or `hermes`. - Adds the first registry entry: `model-specific-setup/openclaw/kimi-k2.6-managed-inference.json`. This preserves the Kimi K2.6 managed `inference.local` behavior from PR #3046. - Refactors OpenClaw config generation so it consumes only matching manifests with `agent: "openclaw"` instead of hardcoding Kimi constants and predicates in the generator. - Adds Hermes registry discovery/validation only. It does not add Hermes Kimi behavior, because we do not have a Hermes-specific Kimi failure or acceptance test proving that Hermes needs the same compatibility layer. - Splits the Hermes build-time config generator into focused modules under `agents/hermes/config/`, keeping `agents/hermes/generate-config.ts` as orchestration only. - Stages `nemoclaw-blueprint/openclaw-plugins/` generically and normalizes plugin directory/file permissions with `find`, so future OpenClaw model-specific plugins do not require one-off Dockerfile edits. - Adds contributor/agent guidance in `AGENTS.md` and the registry README: OpenClaw executable wrappers go under `openclaw-plugins/`; Hermes executable wrappers go under `agents/hermes/`; manifests stay declarative. ## What this PR intentionally does not do - It does not add a shared multi-agent manifest. OpenClaw and Hermes have different config files, plugin systems, replay behavior, and E2E paths. - It does not add Hermes Kimi compatibility behavior. That should be a separate Hermes-specific change if a Hermes repro proves it is needed. - It does not change the Kimi plugin behavior from PR #3046; this is a refactor of where activation is declared and validated. ## Tests - `npm run build:cli` - `npm run typecheck:cli` - `npm run lint` - `npx vitest run test/generate-hermes-config.test.ts test/validate-config-schemas.test.ts test/generate-openclaw-config.test.ts` - `python3 -m py_compile scripts/generate-openclaw-config.py` - `bash -n test/e2e/test-kimi-inference-compat.sh && bash -n agents/hermes/start.sh && bash -n scripts/lib/sandbox-init.sh` - `git diff --check` ## E2E Gate - Runner gate passed on head `be8c398b`: `nightly-e2e` / `kimi-inference-compat-e2e`: https://github.com/NVIDIA/NemoClaw/actions/runs/25450743668 - Local attempt built and uploaded the sandbox image, then OpenShell returned `tls handshake eof` while creating/listing sandboxes. The runner gate is the merge gate for this PR. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Model-specific sandbox configuration registry with per-agent declarative manifests and discovery (OpenClaw + Hermes). * Manifest-driven plugin/compatibility effects applied at runtime. * **Improvements** * Runtime images now include full plugin directories with adjusted permissions and a generalized plugin-loading flow. * Sandbox build now stages model-specific setups; stronger typing for sandbox inference config. * E2E prepare step now always builds the CLI. * **Documentation** * Registry docs, manifest schema, and contributor guidance. * **Tests** * Expanded unit, validation, integration, and E2E tests for discovery, validation, and plugin handling. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary - Bump docs metadata to 0.0.36 and refresh generated NemoClaw user skills. - Document Model Router onboarding, validation retries, Ollama tool checks, Hermes policy behavior, and deployment verification updates. - Remove suppressed experimental command references from public docs per `docs/.docs-skip`. ## Source summary - #2202 -> `docs/get-started/quickstart.md`, `docs/inference/inference-options.md`, `docs/reference/architecture.md`: Document Model Router setup and routed inference architecture. - #3128 -> `docs/get-started/quickstart.md`, `docs/reference/commands.md`: Document deployment verification and HTTP 401 health handling. - #3104 -> `docs/inference/inference-options.md`: Document retry behavior for transient provider validation failures. - #3121 -> `docs/reference/architecture.md`: Document agent-scoped model/provider compatibility manifests. - #3046 -> `docs/reference/architecture.md`: Tie model-specific compatibility setup to known model/provider behavior. - #3097 -> `docs/inference/use-local-inference.md`: Document Ollama tool-calling capability validation. - #3082 -> `docs/reference/commands.md`: Document `NEMOCLAW_SANDBOX_NAME` as the interactive sandbox-name default. - f586cc5, 3442adf -> `docs/get-started/quickstart-hermes.md`, `docs/reference/network-policies.md`: Document Hermes agent-specific baseline policy endpoints. ## Test plan - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user` - `make docs` - `npm run build:cli` - `rg` skip-term scan for `docs/` and generated user skills Made with [Cursor](https://cursor.com) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Model Router provider for complexity-based routed inference. * Ollama/local inference onboarding now validates tool-calling capability. * Added `local-inference` network policy preset. * **Documentation** * New integration policy examples (Outlook, Telegram, Slack, Discord, GitHub, Jira, etc.). * Clarified config immutability workflow and sandbox writable paths. * Hermes baseline network policy documented. * **Improvements** * Health checks treat device-auth responses as live; transient validation retries. * Installer performs pre-install reachability checks; CLI onboarding gained a --fresh option. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Stabilizes the nightly E2E follow-up failures by keeping the Kimi scenario on the public NVIDIA Kimi endpoint, recognizing the generated routed Kimi model ref, and repairing narrow OpenClaw scope-approval failures that report nonzero after local state changes. It also aligns the crash-loop recovery assertion with the current gateway guard markers and keeps the test-size budget ratcheted after moving coverage into a focused policy test. ## Related Issue Related to #2478, #4462, #2620, #3046. ## Changes - Keep the Kimi E2E on public NVIDIA Endpoints via `nvidia-prod` with `moonshotai/kimi-k2.6`; retain the local mock only behind `NEMOCLAW_KIMI_USE_MOCK=1` and sanitize Kimi failure logs. - Recognize the generated `inference/moonshotai/kimi-k2.6` model ref in the Kimi compatibility plugin. - Add constrained OpenClaw approval recovery for failed allowlisted scope upgrades that leave original or replacement pending state, without granting `operator.admin`. - Wire the recovery helper into sandbox auto-pair approval and the startup guard wrapper. - Align the crash-loop E2E guard assertion with current gateway safety marker behavior. - Add/update targeted tests and ratchet the `test/nemoclaw-start.test.ts` size budget downward. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] Git hooks passed during commit and push, or `npx prek run --from-ref main --to-ref HEAD` passes - [x] Targeted tests pass for changed behavior - [ ] Full `npm test` passes (broad runtime changes only) - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `npm run docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) Docs review found no user-facing docs changes were warranted; `docs/` and generated `.agents/skills/` remained clean. --- Signed-off-by: Carlos Villela <cvillela@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added recovery support for failed OpenClaw device approval flows, including gateway-connect compatibility scenarios. * Improved Kimi inference compatibility handling for managed Kimi model references and stream/tool rewrite alignment. * **Bug Fixes** * Auto-pair approval now conditionally retries via the recovery policy on specific approval failures, updating pending/paired scope state. * **Security & CI** * Nightly Kimi inference E2E now supports live-vs-mock execution with public NVIDIA key validation and redacts NVIDIA keys in sanitized logs. * **Tests / Chores** * Expanded E2E and policy recovery tests; updated gateway-guard assertions and adjusted a test file size budget. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Summary
Refs #2620 / NVBug 6122540.
This PR fixes the Kimi K2.6/OpenClaw harness path for a specific tool-calling failure through NemoClaw's managed
inference.localprovider. Kimi can emit the simple diagnostic requesthostname; date; uptimeas one combinedexectool call. In this harness, that shape is not equivalent to three tool calls: OpenClaw records, replays, and correlates tool results at the tool-call boundary, and Kimi's chat-completions replay also expects named tool results. Leaving the model output combined makes the session fragile and can look like context loss or an incomplete multi-step run even though the commands are individually safe and expected.The fix is intentionally narrow. It adds a NemoClaw-managed OpenClaw provider plugin for
inference/moonshotai/kimi-k2.6onhttps://inference.local/v1usingopenai-completions. That plugin rewrites only the exact safe combinedexecdiagnostic shape into three separate source tool calls, preserving this order:hostnamedateuptimeWhy this belongs in NemoClaw
NemoClaw owns the managed
inference.localOpenClaw configuration and the sandbox build context that stages OpenClaw provider plugins. This is therefore the right compatibility boundary: we can normalize the one provider/model-specific tool-call shape before OpenClaw persists and replays the turn, without changing OpenClaw globally and without teaching NemoClaw to parse arbitrary shell.This also keeps the behavior auditable. The splitter is an allowlist for three exact commands, not a shell parser. It refuses to rewrite arbitrary shell syntax, command arguments, pipes, redirects, variables, substitutions, unknown commands, non-
exectools, non-Kimi providers, multiple existing tool calls, and malformed arguments.Changes
moonshotai/kimi-k2.6/Kimi K2.6to the curated NVIDIA Endpoints cloud model menu.requiresToolResultName: true.nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/.kimi-inference-compat-e2e, that uses a hermetic OpenAI-compatible mock endpoint and validates the full OpenClaw trajectory for the Kimihostname; date; uptimescenario.Validation
Latest pushed head:
08e14dd6d48c822e12debf127157776ab82b2347.Focused local validation passed:
GitHub validation on the latest head:
checks: passed — https://github.com/NVIDIA/NemoClaw/actions/runs/25417876337/job/74553037898kimi-inference-compat-e2e: passed on the latest head — https://github.com/NVIDIA/NemoClaw/actions/runs/25417998847The nightly Kimi E2E validates the acceptance criteria end-to-end in a fresh sandbox with a hermetic Kimi-compatible endpoint:
trace.artifacts.toolMetastoolName: "exec"hostname,date,uptimehostname,date,uptimeorderpromptErrorProposed follow-up: model-specific setup registry
This PR keeps the production fix narrow, but it also exposes a pattern we should make more deliberate before we accumulate more one-off model interventions. Proposed next step: introduce a small, manifest-driven
model-specific setupregistry for sandbox/OpenClaw compatibility behavior.Suggested shape:
nemoclaw-blueprint/model-specific-setup/, with one manifest per targeted intervention such askimi-k2.6-managed-inference.json.matchfields formodelIds,providerKey,inferenceApi, andbaseUrl;openclawCompatfields such asrequiresToolResultName; andplugins.loadentries for staged plugin paths.openclaw-plugins/kimi-inference-compat/; the manifest only says when to load it.generate-openclaw-config.pyload the registry and apply matching setup records, replacing hardcoded model constants and per-model predicate functions.COPYand chmod stanza for every future model.kimi-k2.6-managed-inference-exec-splitcase inside the model-specific setup suite.That follow-up would keep model-specific behavior explicit and reviewable while avoiding a growing set of ad hoc checks spread across config generation, Dockerfile staging, and E2E scripts.
Summary by CodeRabbit
New Features
Tests
Chores