Skip to content

fix: support reasoning models in the OpenClaw harness#3046

Merged
ericksoa merged 8 commits into
mainfrom
issue-2620-kimi-k2-6
May 6, 2026
Merged

fix: support reasoning models in the OpenClaw harness#3046
ericksoa merged 8 commits into
mainfrom
issue-2620-kimi-k2-6

Conversation

@ericksoa

@ericksoa ericksoa commented May 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Refs #2620 / NVBug 6122540.

This PR fixes the Kimi K2.6/OpenClaw harness path for a specific tool-calling failure through NemoClaw's managed inference.local provider. Kimi can emit the simple diagnostic request hostname; date; uptime as one combined exec tool call. In this harness, that shape is not equivalent to three tool calls: OpenClaw records, replays, and correlates tool results at the tool-call boundary, and Kimi's chat-completions replay also expects named tool results. Leaving the model output combined makes the session fragile and can look like context loss or an incomplete multi-step run even though the commands are individually safe and expected.

The fix is intentionally narrow. It adds a NemoClaw-managed OpenClaw provider plugin for inference/moonshotai/kimi-k2.6 on https://inference.local/v1 using openai-completions. That plugin rewrites only the exact safe combined exec diagnostic shape into three separate source tool calls, preserving this order:

  1. hostname
  2. date
  3. uptime

Why this belongs in NemoClaw

NemoClaw owns the managed inference.local OpenClaw configuration and the sandbox build context that stages OpenClaw provider plugins. This is therefore the right compatibility boundary: we can normalize the one provider/model-specific tool-call shape before OpenClaw persists and replays the turn, without changing OpenClaw globally and without teaching NemoClaw to parse arbitrary shell.

This also keeps the behavior auditable. The splitter is an allowlist for three exact commands, not a shell parser. It refuses to rewrite arbitrary shell syntax, command arguments, pipes, redirects, variables, substitutions, unknown commands, non-exec tools, non-Kimi providers, multiple existing tool calls, and malformed arguments.

Changes

  • Adds moonshotai/kimi-k2.6 / Kimi K2.6 to the curated NVIDIA Endpoints cloud model menu.
  • Adds bounded Kimi K2.6 onboarding probe handling.
  • Adds Kimi replay compat in generated OpenClaw config with requiresToolResultName: true.
  • Adds and stages the managed OpenClaw plugin at nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/.
  • Restores sandbox build-context staging for the plugin path so fresh sandboxes receive the runtime wrapper.
  • Adds focused unit/config/build-context coverage for the Kimi compatibility wrapper.
  • Adds a nightly E2E job, kimi-inference-compat-e2e, that uses a hermetic OpenAI-compatible mock endpoint and validates the full OpenClaw trajectory for the Kimi hostname; date; uptime scenario.

Validation

Latest pushed head: 08e14dd6d48c822e12debf127157776ab82b2347.

Focused local validation passed:

npm run build:cli
npx vitest run test/kimi-inference-compat-plugin.test.ts test/generate-openclaw-config.test.ts test/onboard.test.ts test/sandbox-build-context.test.ts
npx vitest run test/sandbox-provisioning.test.ts test/kimi-inference-compat-plugin.test.ts test/generate-openclaw-config.test.ts test/sandbox-build-context.test.ts
npx prek run --all-files --stage pre-push

GitHub validation on the latest head:

The nightly Kimi E2E validates the acceptance criteria end-to-end in a fresh sandbox with a hermetic Kimi-compatible endpoint:

  • exactly three trace.artifacts.toolMetas
  • all three tool metas use toolName: "exec"
  • command set is exactly hostname, date, uptime
  • source assistant tool calls are split in hostname, date, uptime order
  • no promptError
  • no abandoned context / "want me to continue"
  • final assistant response occurs after the tool results

Proposed follow-up: model-specific setup registry

This PR keeps the production fix narrow, but it also exposes a pattern we should make more deliberate before we accumulate more one-off model interventions. Proposed next step: introduce a small, manifest-driven model-specific setup registry for sandbox/OpenClaw compatibility behavior.

Suggested shape:

  1. Add a registry directory, for example nemoclaw-blueprint/model-specific-setup/, with one manifest per targeted intervention such as kimi-k2.6-managed-inference.json.
  2. Keep manifests declarative: match fields for modelIds, providerKey, inferenceApi, and baseUrl; openclawCompat fields such as requiresToolResultName; and plugins.load entries for staged plugin paths.
  3. Keep executable behavior in OpenClaw plugins, not in manifests. The Kimi splitter stays in openclaw-plugins/kimi-inference-compat/; the manifest only says when to load it.
  4. Make generate-openclaw-config.py load the registry and apply matching setup records, replacing hardcoded model constants and per-model predicate functions.
  5. Stage model-specific plugin assets generically in the sandbox image, so the Dockerfile does not need a new COPY and chmod stanza for every future model.
  6. Add a registry validator test that enforces the boundary: exact match predicates, known compat keys only, no shell/code in manifests, and every staged plugin path must exist.
  7. Organize future nightly coverage by intervention, not just provider category. For example, keep this scenario as a kimi-k2.6-managed-inference-exec-split case inside the model-specific setup suite.

That follow-up would keep model-specific behavior explicit and reviewable while avoiding a growing set of ad hoc checks spread across config generation, Dockerfile staging, and E2E scripts.

Summary by CodeRabbit

  • New Features

    • Added Kimi K2.6 model option and a compatibility plugin to normalize Kimi inference behavior; integrated Kimi support into config generation and sandbox builds.
  • Tests

    • Added a hermetic end-to-end Kimi compatibility test and comprehensive unit/integration suites covering onboarding, plugin behavior, and sandbox provisioning.
  • Chores

    • Added a nightly E2E job to CI and updated workflow reporting; adjusted runtime image permissions to include the new plugin.

@ericksoa ericksoa added bug Something fails against expected or documented behavior platform: macos Affects macOS, including Apple Silicon priority: medium NV QA Bugs found by the NVIDIA QA Team UAT Issues flagged for User Acceptance Testing. provider: nvidia NVIDIA inference endpoint, NIM, or NVIDIA provider behavior labels May 5, 2026
@ericksoa ericksoa self-assigned this May 5, 2026
@copy-pr-bot

copy-pr-bot Bot commented May 5, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Adds support for MoonshotAI Kimi K2.6 managed inference: a new OpenClaw plugin, manifest, packaging changes, config detection/wiring, probe and onboarding adjustments, unit and E2E tests, an E2E script, and a nightly CI job for Kimi inference compatibility. No public API removals.

Changes

Kimi K2.6 Inference Compatibility Integration

Layer / File(s) Summary
Data Shape / Constants
scripts/generate-openclaw-config.py, src/lib/onboard-providers.ts
Introduce KIMI_K26_MODEL_ID, KIMI_K26_MANAGED_INFERENCE_COMPAT, plugin id/path constants, and _is_kimi_k26_managed_inference(...) to detect Kimi K2.6 managed inference.
Config Generation
scripts/generate-openclaw-config.py
Conditionally merge KIMI compat into inference_compat and enable plugin load paths when detection is true; switch plugins wiring to dynamic structure.
Provider Runtime Wiring
src/lib/onboard-providers.ts, src/lib/sandbox-build-context.ts
Merge compat into getSandboxInferenceConfig when providerKey/inferenceApi/model match; copy openclaw-plugins into staged blueprint during optimized build.
Plugin Implementation
nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js
New plugin that detects managed Kimi contexts, parses/validates combined exec tool calls, splits safe semicolon-delimited exec commands into separate tool calls with stable IDs, rewrites messages/events, provides stream wrapper, and exposes __testing helpers.
Plugin Manifest
nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/openclaw.plugin.json
New manifest declaring id, name, version, description, providers=["inference"], and empty object configSchema.
Runtime Packaging / Permissions
Dockerfile
Copy plugin into image path and expand chmod steps to set permissions for plugin directory and files.
Model Lists & Prompts
src/lib/inference-config.ts, src/lib/inference-config.test.ts, src/lib/model-prompts.test.ts
Add moonshotai/kimi-k2.6 to CLOUD_MODEL_OPTIONS and adjust affected test prompt index.
Validation Probes
src/lib/onboard-inference-probes.ts, src/lib/onboard-inference-probes.test.ts
Add getKimiK26ValidationProbeCurlArgs, isKimiK26Model, limit probe payload max_tokens to 8 for Kimi K2.6, and export the probe helper.
Config Generation Tests
test/generate-openclaw-config.test.ts
Add test asserting managed-inference compat merged for moonshotai/kimi-k2.6, plugin entry enabled, and plugin load path set.
Plugin Unit Tests
test/kimi-inference-compat-plugin.test.ts
Comprehensive tests for safe exec splitting, trimming, rejection of unsafe commands, non-Kimi non-wrapping, and stream rewrite behavior for partial/final deltas.
Sandbox Build Context Test
test/sandbox-build-context.test.ts
Assert plugin manifest exists in optimized build context.
Onboard & Selection Tests
test/onboard.test.ts, test/onboard-selection.test.ts
Add test verifying routed inference compat merge for Kimi K2.6 and large expansion of provider selection/onboard tests.
E2E Test Script
test/e2e/test-kimi-inference-compat.sh
New hermetic Bash E2E that launches a Python-based Kimi mock, provisions a sandbox, validates OpenClaw wiring, runs agent prompts, inspects trajectories, verifies mock traffic, and uploads logs on failure.
CI Workflow
.github/workflows/nightly-e2e.yaml
Add nightly job kimi-inference-compat-e2e, include it in workflow_dispatch inputs, notify-on-failure, report-to-pr, and nightly scorecard aggregation; job runs checkout, test script, and upload-on-failure steps.
Sandbox Provisioning Tests / Permissions
test/sandbox-provisioning.test.ts
Extend test fixtures and assertions to include plugin directory/file existence and expected modes (dir 755, file 644).

Sequence Diagram(s)

sequenceDiagram
participant CI as CI/Developer
participant Sandbox as Nemo Sandbox
participant Plugin as Kimi Compat Plugin
participant Inference as inference.local (Kimi mock)

CI->>Sandbox: Start sandbox & onboard (with plugin path)
Sandbox->>Plugin: Register provider / wrap stream
Sandbox->>Inference: Send combined exec tool call (single exec with semicolons)
Plugin->>Plugin: Parse & split into N tool calls, assign stable IDs
Plugin->>Inference: Emit N separate tool-call events (streamed deltas + final)
Inference-->>Plugin: Streamed tool results / final result
Plugin-->>Sandbox: Rewrite events/messages to reflect split calls
CI->>Sandbox: Run E2E script to validate logs, trajectories, and behavior
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

priority: high

Suggested reviewers

  • jyaunches
  • cv

Poem

🐰 I hop through streams and split a long line,

Semicolons tamed into tidy design.
Plugins and mocks chatter, tests hum along,
Sandboxes dance to the compatibility song.
Hooray — the rabbit's code-hopping song! 🥕

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.62% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'fix: support reasoning models in the OpenClaw harness' is partially related to the changeset. While it refers to a real aspect of the change (supporting Kimi K2.6 in OpenClaw), it uses the vague term 'reasoning models' and does not capture the main technical issue being fixed: splitting combined exec tool calls into individual commands for Kimi compatibility. Consider a more specific title that captures the core fix, such as 'fix: add Kimi K2.6 OpenClaw plugin for tool call splitting' or 'fix: support Kimi K2.6 managed inference in OpenClaw harness with exec call splitting'.
✅ Passed checks (3 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch issue-2620-kimi-k2-6

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

@ericksoa ericksoa changed the title fix(onboard): add Kimi K2.6 cloud model Fix reasoning model compatibility in the OpenClaw harness May 5, 2026
@ericksoa ericksoa changed the title Fix reasoning model compatibility in the OpenClaw harness fix: support reasoning models in the OpenClaw harness May 5, 2026
ericksoa added 2 commits May 5, 2026 20:40
Add a managed Kimi stream wrapper that rewrites only safe combined exec diagnostics into separate exec tool calls before OpenClaw's tool loop sees them.

Restore the Kimi provider-plugin wiring on the live PR branch and add replay compat for tool-result names.
@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25416324666
Branch: issue-2620-kimi-k2-6
Requested jobs: kimi-inference-compat-e2e
Summary: 0 passed, 1 failed, 22 skipped

Job Result
cloud-e2e ⏭️ skipped
cloud-inference-e2e ⏭️ skipped
cloud-onboard-e2e ⏭️ skipped
deployment-services-e2e ⏭️ skipped
diagnostics-e2e ⏭️ skipped
docs-validation-e2e ⏭️ skipped
gpu-e2e ⏭️ skipped
hermes-e2e ⏭️ skipped
inference-routing-e2e ⏭️ skipped
kimi-inference-compat-e2e ❌ failure
messaging-compatible-endpoint-e2e ⏭️ skipped
messaging-providers-e2e ⏭️ skipped
network-policy-e2e ⏭️ skipped
overlayfs-autofix-e2e ⏭️ skipped
rebuild-hermes-e2e ⏭️ skipped
rebuild-openclaw-e2e ⏭️ skipped
sandbox-operations-e2e ⏭️ skipped
sandbox-survival-e2e ⏭️ skipped
shields-config-e2e ⏭️ skipped
skill-agent-e2e ⏭️ skipped
snapshot-commands-e2e ⏭️ skipped
token-rotation-e2e ⏭️ skipped
upgrade-stale-sandbox-e2e ⏭️ skipped

Failed jobs: kimi-inference-compat-e2e. Check run artifacts for logs.

@ericksoa ericksoa marked this pull request as ready for review May 6, 2026 04:31
@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25416404884
Branch: issue-2620-kimi-k2-6
Requested jobs: kimi-inference-compat-e2e
Summary: 0 passed, 0 failed, 22 skipped

Job Result
cloud-e2e ⏭️ skipped
cloud-inference-e2e ⏭️ skipped
cloud-onboard-e2e ⏭️ skipped
deployment-services-e2e ⏭️ skipped
diagnostics-e2e ⏭️ skipped
docs-validation-e2e ⏭️ skipped
gpu-e2e ⏭️ skipped
hermes-e2e ⏭️ skipped
inference-routing-e2e ⏭️ skipped
kimi-inference-compat-e2e ⚠️ cancelled
messaging-compatible-endpoint-e2e ⏭️ skipped
messaging-providers-e2e ⏭️ skipped
network-policy-e2e ⏭️ skipped
overlayfs-autofix-e2e ⏭️ skipped
rebuild-hermes-e2e ⏭️ skipped
rebuild-openclaw-e2e ⏭️ skipped
sandbox-operations-e2e ⏭️ skipped
sandbox-survival-e2e ⏭️ skipped
shields-config-e2e ⏭️ skipped
skill-agent-e2e ⏭️ skipped
snapshot-commands-e2e ⏭️ skipped
token-rotation-e2e ⏭️ skipped
upgrade-stale-sandbox-e2e ⏭️ skipped

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
test/onboard-selection.test.ts (1)

323-323: 🏗️ Heavy lift

These tests are still too coupled to menu order.

Hard-coding "7" and "8" here means every curated-model insertion forces unrelated test edits. A label-driven selection helper would make these cases resilient to future menu churn.

Also applies to: 424-424, 520-520

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/onboard-selection.test.ts` at line 323, Tests currently hard-code
numeric menu choices (e.g., const answers = ["1", "7"]) which couples them to
menu order; replace these with a label-driven selection helper: add a small
utility (e.g., findMenuIndexByLabel or chooseByLabel) used by the tests in
test/onboard-selection.test.ts to inspect the menu options array and return the
1-based index string for a given label, then build answers using that helper
instead of literal numbers (apply same change where answers are hard-coded
around the other occurrences noted). Locate and update the const answers
declarations and any code that feeds menu prompts (e.g., the test helpers that
call the prompt) to derive choices via the helper so tests pick menu entries by
label rather than by fixed index.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 315-361: The new GitHub Actions job kimi-inference-compat-e2e is
missing a corresponding entry in .coderabbit.yaml; add a
reviews.path_instructions mapping that references kimi-inference-compat-e2e and
points to the relevant test file (test/e2e/test-kimi-inference-compat.sh) or the
plugin sources (openclaw-plugins/kimi-inference-compat) following the existing
E2E entry pattern so the validation in test/validate-e2e-coverage.test.ts passes
and reviewers know when to trigger this job.

In `@scripts/generate-openclaw-config.py`:
- Around line 141-145: The merge assumes inference_compat is a dict but can be
None/falsey; guard before spreading by ensuring you merge into a dict (e.g., use
an empty dict when inference_compat is None) or only perform the spread when
inference_compat is truthy so spreading KIMI_K26_MANAGED_INFERENCE_COMPAT into
inference_compat cannot raise; update the branch that references
kimi_managed_inference, inference_compat, and KIMI_K26_MANAGED_INFERENCE_COMPAT
to coalesce inference_compat to {} or check its truthiness before the merge.

In `@src/lib/onboard-providers.ts`:
- Around line 345-354: The function getSandboxInferenceConfig currently has
untyped parameters; update its signature to explicitly type the parameters and
return type as: model: string, provider: string | null = null,
preferredInferenceApi: string | null = null and return type
SandboxInferenceConfig (i.e., function getSandboxInferenceConfig(model: string,
provider: string | null = null, preferredInferenceApi: string | null = null):
SandboxInferenceConfig) so callers are statically guaranteed to pass a string
for model and the function contract is clear; leave the function body (including
logic that uses model.trim().toLowerCase() and constants like KIMI_K26_MODEL_ID
and KIMI_K26_MANAGED_INFERENCE_COMPAT) unchanged.

In `@test/e2e/test-kimi-inference-compat.sh`:
- Around line 321-345: The prepare_source_cli() function currently skips running
npm ci and npm run build:cli if "$REPO/dist/nemoclaw.js" already exists, which
can cause stale artifacts or missing runtime deps; change it to always run the
npm install/build steps unconditionally (remove the if [ ! -f
"$REPO/dist/nemoclaw.js" ] check) so the block that cds into "$REPO" and runs
npm ci --ignore-scripts && npm run build:cli always executes, preserving the
existing logging to "$BUILD_LOG" and rc handling so failures still return the
captured exit code.

---

Nitpick comments:
In `@test/onboard-selection.test.ts`:
- Line 323: Tests currently hard-code numeric menu choices (e.g., const answers
= ["1", "7"]) which couples them to menu order; replace these with a
label-driven selection helper: add a small utility (e.g., findMenuIndexByLabel
or chooseByLabel) used by the tests in test/onboard-selection.test.ts to inspect
the menu options array and return the 1-based index string for a given label,
then build answers using that helper instead of literal numbers (apply same
change where answers are hard-coded around the other occurrences noted). Locate
and update the const answers declarations and any code that feeds menu prompts
(e.g., the test helpers that call the prompt) to derive choices via the helper
so tests pick menu entries by label rather than by fixed index.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 453fecfa-cc0b-4191-bfe2-5b7b003afbe0

📥 Commits

Reviewing files that changed from the base of the PR and between 400869e and eda586c.

📒 Files selected for processing (18)
  • .github/workflows/nightly-e2e.yaml
  • Dockerfile
  • nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js
  • nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/openclaw.plugin.json
  • scripts/generate-openclaw-config.py
  • src/lib/inference-config.test.ts
  • src/lib/inference-config.ts
  • src/lib/model-prompts.test.ts
  • src/lib/onboard-inference-probes.test.ts
  • src/lib/onboard-inference-probes.ts
  • src/lib/onboard-providers.ts
  • src/lib/sandbox-build-context.ts
  • test/e2e/test-kimi-inference-compat.sh
  • test/generate-openclaw-config.test.ts
  • test/kimi-inference-compat-plugin.test.ts
  • test/onboard-selection.test.ts
  • test/onboard.test.ts
  • test/sandbox-build-context.test.ts

Comment thread .github/workflows/nightly-e2e.yaml
Comment thread scripts/generate-openclaw-config.py
Comment thread src/lib/onboard-providers.ts
Comment thread test/e2e/test-kimi-inference-compat.sh

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
Dockerfile (1)

241-241: ⚡ Quick win

Make file permission hardening type-safe.

At Line 241, chmod 644 .../* is fragile if this plugin later contains subdirectories (they would lose +x and become non-traversable). Prefer file-only chmod.

Proposed diff
-    && chmod 644 /usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/*
+    && find /usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat -type f -exec chmod 644 {} +
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile` at line 241, The current chmod 644 invocation on
/usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/* will break
traversal if subdirectories are added; replace that single glob chmod with a
file-only and directory-specific permission change using find on the
kimi-inference-compat tree: set regular files to 644 and set directories (and
executable items that must be traversable) to 755. Update the Dockerfile by
replacing the chmod 644
/usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/* line with a
find-based approach that targets -type f for 644 and -type d for 755 so
subdirectories remain traversable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@Dockerfile`:
- Line 241: The current chmod 644 invocation on
/usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/* will break
traversal if subdirectories are added; replace that single glob chmod with a
file-only and directory-specific permission change using find on the
kimi-inference-compat tree: set regular files to 644 and set directories (and
executable items that must be traversable) to 755. Update the Dockerfile by
replacing the chmod 644
/usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/* line with a
find-based approach that targets -type f for 644 and -type d for 755 so
subdirectories remain traversable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e9cc4904-cc5a-4b0f-a501-99fb56418cae

📥 Commits

Reviewing files that changed from the base of the PR and between eda586c and 5976c8b.

📒 Files selected for processing (4)
  • .github/workflows/nightly-e2e.yaml
  • Dockerfile
  • test/onboard-selection.test.ts
  • test/onboard.test.ts

@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25416652703
Branch: issue-2620-kimi-k2-6
Requested jobs: kimi-inference-compat-e2e
Summary: 0 passed, 1 failed, 24 skipped

Job Result
cloud-e2e ⏭️ skipped
cloud-inference-e2e ⏭️ skipped
cloud-onboard-e2e ⏭️ skipped
deployment-services-e2e ⏭️ skipped
diagnostics-e2e ⏭️ skipped
docs-validation-e2e ⏭️ skipped
gpu-e2e ⏭️ skipped
hermes-discord-e2e ⏭️ skipped
hermes-e2e ⏭️ skipped
inference-routing-e2e ⏭️ skipped
kimi-inference-compat-e2e ❌ failure
messaging-compatible-endpoint-e2e ⏭️ skipped
messaging-providers-e2e ⏭️ skipped
network-policy-e2e ⏭️ skipped
overlayfs-autofix-e2e ⏭️ skipped
rebuild-hermes-e2e ⏭️ skipped
rebuild-hermes-stale-base-e2e ⏭️ skipped
rebuild-openclaw-e2e ⏭️ skipped
sandbox-operations-e2e ⏭️ skipped
sandbox-survival-e2e ⏭️ skipped
shields-config-e2e ⏭️ skipped
skill-agent-e2e ⏭️ skipped
snapshot-commands-e2e ⏭️ skipped
token-rotation-e2e ⏭️ skipped
upgrade-stale-sandbox-e2e ⏭️ skipped

Failed jobs: kimi-inference-compat-e2e. Check run artifacts for logs.

@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25417103670
Branch: issue-2620-kimi-k2-6
Requested jobs: kimi-inference-compat-e2e
Summary: 1 passed, 0 failed, 24 skipped

Job Result
cloud-e2e ⏭️ skipped
cloud-inference-e2e ⏭️ skipped
cloud-onboard-e2e ⏭️ skipped
deployment-services-e2e ⏭️ skipped
diagnostics-e2e ⏭️ skipped
docs-validation-e2e ⏭️ skipped
gpu-e2e ⏭️ skipped
hermes-discord-e2e ⏭️ skipped
hermes-e2e ⏭️ skipped
inference-routing-e2e ⏭️ skipped
kimi-inference-compat-e2e ✅ success
messaging-compatible-endpoint-e2e ⏭️ skipped
messaging-providers-e2e ⏭️ skipped
network-policy-e2e ⏭️ skipped
overlayfs-autofix-e2e ⏭️ skipped
rebuild-hermes-e2e ⏭️ skipped
rebuild-hermes-stale-base-e2e ⏭️ skipped
rebuild-openclaw-e2e ⏭️ skipped
sandbox-operations-e2e ⏭️ skipped
sandbox-survival-e2e ⏭️ skipped
shields-config-e2e ⏭️ skipped
skill-agent-e2e ⏭️ skipped
snapshot-commands-e2e ⏭️ skipped
token-rotation-e2e ⏭️ skipped
upgrade-stale-sandbox-e2e ⏭️ skipped

@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25417998847
Branch: issue-2620-kimi-k2-6
Requested jobs: kimi-inference-compat-e2e
Summary: 1 passed, 0 failed, 24 skipped

Job Result
cloud-e2e ⏭️ skipped
cloud-inference-e2e ⏭️ skipped
cloud-onboard-e2e ⏭️ skipped
deployment-services-e2e ⏭️ skipped
diagnostics-e2e ⏭️ skipped
docs-validation-e2e ⏭️ skipped
gpu-e2e ⏭️ skipped
hermes-discord-e2e ⏭️ skipped
hermes-e2e ⏭️ skipped
inference-routing-e2e ⏭️ skipped
kimi-inference-compat-e2e ✅ success
messaging-compatible-endpoint-e2e ⏭️ skipped
messaging-providers-e2e ⏭️ skipped
network-policy-e2e ⏭️ skipped
overlayfs-autofix-e2e ⏭️ skipped
rebuild-hermes-e2e ⏭️ skipped
rebuild-hermes-stale-base-e2e ⏭️ skipped
rebuild-openclaw-e2e ⏭️ skipped
sandbox-operations-e2e ⏭️ skipped
sandbox-survival-e2e ⏭️ skipped
shields-config-e2e ⏭️ skipped
skill-agent-e2e ⏭️ skipped
snapshot-commands-e2e ⏭️ skipped
token-rotation-e2e ⏭️ skipped
upgrade-stale-sandbox-e2e ⏭️ skipped

@ericksoa ericksoa merged commit f5b8144 into main May 6, 2026
60 checks passed
@ericksoa ericksoa deleted the issue-2620-kimi-k2-6 branch May 6, 2026 12:34
ericksoa added a commit that referenced this pull request May 6, 2026
## Context

Follow-up to issue #2620: #2620

PR #3046 fixed the immediate OpenClaw/Kimi K2.6 managed
`inference.local` compatibility path:
#3046

That fix left the compatibility behavior embedded directly in
`scripts/generate-openclaw-config.py`. This PR moves that behavior into
an agent-aware registry so future model/provider compatibility work has
an explicit home and cannot accidentally apply across OpenClaw and
Hermes.

## Where the architecture landed

The model-specific setup architecture now has three explicit layers:

1. **Declarative registry:**
`nemoclaw-blueprint/model-specific-setup/<agent>/*.json` is the source
of truth for deciding when model/provider compatibility setup applies.
Every manifest declares exactly one `agent`, a route match (`modelIds`,
`providerKey`, `inferenceApi`, `baseUrl`), and declarative effects.
2. **Agent-owned config readers:** OpenClaw and Hermes read the same
registry boundary, but each agent only accepts its own effects. OpenClaw
consumes `openclawCompat` and `openclawPlugins`; Hermes validates
`hermesCompat` shape and ignores OpenClaw manifests. The Hermes
build-time generator is now a thin entrypoint over
`agents/hermes/config/`, so future Hermes env parsing, config
construction, registry handling, and serialization have explicit module
homes instead of landing in one monolithic script.
3. **Agent-owned executable surfaces:** executable compatibility remains
outside the registry. OpenClaw wrappers/plugins live under
`nemoclaw-blueprint/openclaw-plugins/`; Hermes runtime code belongs
under `agents/hermes/`. The registry decides activation, while
agent-specific code performs runtime behavior.

For this PR, the first registry entry is OpenClaw-only:
`model-specific-setup/openclaw/kimi-k2.6-managed-inference.json`. It
preserves the Kimi K2.6 managed `inference.local` behavior from PR #3046
by applying the same OpenClaw compat flags and loading the same Kimi
OpenClaw plugin path. Hermes now has the architectural lane and
validation path, but no Hermes Kimi behavior is added without a
Hermes-specific repro and acceptance test.

## What this PR does

- Adds `nemoclaw-blueprint/model-specific-setup/` as the declarative
registry for model/provider compatibility setup.
- Makes `agent` first-class in every manifest. v1 manifests target
exactly one agent, for example `openclaw` or `hermes`.
- Adds the first registry entry:
`model-specific-setup/openclaw/kimi-k2.6-managed-inference.json`. This
preserves the Kimi K2.6 managed `inference.local` behavior from PR
#3046.
- Refactors OpenClaw config generation so it consumes only matching
manifests with `agent: "openclaw"` instead of hardcoding Kimi constants
and predicates in the generator.
- Adds Hermes registry discovery/validation only. It does not add Hermes
Kimi behavior, because we do not have a Hermes-specific Kimi failure or
acceptance test proving that Hermes needs the same compatibility layer.
- Splits the Hermes build-time config generator into focused modules
under `agents/hermes/config/`, keeping
`agents/hermes/generate-config.ts` as orchestration only.
- Stages `nemoclaw-blueprint/openclaw-plugins/` generically and
normalizes plugin directory/file permissions with `find`, so future
OpenClaw model-specific plugins do not require one-off Dockerfile edits.
- Adds contributor/agent guidance in `AGENTS.md` and the registry
README: OpenClaw executable wrappers go under `openclaw-plugins/`;
Hermes executable wrappers go under `agents/hermes/`; manifests stay
declarative.

## What this PR intentionally does not do

- It does not add a shared multi-agent manifest. OpenClaw and Hermes
have different config files, plugin systems, replay behavior, and E2E
paths.
- It does not add Hermes Kimi compatibility behavior. That should be a
separate Hermes-specific change if a Hermes repro proves it is needed.
- It does not change the Kimi plugin behavior from PR #3046; this is a
refactor of where activation is declared and validated.

## Tests

- `npm run build:cli`
- `npm run typecheck:cli`
- `npm run lint`
- `npx vitest run test/generate-hermes-config.test.ts
test/validate-config-schemas.test.ts
test/generate-openclaw-config.test.ts`
- `python3 -m py_compile scripts/generate-openclaw-config.py`
- `bash -n test/e2e/test-kimi-inference-compat.sh && bash -n
agents/hermes/start.sh && bash -n scripts/lib/sandbox-init.sh`
- `git diff --check`

## E2E Gate

- Runner gate passed on head `be8c398b`: `nightly-e2e` /
`kimi-inference-compat-e2e`:
https://github.com/NVIDIA/NemoClaw/actions/runs/25450743668
- Local attempt built and uploaded the sandbox image, then OpenShell
returned `tls handshake eof` while creating/listing sandboxes. The
runner gate is the merge gate for this PR.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Model-specific sandbox configuration registry with per-agent
declarative manifests and discovery (OpenClaw + Hermes).
  * Manifest-driven plugin/compatibility effects applied at runtime.

* **Improvements**
* Runtime images now include full plugin directories with adjusted
permissions and a generalized plugin-loading flow.
* Sandbox build now stages model-specific setups; stronger typing for
sandbox inference config.
  * E2E prepare step now always builds the CLI.

* **Documentation**
  * Registry docs, manifest schema, and contributor guidance.

* **Tests**
* Expanded unit, validation, integration, and E2E tests for discovery,
validation, and plugin handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
miyoungc added a commit that referenced this pull request May 7, 2026
## Summary
- Bump docs metadata to 0.0.36 and refresh generated NemoClaw user
skills.
- Document Model Router onboarding, validation retries, Ollama tool
checks, Hermes policy behavior, and deployment verification updates.
- Remove suppressed experimental command references from public docs per
`docs/.docs-skip`.

## Source summary
- #2202 -> `docs/get-started/quickstart.md`,
`docs/inference/inference-options.md`, `docs/reference/architecture.md`:
Document Model Router setup and routed inference architecture.
- #3128 -> `docs/get-started/quickstart.md`,
`docs/reference/commands.md`: Document deployment verification and HTTP
401 health handling.
- #3104 -> `docs/inference/inference-options.md`: Document retry
behavior for transient provider validation failures.
- #3121 -> `docs/reference/architecture.md`: Document agent-scoped
model/provider compatibility manifests.
- #3046 -> `docs/reference/architecture.md`: Tie model-specific
compatibility setup to known model/provider behavior.
- #3097 -> `docs/inference/use-local-inference.md`: Document Ollama
tool-calling capability validation.
- #3082 -> `docs/reference/commands.md`: Document
`NEMOCLAW_SANDBOX_NAME` as the interactive sandbox-name default.
- f586cc5, 3442adf -> `docs/get-started/quickstart-hermes.md`,
`docs/reference/network-policies.md`: Document Hermes agent-specific
baseline policy endpoints.

## Test plan
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user`
- `make docs`
- `npm run build:cli`
- `rg` skip-term scan for `docs/` and generated user skills

Made with [Cursor](https://cursor.com)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Model Router provider for complexity-based routed inference.
* Ollama/local inference onboarding now validates tool-calling
capability.
  * Added `local-inference` network policy preset.

* **Documentation**
* New integration policy examples (Outlook, Telegram, Slack, Discord,
GitHub, Jira, etc.).
  * Clarified config immutability workflow and sandbox writable paths.
  * Hermes baseline network policy documented.

* **Improvements**
* Health checks treat device-auth responses as live; transient
validation retries.
* Installer performs pre-install reachability checks; CLI onboarding
gained a --fresh option.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@wscurran wscurran added area: cli Command line interface, flags, terminal UX, or output area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression and removed priority: medium bug Something fails against expected or documented behavior labels Jun 3, 2026
cv added a commit that referenced this pull request Jun 13, 2026
## Summary
Stabilizes the nightly E2E follow-up failures by keeping the Kimi
scenario on the public NVIDIA Kimi endpoint, recognizing the generated
routed Kimi model ref, and repairing narrow OpenClaw scope-approval
failures that report nonzero after local state changes. It also aligns
the crash-loop recovery assertion with the current gateway guard markers
and keeps the test-size budget ratcheted after moving coverage into a
focused policy test.

## Related Issue
Related to #2478, #4462, #2620, #3046.

## Changes
- Keep the Kimi E2E on public NVIDIA Endpoints via `nvidia-prod` with
`moonshotai/kimi-k2.6`; retain the local mock only behind
`NEMOCLAW_KIMI_USE_MOCK=1` and sanitize Kimi failure logs.
- Recognize the generated `inference/moonshotai/kimi-k2.6` model ref in
the Kimi compatibility plugin.
- Add constrained OpenClaw approval recovery for failed allowlisted
scope upgrades that leave original or replacement pending state, without
granting `operator.admin`.
- Wire the recovery helper into sandbox auto-pair approval and the
startup guard wrapper.
- Align the crash-loop E2E guard assertion with current gateway safety
marker behavior.
- Add/update targeted tests and ratchet the
`test/nemoclaw-start.test.ts` size budget downward.

## Type of Change
- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [x] Git hooks passed during commit and push, or `npx prek run
--from-ref main --to-ref HEAD` passes
- [x] Targeted tests pass for changed behavior
- [ ] Full `npm test` passes (broad runtime changes only)
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Docs review found no user-facing docs changes were warranted; `docs/`
and generated `.agents/skills/` remained clean.

---
Signed-off-by: Carlos Villela <cvillela@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added recovery support for failed OpenClaw device approval flows,
including gateway-connect compatibility scenarios.
* Improved Kimi inference compatibility handling for managed Kimi model
references and stream/tool rewrite alignment.

* **Bug Fixes**
* Auto-pair approval now conditionally retries via the recovery policy
on specific approval failures, updating pending/paired scope state.

* **Security & CI**
* Nightly Kimi inference E2E now supports live-vs-mock execution with
public NVIDIA key validation and redacts NVIDIA keys in sanitized logs.

* **Tests / Chores**
* Expanded E2E and policy recovery tests; updated gateway-guard
assertions and adjusted a test file size budget.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: cli Command line interface, flags, terminal UX, or output area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression NV QA Bugs found by the NVIDIA QA Team platform: macos Affects macOS, including Apple Silicon provider: nvidia NVIDIA inference endpoint, NIM, or NVIDIA provider behavior UAT Issues flagged for User Acceptance Testing.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants