fix: support reasoning models in the OpenClaw harness by ericksoa · Pull Request #3046 · NVIDIA/NemoClaw

ericksoa · 2026-05-05T16:47:42Z

Summary

Refs #2620 / NVBug 6122540.

This PR fixes the Kimi K2.6/OpenClaw harness path for a specific tool-calling failure through NemoClaw's managed `inference.local` provider. Kimi can emit the simple diagnostic request `hostname; date; uptime` as one combined `exec` tool call. In this harness, that shape is not equivalent to three tool calls: OpenClaw records, replays, and correlates tool results at the tool-call boundary, and Kimi's chat-completions replay also expects named tool results. Leaving the model output combined makes the session fragile and can look like context loss or an incomplete multi-step run even though the commands are individually safe and expected.

The fix is intentionally narrow. It adds a NemoClaw-managed OpenClaw provider plugin for `inference/moonshotai/kimi-k2.6` on `https://inference.local/v1` using `openai-completions`. That plugin rewrites only the exact safe combined `exec` diagnostic shape into three separate source tool calls, preserving this order:

hostname
date
uptime

Why this belongs in NemoClaw

NemoClaw owns the managed inference.local OpenClaw configuration and the sandbox build context that stages OpenClaw provider plugins. This is therefore the right compatibility boundary: we can normalize the one provider/model-specific tool-call shape before OpenClaw persists and replays the turn, without changing OpenClaw globally and without teaching NemoClaw to parse arbitrary shell.

This also keeps the behavior auditable. The splitter is an allowlist for three exact commands, not a shell parser. It refuses to rewrite arbitrary shell syntax, command arguments, pipes, redirects, variables, substitutions, unknown commands, non-exec tools, non-Kimi providers, multiple existing tool calls, and malformed arguments.
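
The allowlist idea above can be sketched in a few lines. This is an illustrative Python sketch only: the real plugin is JavaScript (`index.js`), and the flat tool-call shape, the `command` argument key, and the `<id>_<n>` ID scheme used here are assumptions, not the plugin's actual data model.

```python
import json
from typing import Optional

# Assumption: the allowlist is exactly these three commands, in this order.
SAFE_SEQUENCE = ("hostname", "date", "uptime")


def split_safe_exec(tool_call: dict) -> Optional[list]:
    """Return three separate exec calls, or None to leave the call untouched."""
    if tool_call.get("name") != "exec":
        return None  # non-exec tools are never rewritten
    try:
        args = json.loads(tool_call.get("arguments"))
    except (TypeError, ValueError):
        return None  # malformed or missing arguments
    # Hypothetical argument shape: a single "command" string and nothing else.
    if not isinstance(args, dict) or set(args) != {"command"}:
        return None
    command = args["command"]
    if not isinstance(command, str):
        return None
    # No shell parsing: split on ';', trim, and compare against the allowlist.
    parts = tuple(part.strip() for part in command.split(";"))
    if parts != SAFE_SEQUENCE:
        return None
    base_id = tool_call.get("id", "call")
    return [
        {
            "id": f"{base_id}_{index}",  # hypothetical stable ID scheme
            "name": "exec",
            "arguments": json.dumps({"command": part}),
        }
        for index, part in enumerate(parts)
    ]
```

Because the comparison is against the whole trimmed sequence, anything with pipes, arguments, or extra commands (for example `hostname | date; uptime`) falls through to `None` and is passed on unchanged.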

Changes

Adds moonshotai/kimi-k2.6 / Kimi K2.6 to the curated NVIDIA Endpoints cloud model menu.
Adds bounded Kimi K2.6 onboarding probe handling.
Adds Kimi replay compat in generated OpenClaw config with requiresToolResultName: true.
Adds and stages the managed OpenClaw plugin at nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/.
Restores sandbox build-context staging for the plugin path so fresh sandboxes receive the runtime wrapper.
Adds focused unit/config/build-context coverage for the Kimi compatibility wrapper.
Adds a nightly E2E job, kimi-inference-compat-e2e, that uses a hermetic OpenAI-compatible mock endpoint and validates the full OpenClaw trajectory for the Kimi hostname; date; uptime scenario.

Validation

Latest pushed head: 08e14dd6d48c822e12debf127157776ab82b2347.

Focused local validation passed:

npm run build:cli
npx vitest run test/kimi-inference-compat-plugin.test.ts test/generate-openclaw-config.test.ts test/onboard.test.ts test/sandbox-build-context.test.ts
npx vitest run test/sandbox-provisioning.test.ts test/kimi-inference-compat-plugin.test.ts test/generate-openclaw-config.test.ts test/sandbox-build-context.test.ts
npx prek run --all-files --stage pre-push

GitHub validation on the latest head:

PR checks: passed — https://github.com/NVIDIA/NemoClaw/actions/runs/25417876337/job/74553037898
PR self-hosted sandbox image and focused e2e checks: passed on the latest head — https://github.com/NVIDIA/NemoClaw/actions/runs/25417876941
Nightly kimi-inference-compat-e2e: passed on the latest head — https://github.com/NVIDIA/NemoClaw/actions/runs/25417998847
Nightly Kimi job: passed — https://github.com/NVIDIA/NemoClaw/actions/runs/25417998847/job/74553419836

The nightly Kimi E2E validates the acceptance criteria end-to-end in a fresh sandbox with a hermetic Kimi-compatible endpoint:

exactly three trace.artifacts.toolMetas
all three tool metas use toolName: "exec"
command set is exactly hostname, date, uptime
source assistant tool calls are split in hostname, date, uptime order
no promptError
no abandoned context / "want me to continue"
final assistant response occurs after the tool results
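
A subset of these criteria could be checked with a helper along the following lines. This is a hedged sketch: only `artifacts.toolMetas`, `toolName`, and `promptError` come from the list above; the `command`, `sourceToolCalls`, and `finalText` fields are hypothetical stand-ins for whatever the real trajectory format records, and the ordering check for the final response is omitted.

```python
EXPECTED_COMMANDS = ["hostname", "date", "uptime"]


def check_trajectory(trace: dict) -> list:
    """Return the list of failed criteria; an empty list means the trace passes."""
    failures = []
    metas = trace.get("artifacts", {}).get("toolMetas", [])
    if len(metas) != 3:
        failures.append("expected exactly three toolMetas")
    if any(meta.get("toolName") != "exec" for meta in metas):
        failures.append("every tool meta must use toolName 'exec'")
    # Hypothetical field: each tool meta carries the command it ran.
    commands = [str(meta.get("command", "")) for meta in metas]
    if sorted(commands) != sorted(EXPECTED_COMMANDS):
        failures.append("command set must be exactly hostname, date, uptime")
    # Hypothetical field: commands of the assistant's source tool calls, in order.
    if trace.get("sourceToolCalls") != EXPECTED_COMMANDS:
        failures.append("source tool calls must be split in hostname, date, uptime order")
    if trace.get("promptError"):
        failures.append("promptError is set")
    # Hypothetical field: text of the final assistant message.
    if "want me to continue" in str(trace.get("finalText", "")).lower():
        failures.append("assistant abandoned the run")
    return failures
```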

Proposed follow-up: model-specific setup registry

This PR keeps the production fix narrow, but it also exposes a pattern we should make more deliberate before we accumulate more one-off model interventions. Proposed next step: introduce a small, manifest-driven model-specific setup registry for sandbox/OpenClaw compatibility behavior.

Suggested shape:

Add a registry directory, for example nemoclaw-blueprint/model-specific-setup/, with one manifest per targeted intervention such as kimi-k2.6-managed-inference.json.
Keep manifests declarative: match fields for modelIds, providerKey, inferenceApi, and baseUrl; openclawCompat fields such as requiresToolResultName; and plugins.load entries for staged plugin paths.
Keep executable behavior in OpenClaw plugins, not in manifests. The Kimi splitter stays in openclaw-plugins/kimi-inference-compat/; the manifest only says when to load it.
Make generate-openclaw-config.py load the registry and apply matching setup records, replacing hardcoded model constants and per-model predicate functions.
Stage model-specific plugin assets generically in the sandbox image, so the Dockerfile does not need a new COPY and chmod stanza for every future model.
Add a registry validator test that enforces the boundary: exact match predicates, known compat keys only, no shell/code in manifests, and every staged plugin path must exist.
Organize future nightly coverage by intervention, not just provider category. For example, keep this scenario as a kimi-k2.6-managed-inference-exec-split case inside the model-specific setup suite.

That follow-up would keep model-specific behavior explicit and reviewable while avoiding a growing set of ad hoc checks spread across config generation, Dockerfile staging, and E2E scripts.
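
To make the proposed shape concrete, here is a minimal sketch of how a config generator might match and apply such manifests. The field names (`match`, `modelIds`, `providerKey`, `inferenceApi`, `baseUrl`, `openclawCompat`, `plugins.load`) follow the proposal above; the function names, the exact-equality matching, and the way effects are merged are assumptions for illustration.

```python
def manifest_matches(manifest: dict, *, model_id: str, provider_key: str,
                     inference_api: str, base_url: str) -> bool:
    """Exact-match predicate: every match field must agree with the route."""
    match = manifest["match"]
    return (
        model_id in match["modelIds"]
        and provider_key == match["providerKey"]
        and inference_api == match["inferenceApi"]
        and base_url == match["baseUrl"]
    )


def apply_setup(manifests: list, route: dict, inference_compat=None):
    """Merge compat flags and collect plugin load paths from matching manifests."""
    compat = dict(inference_compat or {})
    plugin_paths = []
    for manifest in manifests:
        if manifest_matches(manifest, **route):
            compat.update(manifest.get("openclawCompat", {}))
            plugin_paths.extend(manifest.get("plugins", {}).get("load", []))
    return compat, plugin_paths


# Hypothetical manifest content mirroring the Kimi K2.6 case in this PR.
KIMI_MANIFEST = {
    "match": {
        "modelIds": ["moonshotai/kimi-k2.6"],
        "providerKey": "inference",
        "inferenceApi": "openai-completions",
        "baseUrl": "https://inference.local/v1",
    },
    "openclawCompat": {"requiresToolResultName": True},
    "plugins": {"load": ["openclaw-plugins/kimi-inference-compat"]},
}
```

With this split, the manifest only states when the setup applies and what to load, while the executable behavior stays in the plugin.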

Summary by CodeRabbit

New Features
- Added Kimi K2.6 model option and a compatibility plugin to normalize Kimi inference behavior; integrated Kimi support into config generation and sandbox builds.
Tests
- Added a hermetic end-to-end Kimi compatibility test and comprehensive unit/integration suites covering onboarding, plugin behavior, and sandbox provisioning.
Chores
- Added a nightly E2E job to CI and updated workflow reporting; adjusted runtime image permissions to include the new plugin.

copy-pr-bot · 2026-05-05T16:47:46Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-05-05T16:47:51Z

📝 Walkthrough

Adds support for MoonshotAI Kimi K2.6 managed inference: a new OpenClaw plugin, manifest, packaging changes, config detection/wiring, probe and onboarding adjustments, unit and E2E tests, an E2E script, and a nightly CI job for Kimi inference compatibility. No public API removals.

Changes

Kimi K2.6 Inference Compatibility Integration

Layer / File(s)	Summary
Data Shape / Constants `scripts/generate-openclaw-config.py`, `src/lib/onboard-providers.ts`	Introduce `KIMI_K26_MODEL_ID`, `KIMI_K26_MANAGED_INFERENCE_COMPAT`, plugin id/path constants, and `_is_kimi_k26_managed_inference(...)` to detect Kimi K2.6 managed inference.
Config Generation `scripts/generate-openclaw-config.py`	Conditionally merge KIMI compat into inference_compat and enable plugin load paths when detection is true; switch plugins wiring to dynamic structure.
Provider Runtime Wiring `src/lib/onboard-providers.ts`, `src/lib/sandbox-build-context.ts`	Merge compat into getSandboxInferenceConfig when providerKey/inferenceApi/model match; copy openclaw-plugins into staged blueprint during optimized build.
Plugin Implementation `nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js`	New plugin that detects managed Kimi contexts, parses/validates combined exec tool calls, splits safe semicolon-delimited exec commands into separate tool calls with stable IDs, rewrites messages/events, provides stream wrapper, and exposes `__testing` helpers.
Plugin Manifest `nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/openclaw.plugin.json`	New manifest declaring id, name, version, description, providers=["inference"], and empty object configSchema.
Runtime Packaging / Permissions `Dockerfile`	Copy plugin into image path and expand chmod steps to set permissions for plugin directory and files.
Model Lists & Prompts `src/lib/inference-config.ts`, `src/lib/inference-config.test.ts`, `src/lib/model-prompts.test.ts`	Add `moonshotai/kimi-k2.6` to CLOUD_MODEL_OPTIONS and adjust affected test prompt index.
Validation Probes `src/lib/onboard-inference-probes.ts`, `src/lib/onboard-inference-probes.test.ts`	Add `getKimiK26ValidationProbeCurlArgs`, `isKimiK26Model`, limit probe payload `max_tokens` to 8 for Kimi K2.6, and export the probe helper.
Config Generation Tests `test/generate-openclaw-config.test.ts`	Add test asserting managed-inference compat merged for moonshotai/kimi-k2.6, plugin entry enabled, and plugin load path set.
Plugin Unit Tests `test/kimi-inference-compat-plugin.test.ts`	Comprehensive tests for safe exec splitting, trimming, rejection of unsafe commands, non-Kimi non-wrapping, and stream rewrite behavior for partial/final deltas.
Sandbox Build Context Test `test/sandbox-build-context.test.ts`	Assert plugin manifest exists in optimized build context.
Onboard & Selection Tests `test/onboard.test.ts`, `test/onboard-selection.test.ts`	Add test verifying routed inference compat merge for Kimi K2.6 and large expansion of provider selection/onboard tests.
E2E Test Script `test/e2e/test-kimi-inference-compat.sh`	New hermetic Bash E2E that launches a Python-based Kimi mock, provisions a sandbox, validates OpenClaw wiring, runs agent prompts, inspects trajectories, verifies mock traffic, and uploads logs on failure.
CI Workflow `.github/workflows/nightly-e2e.yaml`	Add nightly job `kimi-inference-compat-e2e`, include it in workflow_dispatch inputs, notify-on-failure, report-to-pr, and nightly scorecard aggregation; job runs checkout, test script, and upload-on-failure steps.
Sandbox Provisioning Tests / Permissions `test/sandbox-provisioning.test.ts`	Extend test fixtures and assertions to include plugin directory/file existence and expected modes (dir 755, file 644).

Sequence Diagram(s)

sequenceDiagram
participant CI as CI/Developer
participant Sandbox as Nemo Sandbox
participant Plugin as Kimi Compat Plugin
participant Inference as inference.local (Kimi mock)

CI->>Sandbox: Start sandbox & onboard (with plugin path)
Sandbox->>Plugin: Register provider / wrap stream
Sandbox->>Inference: Send combined exec tool call (single exec with semicolons)
Plugin->>Plugin: Parse & split into N tool calls, assign stable IDs
Plugin->>Inference: Emit N separate tool-call events (streamed deltas + final)
Inference-->>Plugin: Streamed tool results / final result
Plugin-->>Sandbox: Rewrite events/messages to reflect split calls
CI->>Sandbox: Run E2E script to validate logs, trajectories, and behavior

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

priority: high

Suggested reviewers

jyaunches
cv

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.62% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'fix: support reasoning models in the OpenClaw harness' is partially related to the changeset. While it refers to a real aspect of the change (supporting Kimi K2.6 in OpenClaw), it uses the vague term 'reasoning models' and does not capture the main technical issue being fixed: splitting combined exec tool calls into individual commands for Kimi compatibility.	Consider a more specific title that captures the core fix, such as 'fix: add Kimi K2.6 OpenClaw plugin for tool call splitting' or 'fix: support Kimi K2.6 managed inference in OpenClaw harness with exec call splitting'.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

Add a managed Kimi stream wrapper that rewrites only safe combined exec diagnostics into separate exec tool calls before OpenClaw's tool loop sees them. Restore the Kimi provider-plugin wiring on the live PR branch and add replay compat for tool-result names.

github-actions · 2026-05-06T04:24:05Z

Selective E2E Results — ❌ Some jobs failed

Run: 25416324666
Branch: issue-2620-kimi-k2-6
Requested jobs: kimi-inference-compat-e2e
Summary: 0 passed, 1 failed, 22 skipped

Job	Result
cloud-e2e	⏭️ skipped
cloud-inference-e2e	⏭️ skipped
cloud-onboard-e2e	⏭️ skipped
deployment-services-e2e	⏭️ skipped
diagnostics-e2e	⏭️ skipped
docs-validation-e2e	⏭️ skipped
gpu-e2e	⏭️ skipped
hermes-e2e	⏭️ skipped
inference-routing-e2e	⏭️ skipped
kimi-inference-compat-e2e	❌ failure
messaging-compatible-endpoint-e2e	⏭️ skipped
messaging-providers-e2e	⏭️ skipped
network-policy-e2e	⏭️ skipped
overlayfs-autofix-e2e	⏭️ skipped
rebuild-hermes-e2e	⏭️ skipped
rebuild-openclaw-e2e	⏭️ skipped
sandbox-operations-e2e	⏭️ skipped
sandbox-survival-e2e	⏭️ skipped
shields-config-e2e	⏭️ skipped
skill-agent-e2e	⏭️ skipped
snapshot-commands-e2e	⏭️ skipped
token-rotation-e2e	⏭️ skipped
upgrade-stale-sandbox-e2e	⏭️ skipped

Failed jobs: kimi-inference-compat-e2e. Check run artifacts for logs.

# Conflicts: # Dockerfile

github-actions · 2026-05-06T04:35:07Z

Selective E2E Results — ✅ All requested jobs passed

Run: 25416404884
Branch: issue-2620-kimi-k2-6
Requested jobs: kimi-inference-compat-e2e
Summary: 0 passed, 0 failed, 22 skipped

Job	Result
cloud-e2e	⏭️ skipped
cloud-inference-e2e	⏭️ skipped
cloud-onboard-e2e	⏭️ skipped
deployment-services-e2e	⏭️ skipped
diagnostics-e2e	⏭️ skipped
docs-validation-e2e	⏭️ skipped
gpu-e2e	⏭️ skipped
hermes-e2e	⏭️ skipped
inference-routing-e2e	⏭️ skipped
kimi-inference-compat-e2e	⚠️ cancelled
messaging-compatible-endpoint-e2e	⏭️ skipped
messaging-providers-e2e	⏭️ skipped
network-policy-e2e	⏭️ skipped
overlayfs-autofix-e2e	⏭️ skipped
rebuild-hermes-e2e	⏭️ skipped
rebuild-openclaw-e2e	⏭️ skipped
sandbox-operations-e2e	⏭️ skipped
sandbox-survival-e2e	⏭️ skipped
shields-config-e2e	⏭️ skipped
skill-agent-e2e	⏭️ skipped
snapshot-commands-e2e	⏭️ skipped
token-rotation-e2e	⏭️ skipped
upgrade-stale-sandbox-e2e	⏭️ skipped

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

test/onboard-selection.test.ts (1)

323-323: 🏗️ Heavy lift

These tests are still too coupled to menu order.

Hard-coding "7" and "8" here means every curated-model insertion forces unrelated test edits. A label-driven selection helper would make these cases resilient to future menu churn.

Also applies to: 424-424, 520-520

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/onboard-selection.test.ts` at line 323, Tests currently hard-code
numeric menu choices (e.g., const answers = ["1", "7"]) which couples them to
menu order; replace these with a label-driven selection helper: add a small
utility (e.g., findMenuIndexByLabel or chooseByLabel) used by the tests in
test/onboard-selection.test.ts to inspect the menu options array and return the
1-based index string for a given label, then build answers using that helper
instead of literal numbers (apply same change where answers are hard-coded
around the other occurrences noted). Locate and update the const answers
declarations and any code that feeds menu prompts (e.g., the test helpers that
call the prompt) to derive choices via the helper so tests pick menu entries by
label rather than by fixed index.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 315-361: The new GitHub Actions job kimi-inference-compat-e2e is
missing a corresponding entry in .coderabbit.yaml; add a
reviews.path_instructions mapping that references kimi-inference-compat-e2e and
points to the relevant test file (test/e2e/test-kimi-inference-compat.sh) or the
plugin sources (openclaw-plugins/kimi-inference-compat) following the existing
E2E entry pattern so the validation in test/validate-e2e-coverage.test.ts passes
and reviewers know when to trigger this job.

In `@scripts/generate-openclaw-config.py`:
- Around line 141-145: The merge assumes inference_compat is a dict but can be
None/falsey; guard before spreading by ensuring you merge into a dict (e.g., use
an empty dict when inference_compat is None) or only perform the spread when
inference_compat is truthy so spreading KIMI_K26_MANAGED_INFERENCE_COMPAT into
inference_compat cannot raise; update the branch that references
kimi_managed_inference, inference_compat, and KIMI_K26_MANAGED_INFERENCE_COMPAT
to coalesce inference_compat to {} or check its truthiness before the merge.

In `@src/lib/onboard-providers.ts`:
- Around line 345-354: The function getSandboxInferenceConfig currently has
untyped parameters; update its signature to explicitly type the parameters and
return type as: model: string, provider: string | null = null,
preferredInferenceApi: string | null = null and return type
SandboxInferenceConfig (i.e., function getSandboxInferenceConfig(model: string,
provider: string | null = null, preferredInferenceApi: string | null = null):
SandboxInferenceConfig) so callers are statically guaranteed to pass a string
for model and the function contract is clear; leave the function body (including
logic that uses model.trim().toLowerCase() and constants like KIMI_K26_MODEL_ID
and KIMI_K26_MANAGED_INFERENCE_COMPAT) unchanged.

In `@test/e2e/test-kimi-inference-compat.sh`:
- Around line 321-345: The prepare_source_cli() function currently skips running
npm ci and npm run build:cli if "$REPO/dist/nemoclaw.js" already exists, which
can cause stale artifacts or missing runtime deps; change it to always run the
npm install/build steps unconditionally (remove the if [ ! -f
"$REPO/dist/nemoclaw.js" ] check) so the block that cds into "$REPO" and runs
npm ci --ignore-scripts && npm run build:cli always executes, preserving the
existing logging to "$BUILD_LOG" and rc handling so failures still return the
captured exit code.
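
The `inference_compat` guard suggested in the inline comment for `scripts/generate-openclaw-config.py` can be sketched as follows. The constant's contents here are an assumption (only the `requiresToolResultName` flag named in the PR description), and `merge_kimi_compat` is a hypothetical helper name rather than a function in the script.

```python
# Assumption: only the flag named in the PR description; the real constant may hold more.
KIMI_K26_MANAGED_INFERENCE_COMPAT = {"requiresToolResultName": True}


def merge_kimi_compat(inference_compat, kimi_managed_inference: bool):
    """Merge the Kimi compat flags without failing when inference_compat is None."""
    if not kimi_managed_inference:
        return inference_compat  # leave non-Kimi routes untouched
    # Coalesce None to {} first: {**None} would raise TypeError.
    return {**(inference_compat or {}), **KIMI_K26_MANAGED_INFERENCE_COMPAT}
```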

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 453fecfa-cc0b-4191-bfe2-5b7b003afbe0

📥 Commits

Reviewing files that changed from the base of the PR and between 400869e and eda586c.

📒 Files selected for processing (18)

.github/workflows/nightly-e2e.yaml
Dockerfile
nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js
nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/openclaw.plugin.json
scripts/generate-openclaw-config.py
src/lib/inference-config.test.ts
src/lib/inference-config.ts
src/lib/model-prompts.test.ts
src/lib/onboard-inference-probes.test.ts
src/lib/onboard-inference-probes.ts
src/lib/onboard-providers.ts
src/lib/sandbox-build-context.ts
test/e2e/test-kimi-inference-compat.sh
test/generate-openclaw-config.test.ts
test/kimi-inference-compat-plugin.test.ts
test/onboard-selection.test.ts
test/onboard.test.ts
test/sandbox-build-context.test.ts

coderabbitai

🧹 Nitpick comments (1)

Dockerfile (1)

241-241: ⚡ Quick win

Make file permission hardening type-safe.

At Line 241, chmod 644 .../* is fragile if this plugin later contains subdirectories (they would lose +x and become non-traversable). Prefer file-only chmod.

Proposed diff

-    && chmod 644 /usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/*
+    && find /usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat -type f -exec chmod 644 {} +

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile` at line 241, The current chmod 644 invocation on
/usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/* will break
traversal if subdirectories are added; replace that single glob chmod with a
file-only and directory-specific permission change using find on the
kimi-inference-compat tree: set regular files to 644 and set directories (and
executable items that must be traversable) to 755. Update the Dockerfile by
replacing the chmod 644
/usr/local/share/nemoclaw/openclaw-plugins/kimi-inference-compat/* line with a
find-based approach that targets -type f for 644 and -type d for 755 so
subdirectories remain traversable.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e9cc4904-cc5a-4b0f-a501-99fb56418cae

📥 Commits

Reviewing files that changed from the base of the PR and between eda586c and 5976c8b.

📒 Files selected for processing (4)

.github/workflows/nightly-e2e.yaml
Dockerfile
test/onboard-selection.test.ts
test/onboard.test.ts

github-actions · 2026-05-06T04:45:00Z

Selective E2E Results — ❌ Some jobs failed

Run: 25416652703
Branch: issue-2620-kimi-k2-6
Requested jobs: kimi-inference-compat-e2e
Summary: 0 passed, 1 failed, 24 skipped

Job	Result
cloud-e2e	⏭️ skipped
cloud-inference-e2e	⏭️ skipped
cloud-onboard-e2e	⏭️ skipped
deployment-services-e2e	⏭️ skipped
diagnostics-e2e	⏭️ skipped
docs-validation-e2e	⏭️ skipped
gpu-e2e	⏭️ skipped
hermes-discord-e2e	⏭️ skipped
hermes-e2e	⏭️ skipped
inference-routing-e2e	⏭️ skipped
kimi-inference-compat-e2e	❌ failure
messaging-compatible-endpoint-e2e	⏭️ skipped
messaging-providers-e2e	⏭️ skipped
network-policy-e2e	⏭️ skipped
overlayfs-autofix-e2e	⏭️ skipped
rebuild-hermes-e2e	⏭️ skipped
rebuild-hermes-stale-base-e2e	⏭️ skipped
rebuild-openclaw-e2e	⏭️ skipped
sandbox-operations-e2e	⏭️ skipped
sandbox-survival-e2e	⏭️ skipped
shields-config-e2e	⏭️ skipped
skill-agent-e2e	⏭️ skipped
snapshot-commands-e2e	⏭️ skipped
token-rotation-e2e	⏭️ skipped
upgrade-stale-sandbox-e2e	⏭️ skipped

Failed jobs: kimi-inference-compat-e2e. Check run artifacts for logs.

github-actions · 2026-05-06T04:59:41Z

Selective E2E Results — ✅ All requested jobs passed

Run: 25417103670
Branch: issue-2620-kimi-k2-6
Requested jobs: kimi-inference-compat-e2e
Summary: 1 passed, 0 failed, 24 skipped

Job	Result
cloud-e2e	⏭️ skipped
cloud-inference-e2e	⏭️ skipped
cloud-onboard-e2e	⏭️ skipped
deployment-services-e2e	⏭️ skipped
diagnostics-e2e	⏭️ skipped
docs-validation-e2e	⏭️ skipped
gpu-e2e	⏭️ skipped
hermes-discord-e2e	⏭️ skipped
hermes-e2e	⏭️ skipped
inference-routing-e2e	⏭️ skipped
kimi-inference-compat-e2e	✅ success
messaging-compatible-endpoint-e2e	⏭️ skipped
messaging-providers-e2e	⏭️ skipped
network-policy-e2e	⏭️ skipped
overlayfs-autofix-e2e	⏭️ skipped
rebuild-hermes-e2e	⏭️ skipped
rebuild-hermes-stale-base-e2e	⏭️ skipped
rebuild-openclaw-e2e	⏭️ skipped
sandbox-operations-e2e	⏭️ skipped
sandbox-survival-e2e	⏭️ skipped
shields-config-e2e	⏭️ skipped
skill-agent-e2e	⏭️ skipped
snapshot-commands-e2e	⏭️ skipped
token-rotation-e2e	⏭️ skipped
upgrade-stale-sandbox-e2e	⏭️ skipped

github-actions · 2026-05-06T05:31:30Z

Selective E2E Results — ✅ All requested jobs passed

Run: 25417998847
Branch: issue-2620-kimi-k2-6
Requested jobs: kimi-inference-compat-e2e
Summary: 1 passed, 0 failed, 24 skipped

Job	Result
cloud-e2e	⏭️ skipped
cloud-inference-e2e	⏭️ skipped
cloud-onboard-e2e	⏭️ skipped
deployment-services-e2e	⏭️ skipped
diagnostics-e2e	⏭️ skipped
docs-validation-e2e	⏭️ skipped
gpu-e2e	⏭️ skipped
hermes-discord-e2e	⏭️ skipped
hermes-e2e	⏭️ skipped
inference-routing-e2e	⏭️ skipped
kimi-inference-compat-e2e	✅ success
messaging-compatible-endpoint-e2e	⏭️ skipped
messaging-providers-e2e	⏭️ skipped
network-policy-e2e	⏭️ skipped
overlayfs-autofix-e2e	⏭️ skipped
rebuild-hermes-e2e	⏭️ skipped
rebuild-hermes-stale-base-e2e	⏭️ skipped
rebuild-openclaw-e2e	⏭️ skipped
sandbox-operations-e2e	⏭️ skipped
sandbox-survival-e2e	⏭️ skipped
shields-config-e2e	⏭️ skipped
skill-agent-e2e	⏭️ skipped
snapshot-commands-e2e	⏭️ skipped
token-rotation-e2e	⏭️ skipped
upgrade-stale-sandbox-e2e	⏭️ skipped

## Context

Follow-up to issue #2620.

PR #3046 fixed the immediate OpenClaw/Kimi K2.6 managed `inference.local` compatibility path.

That fix left the compatibility behavior embedded directly in `scripts/generate-openclaw-config.py`. This PR moves that behavior into an agent-aware registry so future model/provider compatibility work has an explicit home and cannot accidentally apply across OpenClaw and Hermes.

## Where the architecture landed

The model-specific setup architecture now has three explicit layers:

1. **Declarative registry:** `nemoclaw-blueprint/model-specific-setup/<agent>/*.json` is the source of truth for deciding when model/provider compatibility setup applies. Every manifest declares exactly one `agent`, a route match (`modelIds`, `providerKey`, `inferenceApi`, `baseUrl`), and declarative effects.
2. **Agent-owned config readers:** OpenClaw and Hermes read the same registry boundary, but each agent only accepts its own effects. OpenClaw consumes `openclawCompat` and `openclawPlugins`; Hermes validates `hermesCompat` shape and ignores OpenClaw manifests. The Hermes build-time generator is now a thin entrypoint over `agents/hermes/config/`, so future Hermes env parsing, config construction, registry handling, and serialization have explicit module homes instead of landing in one monolithic script.
3. **Agent-owned executable surfaces:** executable compatibility remains outside the registry. OpenClaw wrappers/plugins live under `nemoclaw-blueprint/openclaw-plugins/`; Hermes runtime code belongs under `agents/hermes/`. The registry decides activation, while agent-specific code performs runtime behavior.

For this PR, the first registry entry is OpenClaw-only: `model-specific-setup/openclaw/kimi-k2.6-managed-inference.json`. It preserves the Kimi K2.6 managed `inference.local` behavior from PR #3046 by applying the same OpenClaw compat flags and loading the same Kimi OpenClaw plugin path.

Hermes now has the architectural lane and validation path, but no Hermes Kimi behavior is added without a Hermes-specific repro and acceptance test.

## What this PR does

- Adds `nemoclaw-blueprint/model-specific-setup/` as the declarative registry for model/provider compatibility setup.
- Makes `agent` first-class in every manifest. v1 manifests target exactly one agent, for example `openclaw` or `hermes`.
- Adds the first registry entry: `model-specific-setup/openclaw/kimi-k2.6-managed-inference.json`. This preserves the Kimi K2.6 managed `inference.local` behavior from PR #3046.
- Refactors OpenClaw config generation so it consumes only matching manifests with `agent: "openclaw"` instead of hardcoding Kimi constants and predicates in the generator.
- Adds Hermes registry discovery/validation only. It does not add Hermes Kimi behavior, because we do not have a Hermes-specific Kimi failure or acceptance test proving that Hermes needs the same compatibility layer.
- Splits the Hermes build-time config generator into focused modules under `agents/hermes/config/`, keeping `agents/hermes/generate-config.ts` as orchestration only.
- Stages `nemoclaw-blueprint/openclaw-plugins/` generically and normalizes plugin directory/file permissions with `find`, so future OpenClaw model-specific plugins do not require one-off Dockerfile edits.
- Adds contributor/agent guidance in `AGENTS.md` and the registry README: OpenClaw executable wrappers go under `openclaw-plugins/`; Hermes executable wrappers go under `agents/hermes/`; manifests stay declarative.

## What this PR intentionally does not do

- It does not add a shared multi-agent manifest. OpenClaw and Hermes have different config files, plugin systems, replay behavior, and E2E paths.
- It does not add Hermes Kimi compatibility behavior. That should be a separate Hermes-specific change if a Hermes repro proves it is needed.
- It does not change the Kimi plugin behavior from PR #3046; this is a refactor of where activation is declared and validated.

## Tests

- `npm run build:cli`
- `npm run typecheck:cli`
- `npm run lint`
- `npx vitest run test/generate-hermes-config.test.ts test/validate-config-schemas.test.ts test/generate-openclaw-config.test.ts`
- `python3 -m py_compile scripts/generate-openclaw-config.py`
- `bash -n test/e2e/test-kimi-inference-compat.sh && bash -n agents/hermes/start.sh && bash -n scripts/lib/sandbox-init.sh`
- `git diff --check`

## E2E Gate

- Runner gate passed on head `be8c398b`: `nightly-e2e` / `kimi-inference-compat-e2e`: https://github.com/NVIDIA/NemoClaw/actions/runs/25450743668
- Local attempt built and uploaded the sandbox image, then OpenShell returned `tls handshake eof` while creating/listing sandboxes. The runner gate is the merge gate for this PR.

## Summary by CodeRabbit

* **New Features**
  * Model-specific sandbox configuration registry with per-agent declarative manifests and discovery (OpenClaw + Hermes).
  * Manifest-driven plugin/compatibility effects applied at runtime.
* **Improvements**
  * Runtime images now include full plugin directories with adjusted permissions and a generalized plugin-loading flow.
  * Sandbox build now stages model-specific setups; stronger typing for sandbox inference config.
  * E2E prepare step now always builds the CLI.
* **Documentation**
  * Registry docs, manifest schema, and contributor guidance.
* **Tests**
  * Expanded unit, validation, integration, and E2E tests for discovery, validation, and plugin handling.
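
The agent-scoping rule in the registry description above (one `agent` per manifest, each agent accepting only its own effects) could be enforced by a validator roughly like this. It is a sketch under assumptions: the top-level manifest layout (`agent`, `match`, then effect keys) and the function names are hypothetical; only the effect key names and match field names come from the description.

```python
# Effect keys each agent accepts, per the registry description.
ALLOWED_EFFECTS = {
    "openclaw": {"openclawCompat", "openclawPlugins"},
    "hermes": {"hermesCompat"},
}
MATCH_KEYS = {"modelIds", "providerKey", "inferenceApi", "baseUrl"}


def validate_manifest(manifest: dict) -> list:
    """Return validation errors; an empty list means the manifest is acceptable."""
    agent = manifest.get("agent")
    if agent not in ALLOWED_EFFECTS:
        return [f"unknown or missing agent: {agent!r}"]
    errors = []
    if set(manifest.get("match", {})) != MATCH_KEYS:
        errors.append("match must declare exactly modelIds, providerKey, inferenceApi, baseUrl")
    # Assumed layout: every top-level key other than agent/match is an effect.
    foreign = set(manifest) - {"agent", "match"} - ALLOWED_EFFECTS[agent]
    if foreign:
        errors.append(f"effects not owned by {agent}: {sorted(foreign)}")
    return errors


def manifests_for(agent: str, manifests: list) -> list:
    """Select only valid manifests that target the given agent."""
    return [m for m in manifests if m.get("agent") == agent and not validate_manifest(m)]
```

Filtering by `agent` before applying any effect is what keeps an OpenClaw manifest from leaking into Hermes config generation, and the reverse.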

## Summary

- Bump docs metadata to 0.0.36 and refresh generated NemoClaw user skills.
- Document Model Router onboarding, validation retries, Ollama tool checks, Hermes policy behavior, and deployment verification updates.
- Remove suppressed experimental command references from public docs per `docs/.docs-skip`.

## Source summary

- #2202 -> `docs/get-started/quickstart.md`, `docs/inference/inference-options.md`, `docs/reference/architecture.md`: Document Model Router setup and routed inference architecture.
- #3128 -> `docs/get-started/quickstart.md`, `docs/reference/commands.md`: Document deployment verification and HTTP 401 health handling.
- #3104 -> `docs/inference/inference-options.md`: Document retry behavior for transient provider validation failures.
- #3121 -> `docs/reference/architecture.md`: Document agent-scoped model/provider compatibility manifests.
- #3046 -> `docs/reference/architecture.md`: Tie model-specific compatibility setup to known model/provider behavior.
- #3097 -> `docs/inference/use-local-inference.md`: Document Ollama tool-calling capability validation.
- #3082 -> `docs/reference/commands.md`: Document `NEMOCLAW_SANDBOX_NAME` as the interactive sandbox-name default.
- f586cc5, 3442adf -> `docs/get-started/quickstart-hermes.md`, `docs/reference/network-policies.md`: Document Hermes agent-specific baseline policy endpoints.

## Test plan

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user`
- `make docs`
- `npm run build:cli`
- `rg` skip-term scan for `docs/` and generated user skills

Made with [Cursor](https://cursor.com)

## Summary by CodeRabbit

* **New Features**
  * Model Router provider for complexity-based routed inference.
  * Ollama/local inference onboarding now validates tool-calling capability.
  * Added `local-inference` network policy preset.
* **Documentation**
  * New integration policy examples (Outlook, Telegram, Slack, Discord, GitHub, Jira, etc.).
  * Clarified config immutability workflow and sandbox writable paths.
  * Hermes baseline network policy documented.
* **Improvements**
  * Health checks treat device-auth responses as live; transient validation retries.
  * Installer performs pre-install reachability checks; CLI onboarding gained a `--fresh` option.
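The two behaviors noted above, treating an HTTP 401 health response as a live endpoint and retrying transient provider validation failures, might look roughly like the sketch below. This is not the NemoClaw implementation: `is_live`, `validate_with_retries`, `TransientValidationError`, the 2xx rule, and the linear backoff are all assumptions made for illustration.

```python
import time
from typing import Callable


def is_live(status_code: int) -> bool:
    """Treat 2xx and 401 as live: a 401 means the server answered but wants auth."""
    # Assumption: only 401 is special-cased; the source mentions "HTTP 401 health handling".
    return 200 <= status_code < 300 or status_code == 401


class TransientValidationError(Exception):
    """Hypothetical marker for a provider validation failure that is worth retrying."""


def validate_with_retries(validate: Callable[[], bool], attempts: int = 3,
                          delay_s: float = 0.5) -> bool:
    """Call validate(), retrying only on TransientValidationError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return validate()
        except TransientValidationError:
            if attempt == attempts:
                raise  # retries exhausted, surface the last failure
            time.sleep(delay_s * attempt)  # simple linear backoff (assumption)
    raise AssertionError("unreachable")
```

Non-transient errors propagate immediately in this sketch, so a misconfigured provider still fails fast instead of being retried.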

## Summary

Stabilizes the nightly E2E follow-up failures by keeping the Kimi scenario on the public NVIDIA Kimi endpoint, recognizing the generated routed Kimi model ref, and repairing narrow OpenClaw scope-approval failures that report nonzero after local state changes. It also aligns the crash-loop recovery assertion with the current gateway guard markers and keeps the test-size budget ratcheted after moving coverage into a focused policy test.

## Related Issue

Related to #2478, #4462, #2620, #3046.

## Changes

- Keep the Kimi E2E on public NVIDIA Endpoints via `nvidia-prod` with `moonshotai/kimi-k2.6`; retain the local mock only behind `NEMOCLAW_KIMI_USE_MOCK=1` and sanitize Kimi failure logs.
- Recognize the generated `inference/moonshotai/kimi-k2.6` model ref in the Kimi compatibility plugin.
- Add constrained OpenClaw approval recovery for failed allowlisted scope upgrades that leave original or replacement pending state, without granting `operator.admin`.
- Wire the recovery helper into sandbox auto-pair approval and the startup guard wrapper.
- Align the crash-loop E2E guard assertion with current gateway safety marker behavior.
- Add/update targeted tests and ratchet the `test/nemoclaw-start.test.ts` size budget downward.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [x] Git hooks passed during commit and push, or `npx prek run --from-ref main --to-ref HEAD` passes
- [x] Targeted tests pass for changed behavior
- [ ] Full `npm test` passes (broad runtime changes only)
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Docs review found no user-facing docs changes were warranted; `docs/` and generated `.agents/skills/` remained clean.

---

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

## Summary by CodeRabbit

* **New Features**
  * Added recovery support for failed OpenClaw device approval flows, including gateway-connect compatibility scenarios.
  * Improved Kimi inference compatibility handling for managed Kimi model references and stream/tool rewrite alignment.
* **Bug Fixes**
  * Auto-pair approval now conditionally retries via the recovery policy on specific approval failures, updating pending/paired scope state.
* **Security & CI**
  * Nightly Kimi inference E2E now supports live-vs-mock execution with public NVIDIA key validation and redacts NVIDIA keys in sanitized logs.
* **Tests / Chores**
  * Expanded E2E and policy recovery tests; updated gateway-guard assertions and adjusted a test file size budget.

---------

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
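The live-vs-mock toggle and log redaction mentioned in this description could be sketched as follows. Only the `NEMOCLAW_KIMI_USE_MOCK=1` switch comes from the text. The `nvapi-` key prefix, the regular expression, and the function names `use_mock` and `sanitize_log` are assumptions for illustration, and the real E2E script is a shell script rather than Python.

```python
import re
from typing import Mapping

# Assumption: NVIDIA API keys can be recognized by an "nvapi-" prefix followed
# by URL-safe characters. The real sanitizer may match a different pattern.
NVIDIA_KEY_RE = re.compile(r"nvapi-[A-Za-z0-9_\-]+")


def use_mock(env: Mapping[str, str]) -> bool:
    """The mock is opt-in: only the exact value "1" enables it."""
    return env.get("NEMOCLAW_KIMI_USE_MOCK") == "1"


def sanitize_log(text: str) -> str:
    """Replace anything that looks like an NVIDIA API key with a fixed marker."""
    return NVIDIA_KEY_RE.sub("nvapi-[REDACTED]", text)
```

The replacement marker does not itself match the pattern, so sanitizing an already sanitized log leaves it unchanged.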

fix(onboard): add Kimi K2.6 cloud model

0279337

ericksoa self-assigned this May 5, 2026

Merge branch 'main' into issue-2620-kimi-k2-6

1390f0b

ericksoa changed the title ~~fix(onboard): add Kimi K2.6 cloud model~~ Fix reasoning model compatibility in the OpenClaw harness May 5, 2026

ericksoa changed the title ~~Fix reasoning model compatibility in the OpenClaw harness~~ fix: support reasoning models in the OpenClaw harness May 5, 2026

ericksoa added 2 commits May 5, 2026 20:40

test: add Kimi inference compat e2e

0c168eb

test: prepare Kimi e2e runner dependencies

eda586c

ericksoa marked this pull request as ready for review May 6, 2026 04:31

Merge remote-tracking branch 'origin/main' into issue-2620-kimi-k2-6

5976c8b

# Conflicts: # Dockerfile

coderabbitai Bot reviewed May 6, 2026

Comment thread .github/workflows/nightly-e2e.yaml

Comment thread scripts/generate-openclaw-config.py

Comment thread src/lib/onboard-providers.ts

Comment thread test/e2e/test-kimi-inference-compat.sh

coderabbitai Bot reviewed May 6, 2026

cv approved these changes May 6, 2026

test: locate Kimi e2e runtime session files

801f5ca

test: fix Kimi check coverage after plugin staging

08e14dd

ericksoa merged commit f5b8144 into main May 6, 2026
60 checks passed

ericksoa deleted the issue-2620-kimi-k2-6 branch May 6, 2026 12:34

miyoungc mentioned this pull request May 6, 2026

docs: prepare 0.0.36 release docs #3151

Merged

github-actions Bot mentioned this pull request May 22, 2026

fix(openclaw): canonicalize mixed Kimi tool calls #4040

Merged

13 tasks

jyaunches mentioned this pull request May 22, 2026

test(e2e): require model-specific coverage for runtime dependency changes in scenario matrix #4042

Closed

wscurran added the `area: cli`, `area: sandbox`, and `bug-fix` labels and removed the `priority: medium` and `bug` labels Jun 3, 2026

cv mentioned this pull request Jun 13, 2026

fix(e2e): stabilize nightly recovery coverage #5401

Merged

13 tasks

coderabbitai Bot commented May 5, 2026 • edited

❌ Failed checks (1 warning, 1 inconclusive)

github-actions Bot commented May 6, 2026

Selective E2E Results — ❌ Some jobs failed

github-actions Bot commented May 6, 2026

Selective E2E Results — ✅ All requested jobs passed

github-actions Bot commented May 6, 2026

Selective E2E Results — ❌ Some jobs failed

github-actions Bot commented May 6, 2026

Selective E2E Results — ✅ All requested jobs passed

github-actions Bot commented May 6, 2026

Selective E2E Results — ✅ All requested jobs passed
