fix: complete #2731, #2697, and #2727 reliability fixes by Saibabu7770 · Pull Request #2737 · NVIDIA/NemoClaw

Saibabu7770 · 2026-04-30T03:44:15Z

Summary

This PR fixes three reliability bugs across onboarding, blueprint apply, and sandbox rebuild flows. It improves tool-call validation for local Ollama/Hermes responses, ensures components.policy.additions from blueprint.yaml are actually applied at runtime, and makes rebuild backups resilient when tar returns partial output due to unreadable/root-owned files.

Related Issue

Closes #2731
Closes #2697
Closes #2727

Changes

Added robust chat-completions tool-call validation for local Ollama/Hermes, including detection of leaked stringified tool-call JSON in message.content.
Updated Local Ollama onboarding validation to require structured chat tool_calls and avoid relying on the /responses path for this provider/runtime combo.
Wired blueprint components.policy.additions into actionApply by fetching the live policy, merging additions into network_policies, and applying via openshell policy set --wait.
Persisted applied policy_additions in run state metadata (plan.json) for traceability.
Updated backup behavior to treat tar exit code 2 as partial success when archive bytes are present, then mark only missing directories as failed.
Added/updated regression tests for all three fixes:
- onboarding tool-call leak and structured tool-call detection
- runner policy additions merge/apply flow
- sandbox backup partial-tar handling

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Summary by CodeRabbit

New Features
- Merge and apply blueprint policy additions into sandbox policies; writes and applies a merged policy when additions exist and persists the chosen additions for the run.
Improvements
- Stricter validation for local model selection and OpenAI-like chat-completions probing, requiring structured chat-completions tool-calls and detecting leaked tool-call payloads; skips probing in certain local flows.
Tests
- Expanded tests for policy merge/apply behavior (including empty, invalid-YAML, and merge cases) and for strict tool-calling probe scenarios.

Signed-off-by: Aaron Erickson aerickson@nvidia.com

copy-pr-bot · 2026-04-30T03:44:18Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-04-30T03:44:39Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Two independent changes: (1) onboarding now enforces structured Chat Completions tool-calling for Ollama/local flows and adds helpers/probes to detect structured or leaked tool-call payloads; (2) blueprint components.policy.additions is validated, merged with the current sandbox policy, written to stateDir as merged-policy.yaml, applied via openshell, and persisted into plan.json, with tests covering success, malformed current policy, and empty-additions no-op.

Changes

Ollama tool-calling validation

Layer / File(s)	Summary
Probe helper import & re-export `src/lib/onboard.ts`, `src/lib/inference/onboard-probes.ts`	Re-exported `hasChatCompletionsToolCall` and `hasChatCompletionsToolCallLeak` so onboarding and tests can use them.
Validation option and flow `src/lib/onboard.ts`	`validateOpenAiLikeSelection` gains `requireChatCompletionsToolCalling?: boolean`; Ollama local selection paths now use `skipResponsesProbe: true` and `requireChatCompletionsToolCalling: true`.
Probe implementation & helpers `src/lib/inference/onboard-probes.ts`	Added parsing helpers to detect structured `tool_calls` and leaked stringified tool-call JSON, `probeChatCompletionsToolCalling`, sandbox-internal fail-closed behavior, and stricter retry handling.
Tests & types `src/lib/inference/onboard-probes.test.ts`, `test/onboard.test.ts`	Added sandbox-internal probe tests, unit tests for detection helpers, and test-type exports/imports to expose helpers to tests.

Blueprint policy additions application

Layer / File(s)	Summary
Policy types & validators `nemoclaw/src/blueprint/runner.ts`	Added TypeScript types and predicate helpers to validate `components.policy.additions` schema (rules/endpoints and enums for methods/protocol/enforcement/tls).
Parse & merge utilities `nemoclaw/src/blueprint/runner.ts`	Added `parseCurrentPolicy(raw)` to extract YAML after the first `---` and `mergePolicyAdditions(currentPolicyRaw, additions)` which preserves other top-level keys, normalizes `version`, and merges `network_policies`.
actionApply wiring `nemoclaw/src/blueprint/runner.ts`	Create `stateDir` early; when `policy_additions` non-empty, fetch current policy, merge, write `merged-policy.yaml` with `0600` into stateDir, call `openshell policy set --policy <file> --wait <sandbox>`, and persist `policy_additions` into `plan.json`.
Tests `nemoclaw/src/blueprint/runner.test.ts`	Tests updated/added: loadBlueprint success for schema-valid additions, loadBlueprint failure for invalid additions, actionApply success merging+apply, actionApply failure on malformed current policy (no set), and actionApply no-op when additions empty.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Runner as actionApply
  participant Openshell
  participant FS as stateDir
  User->>Runner: run actionApply with blueprint
  Runner->>Runner: create stateDir, compute policyAdditions
  alt policyAdditions present
    Runner->>Openshell: openshell policy get --full <sandbox>
    Openshell-->>Runner: raw policy (--- + YAML)
    Runner->>Runner: parse & merge additions
    Runner->>FS: write merged-policy.yaml (0600)
    Runner->>Openshell: openshell policy set --policy merged-policy.yaml --wait <sandbox>
    Openshell-->>Runner: result
  else empty
    Runner-->Runner: skip openshell policy calls
  end

sequenceDiagram
  participant Onboard
  participant Probe as probeChatCompletionsToolCalling
  participant Endpoint as ModelEndpoint
  Onboard->>Probe: requireChatCompletionsToolCalling probe
  Probe->>Endpoint: POST /v1/chat/completions (force tool)
  Endpoint-->>Probe: response (choices -> message -> tool_calls / content)
  Probe->>Probe: validate structured tool_calls or detect leaked JSON
  alt structured tool_calls
    Probe-->>Onboard: ok:true
  else leaked or invalid
    Probe-->>Onboard: ok:false, error
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I sniffed the calls that hide in text,
I stitched the policies the blueprint next.
I probe for tools with careful art,
I write the merge and set it apart.
The rabbit hops—persist, apply, restart.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'fix: complete `#2731`, `#2697`, and `#2727` reliability fixes' accurately summarizes the main changes across onboarding tool-call validation, blueprint policy additions, and sandbox rebuild handling.
Linked Issues check	✅ Passed	All three linked issues are addressed: `#2731` tool-call leak detection implemented via hasChatCompletionsToolCall/hasChatCompletionsToolCallLeak helpers and probeChatCompletionsToolCalling probe; `#2697` policy additions now merged and applied via parseCurrentPolicy/mergePolicyAdditions; `#2727` tar partial-failure handling added (though tar exit code 2 resilience not visible in summaries).
Out of Scope Changes check	✅ Passed	All changes align with stated objectives: onboarding probe enhancements, policy additions merge/apply, and test coverage. No unrelated alterations detected in the file summaries.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

nemoclaw/src/blueprint/runner.ts (1)
224-230: ⚡ Quick win

Add try-catch around YAML.parse() to handle malformed policy output gracefully.

If openshell policy get --full returns output that passes the initial checks but contains malformed YAML (e.g., truncated output, error messages embedded in the response), YAML.parse() will throw an unhandled exception with a confusing YAMLParseError message. The established pattern in src/lib/policies.ts wraps this in a try-catch and returns a safe fallback.
🛡️ Proposed fix to add error handling
 function parseCurrentPolicy(raw: string): UnknownRecord {
   const sep = raw.indexOf("---");
   const yaml = (sep >= 0 ? raw.slice(sep + 3) : raw).trim();
   if (!yaml) return {};
-  const parsed = YAML.parse(yaml);
-  return isObjectLike(parsed) ? parsed : {};
+  try {
+    const parsed = YAML.parse(yaml);
+    return isObjectLike(parsed) ? parsed : {};
+  } catch {
+    return {};
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/blueprint/runner.ts` around lines 224 - 230, parseCurrentPolicy
currently calls YAML.parse(yaml) without protection; wrap the parse in a
try-catch inside parseCurrentPolicy so malformed YAML from `openshell policy get
--full` doesn't throw: catch parsing errors, optionally log or debug the error,
and return the safe fallback (an empty object) if parse fails; ensure you still
return parsed when isObjectLike(parsed) is true and keep the same function
signature and behavior otherwise.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard-inference-probes.ts`:
- Around line 62-73: The hasChatCompletionsToolCall detector currently treats
any tool_calls[*].function.name as a valid tool call; tighten validation in
hasChatCompletionsToolCall (which uses parseJsonObject) to require the proper
function-call shape: ensure each call is an object, that call.function is an
object with call.function.type === "function", a non-empty string
call.function.name, and a present arguments field (e.g., "arguments" in call or
typeof call.arguments === "string"/"object" depending on payload expectations)
before returning true; update the tool_calls and call.function checks to enforce
these conditions so malformed payloads no longer produce false positives.

In `@test/onboard.test.ts`:
- Around line 627-659: The test only covers string-valued message.content but
not the array-form content items the detector supports; update the test for
hasChatCompletionsToolCallLeak to include at least one case where
message.content is an array like [{ text: '...'}] containing the stringified
tool-call JSON (expect true) and one array-form normal text item (expect false),
and keep the existing malformed input case; locate and modify the test block
using the hasChatCompletionsToolCallLeak calls to add these array-form
assertions.

---

Nitpick comments:
In `@nemoclaw/src/blueprint/runner.ts`:
- Around line 224-230: parseCurrentPolicy currently calls YAML.parse(yaml)
without protection; wrap the parse in a try-catch inside parseCurrentPolicy so
malformed YAML from `openshell policy get --full` doesn't throw: catch parsing
errors, optionally log or debug the error, and return the safe fallback (an
empty object) if parse fails; ensure you still return parsed when
isObjectLike(parsed) is true and keep the same function signature and behavior
otherwise.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f8816128-c928-4691-bb9e-e4b9b7dff3ad

📥 Commits

Reviewing files that changed from the base of the PR and between 6677d48 and ef604a4.

📒 Files selected for processing (7)

nemoclaw/src/blueprint/runner.test.ts
nemoclaw/src/blueprint/runner.ts
src/lib/onboard-inference-probes.ts
src/lib/onboard.ts
src/lib/sandbox-state.ts
test/onboard.test.ts
test/shields.test.ts

wscurran · 2026-04-30T14:37:56Z

✨ Thanks for submitting this PR that fixes reliability bugs across onboarding, blueprint apply, and sandbox rebuild flows, and improves tool-call validation for local Ollama/Hermes responses.

Related open issues:

#2697 blueprint.yaml's components.policy.additions is parsed but never applied (silent no-op)
#2727 nemoclaw rebuild aborts when files in .openclaw-data are root-owned
#2731 Hermes-3 8B tool-call template leaks tool calls into content as text under realistic agent shape (Ollama)

…ixes Improve Ollama chat-completions tool-call validation, apply blueprint policy additions during runner apply, and make rebuild backups tolerate partial tar output from root-owned files.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

test/onboard.test.ts (1)

890-922: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add array-form message.content leak coverage.

hasChatCompletionsToolCallLeak also handles array-form content items, but this test only covers string content. Please add one positive and one negative array-form case.

Suggested test additions

   it("detects leaked stringified tool-call JSON in chat-completions content", () => {
@@
     ).toBe(true);
+    expect(
+      hasChatCompletionsToolCallLeak(
+        JSON.stringify({
+          choices: [
+            {
+              message: {
+                role: "assistant",
+                content: [
+                  {
+                    type: "text",
+                    text: '{"name":"sessions_send","arguments":{"message":"hello"}}',
+                  },
+                ],
+                tool_calls: null,
+              },
+            },
+          ],
+        }),
+      ),
+    ).toBe(true);
     expect(
       hasChatCompletionsToolCallLeak(
         JSON.stringify({
@@
     ).toBe(false);
+    expect(
+      hasChatCompletionsToolCallLeak(
+        JSON.stringify({
+          choices: [
+            {
+              message: {
+                role: "assistant",
+                content: [{ type: "text", text: "Regular assistant text response." }],
+                tool_calls: null,
+              },
+            },
+          ],
+        }),
+      ),
+    ).toBe(false);
     expect(hasChatCompletionsToolCallLeak("{")).toBe(false);
   });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/onboard.test.ts` around lines 890 - 922, Update the "detects leaked
stringified tool-call JSON in chat-completions content" test to also cover
array-form message.content cases: add one positive case where message.content is
an array (e.g., [{type: "output_text", text: '{ "arguments":{...},
"name":"sessions_send" }'}]) that should make
hasChatCompletionsToolCallLeak(...) return true, and one negative case where
message.content is an array of normal text items that should return false;
locate the test block containing hasChatCompletionsToolCallLeak and add these
two expectations mirroring the existing string-form assertions.

🧹 Nitpick comments (2)

test/onboard.test.ts (1)

85-86: ⚡ Quick win

Fail fast when newly required internals are missing.

Lines 85-86 and 189-190 add new required helpers, but isOnboardTestInternals does not validate these keys. If exports drift, the suite fails later with less actionable errors.

Suggested patch

 function isOnboardTestInternals(
   value: OnboardTestInternalsCandidate,
 ): value is OnboardTestInternals {
   return (
@@
     typeof value.classifySandboxCreateFailure === "function" &&
+    typeof value.hasChatCompletionsToolCall === "function" &&
+    typeof value.hasChatCompletionsToolCallLeak === "function" &&
     typeof value.getDefaultSandboxNameForAgent === "function" &&
@@
   );
 }

Also applies to: 189-190

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/onboard.test.ts` around lines 85 - 86, Update the isOnboardTestInternals
runtime type guard to ensure the newly required helpers are present: validate
that the object has functions named hasChatCompletionsToolCall and
hasChatCompletionsToolCallLeak (accepting optional string|null and returning
boolean) alongside the existing checks so missing exports fail fast; modify the
isOnboardTestInternals implementation to check typeof
internals.hasChatCompletionsToolCall === 'function' and typeof
internals.hasChatCompletionsToolCallLeak === 'function' and include them in the
overall boolean result returned by isOnboardTestInternals.

nemoclaw/src/blueprint/runner.ts (1)

509-510: ⚡ Quick win

Consider restricting file permissions for the merged policy file.

The established pattern in src/lib/policies.ts (context snippet 2) writes policy files with mode: 0o600 to restrict access. The merged policy may contain sensitive network configuration.

🔒 Proposed fix

     const mergedPolicyFile = join(stateDir, "merged-policy.yaml");
-    writeFileSync(mergedPolicyFile, mergePolicyAdditions(currentPolicy.stdout, policyAdditions));
+    writeFileSync(mergedPolicyFile, mergePolicyAdditions(currentPolicy.stdout, policyAdditions), {
+      encoding: "utf-8",
+      mode: 0o600,
+    });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/blueprint/runner.ts` around lines 509 - 510, The merged policy
file is written without file-permission restrictions; update the write to use a
restricted mode (0o600) so only the owner can read/write the merged policy.
Locate the mergedPolicyFile and the writeFileSync call that writes
mergePolicyAdditions(currentPolicy.stdout, policyAdditions) (in the runner
module) and change the write to include file-mode options (mode: 0o600) or an
equivalent API to set permissions after write, ensuring the merged policy
inherits the same access restrictions used elsewhere (see pattern in
src/lib/policies.ts).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemoclaw/src/blueprint/runner.ts`:
- Around line 224-230: The function parseCurrentPolicy shadows the imported path
separator named sep and calls YAML.parse without error handling; rename the
local separator variable (e.g., sepIndex or idx) to avoid shadowing the imported
sep, wrap the YAML.parse call in a try/catch following the pattern in
src/lib/policies.ts, and return {} (and optionally log the parse error) on parse
failure while still returning parsed only if isObjectLike(parsed) is true;
update references inside parseCurrentPolicy accordingly.

---

Duplicate comments:
In `@test/onboard.test.ts`:
- Around line 890-922: Update the "detects leaked stringified tool-call JSON in
chat-completions content" test to also cover array-form message.content cases:
add one positive case where message.content is an array (e.g., [{type:
"output_text", text: '{ "arguments":{...}, "name":"sessions_send" }'}]) that
should make hasChatCompletionsToolCallLeak(...) return true, and one negative
case where message.content is an array of normal text items that should return
false; locate the test block containing hasChatCompletionsToolCallLeak and add
these two expectations mirroring the existing string-form assertions.

---

Nitpick comments:
In `@nemoclaw/src/blueprint/runner.ts`:
- Around line 509-510: The merged policy file is written without file-permission
restrictions; update the write to use a restricted mode (0o600) so only the
owner can read/write the merged policy. Locate the mergedPolicyFile and the
writeFileSync call that writes mergePolicyAdditions(currentPolicy.stdout,
policyAdditions) (in the runner module) and change the write to include
file-mode options (mode: 0o600) or an equivalent API to set permissions after
write, ensuring the merged policy inherits the same access restrictions used
elsewhere (see pattern in src/lib/policies.ts).

In `@test/onboard.test.ts`:
- Around line 85-86: Update the isOnboardTestInternals runtime type guard to
ensure the newly required helpers are present: validate that the object has
functions named hasChatCompletionsToolCall and hasChatCompletionsToolCallLeak
(accepting optional string|null and returning boolean) alongside the existing
checks so missing exports fail fast; modify the isOnboardTestInternals
implementation to check typeof internals.hasChatCompletionsToolCall ===
'function' and typeof internals.hasChatCompletionsToolCallLeak === 'function'
and include them in the overall boolean result returned by
isOnboardTestInternals.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 82c81299-e984-4adf-a63e-ee24013306c9

📥 Commits

Reviewing files that changed from the base of the PR and between ef604a4 and f632cc0.

📒 Files selected for processing (7)

nemoclaw/src/blueprint/runner.test.ts
nemoclaw/src/blueprint/runner.ts
src/lib/onboard-inference-probes.ts
src/lib/onboard.ts
src/lib/sandbox-state.ts
test/onboard.test.ts
test/shields.test.ts

✅ Files skipped from review due to trivial changes (4)

test/shields.test.ts
src/lib/onboard.ts
nemoclaw/src/blueprint/runner.test.ts
src/lib/sandbox-state.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/lib/onboard-inference-probes.ts

Replays blueprint policy additions application and strict Ollama chat-completions tool-call validation on current main. Drops the rebuild backup portion because NVIDIA#2734 already covers NVIDIA#2727.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

src/lib/inference/onboard-probes.ts (1)
83-94: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tighten tool_calls validation before treating this probe as a pass.

This still accepts malformed entries like { function: { name: "x" } }, so strict onboarding can false-pass a backend that will fail once dispatch expects a real function call shape. Require type === "function" and a present arguments field before returning true (and ideally don’t stop at only choices[0]).
For the OpenAI-compatible Chat Completions API, what fields are required on each `message.tool_calls[]` entry for a valid function tool call? Please verify whether a valid entry requires `type: "function"` plus `function.name` and `function.arguments`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/inference/onboard-probes.ts` around lines 83 - 94, The
hasChatCompletionsToolCall probe is too permissive: tighten validation inside
hasChatCompletionsToolCall so it verifies each message.tool_calls entry has type
=== "function", a function object with a non-empty name string, and a present
arguments field (ensure typeof arguments !== "undefined" or is an object/string
as your runtime expects) before returning true; also iterate over all
parsed.choices (not only choices[0]) and check each choice.message.tool_calls
array entries for the stricter shape (use symbols: hasChatCompletionsToolCall,
parsed.choices, message.tool_calls, call.type, call.function.name,
call.function.arguments).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/inference/onboard-probes.ts`:
- Around line 68-81: parseStringifiedToolCall currently only accepts a bare
{name, arguments} object; update it to also detect and unwrap common Chat
Completions wrapper shapes so leaks like {type: "function", function: {...}} or
{tool_calls: [...]} are recognized. Modify parseStringifiedToolCall to
JSON.parse the trimmed string as before, then if the parsed object lacks direct
name/arguments, check for parsed.function (and treat parsed.function as the
candidate), check for parsed.tool_calls being an array and use
parsed.tool_calls[0], and any similar wrapper that contains a nested object with
name (string) and arguments (present). After unwrapping run the same validations
(name is non-empty string, hasOwnProperty "arguments") and return the inner tool
object or null.
- Around line 516-525: The retry branch must use the same structured-tool-call
probe when requireChatCompletionsToolCalling is true; modify the timeout-retry
path so it calls probeChatCompletionsToolCalling(endpointUrl, model, apiKey, {
authMode: options.authMode }) instead of falling back to
runChatCompletionsProbe(...). Locate the retry logic that currently chooses
between probeChatCompletionsToolCalling and runChatCompletionsProbe (using
symbols requireChatCompletionsToolCalling, probeChatCompletionsToolCalling,
runChatCompletionsProbe, authHeader, appendKey) and ensure both initial and
doubled-timeout retry branches preserve the same probe selection based on
options.requireChatCompletionsToolCalling.

---

Duplicate comments:
In `@src/lib/inference/onboard-probes.ts`:
- Around line 83-94: The hasChatCompletionsToolCall probe is too permissive:
tighten validation inside hasChatCompletionsToolCall so it verifies each
message.tool_calls entry has type === "function", a function object with a
non-empty name string, and a present arguments field (ensure typeof arguments
!== "undefined" or is an object/string as your runtime expects) before returning
true; also iterate over all parsed.choices (not only choices[0]) and check each
choice.message.tool_calls array entries for the stricter shape (use symbols:
hasChatCompletionsToolCall, parsed.choices, message.tool_calls, call.type,
call.function.name, call.function.arguments).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8880a1bb-b922-4aa6-a659-046d5a677a5b

📥 Commits

Reviewing files that changed from the base of the PR and between c0f3b90 and 5d0a127.

📒 Files selected for processing (4)

nemoclaw/src/blueprint/runner.test.ts
nemoclaw/src/blueprint/runner.ts
src/lib/inference/onboard-probes.test.ts
src/lib/inference/onboard-probes.ts

ericksoa

Approved after reviewing the current head and validating that checks pass.

## Summary - Bump the docs release metadata to `0.0.38`. - Document release-prep updates for status policy versions, Local Ollama validation and cleanup, blueprint policy additions, rebuild backup handling, and NemoHermes uninstall branding. - Refresh generated `nemoclaw-user-*` skills from the updated docs. ## Source summary - #3185 -> `docs/reference/commands.md`: Documents that `nemoclaw <name> status` displays the gateway active policy version when OpenShell reports one. - #3167 -> `docs/reference/commands.md`, `docs/inference/use-local-inference.md`: Documents uninstall cleanup for matching Local Ollama auth proxy processes. - #2737 -> `docs/inference/use-local-inference.md`, `docs/network-policy/customize-network-policy.md`, `docs/manage-sandboxes/lifecycle.md`, `docs/reference/commands.md`: Documents stricter Local Ollama tool-call validation, blueprint policy additions, and partial rebuild backup handling. - #3220 -> `docs/reference/commands.md`: Documents NemoHermes-specific uninstall progress and completion text. - #3158 -> `.agents/skills/nemoclaw-user-configure-inference/*`: Refreshes generated user skills from existing `docs/inference/switch-inference-providers.md` heartbeat documentation. - #3199 -> `.agents/skills/nemoclaw-user-get-started/SKILL.md`: Refreshes generated user skills from existing `docs/get-started/quickstart.md` Model Router wording. ## Skipped - #3272 and #3268 were already documented by their merged docs updates on `main`. - #3154, #3216, #3166, and #3195 have no additional user-facing docs impact for this release-prep pass. - No commits matched `docs/.docs-skip`. ## Test plan - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user` - `make docs` - `npm run build:cli` - Commit and pre-push hooks: markdownlint, docs-to-skills verification, gitleaks, commitlint, skills YAML tests, CLI typecheck Made with [Cursor](https://cursor.com)  ## Summary by CodeRabbit * **Behavior Changes** * Rebuild now safely handles partial backups, preserving successfully captured entries while reporting only unarchived paths * Uninstall for Local Ollama setups now stops proxy processes before cleanup * Local Ollama models require stricter tool-call response validation during onboarding * Blueprint policy additions enable custom network policy extensions via `components.policy.additions` * New `NEMOCLAW_AGENT_HEARTBEAT_EVERY` configuration controls agent periodic task frequency * Status display now shows active policy version when available  Co-authored-by: Cursor <cursoragent@cursor.com>

…ture (#3363)  ## Summary Supersedes #3290. Introduces declarative **scenario-based E2E orchestration** and the **Phase 1 pre-flight infrastructure** that prepares for wave-by-wave migration of the existing `test/e2e/test-*.sh` scripts into the new structure. The old legacy scripts remain in place and unchanged; no E2E workflow is deleted or muted by this PR. The scope grew beyond the original #3290 title after rebase against `main` and the addition of Phase 1 helpers, fixtures, parity harness, and convention lint. Because upstream rules forbid force-push on feature branches, this is opened as a fresh PR with clean linear history rather than rewriting #3290 in place. ## Related Issue Supersedes #3290. No single tracking issue — foundation for the E2E scenario-matrix refactor discussed across #2737, #3154, and the E2E test-gap audit. ## Changes ### Part 1 — scenario-based E2E orchestration (commits 1–3, previously in #3290) - **Declarative metadata** (`test/e2e/scenarios.yaml`, `expected-states.yaml`, `suites.yaml`) for initial scenarios: Ubuntu cloud OpenClaw/Hermes, macOS, WSL, GPU local Ollama, Brev launchable, no-Docker negative. - **TypeScript resolver/validator/plan-printer/coverage report** under `test/e2e/resolver/` (invoked via `tsx`, uses `js-yaml` already in root `package.json`; unit-tested in the Vitest `cli` project). - **Shell helpers** under `test/e2e/lib/` that produce a normalized context at `$E2E_CONTEXT_DIR` (default `.e2e/`). Wraps existing `sandbox-teardown.sh` and `install-path-refresh.sh`; does not duplicate them. - **Entry scripts**: `run-scenario.sh`, `run-suites.sh`, `coverage-report.sh` with `--plan-only`, `--dry-run` (`E2E_DRY_RUN=1`) flags. - **Suite scripts** under `test/e2e/suites/{smoke,inference,credentials,lifecycle,messaging,onboarding,platform,sandbox,security}/`. - **New workflow** `.github/workflows/e2e-scenarios.yaml` — manual (`workflow_dispatch`) with `scenario`, `plan_only`, and `suite_filter` inputs. Existing workflows are unchanged. - `.gitignore`: add `.e2e/` runtime context directory. ### Part 2 — Phase 1 pre-flight migration infrastructure (commits 4–7, new) - **`test/e2e/MIGRATION.md`** — in-tree tracker mapping each legacy `test-*.sh` script to its target scenario + suite home. Includes the reuse-absorption table showing where pair-variant coverage (e.g. OpenClaw + Hermes) collapses into a single suite. - **Fixtures** (`test/e2e/lib/fixtures/`): `fake-openai.sh`, `fake-{telegram,discord,slack}.sh` (built on shared `_fake-http-stub.sh`), `older-base-image.sh`. Follow the inline-Node `http.createServer` pattern from `test-messaging-providers.sh`; no new framework. - **Helpers** (`test/e2e/lib/`): `logging.sh` (PASS/FAIL/=== Phase markers), `sandbox-exec.sh` (`e2e_sandbox_exec` / `e2e_sandbox_exec_stdin`). `env.sh` auto-sources `logging.sh`. - **Assertion helpers** (`test/e2e/lib/assert/`): `inference-works.sh`, `no-credentials-leaked.sh`, `policy-preset-applied.sh`, `messaging-bridge-reachable.sh`. - **Install splits** (`test/e2e/lib/setup/`): `install-{repo,curl,ollama,launchable}.sh` + `install.sh` dispatcher, so each migrated suite can compose the minimal install it needs. - **Runtime probe wiring**: `run-scenario.sh --validate-only` flag (mutually exclusive with `--plan-only`) that reads `$E2E_CONTEXT_DIR/context.env`, runs the expected-state validator, and emits JSON — skipping install/onboard/suites. `resolver/index.ts` gains `E2E_PROBE_OVERRIDES_JSON` escape hatch for probe keys with embedded underscores. - **Convention lint**: `scripts/e2e/lint-conventions.ts` (6 rules, Vitest-backed in the `cli` project) blocks new orphan `test-*.sh` scripts and enforces `parity-map.yaml` completeness. - **Parity harness**: `scripts/e2e/compare-parity.sh`, `test/e2e/parity-map.yaml` (seeded with 39 legacy-script placeholder entries), `.github/workflows/e2e-parity-compare.yaml`. This is the mechanism that will drive waves 2–12 of the migration — each migrated suite gets dispatched against the parity-compare workflow until zero divergence vs. its legacy counterpart. ### What this PR does **not** change - Existing E2E workflows (`nightly-e2e.yaml`, `macos-e2e.yaml`, `wsl-e2e.yaml`, `ollama-proxy-e2e.yaml`, `e2e-branch-validation.yaml`, `sandbox-images-and-e2e.yaml`) run as before. - Existing `test/e2e/test-*.sh` scripts are **not** deleted. Per-wave retirement happens in follow-up PRs once `parity-map.yaml` entries report zero divergence. - No new behavior in the product itself; this is test-harness scaffolding only. ### Migration sequencing (follow-up PRs) Waves 2–12 migrate the 40 legacy `test-*.sh` scripts into the scenario matrix, one wave per PR, gated by parity-compare zero-divergence. Phase 13 is the merge gate that flips the default CI dispatch from `nightly-e2e.yaml` to `e2e-parity-compare.yaml`. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [x] `npm test` passes — 3,258 `cli` tests, 0 failures (41 new tests from Phase 1) - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes (`test/e2e/README.md`, `test/e2e/MIGRATION.md`, `AGENTS.md`) - [ ] `make docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ### Fork E2E evidence (Phase 1 branch dispatch) Dispatched `nightly-e2e.yaml` on `jyaunches/NemoClaw` against this branch (with a temporary local-only repo-gate lift, not part of this PR). Result: | Outcome | Count | Notes | |---|---|---| | ✅ success | 35 | All CPU E2E jobs green on fork | | ⏭ skipped by design | 2 | `gpu-e2e`, `gpu-double-onboard-e2e` — need NVIDIA self-hosted GPU runner | | ⏸ unrunnable on fork | 1 | `hermes-slack-e2e` — needs `linux-amd64-cpu4` runner (NVIDIA-internal) | | ⚠️ pre-existing fork-environment hang | 1 | `hermes-e2e` hangs during `nemohermes onboard` on fork (not a Phase 1 regression — same legacy script, no changes to onboard path) | **35/35 of jobs that could run on the fork pass.** This attests that Phase 1 infrastructure does not regress any existing E2E coverage. --- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>  ## Summary by CodeRabbit * **New Features** * Introduced scenario-based end-to-end testing framework with declarative setup profiles and ordered validation suites. * **Tests** * Added CI workflows for scenario execution and parity comparison with legacy tests. * New validation suites for smoke tests, inference, credentials, platform checks, and security. * **Documentation** * Added E2E testing guide and migration tracker for moving tests to scenario-based architecture. * **Chores** * Added test infrastructure, fixtures, and assertion helpers supporting multiple platforms and configurations.  [![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3363)   --------- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread src/lib/inference/onboard-probes.ts

Comment thread test/onboard.test.ts

wscurran added bug Something fails against expected or documented behavior dependencies Pull requests that update a dependency file provider: ollama Ollama local model provider behavior labels Apr 30, 2026

wscurran added the integration: hermes Hermes integration behavior label Apr 30, 2026

This was referenced Apr 30, 2026

Hermes-3 8B tool-call template leaks tool calls into content as text under realistic agent shape (Ollama) #2731

Closed

nemoclaw rebuild aborts when files in .openclaw-data are root-owned #2727

Closed

ericksoa self-assigned this May 2, 2026

fix: complete NVIDIA#2731, NVIDIA#2697, and NVIDIA#2727 reliability f…

f632cc0

…ixes Improve Ollama chat-completions tool-call validation, apply blueprint policy additions during runner apply, and make rebuild backups tolerate partial tar output from root-owned files.

Saibabu7770 force-pushed the fix/2731-ollama-tool-call-fallback branch from ef604a4 to f632cc0 Compare May 2, 2026 19:36

coderabbitai Bot reviewed May 2, 2026

View reviewed changes

Comment thread nemoclaw/src/blueprint/runner.ts

Saibabu7770 and others added 4 commits May 2, 2026 15:18

Merge branch 'main' into fix/2731-ollama-tool-call-fallback

191a419

Merge branch 'main' into fix/2731-ollama-tool-call-fallback

497b2e5

Merge main and finish reliability fixes

c0f3b90

Replays blueprint policy additions application and strict Ollama chat-completions tool-call validation on current main. Drops the rebuild backup portion because NVIDIA#2734 already covers NVIDIA#2727.

Harden policy additions and Ollama validation

5d0a127

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

Comment thread src/lib/inference/onboard-probes.ts

Comment thread src/lib/inference/onboard-probes.ts

ericksoa added 4 commits May 8, 2026 07:44

Keep strict tool-call retries strict

917d3e7

Tighten chat tool-call probe detection

bf62600

test: cover strict Ollama and policy validation

7957e64

style: format blueprint policy tests

99133f2

ericksoa approved these changes May 8, 2026

View reviewed changes

ericksoa enabled auto-merge (squash) May 8, 2026 17:28

ericksoa disabled auto-merge May 8, 2026 17:30

ericksoa merged commit 97f8a9a into NVIDIA:main May 8, 2026
12 of 13 checks passed

jyaunches mentioned this pull request May 8, 2026

fix(runtime): repair stale inference DNS routes #3267

Merged

miyoungc mentioned this pull request May 8, 2026

docs: prepare 0.0.38 release docs #3296

Merged

wscurran mentioned this pull request May 11, 2026

feat(e2e): introduce scenario-based setup matrix and runner #3290

Closed

12 tasks

jyaunches mentioned this pull request May 11, 2026

feat(e2e): scenario-based setup matrix + Phase 1 migration infrastructure #3363

Merged

12 tasks

wscurran added bug-fix PR fixes a bug or regression chore Build, CI, dependency, or tooling maintenance and removed bug Something fails against expected or documented behavior chore Build, CI, dependency, or tooling maintenance labels Jun 3, 2026

Conversation

Saibabu7770 commented Apr 30, 2026 • edited by ericksoa Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented Apr 30, 2026

Uh oh!

coderabbitai Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

wscurran commented Apr 30, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ericksoa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Saibabu7770 commented Apr 30, 2026 •

edited by ericksoa

Loading

coderabbitai Bot commented Apr 30, 2026 •

edited

Loading