fix(inference): inject tool-less system prompt for Ultra 550B (#4851) by cjagwani · Pull Request #5085 · NVIDIA/NemoClaw

cjagwani · 2026-06-09T20:01:29Z

Summary

When nvidia/nemotron-3-ultra-550b-a55b is asked to perform a multi-step task (e.g. "create a file then run it") and the request has no system message and no execution-capable tools, the model plans all steps in reasoning_content but silently drops intermediate steps from content. The reporter saw content with only the final run command; in my repro content was empty. chat_template_kwargs.force_nonempty_content does not help — verified by direct curl to NVIDIA Endpoints with and without that kwarg.

Extends the existing nemotron-inference-fix preload to also inject a one-paragraph system message when the model matches Ultra 550B AND the caller supplied no system message AND no execution-capable tools (bash_execute, write_file, etc. — tight allowlist, not a substring regex). Scoped narrowly so it never overrides caller intent, never interferes with tool-using flows, and doesn't trip on harmless business tools like create_ticket or run_query.
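
The gate described above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `shouldInject`, `applyToolLessSystemPromptSketch`, `PLACEHOLDER_PROMPT`, the exact-equality model check, and the two-entry allowlist are all assumptions made for the example.

```javascript
// Hypothetical sketch of the three-condition gate; not the actual preload code.
const ULTRA_MODEL = "nvidia/nemotron-3-ultra-550b-a55b";

// Illustrative subset only; the real allowlist in the PR is longer.
const EXECUTION_TOOL_NAMES = new Set(["bash_execute", "write_file"]);

// Placeholder; the real prompt wording is not quoted in this thread.
const PLACEHOLDER_PROMPT = "<one-paragraph tool-less system prompt>";

// Read a tool name from either { function: { name } } or { name }.
function toolName(tool) {
  if (!tool || typeof tool !== "object") return "";
  const nested =
    tool.function && typeof tool.function === "object" ? tool.function.name : undefined;
  const name = typeof nested === "string" ? nested : tool.name;
  return typeof name === "string" ? name.toLowerCase() : "";
}

function shouldInject(body) {
  // Condition 1: model must match (exact equality is an assumption here).
  if (!body || body.model !== ULTRA_MODEL) return false;
  if (!Array.isArray(body.messages)) return false;
  // Condition 2: no caller-supplied system message anywhere in the array.
  const hasSystem = body.messages.some(
    (m) => m !== null && typeof m === "object" && m.role === "system",
  );
  if (hasSystem) return false;
  // Condition 3: no execution-capable tool, judged by exact allowlist lookup.
  const tools = Array.isArray(body.tools) ? body.tools : [];
  return !tools.some((t) => EXECUTION_TOOL_NAMES.has(toolName(t)));
}

// Prepends the system message in place and reports whether the body changed.
function applyToolLessSystemPromptSketch(body) {
  if (!shouldInject(body)) return false;
  body.messages = [{ role: "system", content: PLACEHOLDER_PROMPT }, ...body.messages];
  return true;
}
```

Returning a boolean lets a caller decide whether the body needs to be re-serialized at all.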

Changes

nemoclaw-blueprint/scripts/nemotron-inference-fix.js:
- Add TOOL_LESS_SYSTEM_PROMPT_RULES + applyToolLessSystemPrompt chained into patchJsonBody alongside applyChatTemplateKwargs
- Tight EXECUTION_TOOL_NAMES allowlist (Set of canonical exec/write tool names) to avoid false positives on harmless tool names containing substrings like "create", "run", "save", "command"
- Scan ALL messages (not just messages[0]) for an existing system message
- Source-of-truth / removal contract block matching the existing #4063 format (invalid state, source boundary, why-not-fix-source, regression proof, removal condition)
- Comment documenting path+model as the intentional trust boundary (this preload only runs inside NemoClaw-managed sandboxes via NODE_OPTIONS)
test/nemotron-inference-fix.test.ts:
- New (#4851) test with 9 branches: inject, skip-system-at-0, skip-system-at-mid, skip-with-exec-tool, skip-non-matching-model, inject-with-non-exec-tools, skip-with-mixed-tools-containing-exec, inject-with-broad-token-business-tools (create_ticket/run_query/save_search/command_palette), skip-with-write_file
- New `pins path+model as the intended scope boundary` contract test asserting that requests to all three hosts (inference.local, integrate.api.nvidia.com, some-other-openai-compat-host) get the injection, which pins the documented scope contract
- Extended real-fetch/undici test to cover Ultra 550B injection AND Content-Length refresh
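
For reviewers, the chaining and Content-Length refresh listed above might look roughly like this. It is a sketch under assumptions: `patchJsonBodySketch`, `refreshContentLength`, and the `addKwargs` mutator are hypothetical names, and the real `patchJsonBody` may differ in signature and error handling.

```javascript
// Hypothetical sketch: run each mutator in sequence and re-serialize only on change.
function patchJsonBodySketch(rawBody, mutators) {
  let body;
  try {
    body = JSON.parse(rawBody);
  } catch {
    return null; // Not JSON: leave the request untouched.
  }
  if (!body || typeof body !== "object" || Array.isArray(body)) return null;

  let changed = false;
  for (const mutate of mutators) {
    // Always call the mutator; only the "changed" flag accumulates.
    changed = mutate(body) || changed;
  }
  return changed ? JSON.stringify(body) : null;
}

// Content-Length counts bytes, not characters, so measure the UTF-8 encoding.
function refreshContentLength(headers, patchedBody) {
  return { ...headers, "content-length": String(Buffer.byteLength(patchedBody, "utf8")) };
}

// Illustrative mutator standing in for applyChatTemplateKwargs (assumed behavior).
const addKwargs = (body) => {
  if (body.chat_template_kwargs) return false;
  body.chat_template_kwargs = { force_nonempty_content: true };
  return true;
};
```

Returning `null` for "no change" lets the wrapper forward the original bytes and headers untouched, which is why the header refresh only runs when a patched body exists.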

Live verification (GCP Brev box, direct curl to `integrate.api.nvidia.com`)

Variant	content length
Plain request (no preload)	1 char
+ `chat_template_kwargs.force_nonempty_content` only (existing preload)	1 char (no help)
+ this PR's system-prompt injection	501 chars (full heredoc + `python3 /tmp/hello.py`)
With caller-supplied system message	caller's preserved, no injection
With execution-capable tools	`finish_reason: tool_calls`, no injection (correct)

Reasoning side stays stable (~184 chars) — the fix lets the model emit what it was already planning.
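
To make the table reproducible, the length columns could be derived from a response with a helper like the one below. It is only a sketch: `summarizeCompletion` is a hypothetical name, and it assumes `reasoning_content` sits next to `content` on `choices[0].message`.

```javascript
// Hypothetical helper: reduce a chat-completions response to the numbers in the table.
function summarizeCompletion(response) {
  const choice =
    response && Array.isArray(response.choices) ? response.choices[0] : undefined;
  const message = (choice && choice.message) || {};
  // Treat null or missing fields as length 0 so tool-call responses do not throw.
  const len = (value) => (typeof value === "string" ? value.length : 0);
  return {
    contentLength: len(message.content),
    reasoningLength: len(message.reasoning_content), // field placement is an assumption
    finishReason: choice ? choice.finish_reason : undefined,
  };
}
```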

Scope note (model-output runtime validation)

PR Review Advisor flagged "request-mutation tests do not prove the linked model-output behavior". Live curl above is the acceptance evidence. CI-side runtime model-output validation requires API-key secret infrastructure (not currently set up for this preload's CI test path) and is intentionally out of scope for this PR.

Verification checklist

npm test passes (6/6 on test/nemotron-inference-fix.test.ts, 9 injection branches + scope contract test + all kwargs regressions)
Live end-to-end verification through the preload against integrate.api.nvidia.com
Tests added for new behavior
No secrets, API keys, or credentials committed
CodeRabbit Major (system message scan) addressed
PR Review Advisor needs-attention items addressed (tool predicate refined)
PR Review Advisor worth-checking items addressed (SoT contract, fetch coverage, broad-regex tightened, scope contract test added)

Refs #4851 (NVB#6272828 tracked upstream).

Summary by CodeRabbit

New Features
- Model-specific system prompt is prepended for certain Nemotron Ultra requests when no system message and no execution-capable tools are present.
Bug Fixes
- Body mutations are applied in sequence and the request body/Content-Length are updated only when changes occur.
Tests
- Expanded unit and integration tests covering model matching, tool presence, message placement, and cross-host behavior.
Documentation
- Added an e2e runtime validation runbook for the Ultra tool-less injection scenario.

When `nvidia/nemotron-3-ultra-550b-a55b` is asked to perform a multi-step task (e.g. "create a file then run it") and the request has no system message and no tools, the model plans all steps in `reasoning_content` but silently drops intermediate steps from `content`. The reporter saw content with only the final run command; in my repro content was empty. `chat_template_kwargs.force_nonempty_content` does not help (verified by direct curl to NVIDIA Endpoints with and without that kwarg).

Extends the existing nemotron-inference-fix preload to also inject a one-paragraph system message when the model matches Ultra 550B AND the caller supplied no system message AND no tools. Scoped narrowly so it never overrides caller intent or interferes with tool-using flows.

Live-verified end-to-end through the preload via Node fetch to NVIDIA Endpoints:
- Without fix: content = 1 char (empty)
- With this fix: content = 501 chars including heredoc + run command
- Caller-system: caller's "Respond only with 'OK'" preserved
- With-tools: no injection, model uses tool path normally

Tests cover the four branches (inject, skip-with-system, skip-with-tools, skip-for-other-model) and all four kwargs regression tests still pass.

Refs #4851 (NVB#6272828 tracked upstream).

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

coderabbitai · 2026-06-09T20:01:43Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


📝 Walkthrough

The PR extends the Nemotron inference preload's request-body mutation pipeline to prepend a model-specific system prompt for nvidia/nemotron-3-ultra-550b when body.messages contains no system role and body.tools is absent or empty, and updates tests to validate the conditional injection while preserving existing chat_template_kwargs behavior.

Changes

Ultra 550B System Message Injection

Layer / File(s)	Summary
System Prompt Rules and Request Mutation Pipeline `nemoclaw-blueprint/scripts/nemotron-inference-fix.js`	Defines `TOOL_LESS_SYSTEM_PROMPT_RULES` for `nvidia/nemotron-3-ultra-550b`, adds execution-capable tool detection (`EXECUTION_TOOL_NAME_RE`, `isExecutionCapableTool`, `hasExecutionCapableTool`), implements `toolLessSystemPromptForModel` and `applyToolLessSystemPrompt`, and updates `patchJsonBody` to apply `chat_template_kwargs` and tool-less system prompt injection, re-serializing only when changes occur.
System Prompt Injection Test Coverage `test/nemotron-inference-fix.test.ts`	Updates import ordering, extends the real fetch/undici harness with an Ultra 550B request asserting injected system message and refreshed `Content-Length`, and adds a stubbed `http.request` Vitest test covering injection, preservation, and skip cases across model/tool/message variants and verifying `chat_template_kwargs`.
E2E Runtime Validation Runbook `test/e2e-runtime/4851-ultra-toolless-validation.md`	Adds a runtime validation runbook with prerequisites and three curl/JQ scenarios (baseline, kwarg-only, and system-message+kwarg) describing expected verification steps and outcomes.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant FetchWrapper
  participant patchJsonBody
  participant ToolLessRuleEngine

  Client->>FetchWrapper: POST /v1/chat/completions (body)
  FetchWrapper->>patchJsonBody: patchJsonBody(body)
  patchJsonBody->>ToolLessRuleEngine: evaluate model, messages, tools
  ToolLessRuleEngine-->>patchJsonBody: decision (inject / skip)
  patchJsonBody->>patchJsonBody: applyChatTemplateKwargs()
  patchJsonBody->>patchJsonBody: applyToolLessSystemPrompt() (if applicable)
  patchJsonBody-->>FetchWrapper: patchedBody or null
  FetchWrapper->>Client: proceed with modified or original request

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

NVIDIA/NemoClaw#4188: Modifies the same nemotron-inference-fix.js request-body mutation pipeline for /v1/chat/completions.

Suggested labels

bug-fix

Suggested reviewers

cv

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main change: injecting a tool-less system prompt for Ultra 550B models, with a specific issue reference.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

github-actions · 2026-06-09T20:05:22Z

E2E Advisor Recommendation

Required E2E: cloud-inference-e2e, agent-turn-latency-e2e
Optional E2E: inference-routing-e2e, kimi-inference-compat-e2e

Dispatch hint: cloud-inference-e2e,agent-turn-latency-e2e

Auto-dispatched E2E: cloud-inference-e2e, agent-turn-latency-e2e via nightly-e2e.yaml at b64911fe2dfd1df9943857ccc32fefef84663df2 — nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

cloud-inference-e2e (medium; live NVIDIA API key, Docker, timeout 30 minutes): Validates the live sandbox -> inference.local -> NVIDIA Endpoints /v1/chat/completions path after changing the preload that mutates those requests. This should catch runtime preload wiring, syntax, Content-Length, and normal Nemotron-family live chat regressions.
agent-turn-latency-e2e (high; live NVIDIA API key, two sandbox installs, timeout 120 minutes): Closest existing live E2E coverage for the affected Ultra 550B model. It installs OpenClaw and Hermes sandboxes, configures nvidia/nemotron-3-ultra-550b-a55b through inference.local, and verifies real model-backed assistant turns do not stall or route through the slow/broken path.

Optional E2E

inference-routing-e2e (medium; live NVIDIA API key, timeout 30 minutes): Useful adjacent confidence for provider/gateway inference routing because the changed preload is scoped by /v1/chat/completions path and runs on sandbox inference traffic, even though this PR does not directly change route configuration.
kimi-inference-compat-e2e (medium; hermetic mock endpoint plus sandbox, timeout 45 minutes): The same preload still contains the Kimi thinking=false compatibility branch. This hermetic Kimi/OpenAI-compatible endpoint flow is a useful regression check that the refactored patchJson path did not break adjacent model-specific inference behavior.

New E2E recommendations

Ultra 550B tool-less response acceptance (high): No existing automated E2E appears to execute the exact #4851 live acceptance path: nvidia/nemotron-3-ultra-550b-a55b with no caller system message and no execution-capable tools, a prompt that asks to create a file and run it, and an assertion that content includes both the file-creation code and the run command. The new markdown runbook is manual only.
- Suggested test: Add a workflow-dispatchable live E2E that runs inside a NemoClaw sandbox against integrate.api.nvidia.com/inference.local with Ultra 550B, sends the #4851 tool-less prompt, and asserts the response content contains complete file creation steps plus python3 /tmp/hello.py or equivalent.

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: cloud-inference-e2e,agent-turn-latency-e2e

github-actions · 2026-06-09T20:05:24Z

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: ubuntu-repo-cloud-openclaw
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required Vitest E2E scenarios

ubuntu-repo-cloud-openclaw: The PR changes the NemoClaw blueprint sandbox preload that mutates NVIDIA/OpenAI-compatible chat-completions requests. The Ubuntu repo cloud OpenClaw scenario is the smallest live-supported typed Vitest scenario that onboards an OpenClaw/NVIDIA sandbox and exercises the affected sandbox startup/inference-route surface.
- Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional Vitest E2E scenarios

None.

Relevant changed files

nemoclaw-blueprint/scripts/nemotron-inference-fix.js

github-actions · 2026-06-09T20:05:46Z

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 1 still applies, 0 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Execution-tool detection still only matches exact names despite the suffix contract (nemoclaw-blueprint/scripts/nemotron-inference-fix.js:183): The Ultra 550B tool-less prompt injection skips when `hasExecutionCapableTool(body)` detects an execution-capable tool, but `isExecutionCapableTool` still only performs an exact lookup in `EXECUTION_TOOL_NAMES`. The nearby comment says the allowlist matches exact known names plus canonical OpenClaw/MCP suffixes. If real OpenClaw/MCP requests expose namespaced or server-prefixed execution tool names, this preload could incorrectly prepend “You do not have tools...” even when an execution-capable tool is present.
- Recommendation: Either implement suffix-aware matching for the documented OpenClaw/MCP execution-tool suffixes, or clarify that exact-name-only matching is the intended contract and add a regression test that pins the real request shape.
- Evidence: `return EXECUTION_TOOL_NAMES.has(name.toLowerCase());` only checks exact lowercased names. Added tests cover exact names such as `exec`, `bash_execute`, `write_file`, `tool_call`, and top-level `tool.name`, but not a namespaced/server-prefixed MCP execution name.
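
One way the suffix-aware option in the recommendation could look, compared with exact lookup. This is a sketch only: the separator set (`__`, `.`, `/`, `:`) and the example name `mcp__files__write_file` are assumptions, since the thread does not show what namespaced OpenClaw/MCP tool names actually look like.

```javascript
// Names taken from the evidence above; the real allowlist is longer.
const EXEC_NAMES = new Set(["exec", "bash_execute", "write_file", "tool_call"]);

// Current behavior per the evidence: exact, lowercased lookup.
function isExactMatch(name) {
  return EXEC_NAMES.has(name.toLowerCase());
}

// Hypothetical alternative: also accept the last namespace segment.
// The separators are assumed, not taken from a documented naming scheme.
function isSuffixMatch(name) {
  const lowered = name.toLowerCase();
  const lastSegment = lowered.split(/__|[.\/:]/).pop();
  return EXEC_NAMES.has(lowered) || EXEC_NAMES.has(lastSegment);
}
```

Because only whole segments are compared, a business tool such as `create_ticket` still does not match, so the false-positive fix from the allowlist change is preserved. A prefix joined with a single underscore would not be caught by this sketch.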

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation:** Ultra 550B request with a namespaced/server-prefixed MCP execution tool name skips the tool-less system prompt, or the exact-name-only real request shape is pinned. Deterministic coverage is strong for request mutation, skip branches, and Content-Length handling, but the issue's end-user acceptance depends on live Ultra 550B model output. The PR adds a maintained manual runbook for that runtime validation.
**Runtime validation:** Live Ultra 550B tool-less sandbox request returns `content` containing both `/tmp/hello.py` file creation and `python3 /tmp/hello.py` after preload injection when NVIDIA API-key runtime infrastructure is available. Deterministic coverage is strong for request mutation, skip branches, and Content-Length handling, but the issue's end-user acceptance depends on live Ultra 550B model output. The PR adds a maintained manual runbook for that runtime validation.
**Acceptance clause:** Model explains it lacks a file-write tool and shows the full code the user would need to run manually — add test evidence or identify existing coverage. The injected system prompt states the model does not have tools to write files or execute commands and asks it to include complete code/commands. Deterministic tests prove the prompt is prepended, but the checked-in Scenario C transcript primarily shows complete manual commands/code rather than literally saying it lacks a file-write tool. This is acceptable because the issue's expected result is an Either condition and the alternate clause is covered.
**Acceptance clause:** Steps to Reproduce: `nemoclaw onboard` with `nvidia/nemotron-3-ultra-550b-a55b` (NVIDIA Endpoints); `nemoclaw ultra-test connect && openclaw tui`; send `Create a file called hello.py in /tmp with a hello world script, then run it.`; observe model response and API-level `reasoning_content` vs `content` fields — add test evidence or identify existing coverage. The PR adds deterministic preload tests and a live validation runbook for the same prompt and model-output behavior. The repository tests do not execute the full `nemoclaw onboard`/TUI flow, which is reasonable without API-key runtime infrastructure.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nemoclaw-blueprint/scripts/nemotron-inference-fix.js`:
- Around line 132-134: The current check only inspects the variable "first"
(messages[0]) and can miss system messages later in body.messages; update the
logic that returns null to instead detect any message with role === 'system' by
scanning body.messages (e.g., Array.prototype.some) and account for non-array or
non-object entries before checking role, so the preload respects a
caller-provided system prompt; modify the checks around the "first" usage to use
this array-wide detection and remove the narrow first-only assumption.
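
The difference between the two checks can be illustrated as below. Function names are hypothetical; the `some`-based scan with guards for non-array input and non-object entries follows the suggestion above.

```javascript
// First-only check: misses a system message that appears later in the array.
function hasSystemFirstOnly(messages) {
  const first = Array.isArray(messages) ? messages[0] : undefined;
  return Boolean(first && typeof first === "object" && first.role === "system");
}

// Array-wide check: guards against non-array input and null/primitive entries.
function hasSystemAnywhere(messages) {
  if (!Array.isArray(messages)) return false;
  return messages.some(
    (m) => m !== null && typeof m === "object" && m.role === "system",
  );
}
```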

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2a210f51-1e64-4a94-93a9-3b43bf89cd25

📥 Commits

Reviewing files that changed from the base of the PR and between 5e79195 and d7140da.

📒 Files selected for processing (2)

nemoclaw-blueprint/scripts/nemotron-inference-fix.js
test/nemotron-inference-fix.test.ts

github-actions · 2026-06-09T20:09:43Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27232511353
Target ref: d7140da50febc345d02ed4ef52997e4165e7fa6f
Workflow ref: main
Requested jobs: agent-turn-latency-e2e
Summary: 0 passed, 0 failed, 0 skipped

Job	Result
agent-turn-latency-e2e	⚠️ cancelled

Address PR #5085 review feedback:
- CodeRabbit Major: the system-message check at lines 132-133 only inspected messages[0], but the OpenAI chat-completions contract permits a system message anywhere in the array. Switch to messages.some(...) so the "caller prompt wins" contract holds for any position. Add a fifth test case covering a system message at index 2 of a multi-turn conversation.
- Biome ci flagged the test file's import order. Apply the auto-fix: alphabetize by module name (child_process, fs, os, path, vitest) and within the vitest import sort named imports (describe, expect, it).

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

github-actions · 2026-06-09T20:19:56Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27232728511
Target ref: 068efc8594b423de07b65f40e44e62d51967b195
Workflow ref: main
Requested jobs: agent-turn-latency-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
agent-turn-latency-e2e	✅ success

github-actions · 2026-06-09T20:26:55Z

Selective E2E Results — ❌ Some jobs failed

Run: 27233334531
Target ref: 13c41609a60dda6b4086e60ce46c8254655cd2a2
Workflow ref: main
Requested jobs: cloud-inference-e2e,agent-turn-latency-e2e
Summary: 1 passed, 1 failed, 0 skipped

Job	Result
agent-turn-latency-e2e	❌ failure
cloud-inference-e2e	✅ success

Failed jobs: agent-turn-latency-e2e. Check run artifacts for logs.

…ge (#4851)

Address PR Review Advisor on #5085: 1 needs-attention + 3 worth-checking.

1. Refine tool predicate (advisor "needs attention"): The original `tools.length > 0` skip missed the practical user case from #4851 where the request has `toolSearch` + `web.fetch` but no `bash_execute` / `write_file`. Replace with `hasExecutionCapableTool` matching names containing bash/exec/run/shell/cmd/command/write/edit/patch/create/save/fs/filesystem. Non-execution tools (search, web, fetch, describe, read) no longer suppress the injection.
2. Add #4851 source-of-truth contract block mirroring the #4063 format: invalid state, source boundary, why-not-fix-source, regression proof, removal condition.
3. Add fetch/undici coverage: Extend the existing real-fetch test to assert Ultra 550B receives the injected system message AND that Content-Length is refreshed after the body grows. Previously only the stubbed http.request path was covered for injection.
4. Document path+model scope as the intended trust boundary: Add explicit comment near `TOOL_LESS_SYSTEM_PROMPT_RULES` explaining this preload runs inside NemoClaw-managed sandboxes where `inference.local` is the only chat-completions destination; non-sandbox OpenAI-compatible callers don't load this preload.

Tests: 5/5 unit tests pass including 7 branches of the injection logic (inject, skip-system-at-0, skip-system-at-mid, skip-with-exec-tool, inject-with-non-exec-tools-only, skip-with-mixed-tools, skip-non-matching-model) and the new fetch-path assertion.

Refs #4851.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

…act test (#4851)

Address remaining PR Review Advisor "worth-checking" items on #5085:

1. Tighten execution-tool detection (advisor "Execution-tool detection may skip the workaround for harmless tool names"): The previous regex `(bash|exec|execute|run|shell|cmd|command|write|edit|patch|create|save|fs|filesystem)` matched substrings, so harmless business tools like `create_ticket`, `run_query`, `save_search`, `command_palette` would have incorrectly suppressed the injection. Replace with an explicit allowlist (Set) of canonical exec/write tool names: bash, bash_execute, exec, execute, execute_command, shell, shell_execute, run_command, run_shell, write_file, file_write, edit_file, file_edit, patch_file, file_patch, create_file, file_create, apply_patch, str_replace_editor, computer. Two new test cases:
   - harmless business-tool names (create_ticket, run_query, save_search, command_palette) still trigger injection
   - explicit write_file correctly suppresses
2. Add explicit path+model scope boundary contract test (advisor "System-prompt injection is still scoped by path and model rather than a trusted provider boundary"): New `pins path+model as the intended scope boundary` test sends the same request to three different hosts (inference.local, integrate.api.nvidia.com, some-other-openai-compat-host.example.com) and asserts all three get the injection. This pins the documented contract: host is intentionally NOT part of the scope. A future move toward narrower host-aware gating must change this assertion too.

Note on the third "worth-checking" item (request-mutation tests don't prove model-output behavior): this requires a live API call against NVIDIA Endpoints. Live verification is in the PR body; CI runtime validation needs API-key secret infrastructure (out of scope for this PR). PR body documents the live curl results.

Tests: 6/6 unit tests pass with 9 branches of the injection logic.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
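
The false positives described in this commit can be reproduced with a short comparison. The regex alternation and the allowlist names are copied from the commit message; the `i` flag, the lowercasing, and the variable names are assumptions made for the example.

```javascript
// Previous approach: substring regex (alternation copied from the commit message).
const BROAD_RE =
  /(bash|exec|execute|run|shell|cmd|command|write|edit|patch|create|save|fs|filesystem)/i;

// Replacement: explicit allowlist of canonical names (as listed in the commit message).
const ALLOWLIST = new Set([
  "bash", "bash_execute", "exec", "execute", "execute_command",
  "shell", "shell_execute", "run_command", "run_shell",
  "write_file", "file_write", "edit_file", "file_edit",
  "patch_file", "file_patch", "create_file", "file_create",
  "apply_patch", "str_replace_editor", "computer",
]);

const sampleNames = ["create_ticket", "run_query", "save_search", "command_palette", "write_file"];

// For each name, record whether each strategy would treat it as execution-capable.
const comparison = sampleNames.map((name) => ({
  name,
  regex: BROAD_RE.test(name),
  allowlist: ALLOWLIST.has(name.toLowerCase()),
}));

console.table(comparison);
```

All four business-tool names match the regex but not the allowlist, while `write_file` matches both, which is the behavior the two new test cases pin.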

github-actions · 2026-06-09T21:17:31Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27235858317
Target ref: 0b689356ded22bc9efb5939d5fbd314c5364dfc0
Workflow ref: main
Requested jobs: agent-turn-latency-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
agent-turn-latency-e2e	✅ success

coderabbitai

🧹 Nitpick comments (1)

test/nemotron-inference-fix.test.ts (1)
406-425: ⚡ Quick win

Add one case for the alternate tool.name shape.

isExecutionCapableTool() supports both { name } and { function: { name } }, but these new allowlist regressions only exercise the nested form. Adding one top-level-name case here would pin the other supported branch too.

Also applies to: 483-494
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/nemotron-inference-fix.test.ts` around lines 406 - 425, The tests only
exercise the nested tool shape ({ function: { name } }) but
isExecutionCapableTool() also supports the top-level shape ({ name }), so add an
additional send() call mirroring one of the existing assertions (e.g., for the
Ultra 550B "write_file" NO injection case and/or the "create/run/save/command"
INJECTION case) using tools with the top-level name form (e.g., { type:
'function', name: 'write_file', parameters: {} }) to ensure the alternate branch
is covered; update both the case-7/8 group and the similar block at lines
~483-494 to include the top-level-name variant.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9ba2344c-59f6-4b20-baae-6321b022df35

📥 Commits

Reviewing files that changed from the base of the PR and between 0b68935 and b2e28a3.

📒 Files selected for processing (2)

nemoclaw-blueprint/scripts/nemotron-inference-fix.js
test/nemotron-inference-fix.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

nemoclaw-blueprint/scripts/nemotron-inference-fix.js

github-actions · 2026-06-09T21:23:05Z

Selective E2E Results — ❌ Some jobs failed

Run: 27236454584
Target ref: b2e28a37e336f78f2d0e1e88a95863f4b7c2d105
Workflow ref: main
Requested jobs: agent-turn-latency-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job	Result
agent-turn-latency-e2e	❌ failure

Failed jobs: agent-turn-latency-e2e. Check run artifacts for logs.

cv · 2026-06-09T23:11:33Z

@cjagwani can you address the feedback in #5085 (comment) please?

github-actions · 2026-06-09T23:24:37Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27241955530
Target ref: c2fc436334ced5d2f96d0acb99492c956f011f87
Workflow ref: main
Requested jobs: agent-turn-latency-e2e,cloud-inference-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
agent-turn-latency-e2e	✅ success
cloud-inference-e2e	✅ success

jyaunches · 2026-06-10T19:51:15Z

Local PR review follow-up

I re-ran the local PR review against head c2fc4363. The PR Review Advisor is current and still recommends needs_rework; I agree this should not move to CI-loop/merge yet.

Blocking / action-required items

Trusted #4851 acceptance evidence is still missing.
The repo tests prove request mutation and Content-Length refresh, but they do not prove the original model-output behavior from #4851 (reasoning_content / content / finish_reason, or that the response includes the complete file-creation code + run command / explicit no-tools explanation). The current SoT block still points to PR-body live curl evidence (nemoclaw-blueprint/scripts/nemotron-inference-fix.js:76-81), which the advisor called out as not repository-verifiable.
Execution-capable tool detection still looks incomplete.
EXECUTION_TOOL_NAMES in nemoclaw-blueprint/scripts/nemotron-inference-fix.js:135-156 still omits repo-known write-capable names: write, edit, and notebook_edit (nemoclaw/src/index.ts:342). It also does not account for compact catalog tool_call, which can delegate to real tools (scripts/patch-openclaw-tool-catalog.js:149-167). This can incorrectly inject a “no tools” system prompt when write/exec capability is actually available.
E2E scenario advisor recommendation appears not yet run.
Required branch E2Es agent-turn-latency-e2e and cloud-inference-e2e passed at c2fc4363 in run https://github.com/NVIDIA/NemoClaw/actions/runs/27241955530, but I did not find the required ubuntu-repo-cloud-openclaw e2e-scenarios.yaml run for this branch/head.

Already okay

Regular required CI checks are green.
CodeRabbit has no unresolved threads.
The host-agnostic path+model scope is documented and pinned by test; that looks like an intentional accepted boundary rather than accidental drift.
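
For readers unfamiliar with the scope contract, a host-agnostic gate might be sketched as follows. `inScope` is a hypothetical name, and matching the path with `endsWith` and the model with exact equality are assumptions; the thread only states that path and model define the scope and host does not.

```javascript
const CHAT_COMPLETIONS_PATH = "/v1/chat/completions";
const ULTRA_550B = "nvidia/nemotron-3-ultra-550b-a55b";

// Scope is decided by request path and model only; the host is deliberately ignored.
function inScope(url, model) {
  const { pathname } = new URL(url);
  return pathname.endsWith(CHAT_COMPLETIONS_PATH) && model === ULTRA_550B;
}

// Hosts named in the contract test description.
const hosts = [
  "inference.local",
  "integrate.api.nvidia.com",
  "some-other-openai-compat-host.example.com",
];
const results = hosts.map((host) => inScope(`https://${host}/v1/chat/completions`, ULTRA_550B));
```

If host-aware gating is introduced later, a check like this and the contract test would need to change together.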

Recommendation: address the trusted validation artifact and tool-detection gaps first, then rerun the PR Review Advisor. After it comes back clean, the remaining merge blockers can be handled by the normal CI/E2E shepherding loop.

@jyaunches

…unbook (#4851)

Address @jyaunches review feedback on #5085.

1. Align EXECUTION_TOOL_NAMES with nemoclaw/src/index.ts:WRITE_TOOL_NAMES so the allowlist stays in sync with the same write-capable surface OpenClaw scans for secrets. Adds bare `write`, `edit`, `notebook_edit`.
2. Add `tool_call` (the OpenClaw compact-catalog wrapper from scripts/patch-openclaw-tool-catalog.js) to the execution-capable set. When `tool_call` is in the tools array, we can't tell from the request alone which underlying tool will be dispatched, so treat it as execution-capable and skip the system-prompt injection. Otherwise the model could receive a "no tools" prompt when it actually has real exec/write capability behind the catalog wrapper.
3. Add a checked-in runtime validation runbook at test/e2e-runtime/4851-ultra-toolless-validation.md covering three scenarios (baseline, force_nonempty_content only, full preload injection) against integrate.api.nvidia.com. This is the repository-verifiable acceptance evidence the advisor and Julie's review asked for: anyone reviewing #4851 acceptance can re-run it directly against NVIDIA Endpoints rather than relying on PR text. Updated the source-of-truth contract block to reference the runbook.
4. Tests:
   - case 9: bare write/edit/notebook_edit (mirrors WRITE_TOOL_NAMES)
   - case 10: tool_call wrapper (compact catalog dispatch)
   - case 11: top-level tool.name shape (no nested .function), a CodeRabbit nit asking for alternate-shape coverage

All 6 unit tests pass (12 injection branches + contract test + fetch-path).

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
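The skip/inject decision described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `nemotron-inference-fix.js`: the helper names (`toolName`, `hasExecutionTool`, `hasSystemMessage`, `injectToolLessPrompt`), the exact-match model check, the placeholder prompt text, and the abbreviated allowlist are all assumptions, and the path check the PR mentions is omitted.

```javascript
// Hypothetical sketch: identifiers echo the PR text, but every body here is an assumption.
const ULTRA_MODEL_ID = "nvidia/nemotron-3-ultra-550b-a55b";

// Abbreviated allowlist; the real set in the PR is longer.
const EXECUTION_TOOL_NAMES = new Set([
  "bash_execute",
  "write_file",
  "write",
  "edit",
  "notebook_edit",
  "tool_call", // compact-catalog wrapper: the dispatched tool is unknown, so treat as exec-capable
]);

// Accept both the nested shape ({ function: { name } }) and a top-level { name }.
function toolName(tool) {
  if (!tool || typeof tool !== "object") return undefined;
  if (tool.function && typeof tool.function.name === "string") return tool.function.name;
  return typeof tool.name === "string" ? tool.name : undefined;
}

// Exact Set membership, so names like create_ticket or run_query do not match.
function hasExecutionTool(tools) {
  return Array.isArray(tools) && tools.some((t) => EXECUTION_TOOL_NAMES.has(toolName(t)));
}

// Scan every message, not only messages[0].
function hasSystemMessage(messages) {
  return Array.isArray(messages) && messages.some((m) => m && m.role === "system");
}

// Placeholder only; the actual prompt wording is not shown in this thread.
const PLACEHOLDER_PROMPT = "<tool-less system prompt text>";

// Returns the body unchanged unless all three conditions hold.
function injectToolLessPrompt(body) {
  if (!body || body.model !== ULTRA_MODEL_ID) return body;
  if (!Array.isArray(body.messages) || hasSystemMessage(body.messages)) return body;
  if (hasExecutionTool(body.tools)) return body;
  return {
    ...body,
    messages: [{ role: "system", content: PLACEHOLDER_PROMPT }, ...body.messages],
  };
}
```

Using exact membership in a `Set` rather than a substring regex is what keeps business tools such as `create_ticket` on the inject path while `tool_call` and bare `edit` stay on the skip path.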

github-actions · 2026-06-10T20:30:43Z

Selective E2E Results — ❌ Some jobs failed

Run: 27303964199
Target ref: 203536f53c014d81eb6b47f2906690bd80a24524
Workflow ref: main
Requested jobs: agent-turn-latency-e2e,cloud-e2e
Summary: 1 passed, 1 failed, 0 skipped

Job	Result
agent-turn-latency-e2e	❌ failure
cloud-e2e	✅ success

Failed jobs: agent-turn-latency-e2e. Check run artifacts for logs.

Address remaining PR Review Advisor items on #5085.

- jq nice-idea: every curl example in the runbook pipes to jq for readable parsing, but the prerequisites listed only node and curl. Add jq alongside node + curl.
- Provider-output acceptance worth-checking: the previous "Last live verification" entry pointed back at PR-body evidence rather than a durable checked-in artifact. Replace it with a "Sanitized acceptance transcript" section that records the exact response shape we captured on 2026-06-09 for each of the three scenarios (baseline, kwarg-only, full preload). Future reviewers can compare new runs against the transcript instead of digging through the PR body, and the dated log below it tracks freshness of the live confirmation.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
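Comparing a new run against a recorded response shape could look something like the sketch below. It is purely illustrative: `describeResponseShape` is a hypothetical helper, and it assumes the usual chat-completions layout in which `content` and `reasoning_content` sit on `choices[0].message` and `finish_reason` sits on `choices[0]`. It does not reproduce anything from the runbook or the captured transcript.

```javascript
// Hypothetical helper: reduce a chat-completions response to the three fields
// the review names (reasoning_content, content, finish_reason).
function describeResponseShape(response) {
  const choices = response && Array.isArray(response.choices) ? response.choices : [];
  const choice = choices[0] || {};
  const message = choice.message || {};
  const content = typeof message.content === "string" ? message.content : "";
  const reasoning =
    typeof message.reasoning_content === "string" ? message.reasoning_content : "";
  return {
    finishReason: typeof choice.finish_reason === "string" ? choice.finish_reason : null,
    contentEmpty: content.trim().length === 0,
    hasReasoning: reasoning.trim().length > 0,
  };
}
```

A reviewer could run this over the JSON from each scenario and compare the resulting small objects, which avoids diffing full model output that varies from run to run.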

github-actions · 2026-06-10T20:42:55Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27304797639
Target ref: e98604975ad83fefd80bb40e123b0897d57905ef
Workflow ref: main
Requested jobs: agent-turn-latency-e2e,kimi-inference-compat-e2e
Summary: 0 passed, 0 failed, 0 skipped

Job	Result
agent-turn-latency-e2e	⚠️ cancelled
kimi-inference-compat-e2e	⚠️ cancelled

github-actions · 2026-06-10T20:45:01Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27305033204
Target ref: 026802ba8dca65d5c85858b3b0c5d72ead8b85b4
Workflow ref: main
Requested jobs: agent-turn-latency-e2e,cloud-inference-e2e
Summary: 0 passed, 0 failed, 0 skipped

Job	Result
agent-turn-latency-e2e	⚠️ cancelled
cloud-inference-e2e	⚠️ cancelled

github-actions · 2026-06-10T20:51:51Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27305151681
Target ref: 54ae58eddcb849c72ae82badf6d16a8757df1b7e
Workflow ref: main
Requested jobs: cloud-inference-e2e,kimi-inference-compat-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
cloud-inference-e2e	✅ success
kimi-inference-compat-e2e	✅ success

github-actions · 2026-06-10T21:26:02Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27306323784
Target ref: b64911fe2dfd1df9943857ccc32fefef84663df2
Workflow ref: main
Requested jobs: cloud-inference-e2e,agent-turn-latency-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
agent-turn-latency-e2e	✅ success
cloud-inference-e2e	✅ success

@jyaunches

## Summary

The "Selective E2E Results" comment posted by `.github/workflows/nightly-e2e.yaml` bucketed job results into `passed` / `failed` / `skipped` but never accounted for `cancelled`. A job ending in `cancelled` (e.g. when `cancel-in-progress` kills a stale run) slipped past all three tallies and fell through to the default `"✅ All requested jobs passed"` status, with the summary line reading `"0 passed, 0 failed, 0 skipped"`, masking that the run produced no signal at all.

## Repro (in the wild)

PR #5085 commit `026802ba8`:

```text
### Selective E2E Results — ✅ All requested jobs passed
Run: 27305033204
Requested jobs: agent-turn-latency-e2e,cloud-inference-e2e
Summary: 0 passed, 0 failed, 0 skipped

| Job                    | Result        |
|------------------------|---------------|
| agent-turn-latency-e2e | ⚠️ cancelled  |
| cloud-inference-e2e    | ⚠️ cancelled  |
```

Both requested jobs were cancelled (cancel-in-progress superseding the older run) yet the headline read green. Same pattern earlier on #4610.

## Root

`.github/workflows/nightly-e2e.yaml:2486-2495` (pre-fix):

```js
const passed = ran.filter(([, v]) => v.result === 'success');
const failed = ran.filter(([, v]) => v.result === 'failure');
const skipped = reportedEntries.filter(([, v]) => v.result === 'skipped');
const status = failed.length > 0 || missingRequested.length > 0
  ? '❌ Some jobs failed'
  : skipped.length > 0 && passed.length === 0
    ? '⚠️ No requested jobs ran'
    : '✅ All requested jobs passed';
```

A `cancelled` job is `!== 'success'`, `!== 'failure'`, `!== 'skipped'`, so it falls through to the default.

## Changes

- Add a `cancelled` bucket derived from `ran` the same way the others are.
- Insert a status branch between the failure case and the no-ran case: when cancelled jobs are present and nothing passed, surface `⚠️ Run cancelled — no signal` instead of falsely claiming success.
- Include cancelled count in the summary line so the bucket is visible even when the other states are zero.

Successful, failed, and skipped runs continue to render exactly as before; only the cancelled case changes from "false green" to "honest yellow."

cc @jyaunches (flagged this earlier in slack)

## Summary by CodeRabbit

* **Tests**
  * Improved nightly e2e test reporting in PR comments to better distinguish cancelled jobs from other outcomes. PR comments now display cancelled job counts and provide clearer status messaging when test runs are cancelled.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
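The post-fix bucketing described in the #5246 summary above might look roughly like this. It is a sketch under stated assumptions, not the merged workflow code: the function name `summarizeSelectiveE2E`, the input shape, the definition of `ran` as "reported and not skipped", the way `missingRequested` is derived, and the exact summary-line wording are all guesses; only the bucket names and the status strings come from the description.

```javascript
// Hypothetical sketch of the status computation with a `cancelled` bucket.
// results: { [jobName]: { result: 'success' | 'failure' | 'cancelled' | 'skipped' } }
function summarizeSelectiveE2E(results, requestedJobs) {
  const reportedEntries = Object.entries(results);
  // Assumption: "ran" means every reported job that was not skipped.
  const ran = reportedEntries.filter(([, v]) => v.result !== 'skipped');
  // Assumption: a requested job with no reported result counts as missing.
  const missingRequested = requestedJobs.filter(
    (name) => !Object.prototype.hasOwnProperty.call(results, name),
  );

  const passed = ran.filter(([, v]) => v.result === 'success');
  const failed = ran.filter(([, v]) => v.result === 'failure');
  const cancelled = ran.filter(([, v]) => v.result === 'cancelled');
  const skipped = reportedEntries.filter(([, v]) => v.result === 'skipped');

  // The cancelled branch sits between the failure case and the no-ran case.
  const status =
    failed.length > 0 || missingRequested.length > 0
      ? '❌ Some jobs failed'
      : cancelled.length > 0 && passed.length === 0
        ? '⚠️ Run cancelled — no signal'
        : skipped.length > 0 && passed.length === 0
          ? '⚠️ No requested jobs ran'
          : '✅ All requested jobs passed';

  // Assumed summary format; the description only says the cancelled count is included.
  const summary =
    `${passed.length} passed, ${failed.length} failed, ` +
    `${cancelled.length} cancelled, ${skipped.length} skipped`;

  return { status, summary };
}
```

With both requested jobs cancelled, as in the repro from commit `026802ba8`, this sketch reports the yellow "no signal" status instead of the green default.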

## Summary

- Add v0.0.64 release notes from the release announcement and link them to the relevant deeper docs.
- Document that custom policy presets recorded through `policy-add --from-file` and `--from-dir` survive snapshot restore and sandbox recreation.
- Refresh generated NemoClaw user skills from the current source docs.

## Source summary

- #5104 -> `docs/manage-sandboxes/backup-restore.mdx`, `docs/network-policy/customize-network-policy.mdx`: Documents custom policy presets preserved through snapshot restore.
- #4955 -> `docs/about/release-notes.mdx`: Adds release-note coverage for Brave web-search pinning and `BRAVE_API_KEY` placeholder preservation.
- #5116, #5269 -> `docs/about/release-notes.mdx`: Adds release-note coverage for Docker-driver gateway health and rootfs guard stability.
- #5241, #5085 -> `docs/about/release-notes.mdx`: Adds release-note coverage for chat-completions provider selection and Nemotron Ultra 550B tool-less request compatibility.
- #5268, #5210, #5257 -> `docs/about/release-notes.mdx`: Adds release-note coverage for messaging render plan refresh, OpenClaw scope-upgrade approval recovery, and Hermes WhatsApp bridge dependency setup.
- Current source docs -> `.agents/skills/`: Regenerates user-skill references so agent-facing guidance matches the source documentation.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `npm run docs`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit/pre-push hooks: markdownlint, gitleaks, docs-to-skills verification, TypeScript CLI, and skills YAML checks passed.

## Summary by CodeRabbit

* **Documentation**
  * Clarified sandbox snapshot restore preserves custom policy presets and restores them without original files.
  * Switched sandbox setup and remote deployment guidance to Docker-based workflows and emphasized remote onboarding flow.
  * Expanded troubleshooting for gateway recovery, Docker GPU/WSL issues, and onboarding resume.
  * Added/updated CLI docs: advanced maintenance, session export, upload/download wrappers, and status recovery guidance.
  * Added v0.0.64 release notes and links to NemoClaw Community; fixed command reference formatting.

cjagwani added the bug, area: inference, platform: ubuntu, and v0.0.62 labels Jun 9, 2026

cjagwani self-assigned this Jun 9, 2026

Merge branch 'main' into feat/ultra-550b-toolless-systemprompt-4851

068efc8

coderabbitai Bot reviewed Jun 9, 2026

Comment thread nemoclaw-blueprint/scripts/nemotron-inference-fix.js Outdated

cjagwani added 2 commits June 9, 2026 14:01

coderabbitai Bot reviewed Jun 9, 2026

Merge branch 'main' into feat/ultra-550b-toolless-systemprompt-4851

c2fc436

cv added v0.0.63 Release target and removed v0.0.62 Release target labels Jun 10, 2026

ahunnargikar-nvidia linked an issue Jun 10, 2026 that may be closed by this pull request

[Ubuntu 24.04][Agent&Skills] Ultra 550B content omits intermediate steps when no tools configured — only final command returned #4851

Closed

cjagwani requested a review from cv June 10, 2026 19:43

cjagwani added 3 commits June 10, 2026 15:35

Merge branch 'main' into feat/ultra-550b-toolless-systemprompt-4851

e986049

docs(runbook): use 4-backtick outer fence to satisfy markdownlint (#4851)

54ae58e

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

Merge branch 'main' into feat/ultra-550b-toolless-systemprompt-4851

b64911f

jyaunches added v0.0.64 Release target and removed v0.0.63 Release target labels Jun 11, 2026

cv approved these changes Jun 11, 2026

cv merged commit 3108a19 into main Jun 11, 2026
39 checks passed

cv deleted the feat/ultra-550b-toolless-systemprompt-4851 branch June 11, 2026 02:28

cjagwani mentioned this pull request Jun 11, 2026

fix(ci): bucket cancelled jobs in Selective E2E Results comment #5246

Merged

miyoungc mentioned this pull request Jun 12, 2026

docs: refresh v0.0.64 release docs #5358

Merged
