feat(inference): add Nemotron 3 Ultra build option by ericksoa · Pull Request #4769 · NVIDIA/NemoClaw

ericksoa · 2026-06-04T13:58:20Z

Summary

- Add `nvidia/nemotron-3-ultra-550b-a55b` as the second curated NVIDIA Endpoints model, directly below the default Nemotron 3 Super option.
- Keep the default cloud model on `nvidia/nemotron-3-super-120b-a12b`.
- Update docs and menu-index tests for the expanded Build model list.

Build endpoint check

- `https://integrate.api.nvidia.com/v1/models` returns `nvidia/nemotron-3-ultra-550b-a55b` with `owned_by: nvidia`.
- The Build model card identifies it as Nemotron-3-Ultra-550B-A55B-NVFP4 with 550B total / 55B active parameters and up to 1M context.
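
The endpoint check can be repeated offline against a saved response. This is a minimal sketch, assuming the endpoint returns an OpenAI-style listing of the form `{"data": [{"id": ..., "owned_by": ...}]}`; the helper name `find_model` and the sample payload are illustrative and do not come from the PR.

```python
import json
from typing import Optional

MODEL_ID = "nvidia/nemotron-3-ultra-550b-a55b"


def find_model(listing: dict, model_id: str) -> Optional[dict]:
    """Return the listing entry whose id matches model_id, or None if absent."""
    for entry in listing.get("data", []):
        if entry.get("id") == model_id:
            return entry
    return None


# Hypothetical saved response; a real one would come from GET /v1/models.
sample = json.loads("""
{"object": "list", "data": [
  {"id": "nvidia/nemotron-3-super-120b-a12b", "owned_by": "nvidia"},
  {"id": "nvidia/nemotron-3-ultra-550b-a55b", "owned_by": "nvidia"}
]}
""")

entry = find_model(sample, MODEL_ID)
print(entry is not None and entry.get("owned_by") == "nvidia")
```

Keeping the lookup separate from the HTTP call means the same function can be pointed at a live response or at a fixture.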

Tests

- `npm run build:cli`
- `npx vitest run src/lib/inference/config.test.ts src/lib/inference/model-prompts.test.ts src/lib/inference/provider-models.test.ts test/onboard-selection.test.ts`

Summary by CodeRabbit

New Features
- Added Nemotron 3 Ultra 550B to curated model options for NVIDIA Endpoints.
Documentation
- Updated onboarding and provider docs to list the new curated model.
Tests
- Updated selection tests to account for the new curated model.
- Added a new end-to-end agent turn latency test that measures latency and emits a structured report.
CI
- Added a nightly job to run the new latency E2E test and include results in nightly reports.

coderabbitai · 2026-06-04T13:58:33Z

Warning

Review limit reached

@cv, we couldn't start this review because you've reached your PR review rate limit.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 280d1311-aeb8-4421-9c52-e035f5f24f4f

📥 Commits

Reviewing files that changed from the base of the PR and between a22adfc and 5932466.

📒 Files selected for processing (3)

.github/workflows/nightly-e2e.yaml
docs/inference/inference-options.mdx
test/onboard-selection.test.ts

📝 Walkthrough

Adds Nemotron 3 Ultra 550B to curated cloud model options and docs; updates tests and onboarding selections for the new menu order; and introduces a nightly agent-turn-latency-e2e workflow plus a new end-to-end bash test that measures turn latency across OpenClaw and Hermes.

Changes

Nemotron 3 Ultra 550B curated model addition

| Layer / File(s) | Summary |
| --- | --- |
| Model configuration and documentation `src/lib/inference/config.ts`, `docs/get-started/quickstart.mdx`, `docs/inference/inference-options.mdx` | `CLOUD_MODEL_OPTIONS` now includes `nvidia/nemotron-3-ultra-550b-a55b` with label "Nemotron 3 Ultra 550B". Quickstart and provider options docs updated to list the new curated model. |
| Configuration option test `src/lib/inference/config.test.ts` | Unit test updated to expect the new model id in curated cloud model options. |
| Selection and onboarding tests `src/lib/inference/model-prompts.test.ts`, `test/onboard-selection.test.ts` | Mocked prompt input and onboarding test fixture `answers` arrays adjusted to account for the added model shifting menu indices. |
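
The index drift that the test fixtures account for can be illustrated with a small sketch. It assumes a 1-based menu; only the two Nemotron ids come from the PR, and the `example/...` entries and the `menu_choice` helper are hypothetical placeholders.

```python
DEFAULT_MODEL = "nvidia/nemotron-3-super-120b-a12b"
ULTRA_MODEL = "nvidia/nemotron-3-ultra-550b-a55b"

# Hypothetical pre-change list; only the first id is taken from the PR.
before = [DEFAULT_MODEL, "example/model-b", "example/model-c"]

# Insert Ultra directly below the default, as the PR summary describes.
after = before[:1] + [ULTRA_MODEL] + before[1:]


def menu_choice(options: list[str], model_id: str) -> str:
    """Return the 1-based answer a scripted test would type to pick model_id."""
    return str(options.index(model_id) + 1)


for model_id in before:
    print(model_id, menu_choice(before, model_id), "->", menu_choice(after, model_id))
```

The default keeps answer `1`, while every entry that used to sit below it moves down by one, which is why hard-coded `answers` arrays need updating.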

Nightly E2E job and agent-turn-latency script

| Layer / File(s) | Summary |
| --- | --- |
| Nightly workflow job `.github/workflows/nightly-e2e.yaml` | Added `agent-turn-latency-e2e` to dispatch inputs, job list, job definition (reusable workflow invocation, script, timeout, artifacts), and downstream `needs` lists. |
| E2E script: header and helpers `test/e2e/test-agent-turn-latency-e2e.sh` | New end-to-end bash test: header/runtime setup, utilities, parsing and response extraction helpers, route retrieval and assertion logic. |
| E2E script: validators & health `test/e2e/test-agent-turn-latency-e2e.sh` | Validation probes for OpenClaw and Hermes configs, Hermes health polling, and sandbox teardown helpers. |
| E2E script: install & control flow `test/e2e/test-agent-turn-latency-e2e.sh` | Install runner, prerequisite checks, sandbox lifecycle orchestration, and main control flow executing OpenClaw and Hermes runs. |
| E2E script: runtime turns and results `test/e2e/test-agent-turn-latency-e2e.sh` | Execute OpenClaw and Hermes turns, assert reply contains `42`, measure per-runtime latency, serialize JSON results, and exit nonzero on failures. |
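
The measure-and-report step in the last row could look roughly like the following. This is a sketch under stated assumptions, not the script's actual logic: the real test is bash, the `fake_turn` stand-in replaces a live agent turn, and the report field names are invented for illustration.

```python
import json
import time
from typing import Callable


def timed_turn(runtime: str, send_turn: Callable[[str], str], prompt: str, expected: str) -> dict:
    """Run one turn, time it, and record whether the reply contains `expected`."""
    start = time.perf_counter()
    reply = send_turn(prompt)
    elapsed = time.perf_counter() - start
    return {
        "runtime": runtime,
        "latency_seconds": round(elapsed, 3),
        "passed": expected in reply,
    }


def fake_turn(prompt: str) -> str:
    # Stand-in for a real agent turn; the actual script talks to a sandbox.
    return "The answer is 42."


results = [timed_turn(name, fake_turn, "What is 6 * 7?", "42") for name in ("openclaw", "hermes")]
report = {"results": results, "failures": sum(not r["passed"] for r in results)}
print(json.dumps(report, indent=2))
# A real harness would exit nonzero when report["failures"] > 0.
```

Timing with a monotonic clock such as `time.perf_counter` avoids skew from wall-clock adjustments during long turns.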

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

feature, area: inference, E2E, area: e2e, CI/CD, documentation

Suggested reviewers

cv

Poem

🐰 A speedy model hops on stage,
Docs and tests shift by one page.
Nightly runs time every turn,
Latency logged for us to learn.
The rabbit cheers, “Let pipelines sing!”

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: adding Nemotron 3 Ultra 550B as a curated NVIDIA Endpoints model option throughout the codebase, documentation, and tests. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions · 2026-06-04T13:59:00Z

🌿 Preview your docs: https://nvidia-preview-pr-4769.docs.buildwithfern.com/nemoclaw

github-actions · 2026-06-04T14:00:43Z

PR Review Advisor

Findings: 1 needs attention, 1 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 0 still apply, 2 new items found

Review findings

🛠️ Needs attention

Do not run PR-controlled latency E2E with NVIDIA_API_KEY (.github/workflows/nightly-e2e.yaml:230): The new `agent-turn-latency-e2e` job is selective-dispatchable and passes `nvidia_api_key: true` into the reusable runner. That runner checks out `inputs.target_ref || github.ref` and runs the script from that checkout, so a workflow_dispatch/auto-dispatch targeting a PR head can execute PR-controlled `test/e2e/test-agent-turn-latency-e2e.sh` with the real NVIDIA_API_KEY in the environment. This expands the trusted-code boundary for a high-value inference credential.
- Recommendation: Withhold NVIDIA_API_KEY whenever `workflow_dispatch` supplies a non-empty `target_ref`, or run the latency harness from trusted main/workflow code while treating the PR checkout only as data under test. If this job must use secrets, gate it to trusted refs only and add a contract test for that boundary.
- Evidence: `agent-turn-latency-e2e` sets `ref: ${{ inputs.target_ref || github.ref }}`, `script: test/e2e/test-agent-turn-latency-e2e.sh`, and `nvidia_api_key: true`; `.github/workflows/e2e-script.yaml` then runs the checked-out repo script with `NVIDIA_API_KEY: ${{ inputs.nvidia_api_key && secrets.NVIDIA_API_KEY || '' }}`.
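
The first recommendation (withhold the key whenever a dispatch supplies a `target_ref`) reduces to a small decision rule. The sketch below models that rule in Python purely for illustration; in the repository it would be a workflow expression, and the function name `resolve_api_key` is hypothetical.

```python
def resolve_api_key(event_name: str, target_ref: str, secret: str) -> str:
    """Return the secret only when the run does not check out a caller-chosen ref."""
    if event_name == "workflow_dispatch" and target_ref.strip():
        # A non-empty target_ref means the script under test may be PR-controlled.
        return ""
    return secret


# Dummy value for illustration only; never hard-code a real key.
DUMMY = "dummy-key"
print(repr(resolve_api_key("schedule", "", DUMMY)))
print(repr(resolve_api_key("workflow_dispatch", "", DUMMY)))
print(repr(resolve_api_key("workflow_dispatch", "refs/pull/1/head", DUMMY)))
```

A contract test for the boundary would assert exactly these three cases against whatever mechanism the workflow ends up using.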

🔎 Worth checking

Redact latency E2E install logs before streaming and uploading artifacts (test/e2e/test-agent-turn-latency-e2e.sh:293): The new latency E2E captures install output to `/tmp/nemoclaw-e2e-*-turn-latency-install.log`, streams it with `tail -f`, prints the last 80 lines on failure, and the workflow uploads those logs on failure. GitHub log masking helps console output, but the artifact files themselves are not rewritten or redacted before upload.
- Recommendation: Add defense-in-depth redaction for literal NVIDIA_API_KEY values and common secret patterns such as `nvapi-...` and `Bearer ...` before printing log excerpts and before artifacts are uploaded, or avoid uploading logs that may contain credential-bearing installer/probe output.
- Evidence: `run_install` redirects `bash install.sh` to `$log_path`, tails it live, and prints `tail -80 "$log_path"` on failure; `.github/workflows/nightly-e2e.yaml` uploads both turn-latency install logs as failure artifacts.
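
A defense-in-depth redaction pass of the kind recommended here might be sketched as follows. The token shapes are assumptions (keys that look like `nvapi-<token>` and `Bearer <token>` headers), and the `redact` helper is hypothetical; the PR itself does not include this code.

```python
import re

# Assumed token shapes; adjust to the formats actually seen in the logs.
NVAPI_PATTERN = re.compile(r"nvapi-[A-Za-z0-9_\-]+")
BEARER_PATTERN = re.compile(r"(bearer\s+)[A-Za-z0-9._~+/=\-]+", re.IGNORECASE)


def redact(text: str, literal_secrets: tuple[str, ...] = ()) -> str:
    """Mask literal secret values first, then pattern-shaped tokens."""
    for secret in literal_secrets:
        if secret:
            text = text.replace(secret, "[REDACTED]")
    text = NVAPI_PATTERN.sub("[REDACTED]", text)
    # Keep the "Bearer " prefix so the log still shows a header was present.
    text = BEARER_PATTERN.sub(lambda m: m.group(1) + "[REDACTED]", text)
    return text


print(redact("Authorization: Bearer nvapi-abc123 ok"))
print(redact("token=Bearer abc.def"))
print(redact("key is s3cr3t", ("s3cr3t",)))
```

Applying the same function to both the printed excerpt and the file that gets uploaded would cover the console path and the artifact path.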

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation** — agent-turn-latency-e2e uses the Ultra model through `inference.local` for both OpenClaw and Hermes. Unit tests cover the curated model ordering and menu-index drift, but the PR also changes workflow dispatch and adds a real sandbox/inference E2E path where behavior and credential boundaries need runtime/contract validation.
**Runtime validation** — selective dispatch of agent-turn-latency-e2e reports the new job in report-to-pr and scorecard needs. Unit tests cover the curated model ordering and menu-index drift, but the PR also changes workflow dispatch and adds a real sandbox/inference E2E path where behavior and credential boundaries need runtime/contract validation.
**Runtime validation** — agent-turn-latency failure artifacts do not contain `NVIDIA_API_KEY`, `nvapi-`, or `Bearer` secrets. Unit tests cover the curated model ordering and menu-index drift, but the PR also changes workflow dispatch and adds a real sandbox/inference E2E path where behavior and credential boundaries need runtime/contract validation.
**Runtime validation** — target_ref dispatch for agent-turn-latency-e2e does not expose repository secrets to PR-controlled script code. Unit tests cover the curated model ordering and menu-index drift, but the PR also changes workflow dispatch and adds a real sandbox/inference E2E path where behavior and credential boundaries need runtime/contract validation.
**Acceptance clause:** `https://integrate.api.nvidia.com/v1/models` returns `nvidia/nemotron-3-ultra-550b-a55b` with `owned_by: nvidia`; add test evidence or identify existing coverage. This is external endpoint evidence from the PR body; the repository diff does not independently prove the live API response, and this review did not execute network probes.
**Acceptance clause:** Build model card identifies it as Nemotron-3-Ultra-550B-A55B-NVFP4 with 550B total / 55B active parameters and up to 1M context; add test evidence or identify existing coverage. This is external model-card evidence from the PR body; the repository diff does not independently prove the model-card contents.

This is an automated advisory review. A human maintainer must make the final merge decision.

github-actions · 2026-06-04T14:02:19Z

E2E Advisor Recommendation

Required E2E: agent-turn-latency-e2e, cloud-onboard-e2e, inference-routing-e2e
Optional E2E: cloud-inference-e2e, openclaw-inference-switch-e2e, hermes-inference-switch-e2e, docs-validation-e2e

Dispatch hint: agent-turn-latency-e2e,cloud-onboard-e2e,inference-routing-e2e

Auto-dispatched E2E: cloud-onboard-e2e, inference-routing-e2e via nightly-e2e.yaml at 593246650ccc83f1ba2d4ee9dbf3a36cbb5e839e — nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

agent-turn-latency-e2e (high; installs two sandboxes and performs live NVIDIA endpoint turns with a 120 minute job timeout): Directly validates the newly added E2E job and script, including OpenClaw and Hermes install/onboard, inference.local route/config assertions, and real NVIDIA Build turns through the new Ultra model.
cloud-onboard-e2e (medium-high; installs and onboards a cloud sandbox with live NVIDIA credentials): Covers the installer/onboarding path for NVIDIA Endpoints selection, sandbox creation, credential isolation, and inference.local setup, all adjacent to the changed provider model catalog.
inference-routing-e2e (medium; installs NemoClaw as needed and runs routing checks, with live-key portions gated by secrets): The touched inference config participates in routed provider setup; this job validates gateway inference routing, credential isolation, and provider/error handling paths around inference.local.

Optional E2E

cloud-inference-e2e (medium-high; live NVIDIA endpoint install and chat): Useful additional confidence that a freshly installed OpenClaw sandbox can complete live chat through inference.local to NVIDIA Endpoints after the catalog change, though agent-turn-latency-e2e already exercises live turns with the new model.
openclaw-inference-switch-e2e (medium-high; sandbox lifecycle plus live inference switching): Adjacent coverage for OpenClaw route/config reconciliation and live requests after inference model/provider changes.
hermes-inference-switch-e2e (medium-high; Hermes sandbox lifecycle plus live inference switching): Adjacent coverage for Hermes route/config reconciliation and live requests, relevant because the new latency E2E also asserts Hermes inference.local configuration.
docs-validation-e2e (low-medium; installs NemoClaw and runs docs validation): Documentation changed under docs/inference; this catches docs build/link/validation issues but is not the main runtime risk.

New E2E recommendations

onboarding curated NVIDIA Endpoints model selection (medium): Existing E2E coverage validates cloud onboarding and the new env-driven Ultra model turn, while unit tests cover menu indices. There does not appear to be an E2E that drives the interactive NVIDIA Endpoints model menu, selects the newly inserted curated option, and asserts the resulting route/config.
- Suggested test: Add an interactive/scripted onboarding E2E that selects NVIDIA Endpoints option 1 and curated model option 2 (Nemotron 3 Ultra 550B), then verifies openshell inference get and sandbox config use that model.

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: agent-turn-latency-e2e,cloud-onboard-e2e,inference-routing-e2e

github-actions · 2026-06-04T14:02:20Z

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw, ubuntu-repo-cloud-hermes
Optional scenario E2E: macos-repo-cloud-openclaw, wsl-repo-cloud-openclaw

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

ubuntu-repo-cloud-openclaw: Core inference provider/model selection changed for NVIDIA Endpoints; this scenario exercises repo-current OpenClaw cloud onboarding and inference.local routing on the primary Ubuntu runner.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
ubuntu-repo-cloud-hermes: Core inference provider/model selection changed for NVIDIA Endpoints; this scenario exercises repo-current Hermes cloud onboarding and inference.local routing on the primary Ubuntu runner.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes

Optional scenario E2E

macos-repo-cloud-openclaw: Optional adjacent coverage for the same cloud OpenClaw onboarding surface on macOS; special-runner/platform coverage is not the primary target for this inference model list change.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=macos-repo-cloud-openclaw
wsl-repo-cloud-openclaw: Optional adjacent coverage for the same cloud OpenClaw onboarding surface on WSL; special-runner/platform coverage is not the primary target for this inference model list change.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw

Relevant changed files

src/lib/inference/config.ts

github-actions · 2026-06-04T14:07:57Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26956676989
Target ref: 4421c64aeaf6c789a3d504a444d648d7a0b15215
Workflow ref: main
Requested jobs: cloud-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
cloud-e2e	✅ success

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/test-agent-turn-latency-e2e.sh`:
- Around line 79-93: The parsing helper parse_chat_content is incorrectly
treating reasoning/reasoning_content as a successful answer; change it to only
accept choices[0].message.content as the valid response (i.e., use
c.get("content") or "" and remove fallbacks to reasoning_content/reasoning), so
that reasoning fields are not considered a final answer (apply the same change
to the other occurrences mentioned around the script sections at 388-397 and
410-426); keep the existing error handling unchanged and ensure the printed
output is the stripped content string only.
- Around line 112-127: The assert helpers (e.g., assert_route) call fail and
then use a bare "return" which returns success (0); change those to return a
non-zero status so failures propagate—replace the bare "return" after fail with
"return 1" (or "exit 1" if you intend to abort the whole script) in assert_route
and the other preflight helpers referenced (the blocks at 129-167, 169-237,
331-333, 384-386) so run_openclaw_turn/run_hermes_turn won't proceed when the
preflight checks fail.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b808ee2d-9ed5-4de6-b933-413433d2ab81

📥 Commits

Reviewing files that changed from the base of the PR and between 4421c64 and d0cf4d1.

📒 Files selected for processing (2)

.github/workflows/nightly-e2e.yaml
test/e2e/test-agent-turn-latency-e2e.sh

coderabbitai · 2026-06-04T14:27:55Z

+parse_chat_content() {
+  python3 -c '
+import json
+import sys
+
+try:
+    r = json.load(sys.stdin)
+    c = r["choices"][0]["message"]
+    content = c.get("content") or c.get("reasoning_content") or c.get("reasoning") or ""
+    print(content.strip())
+except Exception as exc:
+    print(f"PARSE_ERROR: {exc}", file=sys.stderr)
+    sys.exit(1)
+'
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't treat reasoning_content as a successful answer.

This can pass Hermes even when choices[0].message.content is empty, as long as the reasoning field happens to include 42. That weakens the lane and diverges from the existing OpenAI-compatible response check in src/lib/onboard/compatible-endpoint-smoke.ts:261-314, which only uses reasoning fields as a retry hint when the response stopped at finish_reason=length.

Suggested tightening

 parse_chat_content() {
   python3 -c '
 import json
 import sys

 try:
-    r = json.load(sys.stdin)
-    c = r["choices"][0]["message"]
-    content = c.get("content") or c.get("reasoning_content") or c.get("reasoning") or ""
-    print(content.strip())
+    r = json.load(sys.stdin)
+    choice = r["choices"][0]
+    c = choice["message"]
+    content = c.get("content")
+    if not isinstance(content, str) or not content.strip():
+        finish_reason = choice.get("finish_reason")
+        raise ValueError(
+            f"missing non-empty choices[0].message.content "
+            f"(finish_reason={finish_reason!r})"
+        )
+    print(content.strip())
 except Exception as exc:
     print(f"PARSE_ERROR: {exc}", file=sys.stderr)
     sys.exit(1)
 '
 }

Also applies to: 388-397, 410-426
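
The difference between the lenient and strict parsers is easiest to see on a reasoning-only response. The following is a standalone sketch of the strict behavior, assuming an OpenAI-compatible response shape; it takes a string argument instead of reading stdin so it can be exercised directly, and the sample payloads are made up.

```python
import json


def parse_chat_content(raw: str) -> str:
    """Return stripped choices[0].message.content, raising if it is missing or blank."""
    response = json.loads(raw)
    choice = response["choices"][0]
    content = choice["message"].get("content")
    if not isinstance(content, str) or not content.strip():
        finish_reason = choice.get("finish_reason")
        raise ValueError(
            f"missing non-empty choices[0].message.content (finish_reason={finish_reason!r})"
        )
    return content.strip()


# Hypothetical payloads: one real answer, one with only a reasoning field.
good = json.dumps({"choices": [{"message": {"content": " 42 "}, "finish_reason": "stop"}]})
reasoning_only = json.dumps(
    {"choices": [{"message": {"content": "", "reasoning_content": "maybe 42"}, "finish_reason": "length"}]}
)

print(parse_chat_content(good))
try:
    parse_chat_content(reasoning_only)
except ValueError as exc:
    print(f"rejected: {exc}")
```

With the lenient fallback, the second payload would have produced `maybe 42` and satisfied a "reply contains 42" assertion even though no final answer was returned.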

coderabbitai · 2026-06-04T14:27:55Z

+assert_route() {
+  local label="$1"
+  local output plain_output
+  if ! output=$(get_route_output); then
+    fail "${label}: openshell inference get failed: ${output:0:240}"
+    return
+  fi
+  plain_output=$(printf '%s' "$output" | strip_ansi)
+
+  if grep -Fq "Provider: ${EXPECTED_ROUTE_PROVIDER}" <<<"$plain_output" \
+    && grep -Fq "Model: ${TURN_MODEL}" <<<"$plain_output"; then
+    pass "${label}: OpenShell route is ${EXPECTED_ROUTE_PROVIDER} / ${TURN_MODEL}"
+  else
+    fail "${label}: route is not ${EXPECTED_ROUTE_PROVIDER} / ${TURN_MODEL}: ${plain_output:0:400}"
+  fi
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the preflight asserts fail fast.

These helpers call fail and then return success, so run_openclaw_turn and run_hermes_turn still execute the expensive real turn after route/config/health checks already failed. That creates secondary noise and can measure the wrong target.

Suggested fix pattern

 assert_route() {
   local label="$1"
   local output plain_output
   if ! output=$(get_route_output); then
     fail "${label}: openshell inference get failed: ${output:0:240}"
-    return
+    return 1
   fi
   plain_output=$(printf '%s' "$output" | strip_ansi)

   if grep -Fq "Provider: ${EXPECTED_ROUTE_PROVIDER}" <<<"$plain_output" \
     && grep -Fq "Model: ${TURN_MODEL}" <<<"$plain_output"; then
     pass "${label}: OpenShell route is ${EXPECTED_ROUTE_PROVIDER} / ${TURN_MODEL}"
   else
     fail "${label}: route is not ${EXPECTED_ROUTE_PROVIDER} / ${TURN_MODEL}: ${plain_output:0:400}"
+    return 1
   fi
 }

 assert_openclaw_config() {
   ...
   ' <<<"$config" 2>&1) || {
     fail "OpenClaw config: expected Ultra model via inference.local: ${probe:0:400}"
-    return
+    return 1
   }
   pass "OpenClaw config uses inference/${TURN_MODEL}"
 }

 run_openclaw_turn() {
   ...
-  assert_route "OpenClaw"
-  assert_openclaw_config "$sandbox"
+  assert_route "OpenClaw" || return
+  assert_openclaw_config "$sandbox" || return
   ...
 }

 run_hermes_turn() {
   ...
-  assert_route "Hermes"
-  assert_hermes_config "$sandbox"
-  assert_hermes_health "$sandbox"
+  assert_route "Hermes" || return
+  assert_hermes_config "$sandbox" || return
+  assert_hermes_health "$sandbox" || return
   ...
 }

Also applies to: 129-167, 169-237, 331-333, 384-386
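
The underlying bash behavior is that a bare `return` reuses the exit status of the last command, so a logging helper that itself succeeds makes the assert look successful. The sketch below demonstrates this by running a tiny bash snippet from Python; the `fail` helper here is a hypothetical stand-in that logs and returns 0, which is an assumption about how the script's own `fail` behaves.

```python
import shutil
import subprocess
from typing import Optional

SCRIPT = r'''
fail() { echo "FAIL: $*"; }          # hypothetical helper that logs and returns 0
check_bare()   { fail "route"; return; }    # bare return reuses the status of fail
check_strict() { fail "route"; return 1; }  # explicit non-zero status

check_bare   && echo "bare: turn would still run"
check_strict || echo "strict: turn skipped"
'''


def run_demo() -> Optional[str]:
    """Run the bash snippet and return its stdout, or None when bash is unavailable."""
    bash = shutil.which("bash")
    if bash is None:
        return None
    completed = subprocess.run([bash, "-c", SCRIPT], capture_output=True, text=True, check=True)
    return completed.stdout


output = run_demo()
print(output if output is not None else "bash not found")
```

Pairing `return 1` in the helper with `|| return` at the call site is what actually stops the expensive turn from running after a failed preflight.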

github-actions · 2026-06-04T14:30:48Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26957802252
Target ref: d0cf4d1d6fc4a3a794729368d489187b826554c2
Workflow ref: feat/nemotron-ultra-build-option
Requested jobs: agent-turn-latency-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
agent-turn-latency-e2e	✅ success

github-actions · 2026-06-04T18:38:10Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26971628360
Target ref: 1e65fda2874497830a3a706c4ec238b0a094633d
Workflow ref: main
Requested jobs: cloud-onboard-e2e,cloud-inference-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
cloud-inference-e2e	✅ success
cloud-onboard-e2e	✅ success

github-actions · 2026-06-04T20:49:47Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26978415946
Target ref: 8580f5335ef109315a7eafeae75feb85827c95e3
Workflow ref: main
Requested jobs: cloud-onboard-e2e,cloud-inference-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
cloud-inference-e2e	✅ success
cloud-onboard-e2e	✅ success

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

github-actions · 2026-06-04T21:01:13Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26978991651
Target ref: a22adfcf12cb7b553f5ac1ca0807501850edcbc5
Workflow ref: main
Requested jobs: cloud-onboard-e2e,cloud-inference-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
cloud-inference-e2e	✅ success
cloud-onboard-e2e	✅ success

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

github-actions · 2026-06-04T21:16:02Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26979734796
Target ref: 40acfcbffd4cc943b65b7bb383f0e16b5d10009d
Workflow ref: main
Requested jobs: cloud-onboard-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
cloud-onboard-e2e	✅ success

github-actions · 2026-06-04T21:26:42Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26980271170
Target ref: 593246650ccc83f1ba2d4ee9dbf3a36cbb5e839e
Workflow ref: main
Requested jobs: cloud-onboard-e2e,inference-routing-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
cloud-onboard-e2e	✅ success
inference-routing-e2e	✅ success

## Summary

- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR labels, and shared `$$nemoclaw` command placeholders.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint, docs-to-skills verification, gitleaks, commitlint, and skills YAML tests.

## Source Summary

- #3679, #4437, #4681, #4766, #4772, #4775, #4786 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`, `docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27 compatibility, runtime path pinning, plugin registry recovery, live gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`, `docs/inference/inference-options.mdx`, `docs/inference/use-local-inference.mdx`, `docs/inference/switch-inference-providers.mdx`: Document the release inference changes covering Local NIM waits, Hermes Anthropic routing, Nemotron 3 Ultra, the current Ollama starter fallback, and Spark managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`, `docs/security/credential-storage.mdx`, `docs/manage-sandboxes/messaging-channels.mdx`, `docs/reference/troubleshooting.mdx`: Capture permission healing, gateway-stored credential reuse, cross-sandbox messaging credential conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`: Regenerate the user skill references from the updated source docs.
- Follow-up maintenance -> `.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`, `.coderabbit.yaml`: Add release-prep area labels for docs and skills PRs, and teach docs review guidance that `$$nemoclaw` is the correct shared command placeholder for examples that work across agent aliases.

Note: the `documentation` label was not present in the repository, so this PR is labeled with `v0.0.59` only.

## Summary by CodeRabbit

* **Documentation**
  * Updated default model for local Ollama inference setup to qwen3.5:9b
  * Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
  * Clarified credential storage and reuse behavior for post-deployment (day-two) operations
  * Added v0.0.59 release notes covering OpenClaw compatibility, inference options, Hermes messaging sync, and troubleshooting
  * Clarified CLI selection guidance and updated OpenClaw version example in status output
  * Revised release-prep instructions and docs review guidance for CLI alias usage

feat(inference): add Nemotron 3 Ultra build option

4421c64

ericksoa added the v0.0.58 Release target label Jun 4, 2026

ericksoa self-assigned this Jun 4, 2026

test(e2e): add real agent turn latency nightly

d0cf4d1

coderabbitai Bot reviewed Jun 4, 2026

cv approved these changes Jun 4, 2026

Merge branch 'main' into feat/nemotron-ultra-build-option

1e65fda

cv enabled auto-merge (squash) June 4, 2026 18:28

Merge branch 'main' into feat/nemotron-ultra-build-option

8580f53

ci: fix nightly e2e workflow whitespace

a22adfc

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv added v0.0.59 Release target and removed v0.0.58 Release target labels Jun 4, 2026

merge: sync with main

40acfcb

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

merge: sync with main

5932466

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv merged commit f27dbbb into main Jun 4, 2026
29 checks passed

cv deleted the feat/nemotron-ultra-build-option branch June 4, 2026 21:24

miyoungc mentioned this pull request Jun 5, 2026

docs: refresh 0.0.59 release notes #4790

Merged

wscurran added area: docs Documentation, examples, guides, or docs build area: inference Inference routing, serving, model selection, or outputs feature PR adds or expands user-visible functionality labels Jun 5, 2026
