feat(inference): add Nemotron 3 Ultra build option #4769

Merged
cv merged 7 commits into main from feat/nemotron-ultra-build-option on Jun 4, 2026

@ericksoa ericksoa commented Jun 4, 2026

Summary

  • add nvidia/nemotron-3-ultra-550b-a55b as the second curated NVIDIA Endpoints model, directly below the default Nemotron 3 Super option
  • keep the default cloud model on nvidia/nemotron-3-super-120b-a12b
  • update docs and menu-index tests for the expanded Build model list

Build endpoint check

  • https://integrate.api.nvidia.com/v1/models returns nvidia/nemotron-3-ultra-550b-a55b with owned_by: nvidia
  • Build model card identifies it as Nemotron-3-Ultra-550B-A55B-NVFP4 with 550B total / 55B active parameters and up to 1M context
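The endpoint check above could be scripted along these lines. This is a minimal sketch, assuming the `/v1/models` response follows the OpenAI-compatible list shape (a `data` array of entries with `id` and `owned_by`); the helper name `find_model` and the sample payload are hypothetical and hand-written, not captured from the live API.

```python
from typing import Any, Optional

MODEL_ID = "nvidia/nemotron-3-ultra-550b-a55b"


def find_model(models_payload: dict[str, Any], model_id: str) -> Optional[dict[str, Any]]:
    """Return the catalog entry whose id matches model_id, or None if absent."""
    for entry in models_payload.get("data", []):
        if entry.get("id") == model_id:
            return entry
    return None


# Hand-written sample in the assumed response shape (not real API output).
sample_payload = {
    "object": "list",
    "data": [
        {"id": "example/other-model", "owned_by": "example"},
        {"id": MODEL_ID, "owned_by": "nvidia"},
    ],
}

entry = find_model(sample_payload, MODEL_ID)
# True when the model is listed and attributed to nvidia.
print(entry is not None and entry.get("owned_by") == "nvidia")
```

Fetching the live payload is left out on purpose, since whether the listing requires an API key is not stated in this PR.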

Tests

  • npm run build:cli
  • npx vitest run src/lib/inference/config.test.ts src/lib/inference/model-prompts.test.ts src/lib/inference/provider-models.test.ts test/onboard-selection.test.ts

Summary by CodeRabbit

  • New Features

    • Added Nemotron 3 Ultra 550B to curated model options for NVIDIA Endpoints.
  • Documentation

    • Updated onboarding and provider docs to list the new curated model.
  • Tests

    • Updated selection tests to account for the new curated model.
    • Added a new end-to-end agent turn latency test that measures latency and emits a structured report.
  • CI

    • Added a nightly job to run the new latency E2E test and include results in nightly reports.

@ericksoa ericksoa added the v0.0.58 Release target label Jun 4, 2026

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

Warning

Review limit reached

@cv, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 25 minutes and 34 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 280d1311-aeb8-4421-9c52-e035f5f24f4f

📥 Commits

Reviewing files that changed from the base of the PR and between a22adfc and 5932466.

📒 Files selected for processing (3)
  • .github/workflows/nightly-e2e.yaml
  • docs/inference/inference-options.mdx
  • test/onboard-selection.test.ts
📝 Walkthrough

Walkthrough

Adds Nemotron 3 Ultra 550B to curated cloud model options and docs; updates tests and onboarding selections for the new menu order; and introduces a nightly agent-turn-latency-e2e workflow plus a new end-to-end bash test that measures turn latency across OpenClaw and Hermes.

Changes

Nemotron 3 Ultra 550B curated model addition

| Layer | File(s) | Summary |
| --- | --- | --- |
| Model configuration and documentation | src/lib/inference/config.ts, docs/get-started/quickstart.mdx, docs/inference/inference-options.mdx | CLOUD_MODEL_OPTIONS now includes nvidia/nemotron-3-ultra-550b-a55b with label "Nemotron 3 Ultra 550B". Quickstart and provider options docs updated to list the new curated model. |
| Configuration option test | src/lib/inference/config.test.ts | Unit test updated to expect the new model id in curated cloud model options. |
| Selection and onboarding tests | src/lib/inference/model-prompts.test.ts, test/onboard-selection.test.ts | Mocked prompt input and onboarding test fixture answers arrays adjusted to account for the added model shifting menu indices. |
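The menu-index shift mentioned in the last row can be illustrated with a small sketch. Only the two Nemotron ids come from this PR; the other entries and the helper `menu_choice` are hypothetical stand-ins for whatever the real catalog contains.

```python
DEFAULT_MODEL = "nvidia/nemotron-3-super-120b-a12b"
ULTRA_MODEL = "nvidia/nemotron-3-ultra-550b-a55b"

# Hypothetical catalog before the change; only DEFAULT_MODEL is taken from the PR.
options_before = [DEFAULT_MODEL, "example/model-b", "example/model-c"]

# The PR places Ultra second, directly below the default option.
options_after = [options_before[0], ULTRA_MODEL, *options_before[1:]]


def menu_choice(options: list[str], model_id: str) -> int:
    """Return the 1-based menu number a user would type to pick model_id."""
    return options.index(model_id) + 1


# Every option after the default moves down by one menu position.
for model_id in options_before:
    print(model_id, menu_choice(options_before, model_id), "->", menu_choice(options_after, model_id))
```

Tests that feed hard-coded menu answers would need the same one-position adjustment for anything listed after the default.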

Nightly E2E job and agent-turn-latency script

| Layer | File(s) | Summary |
| --- | --- | --- |
| Nightly workflow job | .github/workflows/nightly-e2e.yaml | Added agent-turn-latency-e2e to dispatch inputs, job list, job definition (reusable workflow invocation, script, timeout, artifacts), and downstream needs lists. |
| E2E script: header and helpers | test/e2e/test-agent-turn-latency-e2e.sh | New end-to-end bash test: header/runtime setup, utilities, parsing and response extraction helpers, route retrieval and assertion logic. |
| E2E script: validators & health | test/e2e/test-agent-turn-latency-e2e.sh | Validation probes for OpenClaw and Hermes configs, Hermes health polling, and sandbox teardown helpers. |
| E2E script: install & control flow | test/e2e/test-agent-turn-latency-e2e.sh | Install runner, prerequisite checks, sandbox lifecycle orchestration, and main control flow executing OpenClaw and Hermes runs. |
| E2E script: runtime turns and results | test/e2e/test-agent-turn-latency-e2e.sh | Execute OpenClaw and Hermes turns, assert reply contains 42, measure per-runtime latency, serialize JSON results, and exit nonzero on failures. |
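The measure-and-report flow in the last row might look roughly like the sketch below. The actual test is a bash script; this Python version only illustrates the idea, and the field names, the prompt text, and the stand-in turn functions are all assumptions.

```python
import json
import time
from typing import Callable, Optional


def measure_turn(runtime: str, send_turn: Callable[[str], str], prompt: str, expected: str) -> dict:
    """Time one agent turn and record whether the reply contains the expected text."""
    start = time.monotonic()
    error: Optional[str] = None
    reply = ""
    try:
        reply = send_turn(prompt)
    except Exception as exc:  # a failed turn is still recorded in the report
        error = str(exc)
    elapsed = time.monotonic() - start
    return {
        "runtime": runtime,
        "latency_seconds": round(elapsed, 3),
        "passed": error is None and expected in reply,
        "error": error,
    }


def build_report(results: list[dict]) -> tuple[str, int]:
    """Return the JSON report and an exit code that is nonzero on any failure."""
    exit_code = 0 if all(r["passed"] for r in results) else 1
    return json.dumps({"results": results}, indent=2), exit_code


# Stand-in turn functions; the real test talks to sandboxed agents.
results = [
    measure_turn("openclaw", lambda p: "The answer is 42.", "What is 6 * 7?", "42"),
    measure_turn("hermes", lambda p: "I am not sure.", "What is 6 * 7?", "42"),
]
report, exit_code = build_report(results)
print(report)
print("exit code:", exit_code)
```

Using a monotonic clock keeps the measurement unaffected by wall-clock adjustments during a long run.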

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

feature, area: inference, E2E, area: e2e, CI/CD, documentation

Suggested reviewers

  • cv

Poem

🐰 A speedy model hops on stage,
Docs and tests shift by one page.
Nightly runs time every turn,
Latency logged for us to learn.
The rabbit cheers, “Let pipelines sing!”

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: adding Nemotron 3 Ultra 550B as a curated NVIDIA Endpoints model option throughout the codebase, documentation, and tests. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions Bot commented Jun 4, 2026

@ericksoa ericksoa self-assigned this Jun 4, 2026

github-actions Bot commented Jun 4, 2026

PR Review Advisor

Findings: 1 needs attention, 1 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 0 still apply, 2 new items found

Review findings

🛠️ Needs attention

  • Do not run PR-controlled latency E2E with NVIDIA_API_KEY (.github/workflows/nightly-e2e.yaml:230): The new `agent-turn-latency-e2e` job is selective-dispatchable and passes `nvidia_api_key: true` into the reusable runner. That runner checks out `inputs.target_ref || github.ref` and runs the script from that checkout, so a workflow_dispatch/auto-dispatch targeting a PR head can execute PR-controlled `test/e2e/test-agent-turn-latency-e2e.sh` with the real NVIDIA_API_KEY in the environment. This expands the trusted-code boundary for a high-value inference credential.
    • Recommendation: Withhold NVIDIA_API_KEY whenever `workflow_dispatch` supplies a non-empty `target_ref`, or run the latency harness from trusted main/workflow code while treating the PR checkout only as data under test. If this job must use secrets, gate it to trusted refs only and add a contract test for that boundary.
    • Evidence: `agent-turn-latency-e2e` sets `ref: ${{ inputs.target_ref || github.ref }}`, `script: test/e2e/test-agent-turn-latency-e2e.sh`, and `nvidia_api_key: true`; `.github/workflows/e2e-script.yaml` then runs the checked-out repo script with `NVIDIA_API_KEY: ${{ inputs.nvidia_api_key && secrets.NVIDIA_API_KEY || '' }}`.
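The first recommendation amounts to a small decision rule, sketched here in Python purely for illustration. The real gate would live in the workflow expression; the function name `resolve_api_key` and its parameters are hypothetical.

```python
def resolve_api_key(event_name: str, target_ref: str, wants_key: bool, secret: str) -> str:
    """Return the key to expose to the job, or an empty string when it is withheld."""
    # A manual dispatch that names a ref may point at PR-controlled code.
    untrusted_ref = event_name == "workflow_dispatch" and bool(target_ref.strip())
    if not wants_key or untrusted_ref:
        return ""
    return secret


# Placeholder value, not a real credential.
FAKE_SECRET = "example-secret"

print(repr(resolve_api_key("schedule", "", True, FAKE_SECRET)))
print(repr(resolve_api_key("workflow_dispatch", "refs/pull/1/head", True, FAKE_SECRET)))
```

A contract test for the trust boundary could assert exactly these cases: scheduled runs receive the key, dispatches with a non-empty `target_ref` do not.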

🔎 Worth checking

  • Redact latency E2E install logs before streaming and uploading artifacts (test/e2e/test-agent-turn-latency-e2e.sh:293): The new latency E2E captures install output to `/tmp/nemoclaw-e2e-*-turn-latency-install.log`, streams it with `tail -f`, prints the last 80 lines on failure, and the workflow uploads those logs on failure. GitHub log masking helps console output, but the artifact files themselves are not rewritten or redacted before upload.
    • Recommendation: Add defense-in-depth redaction for literal NVIDIA_API_KEY values and common secret patterns such as `nvapi-...` and `Bearer ...` before printing log excerpts and before artifacts are uploaded, or avoid uploading logs that may contain credential-bearing installer/probe output.
    • Evidence: `run_install` redirects `bash install.sh` to `$log_path`, tails it live, and prints `tail -80 "$log_path"` on failure; `.github/workflows/nightly-e2e.yaml` uploads both turn-latency install logs as failure artifacts.
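A redaction pass of the kind recommended here could be sketched as follows. The character classes for `nvapi-` keys and Bearer tokens are assumptions rather than a documented key format, and the helper `redact` is hypothetical.

```python
import re

# Assumed token shapes; real key formats may allow other characters.
TOKEN_PATTERNS = [
    re.compile(r"nvapi-[A-Za-z0-9_\-]+"),
    re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/\-]+=*"),
]


def redact(text: str, literal_secrets: tuple[str, ...] = ()) -> str:
    """Replace known secret values and secret-looking tokens with a marker."""
    # Literal values first, so an exact key is masked even if no pattern matches it.
    for secret in literal_secrets:
        if secret:
            text = text.replace(secret, "[REDACTED]")
    for pattern in TOKEN_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


print(redact("Authorization: Bearer abc.def-123"))
print(redact("key=nvapi-AbC_123 ok"))
print(redact("token s3cr3t here", ("s3cr3t",)))
```

The same function would need to run over the log file itself before upload, not only over the excerpt printed to the console.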

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — agent-turn-latency-e2e uses the Ultra model through `inference.local` for both OpenClaw and Hermes. Unit tests cover the curated model ordering and menu-index drift, but the PR also changes workflow dispatch and adds a real sandbox/inference E2E path where behavior and credential boundaries need runtime/contract validation.
  • **Runtime validation** — selective dispatch of agent-turn-latency-e2e reports the new job in report-to-pr and scorecard needs. Unit tests cover the curated model ordering and menu-index drift, but the PR also changes workflow dispatch and adds a real sandbox/inference E2E path where behavior and credential boundaries need runtime/contract validation.
  • **Runtime validation** — agent-turn-latency failure artifacts do not contain `NVIDIA_API_KEY`, `nvapi-`, or `Bearer` secrets. Unit tests cover the curated model ordering and menu-index drift, but the PR also changes workflow dispatch and adds a real sandbox/inference E2E path where behavior and credential boundaries need runtime/contract validation.
  • **Runtime validation** — target_ref dispatch for agent-turn-latency-e2e does not expose repository secrets to PR-controlled script code. Unit tests cover the curated model ordering and menu-index drift, but the PR also changes workflow dispatch and adds a real sandbox/inference E2E path where behavior and credential boundaries need runtime/contract validation.
  • **Acceptance clause:** `https://integrate.api.nvidia.com/v1/models` returns `nvidia/nemotron-3-ultra-550b-a55b` with `owned_by: nvidia`; add test evidence or identify existing coverage. This is external endpoint evidence from the PR body; the repository diff does not independently prove the live API response, and this review did not execute network probes.
  • **Acceptance clause:** Build model card identifies it as Nemotron-3-Ultra-550B-A55B-NVFP4 with 550B total / 55B active parameters and up to 1M context — add test evidence or identify existing coverage. This is external model-card evidence from the PR body; the repository diff does not independently prove the model-card contents.

This is an automated advisory review. A human maintainer must make the final merge decision.

github-actions Bot commented Jun 4, 2026

E2E Advisor Recommendation

Required E2E: agent-turn-latency-e2e, cloud-onboard-e2e, inference-routing-e2e
Optional E2E: cloud-inference-e2e, openclaw-inference-switch-e2e, hermes-inference-switch-e2e, docs-validation-e2e

Dispatch hint: agent-turn-latency-e2e,cloud-onboard-e2e,inference-routing-e2e

Auto-dispatched E2E: cloud-onboard-e2e, inference-routing-e2e via nightly-e2e.yaml at 593246650ccc83f1ba2d4ee9dbf3a36cbb5e839e (nightly run)

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • agent-turn-latency-e2e (high; installs two sandboxes and performs live NVIDIA endpoint turns with a 120 minute job timeout): Directly validates the newly added E2E job and script, including OpenClaw and Hermes install/onboard, inference.local route/config assertions, and real NVIDIA Build turns through the new Ultra model.
  • cloud-onboard-e2e (medium-high; installs and onboards a cloud sandbox with live NVIDIA credentials): Covers the installer/onboarding path for NVIDIA Endpoints selection, sandbox creation, credential isolation, and inference.local setup, all adjacent to the changed provider model catalog.
  • inference-routing-e2e (medium; installs NemoClaw as needed and runs routing checks, with live-key portions gated by secrets): The touched inference config participates in routed provider setup; this job validates gateway inference routing, credential isolation, and provider/error handling paths around inference.local.

Optional E2E

  • cloud-inference-e2e (medium-high; live NVIDIA endpoint install and chat): Useful additional confidence that a freshly installed OpenClaw sandbox can complete live chat through inference.local to NVIDIA Endpoints after the catalog change, though agent-turn-latency-e2e already exercises live turns with the new model.
  • openclaw-inference-switch-e2e (medium-high; sandbox lifecycle plus live inference switching): Adjacent coverage for OpenClaw route/config reconciliation and live requests after inference model/provider changes.
  • hermes-inference-switch-e2e (medium-high; Hermes sandbox lifecycle plus live inference switching): Adjacent coverage for Hermes route/config reconciliation and live requests, relevant because the new latency E2E also asserts Hermes inference.local configuration.
  • docs-validation-e2e (low-medium; installs NemoClaw and runs docs validation): Documentation changed under docs/inference; this catches docs build/link/validation issues but is not the main runtime risk.

New E2E recommendations

  • onboarding curated NVIDIA Endpoints model selection (medium): Existing E2E coverage validates cloud onboarding and the new env-driven Ultra model turn, while unit tests cover menu indices. There does not appear to be an E2E that drives the interactive NVIDIA Endpoints model menu, selects the newly inserted curated option, and asserts the resulting route/config.
    • Suggested test: Add an interactive/scripted onboarding E2E that selects NVIDIA Endpoints option 1 and curated model option 2 (Nemotron 3 Ultra 550B), then verifies openshell inference get and sandbox config use that model.

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: agent-turn-latency-e2e,cloud-onboard-e2e,inference-routing-e2e

github-actions Bot commented Jun 4, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw, ubuntu-repo-cloud-hermes
Optional scenario E2E: macos-repo-cloud-openclaw, wsl-repo-cloud-openclaw

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • ubuntu-repo-cloud-openclaw: Core inference provider/model selection changed for NVIDIA Endpoints; this scenario exercises repo-current OpenClaw cloud onboarding and inference.local routing on the primary Ubuntu runner.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
  • ubuntu-repo-cloud-hermes: Core inference provider/model selection changed for NVIDIA Endpoints; this scenario exercises repo-current Hermes cloud onboarding and inference.local routing on the primary Ubuntu runner.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes

Optional scenario E2E

  • macos-repo-cloud-openclaw: Optional adjacent coverage for the same cloud OpenClaw onboarding surface on macOS; special-runner/platform coverage is not the primary target for this inference model list change.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=macos-repo-cloud-openclaw
  • wsl-repo-cloud-openclaw: Optional adjacent coverage for the same cloud OpenClaw onboarding surface on WSL; special-runner/platform coverage is not the primary target for this inference model list change.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw

Relevant changed files

  • src/lib/inference/config.ts

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26956676989
Target ref: 4421c64aeaf6c789a3d504a444d648d7a0b15215
Workflow ref: main
Requested jobs: cloud-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
cloud-e2e ✅ success

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/test-agent-turn-latency-e2e.sh`:
- Around line 79-93: The parsing helper parse_chat_content is incorrectly
treating reasoning/reasoning_content as a successful answer; change it to only
accept choices[0].message.content as the valid response (i.e., use
c.get("content") or "" and remove fallbacks to reasoning_content/reasoning), so
that reasoning fields are not considered a final answer (apply the same change
to the other occurrences mentioned around the script sections at 388-397 and
410-426); keep the existing error handling unchanged and ensure the printed
output is the stripped content string only.
- Around line 112-127: The assert helpers (e.g., assert_route) call fail and
then use a bare "return" which returns success (0); change those to return a
non-zero status so failures propagate—replace the bare "return" after fail with
"return 1" (or "exit 1" if you intend to abort the whole script) in assert_route
and the other preflight helpers referenced (the blocks at 129-167, 169-237,
331-333, 384-386) so run_openclaw_turn/run_hermes_turn won't proceed when the
preflight checks fail.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b808ee2d-9ed5-4de6-b933-413433d2ab81

📥 Commits

Reviewing files that changed from the base of the PR and between 4421c64 and d0cf4d1.

📒 Files selected for processing (2)
  • .github/workflows/nightly-e2e.yaml
  • test/e2e/test-agent-turn-latency-e2e.sh

Comment on lines +79 to +93
parse_chat_content() {
  python3 -c '
import json
import sys

try:
    r = json.load(sys.stdin)
    c = r["choices"][0]["message"]
    content = c.get("content") or c.get("reasoning_content") or c.get("reasoning") or ""
    print(content.strip())
except Exception as exc:
    print(f"PARSE_ERROR: {exc}", file=sys.stderr)
    sys.exit(1)
'
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't treat reasoning_content as a successful answer.

This can pass Hermes even when choices[0].message.content is empty, as long as the reasoning field happens to include 42. That weakens the lane and diverges from the existing OpenAI-compatible response check in src/lib/onboard/compatible-endpoint-smoke.ts:261-314, which only uses reasoning fields as a retry hint when the response stopped at finish_reason=length.

Suggested tightening
 parse_chat_content() {
   python3 -c '
 import json
 import sys
 
 try:
-    r = json.load(sys.stdin)
-    c = r["choices"][0]["message"]
-    content = c.get("content") or c.get("reasoning_content") or c.get("reasoning") or ""
-    print(content.strip())
+    r = json.load(sys.stdin)
+    choice = r["choices"][0]
+    c = choice["message"]
+    content = c.get("content")
+    if not isinstance(content, str) or not content.strip():
+        finish_reason = choice.get("finish_reason")
+        raise ValueError(
+            f"missing non-empty choices[0].message.content "
+            f"(finish_reason={finish_reason!r})"
+        )
+    print(content.strip())
 except Exception as exc:
     print(f"PARSE_ERROR: {exc}", file=sys.stderr)
     sys.exit(1)
 '
 }

Also applies to: 388-397, 410-426
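As a standalone illustration of the stricter parsing described in this comment, the sketch below accepts only a non-empty `choices[0].message.content` and treats a reasoning-only response as an error. The function name `extract_content` and the sample payloads are hypothetical; the response shape is assumed to be the OpenAI-compatible chat completion format.

```python
import json


def extract_content(raw: str) -> str:
    """Return the stripped final answer, or raise if only reasoning text is present."""
    response = json.loads(raw)
    choice = response["choices"][0]
    content = choice["message"].get("content")
    if not isinstance(content, str) or not content.strip():
        finish_reason = choice.get("finish_reason")
        raise ValueError(
            f"missing non-empty choices[0].message.content (finish_reason={finish_reason!r})"
        )
    return content.strip()


# Hand-written sample payloads, not captured from a live endpoint.
answered = json.dumps(
    {"choices": [{"message": {"content": " 42 "}, "finish_reason": "stop"}]}
)
reasoning_only = json.dumps(
    {"choices": [{"message": {"content": "", "reasoning_content": "so it is 42"}, "finish_reason": "length"}]}
)

print(extract_content(answered))
try:
    extract_content(reasoning_only)
except ValueError as exc:
    print("rejected:", exc)
```

With this rule, a reply whose reasoning field happens to mention 42 no longer counts as a passing turn.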

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-agent-turn-latency-e2e.sh` around lines 79 - 93, The parsing
helper parse_chat_content is incorrectly treating reasoning/reasoning_content as
a successful answer; change it to only accept choices[0].message.content as the
valid response (i.e., use c.get("content") or "" and remove fallbacks to
reasoning_content/reasoning), so that reasoning fields are not considered a
final answer (apply the same change to the other occurrences mentioned around
the script sections at 388-397 and 410-426); keep the existing error handling
unchanged and ensure the printed output is the stripped content string only.

Comment on lines +112 to +127
assert_route() {
  local label="$1"
  local output plain_output
  if ! output=$(get_route_output); then
    fail "${label}: openshell inference get failed: ${output:0:240}"
    return
  fi
  plain_output=$(printf '%s' "$output" | strip_ansi)

  if grep -Fq "Provider: ${EXPECTED_ROUTE_PROVIDER}" <<<"$plain_output" \
    && grep -Fq "Model: ${TURN_MODEL}" <<<"$plain_output"; then
    pass "${label}: OpenShell route is ${EXPECTED_ROUTE_PROVIDER} / ${TURN_MODEL}"
  else
    fail "${label}: route is not ${EXPECTED_ROUTE_PROVIDER} / ${TURN_MODEL}: ${plain_output:0:400}"
  fi
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the preflight asserts fail fast.

These helpers call fail and then return success, so run_openclaw_turn and run_hermes_turn still execute the expensive real turn after route/config/health checks already failed. That creates secondary noise and can measure the wrong target.

Suggested fix pattern
 assert_route() {
   local label="$1"
   local output plain_output
   if ! output=$(get_route_output); then
     fail "${label}: openshell inference get failed: ${output:0:240}"
-    return
+    return 1
   fi
   plain_output=$(printf '%s' "$output" | strip_ansi)
 
   if grep -Fq "Provider: ${EXPECTED_ROUTE_PROVIDER}" <<<"$plain_output" \
     && grep -Fq "Model: ${TURN_MODEL}" <<<"$plain_output"; then
     pass "${label}: OpenShell route is ${EXPECTED_ROUTE_PROVIDER} / ${TURN_MODEL}"
   else
     fail "${label}: route is not ${EXPECTED_ROUTE_PROVIDER} / ${TURN_MODEL}: ${plain_output:0:400}"
+    return 1
   fi
 }
 
 assert_openclaw_config() {
   ...
 ' <<<"$config" 2>&1) || {
     fail "OpenClaw config: expected Ultra model via inference.local: ${probe:0:400}"
-    return
+    return 1
   }
   pass "OpenClaw config uses inference/${TURN_MODEL}"
 }
 
 run_openclaw_turn() {
   ...
-  assert_route "OpenClaw"
-  assert_openclaw_config "$sandbox"
+  assert_route "OpenClaw" || return
+  assert_openclaw_config "$sandbox" || return
   ...
 }
 
 run_hermes_turn() {
   ...
-  assert_route "Hermes"
-  assert_hermes_config "$sandbox"
-  assert_hermes_health "$sandbox"
+  assert_route "Hermes" || return
+  assert_hermes_config "$sandbox" || return
+  assert_hermes_health "$sandbox" || return
   ...
 }

Also applies to: 129-167, 169-237, 331-333, 384-386
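The fail-fast control flow suggested above, shown language-neutrally in Python: each preflight reports success or failure, and the expensive turn runs only if all of them pass. The names `run_turn` and `expensive_turn` are hypothetical, and the bash-specific detail (a bare `return` propagating the status of the last command) is not modeled here.

```python
from typing import Callable

Preflight = tuple[str, Callable[[], bool]]


def run_turn(preflights: list[Preflight], turn: Callable[[], str]) -> str:
    """Run the turn only if every preflight check passes; stop at the first failure."""
    for name, check in preflights:
        if not check():
            return f"skipped turn: {name} preflight failed"
    return turn()


calls: list[str] = []


def expensive_turn() -> str:
    """Stand-in for the real agent turn; records that it was invoked."""
    calls.append("turn")
    return "reply: 42"


outcome = run_turn(
    [("route", lambda: True), ("config", lambda: False), ("health", lambda: True)],
    expensive_turn,
)
print(outcome)
print(calls)
```

Because the config check fails, the health check and the turn itself are never executed, which avoids measuring latency against a misconfigured target.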

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-agent-turn-latency-e2e.sh` around lines 112 - 127, The assert
helpers (e.g., assert_route) call fail and then use a bare "return" which
returns success (0); change those to return a non-zero status so failures
propagate—replace the bare "return" after fail with "return 1" (or "exit 1" if
you intend to abort the whole script) in assert_route and the other preflight
helpers referenced (the blocks at 129-167, 169-237, 331-333, 384-386) so
run_openclaw_turn/run_hermes_turn won't proceed when the preflight checks fail.

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26957802252
Target ref: d0cf4d1d6fc4a3a794729368d489187b826554c2
Workflow ref: feat/nemotron-ultra-build-option
Requested jobs: agent-turn-latency-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
agent-turn-latency-e2e ✅ success

@cv cv enabled auto-merge (squash) June 4, 2026 18:28

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26971628360
Target ref: 1e65fda2874497830a3a706c4ec238b0a094633d
Workflow ref: main
Requested jobs: cloud-onboard-e2e,cloud-inference-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
cloud-inference-e2e ✅ success
cloud-onboard-e2e ✅ success

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26978415946
Target ref: 8580f5335ef109315a7eafeae75feb85827c95e3
Workflow ref: main
Requested jobs: cloud-onboard-e2e,cloud-inference-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
cloud-inference-e2e ✅ success
cloud-onboard-e2e ✅ success

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv added v0.0.59 Release target and removed v0.0.58 Release target labels Jun 4, 2026
Signed-off-by: Carlos Villela <cvillela@nvidia.com>

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26978991651
Target ref: a22adfcf12cb7b553f5ac1ca0807501850edcbc5
Workflow ref: main
Requested jobs: cloud-onboard-e2e,cloud-inference-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
cloud-inference-e2e ✅ success
cloud-onboard-e2e ✅ success

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26979734796
Target ref: 40acfcbffd4cc943b65b7bb383f0e16b5d10009d
Workflow ref: main
Requested jobs: cloud-onboard-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
cloud-onboard-e2e ✅ success

@cv cv merged commit f27dbbb into main Jun 4, 2026
29 checks passed
@cv cv deleted the feat/nemotron-ultra-build-option branch June 4, 2026 21:24

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26980271170
Target ref: 593246650ccc83f1ba2d4ee9dbf3a36cbb5e839e
Workflow ref: main
Requested jobs: cloud-onboard-e2e,inference-routing-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
cloud-onboard-e2e ✅ success
inference-routing-e2e ✅ success

cv pushed a commit that referenced this pull request Jun 5, 2026
## Summary
- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the
current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR
labels, and shared `$$nemoclaw` command placeholders.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC
permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint,
docs-to-skills verification, gitleaks, commitlint, and skills YAML
tests.

## Source Summary
- #3679, #4437, #4681, #4766, #4772, #4775, #4786 ->
`docs/about/release-notes.mdx`, `docs/reference/commands.mdx`,
`docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27
compatibility, runtime path pinning, plugin registry recovery, live
gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`,
`docs/inference/inference-options.mdx`,
`docs/inference/use-local-inference.mdx`,
`docs/inference/switch-inference-providers.mdx`: Document the release
inference changes covering Local NIM waits, Hermes Anthropic routing,
Nemotron 3 Ultra, the current Ollama starter fallback, and Spark
managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`,
`docs/security/credential-storage.mdx`,
`docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/troubleshooting.mdx`: Capture permission healing,
gateway-stored credential reuse, cross-sandbox messaging credential
conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`:
Regenerate the user skill references from the updated source docs.
- Follow-up maintenance ->
`.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`,
`.coderabbit.yaml`: Add release-prep area labels for docs and skills
PRs, and teach docs review guidance that `$$nemoclaw` is the correct
shared command placeholder for examples that work across agent aliases.

Note: the `documentation` label was not present in the repository, so
this PR is labeled with `v0.0.59` only.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
  * Updated default model for local Ollama inference setup to qwen3.5:9b
  * Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
  * Clarified credential storage and reuse behavior for post-deployment (day-two) operations
  * Added v0.0.59 release notes covering OpenClaw compatibility, inference options, Hermes messaging sync, and troubleshooting
  * Clarified CLI selection guidance and updated OpenClaw version example in status output
  * Revised release-prep instructions and docs review guidance for CLI alias usage
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@wscurran wscurran added the area: docs, area: inference, and feature labels Jun 5, 2026

Labels

  • area: docs (Documentation, examples, guides, or docs build)
  • area: inference (Inference routing, serving, model selection, or outputs)
  • feature (PR adds or expands user-visible functionality)
  • v0.0.59 (Release target)

3 participants