feat(inference): add Nemotron 3 Ultra build option #4769
📝 Walkthrough
Adds Nemotron 3 Ultra 550B to curated cloud model options and docs; updates tests and onboarding selections for the new menu order; and introduces a nightly
Changes
- Nemotron 3 Ultra 550B curated model addition
- Nightly E2E job and agent-turn-latency script
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
🌿 Preview your docs: https://nvidia-preview-pr-4769.docs.buildwithfern.com/nemoclaw
PR Review Advisor
Findings: 1 needs attention, 1 worth checking, 0 nice ideas
This is an automated advisory review. A human maintainer must make the final merge decision.
Selective E2E Results: ✅ All requested jobs passed (Run: 26956676989)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/test-agent-turn-latency-e2e.sh`:
- Around line 79-93: The parsing helper parse_chat_content is incorrectly
treating reasoning/reasoning_content as a successful answer; change it to only
accept choices[0].message.content as the valid response (i.e., use
c.get("content") or "" and remove fallbacks to reasoning_content/reasoning), so
that reasoning fields are not considered a final answer (apply the same change
to the other occurrences mentioned around the script sections at 388-397 and
410-426); keep the existing error handling unchanged and ensure the printed
output is the stripped content string only.
- Around line 112-127: The assert helpers (e.g., assert_route) call fail and
then use a bare "return" which returns success (0); change those to return a
non-zero status so failures propagate—replace the bare "return" after fail with
"return 1" (or "exit 1" if you intend to abort the whole script) in assert_route
and the other preflight helpers referenced (the blocks at 129-167, 169-237,
331-333, 384-386) so run_openclaw_turn/run_hermes_turn won't proceed when the
preflight checks fail.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b808ee2d-9ed5-4de6-b933-413433d2ab81
📒 Files selected for processing (2)
.github/workflows/nightly-e2e.yaml
test/e2e/test-agent-turn-latency-e2e.sh
parse_chat_content() {
  python3 -c '
import json
import sys

try:
    r = json.load(sys.stdin)
    c = r["choices"][0]["message"]
    content = c.get("content") or c.get("reasoning_content") or c.get("reasoning") or ""
    print(content.strip())
except Exception as exc:
    print(f"PARSE_ERROR: {exc}", file=sys.stderr)
    sys.exit(1)
'
}
Don't treat reasoning_content as a successful answer.
This can pass Hermes even when choices[0].message.content is empty, as long as the reasoning field happens to include 42. That weakens the lane and diverges from the existing OpenAI-compatible response check in src/lib/onboard/compatible-endpoint-smoke.ts:261-314, which only uses reasoning fields as a retry hint when the response stopped at finish_reason=length.
Suggested tightening
 parse_chat_content() {
   python3 -c '
 import json
 import sys
 try:
-    r = json.load(sys.stdin)
-    c = r["choices"][0]["message"]
-    content = c.get("content") or c.get("reasoning_content") or c.get("reasoning") or ""
-    print(content.strip())
+    r = json.load(sys.stdin)
+    choice = r["choices"][0]
+    c = choice["message"]
+    content = c.get("content")
+    if not isinstance(content, str) or not content.strip():
+        # Read finish_reason outside the f-string: the script is single-quoted
+        # in bash, so \" would reach Python literally and break the f-string.
+        finish_reason = choice.get("finish_reason")
+        raise ValueError(
+            "missing non-empty choices[0].message.content "
+            f"(finish_reason={finish_reason!r})"
+        )
+    print(content.strip())
 except Exception as exc:
     print(f"PARSE_ERROR: {exc}", file=sys.stderr)
     sys.exit(1)
 '
 }
Also applies to: 388-397, 410-426
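The tightened check is easy to exercise outside the shell wrapper. The following is a minimal standalone sketch of the same rule, not the script's actual code: the function name `extract_final_answer` and both sample payloads are hypothetical, and the payload shape assumes an OpenAI-compatible chat completion response.

```python
import json


def extract_final_answer(raw: str) -> str:
    """Return choices[0].message.content, ignoring any reasoning fields."""
    response = json.loads(raw)
    choice = response["choices"][0]
    content = choice["message"].get("content")
    if not isinstance(content, str) or not content.strip():
        finish_reason = choice.get("finish_reason")
        raise ValueError(
            "missing non-empty choices[0].message.content "
            f"(finish_reason={finish_reason!r})"
        )
    return content.strip()


# Hypothetical payloads, for illustration only.
answered = json.dumps(
    {"choices": [{"message": {"content": " 42 \n"}, "finish_reason": "stop"}]}
)
reasoning_only = json.dumps(
    {
        "choices": [
            {
                "message": {"content": "", "reasoning_content": "the answer is 42"},
                "finish_reason": "length",
            }
        ]
    }
)

print(extract_final_answer(answered))
try:
    extract_final_answer(reasoning_only)
except ValueError as exc:
    # The reasoning text mentions 42, but it is not accepted as an answer.
    print(f"rejected: {exc}")
```

With the original fallback chain, the second payload would have produced `the answer is 42` and passed a check that looks for `42`. Under this rule it is reported as a parse failure, with `finish_reason` included to help diagnose truncated responses.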
assert_route() {
  local label="$1"
  local output plain_output
  if ! output=$(get_route_output); then
    fail "${label}: openshell inference get failed: ${output:0:240}"
    return
  fi
  plain_output=$(printf '%s' "$output" | strip_ansi)

  if grep -Fq "Provider: ${EXPECTED_ROUTE_PROVIDER}" <<<"$plain_output" \
    && grep -Fq "Model: ${TURN_MODEL}" <<<"$plain_output"; then
    pass "${label}: OpenShell route is ${EXPECTED_ROUTE_PROVIDER} / ${TURN_MODEL}"
  else
    fail "${label}: route is not ${EXPECTED_ROUTE_PROVIDER} / ${TURN_MODEL}: ${plain_output:0:400}"
  fi
}
Make the preflight asserts fail fast.
These helpers call fail and then return success, so run_openclaw_turn and run_hermes_turn still execute the expensive real turn after route/config/health checks already failed. That creates secondary noise and can measure the wrong target.
Suggested fix pattern
 assert_route() {
   local label="$1"
   local output plain_output
   if ! output=$(get_route_output); then
     fail "${label}: openshell inference get failed: ${output:0:240}"
-    return
+    return 1
   fi
   plain_output=$(printf '%s' "$output" | strip_ansi)
   if grep -Fq "Provider: ${EXPECTED_ROUTE_PROVIDER}" <<<"$plain_output" \
     && grep -Fq "Model: ${TURN_MODEL}" <<<"$plain_output"; then
     pass "${label}: OpenShell route is ${EXPECTED_ROUTE_PROVIDER} / ${TURN_MODEL}"
   else
     fail "${label}: route is not ${EXPECTED_ROUTE_PROVIDER} / ${TURN_MODEL}: ${plain_output:0:400}"
+    return 1
   fi
 }
 assert_openclaw_config() {
   ...
   ' <<<"$config" 2>&1) || {
     fail "OpenClaw config: expected Ultra model via inference.local: ${probe:0:400}"
-    return
+    return 1
   }
   pass "OpenClaw config uses inference/${TURN_MODEL}"
 }
 run_openclaw_turn() {
   ...
-  assert_route "OpenClaw"
-  assert_openclaw_config "$sandbox"
+  assert_route "OpenClaw" || return
+  assert_openclaw_config "$sandbox" || return
   ...
 }
 run_hermes_turn() {
   ...
-  assert_route "Hermes"
-  assert_hermes_config "$sandbox"
-  assert_hermes_health "$sandbox"
+  assert_route "Hermes" || return
+  assert_hermes_config "$sandbox" || return
+  assert_hermes_health "$sandbox" || return
   ...
 }
Also applies to: 129-167, 169-237, 331-333, 384-386
Selective E2E Results: ✅ All requested jobs passed (Run: 26957802252)
Selective E2E Results: ✅ All requested jobs passed (Run: 26971628360)
Selective E2E Results: ✅ All requested jobs passed (Run: 26978415946)
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Selective E2E Results: ✅ All requested jobs passed (Run: 26978991651)
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Selective E2E Results: ✅ All requested jobs passed (Run: 26979734796)
Selective E2E Results: ✅ All requested jobs passed (Run: 26980271170)
## Summary
- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the
current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR
labels, and shared `$$nemoclaw` command placeholders.
## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC
permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint,
docs-to-skills verification, gitleaks, commitlint, and skills YAML
tests.
## Source Summary
- #3679, #4437, #4681, #4766, #4772, #4775, #4786 ->
`docs/about/release-notes.mdx`, `docs/reference/commands.mdx`,
`docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27
compatibility, runtime path pinning, plugin registry recovery, live
gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`,
`docs/inference/inference-options.mdx`,
`docs/inference/use-local-inference.mdx`,
`docs/inference/switch-inference-providers.mdx`: Document the release
inference changes covering Local NIM waits, Hermes Anthropic routing,
Nemotron 3 Ultra, the current Ollama starter fallback, and Spark
managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`,
`docs/security/credential-storage.mdx`,
`docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/troubleshooting.mdx`: Capture permission healing,
gateway-stored credential reuse, cross-sandbox messaging credential
conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`:
Regenerate the user skill references from the updated source docs.
- Follow-up maintenance ->
`.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`,
`.coderabbit.yaml`: Add release-prep area labels for docs and skills
PRs, and teach docs review guidance that `$$nemoclaw` is the correct
shared command placeholder for examples that work across agent aliases.
Note: the `documentation` label was not present in the repository, so
this PR is labeled with `v0.0.59` only.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Updated default model for local Ollama inference setup to qwen3.5:9b
* Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
* Clarified credential storage and reuse behavior for post-deployment
(day-two) operations
* Added v0.0.59 release notes covering OpenClaw compatibility, inference
options, Hermes messaging sync, and troubleshooting
* Clarified CLI selection guidance and updated OpenClaw version example
in status output
* Revised release-prep instructions and docs review guidance for CLI
alias usage
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
- Adds `nvidia/nemotron-3-ultra-550b-a55b` as the second curated NVIDIA Endpoints model, directly below the default Nemotron 3 Super option `nvidia/nemotron-3-super-120b-a12b`
Build endpoint check
- `https://integrate.api.nvidia.com/v1/models` returns `nvidia/nemotron-3-ultra-550b-a55b` with `owned_by: nvidia`
Tests
- `npm run build:cli`
- `npx vitest run src/lib/inference/config.test.ts src/lib/inference/model-prompts.test.ts src/lib/inference/provider-models.test.ts test/onboard-selection.test.ts`
Summary by CodeRabbit
New Features
Documentation
Tests
CI
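The build endpoint check in the PR description can be expressed as a small lookup over a models listing. This is a sketch under assumptions, not code from the PR: it assumes the endpoint returns an OpenAI-compatible listing with a top-level `data` array of entries carrying `id` and `owned_by`, and both the helper `find_model` and the sample payload are hypothetical. It works on an in-memory dictionary, so it makes no network call.

```python
from typing import Any, Dict, Optional

ULTRA_MODEL_ID = "nvidia/nemotron-3-ultra-550b-a55b"


def find_model(listing: Dict[str, Any], model_id: str) -> Optional[Dict[str, Any]]:
    """Return the listing entry whose id equals model_id, or None if absent."""
    for entry in listing.get("data", []):
        if entry.get("id") == model_id:
            return entry
    return None


# Hypothetical sample shaped like an OpenAI-compatible /v1/models response.
sample_listing = {
    "object": "list",
    "data": [
        {"id": "nvidia/nemotron-3-super-120b-a12b", "owned_by": "nvidia"},
        {"id": ULTRA_MODEL_ID, "owned_by": "nvidia"},
    ],
}

entry = find_model(sample_listing, ULTRA_MODEL_ID)
if entry is not None and entry.get("owned_by") == "nvidia":
    print(f"{ULTRA_MODEL_ID} is listed and owned by nvidia")
else:
    print(f"{ULTRA_MODEL_ID} is missing or has an unexpected owner")
```

To run this against the live endpoint, replace `sample_listing` with the parsed JSON body of the response.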