fix(detect): recognise DGX Spark unified-memory GPU + tighten Ollama default-model gate by laitingsheng · Pull Request #3518 · NVIDIA/NemoClaw

laitingsheng · 2026-05-14T15:46:17Z

Summary

On DGX Spark hosts (#3510), nemoclaw onboard reports contradictory preflight output — "no GPU detected" on one line and "NVIDIA GPU hardware detected but nvidia-smi is not available" on the next — because src/lib/inference/nim.ts:detectGpu cannot recognise the device that nvidia-smi reports as NVIDIA JMJWOA-Generic-GPU with memory.total=[N/A]. The primary parse path skips the entry (parseInt("[N/A]") → NaN), and the unified-memory fallback rejects it because the name matches none of the legacy regex tags (GB10, Thor, Orin, Xavier, Jetson, Tegra). detectGpu() returns null → hostGpuDetected=false → preflight prints "no GPU".

This PR makes detectGpu() accept the case directly, and tightens the downstream Ollama default-model picker so a future partial detection cannot silently pull a 22 GB model onto a host whose acceleration is unconfirmed.

Related Issue

Resolves #3510

Changes

src/lib/inference/nim.ts: in the unified-memory fallback path, cross-check the firmware platform up front via detectNvidiaPlatform(). When firmware confirms a unified-memory platform (spark / jetson), accept any name nvidia-smi reports — not just names matching the legacy regex tag list. Discrete-GPU Linux hosts are unaffected; on those, firmware reports linux and the regex-tagged behaviour stays the gate.
src/lib/inference/local.ts: extract a shared isLargeOllamaCapableGpu predicate and add type?: string to GpuInfo so the bootstrap-model and default-model helpers gate the 22 GB pull on gpu.type === "nvidia" || gpu.type === "apple" in addition to the existing memory threshold. Defensive: a future partial-detection regression with a high totalMemoryMB but an unknown type no longer triggers the 22 GB pull.
src/lib/inference/nim.test.ts: two new cases — JMJWOA-Generic-GPU accepted on Spark firmware with host RAM as totalMemoryMB; unrecognised name on generic Linux firmware stays null.
src/lib/inference/local.test.ts: existing memory-only cases updated to pass type: "nvidia"; new cases for Apple-Silicon promotion and the type-guard downgrade.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

Release Notes

Bug Fixes
- Improved GPU type detection accuracy with enhanced support for NVIDIA and Apple Silicon GPUs
- Refined model selection logic to use confirmed GPU type information for better defaults
- Enhanced fallback behavior for ambiguous GPU detection scenarios

…eports an unrecognised name Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

…or Apple GPU type Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

copy-pr-bot · 2026-05-14T15:46:21Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-05-14T15:46:23Z

📝 Walkthrough

Walkthrough

This PR gates large Ollama model selection on confirmed GPU type and memory, preventing CPU-only fallback when GPU detection is ambiguous or incomplete. A new isLargeOllamaCapableGpu predicate centralizes this check, and GPU detection is hardened to validate firmware platform before accepting unrecognized device names.

Changes

GPU Type-Gated Ollama Model Selection

Layer / File(s)	Summary
GPU type field and large-model capability predicate `src/lib/inference/local.ts`	`GpuInfo` gains optional `type?: string` field; new `isLargeOllamaCapableGpu(gpu)` predicate requires `type` to be `"nvidia"` or `"apple"` and `totalMemoryMB` to meet the large-model threshold.
GPU detection platform awareness `src/lib/inference/nim.ts`	Unified-memory GPU detection now cross-checks firmware platform before accepting unrecognized `nvidia-smi` device names; only promotes untagged GPUs to unified-memory if firmware indicates a known unified-memory platform (spark or jetson).
Model selection and GPU detection validation `src/lib/inference/local.test.ts`, `src/lib/inference/nim.test.ts`	Tests validate Ollama model selection gates on GPU type (nvidia/apple with large memory, missing/unrecognized types as regression guard for `#3510`) and Spark firmware handling of unrecognized GPU names.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

NVIDIA/NemoClaw#3422: Both PRs modify src/lib/inference/local.ts's Ollama "large-memory" bootstrap/default model selection logic (adding/promoting qwen3.6:35b and updating getBootstrapOllamaModelOptions/getDefaultOllamaModel behavior around LARGE_OLLAMA_MIN_MEMORY_MB).

Suggested reviewers

ericksoa

Poem

🐰 When GPUs are shy and types disappear,
We downgrade our models to keep inference clear.
No 35 billion parameters on CPU alone—
Spark's firmware check now guards what's known! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the two main changes: recognizing DGX Spark unified-memory GPU detection and tightening Ollama default-model selection gates.
Linked Issues check	✅ Passed	The PR successfully addresses `#3510` by fixing JMJWOA-Generic-GPU detection on Spark platforms and preventing large Ollama model pulls when GPU type is unconfirmed.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to resolving `#3510`: GPU detection fixes, Ollama model-selection gating, and associated test coverage.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/3510-spark-gpu-detect-jmjwoa

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-14T15:49:12Z

E2E Advisor Recommendation

Required E2E: gpu-e2e
Optional E2E: gpu-double-onboard-e2e, inference-routing-e2e

Dispatch hint: gpu-e2e

Auto-dispatched E2E: gpu-e2e via nightly-e2e.yaml at 57fd5d389adbcc32d4615e4af19dd5290948c237 — nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

gpu-e2e (high; requires enabled self-hosted NVIDIA GPU runner and pulls/runs an Ollama model): Runs the real NVIDIA GPU local Ollama flow: install Ollama, install/onboard NemoClaw with NEMOCLAW_PROVIDER=ollama, auto-select/pull a model from detected GPU info, create a sandbox, verify Ollama/auth-proxy reachability, and perform live inference through the sandbox. This is the highest-signal existing E2E for the changed GPU detection and Ollama default-model logic.

Optional E2E

gpu-double-onboard-e2e (high; requires enabled self-hosted NVIDIA GPU runner and a second onboard cycle): Useful adjacent coverage for repeated Ollama onboarding on a GPU host. It exercises the same local Ollama provider path plus auth-proxy token consistency after re-onboard, but the PR does not directly change token/proxy persistence.
inference-routing-e2e (medium; Ubuntu Docker workflow with some cases skipping without provider secrets): Optional broad sanity check for inference.local routing and error classification. It is less directly targeted because it primarily covers cloud/OpenAI-compatible routing rather than local Ollama GPU sizing.

New E2E recommendations

DGX Spark unified-memory GPU detection (high): This PR fixes a Spark-specific detection path where nvidia-smi may report an unrecognised name such as NVIDIA JMJWOA-Generic-GPU with memory.total=[N/A]. Existing dispatchable workflows include GPU Ollama on an RTX Pro runner, but no workflow job appears to run on real DGX Spark hardware to validate firmware-based Spark classification and resulting Ollama model selection.
- Suggested test: Add a DGX Spark local Ollama onboarding E2E job or scenario that runs on a Spark-class runner, asserts detectGpu/preflight reports platform=spark/unifiedMemory, verifies the selected local Ollama model, and completes sandbox inference.
Apple Silicon local Ollama model defaults (medium): The PR explicitly allows Apple Silicon with sufficient unified memory to receive large Ollama model options, but the existing macOS E2E uses cloud inference/full E2E when Docker is available and does not validate local Ollama onboarding/model selection on Apple Silicon.
- Suggested test: Add a macOS Apple Silicon local Ollama onboarding smoke or scenario that validates bootstrap model options/default model selection and local provider setup without requiring a full large-model pull unless gated by runner capacity.

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: gpu-e2e

coderabbitai

🧹 Nitpick comments (1)

src/lib/inference/local.test.ts (1)

363-365: 💤 Low value

Unused variable isProxy.

The variable isProxy is assigned but never used in this test callback. Per coding guidelines, prefix unused variables with underscore.

Proposed fix

       runCurlProbeImpl: (argv: string[]) => {
-        const isProxy = argv.some(
+        const _isProxy = argv.some(
           (a) => typeof a === "string" && a.includes("11435"),
         );

As per coding guidelines: "Prefix unused variables with underscore (e.g., _unused)".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/inference/local.test.ts` around lines 363 - 365, The variable isProxy
in the test callback is assigned but never used; rename it to _isProxy (prefix
with an underscore) to follow the unused-variable convention, i.e., change the
declaration "const isProxy = argv.some(...)" to "const _isProxy =
argv.some(...)" so the linter/guidelines recognize it as intentionally unused
while leaving the argv.some(...) logic intact.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/lib/inference/local.test.ts`:
- Around line 363-365: The variable isProxy in the test callback is assigned but
never used; rename it to _isProxy (prefix with an underscore) to follow the
unused-variable convention, i.e., change the declaration "const isProxy =
argv.some(...)" to "const _isProxy = argv.some(...)" so the linter/guidelines
recognize it as intentionally unused while leaving the argv.some(...) logic
intact.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 048cf08b-dbe3-4bc9-9945-be7f4fa86216

📥 Commits

Reviewing files that changed from the base of the PR and between 25a9ee9 and aea578f.

📒 Files selected for processing (4)

src/lib/inference/local.test.ts
src/lib/inference/local.ts
src/lib/inference/nim.test.ts
src/lib/inference/nim.ts

github-actions · 2026-05-14T16:02:10Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 25870617732
Target ref: aea578f9787519e19898241b6ae882120b9da0cd
Workflow ref: main
Requested jobs: gpu-e2e,gpu-double-onboard-e2e
Summary: 0 passed, 0 failed, 2 skipped

Job	Result
gpu-double-onboard-e2e	⏭️ skipped
gpu-e2e	⏭️ skipped

github-actions · 2026-05-14T19:38:58Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 25881387333
Target ref: 75d671568fccfa19d8cc9009b37436a34085e1e5
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped

github-actions · 2026-05-15T01:48:54Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 25895829099
Target ref: 57fd5d389adbcc32d4615e4af19dd5290948c237
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped

github-actions · 2026-05-15T02:44:33Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 25897416587
Target ref: 57fd5d389adbcc32d4615e4af19dd5290948c237
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped

laitingsheng added 3 commits May 14, 2026 15:27

fix(detect): recognise DGX Spark unified-memory GPU when nvidia-smi r…

5230199

…eports an unrecognised name Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

fix(inference): gate large Ollama model defaults on confirmed NVIDIA …

cf356b9

…or Apple GPU type Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

chore(detect): drop NVB reference from JMJWOA detection comment

aea578f

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng marked this pull request as ready for review May 14, 2026 15:57

laitingsheng added fix labels May 14, 2026

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

laitingsheng added the v0.0.42 label May 14, 2026

Merge branch 'main' into fix/3510-spark-gpu-detect-jmjwoa

75d6715

cv added v0.0.43 and removed v0.0.42 labels May 14, 2026

Merge branch 'main' into fix/3510-spark-gpu-detect-jmjwoa

57fd5d3

cv added v0.0.44 and removed v0.0.43 labels May 15, 2026

cv approved these changes May 15, 2026

View reviewed changes

cv merged commit f4cf76a into main May 15, 2026
24 checks passed

miyoungc mentioned this pull request May 16, 2026

docs: update for 0.0.44 #3637

Merged

12 tasks

wscurran added area: cli Command line interface, flags, terminal UX, or output bug-fix PR fixes a bug or regression and removed NemoClaw CLI labels Jun 3, 2026

Conversation

laitingsheng commented May 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Release Notes

Uh oh!

copy-pr-bot Bot commented May 14, 2026

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 14, 2026

Selective E2E Results — ⚠️ No requested jobs ran

Uh oh!

github-actions Bot commented May 14, 2026

Selective E2E Results — ⚠️ No requested jobs ran

Uh oh!

github-actions Bot commented May 15, 2026

Selective E2E Results — ⚠️ No requested jobs ran

Uh oh!

github-actions Bot commented May 15, 2026

Selective E2E Results — ⚠️ No requested jobs ran

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

laitingsheng commented May 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading

github-actions Bot commented May 14, 2026 •

edited

Loading