Skip to content

fix(detect): recognise DGX Spark unified-memory GPU + tighten Ollama default-model gate#3518

Merged
cv merged 5 commits into
mainfrom
fix/3510-spark-gpu-detect-jmjwoa
May 15, 2026
Merged

fix(detect): recognise DGX Spark unified-memory GPU + tighten Ollama default-model gate#3518
cv merged 5 commits into
mainfrom
fix/3510-spark-gpu-detect-jmjwoa

Conversation

@laitingsheng

@laitingsheng laitingsheng commented May 14, 2026

Copy link
Copy Markdown
Contributor

Summary

On DGX Spark hosts (#3510), nemoclaw onboard reports contradictory preflight output — "no GPU detected" on one line and "NVIDIA GPU hardware detected but nvidia-smi is not available" on the next — because src/lib/inference/nim.ts:detectGpu cannot recognise the device that nvidia-smi reports as NVIDIA JMJWOA-Generic-GPU with memory.total=[N/A]. The primary parse path skips the entry (parseInt("[N/A]") → NaN), and the unified-memory fallback rejects it because the name matches none of the legacy regex tags (GB10, Thor, Orin, Xavier, Jetson, Tegra). detectGpu() returns null → hostGpuDetected=false → preflight prints "no GPU".

This PR makes detectGpu() accept the case directly, and tightens the downstream Ollama default-model picker so a future partial detection cannot silently pull a 22 GB model onto a host whose acceleration is unconfirmed.

Related Issue

Resolves #3510

Changes

  • src/lib/inference/nim.ts: in the unified-memory fallback path, cross-check the firmware platform up front via detectNvidiaPlatform(). When firmware confirms a unified-memory platform (spark / jetson), accept any name nvidia-smi reports — not just names matching the legacy regex tag list. Discrete-GPU Linux hosts are unaffected; on those, firmware reports linux and the regex-tagged behaviour stays the gate.
  • src/lib/inference/local.ts: extract a shared isLargeOllamaCapableGpu predicate and add type?: string to GpuInfo so the bootstrap-model and default-model helpers gate the 22 GB pull on gpu.type === "nvidia" || gpu.type === "apple" in addition to the existing memory threshold. Defensive: a future partial-detection regression with a high totalMemoryMB but an unknown type no longer triggers the 22 GB pull.
  • src/lib/inference/nim.test.ts: two new cases — JMJWOA-Generic-GPU accepted on Spark firmware with host RAM as totalMemoryMB; unrecognised name on generic Linux firmware stays null.
  • src/lib/inference/local.test.ts: existing memory-only cases updated to pass type: "nvidia"; new cases for Apple-Silicon promotion and the type-guard downgrade.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved GPU type detection accuracy with enhanced support for NVIDIA and Apple Silicon GPUs
    • Refined model selection logic to use confirmed GPU type information for better defaults
    • Enhanced fallback behavior for ambiguous GPU detection scenarios

Review Change Stack

…eports an unrecognised name

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
…or Apple GPU type

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 14, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This PR gates large Ollama model selection on confirmed GPU type and memory, preventing CPU-only fallback when GPU detection is ambiguous or incomplete. A new isLargeOllamaCapableGpu predicate centralizes this check, and GPU detection is hardened to validate firmware platform before accepting unrecognized device names.

Changes

GPU Type-Gated Ollama Model Selection

Layer / File(s) Summary
GPU type field and large-model capability predicate
src/lib/inference/local.ts
GpuInfo gains optional type?: string field; new isLargeOllamaCapableGpu(gpu) predicate requires type to be "nvidia" or "apple" and totalMemoryMB to meet the large-model threshold.
GPU detection platform awareness
src/lib/inference/nim.ts
Unified-memory GPU detection now cross-checks firmware platform before accepting unrecognized nvidia-smi device names; only promotes untagged GPUs to unified-memory if firmware indicates a known unified-memory platform (spark or jetson).
Model selection and GPU detection validation
src/lib/inference/local.test.ts, src/lib/inference/nim.test.ts
Tests validate Ollama model selection gates on GPU type (nvidia/apple with large memory, missing/unrecognized types as regression guard for #3510) and Spark firmware handling of unrecognized GPU names.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#3422: Both PRs modify src/lib/inference/local.ts's Ollama "large-memory" bootstrap/default model selection logic (adding/promoting qwen3.6:35b and updating getBootstrapOllamaModelOptions/getDefaultOllamaModel behavior around LARGE_OLLAMA_MIN_MEMORY_MB).

Suggested reviewers

  • ericksoa

Poem

🐰 When GPUs are shy and types disappear,
We downgrade our models to keep inference clear.
No 35 billion parameters on CPU alone—
Spark's firmware check now guards what's known! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the two main changes: recognizing DGX Spark unified-memory GPU detection and tightening Ollama default-model selection gates.
Linked Issues check ✅ Passed The PR successfully addresses #3510 by fixing JMJWOA-Generic-GPU detection on Spark platforms and preventing large Ollama model pulls when GPU type is unconfirmed.
Out of Scope Changes check ✅ Passed All changes are directly scoped to resolving #3510: GPU detection fixes, Ollama model-selection gating, and associated test coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/3510-spark-gpu-detect-jmjwoa

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: gpu-e2e
Optional E2E: gpu-double-onboard-e2e, inference-routing-e2e

Dispatch hint: gpu-e2e

Auto-dispatched E2E: gpu-e2e via nightly-e2e.yaml at 57fd5d389adbcc32d4615e4af19dd5290948c237nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • gpu-e2e (high; requires enabled self-hosted NVIDIA GPU runner and pulls/runs an Ollama model): Runs the real NVIDIA GPU local Ollama flow: install Ollama, install/onboard NemoClaw with NEMOCLAW_PROVIDER=ollama, auto-select/pull a model from detected GPU info, create a sandbox, verify Ollama/auth-proxy reachability, and perform live inference through the sandbox. This is the highest-signal existing E2E for the changed GPU detection and Ollama default-model logic.

Optional E2E

  • gpu-double-onboard-e2e (high; requires enabled self-hosted NVIDIA GPU runner and a second onboard cycle): Useful adjacent coverage for repeated Ollama onboarding on a GPU host. It exercises the same local Ollama provider path plus auth-proxy token consistency after re-onboard, but the PR does not directly change token/proxy persistence.
  • inference-routing-e2e (medium; Ubuntu Docker workflow with some cases skipping without provider secrets): Optional broad sanity check for inference.local routing and error classification. It is less directly targeted because it primarily covers cloud/OpenAI-compatible routing rather than local Ollama GPU sizing.

New E2E recommendations

  • DGX Spark unified-memory GPU detection (high): This PR fixes a Spark-specific detection path where nvidia-smi may report an unrecognised name such as NVIDIA JMJWOA-Generic-GPU with memory.total=[N/A]. Existing dispatchable workflows include GPU Ollama on an RTX Pro runner, but no workflow job appears to run on real DGX Spark hardware to validate firmware-based Spark classification and resulting Ollama model selection.
    • Suggested test: Add a DGX Spark local Ollama onboarding E2E job or scenario that runs on a Spark-class runner, asserts detectGpu/preflight reports platform=spark/unifiedMemory, verifies the selected local Ollama model, and completes sandbox inference.
  • Apple Silicon local Ollama model defaults (medium): The PR explicitly allows Apple Silicon with sufficient unified memory to receive large Ollama model options, but the existing macOS E2E uses cloud inference/full E2E when Docker is available and does not validate local Ollama onboarding/model selection on Apple Silicon.
    • Suggested test: Add a macOS Apple Silicon local Ollama onboarding smoke or scenario that validates bootstrap model options/default model selection and local provider setup without requiring a full large-model pull unless gated by runner capacity.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: gpu-e2e

@laitingsheng laitingsheng marked this pull request as ready for review May 14, 2026 15:57

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/lib/inference/local.test.ts (1)

363-365: 💤 Low value

Unused variable isProxy.

The variable isProxy is assigned but never used in this test callback. Per coding guidelines, prefix unused variables with underscore.

Proposed fix
       runCurlProbeImpl: (argv: string[]) => {
-        const isProxy = argv.some(
+        const _isProxy = argv.some(
           (a) => typeof a === "string" && a.includes("11435"),
         );

As per coding guidelines: "Prefix unused variables with underscore (e.g., _unused)".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/inference/local.test.ts` around lines 363 - 365, The variable isProxy
in the test callback is assigned but never used; rename it to _isProxy (prefix
with an underscore) to follow the unused-variable convention, i.e., change the
declaration "const isProxy = argv.some(...)" to "const _isProxy =
argv.some(...)" so the linter/guidelines recognize it as intentionally unused
while leaving the argv.some(...) logic intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/lib/inference/local.test.ts`:
- Around line 363-365: The variable isProxy in the test callback is assigned but
never used; rename it to _isProxy (prefix with an underscore) to follow the
unused-variable convention, i.e., change the declaration "const isProxy =
argv.some(...)" to "const _isProxy = argv.some(...)" so the linter/guidelines
recognize it as intentionally unused while leaving the argv.some(...) logic
intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 048cf08b-dbe3-4bc9-9945-be7f4fa86216

📥 Commits

Reviewing files that changed from the base of the PR and between 25a9ee9 and aea578f.

📒 Files selected for processing (4)
  • src/lib/inference/local.test.ts
  • src/lib/inference/local.ts
  • src/lib/inference/nim.test.ts
  • src/lib/inference/nim.ts

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ⚠️ No requested jobs ran

Run: 25870617732
Target ref: aea578f9787519e19898241b6ae882120b9da0cd
Workflow ref: main
Requested jobs: gpu-e2e,gpu-double-onboard-e2e
Summary: 0 passed, 0 failed, 2 skipped

Job Result
gpu-double-onboard-e2e ⏭️ skipped
gpu-e2e ⏭️ skipped

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ⚠️ No requested jobs ran

Run: 25881387333
Target ref: 75d671568fccfa19d8cc9009b37436a34085e1e5
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job Result
gpu-e2e ⏭️ skipped

@cv cv added v0.0.43 and removed v0.0.42 labels May 14, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ⚠️ No requested jobs ran

Run: 25895829099
Target ref: 57fd5d389adbcc32d4615e4af19dd5290948c237
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job Result
gpu-e2e ⏭️ skipped

@cv cv added v0.0.44 and removed v0.0.43 labels May 15, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ⚠️ No requested jobs ran

Run: 25897416587
Target ref: 57fd5d389adbcc32d4615e4af19dd5290948c237
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job Result
gpu-e2e ⏭️ skipped

@cv cv merged commit f4cf76a into main May 15, 2026
24 checks passed
@miyoungc miyoungc mentioned this pull request May 16, 2026
12 tasks
@wscurran wscurran added area: cli Command line interface, flags, terminal UX, or output bug-fix PR fixes a bug or regression and removed NemoClaw CLI labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: cli Command line interface, flags, terminal UX, or output bug-fix PR fixes a bug or regression

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[DGX Spark][Install] install-ollama pulls 35B model after preflight reports "no GPU detected" — no guard or model downgrade

3 participants