fix(detect): recognise DGX Spark unified-memory GPU + tighten Ollama default-model gate#3518
Conversation
…eports an unrecognised name Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
…or Apple GPU type Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
📝 WalkthroughWalkthroughThis PR gates large Ollama model selection on confirmed GPU type and memory, preventing CPU-only fallback when GPU detection is ambiguous or incomplete. A new ChangesGPU Type-Gated Ollama Model Selection
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lib/inference/local.test.ts (1)
363-365: 💤 Low valueUnused variable
isProxy.The variable
isProxyis assigned but never used in this test callback. Per coding guidelines, prefix unused variables with underscore.Proposed fix
runCurlProbeImpl: (argv: string[]) => { - const isProxy = argv.some( + const _isProxy = argv.some( (a) => typeof a === "string" && a.includes("11435"), );As per coding guidelines: "Prefix unused variables with underscore (e.g.,
_unused)".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/inference/local.test.ts` around lines 363 - 365, The variable isProxy in the test callback is assigned but never used; rename it to _isProxy (prefix with an underscore) to follow the unused-variable convention, i.e., change the declaration "const isProxy = argv.some(...)" to "const _isProxy = argv.some(...)" so the linter/guidelines recognize it as intentionally unused while leaving the argv.some(...) logic intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/lib/inference/local.test.ts`:
- Around line 363-365: The variable isProxy in the test callback is assigned but
never used; rename it to _isProxy (prefix with an underscore) to follow the
unused-variable convention, i.e., change the declaration "const isProxy =
argv.some(...)" to "const _isProxy = argv.some(...)" so the linter/guidelines
recognize it as intentionally unused while leaving the argv.some(...) logic
intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 048cf08b-dbe3-4bc9-9945-be7f4fa86216
📒 Files selected for processing (4)
src/lib/inference/local.test.tssrc/lib/inference/local.tssrc/lib/inference/nim.test.tssrc/lib/inference/nim.ts
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-double-onboard-e2e | ⏭️ skipped |
| gpu-e2e | ⏭️ skipped |
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-e2e | ⏭️ skipped |
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-e2e | ⏭️ skipped |
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-e2e | ⏭️ skipped |
Summary
On DGX Spark hosts (#3510),
nemoclaw onboardreports contradictory preflight output — "no GPU detected" on one line and "NVIDIA GPU hardware detected but nvidia-smi is not available" on the next — because src/lib/inference/nim.ts:detectGpu cannot recognise the device thatnvidia-smireports asNVIDIA JMJWOA-Generic-GPUwithmemory.total=[N/A]. The primary parse path skips the entry (parseInt("[N/A]") → NaN), and the unified-memory fallback rejects it because the name matches none of the legacy regex tags (GB10,Thor,Orin,Xavier,Jetson,Tegra).detectGpu()returns null →hostGpuDetected=false→ preflight prints "no GPU".This PR makes
detectGpu()accept the case directly, and tightens the downstream Ollama default-model picker so a future partial detection cannot silently pull a 22 GB model onto a host whose acceleration is unconfirmed.Related Issue
Resolves #3510
Changes
src/lib/inference/nim.ts: in the unified-memory fallback path, cross-check the firmware platform up front viadetectNvidiaPlatform(). When firmware confirms a unified-memory platform (spark/jetson), accept any namenvidia-smireports — not just names matching the legacy regex tag list. Discrete-GPU Linux hosts are unaffected; on those, firmware reportslinuxand the regex-tagged behaviour stays the gate.src/lib/inference/local.ts: extract a sharedisLargeOllamaCapableGpupredicate and addtype?: stringtoGpuInfoso the bootstrap-model and default-model helpers gate the 22 GB pull ongpu.type === "nvidia" || gpu.type === "apple"in addition to the existing memory threshold. Defensive: a future partial-detection regression with a hightotalMemoryMBbut an unknowntypeno longer triggers the 22 GB pull.src/lib/inference/nim.test.ts: two new cases — JMJWOA-Generic-GPU accepted on Spark firmware with host RAM as totalMemoryMB; unrecognised name on generic Linux firmware stays null.src/lib/inference/local.test.ts: existing memory-only cases updated to passtype: "nvidia"; new cases for Apple-Silicon promotion and the type-guard downgrade.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)Signed-off-by: Tinson Lai tinsonl@nvidia.com
Summary by CodeRabbit
Release Notes