Fix Local NIM onboarding on RTX 3090 / WSL by benwgarton · Pull Request #236 · NVIDIA/NemoClaw

benwgarton · 2026-03-17T19:25:54Z

Summary

This PR fixes a Local NIM onboarding path that fails on a common consumer-GPU setup: a single RTX 3090 running under WSL.

What this changes

- adds Local NIM runtime support for NGC_API_KEY
- avoids requesting a GPU-backed OpenShell sandbox for host-side Local NIM
- increases the Local NIM health wait so slower first-start paths are not treated as failures
- applies the runtime overrides needed for meta/llama-3.1-8b-instruct to start successfully on RTX 3090 + WSL:
  - NIM_MODEL_PROFILE=default
  - NIM_RELAX_MEM_CONSTRAINTS=1
  - NIM_MAX_GPU_MEMORY_UTILIZATION_STARTUP=1.0
  - NIM_MAX_MODEL_LEN=32768
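
As a rough illustration of how the overrides listed here could reach the container, the sketch below assembles a `docker run` argument list. The helper name `buildNimRunArgs`, the image tag, and the 8000 container port are assumptions for illustration and are not taken from the PR diff; only the four override values and the NGC_API_KEY variable name come from the description.

```typescript
// Overrides listed in this PR for meta/llama-3.1-8b-instruct on RTX 3090 + WSL.
const NIM_OVERRIDES: { [name: string]: string } = {
  NIM_MODEL_PROFILE: "default",
  NIM_RELAX_MEM_CONSTRAINTS: "1",
  NIM_MAX_GPU_MEMORY_UTILIZATION_STARTUP: "1.0",
  NIM_MAX_MODEL_LEN: "32768",
};

// Hypothetical helper (not from the PR): builds the argument vector for `docker run`.
function buildNimRunArgs(image: string, hostPort: number = 8000): string[] {
  // Assumes the NIM container listens on port 8000.
  const runArgs: string[] = ["run", "-d", "--gpus", "all", "-p", `${hostPort}:8000`];
  // A bare `-e NGC_API_KEY` forwards the variable from the caller's environment,
  // so the key itself never appears in the argument list.
  runArgs.push("-e", "NGC_API_KEY");
  for (const name of Object.keys(NIM_OVERRIDES)) {
    runArgs.push("-e", `${name}=${NIM_OVERRIDES[name]}`);
  }
  runArgs.push(image);
  return runArgs;
}
```

Calling `buildNimRunArgs("nvcr.io/nim/meta/llama-3.1-8b-instruct:latest")` (an assumed tag) yields a list that can be handed to a process spawner with NGC_API_KEY set in its environment.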

Why

During validation on an RTX 3090, the original requested model (nvidia/llama-3.3-nemotron-super-49b-v1) had no runnable profile on this hardware. Switching to the smaller Local NIM image (meta/llama-3.1-8b-instruct) worked, but only after:

- passing NGC_API_KEY at container runtime
- selecting the generic vLLM profile
- reducing max model length so KV cache fits on the card
- allowing enough time for the slower WSL startup path to finish
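
A back-of-envelope calculation shows why the max model length matters on a 24 GB card. The sketch below assumes the published Llama 3.1 8B attention shape (32 layers, 8 KV heads, head dimension 128) and a 16-bit KV cache; the profile NIM actually selects may use a different precision, and the inference engine also needs memory for weights and activations, so treat the numbers as an estimate only.

```typescript
interface KvShape {
  layers: number;
  kvHeads: number;
  headDim: number;
  bytesPerValue: number;
}

// Assumed shape for meta/llama-3.1-8b-instruct with an FP16/BF16 KV cache.
const LLAMA_31_8B: KvShape = { layers: 32, kvHeads: 8, headDim: 128, bytesPerValue: 2 };

// KV cache bytes needed to hold `tokens` tokens of one sequence.
function kvCacheBytes(shape: KvShape, tokens: number): number {
  // Factor 2: one key vector and one value vector per layer per KV head.
  return 2 * shape.layers * shape.kvHeads * shape.headDim * shape.bytesPerValue * tokens;
}

const GIB = 1024 * 1024 * 1024;

// Convenience wrapper that reports the estimate in GiB.
function kvCacheGiB(tokens: number): number {
  return kvCacheBytes(LLAMA_31_8B, tokens) / GIB;
}
```

Under these assumptions, `kvCacheGiB(32768)` is 4 and `kvCacheGiB(131072)` (the model's full 128K context) is 16. With roughly 15 GiB of 16-bit weights already resident, only the shorter length leaves room for a full-length sequence on an RTX 3090.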

Without these changes, onboarding either falls back incorrectly or fails even though a working Local NIM configuration exists for this host class.

Scope

This PR keeps the fix narrow and only changes the Local NIM onboarding/runtime path.

Validation

Validated locally with:

- OpenShell healthy and GPU-visible
- Docker GPU passthrough working
- Local NIM healthy on http://localhost:8000/v1/models
- NemoClaw sandbox configured to use vllm-local with meta/llama-3.1-8b-instruct
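
The longer health wait described in this PR can be pictured as a polling loop against the models endpoint used in validation. This is a minimal sketch, not the code in the diff: `waitForHealthy`, `httpProbe`, and the timeout values are hypothetical, and it assumes a Node.js runtime with a global `fetch` (Node 18 or later).

```typescript
type Probe = (url: string) => Promise<boolean>;

// Default probe: true when the endpoint answers with a 2xx status.
const httpProbe: Probe = async (url) => {
  try {
    const res = await fetch(url);
    return res.ok;
  } catch {
    // Connection refused is expected while the container is still starting.
    return false;
  }
};

// Polls `url` until the probe succeeds or `timeoutMs` elapses.
async function waitForHealthy(
  url: string,
  timeoutMs: number,
  intervalMs: number,
  probe: Probe = httpProbe,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await probe(url)) return true;
    // Give up when the next attempt would start after the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A caller might use `waitForHealthy("http://localhost:8000/v1/models", 15 * 60 * 1000, 5000)`; the 15-minute budget is an illustrative guess for a slow first start under WSL, not a value from the PR.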

cv · 2026-03-21T19:28:52Z

Hey @benwgarton, appreciate you sorting out the local NIM onboarding on RTX 3090 / WSL — that's a setup a lot of people run into issues with. Just a quick ask: there have been a good number of changes to main since this PR (new CI, features, etc.), and a rebase would help us review this with confidence. Could you update against the latest main whenever you get a chance? Looking forward to checking it out!

## Summary

Local NIM onboarding fails at the image pull step because `docker pull nvcr.io/nim/...` requires NGC registry authentication. This adds an NGC API key prompt during onboard that runs `docker login nvcr.io --password-stdin` before pulling the NIM image. The key is masked during input and handled securely via stdin.

## Related Issue

Based on the investigation in PR #236.

## Changes

- `src/lib/nim.ts`: Add `isNgcLoggedIn()` to check if Docker is already authenticated with nvcr.io, and `dockerLoginNgc()` to log in securely via `--password-stdin`.
- `src/lib/onboard.ts`: Prompt for NGC API key before NIM image pull when not already logged in. Masked input, one retry on failure.
- `test/onboard-selection.test.ts`: Mock `isNgcLoggedIn` in NIM-local selection test.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

## AI Disclosure

- [x] AI-assisted (tool: Claude Code)

---

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

## Summary by CodeRabbit

* **New Features**
  * Setup wizard enforces NGC Docker authentication for NIM model setup: interactive mode prompts for an NGC API key (one retry); non-interactive mode prints login instructions and exits.
* **Bug Fixes / Reliability**
  * Improved detection and login handling for NGC Docker credentials so image pulls proceed only after successful authentication and failures are reported.
* **Tests**
  * Added unit tests for NGC auth detection and updated onboarding tests to cover authenticated flows.

---------

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>
Co-authored-by: Aaron Erickson 🦞 <aerickson@nvidia.com>
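
The stdin-based login described above might look roughly like the following. This is a sketch under stated assumptions rather than the merged `dockerLoginNgc()`: the function name `dockerLoginNgcSketch` and the injectable `Runner` type are hypothetical, and the `$oauthtoken` username follows NVIDIA's NGC registry convention rather than anything stated in this thread.

```typescript
import { spawnSync } from "node:child_process";

// Runs a command with `stdin` piped in and returns its exit code.
type Runner = (cmd: string, args: string[], stdin: string) => number;

const realRunner: Runner = (cmd, args, stdin) => {
  // `input` is written to the child's stdin; stdout is discarded, stderr is shown.
  const result = spawnSync(cmd, args, { input: stdin, stdio: ["pipe", "ignore", "inherit"] });
  return result.status === null ? 1 : result.status;
};

// Hypothetical sketch: logs Docker in to nvcr.io without placing the key in argv.
function dockerLoginNgcSketch(apiKey: string, run: Runner = realRunner): boolean {
  // Assumption: NGC expects the literal username "$oauthtoken" with the API key as password.
  const loginArgs = ["login", "nvcr.io", "--username", "$oauthtoken", "--password-stdin"];
  return run("docker", loginArgs, apiKey + "\n") === 0;
}
```

Passing the key on stdin keeps it out of the process list and shell history, which is the point of `--password-stdin`. The injectable runner is a design choice for this sketch so the function can be exercised in tests without Docker installed.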

wscurran · 2026-04-21T18:13:52Z

Thanks for digging into the RTX 3090 / WSL Local NIM onboarding path — the specific gaps you hit here are real.

The files this PR targets (bin/lib/nim.js, bin/lib/onboard.js) were migrated to TypeScript in PR #1669, so this diff no longer applies directly to the codebase. bin/lib/credentials.js is still present, but nim.js and onboard.js are gone.

If the NIM onboarding issue persists on the current codebase, we'd welcome a resubmit targeting the TypeScript equivalents in src/. The RTX 3090 / WSL path is worth getting right — if you can confirm the issue still exists against main, that's a great starting point.

codex added 2 commits March 17, 2026 12:24

Add

6154a44

Merge remote-tracking branch 'origin/main' into codex/nemoclaw-3090-l…

e54e2b3

…ocal-nim # Conflicts: # bin/lib/onboard.js

wscurran added Windows/WSL labels Mar 18, 2026

mafueee pushed a commit to mafueee/NemoClaw that referenced this pull request Mar 28, 2026

docs: Readme updates (NVIDIA#236)

fe4b01d

* Updated readme * Updated readme * Updated readme * Updated readme * Updated readme

ahunnargikar-nvidia assigned zyang-dev Apr 17, 2026

zyang-dev mentioned this pull request Apr 17, 2026

fix: prompt for NGC API key when pulling local NIM images #2043

Merged

13 tasks

wscurran closed this Apr 21, 2026

wscurran added the area: local-models, area: providers, platform: wsl, and bug-fix labels and removed the priority: medium label on Jun 3, 2026

benwgarton wants to merge 2 commits into NVIDIA:main from benwgarton:codex/nemoclaw-3090-local-nim

5 participants
