Fix Local NIM onboarding on RTX 3090 / WSL #236
…ocal-nim # Conflicts: # bin/lib/onboard.js
Hey @benwgarton, appreciate you sorting out the local NIM onboarding on RTX 3090 / WSL — that's a setup a lot of people run into issues with. Just a quick ask: there have been a good number of changes to main since this PR (new CI, features, etc.), and a rebase would help us review this with confidence. Could you update against the latest main whenever you get a chance? Looking forward to checking it out! |
* Updated readme * Updated readme * Updated readme * Updated readme * Updated readme
<!-- markdownlint-disable MD041 -->
## Summary

Local NIM onboarding fails at the image pull step because `docker pull nvcr.io/nim/...` requires NGC registry authentication. This adds an NGC API key prompt during onboard that runs `docker login nvcr.io --password-stdin` before pulling the NIM image. The key is masked during input and handled securely via stdin.

## Related Issue

Based on the investigation in PR #236.

## Changes

- `src/lib/nim.ts`: Add `isNgcLoggedIn()` to check if Docker is already authenticated with nvcr.io, and `dockerLoginNgc()` to login securely via `--password-stdin`.
- `src/lib/onboard.ts`: Prompt for NGC API key before NIM image pull when not already logged in. Masked input, one retry on failure.
- `test/onboard-selection.test.ts`: Mock `isNgcLoggedIn` in NIM-local selection test.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

## AI Disclosure

- [x] AI-assisted — tool: Claude Code

---

<!-- DCO sign-off required by CI.
Run: git config user.name && git config user.email -->
Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **New Features**
  * Setup wizard enforces NGC Docker authentication for NIM model setup: interactive mode prompts for an NGC API key (one retry); non-interactive mode prints login instructions and exits.
* **Bug Fixes / Reliability**
  * Improved detection and login handling for NGC Docker credentials so image pulls proceed only after successful authentication and failures are reported.
* **Tests**
  * Added unit tests for NGC auth detection and updated onboarding tests to cover authenticated flows.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>
Co-authored-by: Aaron Erickson 🦞 <aerickson@nvidia.com>
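For readers who want a concrete picture of the two helpers named in the commit message, here is a minimal sketch. It is not the code from `src/lib/nim.ts`. The function `hasRegistryAuth`, the `Runner` type, and the choice to detect a login by looking for an `auths` entry in Docker's `config.json` are all assumptions made for illustration, and `dockerLoginNgc` here only shares its name with the real helper. The one detail taken from NGC's documented login flow is the literal username `$oauthtoken`.

```typescript
import { spawnSync } from "node:child_process";

const NGC_REGISTRY = "nvcr.io";

// Hypothetical runner signature so the Docker call can be stubbed in tests.
type Runner = (cmd: string, args: string[], input: string) => { status: number | null };

// Default runner: writes the secret to the child's stdin, never to argv.
const defaultRunner: Runner = (cmd, args, input) =>
  spawnSync(cmd, args, { input, stdio: ["pipe", "ignore", "inherit"] });

// Assumption: a registry counts as "logged in" when Docker's config.json
// (passed in as text) has an entry for it under "auths". With a credential
// store configured, Docker still records the host there with an empty object.
function hasRegistryAuth(dockerConfigJson: string, registry: string = NGC_REGISTRY): boolean {
  try {
    const config = JSON.parse(dockerConfigJson) as { auths?: Record<string, unknown> };
    return Object.keys(config.auths ?? {}).some(
      (host) => host === registry || host === `https://${registry}`,
    );
  } catch {
    // Unparseable or unexpected config: treat as not logged in.
    return false;
  }
}

// Logs in to nvcr.io, piping the API key through --password-stdin so it does
// not appear in the process list or shell history.
function dockerLoginNgc(apiKey: string, run: Runner = defaultRunner): boolean {
  const key = apiKey.trim();
  if (key === "") return false; // Nothing to send; skip the Docker call.
  const result = run(
    "docker",
    ["login", NGC_REGISTRY, "--username", "$oauthtoken", "--password-stdin"],
    key,
  );
  return result.status === 0;
}
```

A caller would typically read `~/.docker/config.json`, pass its text to `hasRegistryAuth`, and only prompt for a key and call `dockerLoginNgc` when that returns `false`. Injecting the runner keeps the login path unit-testable without a Docker daemon.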
Thanks for digging into the RTX 3090 / WSL Local NIM onboarding path — the specific gaps you hit here are real. The files this PR targets ( If the NIM onboarding issue persists on the current codebase, we'd welcome a resubmit targeting the TypeScript equivalents in |
Summary
This PR fixes a Local NIM onboarding path that fails on a common consumer-GPU setup: a single RTX 3090 running under WSL.
What this changes
Why
During validation on an RTX 3090, the originally requested model (nvidia/llama-3.3-nemotron-super-49b-v1) had no runnable profile on this hardware. Switching to the smaller Local NIM image (meta/llama-3.1-8b-instruct) worked, but only after:
Without these changes, onboarding either falls back incorrectly or fails even though a working Local NIM configuration exists for this host class.
Scope
This PR keeps the fix narrow and only changes the Local NIM onboarding/runtime path.
Validation
Validated locally with: