fix(onboard): pick Ollama bootstrap model from a memory-aware registry by laitingsheng · Pull Request #4132 · NVIDIA/NemoClaw

laitingsheng · 2026-05-23T15:40:21Z

Summary

Bootstrap-model selection sized only against totalMemoryMB, so a 128 GiB host with ~12 GiB actually free still got promoted to the 23 GiB qwen3.6:35b default and Ollama crashed mid-load. The non-interactive NEMOCLAW_MODEL=qwen3.6:35b path bypassed the menu entirely.

Add a memory-aware registry as the single source of truth, populate availableMemoryMB from nvidia-smi memory.free / MemAvailable / vm_stat, and route every selection path (menu, default, interactive, explicit env var, recovered session) through one capacity check. Unknown user-supplied tags still pass through.
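The single capacity check described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper names `effectiveGpuMemoryMB` and `modelFitsAvailableMemory` appear in the review walkthrough, but their signatures here are assumptions, and the footprint table (including the 8 GiB figure for `qwen2.5:7b`) is illustrative only.

```typescript
// Sketch only: field names follow the PR text; signatures and footprints are assumed.
interface GpuInfo {
  totalMemoryMB: number;
  availableMemoryMB?: number; // omitted when the free-memory probe cannot be parsed
}

interface ModelFootprint {
  tag: string;
  requiredMemoryMB: number;
}

// Hypothetical per-tag footprints in MiB (the PR cites about 23 GiB for qwen3.6:35b).
const FOOTPRINTS: ModelFootprint[] = [
  { tag: "qwen2.5:7b", requiredMemoryMB: 8 * 1024 },
  { tag: "qwen3.6:35b", requiredMemoryMB: 23 * 1024 },
];

// Prefer the free-memory reading; fall back to total memory when it is missing.
function effectiveGpuMemoryMB(gpu: GpuInfo): number {
  return gpu.availableMemoryMB ?? gpu.totalMemoryMB;
}

// Known tags must fit; unknown user-supplied tags pass through unchecked.
function modelFitsAvailableMemory(tag: string, gpu: GpuInfo): boolean {
  for (const entry of FOOTPRINTS) {
    if (entry.tag === tag) {
      return entry.requiredMemoryMB <= effectiveGpuMemoryMB(gpu);
    }
  }
  return true;
}

// A 128 GiB host with only 12 GiB currently free.
const busyHost: GpuInfo = { totalMemoryMB: 128 * 1024, availableMemoryMB: 12 * 1024 };
console.log(modelFitsAvailableMemory("qwen3.6:35b", busyHost)); // false
console.log(modelFitsAvailableMemory("qwen2.5:7b", busyHost)); // true
```

Sizing against the free reading rather than the total is what keeps the busy 128 GiB host from being offered the 23 GiB model.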

Related Issue

Fixes #4113

Changes

src/lib/inference/ollama-model-registry.ts — new module holding the memory-aware model registry and capacity helpers.
src/lib/inference/local.ts — selection helpers delegate to the registry; new resolveNonInteractiveOllamaModel gates the explicit-model path.
src/lib/inference/nim.ts — detectGpu populates availableMemoryMB on NVIDIA, unified-memory Linux, Tegra, and macOS.
src/lib/inference/ollama/proxy.ts — interactive menu filters installed models through the capacity check.
src/lib/inference/ollama/model-size.ts — fallback download-size table now derives from the registry.
src/lib/onboard.ts — non-interactive selection delegates to resolveNonInteractiveOllamaModel.
docs/inference/use-local-inference.mdx — wording updated for available-memory-based selection.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

New Features
- GPU detection now reports available (free) memory and uses it to pick starter and default models.
- Onboarding and model prompts now show only models that fit the host GPU and warn when a requested model won’t fit.
Documentation
- Updated local inference setup to reflect memory-aware starter model selection and non-interactive fallback behavior.
Tests
- Added extensive tests covering model registry, memory-aware selection, and GPU detection.

Bootstrap-model selection on unified-memory hosts (DGX Spark, Apple Silicon, Jetson) sized only against totalMemoryMB, so a host with 128 GiB total but another GPU workload eating most of the system pool would still be promoted to the 23 GiB qwen3.6:35b default and crash the Ollama runner mid-load.

Move the per-model footprints into src/lib/inference/ollama-model-registry.ts so adding a future model is a one-line registry edit, and have detectGpu populate availableMemoryMB (nvidia-smi memory.free on the primary path, MemAvailable on unified-memory and Tegra fallbacks). The selector keeps every registry entry whose requiredMemoryMB fits available memory and falls back to the smallest model when nothing else fits.

Fixes #4113

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
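The selector rule in this commit message (keep every entry that fits, otherwise fall back to the smallest model) might look like the sketch below. The function names match the helpers listed in the review walkthrough, but taking a plain memory figure as the argument is an assumption, and the two registry entries are illustrative.

```typescript
// Sketch only: registry contents and function signatures are assumptions.
interface OllamaModelEntry {
  tag: string;
  requiredMemoryMB: number;
}

const OLLAMA_MODEL_REGISTRY: OllamaModelEntry[] = [
  { tag: "qwen2.5:7b", requiredMemoryMB: 8 * 1024 },
  { tag: "qwen3.6:35b", requiredMemoryMB: 23 * 1024 },
];

// Every registry tag whose footprint fits the given memory budget.
function fittableOllamaModelTags(memoryMB: number): string[] {
  return OLLAMA_MODEL_REGISTRY
    .filter((entry) => entry.requiredMemoryMB <= memoryMB)
    .map((entry) => entry.tag);
}

// Largest fitting tag, or the smallest registry tag when nothing fits.
function largestFittableOllamaModelTag(memoryMB: number): string {
  let smallest: OllamaModelEntry | undefined;
  let best: OllamaModelEntry | undefined;
  for (const entry of OLLAMA_MODEL_REGISTRY) {
    if (!smallest || entry.requiredMemoryMB < smallest.requiredMemoryMB) {
      smallest = entry;
    }
    const fits = entry.requiredMemoryMB <= memoryMB;
    if (fits && (!best || entry.requiredMemoryMB > best.requiredMemoryMB)) {
      best = entry;
    }
  }
  const chosen = best ?? smallest;
  if (!chosen) throw new Error("Ollama model registry is empty");
  return chosen.tag;
}

console.log(largestFittableOllamaModelTag(12 * 1024)); // qwen2.5:7b
console.log(largestFittableOllamaModelTag(64 * 1024)); // qwen3.6:35b
```

Adding a model under this shape is a single new array entry, which is the "one-line registry edit" the message refers to.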

coderabbitai · 2026-05-23T15:40:31Z

📝 Walkthrough

Adds available GPU memory (availableMemoryMB) to detection, a registry of Ollama models with memory/download metadata, and refactors local model selection, prompts, non-interactive onboarding, and model-size fallbacks to prefer models that fit currently available memory.

Changes

Available Memory Detection and Registry-Based Model Selection

| Layer / File(s) | Summary |
| --- | --- |
| GPU detection with available memory `src/lib/inference/nim.ts`, `src/lib/inference/nim.test.ts` | detectGpu/VM/host probes expose optional `availableMemoryMB` from `nvidia-smi` `memory.free`, `free -m` MemAvailable, or `vm_stat`; tests updated for the new field. |
| Ollama model registry and helpers `src/lib/inference/ollama-model-registry.ts`, `src/lib/inference/ollama-model-registry.test.ts` | New registry exports per-tag `requiredMemoryMB` and `downloadSizeBytes`, `effectiveGpuMemoryMB()`, `modelFitsAvailableMemory()`, `fittableOllamaModelTags()`, `largestFittableOllamaModelTag()`, and download-size fallback map with comprehensive tests. |
| Local selection, prompts, onboarding integration `src/lib/inference/local.ts`, `src/lib/inference/local.test.ts`, `src/lib/inference/ollama/model-size.ts`, `src/lib/inference/ollama/proxy.ts`, `src/lib/inference/ollama/proxy.test.ts`, `src/lib/onboard.ts`, `docs/inference/use-local-inference.mdx` | `GpuInfo` extended with `availableMemoryMB?`; bootstrap/default selection now uses registry fittable tags; non-interactive resolution downgrades known oversize tags with warnings; installed-model prompts filter by fit; model-size fallback uses registry map; onboarding calls resolver. Tests and docs updated. |
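As an illustration of the primary-path probe in the first row, the sketch below parses one CSV line of the kind `nvidia-smi --query-gpu=memory.total,memory.free --format=csv,noheader,nounits` prints. The function names, and the assumption that both fields are queried in a single call, are hypothetical; only the behaviour (drop `availableMemoryMB` when `memory.free` does not parse) follows the PR discussion.

```typescript
// Sketch only: helper names and the single-query layout are assumptions.
interface GpuMemoryReading {
  totalMemoryMB: number;
  availableMemoryMB?: number;
}

// Accept a plain non-negative integer; anything else (for example "[N/A]") is rejected.
function parseMiB(cell: string | undefined): number | undefined {
  const text = (cell ?? "").trim();
  return /^\d+$/.test(text) ? Number(text) : undefined;
}

// Parse a "total, free" line; omit availableMemoryMB when the free field is unusable.
function parseNvidiaSmiMemoryLine(line: string): GpuMemoryReading | undefined {
  const cells = line.split(",");
  const totalMemoryMB = parseMiB(cells[0]);
  if (totalMemoryMB === undefined) return undefined;
  const availableMemoryMB = parseMiB(cells[1]);
  return availableMemoryMB === undefined
    ? { totalMemoryMB }
    : { totalMemoryMB, availableMemoryMB };
}

console.log(parseNvidiaSmiMemoryLine("131072, 12288"));
console.log(parseNvidiaSmiMemoryLine("131072, [N/A]"));
```

Leaving the field out, rather than writing a placeholder value, lets downstream selection fall back to `totalMemoryMB` deliberately.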

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

[DGX Spark][Ollama] Onboarding selects qwen3.6:35b despite insufficient currently available GPU memory #4113: Addresses the same root cause by routing non-interactive selection through GPU-aware helpers to avoid choosing qwen3.6 when availableMemoryMB is insufficient.

Suggested labels

Provider: Ollama, documentation, enhancement: testing, v0.0.50

Suggested reviewers

ericksoa
cv
jyaunches

Poem

🐇 I hopped through GPUs late at night,
Counting free memory by soft moonlight.
When giants don't fit the space at hand,
I nudge you toward a smaller, safer land.
Models load happy — the rabbit's delight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'fix(onboard): pick Ollama bootstrap model from a memory-aware registry' directly and clearly summarizes the main change: routing Ollama model selection through a new memory-aware registry to respect available host memory. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

github-actions · 2026-05-23T15:41:42Z

E2E Advisor Recommendation

Required E2E: gpu-e2e
Optional E2E: gpu-double-onboard-e2e, ollama-proxy-e2e, gpu-repo-local-ollama-openclaw

Dispatch hint: gpu-e2e

Auto-dispatched E2E: gpu-e2e via nightly-e2e.yaml at d7688131f358be43eaae025565469f05490226f4 — nightly run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

gpu-e2e (high (~30 minutes; self-hosted GPU)): Required because the PR changes real local Ollama onboarding/model selection and GPU-memory detection. This job exercises the source install, NEMOCLAW_PROVIDER=ollama onboarding, Ollama install/start/pull, auth proxy setup, sandbox creation, and live sandbox inference on an NVIDIA GPU runner, catching regressions in the changed detectGpu → model selection → validation path.

Optional E2E

gpu-double-onboard-e2e (high (~30 minutes; self-hosted GPU)): Useful adjacent coverage for re-running Ollama onboarding with an existing sandbox/token and recovered model state. The PR changes non-interactive and recovered/requested Ollama model resolution, but proxy token consistency itself is not the primary diff, so this is optional rather than merge-blocking.
ollama-proxy-e2e (medium (~15 minutes; installs/pulls small Ollama model)): Useful lower-scope validation because src/lib/inference/ollama/proxy.ts was touched and the local Ollama flow depends on the authenticated proxy. It validates real Ollama inference through the proxy, token enforcement, persistence, recovery, and container reachability, but it does not directly exercise GPU available-memory model selection.
gpu-repo-local-ollama-openclaw (high (self-hosted GPU scenario)): Scenario-runner equivalent coverage for the local Ollama OpenClaw profile with smoke, local-ollama-inference, and ollama-proxy suites. Good confidence if validating the newer scenario framework, but overlaps with gpu-e2e for this PR.

New E2E recommendations

local Ollama available-memory downgrade (high): Existing GPU E2E normally runs on an idle GPU and validates the happy path, but it is unlikely to exercise the new behavior where total memory is large while currently available memory is too low and onboarding downgrades from qwen3.6:35b/nemotron to qwen2.5:7b.
- Suggested test: Add a local Ollama onboarding E2E that creates a low-available-memory condition (for example by pre-allocating GPU memory or by using a controlled nvidia-smi/free shim in an E2E harness), runs non-interactive NEMOCLAW_PROVIDER=ollama onboarding, asserts the oversize-model warning/fallback, and verifies sandbox inference succeeds with the fallback model.
macOS Apple Silicon local Ollama sizing (medium): The PR adds macOS vm_stat-based availableMemoryMB handling, but the existing macOS scenario is cloud/Docker-optional and does not validate local Ollama model sizing on Apple Silicon.
- Suggested test: Add a macOS local Ollama onboarding or dry-run E2E/assertion that verifies Apple Silicon available-memory detection influences the starter model menu/default without requiring Docker-dependent sandbox suites on GitHub-hosted macOS.

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: gpu-e2e

github-actions · 2026-05-23T15:41:42Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

None.

Relevant changed files

None.

github-actions · 2026-05-23T15:41:44Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 26336844880
Target ref: 0d24286bdc1eb22b8c4e66aed63703454238f810
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

| Job | Result |
| --- | --- |
| gpu-e2e | ⏭️ skipped |

github-actions · 2026-05-23T15:42:53Z

PR Review Advisor

Findings: 1 needs attention, 1 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 2 still apply, 0 new items found

Review findings

🛠️ Needs attention

Inference/onboarding hotspots grew further instead of being extracted: Codebase drift check: the PR patches files that still exist and have active recent history, including inference/network and onboarding host-glue paths. The behavioral change is in-scope, but the implementation continues adding memory-detection and model-selection logic to already-large security-adjacent files rather than extracting focused helpers. This keeps Docker/NIM detection, local inference networking, Ollama proxy behavior, and onboarding selection glue concentrated in monoliths, increasing future review and regression risk.
- Recommendation: Extract Ollama selection policy and GPU available-memory probing/parsing into focused modules, and move new local/nim/proxy regression coverage out of the large hotspot test files where practical. At minimum, offset this PR's hotspot growth by extracting existing helper code before merge.
- Evidence: This is the prior advisor hotspot finding and it still applies. Trusted monolith deltas show src/lib/inference/nim.test.ts 1241→1361 (+120), src/lib/inference/local.test.ts 839→953 (+114), src/lib/inference/nim.ts 710→794 (+84), src/lib/inference/local.ts 1014→1089 (+75), and src/lib/inference/ollama/proxy.ts 812→834 (+22). Drift evidence confirms these files still exist and have active recent history; src/lib/onboard.ts also overlaps many open PRs even though this patch is net-zero there.

🔎 Worth checking

Missing explicit unified-memory fallback coverage for absent or malformed MemAvailable (src/lib/inference/nim.test.ts:633): The new tests cover availableMemoryMB propagation on primary NVIDIA, GB10/Spark, Orin, Jetson/Tegra, macOS, and a primary-path memory.free parse failure. However, the unified-memory Linux path still has a fixture whose `free -m` output lacks the `available` column without asserting the resulting `availableMemoryMB` contract. A regression in `readHostAvailableMemoryMB` could silently treat malformed output as usable or unintentionally size against an invalid value.
- Recommendation: Add a focused `detectGpu` test for a unified-memory/Spark or Jetson path where `free -m` lacks or malforms the `available` column, and assert that `availableMemoryMB` is omitted so downstream selection intentionally falls back to `totalMemoryMB`. Keep the positive propagation and primary-path parse-failure assertions already added.
- Evidence: This prior advisor finding still applies. The mixed unified-memory fixture in `src/lib/inference/nim.test.ts` returns `free -m` output with only `total used free`, but only asserts name/gpus behavior. The PR added `omits availableMemoryMB when memory.free fails to parse on the primary path` and macOS parse-failure coverage, but not the requested unified-memory `MemAvailable` fallback assertion.
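The contract this finding asks the tests to pin down can be sketched as a small parser for `free -m` output. The function name `parseFreeAvailableMB` is hypothetical (the PR's own helper is `readHostAvailableMemoryMB`, whose internals are not shown here); the point is only that a missing or malformed `available` column yields `undefined` rather than a usable number.

```typescript
// Sketch only: a stand-alone parser, not the PR's readHostAvailableMemoryMB.
function parseFreeAvailableMB(output: string): number | undefined {
  // Header row starts with "total"; data row starts with "Mem:".
  const header = /^[ \t]*(total[ \t].*)$/m.exec(output);
  const memRow = /^[ \t]*Mem:[ \t]*(.*)$/m.exec(output);
  if (!header || !memRow) return undefined;

  // Locate the "available" column by name instead of assuming a position.
  const columns = header[1].trim().split(/\s+/);
  const index = columns.indexOf("available");
  if (index === -1) return undefined;

  // The "Mem:" label is already stripped, so indexes line up with the header.
  const cells = memRow[1].trim().split(/\s+/);
  const raw = cells[index];
  if (raw === undefined || !/^\d+$/.test(raw)) return undefined;
  return Number(raw);
}

const withAvailable = [
  "               total        used        free      shared  buff/cache   available",
  "Mem:          131072      100000       10000         100       21072       12288",
  "Swap:              0           0           0",
].join("\n");

const withoutAvailable = [
  "               total        used        free",
  "Mem:          131072      100000       31072",
].join("\n");

console.log(parseFreeAvailableMB(withAvailable)); // 12288
console.log(parseFreeAvailableMB(withoutAvailable)); // undefined
```

A test along the lines the advisor recommends would feed the second fixture through `detectGpu` and assert that `availableMemoryMB` is absent from the result.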

🌱 Nice ideas

None.

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/inference/nim.ts`:
- Around line 79-84: The macOS/Apple branch in src/lib/inference/nim.ts
currently sets only totalMemoryMB and omits availableMemoryMB, causing
memory-aware sizing to be incorrect on Apple Silicon; update the macOS path in
the memory-probing logic (the function that builds the memory probe result which
includes totalMemoryMB and availableMemoryMB) to compute availableMemoryMB from
host vm statistics (e.g., parse vm_stat or use the appropriate host APIs to sum
free + inactive/available pages), set availableMemoryMB on the returned object
alongside totalMemoryMB, and fall back to totalMemoryMB only if the vm_stat/host
call fails so downstream callers still have a sensible value.
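One way to read this suggestion is the sketch below, which derives an available-memory figure from `vm_stat` text by summing free and inactive pages. The function name is hypothetical, and which page counters the merged probe actually sums is not stated in this thread, so treat the formula as an assumption.

```typescript
// Sketch only: free + inactive pages is an assumed approximation of "available".
function parseVmStatAvailableMB(output: string): number | undefined {
  const pageSize = /page size of (\d+) bytes/.exec(output);
  const free = /^Pages free:\s+(\d+)\./m.exec(output);
  const inactive = /^Pages inactive:\s+(\d+)\./m.exec(output);
  // Any missing field means the reading is unusable; the caller omits the value.
  if (!pageSize || !free || !inactive) return undefined;

  const pages = Number(free[1]) + Number(inactive[1]);
  const bytes = pages * Number(pageSize[1]);
  return Math.floor(bytes / (1024 * 1024));
}

const sampleVmStat = [
  "Mach Virtual Memory Statistics: (page size of 16384 bytes)",
  "Pages free:                               65536.",
  "Pages active:                            200000.",
  "Pages inactive:                           65536.",
].join("\n");

console.log(parseVmStatAvailableMB(sampleVmStat)); // 2048
```

In the sample, 131072 pages of 16384 bytes each come to 2 GiB, hence 2048 MiB.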

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3349ba31-8bf0-453d-9060-3829382be205

📥 Commits

Reviewing files that changed from the base of the PR and between 0f48781 and 0d24286.

📒 Files selected for processing (6)

src/lib/inference/local.test.ts
src/lib/inference/local.ts
src/lib/inference/nim.test.ts
src/lib/inference/nim.ts
src/lib/inference/ollama-model-registry.test.ts
src/lib/inference/ollama-model-registry.ts

Address review feedback on the bootstrap-model selector:

- Add downloadSizeBytes alongside requiredMemoryMB in the registry and have model-size.ts read its fallback table from there, removing the duplicated model facts in src/lib/inference/ollama/model-size.ts.
- Route the non-interactive NEMOCLAW_MODEL / recovered-session path through a new resolveNonInteractiveOllamaModel helper so an explicit oversized model triggers the same downgrade + warning as the menu-default path. Unknown user-supplied tags stay respected.
- Filter the installed-model selection in getDefaultOllamaModel so a previously-pulled large model is not blindly returned on a host that can no longer fit it.
- Narrow the macOS scope in the GpuDetection comment: the platform still only reports total memory (no vm_stat probe yet); the registry test is explicit that Apple Silicon only matches when availableMemoryMB is supplied by the caller.
- Update the user-facing docs to describe the new available-memory driven downgrade rather than the old 32 GiB total threshold.
- Drop the lingering issue-number references from new code comments.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-05-23T16:09:22Z

🌿 Preview your docs: https://nvidia-preview-pr-4132.docs.buildwithfern.com/nemoclaw

github-actions · 2026-05-23T16:11:16Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 26337459706
Target ref: d45a6044cb44585e8386ea4320bcb2e38971561e
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

| Job | Result |
| --- | --- |
| gpu-e2e | ⏭️ skipped |

Address second round of review feedback on the bootstrap-model selector:

- promptOllamaModel filters its installed-model list through modelFitsAvailableMemory before computing the default index. Without this, a host with only an oversized model installed would surface the registry fallback default at index 0, so pressing Enter would re-select the model the runner is about to crash on.
- Warn explicitly when nothing in the registry fits available memory via anyRegistryModelFits; both the interactive menu and the non-interactive resolver now log a "free memory or expect the runner to reject the load" line before returning the smallest fallback.
- nim.test.ts now asserts availableMemoryMB on the GB10 / Spark / Orin / Tegra unified-memory paths and adds a parse-failure case where memory.free returns `[N/A]` but memory.total still parses.
- Centralise the role aliases: local.ts now derives SMALL_OLLAMA_MODEL from SMALLEST_OLLAMA_MODEL_TAG and asserts that DEFAULT_OLLAMA_MODEL / QWEN3_6_OLLAMA_MODEL still resolve to live registry entries, so a registry edit fails module load instead of silently desyncing.
- Add a macOS vm_stat probe so Apple Silicon hosts also populate availableMemoryMB and the registry filter is no longer a one-way claim for that platform.
- Drop the lingering #3510 reference in a new comment; update model-size.ts wording from "re-exported" to "aliased locally".

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
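The "fails module load instead of silently desyncing" idea for the role aliases could be sketched like this. The constant name `QWEN3_6_OLLAMA_MODEL` comes from the commit message, while the `requireRegistryTag` helper, the registry shape, and the tag value assigned to the alias are assumptions made for illustration.

```typescript
// Sketch only: helper name, registry shape, and alias value are assumptions.
interface RegistryEntry {
  tag: string;
  requiredMemoryMB: number;
}

const OLLAMA_MODEL_REGISTRY: RegistryEntry[] = [
  { tag: "qwen2.5:7b", requiredMemoryMB: 8 * 1024 },
  { tag: "qwen3.6:35b", requiredMemoryMB: 23 * 1024 },
];

// Return the tag unchanged if the registry knows it; otherwise throw immediately.
function requireRegistryTag(role: string, tag: string): string {
  const known = OLLAMA_MODEL_REGISTRY.some((entry) => entry.tag === tag);
  if (!known) {
    throw new Error(`${role} points at "${tag}", which is not in the Ollama model registry`);
  }
  return tag;
}

// Evaluated once at module load, so removing the registry entry breaks the import.
const QWEN3_6_OLLAMA_MODEL = requireRegistryTag("QWEN3_6_OLLAMA_MODEL", "qwen3.6:35b");
console.log(QWEN3_6_OLLAMA_MODEL); // qwen3.6:35b
```

Failing at import time trades a slightly noisier startup error for the guarantee that an alias can never name a model the registry no longer sizes.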

github-actions · 2026-05-23T16:38:23Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 26338017309
Target ref: 0f7c55578f48cc365af9d2be82924839169ccdfe
Workflow ref: main
Requested jobs: gpu-e2e,gpu-double-onboard-e2e
Summary: 0 passed, 0 failed, 2 skipped

| Job | Result |
| --- | --- |
| gpu-double-onboard-e2e | ⏭️ skipped |
| gpu-e2e | ⏭️ skipped |

coderabbitai

🧹 Nitpick comments (1)

docs/inference/use-local-inference.mdx (1)

47-47: ⚡ Quick win

Split into two sentences, one per line.

The guideline requires one sentence per line for diff readability. This line contains two independent clauses separated by a semicolon that should be on separate lines.

📝 Proposed formatting fix

-On hosts where the larger starter models fit the currently available GPU memory, the starter list includes `qwen3.6:35b` and selects it by default; when another GPU workload is using most of the memory at onboard time, NemoClaw downgrades the menu to the largest model that still fits.
+On hosts where the larger starter models fit the currently available GPU memory, the starter list includes `qwen3.6:35b` and selects it by default.
+When another GPU workload is using most of the memory at onboard time, NemoClaw downgrades the menu to the largest model that still fits.

As per coding guidelines: "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/inference/use-local-inference.mdx` at line 47, Split the single line
that contains two independent clauses into two separate sentences, each on its
own line: change the line containing "On hosts where the larger starter models
fit the currently available GPU memory, the starter list includes `qwen3.6:35b`
and selects it by default; when another GPU workload is using most of the memory
at onboard time, NemoClaw downgrades the menu to the largest model that still
fits." into two lines such as "On hosts where the larger starter models fit the
currently available GPU memory, the starter list includes `qwen3.6:35b` and
selects it by default." and "When another GPU workload is using most of the
memory at onboard time, NemoClaw downgrades the menu to the largest model that
still fits." ensuring each sentence occupies its own line for diff readability.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@docs/inference/use-local-inference.mdx`:
- Line 47: Split the single line that contains two independent clauses into two
separate sentences, each on its own line: change the line containing "On hosts
where the larger starter models fit the currently available GPU memory, the
starter list includes `qwen3.6:35b` and selects it by default; when another GPU
workload is using most of the memory at onboard time, NemoClaw downgrades the
menu to the largest model that still fits." into two lines such as "On hosts
where the larger starter models fit the currently available GPU memory, the
starter list includes `qwen3.6:35b` and selects it by default." and "When
another GPU workload is using most of the memory at onboard time, NemoClaw
downgrades the menu to the largest model that still fits." ensuring each
sentence occupies its own line for diff readability.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2c318d35-7d7b-4ed8-bc1b-0804e3e6e8de

📥 Commits

Reviewing files that changed from the base of the PR and between 0d24286 and 0f7c555.

📒 Files selected for processing (10)

docs/inference/use-local-inference.mdx
src/lib/inference/local.test.ts
src/lib/inference/local.ts
src/lib/inference/nim.test.ts
src/lib/inference/nim.ts
src/lib/inference/ollama-model-registry.test.ts
src/lib/inference/ollama-model-registry.ts
src/lib/inference/ollama/model-size.ts
src/lib/inference/ollama/proxy.ts
src/lib/onboard.ts

…probe

Address third round of review feedback:

- resolveNonInteractiveOllamaModel now surfaces the no-fit warning on the explicit-oversize path too: when NEMOCLAW_MODEL names a known oversized tag and the fallback also exceeds available memory, the user sees both the "falling back to qwen2.5:7b" line and the "no known model fits" line so the second probe failure is not surprising. Add a regression test exercising the <8 GB free case.
- New src/lib/inference/ollama/proxy.test.ts exercises the interactive menu installed-model fit filter: an installed-only oversized tag downgrades to a fitting starter, a fitting installed tag stays as the default, and an unknown tag is respected.
- nim.test.ts adds macOS coverage: a Darwin mock returning system_profiler + sysctl + vm_stat → expects availableMemoryMB, plus a vm_stat parse-failure case that drops the field cleanly.
- docs/inference/use-local-inference.mdx now notes that known oversized NEMOCLAW_MODEL tags are downgraded with a warning while unknown tags pass through to the Ollama runner's own validation, and splits the L47 semicolon-joined sentence into two.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
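The resolver behaviour described in the first item might be sketched as follows. The name `resolveNonInteractiveOllamaModel` is from the PR, but the signature, the returned shape, the warning wording, the choice to downgrade to the largest fitting model, and the footprint figures are all assumptions for illustration.

```typescript
// Sketch only: signature, return shape, warning text, and footprints are assumed.
interface ModelFootprint {
  tag: string;
  requiredMemoryMB: number;
}

const FOOTPRINTS: ModelFootprint[] = [
  { tag: "qwen2.5:7b", requiredMemoryMB: 8 * 1024 },
  { tag: "qwen3.6:35b", requiredMemoryMB: 23 * 1024 },
];

interface Resolution {
  model: string;
  warnings: string[];
}

function resolveNonInteractiveOllamaModel(requested: string, memoryMB: number): Resolution {
  let requestedEntry: ModelFootprint | undefined;
  let smallest: ModelFootprint | undefined;
  let largestFitting: ModelFootprint | undefined;
  for (const entry of FOOTPRINTS) {
    if (entry.tag === requested) requestedEntry = entry;
    if (!smallest || entry.requiredMemoryMB < smallest.requiredMemoryMB) smallest = entry;
    const fits = entry.requiredMemoryMB <= memoryMB;
    if (fits && (!largestFitting || entry.requiredMemoryMB > largestFitting.requiredMemoryMB)) {
      largestFitting = entry;
    }
  }

  // Unknown tags and tags that fit are returned untouched.
  if (!requestedEntry || requestedEntry.requiredMemoryMB <= memoryMB || !smallest) {
    return { model: requested, warnings: [] };
  }

  // Known oversized tag: downgrade, and add a second warning if nothing fits at all.
  const fallback = largestFitting ?? smallest;
  const warnings = [
    `${requested} needs about ${requestedEntry.requiredMemoryMB} MiB but only ${memoryMB} MiB is available; falling back to ${fallback.tag}`,
  ];
  if (!largestFitting) {
    warnings.push("no known model fits available memory; free memory or expect the runner to reject the load");
  }
  return { model: fallback.tag, warnings };
}

console.log(resolveNonInteractiveOllamaModel("qwen3.6:35b", 12 * 1024).model); // qwen2.5:7b
console.log(resolveNonInteractiveOllamaModel("qwen3.6:35b", 4 * 1024).warnings.length); // 2
```

The second call mirrors the "<8 GB free" regression case: the caller still gets the smallest model back, but with both warnings so a later load failure is not a surprise.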

…re-model-4113

github-actions · 2026-05-23T17:15:46Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 26338780965
Target ref: a1f500476f26fe4424e3e17aed6e6f1179d56f42
Workflow ref: main
Requested jobs: gpu-e2e,gpu-double-onboard-e2e
Summary: 0 passed, 0 failed, 2 skipped

| Job | Result |
| --- | --- |
| gpu-double-onboard-e2e | ⏭️ skipped |
| gpu-e2e | ⏭️ skipped |

…hold export

- promptOllamaModel now declares its parameter as `GpuInfo | null` via a type-only import, so the emitted .d.ts no longer pins the type to the default-value `null`. proxy.test.ts callers (and any other typed consumer) can pass real GpuInfo shapes without tsc complaints.
- LARGE_OLLAMA_MIN_MEMORY_MB was only kept around to make existing tests look symmetric after the registry refactor took over the selector. Drop the export and have local.test.ts derive the "large enough to fit everything" memory threshold from the live registry, so a future model change does not silently desync the test fixture.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-05-23T17:31:34Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 26339106774
Target ref: d7688131f358be43eaae025565469f05490226f4
Workflow ref: main
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

| Job | Result |
| --- | --- |
| gpu-e2e | ⏭️ skipped |

coderabbitai Bot reviewed May 23, 2026

Comment thread src/lib/inference/nim.ts

laitingsheng added NemoClaw CLI and removed NemoClaw CLI labels May 23, 2026

coderabbitai Bot reviewed May 23, 2026

laitingsheng added 2 commits May 23, 2026 17:12

Merge remote-tracking branch 'origin/main' into fix/ollama-memory-awa…

a1f5004

…re-model-4113

laitingsheng added the v0.0.51 Release target label May 23, 2026

cv enabled auto-merge (squash) May 23, 2026 19:59

cv approved these changes May 23, 2026

cv merged commit d178b79 into main May 23, 2026
40 of 41 checks passed

coderabbitai Bot mentioned this pull request May 25, 2026

fix(onboard): offer Ollama upgrade when host version too old #4186

Merged

12 tasks

jyaunches mentioned this pull request May 26, 2026

test(e2e): migrate platform and remote coverage to scenario suites #3816

Closed

coderabbitai Bot mentioned this pull request May 26, 2026

fix(inference): auto-detect Ollama context window during onboard #4253

Merged

12 tasks

wscurran added the area: inference label (Inference routing, serving, model selection, or outputs) Jun 3, 2026

wscurran added the bug-fix (PR fixes a bug or regression) and feature (PR adds or expands user-visible functionality) labels and removed the fix label Jun 3, 2026

coderabbitai Bot mentioned this pull request Jun 5, 2026

fix(inference): tighten Ollama bootstrap fit and raise runtime context floor #4852

Merged

12 tasks

wscurran removed the feature label (PR adds or expands user-visible functionality) Jun 9, 2026
