feat(inference): replace qwen2.5:7b with qwen3.5:9b as the default Ollama starter model by zyang-dev · Pull Request #4776 · NVIDIA/NemoClaw

zyang-dev · 2026-06-04T17:18:44Z

Summary

Updates the local Ollama starter model from qwen2.5:7b to qwen3.5:9b so new local-inference onboarding uses a newer tool-capable Qwen model.

Changes

Replaced the smallest Ollama bootstrap registry entry with qwen3.5:9b.
Updated memory and download-size metadata for the new starter model.
Updated onboarding, registry, proxy, model-size, config, inventory, and e2e test fixtures that assert starter-model behavior.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

Summary by CodeRabbit

Tests
- Updated test expectations and mock data across inference, onboarding, and E2E test suites to reflect the updated default Ollama model configuration.
Chores
- Updated Ollama model registry and associated default model references throughout the codebase.


coderabbitai · 2026-06-04T17:18:56Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 07ed9c2b-cfc9-43d2-9dff-ec06d6c40ce9

📥 Commits

Reviewing files that changed from the base of the PR and between 5c82cd9 and 67d896f.

📒 Files selected for processing (16)

src/lib/inference/local.test.ts
src/lib/inference/ollama-model-registry.test.ts
src/lib/inference/ollama-model-registry.ts
src/lib/inference/ollama/model-size.test.ts
src/lib/inference/ollama/proxy.test.ts
src/lib/inventory/index.test.ts
src/lib/onboard/dockerfile-patch.test.ts
src/lib/onboard/ollama-probe-failure.test.ts
test/e2e/brev-e2e.test.ts
test/generate-openclaw-config.test.ts
test/get-ollama-model-options.test.ts
test/ollama-pull-timeout.test.ts
test/ollama-tools-capability.test.ts
test/onboard-ollama-autostart.test.ts
test/onboard-selection.test.ts
test/onboard.test.ts

📝 Walkthrough

Walkthrough

This PR updates the Ollama bootstrap model registry from qwen2.5:7b to qwen3.5:9b, including the model's memory requirements and download size. The production registry change propagates through derived constants and requires test adjustments across inference, onboarding, and E2E suites to reflect the new model identifier.
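As a rough illustration of how one registry edit can propagate into derived constants, the sketch below recomputes a smallest-model tag and a download-size lookup from a single table. Only the `qwen3.5:9b` values are taken from the review evidence in this PR; the entry shape, the `example-large:70b` row, and the way the two constants are derived are assumptions made for illustration and may not match `src/lib/inference/ollama-model-registry.ts`.

```typescript
// Assumed entry shape; the field names mirror the object quoted in the review evidence.
interface OllamaModelEntry {
  tag: string;
  requiredMemoryMB: number;
  downloadSizeBytes: number;
}

const OLLAMA_MODEL_REGISTRY: readonly OllamaModelEntry[] = [
  // Hypothetical placeholder row, present only so the table has more than one entry.
  { tag: "example-large:70b", requiredMemoryMB: 48_000, downloadSizeBytes: 40_000_000_000 },
  // Values quoted in the PR review for the new starter model.
  { tag: "qwen3.5:9b", requiredMemoryMB: 12_000, downloadSizeBytes: 6_600_000_000 },
];

// Assumption: "smallest" means the lowest memory requirement in the table.
const smallestEntry = OLLAMA_MODEL_REGISTRY.reduce((min, entry) =>
  entry.requiredMemoryMB < min.requiredMemoryMB ? entry : min,
);
const SMALLEST_OLLAMA_MODEL_TAG: string = smallestEntry.tag;

// Assumption: the fallback constant is a tag-to-bytes lookup built from the same table.
const OLLAMA_DOWNLOAD_SIZE_FALLBACK_BYTES: Record<string, number> = {};
for (const entry of OLLAMA_MODEL_REGISTRY) {
  OLLAMA_DOWNLOAD_SIZE_FALLBACK_BYTES[entry.tag] = entry.downloadSizeBytes;
}

console.log(SMALLEST_OLLAMA_MODEL_TAG);
console.log(OLLAMA_DOWNLOAD_SIZE_FALLBACK_BYTES[SMALLEST_OLLAMA_MODEL_TAG]);
```

Deriving both values from the table means a tag swap only has to be made in one place, which is consistent with the walkthrough's note that the remaining edits are test expectations.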

Changes

Model Registry and Integration Tests

Layer / File(s)	Summary
Bootstrap model registry and derived constants `src/lib/inference/ollama-model-registry.ts`, `src/lib/inference/ollama-model-registry.test.ts`	Updated `OLLAMA_MODEL_REGISTRY` final entry from `qwen2.5:7b` to `qwen3.5:9b` with revised `requiredMemoryMB` and `downloadSizeBytes`. The exported constants `SMALLEST_OLLAMA_MODEL_TAG` and `OLLAMA_DOWNLOAD_SIZE_FALLBACK_BYTES` are automatically recalculated with the new model.
Model sizing and registry probe tests `src/lib/inference/ollama/model-size.test.ts`	Test expectations for `probeRegistrySize` and `getOllamaModelSize` updated to use `qwen3.5:9b`, verifying manifest URL resolution, curl invocations, JSON layer parsing, and fallback table lookups with the new model.
Local inference model selection and resolution tests `src/lib/inference/local.test.ts`	Comprehensive test suite covering `/api/tags` parsing, model option filtering, bootstrap menu generation, memory-constrained downgrades, installed model fitting, non-interactive model resolution, Apple Silicon offerings, and probe/warmup command generation—all adjusted to expect `qwen3.5:9b` as the selected or downgraded model.
Installed model proxy and fitting tests `src/lib/inference/ollama/proxy.test.ts`	Tests for `promptOllamaModel` installed-model filtering and memory-constrained selection updated to expect `qwen3.5:9b` when available memory requires downgrade.
Onboarding and feature integration tests `src/lib/inventory/index.test.ts`, `src/lib/onboard/dockerfile-patch.test.ts`, `src/lib/onboard/ollama-probe-failure.test.ts`, `test/onboard-selection.test.ts`, `test/onboard.test.ts`	Downstream test suites across inventory, dockerfile GPU networking, probe failure handling, and onboarding flows updated to pass `qwen3.5:9b` as the model argument and expect it in logs, configuration, and state assertions.
E2E and configuration tests `test/e2e/brev-e2e.test.ts`, `test/generate-openclaw-config.test.ts`, `test/get-ollama-model-options.test.ts`, `test/ollama-pull-timeout.test.ts`, `test/ollama-tools-capability.test.ts`, `test/onboard-ollama-autostart.test.ts`	E2E remote-execution model, config generation with `supportsUsageInStreaming`, mock `/api/tags` responses, pull timeout precision, tools capability probes, and autostart harness tests updated to use `qwen3.5:9b`.
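The model-size row above mentions JSON layer parsing with a fallback table lookup. A minimal sketch of that pattern follows, assuming the registry manifest is JSON with a `layers` array whose items carry a numeric `size` in bytes. The function names `sumManifestLayerBytes` and `resolveDownloadSize` are hypothetical and are not the repository's `probeRegistrySize` or `getOllamaModelSize`; no network request is made here.

```typescript
// Assumption: a manifest looks like { "layers": [{ "size": <bytes> }, ...] }.
function sumManifestLayerBytes(manifestJson: string): number | null {
  try {
    const parsed: unknown = JSON.parse(manifestJson);
    if (typeof parsed !== "object" || parsed === null) return null;
    const layers = (parsed as { layers?: unknown }).layers;
    if (!Array.isArray(layers) || layers.length === 0) return null;

    let total = 0;
    for (const layer of layers) {
      const size = (layer as { size?: unknown } | null)?.size;
      // Reject the whole manifest if any layer lacks a usable size.
      if (typeof size !== "number" || !Number.isFinite(size) || size < 0) return null;
      total += size;
    }
    return total;
  } catch {
    // Malformed JSON: signal "unknown" so the caller can fall back.
    return null;
  }
}

// Fallback value taken from the registry entry quoted in the PR review.
const FALLBACK_BYTES: Record<string, number> = { "qwen3.5:9b": 6_600_000_000 };

// Prefer the probed size; otherwise use the static table; otherwise report unknown.
function resolveDownloadSize(tag: string, manifestJson: string | null): number | null {
  const probed = manifestJson === null ? null : sumManifestLayerBytes(manifestJson);
  if (probed !== null) return probed;
  return FALLBACK_BYTES[tag] ?? null;
}

console.log(resolveDownloadSize("qwen3.5:9b", '{"layers":[{"size":100},{"size":23}]}'));
console.log(resolveDownloadSize("qwen3.5:9b", "not json"));
```

Returning `null` rather than throwing keeps the probe failure and the table lookup on the same code path, so a stale or wrong fallback value would only surface when the probe fails.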

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

NVIDIA/NemoClaw#3683: The test updates in test/generate-openclaw-config.test.ts adjust the model identifiers used in existing supportsUsageInStreaming assertions for ollama/ollama-local, which relates to PR #3683's ollama-local supportsUsageInStreaming compat behavior.

Suggested labels

Local Models, feature, area: inference, area: local-models, area: providers, v0.0.58

Suggested reviewers

cv
ericksoa

Poem

🐰 A nimble model trade, so neat and clean,
From seven-point-five to nine-point-three we've seen,
Qwen's newer sibling takes the smallest role,
Tests across the realm now reach their goal,
One registry change ripples through them all! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and specifically summarizes the main change: replacing qwen2.5:7b with qwen3.5:9b as the default Ollama starter model, which aligns with the file summaries.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


github-actions · 2026-06-04T17:21:31Z

E2E Advisor Recommendation

Required E2E: gpu-e2e
Optional E2E: gpu-double-onboard-e2e, ollama-proxy-e2e, strict-tool-call-probe-e2e

Dispatch hint: gpu-e2e


Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

gpu-e2e (high; requires linux-amd64 GPU runner and pulls an Ollama model): Required because the source change affects local Ollama onboarding and inference. This is the existing live GPU/Ollama E2E that installs Ollama, runs NemoClaw onboarding with NEMOCLAW_PROVIDER=ollama, pulls a model, creates a sandbox, and verifies Ollama-backed inference through OpenShell/OpenClaw.

Optional E2E

gpu-double-onboard-e2e (high; requires linux-amd64 GPU runner and repeated Ollama onboarding): Useful adjacent confidence for Ollama re-onboarding and auth-proxy token consistency after changing the bootstrap model registry, but less directly tied to the model metadata change than the single onboarding GPU proof.
ollama-proxy-e2e (medium; installs Ollama and pulls a small CPU model): Optional host-side validation for real Ollama plus auth proxy reachability/auth/inference. It does not exercise the changed bootstrap registry directly, but it covers adjacent local Ollama proxy behavior touched by nearby tests.
strict-tool-call-probe-e2e (low to medium; hermetic mock, no GPU/Ollama infrastructure): Optional hermetic regression for local Ollama model validation/tool-call probe behavior. The PR changes Ollama capability/model-selection tests, but the live registry tag change is better covered by gpu-e2e.

New E2E recommendations

local-ollama-bootstrap-model (high): Existing GPU E2Es typically run on high-VRAM GPUs and may select a larger registry model, so they might not prove that the new smallest fallback qwen3.5:9b is actually pulled, routed, and usable end-to-end.
- Suggested test: Add a targeted Ollama GPU E2E variant or scenario that pins NEMOCLAW_MODEL=qwen3.5:9b (or constrains detected available memory) and verifies install/onboard, model pull, Dockerfile/OpenClaw config, and sandbox inference.
e2e-gpu-model-defaults (medium): The Brev/GPU E2E harness default changes to qwen3.5:9b, but workflow-level env defaults can still pin older model names. Coverage should catch stale E2E model pins when the registry smallest tag changes.
- Suggested test: Add a lightweight CI assertion that E2E workflow defaults for NEMOCLAW_GPU_E2E_MODEL match SMALLEST_OLLAMA_MODEL_TAG or intentionally override it with an explicit comment.
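The stale-pin check suggested above could be sketched roughly as follows. It assumes the workflow expresses its default in the form `NEMOCLAW_GPU_E2E_MODEL: ${{ vars.NEMOCLAW_GPU_E2E_MODEL || '<tag>' }}`, which is the form quoted in the review advisor's evidence for this PR. Both helper names are hypothetical, and a real check would read the workflow file and import `SMALLEST_OLLAMA_MODEL_TAG` rather than take strings as arguments.

```typescript
// Hypothetical helper: extract the quoted fallback from the workflow env line.
function extractWorkflowModelFallback(workflowYaml: string): string | null {
  const pattern =
    /NEMOCLAW_GPU_E2E_MODEL:\s*\$\{\{\s*vars\.NEMOCLAW_GPU_E2E_MODEL\s*\|\|\s*'([^']+)'\s*\}\}/;
  const match = pattern.exec(workflowYaml);
  return match ? match[1] : null;
}

// Returns the stale tag when the workflow pins something other than the
// registry's smallest tag, or null when there is no pin or it already matches.
function findStaleModelPin(workflowYaml: string, smallestTag: string): string | null {
  const fallback = extractWorkflowModelFallback(workflowYaml);
  return fallback !== null && fallback !== smallestTag ? fallback : null;
}

const sampleWorkflow =
  "env:\n  NEMOCLAW_GPU_E2E_MODEL: ${{ vars.NEMOCLAW_GPU_E2E_MODEL || 'qwen2.5:7b' }}\n";

const stale = findStaleModelPin(sampleWorkflow, "qwen3.5:9b");
if (stale !== null) {
  console.log(`workflow default ${stale} differs from the registry smallest tag`);
}
```

A regular expression is used instead of a YAML parser to keep the sketch dependency-free; an intentional override would need an allow-list or an explicit marker comment, which this sketch does not model.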

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: gpu-e2e

github-actions · 2026-06-04T17:21:32Z

E2E Scenario Advisor Recommendation

Required scenario E2E: gpu-repo-local-ollama-openclaw
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw


Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

gpu-repo-local-ollama-openclaw: The PR changes the Ollama bootstrap model registry, including the smallest/default local Ollama model metadata. The only dispatchable scenario that exercises local Ollama onboarding and inference is gpu-repo-local-ollama-openclaw.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Optional scenario E2E

None.

Relevant changed files

src/lib/inference/ollama-model-registry.ts

github-actions · 2026-06-04T17:22:40Z

PR Review Advisor

Findings: 0 needs attention, 4 worth checking, 0 nice ideas
Top item: Update the GPU E2E default to qwen3.5:9b

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: Ollama starter model registry metadata: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `src/lib/inference/ollama-model-registry.ts` sets `qwen3.5:9b` with `requiredMemoryMB: 12_000` and `downloadSizeBytes: 6_600_000_000`; tests mostly assert table mirroring or mocked capabilities.
Source-of-truth review needed: Ollama starter model defaults across runtime and workflow validation: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `.github/workflows/e2e-branch-validation.yaml:266` still uses `'qwen2.5:7b'`, while `test/e2e/brev-e2e.test.ts` now falls back to `'qwen3.5:9b'` only when `NEMOCLAW_GPU_E2E_MODEL` is unset.
Branch GPU E2E still defaults to the old Ollama starter (.github/workflows/e2e-branch-validation.yaml:266): The PR changes `test/e2e/brev-e2e.test.ts` so its local fallback is `qwen3.5:9b`, but the branch-validation workflow exports `NEMOCLAW_GPU_E2E_MODEL` with a fallback of `qwen2.5:7b`. Because the test only uses its internal fallback when that environment variable is absent, this workflow path will continue exercising the old model by default. That leaves the main runtime/sandbox validation path out of sync with the PR's stated replacement of the starter model.
- Recommendation: Update the workflow fallback to `qwen3.5:9b`, or remove the workflow override so the test's fallback is used. If the old value is intentionally retained for capacity reasons, document that exception and add separate validation for the new default.
- Evidence: `test/e2e/brev-e2e.test.ts` now uses `process.env.NEMOCLAW_GPU_E2E_MODEL || "qwen3.5:9b"`, while `.github/workflows/e2e-branch-validation.yaml:266` still sets `NEMOCLAW_GPU_E2E_MODEL: ${{ vars.NEMOCLAW_GPU_E2E_MODEL || 'qwen2.5:7b' }}`.
New default model metadata is manually trusted (src/lib/inference/ollama-model-registry.ts:37): The new starter model's tag, 12 GB fit threshold, 6.6 GB fallback download size, and tool-capable behavior are now central to onboarding decisions, but the changed tests mostly assert that code paths mirror the registry table or use mocked capability responses. An incorrect tag or size would still pass these unit fixtures and surface only during runtime pull/probe or user onboarding.
- Recommendation: Add or identify an independent validation for the new registry entry: verify that `qwen3.5:9b` exists at the expected Ollama manifest URL, that the fallback size is close enough to the actual download size, that it supports tools, and that the 12 GB boundary is intentional. A targeted boundary unit test for 11,999 MB versus 12,000 MB would also make the changed fit threshold explicit.
- Evidence: `OLLAMA_MODEL_REGISTRY` now contains `{ tag: "qwen3.5:9b", requiredMemoryMB: 12_000, downloadSizeBytes: 6_600_000_000 }`. The registry tests assert derived behavior, and `test/ollama-tools-capability.test.ts` mocks `/api/show` as returning `capabilities: ["completion", "tools"]` rather than proving the real model capability.
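The boundary test recommended in the last finding might look something like the sketch below. It assumes the fit rule is an inclusive comparison (a model fits when available memory is at least `requiredMemoryMB`), which is what the 11,999 MB versus 12,000 MB wording suggests but which the PR text does not confirm. `modelFits` and `checkStarterBoundary` are hypothetical names, not functions from the repository.

```typescript
interface OllamaModelEntry {
  tag: string;
  requiredMemoryMB: number;
  downloadSizeBytes: number;
}

// Registry entry as quoted in the review evidence.
const STARTER_MODEL: OllamaModelEntry = {
  tag: "qwen3.5:9b",
  requiredMemoryMB: 12_000,
  downloadSizeBytes: 6_600_000_000,
};

// Assumption: the threshold is inclusive.
function modelFits(entry: OllamaModelEntry, availableMemoryMB: number): boolean {
  return availableMemoryMB >= entry.requiredMemoryMB;
}

// Collects failures instead of throwing so both sides of the boundary are reported.
function checkStarterBoundary(): string[] {
  const failures: string[] = [];
  if (modelFits(STARTER_MODEL, 11_999)) {
    failures.push("11,999 MB should not fit the starter model");
  }
  if (!modelFits(STARTER_MODEL, 12_000)) {
    failures.push("12,000 MB should fit the starter model");
  }
  return failures;
}

console.log(checkStarterBoundary().length === 0 ? "boundary ok" : checkStarterBoundary().join("; "));
```

Pinning both sides of the threshold makes an accidental change from `>=` to `>`, or an edit to `requiredMemoryMB`, fail a test instead of silently changing which machines are offered the starter model.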

🌱 Nice ideas

None.


This is an automated advisory review. A human maintainer must make the final merge decision.

Summary

- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR labels, and shared `$$nemoclaw` command placeholders.

Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint, docs-to-skills verification, gitleaks, commitlint, and skills YAML tests.

Source Summary

- #3679, #4437, #4681, #4766, #4772, #4775, #4786 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`, `docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27 compatibility, runtime path pinning, plugin registry recovery, live gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`, `docs/inference/inference-options.mdx`, `docs/inference/use-local-inference.mdx`, `docs/inference/switch-inference-providers.mdx`: Document the release inference changes covering Local NIM waits, Hermes Anthropic routing, Nemotron 3 Ultra, the current Ollama starter fallback, and Spark managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`, `docs/security/credential-storage.mdx`, `docs/manage-sandboxes/messaging-channels.mdx`, `docs/reference/troubleshooting.mdx`: Capture permission healing, gateway-stored credential reuse, cross-sandbox messaging credential conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`: Regenerate the user skill references from the updated source docs.
- Follow-up maintenance -> `.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`, `.coderabbit.yaml`: Add release-prep area labels for docs and skills PRs, and teach docs review guidance that `$$nemoclaw` is the correct shared command placeholder for examples that work across agent aliases.

Note: the `documentation` label was not present in the repository, so this PR is labeled with `v0.0.59` only.

Summary by CodeRabbit

Documentation
- Updated default model for local Ollama inference setup to qwen3.5:9b
- Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
- Clarified credential storage and reuse behavior for post-deployment (day-two) operations
- Added v0.0.59 release notes covering OpenClaw compatibility, inference options, Hermes messaging sync, and troubleshooting
- Clarified CLI selection guidance and updated OpenClaw version example in status output
- Revised release-prep instructions and docs review guidance for CLI alias usage

feat(inference): replace qwen2.5:7b with qwen3.5:9b as the default Ol…

67d896f

…lama starter model Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

zyang-dev added the v0.0.59 label (Release target) Jun 4, 2026

cv approved these changes Jun 4, 2026


cv merged commit 8a025a4 into main Jun 4, 2026
33 checks passed

cv deleted the feat/update-ollama-starter-qwen35-9b branch June 4, 2026 20:49

miyoungc mentioned this pull request Jun 5, 2026

docs: refresh 0.0.59 release notes #4790

Merged

coderabbitai Bot mentioned this pull request Jun 5, 2026

fix(inference): tighten Ollama bootstrap fit and raise runtime context floor #4852

Merged

12 tasks

wscurran added labels Jun 5, 2026: area: inference (Inference routing, serving, model selection, or outputs); area: local-models (Local model providers, downloads, launch, or connectivity); feature (PR adds or expands user-visible functionality)
