feat(inference): replace qwen2.5:7b with qwen3.5:9b as the default Ollama starter model #4776

Merged
cv merged 1 commit into main from feat/update-ollama-starter-qwen35-9b on Jun 4, 2026

Conversation

zyang-dev commented Jun 4, 2026

Contributor

Summary

Updates the local Ollama starter model from qwen2.5:7b to qwen3.5:9b so new local-inference onboarding uses a newer tool-capable Qwen model.

Changes

  • Replaced the smallest Ollama bootstrap registry entry with qwen3.5:9b.
  • Updated memory and download-size metadata for the new starter model.
  • Updated onboarding, registry, proxy, model-size, config, inventory, and e2e test fixtures that assert starter-model behavior.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

Summary by CodeRabbit

  • Tests

    • Updated test expectations and mock data across inference, onboarding, and E2E test suites to reflect the updated default Ollama model configuration.
  • Chores

    • Updated Ollama model registry and associated default model references throughout the codebase.

…lama starter model

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

coderabbitai Bot commented Jun 4, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 07ed9c2b-cfc9-43d2-9dff-ec06d6c40ce9

📥 Commits

Reviewing files that changed from the base of the PR and between 5c82cd9 and 67d896f.

📒 Files selected for processing (16)
  • src/lib/inference/local.test.ts
  • src/lib/inference/ollama-model-registry.test.ts
  • src/lib/inference/ollama-model-registry.ts
  • src/lib/inference/ollama/model-size.test.ts
  • src/lib/inference/ollama/proxy.test.ts
  • src/lib/inventory/index.test.ts
  • src/lib/onboard/dockerfile-patch.test.ts
  • src/lib/onboard/ollama-probe-failure.test.ts
  • test/e2e/brev-e2e.test.ts
  • test/generate-openclaw-config.test.ts
  • test/get-ollama-model-options.test.ts
  • test/ollama-pull-timeout.test.ts
  • test/ollama-tools-capability.test.ts
  • test/onboard-ollama-autostart.test.ts
  • test/onboard-selection.test.ts
  • test/onboard.test.ts

📝 Walkthrough

This PR updates the Ollama bootstrap model registry from qwen2.5:7b to qwen3.5:9b, including the model's memory requirements and download size. The production registry change propagates through derived constants and requires test adjustments across inference, onboarding, and E2E suites to reflect the new model identifier.
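The propagation described here can be pictured with a minimal TypeScript sketch. It assumes the registry is an ordered array whose final entry is the smallest model and that both exported constants are read from that entry. The interface name, the placeholder larger model, and the derivation itself are assumptions made for illustration; only the `qwen3.5:9b` values are quoted elsewhere in this conversation.

```typescript
// Hypothetical entry shape; the field names follow the ones quoted in the review.
interface OllamaModelEntry {
  tag: string;
  requiredMemoryMB: number;
  downloadSizeBytes: number;
}

// Assumption: entries are ordered largest-first, so the final entry is the smallest.
const OLLAMA_MODEL_REGISTRY: readonly OllamaModelEntry[] = [
  // Placeholder larger model, invented for this sketch only.
  { tag: "example-large:70b", requiredMemoryMB: 48_000, downloadSizeBytes: 40_000_000_000 },
  // The new starter entry, with the values quoted in the review findings.
  { tag: "qwen3.5:9b", requiredMemoryMB: 12_000, downloadSizeBytes: 6_600_000_000 },
];

// Derived constants: changing the final entry changes both without further edits.
const smallestEntry = OLLAMA_MODEL_REGISTRY[OLLAMA_MODEL_REGISTRY.length - 1];
const SMALLEST_OLLAMA_MODEL_TAG = smallestEntry.tag;
const OLLAMA_DOWNLOAD_SIZE_FALLBACK_BYTES = smallestEntry.downloadSizeBytes;

console.log(SMALLEST_OLLAMA_MODEL_TAG, OLLAMA_DOWNLOAD_SIZE_FALLBACK_BYTES);
```

Under this assumption, any test that hard-codes the old tag instead of importing the derived constant has to be edited by hand, which is consistent with the number of test files touched.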

Changes

Model Registry and Integration Tests

| Layer | File(s) | Summary |
| --- | --- | --- |
| Bootstrap model registry and derived constants | `src/lib/inference/ollama-model-registry.ts`, `src/lib/inference/ollama-model-registry.test.ts` | Updated `OLLAMA_MODEL_REGISTRY` final entry from `qwen2.5:7b` to `qwen3.5:9b` with revised `requiredMemoryMB` and `downloadSizeBytes`. The exported constants `SMALLEST_OLLAMA_MODEL_TAG` and `OLLAMA_DOWNLOAD_SIZE_FALLBACK_BYTES` are automatically recalculated with the new model. |
| Model sizing and registry probe tests | `src/lib/inference/ollama/model-size.test.ts` | Test expectations for `probeRegistrySize` and `getOllamaModelSize` updated to use `qwen3.5:9b`, verifying manifest URL resolution, curl invocations, JSON layer parsing, and fallback table lookups with the new model. |
| Local inference model selection and resolution tests | `src/lib/inference/local.test.ts` | Comprehensive test suite covering `/api/tags` parsing, model option filtering, bootstrap menu generation, memory-constrained downgrades, installed model fitting, non-interactive model resolution, Apple Silicon offerings, and probe/warmup command generation, all adjusted to expect `qwen3.5:9b` as the selected or downgraded model. |
| Installed model proxy and fitting tests | `src/lib/inference/ollama/proxy.test.ts` | Tests for `promptOllamaModel` installed-model filtering and memory-constrained selection updated to expect `qwen3.5:9b` when available memory requires downgrade. |
| Onboarding and feature integration tests | `src/lib/inventory/index.test.ts`, `src/lib/onboard/dockerfile-patch.test.ts`, `src/lib/onboard/ollama-probe-failure.test.ts`, `test/onboard-selection.test.ts`, `test/onboard.test.ts` | Downstream test suites across inventory, dockerfile GPU networking, probe failure handling, and onboarding flows updated to pass `qwen3.5:9b` as the model argument and expect it in logs, configuration, and state assertions. |
| E2E and configuration tests | `test/e2e/brev-e2e.test.ts`, `test/generate-openclaw-config.test.ts`, `test/get-ollama-model-options.test.ts`, `test/ollama-pull-timeout.test.ts`, `test/ollama-tools-capability.test.ts`, `test/onboard-ollama-autostart.test.ts` | E2E remote-execution model, config generation with `supportsUsageInStreaming`, mock `/api/tags` responses, pull timeout precision, tools capability probes, and autostart harness tests updated to use `qwen3.5:9b`. |
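The model-sizing row mentions manifest URL resolution, JSON layer parsing, and fallback lookups. A rough TypeScript sketch of that flow follows. The function names, the `registry.ollama.ai` URL layout, and the manifest shape (a `layers` array of objects with a numeric `size`) are assumptions for illustration and are not taken from this PR's diff; the repository's `probeRegistrySize` and `getOllamaModelSize` may work differently.

```typescript
// Assumption: manifests live at /v2/library/<name>/manifests/<tag> on the Ollama registry.
function manifestUrlForTag(modelTag: string): string {
  const [name, tag = "latest"] = modelTag.split(":");
  return `https://registry.ollama.ai/v2/library/${name}/manifests/${tag}`;
}

// Sum the layer sizes from a manifest body; return null when the body is unusable.
function sumLayerSizes(manifestJson: string): number | null {
  try {
    const parsed: unknown = JSON.parse(manifestJson);
    if (typeof parsed !== "object" || parsed === null) return null;
    const layers = (parsed as { layers?: unknown }).layers;
    if (!Array.isArray(layers)) return null;
    let total = 0;
    for (const layer of layers) {
      const size = (layer as { size?: unknown } | null)?.size;
      if (typeof size !== "number" || !Number.isFinite(size) || size < 0) return null;
      total += size;
    }
    return total > 0 ? total : null;
  } catch {
    // Malformed JSON is treated the same as a failed probe.
    return null;
  }
}

// Prefer the probed size; otherwise use the registry table's fallback value.
function resolveDownloadSize(manifestJson: string | null, fallbackBytes: number): number {
  const probed = manifestJson === null ? null : sumLayerSizes(manifestJson);
  return probed ?? fallbackBytes;
}

console.log(manifestUrlForTag("qwen3.5:9b"));
console.log(resolveDownloadSize(null, 6_600_000_000));
```

Keeping the parsing step pure, with the network call done elsewhere, is one way to let unit tests cover the fallback path without reaching the registry.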

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#3683: The test updates in test/generate-openclaw-config.test.ts adjust the model identifiers used in existing supportsUsageInStreaming assertions for ollama/ollama-local, which relates to PR #3683's ollama-local supportsUsageInStreaming compat behavior.

Suggested labels

Local Models, feature, area: inference, area: local-models, area: providers, v0.0.58

Suggested reviewers

  • cv
  • ericksoa

Poem

🐰 A nimble model trade, so neat and clean,
From two-point-five to three-point-five we've seen,
Qwen's newer sibling takes the smallest role,
Tests across the realm now reach their goal,
One registry change ripples through them all! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and specifically summarizes the main change: replacing qwen2.5:7b with qwen3.5:9b as the default Ollama starter model, which aligns with the file summaries. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/update-ollama-starter-qwen35-9b

Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions Bot commented Jun 4, 2026

Contributor

E2E Advisor Recommendation

Required E2E: gpu-e2e
Optional E2E: gpu-double-onboard-e2e, ollama-proxy-e2e, strict-tool-call-probe-e2e

Dispatch hint: gpu-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • gpu-e2e (high; requires linux-amd64 GPU runner and pulls an Ollama model): Required because the source change affects local Ollama onboarding and inference. This is the existing live GPU/Ollama E2E that installs Ollama, runs NemoClaw onboarding with NEMOCLAW_PROVIDER=ollama, pulls a model, creates a sandbox, and verifies Ollama-backed inference through OpenShell/OpenClaw.

Optional E2E

  • gpu-double-onboard-e2e (high; requires linux-amd64 GPU runner and repeated Ollama onboarding): Useful adjacent confidence for Ollama re-onboarding and auth-proxy token consistency after changing the bootstrap model registry, but less directly tied to the model metadata change than the single onboarding GPU proof.
  • ollama-proxy-e2e (medium; installs Ollama and pulls a small CPU model): Optional host-side validation for real Ollama plus auth proxy reachability/auth/inference. It does not exercise the changed bootstrap registry directly, but it covers adjacent local Ollama proxy behavior touched by nearby tests.
  • strict-tool-call-probe-e2e (low to medium; hermetic mock, no GPU/Ollama infrastructure): Optional hermetic regression for local Ollama model validation/tool-call probe behavior. The PR changes Ollama capability/model-selection tests, but the live registry tag change is better covered by gpu-e2e.

New E2E recommendations

  • local-ollama-bootstrap-model (high): Existing GPU E2Es typically run on high-VRAM GPUs and may select a larger registry model, so they might not prove that the new smallest fallback qwen3.5:9b is actually pulled, routed, and usable end-to-end.
    • Suggested test: Add a targeted Ollama GPU E2E variant or scenario that pins NEMOCLAW_MODEL=qwen3.5:9b (or constrains detected available memory) and verifies install/onboard, model pull, Dockerfile/OpenClaw config, and sandbox inference.
  • e2e-gpu-model-defaults (medium): The Brev/GPU E2E harness default changes to qwen3.5:9b, but workflow-level env defaults can still pin older model names. Coverage should catch stale E2E model pins when the registry smallest tag changes.
    • Suggested test: Add a lightweight CI assertion that E2E workflow defaults for NEMOCLAW_GPU_E2E_MODEL match SMALLEST_OLLAMA_MODEL_TAG or intentionally override it with an explicit comment.
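The lightweight CI assertion suggested above could look roughly like the following TypeScript sketch. It assumes the workflow pins the model with a `${{ vars.NEMOCLAW_GPU_E2E_MODEL || '<tag>' }}` expression, as quoted in the PR Review Advisor evidence; the function names and the regular expression are hypothetical, and a real check would read the workflow file from disk and import `SMALLEST_OLLAMA_MODEL_TAG` instead of passing a literal.

```typescript
// Matches the fallback literal inside the workflow's env expression.
const WORKFLOW_FALLBACK_PATTERN =
  /NEMOCLAW_GPU_E2E_MODEL:\s*\$\{\{\s*vars\.NEMOCLAW_GPU_E2E_MODEL\s*\|\|\s*'([^']+)'\s*\}\}/;

// Returns the pinned fallback tag, or null when the workflow does not pin one.
function extractWorkflowModelFallback(workflowYaml: string): string | null {
  const match = WORKFLOW_FALLBACK_PATTERN.exec(workflowYaml);
  return match?.[1] ?? null;
}

// A pin is stale when it exists and differs from the registry's smallest tag.
function isStaleModelPin(workflowYaml: string, smallestTag: string): boolean {
  const fallback = extractWorkflowModelFallback(workflowYaml);
  return fallback !== null && fallback !== smallestTag;
}

// Example input copied from the advisor's evidence for the branch-validation workflow.
const workflowLine =
  "NEMOCLAW_GPU_E2E_MODEL: ${{ vars.NEMOCLAW_GPU_E2E_MODEL || 'qwen2.5:7b' }}";

console.log(isStaleModelPin(workflowLine, "qwen3.5:9b")); // true: the pin still names the old starter
```

This sketch does not handle the "intentional override with an explicit comment" case from the suggestion; that would need an allow-list or a comment marker convention, which is left open here.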

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: gpu-e2e

github-actions Bot commented Jun 4, 2026

Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: gpu-repo-local-ollama-openclaw
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • gpu-repo-local-ollama-openclaw: The PR changes the Ollama bootstrap model registry, including the smallest/default local Ollama model metadata. The only dispatchable scenario that exercises local Ollama onboarding and inference is gpu-repo-local-ollama-openclaw.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Optional scenario E2E

  • None.

Relevant changed files

  • src/lib/inference/ollama-model-registry.ts

github-actions Bot commented Jun 4, 2026

Contributor

PR Review Advisor

Findings: 0 needs attention, 4 worth checking, 0 nice ideas
Top item: Update the GPU E2E default to qwen3.5:9b

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: Ollama starter model registry metadata: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `src/lib/inference/ollama-model-registry.ts` sets `qwen3.5:9b` with `requiredMemoryMB: 12_000` and `downloadSizeBytes: 6_600_000_000`; tests mostly assert table mirroring or mocked capabilities.
  • Source-of-truth review needed: Ollama starter model defaults across runtime and workflow validation: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `.github/workflows/e2e-branch-validation.yaml:266` still uses `'qwen2.5:7b'`, while `test/e2e/brev-e2e.test.ts` now falls back to `'qwen3.5:9b'` only when `NEMOCLAW_GPU_E2E_MODEL` is unset.
  • Branch GPU E2E still defaults to the old Ollama starter (.github/workflows/e2e-branch-validation.yaml:266): The PR changes `test/e2e/brev-e2e.test.ts` so its local fallback is `qwen3.5:9b`, but the branch-validation workflow exports `NEMOCLAW_GPU_E2E_MODEL` with a fallback of `qwen2.5:7b`. Because the test only uses its internal fallback when that environment variable is absent, this workflow path will continue exercising the old model by default. That leaves the main runtime/sandbox validation path out of sync with the PR's stated replacement of the starter model.
    • Recommendation: Update the workflow fallback to `qwen3.5:9b`, or remove the workflow override so the test's fallback is used. If the old value is intentionally retained for capacity reasons, document that exception and add separate validation for the new default.
    • Evidence: `test/e2e/brev-e2e.test.ts` now uses `process.env.NEMOCLAW_GPU_E2E_MODEL || "qwen3.5:9b"`, while `.github/workflows/e2e-branch-validation.yaml:266` still sets `NEMOCLAW_GPU_E2E_MODEL: ${{ vars.NEMOCLAW_GPU_E2E_MODEL || 'qwen2.5:7b' }}`.
  • New default model metadata is manually trusted (src/lib/inference/ollama-model-registry.ts:37): The new starter model's tag, 12 GB fit threshold, 6.6 GB fallback download size, and tool-capable behavior are now central to onboarding decisions, but the changed tests mostly assert that code paths mirror the registry table or use mocked capability responses. An incorrect tag or size would still pass these unit fixtures and surface only during runtime pull/probe or user onboarding.
    • Recommendation: Add or identify an independent validation for the new registry entry: verify that `qwen3.5:9b` exists at the expected Ollama manifest URL, that the fallback size is close enough to the actual download size, that it supports tools, and that the 12 GB boundary is intentional. A targeted boundary unit test for 11,999 MB versus 12,000 MB would also make the changed fit threshold explicit.
    • Evidence: `OLLAMA_MODEL_REGISTRY` now contains `{ tag: "qwen3.5:9b", requiredMemoryMB: 12_000, downloadSizeBytes: 6_600_000_000 }`. The registry tests assert derived behavior, and `test/ollama-tools-capability.test.ts` mocks `/api/show` as returning `capabilities: ["completion", "tools"]` rather than proving the real model capability.
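The boundary test recommended in the last finding could be sketched as follows in TypeScript. It assumes the fit rule is an inclusive comparison, so that exactly 12,000 MB of available memory fits the starter model; `fitsInMemory`, `STARTER_MODEL`, and `assertEqual` are hypothetical names, and the repository's actual fitting logic may differ.

```typescript
// Values quoted in the review evidence for the new starter entry.
const STARTER_MODEL = { tag: "qwen3.5:9b", requiredMemoryMB: 12_000 };

// Assumption: a model fits when available memory is at least its requirement.
function fitsInMemory(
  availableMemoryMB: number,
  model: { requiredMemoryMB: number } = STARTER_MODEL,
): boolean {
  return availableMemoryMB >= model.requiredMemoryMB;
}

// Tiny assertion helper so the sketch does not depend on a test framework.
function assertEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

// One megabyte below the threshold must not fit; the threshold itself must.
assertEqual(fitsInMemory(11_999), false, "11,999 MB should not fit");
assertEqual(fitsInMemory(12_000), true, "12,000 MB should fit");
```

Pinning both sides of the threshold makes an accidental change from `>=` to `>` fail immediately, and it also documents whether the 12 GB boundary is meant to be inclusive.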

🌱 Nice ideas

  • None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

zyang-dev added the v0.0.59 label (Release target) on Jun 4, 2026
cv merged commit 8a025a4 into main on Jun 4, 2026
33 checks passed
cv deleted the feat/update-ollama-starter-qwen35-9b branch on June 4, 2026 20:49
cv pushed a commit that referenced this pull request Jun 5, 2026
## Summary
- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the
current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR
labels, and shared `$$nemoclaw` command placeholders.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC
permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint,
docs-to-skills verification, gitleaks, commitlint, and skills YAML
tests.

## Source Summary
- #3679, #4437, #4681, #4766, #4772, #4775, #4786 ->
`docs/about/release-notes.mdx`, `docs/reference/commands.mdx`,
`docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27
compatibility, runtime path pinning, plugin registry recovery, live
gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`,
`docs/inference/inference-options.mdx`,
`docs/inference/use-local-inference.mdx`,
`docs/inference/switch-inference-providers.mdx`: Document the release
inference changes covering Local NIM waits, Hermes Anthropic routing,
Nemotron 3 Ultra, the current Ollama starter fallback, and Spark
managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`,
`docs/security/credential-storage.mdx`,
`docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/troubleshooting.mdx`: Capture permission healing,
gateway-stored credential reuse, cross-sandbox messaging credential
conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`:
Regenerate the user skill references from the updated source docs.
- Follow-up maintenance ->
`.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`,
`.coderabbit.yaml`: Add release-prep area labels for docs and skills
PRs, and teach docs review guidance that `$$nemoclaw` is the correct
shared command placeholder for examples that work across agent aliases.

Note: the `documentation` label was not present in the repository, so
this PR is labeled with `v0.0.59` only.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
  * Updated default model for local Ollama inference setup to qwen3.5:9b
  * Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
  * Clarified credential storage and reuse behavior for post-deployment (day-two) operations
  * Added v0.0.59 release notes covering OpenClaw compatibility, inference options, Hermes messaging sync, and troubleshooting
  * Clarified CLI selection guidance and updated OpenClaw version example in status output
  * Revised release-prep instructions and docs review guidance for CLI alias usage
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
wscurran added the labels area: inference (Inference routing, serving, model selection, or outputs), area: local-models (Local model providers, downloads, launch, or connectivity), and feature (PR adds or expands user-visible functionality) on Jun 5, 2026
Labels

  • area: inference (Inference routing, serving, model selection, or outputs)
  • area: local-models (Local model providers, downloads, launch, or connectivity)
  • feature (PR adds or expands user-visible functionality)
  • v0.0.59 (Release target)

3 participants