test(agents): include Ollama in small live model matrix #87838

Merged: steipete merged 6 commits into main from codex/small-live-ollama-matrix on Jun 1, 2026

@vincentkoc vincentkoc commented May 29, 2026

Member

Summary

  • Add Ollama Gemma to the curated small live model matrix.
  • Make live model discovery auto-enable Ollama provider/plugin config, default local Ollama without leaking OLLAMA_API_KEY, and use remote credentials only for remote/Ollama Cloud endpoints.
  • Register the Ollama runtime stream in-process for direct live model probes and document local/remote Ollama selection.
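
The local/remote credential rule in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolveOllamaLiveConfig` and `isLoopbackUrl` are hypothetical names, treating only loopback hosts as "local" is an assumption, and `http://127.0.0.1:11434` is simply Ollama's standard local address.

```typescript
// Hypothetical sketch: helper names and the loopback-only rule are assumptions,
// not the implementation in src/agents.
const DEFAULT_LOCAL_OLLAMA_URL = "http://127.0.0.1:11434"; // Ollama's standard local port

interface OllamaLiveConfig {
  baseUrl: string;
  apiKey?: string; // only populated for non-loopback endpoints
}

type Env = Record<string, string | undefined>;

// True when the URL points at the local machine.
function isLoopbackUrl(raw: string): boolean {
  const host = new URL(raw).hostname.toLowerCase();
  return host === "localhost" || host === "127.0.0.1" || host === "[::1]" || host === "::1";
}

function resolveOllamaLiveConfig(env: Env): OllamaLiveConfig {
  const baseUrl = env.OPENCLAW_LIVE_OLLAMA_BASE_URL?.trim() || DEFAULT_LOCAL_OLLAMA_URL;
  if (isLoopbackUrl(baseUrl)) {
    // Local endpoint: never forward OLLAMA_API_KEY, even if it is set.
    return { baseUrl };
  }
  // Remote or Ollama Cloud endpoint: attach the key when one is available.
  const apiKey = env.OLLAMA_API_KEY?.trim();
  return apiKey ? { baseUrl, apiKey } : { baseUrl };
}
```

With no base URL set, the sketch returns the local default without a key; with `OPENCLAW_LIVE_OLLAMA_BASE_URL=https://ollama.com` it passes the key through.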

Verification

  • git diff --check origin/main...HEAD - passed on head 14c0ace5b76.
  • node_modules/.bin/oxfmt --check --threads=1 docs/help/testing-live.md src/agents/live-model-filter.ts src/agents/model-compat.test.ts src/agents/models.profiles.live.test.ts - passed.
  • node scripts/run-oxlint.mjs src/agents/live-model-filter.ts src/agents/model-compat.test.ts src/agents/models.profiles.live.test.ts - passed.
  • node scripts/check-docs-mdx.mjs docs/help/testing-live.md - passed.
  • node scripts/run-vitest.mjs src/agents/model-compat.test.ts src/agents/models.profiles.live.test.ts --reporter=dot - passed; the non-live agent config picked up model-compat.test.ts, 57 tests.
  • node scripts/test-live.mjs --quiet -- src/agents/models.profiles.live.test.ts --reporter=dot - passed, 33 tests; provider sweep intentionally skipped without OPENCLAW_LIVE_MODELS.
  • OPENCLAW_LIVE_MODELS=small OPENCLAW_LIVE_PROVIDERS=ollama OPENCLAW_LIVE_OLLAMA_BASE_URL=https://ollama.com OPENCLAW_LIVE_MAX_MODELS=1 OPENCLAW_LIVE_MODEL_CONCURRENCY=1 OPENCLAW_LIVE_MODEL_TIMEOUT_MS=45000 OPENCLAW_LIVE_TEST_TIMEOUT_MS=120000 node scripts/test-live.mjs --quiet -- src/agents/models.profiles.live.test.ts --reporter=dot - passed, 33 tests; ollama/gemma3:4b completed prompt, file-read, and image probes.
  • Fresh 2026-05-31 repeat of the Ollama Cloud small live matrix command above - passed, 33 tests in 46.37s; ollama/gemma3:4b completed prompt, file-read, and image probes again.
  • Fresh 2026-05-31 local rerun of the gateway files left after a transient CI hang: node scripts/run-vitest.mjs src/gateway/server-runtime-state.test.ts src/gateway/server-startup-session-migration.test.ts src/gateway/server-startup-web-fetch-bind.test.ts src/gateway/server.lazy.test.ts --reporter=dot - passed, 4 files / 7 tests.
  • Fresh 2026-05-31 GitHub Actions rerun of failed shard checks-node-agentic-control-plane-startup-runtime - passed in 42s: https://github.com/openclaw/openclaw/actions/runs/26721310412/job/78759461978.
  • AWS Crabbox changed gate on the amended head: provider aws, lease cbx_0b77e578cbc5, slug coral-lobster, run run_3aae1ee99dfa, machine c7a.8xlarge, exit 0, leaseStopped=true; pnpm check:changed passed lanes core, coreTests, and docs.
  • Earlier AWS changed gate before the accepted doc-review fix: provider aws, lease cbx_f3263a9c4953, slug coral-lobster, run run_f36bcaa35ea6, machine c7a.8xlarge, exit 0, leaseStopped=true.
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main - first run found the legacy openai-codex/* doc example; fixed. Final rerun was clean, no accepted/actionable findings, overall: patch is correct (0.84).
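
For readers unfamiliar with the numeric knobs in the live command above, here is a rough sketch of how such environment values could be read. The env variable names come from the command; `readPositiveInt`, `readLiveRunLimits`, and the choice to return `undefined` for unset or invalid values are assumptions for illustration only.

```typescript
// Hypothetical sketch: function names and the "undefined when unset" behavior
// are assumptions, not the live harness's actual parsing code.
type Env = Record<string, string | undefined>;

interface LiveRunLimits {
  maxModels: number | undefined;
  concurrency: number | undefined;
  modelTimeoutMs: number | undefined;
  testTimeoutMs: number | undefined;
}

// Parses a strictly positive integer; unset or invalid values yield undefined.
function readPositiveInt(env: Env, name: string): number | undefined {
  const raw = env[name]?.trim();
  if (!raw) return undefined;
  const value = Number(raw);
  return Number.isInteger(value) && value > 0 ? value : undefined;
}

function readLiveRunLimits(env: Env): LiveRunLimits {
  return {
    maxModels: readPositiveInt(env, "OPENCLAW_LIVE_MAX_MODELS"),
    concurrency: readPositiveInt(env, "OPENCLAW_LIVE_MODEL_CONCURRENCY"),
    modelTimeoutMs: readPositiveInt(env, "OPENCLAW_LIVE_MODEL_TIMEOUT_MS"),
    testTimeoutMs: readPositiveInt(env, "OPENCLAW_LIVE_TEST_TIMEOUT_MS"),
  };
}
```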

Real behavior proof

Behavior addressed: the direct live model matrix now covers a constrained Ollama small-model route and can prove prompt, file-read, and image behavior against Ollama Cloud or a local Ollama endpoint without treating local endpoints as secret-bearing cloud calls.

Real environment tested: local linked OpenClaw worktree for source, live setup, and Ollama Cloud proof; AWS Crabbox Linux for check:changed on the amended PR head; GitHub Actions rerun for the startup-runtime shard after a transient hang.

Exact steps or command run after this patch: node scripts/run-vitest.mjs src/agents/model-compat.test.ts src/agents/models.profiles.live.test.ts --reporter=dot; node scripts/test-live.mjs --quiet -- src/agents/models.profiles.live.test.ts --reporter=dot; OPENCLAW_LIVE_MODELS=small OPENCLAW_LIVE_PROVIDERS=ollama OPENCLAW_LIVE_OLLAMA_BASE_URL=https://ollama.com OPENCLAW_LIVE_MAX_MODELS=1 OPENCLAW_LIVE_MODEL_CONCURRENCY=1 OPENCLAW_LIVE_MODEL_TIMEOUT_MS=45000 OPENCLAW_LIVE_TEST_TIMEOUT_MS=120000 node scripts/test-live.mjs --quiet -- src/agents/models.profiles.live.test.ts --reporter=dot; AWS Crabbox run run_3aae1ee99dfa with pnpm check:changed; GitHub Actions job 78759461978.

Evidence after fix: repeated local Ollama Cloud live runs selected ollama/gemma3:4b and completed prompt, file-read, and image probes; live-suite setup tests passed 33/33; compatibility tests passed 57/57; AWS Crabbox check:changed passed on head 14c0ace5b76; rerun startup-runtime CI shard passed.

Observed result after fix: OPENCLAW_LIVE_MODELS=small OPENCLAW_LIVE_PROVIDERS=ollama selects the curated Ollama model, hydrates the right live provider config, registers the Ollama runtime stream in-process, and completes the small-model probes.

What was not tested: gateway-level live agent smoke for Ollama was not run in this PR; this PR extends the direct live model matrix only.

@vincentkoc vincentkoc self-assigned this May 29, 2026
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation agents Agent runtime and tooling size: M maintainer Maintainer-authored PR labels May 29, 2026

clawsweeper Bot commented May 29, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed May 31, 2026, 9:27 PM ET / 01:27 UTC.

Summary
The PR adds ollama/gemma3:4b to the small direct live model matrix and updates live-test provider setup, focused tests, and docs for local or remote Ollama selection.

PR surface: Source +1, Tests +817, Docs +1. Total +819 across 4 files.

Reproducibility: not applicable. This is a test matrix expansion rather than a bug report. The PR body supplies after-patch live Ollama Cloud, focused test, Crabbox, and CI proof instead of a current-main failure reproduction.

Review metrics: 2 noteworthy metrics.

  • Small live matrix default: 1 provider/model added. OPENCLAW_LIVE_MODELS=small can now select an Ollama direct probe, which changes the default live automation surface.
  • Ollama live setup path: 1 runtime registration path added. The direct live suite now registers the Ollama runtime stream in-process when Ollama is in scope, which maintainers should accept before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
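
The "weaker of the two" rule can be illustrated with a small sketch. The rank order is taken from the legend in this comment; the function name `overallRank` is hypothetical, and excluding the non-applicable "off-meta tidepool" rating is an assumption about how ClawSweeper handles it.

```typescript
// Hypothetical sketch of the "overall = weaker rank" rule; not ClawSweeper's code.
// Ranks are listed from weakest to strongest, following the legend.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Returns whichever of the two ranks sits lower in the ordering.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}
```

Applied to this PR, `overallRank("diamond lobster", "platinum hermit")` yields `"platinum hermit"`, matching the ratings listed above.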

Rank-up moves:

  • Maintainers should decide whether Ollama belongs in the default small direct live matrix and whether the omitted gateway-level Ollama smoke is acceptable as follow-up.

Risk before merge

  • [P1] Merging changes the default OPENCLAW_LIVE_MODELS=small automation by adding an Ollama direct probe plus live-test provider/runtime setup, so CI operators need to accept the added local or Ollama Cloud dependency behavior.
  • [P1] The PR proof covers the direct model matrix but explicitly leaves gateway-level Ollama live agent smoke out of scope.

Maintainer options:

  1. Accept Ollama in the small lane (recommended)
    Merge after maintainer acceptance that the small direct live matrix should auto-enable Ollama and rely on the documented local or Cloud endpoint behavior.
  2. Require gateway smoke first
    Ask for gateway-level live agent proof before merge if maintainers want this PR to prove the full agent path, not only direct model probes.
  3. Keep Ollama out of this default
    Pause or close if the intended automation policy is to keep local or Cloud Ollama outside the default small direct-model sweep.

Next step before merge

  • [P2] This protected maintainer-owned PR has no line-level repair finding; the remaining action is human acceptance of the live automation scope and normal merge checks.

Security
Cleared: No concrete security or supply-chain regression was found; the diff changes docs and test harness code and uses an existing bundled Ollama runtime API without new dependencies, workflows, lockfiles, or secret exposure.

Review details

Best possible solution:

Land this only if maintainers want the curated small direct live matrix to exercise Ollama; keep gateway-level Ollama smoke as a separate lane or follow-up if needed.

Do we have a high-confidence way to reproduce the issue?

Not applicable; this is a test matrix expansion rather than a bug report. The PR body supplies after-patch live Ollama Cloud, focused test, Crabbox, and CI proof instead of a current-main failure reproduction.

Is this the best way to solve the issue?

Yes, with maintainer acceptance of the automation scope. The branch keeps Ollama-specific handling in the direct live-test harness, uses the plugin runtime-api barrel, and leaves gateway-level Ollama smoke as a reasonable separate follow-up.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 52c809a759f3.

Label changes

Label justifications:

  • P3: This is a low-risk test/docs automation improvement rather than a user-facing runtime regression.
  • merge-risk: 🚨 automation: The diff changes live-test automation defaults and provider setup behavior that green unit tests alone do not fully settle.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body reports repeated Ollama Cloud live runs selecting ollama/gemma3:4b with prompt, file-read, and image probes, plus focused tests, AWS Crabbox check:changed, and a rerun CI shard.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body reports repeated Ollama Cloud live runs selecting ollama/gemma3:4b with prompt, file-read, and image probes, plus focused tests, AWS Crabbox check:changed, and a rerun CI shard.

Evidence reviewed

PR surface:

Source +1, Tests +817, Docs +1. Total +819 across 4 files.

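PR surface stats:

| Area | Files | Added | Removed | Net |
|---|---|---|---|---|
| Source | 1 | 1 | 0 | +1 |
| Tests | 2 | 838 | 21 | +817 |
| Docs | 1 | 2 | 1 | +1 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 4 | 841 | 22 | +819 |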

What I checked:

  • Repository policy read: Root, docs, and src/agents review guidance were read; the relevant guidance is to review beyond the diff, treat live automation/provider routing as merge-sensitive, and apply docs/test scoped rules. (AGENTS.md:1, 52c809a759f3)
  • Current main does not already include the PR work: Current main has only unrelated Ollama live-test references; it does not contain the new small-matrix ollama/gemma3:4b entry or the new live-provider registration helpers, so this PR is not implemented on main. (src/agents/live-model-filter.ts:34, 52c809a759f3)
  • Small matrix change: The PR head adds ollama/gemma3:4b to SMALL_LIVE_MODEL_PRIORITY, which changes the curated OPENCLAW_LIVE_MODELS=small selection set. (src/agents/live-model-filter.ts:38, 919413c3a09b)
  • Live setup path: The PR head scopes small-priority refs by provider, passes them into provider discovery, applies live provider config, and registers provider APIs before direct model probes run. (src/agents/models.profiles.live.test.ts:1726, 919413c3a09b)
  • Focused coverage: The PR adds tests that include Ollama in small-provider discovery, preserve provider filters, enable the bundled plugin, and hydrate Ollama Cloud provider settings from live env. (src/agents/models.profiles.live.test.ts:823, 919413c3a09b)
  • Documented live behavior: The docs update names Ollama Gemma in the small allowlist and documents the local default endpoint plus OPENCLAW_LIVE_OLLAMA_BASE_URL for LAN, custom, or Cloud endpoints. Public docs: docs/help/testing-live.md. (docs/help/testing-live.md:77, 919413c3a09b)
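
As a rough illustration of the "scope small-priority refs by provider" step noted above, the following sketch filters a priority list by a provider filter and an optional cap. It is not the PR's code: the helper names are hypothetical, the comma-separated filter format is an assumption, and only `ollama/gemma3:4b` is a real entry from the PR (the other ref is a placeholder).

```typescript
// Hypothetical sketch; helper names and the filter format are assumptions.
// Only "ollama/gemma3:4b" comes from the PR; the first entry is a placeholder.
const SMALL_PRIORITY_EXAMPLE: readonly string[] = [
  "example-provider/example-small-model",
  "ollama/gemma3:4b",
];

// Parses a comma-separated provider list; null means "no filter".
function parseProviderFilter(raw: string | undefined): Set<string> | null {
  const names = (raw ?? "")
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name.length > 0);
  return names.length > 0 ? new Set(names) : null;
}

// Keeps refs whose provider prefix passes the filter, preserving priority order.
function scopeRefsByProvider(
  refs: readonly string[],
  filter: Set<string> | null,
  maxModels?: number,
): string[] {
  const scoped = refs.filter((ref) => {
    const slash = ref.indexOf("/");
    if (slash <= 0) return false; // skip malformed refs without a provider prefix
    return filter === null || filter.has(ref.slice(0, slash).toLowerCase());
  });
  return maxModels === undefined ? scoped : scoped.slice(0, maxModels);
}
```

Under these assumptions, `scopeRefsByProvider(SMALL_PRIORITY_EXAMPLE, parseProviderFilter("ollama"), 1)` returns only `ollama/gemma3:4b`, which mirrors the `OPENCLAW_LIVE_PROVIDERS=ollama OPENCLAW_LIVE_MAX_MODELS=1` run described in the PR body.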

Likely related people:

  • steipete: Peter Steinberger dominates recent history for the central live model and Ollama runtime paths, including live matrix stabilization and Ollama runtime fixes. (role: recent area contributor; confidence: high; commits: 7562afdca37a, e93216080aa1, 57b55883c5e9; files: src/agents/models.profiles.live.test.ts, src/agents/live-model-filter.ts, extensions/ollama/src/stream.ts)
  • vincentkoc: Vincent Koc has recent merged commits touching the same agent/Ollama-related paths and authored the initial branch commit for this PR. (role: recent adjacent contributor; confidence: medium; commits: efd5d0773454, 015c6b40aeac, 3b66146e9da4; files: src/agents/models.profiles.live.test.ts, src/agents/live-model-filter.ts, extensions/ollama/src/stream.ts)
  • suboss87: A recent Ollama model-id repair in the same runtime area was credited to @suboss87, making them a plausible routing candidate for Ollama-specific behavior questions. (role: Ollama adjacent contributor; confidence: medium; commits: 6f5459364a0b; files: extensions/ollama/src/stream.ts)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P3 Low-priority cleanup, docs, polish, ergonomics, or speculative work. labels May 29, 2026
@vincentkoc vincentkoc force-pushed the codex/small-live-ollama-matrix branch from f142472 to 23909b4 Compare May 29, 2026 04:12
@vincentkoc vincentkoc marked this pull request as ready for review May 29, 2026 04:12
@clawsweeper clawsweeper Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 29, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 29, 2026
@vincentkoc vincentkoc force-pushed the codex/small-live-ollama-matrix branch from 23909b4 to 2b4fb34 Compare May 30, 2026 08:44
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 30, 2026
@vincentkoc vincentkoc force-pushed the codex/small-live-ollama-matrix branch from 2b4fb34 to 9920539 Compare May 30, 2026 09:00
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. merge-risk: 🚨 automation 🚨 May affect CI, automerge, proof capture, label sync, or maintainer automation. labels May 30, 2026
@vincentkoc vincentkoc force-pushed the codex/small-live-ollama-matrix branch from 9920539 to 815cccb Compare May 30, 2026 09:18
@vincentkoc vincentkoc force-pushed the codex/small-live-ollama-matrix branch 3 times, most recently from 142530e to 82cfd3e Compare May 30, 2026 12:36
@vincentkoc vincentkoc force-pushed the codex/small-live-ollama-matrix branch 5 times, most recently from 49ec1a3 to 14c0ace Compare May 31, 2026 18:48
@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 31, 2026
@steipete steipete force-pushed the codex/small-live-ollama-matrix branch from 52edd1b to 47e3da2 Compare May 31, 2026 23:56
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. labels Jun 1, 2026
@steipete steipete force-pushed the codex/small-live-ollama-matrix branch from 432c802 to c3abae1 Compare June 1, 2026 00:47
@openclaw-barnacle openclaw-barnacle Bot removed commands Command implementations extensions: github-copilot labels Jun 1, 2026
@steipete steipete force-pushed the codex/small-live-ollama-matrix branch from c3abae1 to 919413c Compare June 1, 2026 01:20

steipete commented Jun 1, 2026

Contributor

Maintainer verification for head 919413c3a09b304b43f8f334c6dbb6192aeada0e:

  • Rebasing: rebased onto current origin/main without conflicts after the latest main churn.
  • Autoreview: .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main --parallel-tests 'node scripts/test-live.mjs --quiet -- src/agents/models.profiles.live.test.ts --reporter=dot && node scripts/run-vitest.mjs src/agents/model-compat.test.ts --reporter=dot && pnpm tsgo:test:src && node_modules/.bin/oxfmt --check --threads=1 docs/help/testing-live.md src/agents/live-model-filter.ts src/agents/model-compat.test.ts src/agents/models.profiles.live.test.ts && git diff --check origin/main...HEAD'
  • Result: autoreview clean, no accepted/actionable findings; tests exit 0 after 376s.
  • Focused proof inside that run: models.profiles.live.test.ts 37/37 passed, model-compat.test.ts 57/57 passed, pnpm tsgo:test:src passed, oxfmt check passed, diff whitespace check passed.
  • CI: current-head GitHub checks have no failures; required workflows are still queued at preflight/security-fast due to runner backlog while main is actively churning.

Landing with maintainer override on exact-head local proof because the change is test harness/docs scoped and the queued CI has not produced a code failure.

@steipete steipete merged commit b6bac3c into main Jun 1, 2026
25 checks passed
@steipete steipete deleted the codex/small-live-ollama-matrix branch June 1, 2026 01:38
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request Jun 1, 2026
* test(agents): include Ollama in small live model matrix

* test: avoid Ollama cloud key in local live runs

* test: recognize Ollama env secret refs

* test: type Ollama live key fixtures

* test: prevent Ollama cloud auth in local live probes

* test: preserve equivalent Ollama live credentials

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026

Labels

  • agents: Agent runtime and tooling
  • docs: Improvements or additions to documentation
  • maintainer: Maintainer-authored PR
  • merge-risk: 🚨 automation: 🚨 May affect CI, automerge, proof capture, label sync, or maintainer automation.
  • P3: Low-priority cleanup, docs, polish, ergonomics, or speculative work.
  • proof: sufficient: ClawSweeper judged the real behavior proof convincing.
  • rating: 🐚 platinum hermit: Good normal PR readiness with ordinary maintainer review expected.
  • size: L
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.

3 participants