Skip to content

fix(inference): set compat.supportsUsageInStreaming for ollama-local (#2747)#3683

Merged
cv merged 7 commits into
mainfrom
fix/ollama-local-compat-followup
Jun 3, 2026
Merged

fix(inference): set compat.supportsUsageInStreaming for ollama-local (#2747)#3683
cv merged 7 commits into
mainfrom
fix/ollama-local-compat-followup

Conversation

@nvshaxie

@nvshaxie nvshaxie commented May 18, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Follow-up to fix(inference): enable streamed usage accounting for Ollama-local (#2747) #3678: that PR's Python-side conditional was dead code in production. getSandboxInferenceConfig(\"model\", \"ollama-local\") falls through to the default switch arm and returns providerKey = \"inference\" (the same value all managed-inference providers use), so the build-time NEMOCLAW_PROVIDER_KEY is never \"ollama\" / \"ollama-local\" and the conditional never fired. The earlier vitest cases only passed because they set NEMOCLAW_PROVIDER_KEY=ollama directly — an env shape no real onboard produces.
  • Move the wiring to the proper layer: add an explicit case \"ollama-local\": in getSandboxInferenceConfig that sets inferenceCompat = { supportsUsageInStreaming: true }. The existing NEMOCLAW_INFERENCE_COMPAT_B64 round-trip then carries the flag into model.compat exactly as it does for kimi-k2.6-managed-inference.json and compatible-endpoint.
  • Drop the now-redundant Python conditional in scripts/generate-openclaw-config.py, drop the three unrealistic-fixture vitest cases in test/generate-openclaw-config.test.ts, and add two new ones in src/lib/inference/config.test.ts that exercise the real provider name.

Evidence the previous fix was inert

Direct run of scripts/generate-openclaw-config.py with the env values NemoClaw actually passes at docker build time for an ollama-local onboard (post-#3678 on main):

```
NEMOCLAW_PROVIDER_KEY=inference
NEMOCLAW_PRIMARY_MODEL_REF=inference/qwen2.5:0.5b
NEMOCLAW_INFERENCE_BASE_URL=https://inference.local/v1
NEMOCLAW_INFERENCE_API=openai-completions
NEMOCLAW_INFERENCE_COMPAT_B64=$(echo -n '{}' | base64)
```

generates an openclaw.json with provider.models[0].compat = null. The if provider_key in {\"ollama\", \"ollama-local\"}: conditional never matched, so the supportsUsageInStreaming flag was not set. TUI token counter would stay ? on nemoclaw onboard --provider ollama builds in main today.

After this PR (same env, but TypeScript now also encodes inferenceCompat = {\"supportsUsageInStreaming\": true} into NEMOCLAW_INFERENCE_COMPAT_B64 for ollama-local):

```
NEMOCLAW_INFERENCE_COMPAT_B64=$(echo -n '{"supportsUsageInStreaming":true}' | base64)
```

openclaw.json now has provider.models[0].compat = {\"supportsUsageInStreaming\": true} ✓.

Test plan

Re-opens — and properly fixes — #2747.

Signed-off-by: Shawn Xie shaxie@nvidia.com

Summary by CodeRabbit

  • Refactor

    • Adjusted provider configuration handling: explicit support for streaming usage reporting for the local Ollama provider and removal of a legacy compatibility override.
  • Tests

    • Updated tests to cover provider-specific streaming compatibility and to verify model-setup plugin activation only when relevant environment overrides are present.

Review Change Stack

…dboxInferenceConfig (#2747)

Follow-up to #3678. That PR added a Python-side conditional in
`generate-openclaw-config.py` that toggled `supportsUsageInStreaming`
when `NEMOCLAW_PROVIDER_KEY` was `"ollama"` or `"ollama-local"`. However,
in real onboard runs `getSandboxInferenceConfig("model", "ollama-local")`
hits the default switch arm and returns `providerKey = "inference"` —
the same value used by all managed-inference providers — so the
build-time `NEMOCLAW_PROVIDER_KEY` is never `"ollama"` and the
conditional was dead code. The earlier regression tests in
`test/generate-openclaw-config.test.ts` passed only because they set
`NEMOCLAW_PROVIDER_KEY=ollama` directly, which never happens in
production.

Verified locally with the python script and the real env values
NemoClaw passes through at docker build time
(`NEMOCLAW_PROVIDER_KEY=inference`,
`NEMOCLAW_INFERENCE_BASE_URL=https://inference.local/v1`,
`NEMOCLAW_INFERENCE_COMPAT_B64=e30=` for an empty compat): the
generated `openclaw.json` has `model.compat = null`, confirming the
flag never landed.

Move the wiring to the proper layer: add an explicit `"ollama-local"`
case in `getSandboxInferenceConfig` that sets
`inferenceCompat = { supportsUsageInStreaming: true }`. The existing
encode-into-`NEMOCLAW_INFERENCE_COMPAT_B64` path then carries the flag
into the python script's `inference_compat` dict, which writes it onto
`model.compat` exactly as before — except now it actually fires for
ollama-local builds. Cloud providers and other local backends keep
their existing `inferenceCompat` (or absence thereof).

Drop the now-redundant python conditional, drop the three
unrealistic-fixture vitest cases from `test/generate-openclaw-config.test.ts`,
and add two new ones in `src/lib/inference/config.test.ts` that
exercise `getSandboxInferenceConfig` with realistic provider names.

Test plan
- `npx vitest run src/lib/inference/config.test.ts` — 24/24 pass,
  including the two new ones
  (`forces supportsUsageInStreaming for ollama-local`,
  `does not force supportsUsageInStreaming for non-ollama providers`)
- `npx vitest run test/generate-openclaw-config.test.ts` — 67/67 pass
  (was 70/70 with the three deleted unrealistic cases)
- Direct script run with prod-shaped env values:
  `NEMOCLAW_INFERENCE_COMPAT_B64={"supportsUsageInStreaming":true}`
  (the shape TypeScript will emit for ollama-local) →
  `model.compat = {'supportsUsageInStreaming': True}` in generated
  openclaw.json ✓

Signed-off-by: Shawn Xie <shaxie@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 18, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fdf0c958-a131-4416-adfd-e1624e3edcde

📥 Commits

Reviewing files that changed from the base of the PR and between 605e7bf and c53bca1.

📒 Files selected for processing (1)
  • scripts/generate-openclaw-config.py

📝 Walkthrough

Walkthrough

Relocates Ollama/ollama-local streaming usage-flag handling from the Python config generator into TypeScript runtime: adds an ollama-local case enabling inferenceCompat.supportsUsageInStreaming, removes the Python override, and updates tests to match the new activation points.

Changes

Ollama supportsUsageInStreaming migration

Layer / File(s) Summary
Ollama-local streaming config in TypeScript
src/lib/inference/config.ts, src/lib/inference/config.test.ts
Adds explicit ollama-local case in getSandboxInferenceConfig that sets managed providerKey/primaryModelRef and enables inferenceCompat.supportsUsageInStreaming = true. Tests assert the flag for ollama-local and its absence for non-ollama providers.
Remove Ollama override from config generation script
scripts/generate-openclaw-config.py
Removes the conditional in build_config() that previously forced inference_compat["supportsUsageInStreaming"] = True for ollama/ollama-local.
Update OpenClaw/Kimi activation test
test/generate-openclaw-config.test.ts
Replaces prior provider-key supportsUsageInStreaming tests with a test verifying the Kimi model-specific plugin setup does not activate when env route overrides do not match.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#3678: Related changes to supportsUsageInStreaming handling for Ollama/ollama-local.

Suggested labels

bug, v0.0.44

Suggested reviewers

  • ericksoa
  • cv

Poem

🐰 A little flag hops house today,
From Python fields it found its way,
To TypeScript streams where metrics sing,
Tests nod, configs softly spring,
Hooray for tidy hops and play 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: enabling supportsUsageInStreaming for ollama-local provider, with clear reference to the related issue.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/ollama-local-compat-followup

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. Only a new test file was added. Because no production code, E2E workflow, deployment asset, policy, sandbox lifecycle, credential, or inference-routing implementation changed, no existing E2E job is needed for this PR.

Optional E2E

  • None.

New E2E recommendations

  • None.

@nvshaxie

Copy link
Copy Markdown
Contributor Author

Additional empirical evidence — confirmed by running a fresh ollama-local onboard on the current main (post-#3678) and pulling /sandbox/.openclaw/openclaw.json straight out of the built sandbox:

```
provider keys: ['inference']
provider 'inference':
baseUrl: https://inference.local/v1
api: openai-completions
model.id: qwen2.5:0.5b
model.compat: None ← supportsUsageInStreaming NOT set
```

So on main today (50ad764bd fix(inference): enable streamed usage accounting for Ollama-local (#2747) (#3678)), an ollama-local sandbox is still being built without the compat flag. The Python conditional from #3678 doesn't fire because NEMOCLAW_PROVIDER_KEY at build time is "inference", not "ollama" / "ollama-local" — exactly as this PR's body describes.

Onboard command used:
```
curl -fsSL https://www.nvidia.com/nemoclaw.sh |
NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1
NEMOCLAW_PROVIDER=ollama NEMOCLAW_MODEL=qwen2.5:0.5b
NEMOCLAW_SANDBOX_NAME=ollama-repro NEMOCLAW_OLLAMA_REQUIRE_TOOLS=0
CHAT_UI_URL=http://127.0.0.1:18789 bash
```

Environment: Ubuntu 24.04, x86_64, A100x8 + Docker 29.4.2 + Node 22.22.2 + Ollama 0.24.0 (loopback), OpenShell 0.0.39, NemoClaw v0.0.44+ from current main.

After landing this PR, the same onboard should produce model.compat = {'supportsUsageInStreaming': True} in the sandbox's openclaw.json.

@wscurran

Copy link
Copy Markdown
Contributor

@nvshaxie nvshaxie changed the title fix(inference): wire ollama-local supportsUsageInStreaming via getSandboxInferenceConfig (#2747 follow-up to #3678) fix(inference): set compat.supportsUsageInStreaming for ollama-local (#2747) May 19, 2026
@wscurran wscurran added the provider: ollama Ollama local model provider behavior label May 19, 2026
Resolve conflicts from the generate-openclaw-config .py->.mts migration
(landed on main in 9641ce0):

- scripts/generate-openclaw-config.py: deleted on main. This PR had only
  removed the dead `provider_key in {ollama,ollama-local}` block from it,
  so accept the deletion — the file is gone.
- test/generate-openclaw-config.test.ts: rewritten on main. This PR only
  deleted the old .py-era tests here (+0/-64), which no longer exist in
  the rewritten file, so take main's version (keeps main's new tests).

The real fix is unchanged and lives in src/lib/inference/config.ts:
getSandboxInferenceConfig("model","ollama-local") sets
inferenceCompat.supportsUsageInStreaming=true, which flows to
NEMOCLAW_INFERENCE_COMPAT_B64 (dockerfile-patch.ts) and into
openclaw.json. Note main's migration re-ported the same provider-key
-gated block into the .mts script; that block is inert in production
(NEMOCLAW_PROVIDER_KEY resolves to "inference" for ollama-local), so the
two coexist and config.ts is the layer that actually fires.

Verified: build:cli clean; 130/130 tests pass across
src/lib/inference/config.test.ts and test/generate-openclaw-config.test.ts.

Signed-off-by: Shawn Xie <shaxie@nvidia.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. Changed file is a non-scenario unit/integration test outside test/e2e-scenario/ and does not modify scenario workflows, scenario metadata, expected states, validation suites, runtime, or scenario-relevant onboarding/install helpers.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 5 prior items resolved, 0 still apply, 0 new items found

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@wscurran wscurran added the v0.0.57 Release target label Jun 2, 2026
@cv cv added v0.0.58 Release target and removed v0.0.57 Release target labels Jun 2, 2026
Comment thread src/lib/inference/config.ts Fixed
@cv

cv commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Addressed the latest advisory feedback:

  • Removed the duplicate/unreachable ollama-local switch arm and the duplicate direct getSandboxInferenceConfig tests from the PR diff.
  • Added a focused prod-shaped regression test in test/ollama-local-openclaw-config-propagation.test.ts that patches the Dockerfile, asserts NEMOCLAW_PROVIDER_KEY=inference, decodes NEMOCLAW_INFERENCE_COMPAT_B64, and verifies the generated openclaw.json carries models.providers.inference.models[0].compat.supportsUsageInStreaming = true.
  • The PR diff is now narrowed to that single focused test file.

Validation run locally:

  • npx vitest run --project cli test/ollama-local-openclaw-config-propagation.test.ts src/lib/inference/config.test.ts
  • npm run typecheck:cli
  • push hooks passed, including repository checks, CLI TypeScript, and CLI tests.

@wscurran wscurran added area: inference Inference routing, serving, model selection, or outputs area: local-models Local model providers, downloads, launch, or connectivity area: providers Inference provider integrations and provider behavior bug-fix PR fixes a bug or regression feature PR adds or expands user-visible functionality labels Jun 3, 2026
@cv cv merged commit 7de3d3d into main Jun 3, 2026
18 checks passed
@cv cv deleted the fix/ollama-local-compat-followup branch June 3, 2026 04:14
@wscurran wscurran removed the feature PR adds or expands user-visible functionality label Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: inference Inference routing, serving, model selection, or outputs area: local-models Local model providers, downloads, launch, or connectivity area: providers Inference provider integrations and provider behavior bug-fix PR fixes a bug or regression provider: ollama Ollama local model provider behavior v0.0.58 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants