fix(inference): set compat.supportsUsageInStreaming for ollama-local (#2747) by nvshaxie · Pull Request #3683 · NVIDIA/NemoClaw

nvshaxie · 2026-05-18T08:56:40Z

Summary

Follow-up to fix(inference): enable streamed usage accounting for Ollama-local (#2747) #3678: that PR's Python-side conditional was dead code in production. getSandboxInferenceConfig(\"model\", \"ollama-local\") falls through to the default switch arm and returns providerKey = \"inference\" (the same value all managed-inference providers use), so the build-time NEMOCLAW_PROVIDER_KEY is never \"ollama\" / \"ollama-local\" and the conditional never fired. The earlier vitest cases only passed because they set NEMOCLAW_PROVIDER_KEY=ollama directly — an env shape no real onboard produces.
Move the wiring to the proper layer: add an explicit case \"ollama-local\": in getSandboxInferenceConfig that sets inferenceCompat = { supportsUsageInStreaming: true }. The existing NEMOCLAW_INFERENCE_COMPAT_B64 round-trip then carries the flag into model.compat exactly as it does for kimi-k2.6-managed-inference.json and compatible-endpoint.
Drop the now-redundant Python conditional in scripts/generate-openclaw-config.py, drop the three unrealistic-fixture vitest cases in test/generate-openclaw-config.test.ts, and add two new ones in src/lib/inference/config.test.ts that exercise the real provider name.

Evidence the previous fix was inert

Direct run of scripts/generate-openclaw-config.py with the env values NemoClaw actually passes at docker build time for an ollama-local onboard (post-#3678 on main):

```
NEMOCLAW_PROVIDER_KEY=inference
NEMOCLAW_PRIMARY_MODEL_REF=inference/qwen2.5:0.5b
NEMOCLAW_INFERENCE_BASE_URL=https://inference.local/v1
NEMOCLAW_INFERENCE_API=openai-completions
NEMOCLAW_INFERENCE_COMPAT_B64=$(echo -n '{}' | base64)
```

generates an openclaw.json with provider.models[0].compat = null. The if provider_key in {\"ollama\", \"ollama-local\"}: conditional never matched, so the supportsUsageInStreaming flag was not set. TUI token counter would stay ? on nemoclaw onboard --provider ollama builds in main today.

After this PR (same env, but TypeScript now also encodes inferenceCompat = {\"supportsUsageInStreaming\": true} into NEMOCLAW_INFERENCE_COMPAT_B64 for ollama-local):

```
NEMOCLAW_INFERENCE_COMPAT_B64=$(echo -n '{"supportsUsageInStreaming":true}' | base64)
```

openclaw.json now has provider.models[0].compat = {\"supportsUsageInStreaming\": true} ✓.

Test plan

`npx vitest run src/lib/inference/config.test.ts` — 24/24 pass, including two new cases:
- `forces supportsUsageInStreaming for ollama-local ([Station][Inference] TUI token counter shows ? after successful ollama-local inference instead of actual token usage #2747)`
- `does not force supportsUsageInStreaming for non-ollama providers`
`npx vitest run test/generate-openclaw-config.test.ts` — 67/67 pass (was 70/70 with the three deleted unrealistic cases)
Manual script run with the prod-shaped env values listed above

Re-opens — and properly fixes — #2747.

Signed-off-by: Shawn Xie shaxie@nvidia.com

Summary by CodeRabbit

Refactor
- Adjusted provider configuration handling: explicit support for streaming usage reporting for the local Ollama provider and removal of a legacy compatibility override.
Tests
- Updated tests to cover provider-specific streaming compatibility and to verify model-setup plugin activation only when relevant environment overrides are present.

…dboxInferenceConfig (#2747) Follow-up to #3678. That PR added a Python-side conditional in `generate-openclaw-config.py` that toggled `supportsUsageInStreaming` when `NEMOCLAW_PROVIDER_KEY` was `"ollama"` or `"ollama-local"`. However, in real onboard runs `getSandboxInferenceConfig("model", "ollama-local")` hits the default switch arm and returns `providerKey = "inference"` — the same value used by all managed-inference providers — so the build-time `NEMOCLAW_PROVIDER_KEY` is never `"ollama"` and the conditional was dead code. The earlier regression tests in `test/generate-openclaw-config.test.ts` passed only because they set `NEMOCLAW_PROVIDER_KEY=ollama` directly, which never happens in production. Verified locally with the python script and the real env values NemoClaw passes through at docker build time (`NEMOCLAW_PROVIDER_KEY=inference`, `NEMOCLAW_INFERENCE_BASE_URL=https://inference.local/v1`, `NEMOCLAW_INFERENCE_COMPAT_B64=e30=` for an empty compat): the generated `openclaw.json` has `model.compat = null`, confirming the flag never landed. Move the wiring to the proper layer: add an explicit `"ollama-local"` case in `getSandboxInferenceConfig` that sets `inferenceCompat = { supportsUsageInStreaming: true }`. The existing encode-into-`NEMOCLAW_INFERENCE_COMPAT_B64` path then carries the flag into the python script's `inference_compat` dict, which writes it onto `model.compat` exactly as before — except now it actually fires for ollama-local builds. Cloud providers and other local backends keep their existing `inferenceCompat` (or absence thereof). Drop the now-redundant python conditional, drop the three unrealistic-fixture vitest cases from `test/generate-openclaw-config.test.ts`, and add two new ones in `src/lib/inference/config.test.ts` that exercise `getSandboxInferenceConfig` with realistic provider names. Test plan - `npx vitest run src/lib/inference/config.test.ts` — 24/24 pass, including the two new ones (`forces supportsUsageInStreaming for ollama-local`, `does not force supportsUsageInStreaming for non-ollama providers`) - `npx vitest run test/generate-openclaw-config.test.ts` — 67/67 pass (was 70/70 with the three deleted unrealistic cases) - Direct script run with prod-shaped env values: `NEMOCLAW_INFERENCE_COMPAT_B64={"supportsUsageInStreaming":true}` (the shape TypeScript will emit for ollama-local) → `model.compat = {'supportsUsageInStreaming': True}` in generated openclaw.json ✓ Signed-off-by: Shawn Xie <shaxie@nvidia.com>

copy-pr-bot · 2026-05-18T08:56:44Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-18T08:56:52Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fdf0c958-a131-4416-adfd-e1624e3edcde

📥 Commits

Reviewing files that changed from the base of the PR and between 605e7bf and c53bca1.

📒 Files selected for processing (1)

scripts/generate-openclaw-config.py

📝 Walkthrough

Walkthrough

Relocates Ollama/ollama-local streaming usage-flag handling from the Python config generator into TypeScript runtime: adds an ollama-local case enabling inferenceCompat.supportsUsageInStreaming, removes the Python override, and updates tests to match the new activation points.

Changes

Ollama supportsUsageInStreaming migration

Layer / File(s)	Summary
Ollama-local streaming config in TypeScript `src/lib/inference/config.ts`, `src/lib/inference/config.test.ts`	Adds explicit `ollama-local` case in `getSandboxInferenceConfig` that sets managed `providerKey`/`primaryModelRef` and enables `inferenceCompat.supportsUsageInStreaming = true`. Tests assert the flag for `ollama-local` and its absence for non-ollama providers.
Remove Ollama override from config generation script `scripts/generate-openclaw-config.py`	Removes the conditional in `build_config()` that previously forced `inference_compat["supportsUsageInStreaming"] = True` for `ollama`/`ollama-local`.
Update OpenClaw/Kimi activation test `test/generate-openclaw-config.test.ts`	Replaces prior provider-key `supportsUsageInStreaming` tests with a test verifying the Kimi model-specific plugin setup does not activate when env route overrides do not match.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

NVIDIA/NemoClaw#3678: Related changes to supportsUsageInStreaming handling for Ollama/ollama-local.

Suggested labels

bug, v0.0.44

Suggested reviewers

ericksoa
cv

Poem

🐰 A little flag hops house today,
From Python fields it found its way,
To TypeScript streams where metrics sing,
Tests nod, configs softly spring,
Hooray for tidy hops and play 🥕

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: enabling supportsUsageInStreaming for ollama-local provider, with clear reference to the related issue.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/ollama-local-compat-followup

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-18T08:58:10Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. Only a new test file was added. Because no production code, E2E workflow, deployment asset, policy, sandbox lifecycle, credential, or inference-routing implementation changed, no existing E2E job is needed for this PR.

Optional E2E

None.

New E2E recommendations

None.

nvshaxie · 2026-05-18T09:04:01Z

Additional empirical evidence — confirmed by running a fresh ollama-local onboard on the current main (post-#3678) and pulling /sandbox/.openclaw/openclaw.json straight out of the built sandbox:

```
provider keys: ['inference']
provider 'inference':
baseUrl: https://inference.local/v1
api: openai-completions
model.id: qwen2.5:0.5b
model.compat: None ← supportsUsageInStreaming NOT set
```

So on main today (50ad764bd fix(inference): enable streamed usage accounting for Ollama-local (#2747) (#3678)), an ollama-local sandbox is still being built without the compat flag. The Python conditional from #3678 doesn't fire because NEMOCLAW_PROVIDER_KEY at build time is "inference", not "ollama" / "ollama-local" — exactly as this PR's body describes.

Onboard command used:
```
curl -fsSL https://www.nvidia.com/nemoclaw.sh |
NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1
NEMOCLAW_PROVIDER=ollama NEMOCLAW_MODEL=qwen2.5:0.5b
NEMOCLAW_SANDBOX_NAME=ollama-repro NEMOCLAW_OLLAMA_REQUIRE_TOOLS=0
CHAT_UI_URL=http://127.0.0.1:18789 bash
```

Environment: Ubuntu 24.04, x86_64, A100x8 + Docker 29.4.2 + Node 22.22.2 + Ollama 0.24.0 (loopback), OpenShell 0.0.39, NemoClaw v0.0.44+ from current main.

After landing this PR, the same onboard should produce model.compat = {'supportsUsageInStreaming': True} in the sandbox's openclaw.json.

wscurran · 2026-05-18T14:53:30Z

✨ Related open PRs:

#3678 fix(inference): enable streamed usage accounting for Ollama-local (#2747)

Related open issues:

#2747 [Station][Inference] TUI token counter shows ? after successful ollama-local inference instead of actual token usage

Resolve conflicts from the generate-openclaw-config .py->.mts migration (landed on main in 9641ce0): - scripts/generate-openclaw-config.py: deleted on main. This PR had only removed the dead `provider_key in {ollama,ollama-local}` block from it, so accept the deletion — the file is gone. - test/generate-openclaw-config.test.ts: rewritten on main. This PR only deleted the old .py-era tests here (+0/-64), which no longer exist in the rewritten file, so take main's version (keeps main's new tests). The real fix is unchanged and lives in src/lib/inference/config.ts: getSandboxInferenceConfig("model","ollama-local") sets inferenceCompat.supportsUsageInStreaming=true, which flows to NEMOCLAW_INFERENCE_COMPAT_B64 (dockerfile-patch.ts) and into openclaw.json. Note main's migration re-ported the same provider-key -gated block into the .mts script; that block is inert in production (NEMOCLAW_PROVIDER_KEY resolves to "inference" for ollama-local), so the two coexist and config.ts is the layer that actually fires. Verified: build:cli clean; 130/130 tests pass across src/lib/inference/config.test.ts and test/generate-openclaw-config.test.ts. Signed-off-by: Shawn Xie <shaxie@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-01T00:08:07Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. Changed file is a non-scenario unit/integration test outside test/e2e-scenario/ and does not modify scenario workflows, scenario metadata, expected states, validation suites, runtime, or scenario-relevant onboarding/install helpers.

Optional scenario E2E

None.

Relevant changed files

None.

github-actions · 2026-06-01T00:10:21Z

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 5 prior items resolved, 0 still apply, 0 new items found

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

cv · 2026-06-03T01:11:49Z

Addressed the latest advisory feedback:

Removed the duplicate/unreachable ollama-local switch arm and the duplicate direct getSandboxInferenceConfig tests from the PR diff.
Added a focused prod-shaped regression test in test/ollama-local-openclaw-config-propagation.test.ts that patches the Dockerfile, asserts NEMOCLAW_PROVIDER_KEY=inference, decodes NEMOCLAW_INFERENCE_COMPAT_B64, and verifies the generated openclaw.json carries models.providers.inference.models[0].compat.supportsUsageInStreaming = true.
The PR diff is now narrowed to that single focused test file.

Validation run locally:

npx vitest run --project cli test/ollama-local-openclaw-config-propagation.test.ts src/lib/inference/config.test.ts
npm run typecheck:cli
push hooks passed, including repository checks, CLI TypeScript, and CLI tests.

nvshaxie requested review from cv and ericksoa May 18, 2026 08:56

nvshaxie mentioned this pull request May 18, 2026

[Station][Inference] TUI token counter shows ? after successful ollama-local inference instead of actual token usage #2747

Closed

wscurran added enhancement: inference labels May 18, 2026

Merge branch 'main' into fix/ollama-local-compat-followup

0979729

nvshaxie changed the title ~~fix(inference): wire ollama-local supportsUsageInStreaming via getSandboxInferenceConfig (#2747 follow-up to #3678)~~ fix(inference): set compat.supportsUsageInStreaming for ollama-local (#2747) May 19, 2026

wscurran added the provider: ollama Ollama local model provider behavior label May 19, 2026

wscurran mentioned this pull request May 19, 2026

Model performance / capability audit across supported agents #3123

Open

cv approved these changes May 20, 2026

View reviewed changes

Merge branch 'main' into fix/ollama-local-compat-followup

c53bca1

wscurran added the v0.0.57 Release target label Jun 2, 2026

Merge branch 'main' into fix/ollama-local-compat-followup

1cf0953

prekshivyas assigned cv Jun 2, 2026

Merge branch 'main' into fix/ollama-local-compat-followup

4a45577

cv added v0.0.58 Release target and removed v0.0.57 Release target labels Jun 2, 2026

github-advanced-security AI found potential problems Jun 2, 2026

View reviewed changes

Comment thread src/lib/inference/config.ts Fixed

test(inference): cover ollama-local compat propagation

faf3172

cv merged commit 7de3d3d into main Jun 3, 2026
18 checks passed

cv deleted the fix/ollama-local-compat-followup branch June 3, 2026 04:14

wscurran removed Local Models labels Jun 3, 2026

coderabbitai Bot mentioned this pull request Jun 4, 2026

feat(inference): replace qwen2.5:7b with qwen3.5:9b as the default Ollama starter model #4776

Merged

12 tasks

wscurran removed the feature PR adds or expands user-visible functionality label Jun 9, 2026

Conversation

nvshaxie commented May 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Evidence the previous fix was inert

Test plan

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented May 18, 2026

Uh oh!

coderabbitai Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

github-actions Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Uh oh!

nvshaxie commented May 18, 2026

Uh oh!

wscurran commented May 18, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

Uh oh!

Uh oh!

cv commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

nvshaxie commented May 18, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 18, 2026 •

edited

Loading

github-actions Bot commented May 18, 2026 •

edited

Loading

github-actions Bot commented Jun 1, 2026 •

edited

Loading

github-actions Bot commented Jun 1, 2026 •

edited

Loading