fix(inference): enable streamed usage accounting for Ollama-local (#2747) #3678
Ollama's OpenAI-compatible `/v1/chat/completions` stream omits the `usage` chunk by default; OpenAI clients have to send `stream_options.include_usage: true` to receive it. OpenClaw already gates that request flag on `model.compat.supportsUsageInStreaming` (see `src/agents/openai-transport-stream.ts` upstream), and the LM Studio extension force-enables it via `withLmstudioUsageCompat` to work around the same usage-chunk omission.

OpenClaw's own Ollama extension only opts in when its detector recognises the endpoint as Ollama (`extensions/ollama/src/stream.ts` -> `isOllamaCompatProvider`). NemoClaw routes ollama-local traffic via the standardised `https://inference.local/v1` URL through the OpenShell gateway, so the upstream detector misses it and the OpenClaw TUI token counter stays `?` for every turn.

Set `supportsUsageInStreaming` directly on the model compat block when the build's `NEMOCLAW_PROVIDER_KEY` is one of `ollama` or `ollama-local` (the same set already tracked by `_bundled_provider_plugins["ollama"]`). Cloud providers and other local backends are unaffected. Existing explicit `supportsUsageInStreaming` values in the incoming inference compat blob (e.g. from a future model-specific-setup manifest) take precedence.

Test plan
- `npx vitest run test/generate-openclaw-config.test.ts`
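The precedence rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `scripts/generate-openclaw-config.py`: the helper name `resolve_model_compat`, the constant `OLLAMA_PROVIDER_KEYS`, and the choice to report an empty block as `None` are all assumptions made for the example.

```python
# Provider keys treated as Ollama, per the description above.
OLLAMA_PROVIDER_KEYS = {"ollama", "ollama-local"}


def resolve_model_compat(provider_key, incoming_compat):
    """Return the compat block for a model entry (hypothetical helper).

    An explicit supportsUsageInStreaming value in the incoming blob wins;
    otherwise the flag is forced on for Ollama provider keys only.
    """
    compat = dict(incoming_compat or {})
    if provider_key in OLLAMA_PROVIDER_KEYS:
        # setdefault keeps an explicit True/False that was already supplied.
        compat.setdefault("supportsUsageInStreaming", True)
    # Reporting an empty block as None is an assumption of this sketch.
    return compat or None


if __name__ == "__main__":
    print(resolve_model_compat("ollama-local", {}))
    print(resolve_model_compat("openai", {}))
    print(resolve_model_compat("ollama", {"supportsUsageInStreaming": False}))
```

Using `setdefault` rather than plain assignment is what lets an incoming `supportsUsageInStreaming: false` survive, which is the opt-out path the description reserves for a future manifest.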
📝 Walkthrough

The PR adds provider-specific logic to force-enable streaming usage support for Ollama providers in the configuration generator and validates this behavior with three test cases covering default enablement, non-Ollama exclusion, and explicit override precedence.
## Summary

Updates the NemoClaw documentation for the v0.0.45 release by summarizing the user-facing changes merged since v0.0.44 and bumping the docs version metadata. Refreshes generated user skills so agent-facing references match the source docs.

## Changes

- Added v0.0.45 release notes covering onboarding recovery, local inference, channel cleanup, share mount diagnostics, uninstall cleanup, and security redaction updates.
- Updated command and troubleshooting docs for sandbox name limits, GPU gateway reuse, DNS preflight behavior, channel removal cleanup, and share mount path validation.
- Bumped docs version metadata to 0.0.45 and regenerated NemoClaw user skills from the docs.
- Source summary: #3672 -> `docs/reference/commands.md`: documented channel removal detaching bridge providers and un-applying channel policy presets.
- Source summary: #3678 -> `docs/about/release-notes.md`: documented Ollama streamed usage accounting in the release notes.
- Source summary: #3670 -> `docs/reference/commands.md`, `docs/reference/troubleshooting.md`: documented safe GPU gateway replacement behavior.
- Source summary: #3664 -> `docs/about/release-notes.md`: documented blueprint permission normalization in the release notes.
- Source summary: #3181 -> `docs/reference/troubleshooting.md`: documented GPU toolkit guidance when host drivers work but passthrough is disabled.
- Source summary: #3554 -> `docs/about/release-notes.md`: documented host `openshell-gateway` cleanup during uninstall.
- Source summary: #3651 -> `docs/reference/troubleshooting.md`: documented the uncached `.invalid` DNS preflight probe.
- Source summary: #3643 -> `docs/reference/commands.md`: included existing `NEMOCLAW_PROVIDER` interactive-mode behavior in generated docs.
- Source summary: #3647 -> `docs/reference/commands.md`: documented remote sandbox path verification for `share mount`.
- Source summary: #3646 -> `docs/reference/commands.md`: included existing local writable mount target guidance in generated docs.
- Source summary: #3642 -> `docs/inference/use-local-inference.md`, `docs/reference/commands.md`: documented managed-vLLM model override and gated-model token checks.
- Source summary: #3639 -> `docs/reference/commands.md`: documented the 63-character sandbox name limit.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification

- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Commit hooks passed for the staged files. A standalone `npx prek run --all-files` attempt was blocked by sandbox access to `/Users/miyoungc/.cache/prek/prek.log`, so that checkbox is left unchecked.

---

<!-- DCO sign-off required by CI. Run: git config user.name && git config user.email -->
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Documentation**
  * Enhanced CLI command reference documentation with clearer guidance on onboarding, GPU passthrough, inference configuration, channel removal, and shared mounts.
  * Improved troubleshooting sections with better DNS resolution and GPU passthrough remediation steps.
  * Added documentation for overriding managed vLLM model selection.
  * Updated release notes for v0.0.45 reflecting infrastructure and workflow improvements.
* **Version Bump**
  * Released v0.0.45.

<!-- review_stack_entry_start -->

[](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3755?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…2747) (#3683)

## Summary

- Follow-up to #3678: that PR's Python-side conditional was dead code in production. `getSandboxInferenceConfig("model", "ollama-local")` falls through to the default switch arm and returns `providerKey = "inference"` (the same value all managed-inference providers use), so the build-time `NEMOCLAW_PROVIDER_KEY` is never `"ollama"` / `"ollama-local"` and the conditional never fired. The earlier vitest cases only passed because they set `NEMOCLAW_PROVIDER_KEY=ollama` directly, an env shape no real onboard produces.
- Move the wiring to the proper layer: add an explicit `case "ollama-local":` in `getSandboxInferenceConfig` that sets `inferenceCompat = { supportsUsageInStreaming: true }`. The existing `NEMOCLAW_INFERENCE_COMPAT_B64` round-trip then carries the flag into `model.compat` exactly as it does for `kimi-k2.6-managed-inference.json` and `compatible-endpoint`.
- Drop the now-redundant Python conditional in `scripts/generate-openclaw-config.py`, drop the three unrealistic-fixture vitest cases in `test/generate-openclaw-config.test.ts`, and add two new ones in `src/lib/inference/config.test.ts` that exercise the real provider name.

## Evidence the previous fix was inert

Direct run of `scripts/generate-openclaw-config.py` with the env values NemoClaw actually passes at docker build time for an ollama-local onboard (post-#3678 on main):

```
NEMOCLAW_PROVIDER_KEY=inference
NEMOCLAW_PRIMARY_MODEL_REF=inference/qwen2.5:0.5b
NEMOCLAW_INFERENCE_BASE_URL=https://inference.local/v1
NEMOCLAW_INFERENCE_API=openai-completions
NEMOCLAW_INFERENCE_COMPAT_B64=$(echo -n '{}' | base64)
```

generates an `openclaw.json` with `provider.models[0].compat = null`. The `if provider_key in {"ollama", "ollama-local"}:` conditional never matched, so the `supportsUsageInStreaming` flag was not set. TUI token counter would stay `?` on `nemoclaw onboard --provider ollama` builds in main today.
After this PR (same env, but TypeScript now also encodes `inferenceCompat = {"supportsUsageInStreaming": true}` into `NEMOCLAW_INFERENCE_COMPAT_B64` for ollama-local):

```
NEMOCLAW_INFERENCE_COMPAT_B64=$(echo -n '{"supportsUsageInStreaming":true}' | base64)
```

`openclaw.json` now has `provider.models[0].compat = {"supportsUsageInStreaming": true}` ✓.

## Test plan

- `npx vitest run src/lib/inference/config.test.ts`: 24/24 pass, including two new cases:
  - `forces supportsUsageInStreaming for ollama-local (#2747)`
  - `does not force supportsUsageInStreaming for non-ollama providers`
- `npx vitest run test/generate-openclaw-config.test.ts`: 67/67 pass (was 70/70 with the three deleted unrealistic cases)
- Manual script run with the prod-shaped env values listed above

Re-opens, and properly fixes, #2747.

Signed-off-by: Shawn Xie <shaxie@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Refactor**
  * Adjusted provider configuration handling: explicit support for streaming usage reporting for the local Ollama provider and removal of a legacy compatibility override.
* **Tests**
  * Updated tests to cover provider-specific streaming compatibility and to verify model-setup plugin activation only when relevant environment overrides are present.

<!-- review_stack_entry_start -->

[](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3683?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Shawn Xie <shaxie@nvidia.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
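The `NEMOCLAW_INFERENCE_COMPAT_B64` round-trip that the #3683 commit message relies on can be illustrated with a small Python sketch. The function names `encode_compat` and `decode_compat` are hypothetical, and mapping an empty object to `None` is an assumption chosen to match the `compat = null` result reported in the evidence section; the real generator script may handle this differently.

```python
import base64
import json


def encode_compat(compat):
    """Serialise a compat dict as compact JSON, then base64 (hypothetical helper)."""
    raw = json.dumps(compat, separators=(",", ":"))
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def decode_compat(b64_value):
    """Decode the env blob; an empty object is reported as None (assumption)."""
    compat = json.loads(base64.b64decode(b64_value).decode("utf-8"))
    return compat or None


if __name__ == "__main__":
    before = encode_compat({})  # the pre-fix env value
    after = encode_compat({"supportsUsageInStreaming": True})  # the post-fix env value
    print(decode_compat(before))
    print(decode_compat(after))
```

Because the flag travels inside the blob, the generator needs no provider-specific branch: whatever the TypeScript layer encodes is what ends up in `model.compat`.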
## Summary

- `model.compat.supportsUsageInStreaming` is now set when the build's `NEMOCLAW_PROVIDER_KEY` is `ollama` or `ollama-local`, so OpenClaw sends `stream_options.include_usage: true` and the TUI token counter updates after each turn instead of staying `?`.
- Existing explicit `supportsUsageInStreaming` values in the incoming inference compat blob take precedence, leaving room for a future model-specific-setup manifest to opt models out.
- New cases are covered in `test/generate-openclaw-config.test.ts`.

## Root cause

OpenClaw's Ollama extension already enables streamed usage accounting if its detector (`isOllamaCompatProvider` in `extensions/ollama/src/stream.ts` upstream) recognises the endpoint. The detector requires one of:

- `model.provider === "ollama"` (exact match after normalization), or
- `baseUrl` pointing at `localhost:11434`, or
- a provider id that contains `ollama` reaching port `11434` on a `/v1` path.

NemoClaw routes ollama-local traffic via the standardised `https://inference.local/v1` URL through the OpenShell gateway. The base URL is loopback-equivalent inside the sandbox but the hostname is `inference.local` and the port is `443`, so none of the detector's positive paths match. Without that opt-in, OpenClaw omits `stream_options.include_usage`, Ollama's `/v1/chat/completions` stream omits the trailing `usage` chunk, OpenClaw's `parseTransportChunkUsage` never runs, and `model.compat.supportsUsageInStreaming` stays `false`. End result: the TUI's `tokens X/Y` field never moves off `?`.

Setting `supportsUsageInStreaming: true` directly in the compat block bypasses the detector entirely and forces the request to ask for the usage chunk. This mirrors the LM Studio extension's `withLmstudioUsageCompat` workaround (`extensions/lmstudio/src/stream.ts` upstream), which addresses the same class of bug for LM Studio's OpenAI-compatible streaming.

## Test plan

- `npx vitest run test/generate-openclaw-config.test.ts -t "supportsUsageInStreaming"`: 3 new cases pass:
  - flag set for `ollama` and `ollama-local` provider keys
  - flag not set for `openai`, `anthropic`, `vllm`, `nim-local`
  - explicit `supportsUsageInStreaming: false` in the incoming compat blob is preserved
- `npx vitest run test/generate-openclaw-config.test.ts`: 70/70 pass, no regressions

Fixes #2747.
Signed-off-by: Shawn Xie <shaxie@nvidia.com>
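To make the three detector conditions listed under Root cause concrete, here is a rough Python paraphrase of that prose. It is not the upstream `isOllamaCompatProvider` code, which is TypeScript and may differ in detail. The function name `looks_like_ollama` is hypothetical, and "normalization" is assumed to mean trimming and lower-casing.

```python
from urllib.parse import urlparse


def looks_like_ollama(provider, base_url):
    """Paraphrase of the three detector conditions described in the PR text."""
    provider_id = provider.strip().lower()  # assumed normalization
    parsed = urlparse(base_url)
    host, port, path = parsed.hostname, parsed.port, parsed.path

    # 1. Provider is exactly "ollama" after normalization.
    if provider_id == "ollama":
        return True
    # 2. Base URL points at localhost:11434.
    if host == "localhost" and port == 11434:
        return True
    # 3. Provider id contains "ollama" and the URL reaches port 11434 on a /v1 path.
    if "ollama" in provider_id and port == 11434 and path.rstrip("/").endswith("/v1"):
        return True
    return False


if __name__ == "__main__":
    # The gateway URL carries no explicit port and a non-localhost hostname.
    print(looks_like_ollama("ollama-local", "https://inference.local/v1"))
    print(looks_like_ollama("ollama-local", "http://localhost:11434/v1"))
```

Under these assumptions the gateway URL fails all three checks for any provider id other than an exact `ollama`, which is why the flag has to be set explicitly in the compat block.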