fix(inference): inject force_nonempty_content for Nemotron models #2380
…IDIA#1193, NVIDIA#2051)

Nemotron models sometimes return empty content (tool call instead of text) or thinking-only blocks that OpenClaw treats as end-of-turn, stalling the conversation. The root cause is the vLLM/NIM chat template producing empty assistant content when tool definitions are present.

Add a Node.js --require preload that intercepts /v1/chat/completions requests and injects `chat_template_kwargs: { force_nonempty_content: true }` when the model ID contains "nemotron". This tells the serving layer to always produce non-empty content alongside tool calls or thinking blocks.

The preload follows the established http-proxy-fix pattern: IIFE, emitted via emit_sandbox_sourced_file (root:root 444), registered in NODE_OPTIONS for both the main process and connect sessions, validated by validate_tmp_permissions in both root and non-root paths.

Non-Nemotron requests pass through completely untouched. Backends that do not support the extra field silently ignore it per the OpenAI-compatible contract.

Fixes NVIDIA#1193
Fixes NVIDIA#2051

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
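The body rewrite described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code embedded in `scripts/nemoclaw-start.sh`: the function name `injectForceNonemptyContent` is hypothetical, and the case-insensitive substring match and the merge with any existing `chat_template_kwargs` are assumptions.

```javascript
'use strict';

// Hypothetical helper (name is not from the PR): takes the raw JSON request
// body as a string and returns either the same string or a rewritten one.
function injectForceNonemptyContent(rawBody) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch (_err) {
    // Not valid JSON: pass the body through untouched.
    return rawBody;
  }
  if (payload === null || typeof payload !== 'object' || Array.isArray(payload)) {
    return rawBody;
  }

  // Assumption: the match is a case-insensitive substring test on the model ID.
  const model = typeof payload.model === 'string' ? payload.model : '';
  if (!model.toLowerCase().includes('nemotron')) {
    return rawBody; // non-Nemotron requests are returned unchanged
  }

  // Assumption: kwargs already set by the caller are preserved and merged.
  const existing = payload.chat_template_kwargs;
  const kwargs =
    existing && typeof existing === 'object' && !Array.isArray(existing) ? existing : {};
  payload.chat_template_kwargs = Object.assign({}, kwargs, {
    force_nonempty_content: true,
  });
  return JSON.stringify(payload);
}
```

Returning the original string on every non-matching path keeps the passthrough case byte-for-byte identical, which is what "completely untouched" suggests.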
📝 Walkthrough
The PR adds a Node.js preload script that intercepts HTTP/HTTPS POST requests to …
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant NodePreload as Nemotron Fix Preload
    participant HTTPModule as HTTP/HTTPS Module
    participant RequestHandler as Request Parser
    participant ChatAPI as Chat Completions API
    participant ResponseHandler as Response Modifier
    Client->>NodePreload: POST /v1/chat/completions<br/>(Nemotron model + tools)
    NodePreload->>HTTPModule: Intercept outgoing request
    HTTPModule->>RequestHandler: Buffer request body
    RequestHandler->>RequestHandler: Parse JSON request
    alt Is Nemotron Model?
        RequestHandler->>ResponseHandler: Inject force_nonempty_content=true<br/>into chat_template_kwargs
        ResponseHandler->>ResponseHandler: Recalculate Content-Length
        ResponseHandler->>ChatAPI: Forward modified request
    else Non-Nemotron Model
        RequestHandler->>ChatAPI: Forward original request
    end
    ChatAPI-->>ResponseHandler: Response with content
    ResponseHandler-->>Client: Return response
```
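The buffer, rewrite, and Content-Length steps in the diagram could look roughly like the sketch below. It is an illustration under stated assumptions rather than the PR's implementation: `bufferAndRewrite` and its `transform` callback are hypothetical names, and the sketch assumes the request object exposes `write`, `end`, and `setHeader` the way Node's `http.ClientRequest` does.

```javascript
'use strict';

// Hypothetical helper (not from the PR): patch write/end on a
// ClientRequest-like object so the body is buffered, transformed once,
// and sent with a Content-Length that matches the rewritten bytes.
function bufferAndRewrite(req, transform) {
  const origWrite = req.write;
  const origEnd = req.end;
  const chunks = [];

  const toBuffer = (chunk, encoding) =>
    typeof chunk === 'string'
      ? Buffer.from(chunk, typeof encoding === 'string' ? encoding : 'utf8')
      : Buffer.from(chunk);

  req.write = function (chunk, encoding, callback) {
    // Hold the chunk back instead of sending it.
    if (chunk != null) chunks.push(toBuffer(chunk, encoding));
    const cb = typeof encoding === 'function' ? encoding : callback;
    if (typeof cb === 'function') cb();
    return true;
  };

  req.end = function (chunk, encoding, callback) {
    if (typeof chunk === 'function') {
      callback = chunk;
      chunk = undefined;
    } else if (typeof encoding === 'function') {
      callback = encoding;
      encoding = undefined;
    }
    if (chunk != null) chunks.push(toBuffer(chunk, encoding));

    // Restore the original methods before sending anything.
    req.write = origWrite;
    req.end = origEnd;

    const original = Buffer.concat(chunks).toString('utf8');
    const rewritten = transform(original);
    const body = Buffer.from(rewritten, 'utf8');
    if (rewritten !== original && !req.headersSent) {
      // Byte length, not string length: multi-byte characters count per byte.
      req.setHeader('Content-Length', body.length);
    }
    return origEnd.call(req, body, callback);
  };

  return req;
}
```

Because every write is held back until `end`, no headers have been flushed when the new Content-Length is set; the trade-off is that the whole request body sits in memory, which is usually acceptable for chat-completion payloads.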
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
🧹 Nitpick comments (1)
scripts/nemoclaw-start.sh (1)
979-979: Unused variable `intercepted` should be prefixed with `_`.

The `intercepted` variable is assigned on line 1009 but never read. Per coding guidelines, unused variables must be prefixed with `_`.

🔧 Suggested fix

```diff
   var origEnd = req.end;
   var chunks = [];
-  var intercepted = false;
+  var _intercepted = false;
```

And correspondingly on line 1009:

```diff
-  intercepted = true;
+  _intercepted = true;
```

Alternatively, if debugging or metrics are planned, consider removing the variable entirely until needed.

As per coding guidelines: "Unused variables must be prefixed with `_` in JavaScript and TypeScript files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-start.sh` at line 979, The variable intercepted is declared but never used; rename it to _intercepted to follow the unused-variable convention (or remove the declaration entirely if not needed). Locate the declaration "var intercepted = false;" and either change the identifier to "_intercepted" or delete that statement, and also update the assignment site referenced later (where intercepted is set) to use "_intercepted" if you keep it; ensure no other code reads the original name before committing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a1bc483a-3a2d-40ce-9e8b-490f03e5f26b
📒 Files selected for processing (3)
- scripts/nemoclaw-start.sh
- test/nemotron-inference-fix.test.ts
- test/service-env.test.ts
Summary
- Add a Node.js `--require` preload that injects `chat_template_kwargs: { force_nonempty_content: true }` into `/v1/chat/completions` requests when the model ID contains "nemotron"
- Follow the established `http-proxy-fix.js` preload pattern (IIFE, `emit_sandbox_sourced_file`, `validate_tmp_permissions`, registered in `NODE_OPTIONS`)

Context
Nemotron models sometimes return empty content (tool call instead of text, #1193) or thinking-only blocks that OpenClaw treats as end-of-turn, stalling the conversation (#2051). The root cause is the vLLM/NIM chat template producing empty assistant content when tool definitions are present.
The `force_nonempty_content` chat template kwarg (HuggingFace model card) tells the serving layer to always produce non-empty content alongside any tool calls or thinking blocks. OpenClaw's model config schema doesn't support `extra_body` passthrough, so the preload injects the parameter at the HTTP layer, transparent to OpenClaw.

Fixes #1193
Fixes #2051
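For the injection to stay transparent to OpenClaw, the preload has to be picked up by every Node process through `NODE_OPTIONS`. The PR does this registration in `scripts/nemoclaw-start.sh`; the sketch below only illustrates the idea in JavaScript. The helper name `withPreload` and the example path are hypothetical, and the sketch assumes the script path contains no whitespace.

```javascript
'use strict';

// Hypothetical helper (not from the PR): return a NODE_OPTIONS string that
// contains `--require <scriptPath>` exactly once.
// Assumption: scriptPath has no whitespace, so no quoting is needed.
function withPreload(nodeOptions, scriptPath) {
  const flag = `--require ${scriptPath}`;
  const current = (nodeOptions || '').trim();
  // Already registered: leave the value as it is.
  if (current.split(/\s+/).includes(scriptPath)) return current;
  return current ? `${current} ${flag}` : flag;
}

// Usage: extend the environment handed to a child Node process.
const childEnv = Object.assign({}, process.env, {
  NODE_OPTIONS: withPreload(process.env.NODE_OPTIONS, '/tmp/example-nemotron-fix.js'),
});
```

Making the helper idempotent matters because, per the description, the same registration runs for both the main process and connect sessions, so the flag could otherwise be appended twice.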
Test plan
- `test/nemotron-inference-fix.test.ts` (13 tests): verifies heredoc embedding, NODE_OPTIONS registration, proxy-env inclusion, validate_tmp_permissions in both paths, model matching, body injection, passthrough, error handling, Content-Length update
- `test/service-env.test.ts`: adds `_NEMOTRON_FIX_SCRIPT` to proxy-env test harnesses, updates assertion to allow the unconditional Nemotron preload
- `test/nemoclaw-start.test.ts` (61 tests): all pass
- `test/http-proxy-fix-sync.test.ts` (5 tests): all pass

🤖 Generated with Claude Code