fix(inference): inject force_nonempty_content for Nemotron models by ericksoa · Pull Request #2380 · NVIDIA/NemoClaw

ericksoa · 2026-04-23T20:47:01Z

Summary

- Adds a Node.js `--require` preload that injects `chat_template_kwargs: { force_nonempty_content: true }` into `/v1/chat/completions` requests when the model ID contains "nemotron".
- Follows the established `http-proxy-fix.js` preload pattern (IIFE, `emit_sandbox_sourced_file`, `validate_tmp_permissions`, registered in `NODE_OPTIONS`).
- Non-Nemotron models pass through completely untouched; backends that don't support the extra field silently ignore it.
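The injection rule in the summary can be sketched as a pure function over the request body. This is a minimal illustration, not the code from the PR: the name `injectForceNonempty` is hypothetical, and merging with any existing `chat_template_kwargs` is an assumption about the desired behavior.

```javascript
// Hypothetical helper (not from the PR): takes the raw request body as a
// string and returns the body that should be sent upstream.
function injectForceNonempty(rawBody) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch (err) {
    // Not valid JSON: pass the body through unchanged.
    return rawBody;
  }
  if (payload === null || typeof payload !== 'object' || Array.isArray(payload)) {
    return rawBody;
  }

  // Case-insensitive match on the model ID.
  const model = typeof payload.model === 'string' ? payload.model : '';
  if (!model.toLowerCase().includes('nemotron')) {
    return rawBody;
  }

  // Assumption: keep any kwargs the caller already set and add ours on top.
  const existing =
    payload.chat_template_kwargs && typeof payload.chat_template_kwargs === 'object'
      ? payload.chat_template_kwargs
      : {};
  payload.chat_template_kwargs = Object.assign({}, existing, {
    force_nonempty_content: true,
  });
  return JSON.stringify(payload);
}
```

Returning the original string for non-matching or unparseable bodies keeps the passthrough case byte-for-byte identical, which is what "completely untouched" suggests.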

Context

Nemotron models sometimes return empty content (tool call instead of text, #1193) or thinking-only blocks that OpenClaw treats as end-of-turn, stalling the conversation (#2051). The root cause is the vLLM/NIM chat template producing empty assistant content when tool definitions are present.

The force_nonempty_content chat template kwarg (HuggingFace model card) tells the serving layer to always produce non-empty content alongside any tool calls or thinking blocks. OpenClaw's model config schema doesn't support extra_body passthrough, so the preload injects the parameter at the HTTP layer — transparent to OpenClaw.

Fixes #1193
Fixes #2051

Test plan

- New test file `test/nemotron-inference-fix.test.ts` (13 tests): verifies heredoc embedding, `NODE_OPTIONS` registration, proxy-env inclusion, `validate_tmp_permissions` in both paths, model matching, body injection, passthrough, error handling, and `Content-Length` update.
- Updated `test/service-env.test.ts`: adds `_NEMOTRON_FIX_SCRIPT` to proxy-env test harnesses and updates the assertion to allow the unconditional Nemotron preload.
- Existing `test/nemoclaw-start.test.ts` (61 tests): all pass.
- Existing `test/http-proxy-fix-sync.test.ts` (5 tests): all pass.
- E2E: onboard with a Nemotron model, send a simple "hello" message, and verify a text response (not empty or tool-call-only).
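The `Content-Length` update listed in the test plan matters because the header counts bytes, not string characters, and the body grows once the extra field is added. A small sketch of that point follows; `contentLengthFor` and the sample payload are illustrative names and data, not taken from the PR.

```javascript
// Content-Length must be the UTF-8 byte length of the body, which can
// differ from String.prototype.length for non-ASCII content.
function contentLengthFor(body) {
  return Buffer.byteLength(body, 'utf8');
}

// Illustrative payload only; the model ID is made up.
const original = JSON.stringify({
  model: 'nemotron-demo',
  messages: [{ role: 'user', content: 'h\u00e9llo' }],
});

// Adding the kwarg changes the body size, so a header computed for the
// original body would no longer match what is actually sent.
const patched = JSON.stringify(
  Object.assign(JSON.parse(original), {
    chat_template_kwargs: { force_nonempty_content: true },
  })
);

console.log('h\u00e9llo'.length, contentLengthFor('h\u00e9llo')); // 5 characters, 6 bytes
console.log(contentLengthFor(original) < contentLengthFor(patched)); // true
```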

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

Bug Fixes
- Improved Nemotron model inference in chat completions by automatically injecting `chat_template_kwargs: { force_nonempty_content: true }` into API requests.
Tests
- Introduced comprehensive test coverage for the Nemotron inference fix, validating request interception behavior, model-specific modifications, and integration with system environment configuration.

…IDIA#1193, NVIDIA#2051)

Nemotron models sometimes return empty content (tool call instead of text) or thinking-only blocks that OpenClaw treats as end-of-turn, stalling the conversation. The root cause is the vLLM/NIM chat template producing empty assistant content when tool definitions are present.

Add a Node.js --require preload that intercepts /v1/chat/completions requests and injects `chat_template_kwargs: { force_nonempty_content: true }` when the model ID contains "nemotron". This tells the serving layer to always produce non-empty content alongside tool calls or thinking blocks.

The preload follows the established http-proxy-fix pattern: IIFE, emitted via emit_sandbox_sourced_file (root:root 444), registered in NODE_OPTIONS for both the main process and connect sessions, validated by validate_tmp_permissions in both root and non-root paths.

Non-Nemotron requests pass through completely untouched. Backends that do not support the extra field silently ignore it per the OpenAI-compatible contract.

Fixes NVIDIA#1193
Fixes NVIDIA#2051

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-23T20:47:13Z

📝 Walkthrough

The PR adds a Node.js preload script that intercepts HTTP/HTTPS POST requests to /v1/chat/completions, injects chat_template_kwargs.force_nonempty_content=true for Nemotron models, recalculates Content-Length, and wires it into runtime startup via NODE_OPTIONS and proxy-env sourcing. Includes comprehensive validation tests and updates existing regression test assertions.
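The buffering and `Content-Length` steps in the walkthrough can be sketched roughly as below. This is an assumption-laden illustration rather than the PR's code: `bufferAndRewrite` is a hypothetical name, `rewrite` stands in for whatever body transformation is applied, and the surrounding wrapper around `http.request` / `https.request` (method and path checks) is left out.

```javascript
// Hypothetical sketch: patch write/end on a request-like object so the body
// is buffered, rewritten once, and sent with a matching Content-Length.
// `req` is expected to expose write, end, and setHeader like http.ClientRequest.
function bufferAndRewrite(req, rewrite) {
  const origWrite = req.write;
  const origEnd = req.end;
  const chunks = [];

  const collect = (chunk, encoding) => {
    if (chunk) chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, encoding));
  };

  // Hold chunks back instead of sending them immediately.
  req.write = function (chunk, encoding, callback) {
    if (typeof encoding === 'function') { callback = encoding; encoding = undefined; }
    collect(chunk, encoding);
    if (typeof callback === 'function') process.nextTick(callback);
    return true;
  };

  req.end = function (chunk, encoding, callback) {
    if (typeof chunk === 'function') { callback = chunk; chunk = undefined; encoding = undefined; }
    else if (typeof encoding === 'function') { callback = encoding; encoding = undefined; }
    collect(chunk, encoding);

    const originalBuf = Buffer.concat(chunks);
    const text = originalBuf.toString('utf8');
    let out = originalBuf;
    try {
      const rewritten = rewrite(text);
      // Only re-encode when the body actually changed.
      if (rewritten !== text) out = Buffer.from(rewritten, 'utf8');
    } catch (err) {
      out = originalBuf; // fall back to the untouched body on any error
    }

    // Restore the original methods before the single real send.
    req.write = origWrite;
    req.end = origEnd;
    if (!req.headersSent) req.setHeader('Content-Length', out.length);
    return origEnd.call(req, out, callback);
  };

  return req;
}
```

In a real preload this would presumably be applied only to POST requests whose path is `/v1/chat/completions`, with all other requests left unpatched.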

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Nemotron Inference Fix Implementation `scripts/nemoclaw-start.sh` | Introduces `_NEMOTRON_FIX_SCRIPT` preload that wraps `http` and `https` modules, intercepts POST requests to `/v1/chat/completions`, conditionally injects `force_nonempty_content=true` into `chat_template_kwargs` for Nemotron models (case-insensitive matching), updates `Content-Length` after body modification, falls back safely on parse errors, and applies preload to interactive sessions via `NODE_OPTIONS` export in proxy-env. Extends sandbox trust-boundary validation to include the new script alongside existing proxy-fix validation. |
| Nemotron Fix Validation Tests `test/nemotron-inference-fix.test.ts` | New test suite validates preload definition via heredoc, verifies HTTP/HTTPS wrapping, confirms POST-only interception of `/v1/chat/completions`, checks case-insensitive Nemotron model detection, confirms `force_nonempty_content` injection into `chat_template_kwargs`, validates passthrough for non-Nemotron requests and JSON parse errors, confirms `Content-Length` recalculation, and asserts preload ordering before WebSocket fix. |
| Service Environment Regression Tests `test/service-env.test.ts` | Updates proxy-environment test scenarios to inject `_NEMOTRON_FIX_SCRIPT` into sandboxed wrapper; refocuses one assertion to verify that when `NODE_USE_ENV_PROXY` is not `1` and ws-fix is absent, proxy/ws preloads remain absent while `nemotron-inference-fix` is unconditionally present in `proxy-env.sh`. |
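The conditional versus unconditional preload registration described for `proxy-env.sh` can be modeled as below. The actual logic lives in a shell script; this JavaScript sketch only mirrors the behavior the table describes. `selectPreloads`, `buildNodeOptions`, the file paths, and the relative order of the proxy and Nemotron preloads are all assumptions made for illustration.

```javascript
// Hypothetical model of the preload selection (the real logic is in shell).
// Assumption: the proxy fix is gated on NODE_USE_ENV_PROXY=1, the Nemotron
// fix is always registered, and the WebSocket fix (if any) comes after it.
function selectPreloads(env, paths) {
  const list = [];
  if (env.NODE_USE_ENV_PROXY === '1' && paths.proxyFix) list.push(paths.proxyFix);
  list.push(paths.nemotronFix); // unconditional
  if (paths.wsFix) list.push(paths.wsFix);
  return list;
}

// Append one --require flag per preload to an existing NODE_OPTIONS value.
function buildNodeOptions(existing, preloads) {
  const parts = existing && existing.trim() ? [existing.trim()] : [];
  for (const file of preloads) {
    const flag = `--require ${file}`;
    if (!parts.includes(flag)) parts.push(flag);
  }
  return parts.join(' ');
}

// Example with made-up paths: no proxy env and no ws-fix leaves only the Nemotron preload.
const paths = { proxyFix: '/tmp/example-proxy-fix.js', nemotronFix: '/tmp/example-nemotron-fix.js' };
console.log(buildNodeOptions('', selectPreloads({}, paths)));
```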

Sequence Diagram

sequenceDiagram
    participant Client
    participant NodePreload as Nemotron Fix Preload
    participant HTTPModule as HTTP/HTTPS Module
    participant RequestHandler as Request Parser
    participant ChatAPI as Chat Completions API
    participant ResponseHandler as Response Modifier

    Client->>NodePreload: POST /v1/chat/completions<br/>(Nemotron model + tools)
    NodePreload->>HTTPModule: Intercept outgoing request
    HTTPModule->>RequestHandler: Buffer request body
    RequestHandler->>RequestHandler: Parse JSON request
    alt Is Nemotron Model?
        RequestHandler->>ResponseHandler: Inject force_nonempty_content=true<br/>into chat_template_kwargs
        ResponseHandler->>ResponseHandler: Recalculate Content-Length
        ResponseHandler->>ChatAPI: Forward modified request
    else Non-Nemotron Model
        RequestHandler->>ChatAPI: Forward original request
    end
    ChatAPI-->>ResponseHandler: Response with content
    ResponseHandler-->>Client: Return response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A preload script hops through the wire,
Catching Nemotron's silent desire,
"Force content!" it whispers with care,
Where empty responses once hung in the air,
HTTP POST requests now dance and redirect,
Tool calls transform—no silence, just text! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main change: injecting force_nonempty_content for Nemotron models, which aligns with the PR's primary objective to fix empty content issues. |
| Linked Issues check | ✅ Passed | The PR implements HTTP request interception to inject force_nonempty_content for Nemotron models, directly addressing both `#1193` (empty content with tool calls) and `#2051` (thinking-only blocks causing stalls). |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to implementing the Nemotron inference fix: the preload script, test coverage, and service-env updates. No unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai

🧹 Nitpick comments (1)

scripts/nemoclaw-start.sh (1)
979-979: Unused variable intercepted should be prefixed with _.

The intercepted variable is assigned on line 1009 but never read. Per coding guidelines, unused variables must be prefixed with _.
🔧 Suggested fix
       var origEnd = req.end;
       var chunks = [];
-      var intercepted = false;
+      var _intercepted = false;
And correspondingly on line 1009:
-            intercepted = true;
+            _intercepted = true;
Alternatively, if debugging or metrics are planned, consider removing the variable entirely until needed.

As per coding guidelines: "Unused variables must be prefixed with _ in JavaScript and TypeScript files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` at line 979, The variable intercepted is declared
but never used; rename it to _intercepted to follow the unused-variable
convention (or remove the declaration entirely if not needed). Locate the
declaration "var intercepted = false;" and either change the identifier to
"_intercepted" or delete that statement, and also update the assignment site
referenced later (where intercepted is set) to use "_intercepted" if you keep
it; ensure no other code reads the original name before committing.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a1bc483a-3a2d-40ce-9e8b-490f03e5f26b

📥 Commits

Reviewing files that changed from the base of the PR and between a3aafcd and 6daf9ae.

📒 Files selected for processing (3)

scripts/nemoclaw-start.sh
test/nemotron-inference-fix.test.ts
test/service-env.test.ts

Merge branch 'main' into fix/nemotron-force-nonempty-content

bfa79eb

coderabbitai Bot reviewed Apr 23, 2026

cv approved these changes Apr 23, 2026

ericksoa merged commit 225a3be into NVIDIA:main Apr 23, 2026
19 checks passed

wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026

ericksoa merged 2 commits into NVIDIA:main from ericksoa:fix/nemotron-force-nonempty-content

3 participants
