
fix(inference): inject force_nonempty_content for Nemotron models#2380

Merged
ericksoa merged 2 commits into NVIDIA:main from ericksoa:fix/nemotron-force-nonempty-content on Apr 23, 2026

ericksoa (Contributor) commented Apr 23, 2026

Summary

  • Adds a Node.js --require preload that injects chat_template_kwargs: { force_nonempty_content: true } into /v1/chat/completions requests when the model ID contains "nemotron"
  • Follows the established http-proxy-fix.js preload pattern (IIFE, emit_sandbox_sourced_file, validate_tmp_permissions, registered in NODE_OPTIONS)
  • Non-Nemotron models pass through completely untouched; backends that don't support the extra field silently ignore it
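The body rewrite described in the summary can be sketched as a pure function. This is a minimal illustration, not the preload's actual code: the names `isNemotronModel` and `injectForceNonemptyContent` are hypothetical, and the sketch assumes the request body is a complete UTF-8 JSON string. Matching is a case-insensitive substring check, and any `chat_template_kwargs` keys the caller already set are preserved.

```javascript
"use strict";

// Hypothetical helper names; assumes the body is a complete UTF-8 JSON string.
function isNemotronModel(model) {
  // Case-insensitive substring match on the model ID.
  return typeof model === "string" && model.toLowerCase().includes("nemotron");
}

function injectForceNonemptyContent(body) {
  let parsed;
  try {
    parsed = JSON.parse(body);
  } catch (_err) {
    return body; // Not valid JSON: forward the original body untouched.
  }
  if (parsed === null || typeof parsed !== "object" || !isNemotronModel(parsed.model)) {
    return body; // Non-Nemotron model (or unexpected shape): pass through.
  }
  const kwargs = parsed.chat_template_kwargs;
  const existing =
    kwargs !== null && typeof kwargs === "object" && !Array.isArray(kwargs) ? kwargs : {};
  // Keep any kwargs the caller already set and add the flag on top.
  parsed.chat_template_kwargs = { ...existing, force_nonempty_content: true };
  return JSON.stringify(parsed);
}

// Usage with a made-up model ID, to show the case-insensitive match.
const demoBody = JSON.stringify({ model: "example/Nemotron-Demo", messages: [] });
console.log(injectForceNonemptyContent(demoBody));
// {"model":"example/Nemotron-Demo","messages":[],"chat_template_kwargs":{"force_nonempty_content":true}}
```

Keeping the rewrite separate from the HTTP plumbing makes the passthrough and parse-error cases easy to unit-test without opening sockets.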

Context

Nemotron models sometimes return empty content (tool call instead of text, #1193) or thinking-only blocks that OpenClaw treats as end-of-turn, stalling the conversation (#2051). The root cause is the vLLM/NIM chat template producing empty assistant content when tool definitions are present.

The force_nonempty_content chat template kwarg (documented on the Hugging Face model card) tells the serving layer to always produce non-empty content alongside any tool calls or thinking blocks. OpenClaw's model config schema doesn't support extra_body passthrough, so the preload injects the parameter at the HTTP layer, transparently to OpenClaw.

Fixes #1193
Fixes #2051

Test plan

  • New test file test/nemotron-inference-fix.test.ts (13 tests) — verifies heredoc embedding, NODE_OPTIONS registration, proxy-env inclusion, validate_tmp_permissions in both paths, model matching, body injection, passthrough, error handling, Content-Length update
  • Updated test/service-env.test.ts — adds _NEMOTRON_FIX_SCRIPT to proxy-env test harnesses, updates assertion to allow the unconditional Nemotron preload
  • Existing test/nemoclaw-start.test.ts (61 tests) — all pass
  • Existing test/http-proxy-fix-sync.test.ts (5 tests) — all pass
  • E2E: onboard with Nemotron model, send a simple "hello" message, verify text response (not empty/tool-call-only)
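The Content-Length check in the test plan matters because injecting the field changes the body size, and a stale header value would misframe the request. A small sketch of the recomputation, assuming headers are held in a plain object; the helper name `withUpdatedContentLength` is hypothetical. The key point is that Content-Length counts bytes, not JavaScript string length.

```javascript
"use strict";

// Rebuild a headers object after the body changes. Content-Length counts
// bytes, so it is computed with Buffer.byteLength rather than string length.
// Any existing Content-Length key is dropped regardless of its casing.
function withUpdatedContentLength(headers, body) {
  const updated = {};
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() !== "content-length") updated[name] = value;
  }
  updated["Content-Length"] = Buffer.byteLength(body, "utf8");
  return updated;
}

// "\u00e9" is one UTF-16 code unit but two UTF-8 bytes.
const accented = "h\u00e9llo";
console.log(accented.length, Buffer.byteLength(accented, "utf8")); // 5 6
console.log(withUpdatedContentLength({ "content-length": "5", Host: "localhost" }, accented));
```

Using `body.length` instead would undercount any prompt containing non-ASCII text.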

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved Nemotron model inference in chat completions by automatically injecting chat_template_kwargs.force_nonempty_content into API requests, so the model returns non-empty content alongside tool calls or thinking blocks.
  • Tests

    • Introduced comprehensive test coverage for the Nemotron inference fix, validating request interception behavior, model-specific modifications, and integration with system environment configuration.

…IDIA#1193, NVIDIA#2051)

Nemotron models sometimes return empty content (tool call instead of
text) or thinking-only blocks that OpenClaw treats as end-of-turn,
stalling the conversation. The root cause is the vLLM/NIM chat template
producing empty assistant content when tool definitions are present.

Add a Node.js --require preload that intercepts /v1/chat/completions
requests and injects `chat_template_kwargs: { force_nonempty_content:
true }` when the model ID contains "nemotron". This tells the serving
layer to always produce non-empty content alongside tool calls or
thinking blocks.

The preload follows the established http-proxy-fix pattern: IIFE,
emitted via emit_sandbox_sourced_file (root:root 444), registered in
NODE_OPTIONS for both the main process and connect sessions, validated
by validate_tmp_permissions in both root and non-root paths. Non-Nemotron
requests pass through completely untouched. Backends that do not support
the extra field silently ignore it per the OpenAI-compatible contract.

Fixes NVIDIA#1193
Fixes NVIDIA#2051

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
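The commit message notes that the preload is registered in NODE_OPTIONS for both the main process and connect sessions, and the updated service-env test expects it to be present unconditionally while the proxy preload stays conditional. The startup script does this in shell; the JavaScript sketch below only illustrates an idempotent composition under stated assumptions. `appendRequire`, `buildNodeOptions`, the example paths, and the `useEnvProxy` switch are all hypothetical, and paths are assumed to contain no whitespace.

```javascript
"use strict";

// Hypothetical sketch: the real registration happens in the shell startup
// script. Paths and the useEnvProxy flag are illustrative assumptions, and
// paths are assumed to contain no whitespace.
function appendRequire(nodeOptions, scriptPath) {
  const tokens = (nodeOptions || "").split(/\s+/).filter(Boolean);
  for (let i = 0; i < tokens.length - 1; i++) {
    if (tokens[i] === "--require" && tokens[i + 1] === scriptPath) {
      return tokens.join(" "); // Already registered: avoid loading it twice.
    }
  }
  return tokens.concat(["--require", scriptPath]).join(" ");
}

function buildNodeOptions(existing, { nemotronFixPath, proxyFixPath, useEnvProxy }) {
  let options = existing || "";
  if (useEnvProxy) options = appendRequire(options, proxyFixPath); // conditional
  return appendRequire(options, nemotronFixPath); // always registered
}

console.log(
  buildNodeOptions("--max-old-space-size=4096", {
    nemotronFixPath: "/tmp/nemotron-fix.js",
    proxyFixPath: "/tmp/proxy-fix.js",
    useEnvProxy: false,
  })
);
// --max-old-space-size=4096 --require /tmp/nemotron-fix.js
```

The duplicate check matters when the same environment file is sourced more than once, for example by both the main process and a later connect session.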
coderabbitai Bot (Contributor) commented Apr 23, 2026

📝 Walkthrough

The PR adds a Node.js preload script that intercepts HTTP/HTTPS POST requests to /v1/chat/completions, injects chat_template_kwargs.force_nonempty_content=true for Nemotron models, recalculates Content-Length, and wires it into runtime startup via NODE_OPTIONS and proxy-env sourcing. Includes comprehensive validation tests and updates existing regression test assertions.
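Per the walkthrough, only POST requests to /v1/chat/completions are intercepted. A hedged sketch of that check: `shouldIntercept` is a hypothetical name, and the suffix match (so a path prefix in front of the endpoint still matches) and the query-string stripping are assumptions rather than confirmed behavior of the preload.

```javascript
"use strict";

// Sketch of the interception predicate: only POST requests to the chat
// completions endpoint are candidates for rewriting. Assumptions: the method
// defaults to GET as in http.request(), the query string is ignored, and a
// path ending in /v1/chat/completions also matches (e.g. behind a prefix).
function shouldIntercept(method, path) {
  const verb = String(method || "GET").toUpperCase();
  const pathname = String(path || "/").split("?")[0];
  return verb === "POST" && pathname.endsWith("/v1/chat/completions");
}

console.log(shouldIntercept("POST", "/v1/chat/completions")); // true
console.log(shouldIntercept("post", "/v1/chat/completions?x=1")); // true
console.log(shouldIntercept("GET", "/v1/chat/completions")); // false
console.log(shouldIntercept("POST", "/v1/completions")); // false
```

Checking the method and path before buffering anything keeps every other request on its original, unbuffered code path.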

Changes

| Cohort / File(s) | Summary |
|---|---|
| Nemotron Inference Fix Implementation (scripts/nemoclaw-start.sh) | Introduces _NEMOTRON_FIX_SCRIPT preload that wraps http and https modules, intercepts POST requests to /v1/chat/completions, conditionally injects force_nonempty_content=true into chat_template_kwargs for Nemotron models (case-insensitive matching), updates Content-Length after body modification, falls back safely on parse errors, and applies preload to interactive sessions via NODE_OPTIONS export in proxy-env. Extends sandbox trust-boundary validation to include the new script alongside existing proxy-fix validation. |
| Nemotron Fix Validation Tests (test/nemotron-inference-fix.test.ts) | New test suite validates preload definition via heredoc, verifies HTTP/HTTPS wrapping, confirms POST-only interception of /v1/chat/completions, checks case-insensitive Nemotron model detection, confirms force_nonempty_content injection into chat_template_kwargs, validates passthrough for non-Nemotron requests and JSON parse errors, confirms Content-Length recalculation, and asserts preload ordering before WebSocket fix. |
| Service Environment Regression Tests (test/service-env.test.ts) | Updates proxy-environment test scenarios to inject _NEMOTRON_FIX_SCRIPT into sandboxed wrapper; refocuses one assertion to verify that when NODE_USE_ENV_PROXY is not 1 and ws-fix is absent, proxy/ws preloads remain absent while nemotron-inference-fix is unconditionally present in proxy-env.sh. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant NodePreload as Nemotron Fix Preload
    participant HTTPModule as HTTP/HTTPS Module
    participant RequestHandler as Request Parser
    participant ChatAPI as Chat Completions API
    participant ResponseHandler as Request Rewriter

    Client->>NodePreload: POST /v1/chat/completions<br/>(Nemotron model + tools)
    NodePreload->>HTTPModule: Intercept outgoing request
    HTTPModule->>RequestHandler: Buffer request body
    RequestHandler->>RequestHandler: Parse JSON request
    alt Is Nemotron Model?
        RequestHandler->>ResponseHandler: Inject force_nonempty_content=true<br/>into chat_template_kwargs
        ResponseHandler->>ResponseHandler: Recalculate Content-Length
        ResponseHandler->>ChatAPI: Forward modified request
    else Non-Nemotron Model
        RequestHandler->>ChatAPI: Forward original request
    end
    ChatAPI-->>ResponseHandler: Response with content
    ResponseHandler-->>Client: Return response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A preload script hops through the wire,
Catching Nemotron's silent desire,
"Force content!" it whispers with care,
Where empty responses once hung in the air,
HTTP POST requests now dance and redirect,
Tool calls transform—no silence, just text! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main change: injecting force_nonempty_content for Nemotron models, which aligns with the PR's primary objective to fix empty content issues. |
| Linked Issues check | ✅ Passed | The PR implements HTTP request interception to inject force_nonempty_content for Nemotron models, directly addressing both #1193 (empty content with tool calls) and #2051 (thinking-only blocks causing stalls). |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to implementing the Nemotron inference fix: the preload script, test coverage, and service-env updates. No unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai Bot left a comment

🧹 Nitpick comments (1)
scripts/nemoclaw-start.sh (1)

Line 979: Unused variable intercepted should be prefixed with _.

The intercepted variable is assigned on line 1009 but never read. Per coding guidelines, unused variables must be prefixed with _.

🔧 Suggested fix
       var origEnd = req.end;
       var chunks = [];
-      var intercepted = false;
+      var _intercepted = false;

And correspondingly on line 1009:

-            intercepted = true;
+            _intercepted = true;

Alternatively, if debugging or metrics are planned, consider removing the variable entirely until needed.

As per coding guidelines: "Unused variables must be prefixed with _ in JavaScript and TypeScript files".
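The nitpick concerns a flag inside the preload's request-buffering wrapper (`origEnd`, `chunks`). The PR's actual wrapper is not reproduced here; the following is a minimal sketch of the same buffer-then-rewrite idea with no tracking flag at all, assuming string or Buffer chunks and a caller-supplied `transform`. The name `bufferAndRewrite` is hypothetical.

```javascript
"use strict";

// Minimal sketch (hypothetical name): buffer every write to an outgoing
// request and rewrite the complete body once end() is called. Works with any
// object exposing write/end/setHeader/headersSent, such as http.ClientRequest.
// Assumes chunks are strings or Buffers.
function bufferAndRewrite(req, transform) {
  const origWrite = req.write;
  const origEnd = req.end;
  const chunks = [];

  const toBuffer = (chunk, encoding) =>
    Buffer.isBuffer(chunk) ? chunk : Buffer.from(String(chunk), encoding || "utf8");

  req.write = function (chunk, encoding, cb) {
    if (typeof encoding === "function") {
      cb = encoding;
      encoding = undefined;
    }
    if (chunk != null) chunks.push(toBuffer(chunk, encoding));
    if (typeof cb === "function") cb(); // Simplification: called before any flush.
    return true;
  };

  req.end = function (chunk, encoding, cb) {
    if (typeof chunk === "function") {
      cb = chunk;
      chunk = undefined;
    } else if (typeof encoding === "function") {
      cb = encoding;
      encoding = undefined;
    }
    if (chunk != null) chunks.push(toBuffer(chunk, encoding));
    const body = transform(Buffer.concat(chunks).toString("utf8"));
    // The body size may have changed, so refresh Content-Length before sending.
    if (!req.headersSent) req.setHeader("Content-Length", Buffer.byteLength(body, "utf8"));
    origWrite.call(req, body);
    return origEnd.call(req, cb);
  };

  return req;
}
```

Write callbacks fire immediately here instead of after the data is flushed, which is a simplification; a production wrapper would defer them and would apply the wrapper only to POST requests for /v1/chat/completions.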

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` at line 979: the variable intercepted is declared
but never used; rename it to _intercepted to follow the unused-variable
convention (or remove the declaration entirely if not needed). Locate the
declaration "var intercepted = false;" and either change the identifier to
"_intercepted" or delete that statement, and also update the assignment site
referenced later (where intercepted is set) to use "_intercepted" if you keep
it; ensure no other code reads the original name before committing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a1bc483a-3a2d-40ce-9e8b-490f03e5f26b

📥 Commits

Reviewing files that changed from the base of the PR and between a3aafcd and 6daf9ae.

📒 Files selected for processing (3)
  • scripts/nemoclaw-start.sh
  • test/nemotron-inference-fix.test.ts
  • test/service-env.test.ts

@ericksoa ericksoa merged commit 225a3be into NVIDIA:main Apr 23, 2026
19 checks passed
@wscurran wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026
3 participants