Summary
I got local vLLM (Nemotron Nano 9B v2) working from inside a NemoClaw sandbox on WSL2 + RTX 5090. The provider system (--type openai with custom OPENAI_BASE_URL) works at the API level, but the sandbox network isolation requires three manual workarounds to let traffic through:
- Host iptables — Allow Docker bridge → vLLM port in
DOCKER-USER chain
- TCP relay — Python relay in pod main namespace to bridge sandbox veth → Docker bridge
- Sandbox iptables —
nsenter to inject ACCEPT rule before the OUTPUT REJECT
All three are volatile (reset on restart). Network policy update is the only persistent piece.
Beyond basic inference, I also got tool call execution working with the opencode agent inside the sandbox. Nemotron 9B outputs tool calls as <TOOLCALL>[...]</TOOLCALL> text, which isn't compatible with OpenAI's structured tool_calls format. A custom gateway between the agent and vLLM buffers SSE streams and translates the text format into structured tool call objects, enabling the agent to actually execute tools (file read/write, shell commands, etc.) via local inference.
References
Environment
- WSL2 (Ubuntu 24.04), RTX 5090, CUDA 13.1
- openshell 0.0.7
- vLLM 0.15.1+ (OpenAI-compatible API)
Context
Not a bug report or feature request — just sharing what it took to make this work in case it's useful for the team. Related: #305.
Summary
I got local vLLM (Nemotron Nano 9B v2) working from inside a NemoClaw sandbox on WSL2 + RTX 5090. The provider system (
--type openaiwith customOPENAI_BASE_URL) works at the API level, but the sandbox network isolation requires three manual workarounds to let traffic through:DOCKER-USERchainnsenterto inject ACCEPT rule before the OUTPUT REJECTAll three are volatile (reset on restart). Network policy update is the only persistent piece.
Beyond basic inference, I also got tool call execution working with the opencode agent inside the sandbox. Nemotron 9B outputs tool calls as
<TOOLCALL>[...]</TOOLCALL>text, which isn't compatible with OpenAI's structuredtool_callsformat. A custom gateway between the agent and vLLM buffers SSE streams and translates the text format into structured tool call objects, enabling the agent to actually execute tools (file read/write, shell commands, etc.) via local inference.References
Environment
Context
Not a bug report or feature request — just sharing what it took to make this work in case it's useful for the team. Related: #305.