Skip to content

Local vLLM inference from sandbox on WSL2 + RTX 5090 — workaround documented #315

@soy-tuber

Description

@soy-tuber

Summary

I got local vLLM (Nemotron Nano 9B v2) working from inside a NemoClaw sandbox on WSL2 + RTX 5090. The provider system (--type openai with custom OPENAI_BASE_URL) works at the API level, but the sandbox network isolation requires three manual workarounds to let traffic through:

  1. Host iptables — Allow Docker bridge → vLLM port in DOCKER-USER chain
  2. TCP relay — Python relay in pod main namespace to bridge sandbox veth → Docker bridge
  3. Sandbox iptablesnsenter to inject ACCEPT rule before the OUTPUT REJECT

All three are volatile (reset on restart). Network policy update is the only persistent piece.

Beyond basic inference, I also got tool call execution working with the opencode agent inside the sandbox. Nemotron 9B outputs tool calls as <TOOLCALL>[...]</TOOLCALL> text, which isn't compatible with OpenAI's structured tool_calls format. A custom gateway between the agent and vLLM buffers SSE streams and translates the text format into structured tool call objects, enabling the agent to actually execute tools (file read/write, shell commands, etc.) via local inference.

References

Environment

  • WSL2 (Ubuntu 24.04), RTX 5090, CUDA 13.1
  • openshell 0.0.7
  • vLLM 0.15.1+ (OpenAI-compatible API)

Context

Not a bug report or feature request — just sharing what it took to make this work in case it's useful for the team. Related: #305.

Metadata

Metadata

Assignees

No one assigned

    Labels

    platform: wslAffects Windows Subsystem for Linux

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions