WSL2: restarting openshell-cluster breaks gateway/sandbox connectivity and forces re-onboarding #716

@BartekBis

Description

Actual result:

  • openshell status fails with:
    Connection reset by peer (os error 104)
  • openshell sandbox list may briefly show the sandbox in Provisioning, then fails with a transport error caused by the same Connection reset by peer (os error 104)
  • nemoclaw openclaw-sandbox status still reports the sandbox as Ready
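The mismatch described above (gateway unreachable while nemoclaw still reports Ready) could be checked with a small script. This is only a sketch: the function names are hypothetical, the "Phase:" line format is taken from the logs in this report, and it assumes that openshell status exits non-zero when it prints the connection error, which has not been verified.

```shell
#!/bin/sh
# Sketch: flag a stale "Ready" report while the gateway is unreachable.
# SANDBOX defaults to the sandbox name used in this report.
SANDBOX="${SANDBOX:-openclaw-sandbox}"

# Read `nemoclaw <sandbox> status` output on stdin and print the first
# "Phase:" value (format taken from the logs in this issue).
phase_from_status() {
  awk '$1 == "Phase:" { print $2; exit }'
}

# Compare the phase reported by nemoclaw with gateway reachability.
# Assumption: `openshell status` exits non-zero on the transport error.
check_stale_status() {
  phase="$(nemoclaw "$SANDBOX" status 2>/dev/null | phase_from_status)"
  if openshell status >/dev/null 2>&1; then
    echo "gateway reachable, reported phase: ${phase:-unknown}"
  elif [ "$phase" = "Ready" ]; then
    echo "stale: gateway unreachable but status reports Ready"
    return 1
  else
    echo "gateway unreachable, reported phase: ${phase:-unknown}"
    return 1
  fi
}
```

After sourcing the file, running check_stale_status right after the container restart would print the "stale" line if the behavior in this report is reproduced.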

Expected result:

  • Restarting the OpenShell cluster container should either preserve a usable gateway/sandbox state or clearly require reattachment/recovery without forcing a full re-onboard.

Notes:

  • Docker volume is mounted correctly at /var/lib/rancher/k3s
  • Container is healthy before the restart
  • Full nemoclaw onboard recreates a working environment, but only until the next container restart
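The first two notes can be re-checked from the command line with docker inspect. The following is a rough sketch, not part of NemoClaw or OpenShell: the function names are hypothetical, and the container name is the one used in the reproduction steps.

```shell
#!/bin/sh
# Sketch: verify container health and the k3s volume mount.
# CONTAINER defaults to the container name used in this report.
CONTAINER="${CONTAINER:-openshell-cluster-nemoclaw}"

# Print the health status, or "no-healthcheck" if the image defines none.
container_health() {
  docker inspect \
    --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}no-healthcheck{{end}}' \
    "$CONTAINER"
}

# Succeed only if something is mounted at /var/lib/rancher/k3s.
has_k3s_mount() {
  docker inspect \
    --format '{{range .Mounts}}{{println .Destination}}{{end}}' \
    "$CONTAINER" | grep -qx '/var/lib/rancher/k3s'
}
```

Running both functions before and after docker restart would show whether the health status or the mount list changes across the restart.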

Reproduction Steps

  1. Run nemoclaw onboard
  2. Create sandbox openclaw-sandbox
  3. Verify:
    • openshell sandbox list shows openclaw-sandbox in Ready
    • nemoclaw openclaw-sandbox status shows Ready
  4. Run:
    docker restart openshell-cluster-nemoclaw
  5. Then run:
    • openshell status
    • openshell sandbox list
    • nemoclaw openclaw-sandbox status
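Steps 4 and 5 can be scripted so that the gateway is polled for a while after the restart, which could help tell a slow restart apart from a permanent failure. This is a minimal sketch under stated assumptions: wait_for_gateway and reproduce are hypothetical helper names, the attempt count and delay are arbitrary, and openshell status is assumed to exit non-zero while the gateway is unreachable.

```shell
#!/bin/sh
# Sketch: restart the cluster container, then poll the gateway.
# Names default to the ones used in this report.
CONTAINER="${CONTAINER:-openshell-cluster-nemoclaw}"
SANDBOX="${SANDBOX:-openclaw-sandbox}"

# Poll `openshell status` until it succeeds or attempts run out.
# $1 = max attempts (default 30), $2 = delay in seconds (default 2).
wait_for_gateway() {
  attempts="${1:-30}"
  delay="${2:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if openshell status >/dev/null 2>&1; then
      echo "gateway reachable after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gateway still unreachable after ${attempts} attempt(s)"
  return 1
}

# Steps 4 and 5 of the reproduction, with polling in place of a single check.
reproduce() {
  docker restart "$CONTAINER" || return 1
  wait_for_gateway 30 2
  openshell sandbox list
  nemoclaw "$SANDBOX" status
}
```

Calling reproduce after a successful onboard runs the restart and prints the three status outputs in the same order as step 5.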

Environment

OS:

  • Windows 10 Pro
  • WSL2 (Ubuntu)
  • Docker Desktop (WSL2 backend)

OpenShell:

  • Version: 0.0.13
  • Gateway image: ghcr.io/nvidia/openshell/cluster:0.0.13

NemoClaw:

  • Installed via onboarding script (latest available as of March 2026)

Docker:

  • Engine: Docker Desktop
  • Container runtime: docker-desktop
  • Storage: Docker volume mounted at /var/lib/rancher/k3s

Node.js:

  • Not explicitly used / not relevant to this setup

GPU:

  • NVIDIA GPU detected
  • ~11 GB VRAM

Networking:

Architecture:

  • Windows host
    → WSL2
    → Docker container (openshell-cluster)
    → k3s / containerd
    → OpenShell sandbox
    → OpenClaw inside sandbox

Debug Output

Logs

$ nemoclaw onboard

  NemoClaw Onboarding
  ===================

  [1/7] Preflight checks
  ──────────────────────────────────────────────────
  ✓ Docker is running
  ✓ Container runtime: docker-desktop
  ✓ openshell CLI: openshell 0.0.13
  ✓ Port 8080 available (OpenShell gateway)
  ✓ Port 18789 available (NemoClaw dashboard)
  ✓ NVIDIA GPU detected: 1 GPU(s), ~11 GB VRAM

  [2/7] Starting OpenShell gateway
  ──────────────────────────────────────────────────
  Using pinned OpenShell gateway image: ghcr.io/nvidia/openshell/cluster:0.0.13
  ✓ Checking Docker
  ✓ Downloading gateway
  ✓ Initializing environment
  ✓ Starting gateway
  ✓ Gateway ready

    Name: nemoclaw
    Endpoint: https://127.0.0.1:8080

  ✓ Active gateway set to 'nemoclaw'
    ✓ Gateway is healthy

  [3/7] Creating sandbox
  ──────────────────────────────────────────────────
  Sandbox name (lowercase, numbers, hyphens) [my-assistant]: openclaw-sandbox
  Sandbox 'openclaw-sandbox' already exists. Recreate? [y/N]: y
  Creating sandbox 'openclaw-sandbox' (this takes a few minutes on first run)...
  Building image openshell/sandbox-from:<redacted> from /tmp/nemoclaw-build-<redacted>/Dockerfile
  Built image openshell/sandbox-from:<redacted>
  Pushing image openshell/sandbox-from:<redacted> into gateway "nemoclaw"
  [progress] Exported 1173 MiB
  [progress] Uploaded to gateway
  Image openshell/sandbox-from:<redacted> is available in the gateway.
  Waiting for sandbox to become ready...
  ✓ Forwarding port 18789 to sandbox openclaw-sandbox in the background
    Access at: http://127.0.0.1:18789/
    Stop with: openshell forward stop 18789 openclaw-sandbox
    ✓ Sandbox 'openclaw-sandbox' created

  [4/7] Configuring inference (NIM)
  ──────────────────────────────────────────────────

  Cloud models:
    1) Nemotron 3 Super 120B (nvidia/nemotron-3-super-120b-a12b)
    2) Kimi K2.5 (moonshotai/kimi-k2.5)
    3) GLM-5 (z-ai/glm5)
    4) MiniMax M2.5 (minimaxai/minimax-m2.5)
    5) Qwen3.5 397B A17B (qwen/qwen3.5-397b-a17b)
    6) GPT-OSS 120B (openai/gpt-oss-120b)

  Choose model [1]:
  Using NVIDIA Endpoint API with model: nvidia/nemotron-3-super-120b-a12b

  [5/7] Setting up inference provider
  ──────────────────────────────────────────────────
  ✓ Created provider nvidia-nim
  Gateway inference configured:

    Route: inference.local
    Provider: nvidia-nim
    Model: nvidia/nemotron-3-super-120b-a12b
    Version: 1

    ✓ Inference route set: nvidia-nim / nvidia/nemotron-3-super-120b-a12b

  [6/7] Setting up OpenClaw inside sandbox
  ──────────────────────────────────────────────────
    ✓ OpenClaw gateway launched inside sandbox

  [7/7] Policy presets
  ──────────────────────────────────────────────────

  Available policy presets:
    ○ discord — Discord API, gateway, and CDN access
    ○ docker — Docker Hub and NVIDIA container registry access
    ○ huggingface — Hugging Face Hub, LFS, and Inference API access
    ○ jira — Jira and Atlassian Cloud access
    ○ npm — npm and Yarn registry access (suggested)
    ○ outlook — Microsoft Outlook and Graph API access
    ○ pypi — Python Package Index (PyPI) access (suggested)
    ○ slack — Slack API and webhooks access
    ○ telegram — Telegram Bot API access

  Apply suggested presets (pypi, npm)? [Y/n/list]: list
  Enter preset names (comma-separated): npm,pypi,telegram
  ✓ Policy version submitted
  ✓ Policy version loaded
    Applied preset: npm
  ✓ Policy version submitted
  ✓ Policy version loaded
    Applied preset: pypi
  ✓ Policy version submitted
  ✓ Policy version loaded
    Applied preset: telegram
    ✓ Policies applied

  ──────────────────────────────────────────────────
  Sandbox      openclaw-sandbox (Landlock + seccomp + netns)
  Model        nvidia/nemotron-3-super-120b-a12b (NVIDIA Endpoint API)
  NIM          not running
  ──────────────────────────────────────────────────
  Next:
  Run:         nemoclaw openclaw-sandbox connect
  Status:      nemoclaw openclaw-sandbox status
  Logs:        nemoclaw openclaw-sandbox logs --follow
  ──────────────────────────────────────────────────

$ openshell sandbox list
NAME              NAMESPACE  CREATED              PHASE
openclaw-sandbox  openshell  2026-03-23 13:07:11  Ready

$ nemoclaw openclaw-sandbox status

  Sandbox: openclaw-sandbox
    Model:    nvidia/nemotron-3-super-120b-a12b
    Provider: nvidia-nim
    GPU:      yes
    Policies: npm, pypi, telegram

Sandbox:

  Id: <redacted>
  Name: openclaw-sandbox
  Namespace: openshell
  Phase: Ready

Policy:
  version: 1
  ...
  NIM: not running

$ docker restart openshell-cluster-nemoclaw
openshell-cluster-nemoclaw

$ openshell status
Server Status

  Gateway: nemoclaw
  Server: https://127.0.0.1:8080
Error:   × client error (Connect)
  ╰─▶ Connection reset by peer (os error 104)

$ openshell sandbox list
NAME              NAMESPACE  CREATED              PHASE
openclaw-sandbox  openshell  2026-03-23 13:07:11  Provisioning

$ openshell sandbox list
Error:   × transport error
  ├─▶ Connection reset by peer (os error 104)
  ╰─▶ Connection reset by peer (os error 104)

$ nemoclaw openclaw-sandbox status

  Sandbox: openclaw-sandbox
    Model:    nvidia/nemotron-3-super-120b-a12b
    Provider: nvidia-nim
    GPU:      yes
    Policies: npm, pypi, telegram

Sandbox:

  Id: <redacted>
  Name: openclaw-sandbox
  Namespace: openshell
  Phase: Ready

Policy:
  version: 1
  ...
  NIM: not running

Checklist

  • I confirmed this bug is reproducible
  • I searched existing issues and this is not a duplicate

Metadata

    Labels

    • area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery)
    • platform: wsl (Affects Windows Subsystem for Linux)
