WSL2 Support Tracking — Known Gaps & Workarounds #305

@rudiheydra

Summary

NemoClaw can run on WSL2 (Windows Subsystem for Linux 2), but requires significant manual intervention due to nested container networking issues. The embedded k3s cluster inside OpenShell cannot reliably reach container registries from within WSL2's virtualized network stack.

This issue tracks all known WSL2-specific bugs, workarounds, and proposed fixes in one place.

Full field guide with step-by-step fixes: Installing NemoClaw on WSL2: A Field Guide (link TBD)

Current WSL2 Status

| Feature | Status | Notes |
|---|---|---|
| Install NemoClaw CLI | ✅ Works | npm install succeeds |
| OpenShell gateway start | ❌ Fails | Timeout waiting for namespace (image pull failure) |
| Gateway with manual image seeding | ✅ Works | Requires crane + manual ctr import |
| TLS certificate bootstrap | ❌ Fails | Must generate X.509 v3 certs manually |
| nemoclaw onboard | ⚠️ Destructive | Destroys working gateway, starts fresh (hits same timeout) |
| Sandbox creation | ⚠️ Requires patch | imagePullPolicy: Always must be changed to IfNotPresent |
| NVIDIA Cloud NIM inference | ✅ Works | Once sandbox is running |
| Local Ollama inference | ❓ Untested | |

Root Cause

WSL2 runs a Linux kernel inside a Hyper-V VM. Docker runs inside that VM. OpenShell creates a container running k3s, which runs its own containerd. That's four nested layers beneath the Windows host (one VM, then three levels of container tooling):

Windows Host
  └── WSL2 VM (Hyper-V)
       └── Docker Engine
            └── openshell-cluster-nemoclaw container
                 └── k3s + containerd
                      └── tries to reach registry-1.docker.io ← FAILS

The outermost Docker can reach registries. The innermost containerd cannot — likely due to how WSL2's virtual network adapter handles nested container DNS and routing. This isn't unique to NemoClaw; any tool running containerd-inside-Docker-inside-WSL2 (k3d, kind, Rancher Desktop) can hit similar issues.
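To see at which layer resolution breaks, the same lookup can be run in WSL2 and inside the cluster container. The following is a rough diagnostic sketch, not part of NemoClaw or OpenShell: `probe` and `check_layers` are hypothetical helpers, the container name is taken from the diagram above, and it assumes `nslookup` exists in the cluster image. A lookup inside the container only approximates what the nested containerd sees.

```shell
#!/usr/bin/env bash
# Sketch: compare registry reachability in WSL2 and inside the cluster container.
# Assumptions: container name from the diagram; nslookup is present in the image.
CLUSTER="${CLUSTER:-openshell-cluster-nemoclaw}"
REGISTRY="${REGISTRY:-registry-1.docker.io}"

# Run a command quietly and report whether it succeeded at the named layer.
probe() {
  local layer="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "$layer: ok"
  else
    echo "$layer: FAIL"
  fi
}

# Hypothetical helper: probe from the outside in.
check_layers() {
  probe "wsl2 dns" getent hosts "$REGISTRY"
  # Without -f, curl exits 0 even on the registry's 401 reply, so this only
  # tests that a TLS connection can be made.
  probe "wsl2 https" curl -sS --max-time 10 -o /dev/null "https://$REGISTRY/v2/"
  probe "cluster container dns" docker exec "$CLUSTER" nslookup "$REGISTRY"
}

# Usage: check_layers
```

If the WSL2 probes succeed and the cluster container probe fails, that is consistent with the nested networking explanation above, though it does not prove it.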

Known Issues

1. Gateway bootstrap timeout — nested containerd can't pull images

Symptom:

K8s namespace not ready
timed out waiting for namespace 'openshell' to exist

Root cause: Pods are stuck in ImagePullBackOff because containerd inside k3s can't reach Docker Hub or GHCR.

Workaround: Use crane (Google's container registry tool) to pull images on the host, then manually import them into the embedded containerd:

crane pull --platform linux/amd64 docker.io/rancher/mirrored-pause:3.6 /tmp/pause.tar
docker cp /tmp/pause.tar openshell-cluster-nemoclaw:/tmp/pause.tar
docker exec openshell-cluster-nemoclaw ctr -n k8s.io images import /tmp/pause.tar

Why not docker save? Docker Desktop produces OCI index archives that ctr images import cannot parse; the import fails with content digest sha256:...: not found. crane pull --platform writes a single-platform tarball that imports cleanly.
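When several images are missing, the three commands above can be wrapped in small helpers. This is a minimal sketch under stated assumptions: `image_tar_name`, `seed_image`, and `list_pod_images` are hypothetical names, and it assumes `kubectl` is available inside the cluster container (as the patch command later in this issue implies). `list_pod_images` only reports main containers, not init containers.

```shell
#!/usr/bin/env bash
# Sketch: seed images into the embedded containerd one at a time.
CLUSTER="${CLUSTER:-openshell-cluster-nemoclaw}"

# Turn an image reference into a filesystem-safe tarball name.
image_tar_name() {
  printf '%s.tar\n' "$(printf '%s' "$1" | tr '/:@' '___')"
}

# Pull on the WSL2 side with crane, copy into the cluster container, import.
seed_image() {
  local image="$1" tar
  tar="/tmp/$(image_tar_name "$image")"
  crane pull --platform linux/amd64 "$image" "$tar" &&
    docker cp "$tar" "$CLUSTER:$tar" &&
    docker exec "$CLUSTER" ctr -n k8s.io images import "$tar"
}

# List the images that pods in the cluster are asking for.
list_pod_images() {
  docker exec "$CLUSTER" kubectl get pods -A \
    -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' |
    sort -u
}

# Usage:
#   list_pod_images
#   seed_image docker.io/rancher/mirrored-pause:3.6
```

Review the output of `list_pod_images` before seeding: the reference passed to `seed_image` should match the name the pod spec uses, otherwise the kubelet may not find the imported image.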

2. nemoclaw onboard destroys working gateway

Symptom: You manually fix the gateway (seed images, create TLS certs, get it healthy), then run nemoclaw onboard. It destroys your working gateway and starts fresh, hitting the same timeout.

Root cause: bin/lib/onboard.js unconditionally calls openshell gateway destroy before openshell gateway start.

Proposed fix (PR candidate):

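// In startGateway(), before the destroy+start block:
// Reuse a gateway that already reports as connected instead of destroying it.
const statusCheck = await $`openshell gateway status -g ${gatewayName}`.nothrow();
// Caveat: includes('connected') also matches "disconnected"; tighten this
// check if the status output can contain that word.
if (statusCheck.exitCode === 0 && statusCheck.stdout.includes('connected')) {
  console.log(`Gateway '${gatewayName}' is already connected, skipping restart.`);
  return;
}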

This benefits any environment where the gateway was pre-configured — not just WSL2.
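Until a fix like this lands, a shell-side guard can prevent an accidental `nemoclaw onboard` from destroying a manually repaired gateway. The sketch below is an interim safeguard only: it does not complete onboarding, the helper names are hypothetical, the gateway name `nemoclaw` is an assumption, and it assumes the status output contains the word "connected" as the patch above does.

```shell
#!/usr/bin/env bash
# Sketch: refuse to run `nemoclaw onboard` while the gateway reports connected.
GATEWAY="${GATEWAY:-nemoclaw}"   # assumed gateway name; adjust to your setup

# Succeeds when the status text on stdin contains "connected" as a whole word,
# so "disconnected" does not count.
status_says_connected() {
  grep -qw 'connected'
}

gateway_is_connected() {
  local out
  out="$(openshell gateway status -g "$GATEWAY" 2>/dev/null)" || return 1
  printf '%s\n' "$out" | status_says_connected
}

guarded_onboard() {
  if gateway_is_connected; then
    echo "Gateway '$GATEWAY' is connected; skipping nemoclaw onboard to avoid destroying it."
    return 0
  fi
  nemoclaw onboard
}

# Usage: guarded_onboard
```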

3. imagePullPolicy: Always prevents using seeded images

Symptom: Images are imported into containerd, but pods still fail with ImagePullBackOff.

Root cause: The OpenShell gateway StatefulSet and sandbox pods set imagePullPolicy: Always, which makes the kubelet contact the registry on every container start, even when the image already exists in the local containerd store.

Workaround:

docker exec openshell-cluster-nemoclaw kubectl patch statefulset openshell \
  -n openshell --type='json' \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"IfNotPresent"}]'

Proposed fix: Default to IfNotPresent or make it configurable.
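It can help to check the policy before and after patching. The following sketch wraps the workaround command; `pull_policy_patch`, `show_pull_policy`, and `set_pull_policy` are hypothetical helpers, and the StatefulSet name and namespace are the ones used in the command above. It covers the gateway StatefulSet only, not sandbox pods.

```shell
#!/usr/bin/env bash
# Sketch: inspect and change imagePullPolicy on the gateway StatefulSet.
CLUSTER="${CLUSTER:-openshell-cluster-nemoclaw}"

# Build the JSON patch for a given container index and policy.
pull_policy_patch() {
  local index="${1:-0}" policy="${2:-IfNotPresent}"
  printf '[{"op":"replace","path":"/spec/template/spec/containers/%s/imagePullPolicy","value":"%s"}]\n' \
    "$index" "$policy"
}

# Print the current policy of each container in the pod template.
show_pull_policy() {
  docker exec "$CLUSTER" kubectl get statefulset openshell -n openshell \
    -o jsonpath='{.spec.template.spec.containers[*].imagePullPolicy}'
}

# Apply the patch; arguments are passed through to pull_policy_patch.
set_pull_policy() {
  docker exec "$CLUSTER" kubectl patch statefulset openshell \
    -n openshell --type=json -p "$(pull_policy_patch "$@")"
}

# Usage:
#   show_pull_policy
#   set_pull_policy 0 IfNotPresent
#   show_pull_policy
```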

4. TLS certificates must be X.509 v3

Symptom:

invalid peer certificate: Other(OtherError(UnsupportedCertVersion))

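Root cause: OpenShell requires X.509 v3 certificates. Basic openssl commands can produce v1 certs: a certificate issued without any extensions (for example, a CSR signed with openssl x509 -req and no -extfile) may be encoded as v1, which triggers the error above.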

Workaround: Generate certs with proper extensions:

openssl req -new -x509 -days 365 -key /tmp/ca.key -out /tmp/ca.crt \
  -subj "/CN=openshell-ca" \
  -addext "basicConstraints=critical,CA:TRUE" \
  -addext "keyUsage=critical,keyCertSign,cRLSign"
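The command above expects /tmp/ca.key to exist already, and the result is worth verifying before use. The sketch below fills in those two steps under stated assumptions: a 4096-bit RSA key (the key type is not specified in this issue), OpenSSL 1.1.1 or newer for `-addext`, and hypothetical helper names `make_ca`, `cert_version`, and `parse_cert_version`.

```shell
#!/usr/bin/env bash
# Sketch: create a CA key and self-signed CA certificate, then check its version.

# Extract the version number from `openssl x509 -text` output on stdin.
parse_cert_version() {
  sed -n 's/^ *Version: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Generate the key (assumed RSA 4096) and the CA certificate with v3 extensions.
make_ca() {
  local key="${1:-/tmp/ca.key}" crt="${2:-/tmp/ca.crt}"
  openssl genrsa -out "$key" 4096 &&
    openssl req -new -x509 -days 365 -key "$key" -out "$crt" \
      -subj "/CN=openshell-ca" \
      -addext "basicConstraints=critical,CA:TRUE" \
      -addext "keyUsage=critical,keyCertSign,cRLSign"
}

# Print the X.509 version of a certificate file.
cert_version() {
  openssl x509 -in "$1" -noout -text | parse_cert_version
}

# Usage:
#   make_ca /tmp/ca.key /tmp/ca.crt
#   cert_version /tmp/ca.crt
```

A certificate that OpenShell accepts should report version 3 here; any server or client certificates signed by this CA need v3 extensions as well.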

Environment

Tested on:

  • Windows 11 + WSL2 Ubuntu 22.04
  • Docker Engine 28.x (inside WSL2)
  • 16GB RAM allocated to WSL
  • NemoClaw from npm / GitHub main
  • OpenShell 0.0.9 / 0.0.10

Proposed Improvements

  1. [Easy] Skip gateway destroy if already connected — PR the startGateway() patch above
  2. [Medium] Default imagePullPolicy to IfNotPresent — or make it configurable via env var
  3. [Docs] Add WSL2 troubleshooting section — link to field guide or inline the key workarounds
  4. [Hard] Investigate nested containerd networking — root cause fix in OpenShell or k3s config

Related Issues


Tested and documented by Substr8 Labs — building trust infrastructure for AI agents.
