Bug: Sandbox image push fails with "Connection reset by peer" during 2.4GB gRPC transfer
Environment
- OS: Ubuntu 24.04 (Hetzner CPX21 — 3 vCPU, 4GB RAM, 40GB SSD)
- Location: Hillsboro, OR
- Docker: 28.2.2
- NemoClaw: 0.1.0 (cloned from main)
- OpenShell: 0.0.7
- Node.js: v22.22.1
- Swap: 4GB configured
Steps to Reproduce
- Fresh Ubuntu 24.04 VPS with Docker installed
- Clone NemoClaw from GitHub, then run npm link
- Run nemoclaw onboard
- Gateway starts successfully, sandbox image builds successfully (all 22 steps, fully cached)
- Image push to k3s in-cluster registry fails every time at exactly "Exported 2459 MiB"
Error Output
Building image openshell/sandbox-from:1773779828 from /tmp/nemoclaw-build-D0TA2A/Dockerfile
...
Successfully built bcab23545435
Successfully tagged openshell/sandbox-from:1773779828
Built image openshell/sandbox-from:1773779828
Pushing image openshell/sandbox-from:1773779828 into gateway "nemoclaw"
[progress] Exported 2459 MiB
Error: × transport error
├─▶ Connection reset by peer (os error 104)
╰─▶ Connection reset by peer (os error 104)
✓ Sandbox 'opfleet-agent' created
The onboard wizard continues and reports success for steps 4-7 (inference, provider, OpenClaw setup, policies), but openshell sandbox connect subsequently returns:
Error: × status: NotFound, message: "sandbox not found"
What I've Tried
- 4 separate onboard attempts — same failure at exactly 2459 MiB every time
- Added 4GB swap — no change
- Upgraded from 2GB to 4GB RAM — no change
- Restarted Docker/k3s — k3s pods all healthy, flannel networking operational
- Verified k3s health: kubectl get pods -A shows all pods Running/Completed
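The claim that every attempt stops at the same offset can be checked mechanically from saved console output. Below is a minimal Python sketch, assuming each onboard run's output was captured as text. The helper names (last_exported_mib, push_failed) are hypothetical and not part of NemoClaw or OpenShell, and the pattern only matches the progress line format shown under Error Output.

```python
import re

# Matches the progress lines as they appear in the Error Output section.
# Assumption: the format is always "[progress] Exported <N> MiB".
PROGRESS_RE = re.compile(r"\[progress\] Exported (\d+) MiB")


def last_exported_mib(log_text):
    """Return the last reported 'Exported N MiB' value, or None if absent."""
    matches = PROGRESS_RE.findall(log_text)
    return int(matches[-1]) if matches else None


def push_failed(log_text):
    """Return True if the captured output contains the transport error."""
    return "transport error" in log_text


# Sample copied from the Error Output section of this report.
sample = """Pushing image openshell/sandbox-from:1773779828 into gateway "nemoclaw"
[progress] Exported 2459 MiB
Error: × transport error
├─▶ Connection reset by peer (os error 104)
╰─▶ Connection reset by peer (os error 104)
✓ Sandbox 'opfleet-agent' created
"""

print(last_exported_mib(sample), push_failed(sample))  # prints "2459 True"
```

Running this over the output of each attempt and comparing the returned values would confirm whether the offset is really identical across runs. It also flags the case seen here, where the sandbox is reported as created even though the push failed.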
Analysis
The gRPC connection between the openshell CLI and the containerd registry inside k3s drops consistently during the large image transfer. The k3s cluster itself is healthy: flannel is running and all pods are up. Because the failure occurs at the same point (2459 MiB) on every attempt, a deterministic limit seems more likely than a transient network fault. This appears to be a gRPC transport issue during the 2.4GB image push, possibly related to message size limits or timeout configuration in the containerd/k3s gRPC server.
Expected Behavior
nemoclaw onboard should successfully push the sandbox image and make it available via openshell sandbox connect.
Question
Is a full k3s cluster + 2.4GB image push the intended architecture for the sandbox? The docs suggest a simple curl | bash install, but the actual setup is significantly heavier. Would it be possible to support a lighter-weight sandbox mode (e.g., direct Docker container with network namespace isolation) for resource-constrained environments?