
Sandbox image push fails with 'Connection reset by peer' at 2459 MiB during onboard #245

@thgeorge2561

Description

Bug: Sandbox image push fails with "Connection reset by peer" during 2.4GB gRPC transfer

Environment

  • OS: Ubuntu 24.04 (Hetzner CPX21 — 3 vCPU, 4GB RAM, 40GB SSD)
  • Location: Hillsboro, OR
  • Docker: 28.2.2
  • NemoClaw: 0.1.0 (cloned from main)
  • OpenShell: 0.0.7
  • Node.js: v22.22.1
  • Swap: 4GB configured

Steps to Reproduce

  1. Fresh Ubuntu 24.04 VPS with Docker installed
  2. Clone NemoClaw from GitHub and run npm link
  3. Run nemoclaw onboard
  4. Gateway starts successfully, sandbox image builds successfully (all 22 steps, fully cached)
  5. Image push to k3s in-cluster registry fails every time at exactly "Exported 2459 MiB"

Error Output

Building image openshell/sandbox-from:1773779828 from /tmp/nemoclaw-build-D0TA2A/Dockerfile
  ...
  Successfully built bcab23545435
  Successfully tagged openshell/sandbox-from:1773779828
  Built image openshell/sandbox-from:1773779828
  Pushing image openshell/sandbox-from:1773779828 into gateway "nemoclaw"
  [progress] Exported 2459 MiB
Error:   × transport error
  ├─▶ Connection reset by peer (os error 104)
  ╰─▶ Connection reset by peer (os error 104)

  ✓ Sandbox 'opfleet-agent' created

Despite the transport error, the onboard wizard prints "Sandbox 'opfleet-agent' created", continues, and reports success for steps 4-7 (inference, provider, OpenClaw setup, policies). A subsequent openshell sandbox connect then returns:

Error: × status: NotFound, message: "sandbox not found"
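To make the mismatch above easier to catch, here is a minimal sketch that scans captured onboard output for a push failure that sits alongside a "created" message. The function names are hypothetical and not part of NemoClaw or OpenShell; the matched strings are taken from the log in this report and may differ in other versions.

```python
import re

# Substrings copied from the error output in this report (assumed stable).
FAILURE_MARKERS = ("transport error", "Connection reset by peer")


def push_failed(onboard_output: str) -> bool:
    """Return True if the captured output contains a transport failure."""
    return any(marker in onboard_output for marker in FAILURE_MARKERS)


def sandbox_reported_created(onboard_output: str) -> bool:
    """Return True if the wizard printed a "Sandbox '<name>' created" line."""
    return re.search(r"Sandbox '[^']+' created", onboard_output) is not None


def is_false_success(onboard_output: str) -> bool:
    """True when the push failed but the sandbox was still reported as created."""
    return push_failed(onboard_output) and sandbox_reported_created(onboard_output)
```

Running is_false_success on the log saved from nemoclaw onboard would flag this run before anyone reaches the later "sandbox not found" error.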

What I've Tried

  • 4 separate onboard attempts — same failure at exactly 2459 MiB every time
  • Added 4GB swap — no change
  • Upgraded from 2GB to 4GB RAM — no change
  • Restarted Docker/k3s — k3s pods all healthy, flannel networking operational
  • Verified k3s health: kubectl get pods -A shows all pods Running/Completed
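The pod health check in the last bullet can be scripted roughly as follows. This is a sketch that assumes kubectl is on PATH and pointed at the k3s cluster; the helper names are hypothetical. It relies on the standard column order of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, ...).

```python
import subprocess

# Statuses treated as healthy, matching the "Running/Completed" check above.
HEALTHY = {"Running", "Completed"}


def unhealthy_pods(kubectl_output: str) -> list[tuple[str, str, str]]:
    """Return (namespace, name, status) for every pod not Running/Completed."""
    bad = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row if it is present.
        if len(fields) < 4 or fields[0] == "NAMESPACE":
            continue
        namespace, name, status = fields[0], fields[1], fields[3]
        if status not in HEALTHY:
            bad.append((namespace, name, status))
    return bad


def check_cluster() -> list[tuple[str, str, str]]:
    """Run kubectl and report unhealthy pods (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-A", "--no-headers"],
        capture_output=True, text=True, check=True,
    )
    return unhealthy_pods(result.stdout)
```

An empty list from check_cluster() corresponds to the "all pods Running/Completed" observation in this report.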

Analysis

The gRPC connection between the openshell CLI and the containerd registry inside k3s drops at the same point on every attempt, right after the progress line reaches 2459 MiB. The k3s cluster itself appears healthy: flannel is running and all pods are up. This looks like a gRPC transport issue during the 2.4GB image push, possibly related to message size limits or timeout configuration in the containerd/k3s gRPC server, though I have not confirmed the root cause.
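For reference, the two sizes quoted in this report ("2459 MiB" in the log and "2.4GB" in the text) describe the same quantity if GB is read as GiB. A small sketch of the conversion, with a hypothetical helper name:

```python
MIB = 1024 ** 2  # bytes per mebibyte
GIB = 1024 ** 3  # bytes per gibibyte


def mib_to_bytes(mib: int) -> int:
    """Convert a size in MiB (as printed by the progress line) to bytes."""
    return mib * MIB


exported = mib_to_bytes(2459)
print(exported)                    # 2578448384
print(f"{exported / GIB:.2f} GiB")  # 2.40 GiB
print(f"{exported / 10**9:.2f} GB")  # 2.58 GB
```

This only reconciles the units; it does not by itself say which limit, if any, is being hit.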

Expected Behavior

nemoclaw onboard should successfully push the sandbox image and make it available via openshell sandbox connect.

Question

Is a full k3s cluster + 2.4GB image push the intended architecture for the sandbox? The docs suggest a simple curl | bash install, but the actual setup is significantly heavier. Would it be possible to support a lighter-weight sandbox mode (e.g., direct Docker container with network namespace isolation) for resource-constrained environments?

Labels

  • area: install (Install, setup, prerequisites, or uninstall flow)
  • area: onboarding (Onboarding FSM, provider setup, sandbox launch, or first-run flow)
  • platform: ubuntu (Affects Ubuntu Linux environments)
