[All platforms] Re-onboard does not clean up orphaned SSH port-forward from previous session #1950

@yanyunl1991

Description

Summary:
`nemoclaw onboard` fails at preflight with "Port 18789 is not available" after a previous sandbox was destroyed. The SSH port-forward process from the previous session is not cleaned up by `openshell gateway destroy` or `nemoclaw <name> destroy`.

Reproduction Steps

  1. Complete a full nemoclaw onboard (sandbox running, dashboard forwarded on port 18789)
  2. Destroy gateway directly via openshell (bypassing NemoClaw's cleanup):
    openshell gateway destroy -g nemoclaw
    docker stop openshell-cluster-nemoclaw
    docker rm openshell-cluster-nemoclaw
  3. Run nemoclaw onboard again

Note: This also applies to scenarios where the gateway is lost unexpectedly (machine reboot, Ctrl+C during onboard, etc.) without going through nemoclaw destroy.
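
To confirm the orphan after step 2 and before re-running onboard, a check along these lines may help. It is a minimal sketch, assuming `lsof` and `ps` are installed; `report_port` is a hypothetical helper name and is not part of NemoClaw or OpenShell.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not part of NemoClaw or OpenShell).
# Prints which process, if any, is listening on the given TCP port.
report_port() {
  # lsof -t prints bare PIDs; -sTCP:LISTEN restricts to listening sockets.
  pids="$(lsof -nP -t -iTCP:"$1" -sTCP:LISTEN 2>/dev/null)"
  if [ -z "$pids" ]; then
    echo "port $1 is free"
    return 0
  fi
  for pid in $pids; do
    # ps -o args= prints the full command line without a header row.
    echo "port $1 held by PID $pid: $(ps -o args= -p "$pid" 2>/dev/null)"
  done
}
```

Calling `report_port 18789` after step 2 should name an `ssh` process if the forward was left behind. Without elevated privileges, `lsof` may not show listeners owned by other users.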

Environment

  • Platform: Ubuntu 24.04
  • NemoClaw: v0.1.0 (latest main)
  • OpenShell CLI: 0.0.26
  • Node.js: v22.22.2

Debug Output

Logs

**Expected Behavior:**
  Onboard detects the orphaned SSH port-forward and cleans it up automatically, then proceeds normally.

  **Actual Behavior:**
  [1/8] Preflight checks
  ✓ Docker is running
  ✓ Container runtime: docker
  ✓ openshell CLI: openshell 0.0.26
  ✓ Port 8080 available (OpenShell gateway)

  !! Port 18789 is not available.
     NemoClaw dashboard needs this port.

     Blocked by: ssh (PID 3595441)

  The user must manually run `sudo kill <PID>` before re-onboarding.

  **Suggested Fix:**
  In the preflight port check, when port 18789 is blocked by an `ssh` process (NemoClaw's own port-forward), automatically kill it and retry, similar to the orphaned gateway container cleanup in #1582.
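
The suggested behavior could look roughly like the sketch below. It is only an illustration of the kill-and-recheck idea, not NemoClaw's actual preflight code, which this issue does not show. The function names are hypothetical, `lsof` and `ps` are assumed to be available, and matching the port number in the `ssh` command line is an assumed heuristic for recognizing the forward.

```shell
#!/bin/sh
# Hypothetical helpers sketching the suggested fix; not NemoClaw's real code.

# Print the PIDs listening on TCP port $1 (no output if the port is free).
listener_pids() {
  lsof -nP -t -iTCP:"$1" -sTCP:LISTEN 2>/dev/null
}

# Succeed only if PID $1 is an ssh process whose arguments mention port $2.
# Assumption: the forward's command line contains the port number.
is_ssh_forward() {
  fwd_name="$(ps -o comm= -p "$1" 2>/dev/null)"
  fwd_args="$(ps -o args= -p "$1" 2>/dev/null)"
  # Some platforms print a full path in comm; compare the basename only.
  [ "${fwd_name##*/}" = "ssh" ] || return 1
  case "$fwd_args" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Terminate ssh forwards holding port $1, then re-check the port.
# Returns non-zero if a non-ssh process holds the port or it stays busy.
free_dashboard_port() {
  for pid in $(listener_pids "$1"); do
    if is_ssh_forward "$pid" "$1"; then
      echo "killing orphaned ssh port-forward (PID $pid)"
      # May fail without sufficient privileges; the re-check below catches that.
      kill "$pid" 2>/dev/null
    else
      echo "port $1 is held by a non-ssh process (PID $pid); leaving it alone"
      return 1
    fi
  done
  sleep 1  # give the process a moment to exit and release the socket
  [ -z "$(listener_pids "$1")" ]
}
```

A preflight step could call `free_dashboard_port 18789` and fall back to the existing "Port 18789 is not available" error when it returns non-zero, so that unrelated processes on the port are never killed.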

Checklist

  • I confirmed this bug is reproducible
  • I searched existing issues and this is not a duplicate

Metadata

Labels

needs: triage (Awaiting maintainer classification)
