Description
After a host reboot, nemoclaw <name> fails with SSH handshake errors because the gateway container regenerates its SSH host keys on restart. The CLI detects this as the identity_drift state but doesn't auto-resolve it, so the user has to re-onboard, which unnecessarily creates a new sandbox.
Root Cause
Two issues compound:
- Gateway SSH keys don't persist across restarts. When the recovery path restarts the gateway via openshell gateway start, OpenShell regenerates SSH host keys. The sandbox still has the old keys cached, causing "handshake verification failed".
- identity_drift is diagnosed but not resolved. getReconciledSandboxGatewayState() in src/nemoclaw.ts correctly classifies this as identity_drift (line ~688) but only prints an error; it doesn't clear stale known_hosts entries and retry.
Additionally, the registry recovery gate at line 2389 uses ["connect", "skill", "shields", "config"].includes(args[0] || ""), which excludes the bare nemoclaw <name> case: args[0] is undefined there, so args[0] || "" evaluates to "", which is not in the list. The gate therefore skips recovery even though line 2397 defaults the command to "connect".
Proposed Fix
In the identity_drift branch of getReconciledSandboxGatewayState():
- Remove the stale SSH known_hosts entry for the gateway
- Retry the sandbox lookup
- If the retry succeeds, return present with recoveredGateway: true
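The three steps above could look roughly like the following. This is a minimal sketch, not the actual code in src/nemoclaw.ts: the GatewayState and ReconcileResult shapes, the recoverFromIdentityDrift name, and the injected removeKnownHost and lookupSandbox helpers are all hypothetical. One plausible implementation of removeKnownHost is to run ssh-keygen -R <host> (with the host written as [host]:port for a non-default port), but how the gateway's host entry is actually named is an assumption here.

```typescript
// Hypothetical shapes; the real types in src/nemoclaw.ts may differ.
type GatewayState = "present" | "missing" | "identity_drift";

interface ReconcileResult {
  state: GatewayState;
  recoveredGateway?: boolean;
}

// Side effects are injected so the control flow can be exercised in isolation.
interface DriftDeps {
  // Drops the cached host key for the gateway, e.g. via `ssh-keygen -R <host>`.
  removeKnownHost: (host: string) => void;
  // Re-runs the sandbox lookup and reports the state it observes.
  lookupSandbox: (sandboxName: string) => GatewayState;
}

function recoverFromIdentityDrift(
  sandboxName: string,
  gatewayHost: string,
  deps: DriftDeps,
): ReconcileResult {
  // Step 1: remove the stale known_hosts entry for the gateway.
  deps.removeKnownHost(gatewayHost);

  // Step 2: retry the sandbox lookup against the regenerated host key.
  const retried = deps.lookupSandbox(sandboxName);

  // Step 3: report a recovered gateway only if the retry succeeded.
  if (retried === "present") {
    return { state: "present", recoveredGateway: true };
  }

  // Otherwise surface whatever state the retry saw, so the existing
  // error path still runs.
  return { state: retried };
}
```

Retrying once keeps the failure mode unchanged when the drift has another cause: the caller still sees identity_drift and can print the existing error.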
Also fix the registry recovery gate to include the bare-command case:
["connect", "skill", "shields", "config", ""].includes(args[0] || "")
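To see why the extra "" entry matters, the gate expression can be evaluated in isolation. The snippet below is an illustrative sketch: passesGate, CURRENT_GATE, and PROPOSED_GATE are hypothetical names, and it assumes args holds only the tokens that follow the sandbox name, so a bare nemoclaw <name> yields an empty array.

```typescript
// The list as it appears in the gate today, and with the proposed "" entry.
const CURRENT_GATE = ["connect", "skill", "shields", "config"];
const PROPOSED_GATE = ["connect", "skill", "shields", "config", ""];

// Mirrors the gate expression: a missing args[0] falls back to "".
function passesGate(gate: readonly string[], args: readonly string[]): boolean {
  return gate.includes(args[0] || "");
}

// Bare `nemoclaw <name>`: no subcommand, so args[0] is undefined.
const bareArgs: string[] = [];

console.log(passesGate(CURRENT_GATE, bareArgs)); // false: recovery is skipped
console.log(passesGate(PROPOSED_GATE, bareArgs)); // true: recovery runs
console.log(passesGate(CURRENT_GATE, ["connect"])); // true: explicit subcommand
```

An alternative with the same effect would be to default args[0] to "connect" before the gate rather than after it, which avoids listing "" as a command.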
Reproduction
- nemoclaw onboard (creates sandbox)
- Reboot host machine
- nemoclaw <sandbox-name>
- SSH handshake verification fails, forcing a re-onboard
Labels
bug, priority: high, area/connect