
fix(connect): auto-recover from SSH identity drift after host reboot #2056

@ericksoa

Description

After a host reboot, nemoclaw <name> fails with SSH handshake errors because the gateway container regenerates its SSH host keys on restart. The CLI detects this as the identity_drift state but does not resolve it automatically, so the user has to re-onboard, which needlessly creates a new sandbox.

Root Cause

Two issues compound:

  1. Gateway SSH keys don't persist across restarts. When the recovery path restarts the gateway via openshell gateway start, OpenShell regenerates the SSH host keys. The sandbox still has the old keys cached, so the handshake fails with "handshake verification failed".

  2. identity_drift is diagnosed but not resolved. getReconciledSandboxGatewayState() in src/nemoclaw.ts correctly classifies this as identity_drift (line ~688) but only prints an error; it neither clears the stale known_hosts entries nor retries.
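To make the failure in item 1 concrete, here is a simplified TypeScript model of how an SSH client checks a presented host key against cached known_hosts entries. It is a sketch under stated assumptions, not OpenSSH's actual algorithm: it assumes plain (unhashed) lines of the form host[,host...] keytype base64key, compares key type and key together, and does not model hashed hosts or @cert-authority/@revoked markers. The function verifyHostKey and the [127.0.0.1]:2222 host token are hypothetical.

```typescript
// Simplified model of host-key checking against known_hosts content.
// Assumption: plain (unhashed) lines "host[,host...] keytype base64key";
// hashed hosts and @cert-authority/@revoked markers are not modeled.
type HostKeyCheck = "match" | "mismatch" | "unknown";

function verifyHostKey(
  knownHosts: string,
  host: string,
  presentedKey: string, // "keytype base64key", as offered by the server
): HostKeyCheck {
  let sawHost = false;
  for (const rawLine of knownHosts.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank or comment
    const [hosts, keyType, key] = line.split(/\s+/);
    if (!hosts || !keyType || !key) continue; // malformed line
    if (!hosts.split(",").includes(host)) continue; // entry for another host
    sawHost = true;
    if (`${keyType} ${key}` === presentedKey) return "match";
  }
  // A cached entry exists but none matches: the client rejects the handshake.
  return sawHost ? "mismatch" : "unknown";
}

// Hypothetical gateway entry cached before the reboot.
const cached = "[127.0.0.1]:2222 ssh-ed25519 AAAAoldKey\n";
console.log(verifyHostKey(cached, "[127.0.0.1]:2222", "ssh-ed25519 AAAAoldKey")); // match
console.log(verifyHostKey(cached, "[127.0.0.1]:2222", "ssh-ed25519 AAAAnewKey")); // mismatch
```

After the gateway regenerates its keys, the client is in the "mismatch" case; removing the stale entry moves it to "unknown", where OpenSSH's StrictHostKeyChecking setting decides whether the new key is accepted.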

Additionally, the registry recovery gate at line 2389 uses ["connect", "skill", "shields", "config"].includes(args[0] || ""), which excludes the bare nemoclaw <name> case. There args[0] is undefined, so the lookup key is the empty string, which is not in the list, even though line 2397 defaults that case to "connect".
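The gate bug can be reproduced in isolation. This sketch copies the expression quoted above into a hypothetical helper, gatesRecoveryBefore, and assumes args holds the words after the sandbox name (consistent with args[0] being undefined for the bare command); the rest of the CLI is omitted.

```typescript
// Minimal reproduction of the recovery gate quoted above (CLI plumbing omitted).
// Assumption: args holds the words after the sandbox name.
function gatesRecoveryBefore(args: string[]): boolean {
  return ["connect", "skill", "shields", "config"].includes(args[0] || "");
}

console.log(gatesRecoveryBefore(["connect"])); // true:  nemoclaw <name> connect
console.log(gatesRecoveryBefore([])); // false: bare nemoclaw <name>, so no recovery
```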

Proposed Fix

In the identity_drift branch of getReconciledSandboxGatewayState():

  1. Remove the stale SSH known_hosts entry for the gateway
  2. Retry the sandbox lookup
  3. If the retry succeeds, return present with recoveredGateway: true
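The three steps above can be sketched as follows. Every name here (GatewayState, RecoveryDeps, reconcileWithDriftRecovery, lookupSandbox, removeKnownHostsEntry) is a hypothetical stand-in: the real getReconciledSandboxGatewayState() in src/nemoclaw.ts has its own types and helpers. The helpers are injected so the control flow can be exercised without SSH; in practice removeKnownHostsEntry could wrap OpenSSH's ssh-keygen -R <host> (optionally with -f <known_hosts file>).

```typescript
// Sketch of the proposed identity_drift recovery. All names are hypothetical.
type GatewayState =
  | { state: "present"; recoveredGateway?: boolean }
  | { state: "identity_drift" }
  | { state: "missing" };

interface RecoveryDeps {
  lookupSandbox: () => GatewayState; // stand-in for the existing sandbox lookup
  removeKnownHostsEntry: (host: string) => void; // e.g. wraps `ssh-keygen -R <host>`
}

function reconcileWithDriftRecovery(host: string, deps: RecoveryDeps): GatewayState {
  const first = deps.lookupSandbox();
  if (first.state !== "identity_drift") return first; // nothing to recover

  // Step 1: drop the stale host key so the regenerated one can be accepted.
  deps.removeKnownHostsEntry(host);

  // Step 2: retry the lookup once; a single retry avoids looping on a persistent failure.
  const retry = deps.lookupSandbox();

  // Step 3: on success, report that the gateway was recovered.
  if (retry.state === "present") return { state: "present", recoveredGateway: true };

  // Still failing: let the caller print the existing identity_drift error.
  return retry;
}
```

Bounding the recovery to one retry keeps the existing error path intact when the drift has some other cause.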

Also fix the registry recovery gate to include the bare-command case:

["connect", "skill", "shields", "config", ""].includes(args[0] || "")
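Adding "" to the list fixes the bare-command case. A possible alternative, sketched here only as an option and not as part of this proposal, is to resolve the action once and gate on the resolved value, so the gate and the dispatch default cannot drift apart. This assumes the default at line 2397 is equivalent to args[0] || "connect"; RECOVERY_ACTIONS, resolveAction, and shouldRecoverRegistry are hypothetical names.

```typescript
// Alternative sketch: gate on the same defaulted action the dispatcher uses.
// Assumption: the default at line 2397 behaves like `args[0] || "connect"`.
const RECOVERY_ACTIONS = new Set(["connect", "skill", "shields", "config"]);

function resolveAction(args: string[]): string {
  return args[0] || "connect"; // bare `nemoclaw <name>` resolves to "connect"
}

function shouldRecoverRegistry(args: string[]): boolean {
  return RECOVERY_ACTIONS.has(resolveAction(args));
}

console.log(shouldRecoverRegistry([])); // true: bare command now gets recovery
console.log(shouldRecoverRegistry(["other-action"])); // false: not a recovery action
```

With this shape, a future change to the default action updates the gate automatically.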

Reproduction

  1. nemoclaw onboard (creates sandbox)
  2. Reboot host machine
  3. nemoclaw <sandbox-name>
  4. Observe the SSH handshake verification failure; the only way forward is to re-onboard.

Labels

bug, priority: high, area/connect
