Gateway restart regenerates TLS certificates, breaking existing sandbox connections #888

@GarfieldHuang

Problem

When the OpenShell gateway restarts (e.g., after openshell gateway start or system reboot), it regenerates its mTLS CA keypair. Existing sandbox pods still hold the old gateway CA certificate, so every subsequent connection attempt fails with:

[WARN] SSH connection: handshake verification failed

The only workaround is to destroy and recreate all sandboxes via nemoclaw onboard, which is disruptive and loses sandbox state.
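One way to confirm that a CA mismatch is behind the failure is to compare the fingerprint of the CA certificate a sandbox holds with the one the gateway currently uses. The following is a minimal sketch, assuming both certificates have already been exported as PEM text. How to extract them from the sandbox pod and the gateway is not covered here, and the function names are illustrative rather than part of openshell.

```python
import hashlib
import ssl


def ca_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 fingerprint (hex) of one PEM-encoded certificate."""
    # Convert PEM to DER so the hash does not depend on line wrapping.
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha256(der).hexdigest()


def ca_matches(sandbox_ca_pem: str, gateway_ca_pem: str) -> bool:
    """True if the sandbox still holds the CA the gateway is using now."""
    return ca_fingerprint(sandbox_ca_pem) == ca_fingerprint(gateway_ca_pem)
```

If `ca_matches` returns False after a gateway restart, the sandbox is pinned to a CA the gateway no longer uses, which is consistent with the handshake failure above.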

Steps to Reproduce

  1. Create a sandbox: nemoclaw onboard
  2. Restart the gateway: openshell gateway start (after a stop or system reboot)
  3. Try to connect: nemoclaw <name> connect
  4. The connection fails with "Command failed (exit 255)"

Expected Behavior

Gateway restart should not invalidate existing sandbox connections. Options:

  • Persist the gateway CA keypair to a PersistentVolume so it survives restarts
  • Implement a certificate rotation mechanism where sandboxes can fetch the new gateway CA cert automatically
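The first option amounts to a load-or-create step at gateway startup: reuse the CA material if it already exists on persistent storage, and generate it only on first run. The sketch below illustrates that pattern in Python under stated assumptions. It is not how openshell is implemented; `load_or_create_ca` and `placeholder_generate` are hypothetical names, and the random bytes stand in for real CA keypair generation.

```python
import os
import secrets
import tempfile
from pathlib import Path
from typing import Callable


def load_or_create_ca(path: Path, generate: Callable[[], bytes]) -> bytes:
    """Return CA material from `path`, generating it only if the file is absent."""
    if path.exists():
        # A previous start already created the CA: reuse it unchanged.
        return path.read_bytes()
    material = generate()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file (created owner-only by mkstemp) and rename it,
    # so a crash never leaves a half-written key at the final path.
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(material)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return material


def placeholder_generate() -> bytes:
    # Assumption: stand-in for real CA keypair generation, which is not shown.
    return secrets.token_bytes(32)
```

Calling `load_or_create_ca` with a path on the PersistentVolume at every start would return the same material across restarts, so sandboxes that trust the original CA would keep working.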

Environment

  • NemoClaw CLI / openshell 0.0.12
  • WSL2 Ubuntu on Windows 11
  • k3s (via Docker Desktop)

Labels

  • area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery)
  • security (Potential vulnerability, unsafe behavior, or access risk)
