[Channels] channels remove <ch> fails on a live sandbox and leaves the channel's policy preset applied #3671

@hunglp6d

Description

What happened

nemoclaw <sandbox> channels remove <channel> has two distinct asymmetries with channels add that surface together on a normal usage path.

Symptom A — command fails entirely on a live sandbox

$ nemoclaw my-assistant channels remove telegram
Failed to delete bridge provider(s) from the OpenShell gateway: my-assistant-telegram-bridge.
Registry not updated; re-run after resolving the gateway error.

The underlying openshell error is captured by the CLI but never printed, so the operator cannot self-diagnose. Running the gateway op manually reveals the real cause:

$ openshell provider delete my-assistant-telegram-bridge
Error: × status: FailedPrecondition,
message: "provider 'my-assistant-telegram-bridge' is attached to sandbox(es): my-assistant"

openshell (≥ 0.0.39) refuses to delete a provider that is still attached to any sandbox. The removeSandboxChannel flow in src/lib/actions/sandbox/policy-channel.ts calls openshell provider delete directly without first detaching the provider, so the delete fails on any live sandbox. When the delete fails, the code exits before reaching the registry update and the rebuild prompt, leaving operator state half-applied:

  • host: tokens cleared
  • gateway: provider record still present, still attached
  • registry (~/.nemoclaw/sandboxes.json): messagingChannels still contains the channel, providerCredentialHashes still contains its hash
  • sandbox container: bridge process still running

Symptom B — policy preset is never un-applied

After working around Symptom A (e.g., manually detaching and re-running the command), policy-list still reports the channel-named policy preset as active:

$ nemoclaw my-assistant policy-list
Policy presets for sandbox 'my-assistant':
...
● telegram — Telegram Bot API access
...

channels add <ch> calls applyChannelPresetIfAvailable(sandbox, channel) to auto-apply the matching built-in preset (e.g., telegram.yaml for the telegram channel) so the L7 proxy allow-lists api.telegram.org. There is no symmetric helper on the remove side, so api.telegram.org stays in the sandbox's allow-list even after the bridge is gone. This is a defense-in-depth gap rather than a functional regression, but it surprises operators who expect remove to be the inverse of add.
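A symmetric helper on the remove side could look roughly like the sketch below. It is not the project's code: the PresetStore interface and the name removeChannelPresetIfApplied are hypothetical, and it assumes the preset name equals the channel name (as with telegram). Whether a preset the operator applied by hand should also be dropped is a design question this sketch does not settle.

```typescript
// Hypothetical view of a sandbox's policy presets. The real storage and
// API are not shown in this issue.
interface PresetStore {
  listApplied(sandbox: string): string[];
  unapply(sandbox: string, preset: string): void;
}

// Hypothetical mirror of applyChannelPresetIfAvailable. Assumes the preset
// is named after the channel (e.g. "telegram").
function removeChannelPresetIfApplied(
  store: PresetStore,
  sandbox: string,
  channel: string,
): boolean {
  if (store.listApplied(sandbox).indexOf(channel) === -1) {
    // Preset is not applied, so there is nothing to undo. This keeps the
    // helper safe to call more than once.
    return false;
  }
  store.unapply(sandbox, channel);
  return true;
}
```

Returning a boolean lets the caller decide whether to tell the operator that a preset was removed.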

What I expected

channels remove <ch> should be a clean inverse of channels add <ch>:

  1. detach the bridge provider(s) from the sandbox attachment,
  2. delete the provider record(s) from the gateway,
  3. drop the channel from the host registry (messagingChannels, providerCredentialHashes),
  4. un-apply the matching channel-named built-in policy preset (mirror of applyChannelPresetIfAvailable),
  5. prompt for rebuild so the new image stops baking in the channel.
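The five steps above can be sketched as an ordered runner that stops at the first failure and reports what already completed, so a half-applied state is at least visible. This is an illustration only: RemovalStep, RemovalReport, and runRemoval are hypothetical names, and the step bodies are stubs rather than the real gateway or registry calls.

```typescript
// One step of the removal. `run` returns an error message, or null on success.
interface RemovalStep {
  name: string;
  run: () => string | null;
}

interface RemovalReport {
  completed: string[];
  failedStep: string | null;
  error: string | null;
}

// Runs the steps in order and stops at the first failure, recording which
// steps had already succeeded.
function runRemoval(steps: RemovalStep[]): RemovalReport {
  const completed: string[] = [];
  for (const step of steps) {
    const error = step.run();
    if (error !== null) {
      return { completed, failedStep: step.name, error };
    }
    completed.push(step.name);
  }
  return { completed, failedStep: null, error: null };
}

// Placeholder steps in the expected order. Each stub stands in for real logic.
const report = runRemoval([
  { name: "detach bridge provider", run: () => null },
  { name: "delete provider record", run: () => null },
  { name: "update host registry", run: () => null },
  { name: "un-apply channel preset", run: () => null },
  { name: "prompt for rebuild", run: () => null },
]);
```

With real step bodies, report.completed and report.failedStep would give the operator enough detail to see how far the removal got.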

Root cause

Two independent gaps in src/lib/actions/sandbox/policy-channel.ts:

  1. applyChannelRemoveToGatewayAndRegistry (~line 325-382) calls openshell provider delete <bridge> without a prior openshell sandbox provider detach <sandbox> <bridge>. openshell ≥ 0.0.39 rejects the delete with FailedPrecondition whenever the sandbox is still alive at remove-time. Additionally, the captured stderr/stdout is only used to match NotFound and is then discarded — operators never see the real gateway error.
  2. removeSandboxChannel (~line 672-706) has no symmetric helper to un-apply the channel-named preset. addSandboxChannel calls applyChannelPresetIfAvailable at line 640; the remove path is missing the equivalent un-apply call.
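A minimal sketch of how gap 1 might be closed: detach before delete, and include the captured gateway output in the failure message. The two openshell argument lists come from the commands quoted in this issue. Everything else (CommandResult, RunOpenshell, detachThenDeleteProvider, describeFailure) is hypothetical, and treating NotFound on detach as success is an assumption rather than documented openshell behavior.

```typescript
// Hypothetical result shape for a captured CLI invocation.
interface CommandResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Injected runner, so the sketch does not depend on how openshell is spawned.
type RunOpenshell = (args: string[]) => CommandResult;

interface RemoveOutcome {
  ok: boolean;
  // Human-readable detail, including the raw gateway output on failure.
  message: string;
}

function isNotFound(result: CommandResult): boolean {
  return /NotFound/.test(result.stdout + result.stderr);
}

function describeFailure(step: string, provider: string, result: CommandResult): string {
  const detail = (result.stderr || result.stdout).trim();
  return `Failed to ${step} provider ${provider}: ${detail || "no output from openshell"}`;
}

function detachThenDeleteProvider(
  run: RunOpenshell,
  sandbox: string,
  provider: string,
): RemoveOutcome {
  // Detach first so the gateway no longer sees the provider as attached.
  // Assumption: NotFound here means it was already detached or removed.
  const detach = run(["sandbox", "provider", "detach", sandbox, provider]);
  if (detach.exitCode !== 0 && !isNotFound(detach)) {
    return { ok: false, message: describeFailure("detach", provider, detach) };
  }
  // Then delete the provider record. NotFound is treated as already deleted.
  const del = run(["provider", "delete", provider]);
  if (del.exitCode !== 0 && !isNotFound(del)) {
    return { ok: false, message: describeFailure("delete", provider, del) };
  }
  return { ok: true, message: `Removed provider ${provider}` };
}
```

Passing the runner in as a parameter keeps the ordering and error handling testable without a live gateway.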

Reproduction Steps

  1. Fresh checkout. Export a Telegram bot token (any well-formed value works; it does not need to be valid):
export TELEGRAM_BOT_TOKEN=<redacted>
  2. Install and onboard non-interactively:
bash install.sh --non-interactive --yes-i-accept-third-party-software

Wait for the sandbox my-assistant to be reported Ready.

  3. Verify baseline:
openshell provider list # shows my-assistant-telegram-bridge
nemoclaw my-assistant policy-list # shows ● telegram (applied)
cat ~/.nemoclaw/sandboxes.json | jq '.sandboxes."my-assistant".messagingChannels'
# → ["telegram"]
  4. Attempt to remove the channel while the sandbox is alive (Symptom A):
nemoclaw my-assistant channels remove telegram

Observed: Failed to delete bridge provider(s) from the OpenShell gateway: my-assistant-telegram-bridge. Registry not updated; re-run after resolving the gateway error. (exit 1)

  5. Confirm the underlying gateway error is FailedPrecondition:
openshell provider delete my-assistant-telegram-bridge
# → Error: FailedPrecondition, message: "provider 'my-assistant-telegram-bridge' is attached to sandbox(es): my-assistant"
  6. Confirm state is half-applied:
cat ~/.nemoclaw/sandboxes.json | jq '.sandboxes."my-assistant".messagingChannels'
# → still ["telegram"]
  7. To observe Symptom B: detach manually, retry the remove, then check the policy list:
openshell sandbox provider detach my-assistant my-assistant-telegram-bridge
nemoclaw my-assistant channels remove telegram
nemoclaw my-assistant policy-list
# → ● telegram still listed as applied
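
The registry check in the steps above could also be written as a small TypeScript helper, for example in a regression test. This is a sketch under assumptions: the JSON shape is inferred only from the jq path used above (sandboxes.<name>.messagingChannels), RegistryFile and channelStillRegistered are hypothetical names, and providerCredentialHashes is not checked because its shape is not shown in this issue.

```typescript
// Only the fields referenced in this issue are modelled. Any other fields
// in the registry file are ignored.
interface RegistryFile {
  sandboxes?: { [name: string]: { messagingChannels?: string[] } | undefined };
}

// Returns true if the channel is still listed for the sandbox in the
// registry JSON text (the contents of ~/.nemoclaw/sandboxes.json).
function channelStillRegistered(
  registryJson: string,
  sandbox: string,
  channel: string,
): boolean {
  const registry = JSON.parse(registryJson) as RegistryFile;
  const entry = registry.sandboxes ? registry.sandboxes[sandbox] : undefined;
  const channels = entry && entry.messagingChannels ? entry.messagingChannels : [];
  return channels.indexOf(channel) !== -1;
}
```

After a successful channels remove telegram, this helper would be expected to return false for the telegram channel on my-assistant.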

Environment

  • OS: Ubuntu 24.04
  • Node.js: v22.22.3
  • npm: 10.9.8
  • Docker: 29.4.3
  • NemoClaw: v0.0.44-13-gc57103f8f (built from source)
  • OpenShell: 0.0.39
  • Container runtime: docker (driver)

Checklist

  • I confirmed this bug is reproducible
  • I searched existing issues and this is not a duplicate

Labels

  • VRDC (Issues and PRs submitted by NVIDIA VRDC test team)
  • area: e2e (End-to-end tests, nightly failures, or validation infrastructure)
  • area: messaging (Messaging channels, bridges, manifests, or channel lifecycle)
  • needs: triage (Awaiting maintainer classification)
