Skip to content

fix(onboard): preserve disabled channels through rebuild so channels start can recover#3395

Merged
ericksoa merged 5 commits into
mainfrom
fix/channels-start-after-stop
May 13, 2026
Merged

fix(onboard): preserve disabled channels through rebuild so channels start can recover#3395
ericksoa merged 5 commits into
mainfrom
fix/channels-start-after-stop

Conversation

@laitingsheng

@laitingsheng laitingsheng commented May 12, 2026

Copy link
Copy Markdown
Contributor

Summary

nemoclaw <sandbox> channels start <name> reported success but left the bot offline whenever it ran after a channels stop <name> + rebuild. The stop path correctly disabled the bridge, but the subsequent rebuild silently dropped the channel from the registry's configured set, leaving a later channels start with nothing to re-enable.

Related Issue

Fixes #3381

Changes

  • src/lib/onboard.ts: persist the operator's configured channel set on the new registry entry (input enabledChannels), not the post-disabled-filter activeMessagingChannels. disabledChannels remains the runtime modifier; the baked image and the gateway provider attachments still respect it.
  • test/onboard.test.ts: regression test that drives createSandbox with disabledChannels: ["telegram"] already in the registry and asserts the new registerSandbox payload keeps messagingChannels: ["telegram"] while the Dockerfile build arg and --provider flags continue to exclude the disabled channel.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

  • Bug Fixes
    • Messaging channels you configure are now preserved when recreating a sandbox, allowing them to be re-enabled in subsequent rebuilds.
    • Disabled messaging channels no longer incorrectly prevent other sandboxes from using the same credential token.

Review Change Stack

…start can recover

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 12, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 31530622-1d91-4cf1-a514-21c87fee0e33

📥 Commits

Reviewing files that changed from the base of the PR and between 65d8ce8 and 2408bc6.

📒 Files selected for processing (1)
  • src/lib/onboard.ts

📝 Walkthrough

Walkthrough

The PR fixes issue #3381 by preserving the operator's configured messaging channels across sandbox recreate cycles. createSandbox now persists enabledChannels instead of activeMessagingChannels to the registry, and conflict detection is updated to exclude disabled channels from blocking other sandboxes or reporting overlaps.

Changes

Messaging channel persistence and conflict detection fix

Layer / File(s) Summary
Registry channel persistence on recreate
src/lib/onboard.ts, test/onboard.test.ts
createSandbox persists enabledChannels (deduplicated) instead of activeMessagingChannels, preserving disabled channels in the registry for later re-enablement. Integration test verifies the registry round-trips both messagingChannels and disabledChannels across a rebuild cycle.
Conflict detection respects disabled channels
src/lib/messaging-conflict.ts, src/lib/messaging-conflict.test.ts
hasStoredChannel returns false when a channel is listed in disabledChannels, and findAllOverlaps skips disabled channels during indexing. Tests verify disabled channels do not trigger false conflicts or phantom overlaps, allowing other sandboxes to claim the same credentials.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A hop, a skip, a channel's keep—
Through stop and start, the threads run deep!
Disabled yet remembered true,
Re-enable brings the bot anew! 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: preserving disabled channels through rebuild to enable recovery via channels start command.
Linked Issues check ✅ Passed All code changes directly address issue #3381 by preserving operator-configured channels through rebuild and respecting disabled state in conflict detection.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the disabled channels issue; no unrelated modifications outside the stated objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/channels-start-after-stop

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: messaging-providers-e2e, token-rotation-e2e

Workflow run

Full advisor summary

Pi Semantic E2E Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. The PR is a narrow persistence + filtering fix in lib code with substantive new unit tests covering both sides of the change (findChannelConflicts, findAllOverlaps, createSandbox registry write). It does not alter networking, credential handling, sandbox isolation, installer behavior, or inference routing — the user-visible payload is a corrected registry field and suppressed phantom warnings, which unit tests cover deterministically. Existing nightly E2Es (messaging-providers, token-rotation) provide confidence-level coverage but are not merge-blocking for this change.

Optional E2E

  • messaging-providers-e2e (medium): Exercises the full Telegram/Discord/Slack provider + placeholder + L7-proxy chain after onboard, which is where the registry messagingChannels persistence lives. Useful regression confidence for the onboard.ts registry write change.
  • token-rotation-e2e (medium): Re-runs onboard and rebuilds sandboxes with new messaging tokens, re-exercising the registry.registerSandbox call path (providerCredentialHashes, messagingChannels, disabledChannels) that this PR modifies.

New E2E recommendations

  • sandbox-channels-lifecycle (medium): No existing E2E exercises the channels stop → sandbox rebuild → channels start recovery path that is the direct user-visible behavior fixed by [Messaging] Cannot start a channel after a stop #3381. The unit tests in this PR cover the registry-write semantics, but a live-gateway round-trip would catch regressions where rebuild silently drops the disabled channel or channels start cannot re-attach the bridge.
    • Suggested test: test/e2e/test-channels-stop-rebuild-start.sh — onboard a sandbox with telegram + discord, run nemoclaw <name> channels stop telegram, trigger a rebuild (e.g. via nemoclaw onboard with unchanged tokens), assert registry.messagingChannels still contains telegram and disabledChannels=['telegram'] and the telegram bridge provider is not attached, then run nemoclaw <name> channels start telegram and assert the bridge is reattached and receives the original token.
  • messaging-channels (low): No existing E2E asserts that nemoclaw status does not report phantom overlap warnings across two sandboxes sharing a messaging token when one has stopped the channel. The unit test covers findAllOverlaps directly, but a status-command smoke would lock in the end-to-end UX fix.
    • Suggested test: Extend sandbox-operations-e2e or a new test/e2e/test-status-channel-overlap.sh: onboard two sandboxes sharing TELEGRAM_BOT_TOKEN, channels stop telegram on one, then assert nemoclaw status emits no matching-token/unknown-token overlap warning for telegram.

…ction

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng marked this pull request as ready for review May 12, 2026 10:09
@laitingsheng laitingsheng added VRDC Issues and PRs submitted by NVIDIA VRDC test team. v0.0.40 and removed VRDC Issues and PRs submitted by NVIDIA VRDC test team. labels May 12, 2026
@sandl99

sandl99 commented May 12, 2026

Copy link
Copy Markdown
Contributor

Hi I am having this PR #3186, please make sure it won't override each others

@prekshivyas prekshivyas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Solid two-sided fix:

  1. onboard.ts now persists the operator's configured enabledChannels instead of the post-disabled-filter activeMessagingChannels, so a later channels start has something to re-enable after channels stop + rebuild.
  2. messaging-conflict.ts skips disabledChannels in both hasStoredChannel and findAllOverlaps, which is necessary — without it the preservation would create phantom cross-sandbox token conflicts on paused channels.

The PR aligns one stale write site with the convention already used elsewhere (doctor.ts:374-380 filters disabledChannels; rebuild.ts:625-629 round-trips disabledChannels). Test coverage is thorough: registry preservation, dockerfile build arg, gateway provider attachment, and conflict detection all asserted.

Non-blocking follow-up worth a separate PR: verify-deployment.ts:167 (verifyMessagingBridges) and inventory/index.ts:415 don't yet subtract disabledChannels before checking provider presence — same pattern as doctor.ts would prevent a latent false-positive "missing telegram bridge" warning when verifyDeployment gets wired to a production command.

macos-e2e failure is test/cli.test.ts:1310 timeout preceded by daemon.json is malformed — unrelated to this PR's surface area (no macOS-specific or cli.test.ts changes). Worth a re-run before merge.

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25747639368
Branch: fix/channels-start-after-stop
Requested jobs: messaging-providers-e2e,token-rotation-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
messaging-providers-e2e ✅ success
token-rotation-e2e ✅ success

@jyaunches

Copy link
Copy Markdown
Contributor

Targeted E2E validation requested by the E2E Advisor has passed.

Run: https://github.com/NVIDIA/NemoClaw/actions/runs/25747639368

Passed targeted jobs: messaging-providers-e2e, token-rotation-e2e

PR review can proceed with that E2E signal in mind.

@prekshivyas prekshivyas requested a review from ericksoa May 12, 2026 21:44
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25777076092
Branch: fix/channels-start-after-stop
Requested jobs: messaging-providers-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
messaging-providers-e2e ✅ success

@ericksoa ericksoa left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed current head 2408bc6. The fix is scoped to preserving disabled messaging-channel registry state through rebuilds and filtering disabled channels out of conflict reporting. Checks are green, and a fresh messaging-providers-e2e run passed on this head.

@ericksoa ericksoa merged commit cf3fd66 into main May 13, 2026
72 checks passed
@ericksoa ericksoa deleted the fix/channels-start-after-stop branch May 13, 2026 04:01
sandl99 added a commit to sandl99/NemoClaw that referenced this pull request May 13, 2026
…chat e2e

NVIDIA#3395 (merged) preserves messagingChannels and disabledChannels in the
registry across rebuild, so the session-based carry-across introduced in
961f222 is now redundant for the channels-start recovery side. The
channels-add policy auto-apply will land in a separate PR alongside the
global build-time disable filter fix.

Removed (defer to NVIDIA#3395 / separate PR):
- src/lib/state/onboard-session.ts: drop disabledChannels field from
  Session/SessionUpdates and its read/write/normalize sites.
- src/lib/actions/sandbox/rebuild.ts: drop preservedDisabledChannels
  snapshot and the s.disabledChannels session stash.
- src/lib/onboard.ts: drop sessionDisabledChannels read in createSandbox
  (back to registry.getDisabledChannels) and the session-clear after
  registry.setDefault.
- src/lib/actions/sandbox/policy-channel.ts: drop channels-add auto-apply
  policy preset and channels-remove "still applied" hint.
- src/lib/actions/inference-set.test.ts: drop disabledChannels: null from
  the test session fixture.

Added E2E coverage for non-interactive wechat onboarding
(test/e2e/test-messaging-providers.sh). Default fake env vars
(WECHAT_BOT_TOKEN, WECHAT_ACCOUNT_ID, WECHAT_BASE_URL, WECHAT_USER_ID,
WECHAT_ALLOWED_IDS) let onboard short-circuit HOST_QR_LOGIN_HANDLERS at
src/lib/onboard.ts:8433 so no QR scan is needed. New markers:
- M-W1: ${SANDBOX}-wechat-bridge provider registered in gateway.
- M-W3/3a-3d: host-side WECHAT_BOT_TOKEN must not leak into sandbox env,
  process list, or filesystem; placeholder is present.
- M-W8: openclaw.json registers channels.openclaw-weixin with the
  configured accountId enabled.
- M-W9: per-account credential file uses the L7-resolved placeholder
  (openshell:resolve:env:WECHAT_BOT_TOKEN), not the real token.
- M-W10: accounts.json index lists the configured accountId.

Fixes the cascade from inserting wechatConfig between telegramConfig and
darwinVmCompat in patchStagedDockerfile's signature: test/onboard.test.ts
and src/lib/onboard/dockerfile-patch.test.ts now pass {} for wechatConfig
before the darwinVmCompat boolean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sandl99 added a commit to sandl99/NemoClaw that referenced this pull request May 13, 2026
…chat e2e

NVIDIA#3395 (merged) preserves messagingChannels and disabledChannels in the
registry across rebuild, so the session-based carry-across introduced in
961f222 is now redundant for the channels-start recovery side. The
channels-add policy auto-apply will land in a separate PR alongside the
global build-time disable filter fix.

Removed (defer to NVIDIA#3395 / separate PR):
- src/lib/state/onboard-session.ts: drop disabledChannels field from
  Session/SessionUpdates and its read/write/normalize sites.
- src/lib/actions/sandbox/rebuild.ts: drop preservedDisabledChannels
  snapshot and the s.disabledChannels session stash.
- src/lib/onboard.ts: drop sessionDisabledChannels read in createSandbox
  (back to registry.getDisabledChannels) and the session-clear after
  registry.setDefault.
- src/lib/actions/sandbox/policy-channel.ts: drop channels-add auto-apply
  policy preset and channels-remove "still applied" hint.
- src/lib/actions/inference-set.test.ts: drop disabledChannels: null from
  the test session fixture.

Added E2E coverage for non-interactive wechat onboarding
(test/e2e/test-messaging-providers.sh). Default fake env vars
(WECHAT_BOT_TOKEN, WECHAT_ACCOUNT_ID, WECHAT_BASE_URL, WECHAT_USER_ID,
WECHAT_ALLOWED_IDS) let onboard short-circuit HOST_QR_LOGIN_HANDLERS at
src/lib/onboard.ts:8433 so no QR scan is needed. New markers:
- M-W1: ${SANDBOX}-wechat-bridge provider registered in gateway.
- M-W3/3a-3d: host-side WECHAT_BOT_TOKEN must not leak into sandbox env,
  process list, or filesystem; placeholder is present.
- M-W8: openclaw.json registers channels.openclaw-weixin with the
  configured accountId enabled.
- M-W9: per-account credential file uses the L7-resolved placeholder
  (openshell:resolve:env:WECHAT_BOT_TOKEN), not the real token.
- M-W10: accounts.json index lists the configured accountId.

Fixes the cascade from inserting wechatConfig between telegramConfig and
darwinVmCompat in patchStagedDockerfile's signature: test/onboard.test.ts
and src/lib/onboard/dockerfile-patch.test.ts now pass {} for wechatConfig
before the darwinVmCompat boolean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sandl99 added a commit to sandl99/NemoClaw that referenced this pull request May 14, 2026
…chat e2e

NVIDIA#3395 (merged) preserves messagingChannels and disabledChannels in the
registry across rebuild, so the session-based carry-across introduced in
961f222 is now redundant for the channels-start recovery side. The
channels-add policy auto-apply will land in a separate PR alongside the
global build-time disable filter fix.

Removed (defer to NVIDIA#3395 / separate PR):
- src/lib/state/onboard-session.ts: drop disabledChannels field from
  Session/SessionUpdates and its read/write/normalize sites.
- src/lib/actions/sandbox/rebuild.ts: drop preservedDisabledChannels
  snapshot and the s.disabledChannels session stash.
- src/lib/onboard.ts: drop sessionDisabledChannels read in createSandbox
  (back to registry.getDisabledChannels) and the session-clear after
  registry.setDefault.
- src/lib/actions/sandbox/policy-channel.ts: drop channels-add auto-apply
  policy preset and channels-remove "still applied" hint.
- src/lib/actions/inference-set.test.ts: drop disabledChannels: null from
  the test session fixture.

Added E2E coverage for non-interactive wechat onboarding
(test/e2e/test-messaging-providers.sh). Default fake env vars
(WECHAT_BOT_TOKEN, WECHAT_ACCOUNT_ID, WECHAT_BASE_URL, WECHAT_USER_ID,
WECHAT_ALLOWED_IDS) let onboard short-circuit HOST_QR_LOGIN_HANDLERS at
src/lib/onboard.ts:8433 so no QR scan is needed. New markers:
- M-W1: ${SANDBOX}-wechat-bridge provider registered in gateway.
- M-W3/3a-3d: host-side WECHAT_BOT_TOKEN must not leak into sandbox env,
  process list, or filesystem; placeholder is present.
- M-W8: openclaw.json registers channels.openclaw-weixin with the
  configured accountId enabled.
- M-W9: per-account credential file uses the L7-resolved placeholder
  (openshell:resolve:env:WECHAT_BOT_TOKEN), not the real token.
- M-W10: accounts.json index lists the configured accountId.

Fixes the cascade from inserting wechatConfig between telegramConfig and
darwinVmCompat in patchStagedDockerfile's signature: test/onboard.test.ts
and src/lib/onboard/dockerfile-patch.test.ts now pass {} for wechatConfig
before the darwinVmCompat boolean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cv added a commit that referenced this pull request May 15, 2026
## Summary
  
`channels stop <ch>` followed by a rebuild left the channel running: the
rebuild path destroyed the registry entry before `onboard --resume` read
back `disabledChannels`, so the filter that should have excluded the
stopped channel from the new image was a silent no-op. This is
the complementary half of #3395 — that PR fixed the same
destroy/recreate gap on the **write** side (preserving
`messagingChannels` so `channels start` could recover, #3381); this one
fixes it on the **read** side (honoring `disabledChannels` so `channels
stop` actually
disables, #3453). After both fixes the registry, gateway, and image
agree at rest.
  
  ## Related Issue

  Fixes #3453. Refs #3381, #3395, #3462.
  
  ## Changes

- `src/lib/state/onboard-session.ts` — add `disabledChannels: string[] |
null` to `Session`, `SessionUpdates`, `createSession`,
`normalizeSession`, and `filterSafeUpdates` (mirrors the existing
`messagingChannels` round-trip).
- `src/lib/actions/sandbox/rebuild.ts` — snapshot `sb.disabledChannels`
(with session fallback for cross-process resumes) and stash it into the
session inside the same `updateSession` block that already mirrors
`messagingChannels`, before `removeSandboxRegistryEntry` wipes the
  registry.
- `src/lib/onboard.ts` — `createSandbox` prefers
`session.disabledChannels` over
`registry.getDisabledChannels(sandboxName)`; the registry fallback
preserves the fresh-onboard path that #3395 already exercises.
- `src/lib/state/onboard-session.test.ts` — 4 new cases covering
round-trip, JSON filter, default-null on fresh sessions, and
`filterSafeUpdates` accepting array + explicit `null` clear.
- `src/lib/actions/inference-set.test.ts` — fixture updated for the new
required `Session` field.
- `test/e2e/test-channels-stop-start.sh` — new e2e regression test
covering Test 1 of #3462. 47 stable `PASS:`/`FAIL:` assertions across 6
phases; Phase 4 C4a is the load-bearing check for #3453 (`openclaw.json`
excludes `telegram` after stop+rebuild), Phase 6 C6 assertions
  guard against regression of #3395's #3381 fix.
- `test/e2e/docs/parity-map.yaml` + `parity-inventory.generated.json` —
entry for the new test, regenerated inventory.
- `.github/workflows/nightly-e2e.yaml` — new `channels-stop-start-e2e`
job (mirrors `messaging-providers-e2e`), wired into `notify-on-failure`,
`report-to-pr`, and `scorecard` aggregator `needs:` lists.

  ## Type of Change

  - [x] Code change (feature, bug fix, or refactor)
  - [ ] Code change with doc updates
  - [ ] Doc only (prose changes, no code sample modifications)
  - [ ] Doc only (includes code sample changes)

  ## Verification

  - [x] `npx prek run --all-files` passes
  - [x] `npm test` passes
  - [x] Tests added or updated for new or changed behavior
  - [x] No secrets, API keys, or credentials committed
  - [ ] Docs updated for user-facing behavior changes
  - [ ] `make docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---
<!-- DCO sign-off required by CI. Run: git config user.name && git
config user.email -->
Signed-off-by: San Dang <sdang@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* End-to-end test and script added to validate Telegram channel
stop/start across rebuilds.
* Sandbox sessions now persist a disabled-channel state across
rebuild/resume flows.

* **Tests**
* Expanded unit and E2E coverage for disabled-channel resolution,
persistence, and registry interactions.
* Test inventory and parity mappings updated to include the new nightly
scenario.

* **Chores**
* Nightly CI updated to run the new E2E job and include its results in
reports and scorecards.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3532)

<!-- review_stack_entry_end -->
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: San Dang <sdang@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
@wscurran wscurran added area: cli Command line interface, flags, terminal UX, or output area: messaging Messaging channels, bridges, manifests, or channel lifecycle bug-fix PR fixes a bug or regression feature PR adds or expands user-visible functionality and removed NemoClaw CLI feature PR adds or expands user-visible functionality labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: cli Command line interface, flags, terminal UX, or output area: messaging Messaging channels, bridges, manifests, or channel lifecycle bug-fix PR fixes a bug or regression

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Messaging] Cannot start a channel after a stop

6 participants