fix(onboard): preserve disabled channels through rebuild so channels start can recover by laitingsheng · Pull Request #3395 · NVIDIA/NemoClaw

laitingsheng · 2026-05-12T09:33:31Z

Summary

nemoclaw <sandbox> channels start <name> reported success but left the bot offline whenever it ran after a channels stop <name> + rebuild. The stop path correctly disabled the bridge, but the subsequent rebuild silently dropped the channel from the registry's configured set, leaving a later channels start with nothing to re-enable.

Related Issue

Fixes #3381

Changes

src/lib/onboard.ts: persist the operator's configured channel set on the new registry entry (input enabledChannels), not the post-disabled-filter activeMessagingChannels. disabledChannels remains the runtime modifier; the baked image and the gateway provider attachments still respect it.
test/onboard.test.ts: regression test that drives createSandbox with disabledChannels: ["telegram"] already in the registry and asserts the new registerSandbox payload keeps messagingChannels: ["telegram"] while the Dockerfile build arg and --provider flags continue to exclude the disabled channel.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

Bug Fixes
- Messaging channels you configure are now preserved when recreating a sandbox, allowing them to be re-enabled in subsequent rebuilds.
- Disabled messaging channels no longer incorrectly prevent other sandboxes from using the same credential token.

…start can recover Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

copy-pr-bot · 2026-05-12T09:33:35Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-05-12T09:33:38Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 31530622-1d91-4cf1-a514-21c87fee0e33

📥 Commits

Reviewing files that changed from the base of the PR and between 65d8ce8 and 2408bc6.

📒 Files selected for processing (1)

src/lib/onboard.ts

📝 Walkthrough

Walkthrough

The PR fixes issue #3381 by preserving the operator's configured messaging channels across sandbox recreate cycles. createSandbox now persists enabledChannels instead of activeMessagingChannels to the registry, and conflict detection is updated to exclude disabled channels from blocking other sandboxes or reporting overlaps.

Changes

Messaging channel persistence and conflict detection fix

Layer / File(s)	Summary
Registry channel persistence on recreate `src/lib/onboard.ts`, `test/onboard.test.ts`	`createSandbox` persists `enabledChannels` (deduplicated) instead of `activeMessagingChannels`, preserving disabled channels in the registry for later re-enablement. Integration test verifies the registry round-trips both `messagingChannels` and `disabledChannels` across a rebuild cycle.
Conflict detection respects disabled channels `src/lib/messaging-conflict.ts`, `src/lib/messaging-conflict.test.ts`	`hasStoredChannel` returns false when a channel is listed in `disabledChannels`, and `findAllOverlaps` skips disabled channels during indexing. Tests verify disabled channels do not trigger false conflicts or phantom overlaps, allowing other sandboxes to claim the same credentials.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A hop, a skip, a channel's keep—
Through stop and start, the threads run deep!
Disabled yet remembered true,
Re-enable brings the bot anew! 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: preserving disabled channels through rebuild to enable recovery via channels start command.
Linked Issues check	✅ Passed	All code changes directly address issue `#3381` by preserving operator-configured channels through rebuild and respecting disabled state in conflict detection.
Out of Scope Changes check	✅ Passed	All changes are scoped to fixing the disabled channels issue; no unrelated modifications outside the stated objectives.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/channels-start-after-stop

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-12T09:34:06Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: messaging-providers-e2e, token-rotation-e2e

Workflow run

Full advisor summary

Pi Semantic E2E Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. The PR is a narrow persistence + filtering fix in lib code with substantive new unit tests covering both sides of the change (findChannelConflicts, findAllOverlaps, createSandbox registry write). It does not alter networking, credential handling, sandbox isolation, installer behavior, or inference routing — the user-visible payload is a corrected registry field and suppressed phantom warnings, which unit tests cover deterministically. Existing nightly E2Es (messaging-providers, token-rotation) provide confidence-level coverage but are not merge-blocking for this change.

Optional E2E

messaging-providers-e2e (medium): Exercises the full Telegram/Discord/Slack provider + placeholder + L7-proxy chain after onboard, which is where the registry messagingChannels persistence lives. Useful regression confidence for the onboard.ts registry write change.
token-rotation-e2e (medium): Re-runs onboard and rebuilds sandboxes with new messaging tokens, re-exercising the registry.registerSandbox call path (providerCredentialHashes, messagingChannels, disabledChannels) that this PR modifies.

New E2E recommendations

sandbox-channels-lifecycle (medium): No existing E2E exercises the channels stop → sandbox rebuild → channels start recovery path that is the direct user-visible behavior fixed by [Messaging] Cannot start a channel after a stop #3381. The unit tests in this PR cover the registry-write semantics, but a live-gateway round-trip would catch regressions where rebuild silently drops the disabled channel or channels start cannot re-attach the bridge.
- Suggested test: test/e2e/test-channels-stop-rebuild-start.sh — onboard a sandbox with telegram + discord, run nemoclaw <name> channels stop telegram, trigger a rebuild (e.g. via nemoclaw onboard with unchanged tokens), assert registry.messagingChannels still contains telegram and disabledChannels=['telegram'] and the telegram bridge provider is not attached, then run nemoclaw <name> channels start telegram and assert the bridge is reattached and receives the original token.
messaging-channels (low): No existing E2E asserts that nemoclaw status does not report phantom overlap warnings across two sandboxes sharing a messaging token when one has stopped the channel. The unit test covers findAllOverlaps directly, but a status-command smoke would lock in the end-to-end UX fix.
- Suggested test: Extend sandbox-operations-e2e or a new test/e2e/test-status-channel-overlap.sh: onboard two sandboxes sharing TELEGRAM_BOT_TOKEN, channels stop telegram on one, then assert nemoclaw status emits no matching-token/unknown-token overlap warning for telegram.

…ction Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

sandl99 · 2026-05-12T14:14:28Z

Hi I am having this PR #3186, please make sure it won't override each others

prekshivyas

LGTM.

Solid two-sided fix:

onboard.ts now persists the operator's configured enabledChannels instead of the post-disabled-filter activeMessagingChannels, so a later channels start has something to re-enable after channels stop + rebuild.
messaging-conflict.ts skips disabledChannels in both hasStoredChannel and findAllOverlaps, which is necessary — without it the preservation would create phantom cross-sandbox token conflicts on paused channels.

The PR aligns one stale write site with the convention already used elsewhere (doctor.ts:374-380 filters disabledChannels; rebuild.ts:625-629 round-trips disabledChannels). Test coverage is thorough: registry preservation, dockerfile build arg, gateway provider attachment, and conflict detection all asserted.

Non-blocking follow-up worth a separate PR: verify-deployment.ts:167 (verifyMessagingBridges) and inventory/index.ts:415 don't yet subtract disabledChannels before checking provider presence — same pattern as doctor.ts would prevent a latent false-positive "missing telegram bridge" warning when verifyDeployment gets wired to a production command.

macos-e2e failure is test/cli.test.ts:1310 timeout preceded by daemon.json is malformed — unrelated to this PR's surface area (no macOS-specific or cli.test.ts changes). Worth a re-run before merge.

github-actions · 2026-05-12T16:55:38Z

Selective E2E Results — ✅ All requested jobs passed

Run: 25747639368
Branch: fix/channels-start-after-stop
Requested jobs: messaging-providers-e2e,token-rotation-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
messaging-providers-e2e	✅ success
token-rotation-e2e	✅ success

jyaunches · 2026-05-12T17:03:31Z

Targeted E2E validation requested by the E2E Advisor has passed.

Run: https://github.com/NVIDIA/NemoClaw/actions/runs/25747639368

Passed targeted jobs: messaging-providers-e2e, token-rotation-e2e

PR review can proceed with that E2E signal in mind.

github-actions · 2026-05-13T03:57:58Z

Selective E2E Results — ✅ All requested jobs passed

Run: 25777076092
Branch: fix/channels-start-after-stop
Requested jobs: messaging-providers-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
messaging-providers-e2e	✅ success

ericksoa

Reviewed current head 2408bc6. The fix is scoped to preserving disabled messaging-channel registry state through rebuilds and filtering disabled channels out of conflict reporting. Checks are green, and a fresh messaging-providers-e2e run passed on this head.

…chat e2e NVIDIA#3395 (merged) preserves messagingChannels and disabledChannels in the registry across rebuild, so the session-based carry-across introduced in 961f222 is now redundant for the channels-start recovery side. The channels-add policy auto-apply will land in a separate PR alongside the global build-time disable filter fix. Removed (defer to NVIDIA#3395 / separate PR): - src/lib/state/onboard-session.ts: drop disabledChannels field from Session/SessionUpdates and its read/write/normalize sites. - src/lib/actions/sandbox/rebuild.ts: drop preservedDisabledChannels snapshot and the s.disabledChannels session stash. - src/lib/onboard.ts: drop sessionDisabledChannels read in createSandbox (back to registry.getDisabledChannels) and the session-clear after registry.setDefault. - src/lib/actions/sandbox/policy-channel.ts: drop channels-add auto-apply policy preset and channels-remove "still applied" hint. - src/lib/actions/inference-set.test.ts: drop disabledChannels: null from the test session fixture. Added E2E coverage for non-interactive wechat onboarding (test/e2e/test-messaging-providers.sh). Default fake env vars (WECHAT_BOT_TOKEN, WECHAT_ACCOUNT_ID, WECHAT_BASE_URL, WECHAT_USER_ID, WECHAT_ALLOWED_IDS) let onboard short-circuit HOST_QR_LOGIN_HANDLERS at src/lib/onboard.ts:8433 so no QR scan is needed. New markers: - M-W1: ${SANDBOX}-wechat-bridge provider registered in gateway. - M-W3/3a-3d: host-side WECHAT_BOT_TOKEN must not leak into sandbox env, process list, or filesystem; placeholder is present. - M-W8: openclaw.json registers channels.openclaw-weixin with the configured accountId enabled. - M-W9: per-account credential file uses the L7-resolved placeholder (openshell:resolve:env:WECHAT_BOT_TOKEN), not the real token. - M-W10: accounts.json index lists the configured accountId. Fixes the cascade from inserting wechatConfig between telegramConfig and darwinVmCompat in patchStagedDockerfile's signature: test/onboard.test.ts and src/lib/onboard/dockerfile-patch.test.ts now pass {} for wechatConfig before the darwinVmCompat boolean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## Summary `channels stop <ch>` followed by a rebuild left the channel running: the rebuild path destroyed the registry entry before `onboard --resume` read back `disabledChannels`, so the filter that should have excluded the stopped channel from the new image was a silent no-op. This is the complementary half of #3395 — that PR fixed the same destroy/recreate gap on the **write** side (preserving `messagingChannels` so `channels start` could recover, #3381); this one fixes it on the **read** side (honoring `disabledChannels` so `channels stop` actually disables, #3453). After both fixes the registry, gateway, and image agree at rest. ## Related Issue Fixes #3453. Refs #3381, #3395, #3462. ## Changes - `src/lib/state/onboard-session.ts` — add `disabledChannels: string[] | null` to `Session`, `SessionUpdates`, `createSession`, `normalizeSession`, and `filterSafeUpdates` (mirrors the existing `messagingChannels` round-trip). - `src/lib/actions/sandbox/rebuild.ts` — snapshot `sb.disabledChannels` (with session fallback for cross-process resumes) and stash it into the session inside the same `updateSession` block that already mirrors `messagingChannels`, before `removeSandboxRegistryEntry` wipes the registry. - `src/lib/onboard.ts` — `createSandbox` prefers `session.disabledChannels` over `registry.getDisabledChannels(sandboxName)`; the registry fallback preserves the fresh-onboard path that #3395 already exercises. - `src/lib/state/onboard-session.test.ts` — 4 new cases covering round-trip, JSON filter, default-null on fresh sessions, and `filterSafeUpdates` accepting array + explicit `null` clear. - `src/lib/actions/inference-set.test.ts` — fixture updated for the new required `Session` field. - `test/e2e/test-channels-stop-start.sh` — new e2e regression test covering Test 1 of #3462. 47 stable `PASS:`/`FAIL:` assertions across 6 phases; Phase 4 C4a is the load-bearing check for #3453 (`openclaw.json` excludes `telegram` after stop+rebuild), Phase 6 C6 assertions guard against regression of #3395's #3381 fix. - `test/e2e/docs/parity-map.yaml` + `parity-inventory.generated.json` — entry for the new test, regenerated inventory. - `.github/workflows/nightly-e2e.yaml` — new `channels-stop-start-e2e` job (mirrors `messaging-providers-e2e`), wired into `notify-on-failure`, `report-to-pr`, and `scorecard` aggregator `needs:` lists. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [x] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `make docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ---  Signed-off-by: San Dang <sdang@nvidia.com>  ## Summary by CodeRabbit * **New Features** * End-to-end test and script added to validate Telegram channel stop/start across rebuilds. * Sandbox sessions now persist a disabled-channel state across rebuild/resume flows. * **Tests** * Expanded unit and E2E coverage for disabled-channel resolution, persistence, and registry interactions. * Test inventory and parity mappings updated to include the new nightly scenario. * **Chores** * Nightly CI updated to run the new E2E job and include its results in reports and scorecards.  [![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3532)   --------- Signed-off-by: San Dang <sdang@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com>

fix(onboard): preserve disabled channels through rebuild so channels …

7922b66

…start can recover Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added fix labels May 12, 2026

fix(messaging-conflict): exclude disabled channels from conflict dete…

65d8ce8

…ction Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng marked this pull request as ready for review May 12, 2026 10:09

laitingsheng added VRDC Issues and PRs submitted by NVIDIA VRDC test team. v0.0.40 and removed VRDC Issues and PRs submitted by NVIDIA VRDC test team. labels May 12, 2026

cv assigned prekshivyas May 12, 2026

Merge branch 'main' into fix/channels-start-after-stop

e4c0878

prekshivyas approved these changes May 12, 2026

View reviewed changes

Merge branch 'main' into fix/channels-start-after-stop

11fb3d4

Merge branch 'main' into fix/channels-start-after-stop

2408bc6

prekshivyas requested a review from ericksoa May 12, 2026 21:44

ericksoa approved these changes May 13, 2026

View reviewed changes

ericksoa merged commit cf3fd66 into main May 13, 2026
72 checks passed

ericksoa deleted the fix/channels-start-after-stop branch May 13, 2026 04:01

coderabbitai Bot mentioned this pull request May 13, 2026

feat(channels): add WhatsApp messaging channel with QR pairing #3392

Merged

12 tasks

sandl99 mentioned this pull request May 13, 2026

bug(messaging) channels stop <name> does not stop the bridge through rebuild — openclaw.json still bakes the channel in #3453

Closed

sandl99 mentioned this pull request May 13, 2026

test(messaging): add E2E coverage for channels stop/start and channels add/remove rebuild flows #3462

Closed

sandl99 mentioned this pull request May 14, 2026

fix(rebuild): preserve disabledChannels across destroy/recreate #3532

Merged

12 tasks

Conversation

laitingsheng commented May 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented May 12, 2026

Uh oh!

coderabbitai Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

Pi Semantic E2E Advisor

Required E2E

Optional E2E

New E2E recommendations

Uh oh!

sandl99 commented May 12, 2026

Uh oh!

prekshivyas left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 12, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

jyaunches commented May 12, 2026

Uh oh!

github-actions Bot commented May 13, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

ericksoa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

laitingsheng commented May 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 12, 2026 •

edited

Loading

github-actions Bot commented May 12, 2026 •

edited

Loading