fix(onboard): preserve disabled channels through rebuild so channels start can recover#3395
Conversation
…start can recover Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe PR fixes issue ChangesMessaging channel persistence and conflict detection fix
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: None Full advisor summaryPi Semantic E2E AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
|
…ction Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
Hi I am having this PR #3186, please make sure it won't override each others |
prekshivyas
left a comment
There was a problem hiding this comment.
LGTM.
Solid two-sided fix:
onboard.tsnow persists the operator's configuredenabledChannelsinstead of the post-disabled-filteractiveMessagingChannels, so a laterchannels starthas something to re-enable afterchannels stop+ rebuild.messaging-conflict.tsskipsdisabledChannelsin bothhasStoredChannelandfindAllOverlaps, which is necessary — without it the preservation would create phantom cross-sandbox token conflicts on paused channels.
The PR aligns one stale write site with the convention already used elsewhere (doctor.ts:374-380 filters disabledChannels; rebuild.ts:625-629 round-trips disabledChannels). Test coverage is thorough: registry preservation, dockerfile build arg, gateway provider attachment, and conflict detection all asserted.
Non-blocking follow-up worth a separate PR: verify-deployment.ts:167 (verifyMessagingBridges) and inventory/index.ts:415 don't yet subtract disabledChannels before checking provider presence — same pattern as doctor.ts would prevent a latent false-positive "missing telegram bridge" warning when verifyDeployment gets wired to a production command.
macos-e2e failure is test/cli.test.ts:1310 timeout preceded by daemon.json is malformed — unrelated to this PR's surface area (no macOS-specific or cli.test.ts changes). Worth a re-run before merge.
Selective E2E Results — ✅ All requested jobs passedRun: 25747639368
|
|
Targeted E2E validation requested by the E2E Advisor has passed. Run: https://github.com/NVIDIA/NemoClaw/actions/runs/25747639368 Passed targeted jobs: PR review can proceed with that E2E signal in mind. |
Selective E2E Results — ✅ All requested jobs passedRun: 25777076092
|
ericksoa
left a comment
There was a problem hiding this comment.
Reviewed current head 2408bc6. The fix is scoped to preserving disabled messaging-channel registry state through rebuilds and filtering disabled channels out of conflict reporting. Checks are green, and a fresh messaging-providers-e2e run passed on this head.
…chat e2e NVIDIA#3395 (merged) preserves messagingChannels and disabledChannels in the registry across rebuild, so the session-based carry-across introduced in 961f222 is now redundant for the channels-start recovery side. The channels-add policy auto-apply will land in a separate PR alongside the global build-time disable filter fix. Removed (defer to NVIDIA#3395 / separate PR): - src/lib/state/onboard-session.ts: drop disabledChannels field from Session/SessionUpdates and its read/write/normalize sites. - src/lib/actions/sandbox/rebuild.ts: drop preservedDisabledChannels snapshot and the s.disabledChannels session stash. - src/lib/onboard.ts: drop sessionDisabledChannels read in createSandbox (back to registry.getDisabledChannels) and the session-clear after registry.setDefault. - src/lib/actions/sandbox/policy-channel.ts: drop channels-add auto-apply policy preset and channels-remove "still applied" hint. - src/lib/actions/inference-set.test.ts: drop disabledChannels: null from the test session fixture. Added E2E coverage for non-interactive wechat onboarding (test/e2e/test-messaging-providers.sh). Default fake env vars (WECHAT_BOT_TOKEN, WECHAT_ACCOUNT_ID, WECHAT_BASE_URL, WECHAT_USER_ID, WECHAT_ALLOWED_IDS) let onboard short-circuit HOST_QR_LOGIN_HANDLERS at src/lib/onboard.ts:8433 so no QR scan is needed. New markers: - M-W1: ${SANDBOX}-wechat-bridge provider registered in gateway. - M-W3/3a-3d: host-side WECHAT_BOT_TOKEN must not leak into sandbox env, process list, or filesystem; placeholder is present. - M-W8: openclaw.json registers channels.openclaw-weixin with the configured accountId enabled. - M-W9: per-account credential file uses the L7-resolved placeholder (openshell:resolve:env:WECHAT_BOT_TOKEN), not the real token. - M-W10: accounts.json index lists the configured accountId. Fixes the cascade from inserting wechatConfig between telegramConfig and darwinVmCompat in patchStagedDockerfile's signature: test/onboard.test.ts and src/lib/onboard/dockerfile-patch.test.ts now pass {} for wechatConfig before the darwinVmCompat boolean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…chat e2e NVIDIA#3395 (merged) preserves messagingChannels and disabledChannels in the registry across rebuild, so the session-based carry-across introduced in 961f222 is now redundant for the channels-start recovery side. The channels-add policy auto-apply will land in a separate PR alongside the global build-time disable filter fix. Removed (defer to NVIDIA#3395 / separate PR): - src/lib/state/onboard-session.ts: drop disabledChannels field from Session/SessionUpdates and its read/write/normalize sites. - src/lib/actions/sandbox/rebuild.ts: drop preservedDisabledChannels snapshot and the s.disabledChannels session stash. - src/lib/onboard.ts: drop sessionDisabledChannels read in createSandbox (back to registry.getDisabledChannels) and the session-clear after registry.setDefault. - src/lib/actions/sandbox/policy-channel.ts: drop channels-add auto-apply policy preset and channels-remove "still applied" hint. - src/lib/actions/inference-set.test.ts: drop disabledChannels: null from the test session fixture. Added E2E coverage for non-interactive wechat onboarding (test/e2e/test-messaging-providers.sh). Default fake env vars (WECHAT_BOT_TOKEN, WECHAT_ACCOUNT_ID, WECHAT_BASE_URL, WECHAT_USER_ID, WECHAT_ALLOWED_IDS) let onboard short-circuit HOST_QR_LOGIN_HANDLERS at src/lib/onboard.ts:8433 so no QR scan is needed. New markers: - M-W1: ${SANDBOX}-wechat-bridge provider registered in gateway. - M-W3/3a-3d: host-side WECHAT_BOT_TOKEN must not leak into sandbox env, process list, or filesystem; placeholder is present. - M-W8: openclaw.json registers channels.openclaw-weixin with the configured accountId enabled. - M-W9: per-account credential file uses the L7-resolved placeholder (openshell:resolve:env:WECHAT_BOT_TOKEN), not the real token. - M-W10: accounts.json index lists the configured accountId. Fixes the cascade from inserting wechatConfig between telegramConfig and darwinVmCompat in patchStagedDockerfile's signature: test/onboard.test.ts and src/lib/onboard/dockerfile-patch.test.ts now pass {} for wechatConfig before the darwinVmCompat boolean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…chat e2e NVIDIA#3395 (merged) preserves messagingChannels and disabledChannels in the registry across rebuild, so the session-based carry-across introduced in 961f222 is now redundant for the channels-start recovery side. The channels-add policy auto-apply will land in a separate PR alongside the global build-time disable filter fix. Removed (defer to NVIDIA#3395 / separate PR): - src/lib/state/onboard-session.ts: drop disabledChannels field from Session/SessionUpdates and its read/write/normalize sites. - src/lib/actions/sandbox/rebuild.ts: drop preservedDisabledChannels snapshot and the s.disabledChannels session stash. - src/lib/onboard.ts: drop sessionDisabledChannels read in createSandbox (back to registry.getDisabledChannels) and the session-clear after registry.setDefault. - src/lib/actions/sandbox/policy-channel.ts: drop channels-add auto-apply policy preset and channels-remove "still applied" hint. - src/lib/actions/inference-set.test.ts: drop disabledChannels: null from the test session fixture. Added E2E coverage for non-interactive wechat onboarding (test/e2e/test-messaging-providers.sh). Default fake env vars (WECHAT_BOT_TOKEN, WECHAT_ACCOUNT_ID, WECHAT_BASE_URL, WECHAT_USER_ID, WECHAT_ALLOWED_IDS) let onboard short-circuit HOST_QR_LOGIN_HANDLERS at src/lib/onboard.ts:8433 so no QR scan is needed. New markers: - M-W1: ${SANDBOX}-wechat-bridge provider registered in gateway. - M-W3/3a-3d: host-side WECHAT_BOT_TOKEN must not leak into sandbox env, process list, or filesystem; placeholder is present. - M-W8: openclaw.json registers channels.openclaw-weixin with the configured accountId enabled. - M-W9: per-account credential file uses the L7-resolved placeholder (openshell:resolve:env:WECHAT_BOT_TOKEN), not the real token. - M-W10: accounts.json index lists the configured accountId. Fixes the cascade from inserting wechatConfig between telegramConfig and darwinVmCompat in patchStagedDockerfile's signature: test/onboard.test.ts and src/lib/onboard/dockerfile-patch.test.ts now pass {} for wechatConfig before the darwinVmCompat boolean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary `channels stop <ch>` followed by a rebuild left the channel running: the rebuild path destroyed the registry entry before `onboard --resume` read back `disabledChannels`, so the filter that should have excluded the stopped channel from the new image was a silent no-op. This is the complementary half of #3395 — that PR fixed the same destroy/recreate gap on the **write** side (preserving `messagingChannels` so `channels start` could recover, #3381); this one fixes it on the **read** side (honoring `disabledChannels` so `channels stop` actually disables, #3453). After both fixes the registry, gateway, and image agree at rest. ## Related Issue Fixes #3453. Refs #3381, #3395, #3462. ## Changes - `src/lib/state/onboard-session.ts` — add `disabledChannels: string[] | null` to `Session`, `SessionUpdates`, `createSession`, `normalizeSession`, and `filterSafeUpdates` (mirrors the existing `messagingChannels` round-trip). - `src/lib/actions/sandbox/rebuild.ts` — snapshot `sb.disabledChannels` (with session fallback for cross-process resumes) and stash it into the session inside the same `updateSession` block that already mirrors `messagingChannels`, before `removeSandboxRegistryEntry` wipes the registry. - `src/lib/onboard.ts` — `createSandbox` prefers `session.disabledChannels` over `registry.getDisabledChannels(sandboxName)`; the registry fallback preserves the fresh-onboard path that #3395 already exercises. - `src/lib/state/onboard-session.test.ts` — 4 new cases covering round-trip, JSON filter, default-null on fresh sessions, and `filterSafeUpdates` accepting array + explicit `null` clear. - `src/lib/actions/inference-set.test.ts` — fixture updated for the new required `Session` field. - `test/e2e/test-channels-stop-start.sh` — new e2e regression test covering Test 1 of #3462. 47 stable `PASS:`/`FAIL:` assertions across 6 phases; Phase 4 C4a is the load-bearing check for #3453 (`openclaw.json` excludes `telegram` after stop+rebuild), Phase 6 C6 assertions guard against regression of #3395's #3381 fix. - `test/e2e/docs/parity-map.yaml` + `parity-inventory.generated.json` — entry for the new test, regenerated inventory. - `.github/workflows/nightly-e2e.yaml` — new `channels-stop-start-e2e` job (mirrors `messaging-providers-e2e`), wired into `notify-on-failure`, `report-to-pr`, and `scorecard` aggregator `needs:` lists. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [x] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `make docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) --- <!-- DCO sign-off required by CI. Run: git config user.name && git config user.email --> Signed-off-by: San Dang <sdang@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * End-to-end test and script added to validate Telegram channel stop/start across rebuilds. * Sandbox sessions now persist a disabled-channel state across rebuild/resume flows. * **Tests** * Expanded unit and E2E coverage for disabled-channel resolution, persistence, and registry interactions. * Test inventory and parity mappings updated to include the new nightly scenario. * **Chores** * Nightly CI updated to run the new E2E job and include its results in reports and scorecards. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3532) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: San Dang <sdang@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Summary
nemoclaw <sandbox> channels start <name>reported success but left the bot offline whenever it ran after achannels stop <name>+ rebuild. The stop path correctly disabled the bridge, but the subsequent rebuild silently dropped the channel from the registry's configured set, leaving a laterchannels startwith nothing to re-enable.Related Issue
Fixes #3381
Changes
src/lib/onboard.ts: persist the operator's configured channel set on the new registry entry (inputenabledChannels), not the post-disabled-filteractiveMessagingChannels.disabledChannelsremains the runtime modifier; the baked image and the gateway provider attachments still respect it.test/onboard.test.ts: regression test that drivescreateSandboxwithdisabledChannels: ["telegram"]already in the registry and asserts the newregisterSandboxpayload keepsmessagingChannels: ["telegram"]while the Dockerfile build arg and--providerflags continue to exclude the disabled channel.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)Signed-off-by: Tinson Lai tinsonl@nvidia.com
Summary by CodeRabbit