fix(onboard): detect messaging bot-token conflicts across sandboxes (#1953) by cjagwani · Pull Request #1979 · NVIDIA/NemoClaw

cjagwani · 2026-04-16T20:03:08Z

Summary

Closes #1953. Telegram/Discord/Slack bot tokens only allow one consumer per token, globally — enforced by the platform, not by NemoClaw. Two NemoClaw sandboxes resolving the same token from credentials.json both start polling, each kicks the other off, and neither delivers messages. nemoclaw status still reports the bridge as running; the only evidence is a repeating 409 line inside /tmp/gateway.log.

This PR adds three layers of defense inside NemoClaw. No OpenClaw changes needed.

What changed

Onboard-time prevention — before creating a sandbox that enables a messaging channel, check the registry for any other sandbox already using that channel. Prompt [y/N] default No; hard-exit 1 in non-interactive mode.
Status-time overlap warning — nemoclaw status lists every cross-sandbox messaging-channel overlap so users who upgrade or click through the onboard prompt still see the problem.
Status-time 409 detection — nemoclaw status tails /tmp/gateway.log inside the default sandbox and flags Telegram's getUpdates conflict / 409 Conflict pattern as degraded. Short timeout (3s) and silent on failure — it never breaks status.
Registry backfill — new messagingChannels field on SandboxEntry. Pre-existing sandboxes have no record of their channels, so the check would miss conflicts after an upgrade; a lazy backfill probes OpenShell for ${name}-{telegram,discord,slack}-bridge providers and fills the field. Runs on first onboard/status after upgrade.
Troubleshooting docs — new section explaining the 409 pattern, how to diagnose via /tmp/gateway.log, and how to recover.

The conflict-detection logic is factored into a pure src/lib/messaging-conflict.ts with dependency injection and unit tests — no child_process or openshell imports, so it's trivially testable.

Scope notes

Option B from the issue ("only default sandbox polls") is intentionally not implemented. Silently disabling a feature the user explicitly enabled is worse UX than a clear warning. The issue listed all three expected results as OR alternatives; this PR picks the detect-and-warn branch.
Discord/Slack log-pattern matching is not included. The only verified log signature is Telegram's (from the issue reporter's actual output). Discord gateway and Slack Socket Mode have similar protocol-level conflicts but different log formats that belong to OpenClaw, not NemoClaw. The registry-based overlap warning covers Discord/Slack for prevention; log-based diagnosis for those platforms can be added when OpenClaw exposes a structured bridge-health signal.

Summary by CodeRabbit

New Features
- Detects when the same messaging bot token/channel is enabled in multiple sandboxes; status shows warnings and onboarding will prompt (or abort in non-interactive mode) to prevent conflicts.
- Sandboxes now persist selected messaging channels so status reflects configured channels.
Documentation
- Added a troubleshooting guide with symptoms, diagnosis steps, example log indicators, and remediation for messaging-bridge token conflicts.
Tests
- Added tests covering conflict detection, channel backfill, overlap reporting, and status warnings.

…1953) Telegram/Discord/Slack bot tokens only allow one consumer. When two NemoClaw sandboxes resolved the same token from credentials.json, both bridges entered a silent 409 loop visible only in /tmp/gateway.log. nemoclaw status reported "running" while no messages were delivered. - Persist messagingChannels on SandboxEntry so NemoClaw knows which channels each sandbox uses. - At onboard, warn and prompt [y/N] before creating a sandbox that would duplicate another sandbox's messaging channel; hard-exit in non-interactive. - At status, list every cross-sandbox channel overlap and continue to tail /tmp/gateway.log for Telegram's 409 signature on the default sandbox. - Backfill messagingChannels for pre-existing sandboxes by probing OpenShell for the expected *-telegram/discord/slack-bridge provider names, so users upgrading from earlier versions are protected on first run. - Document the conflict pattern and recovery steps in troubleshooting docs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-16T20:03:23Z

📝 Walkthrough

Walkthrough

Adds cross-sandbox messaging-channel tracking, backfill, and conflict detection; surfaces overlap and gateway 409 conflict diagnostics in onboarding and status flows; records selected channels in the registry; includes troubleshooting docs and health-check helpers that inspect /tmp/gateway.log.

Changes

Cohort / File(s)	Summary
Documentation `.agents/skills/nemoclaw-user-reference/references/troubleshooting.md`, `docs/reference/troubleshooting.md`	New troubleshooting content describing single-consumer token conflicts (Telegram/Discord/Slack), diagnostic workflow (`openshell term <sandbox>` + `tail -f /tmp/gateway.log`), example 409 log lines, and remediation steps.
Messaging conflict library & tests `src/lib/messaging-conflict.ts`, `src/lib/messaging-conflict.test.ts`	New module and tests: KNOWN_CHANNELS, ConflictRegistry/Probe types, `backfillMessagingChannels`, `findChannelConflicts`, `findAllOverlaps`; tests cover detection, pairwise overlap emission, backfill probing, and error handling.
Registry schema `src/lib/registry.ts`	`SandboxEntry` gains optional `messagingChannels?: string[]`; `registerSandbox` persists `messagingChannels` (defaults to `[]`).
Onboard flow `src/lib/onboard.ts`	`createSandbox()` probes/backfills channels, computes conflicts, logs per-channel warnings, aborts non-interactively or prompts interactively, and stores `messagingChannels` on registration.
Status & inventory `src/lib/inventory-commands.ts`, `src/lib/inventory-commands.test.ts`	`ShowStatusCommandDeps` accepts `checkMessagingBridgeHealth` and `backfillAndFindOverlaps`; `showStatusCommand` re-fetches default sandbox, logs degraded-channel warnings from health checks and overlap warnings from backfill; tests added for these behaviors.
CLI helpers `src/nemoclaw.ts`	Adds `checkMessagingBridgeHealth(sandboxName, channels)` (tails/parses `/tmp/gateway.log` for Telegram conflict counts), `makeConflictProbe()` (gateway-liveness + providerExists probe), and `backfillAndFindOverlaps()` (invokes messaging-conflict backfill and overlap computation); wired into `showStatus`.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as NemoClaw CLI
    participant Registry as Registry
    participant ConflictLib as messaging-conflict
    participant Probe as provider probe
    participant Gateway as Sandbox Gateway
    participant Prompt as User Prompt

    User->>CLI: nemoclaw onboard --enable-channels
    CLI->>Registry: listSandboxes()
    Registry-->>CLI: sandboxes + default
    CLI->>ConflictLib: backfillMessagingChannels(registry, probe)
    ConflictLib->>Probe: providerExists("sandboxName"+suffix)
    Probe->>Gateway: openshell provider get (cached gateway liveness)
    Gateway-->>Probe: present/absent or error
    Probe-->>ConflictLib: present/absent/error
    ConflictLib->>Registry: updateSandbox(messagingChannels) (if no probe errors)
    Registry-->>ConflictLib: ok
    CLI->>ConflictLib: findChannelConflicts(currentSandbox, enabledChannels)
    ConflictLib-->>CLI: conflicts[]
    alt conflicts found
        CLI->>CLI: log warnings per channel referencing /tmp/gateway.log
        alt interactive
            CLI->>Prompt: Continue anyway? (y/N)
            Prompt-->>CLI: yes/no
            alt no
                CLI-->>User: abort (exit 1)
            else yes
                CLI->>Registry: registerSandbox(messagingChannels)
            end
        else non-interactive
            CLI-->>User: abort (exit 1)
        end
    else no conflicts
        CLI->>Registry: registerSandbox(messagingChannels)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through logs where tokens twined,

Found quiet fights that left bridges blind.
I probed the providers, counted every sigh—
One poll to rule them, now messages fly.
Nibble a carrot; the sandbox waves hi!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely summarizes the primary change: detecting messaging bot-token conflicts across sandboxes during onboarding.
Linked Issues check	✅ Passed	The PR comprehensively addresses all coding objectives from issue `#1953`: onboard-time conflict detection with user prompts [`#1953`], status-time overlap warnings [`#1953`], gateway log conflict detection [`#1953`], and registry backfill for metadata across upgrades [`#1953`].
Out of Scope Changes check	✅ Passed	All changes are directly scoped to addressing issue `#1953` messaging-token conflict detection. Documentation updates explain the 409 pattern and recovery. Testing covers new utilities and integration with status command. No unrelated or tangential changes are present.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/messaging-bridge-conflict-1953

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Reuse the existing src/lib/onboard.ts providerExistsInGateway instead of maintaining a second copy in nemoclaw.ts, and remove the unused export on messaging-conflict KNOWN_CHANNELS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/lib/onboard.ts (1)
3193-3201: ⚠️ Potential issue | 🟠 Major

Persist the channels that were actually configured, not the raw selections.

enabledChannels is the UI selection, not the final bridge set. If a user selects telegram and then skips the token prompt, activeMessagingChannels stays empty but Line 3201 still records ["telegram"]. That causes later status/overlap checks to warn about a bridge that was never attached, and the earlier conflict prompt has the same false-positive behavior. Use the filtered, token-backed channel list here and feed the same list into the conflict check above.
Suggested fix
   registry.registerSandbox({
     name: sandboxName,
     model: model || null,
     provider: provider || null,
     gpuEnabled: !!gpu,
     agent: agent ? agent.name : null,
     agentVersion: fromDockerfile ? null : effectiveAgent.expectedVersion || null,
     dangerouslySkipPermissions: dangerouslySkipPermissions || undefined,
-    messagingChannels: Array.isArray(enabledChannels) ? enabledChannels : [],
+    messagingChannels: activeMessagingChannels,
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 3193 - 3201, The code is persisting the raw
UI selection (enabledChannels) into registry.registerSandbox instead of the
token-backed, filtered list that actually got configured (e.g.,
activeMessagingChannels/activeMessagingBridgeSet); update the call to
registry.registerSandbox to set messagingChannels to the filtered active list
(ensure you use the same variable used for the bridge attachment logic, e.g.,
activeMessagingChannels or activeMessagingBridgeSet) and also pass that same
filtered list into the earlier conflict/status check so both persistence and
checks reflect the real configured channels rather than the raw UI selections.

🧹 Nitpick comments (1)

docs/reference/troubleshooting.md (1)

449-465: Split the new prose into one sentence per source line.

Lines 449 and 464 each pack multiple sentences onto one line, which makes future doc diffs harder to review.

As per coding guidelines, "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/reference/troubleshooting.md` around lines 449 - 465, The two source
lines containing multiple sentences should be split so each sentence occupies
its own source line: break the line beginning "Bot tokens for Telegram
(`getUpdates`), Discord (gateway), and Slack (Socket Mode) only allow one active
consumer per token. If two NemoClaw sandboxes..." into separate lines for each
sentence, and likewise break the line starting "To fix, remove the messaging
channel from the other sandbox (`nemoclaw remove <other-sandbox>` or rerun
onboarding on it with the channel disabled). Current NemoClaw warns at `nemoclaw
onboard` time when another sandbox already has the same channel enabled, but
sandboxes created before that check was added may still be in a conflict loop."
into individual-sentence lines; ensure punctuation is preserved and wrap each
sentence on its own source line to satisfy the "one sentence per line"
guideline.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/reference/troubleshooting.md`:
- Around line 453-456: Replace the fenced CLI code block tagged as ```bash with
a ```console block and add the prompt prefixes: change the two lines "openshell
term <sandbox-name>" and "tail -f /tmp/gateway.log" to include a leading "$ "
prompt so they read "$ openshell term <sandbox-name>" and "$ tail -f
/tmp/gateway.log"; ensure the fence opener becomes ```console and the fence
closer remains ```.

In `@src/lib/inventory-commands.ts`:
- Around line 129-130: The overlap warning message currently instructs users to
run the non-existent command "nemoclaw remove <sandbox>"; update the string to
instruct "Run `nemoclaw destroy <sandbox>`..." so the remediation command
matches the CLI; locate the string "Run `nemoclaw remove <sandbox>` on whichever
sandbox should stop polling, or rerun onboarding with the channel disabled." in
src/lib/inventory-commands.ts and replace "remove" with "destroy" (and adjust
any corresponding unit tests or snapshots that assert this message).

In `@src/lib/messaging-conflict.ts`:
- Around line 61-68: The current loop calls providerExists(providerName) and
swallows probe errors but still calls registry.updateSandbox(entry.name, {
messagingChannels: discovered }), which allows a transient probe failure to
persist an empty/partial list; instead, track if any providerExists call throws
(e.g., set a boolean probeError flag during the catch), and only call
registry.updateSandbox(entry.name, { messagingChannels: discovered }) when no
probe errors occurred (or when you have verified all probes completed
successfully); reference the existing symbols providerExists, discovered,
registry.updateSandbox, and entry.name and add a brief log when skipping the
update due to probe errors.

In `@src/nemoclaw.ts`:
- Around line 1062-1069: backfillAndFindOverlaps currently lets exceptions
bubble up from backfillMessagingChannels or findAllOverlaps which can break the
non-critical `nemoclaw status` command; wrap the calls in a try/catch in the
backfillAndFindOverlaps function, call backfillMessagingChannels(registry, {
providerExists: providerExistsInGateway }) and then findAllOverlaps(registry)
inside the try, and on any exception log or warn (using the existing logger) and
return an empty array so status remains usable; keep references to
backfillMessagingChannels, findAllOverlaps, registry, and
providerExistsInGateway so reviewers can locate the change.

---

Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 3193-3201: The code is persisting the raw UI selection
(enabledChannels) into registry.registerSandbox instead of the token-backed,
filtered list that actually got configured (e.g.,
activeMessagingChannels/activeMessagingBridgeSet); update the call to
registry.registerSandbox to set messagingChannels to the filtered active list
(ensure you use the same variable used for the bridge attachment logic, e.g.,
activeMessagingChannels or activeMessagingBridgeSet) and also pass that same
filtered list into the earlier conflict/status check so both persistence and
checks reflect the real configured channels rather than the raw UI selections.

---

Nitpick comments:
In `@docs/reference/troubleshooting.md`:
- Around line 449-465: The two source lines containing multiple sentences should
be split so each sentence occupies its own source line: break the line beginning
"Bot tokens for Telegram (`getUpdates`), Discord (gateway), and Slack (Socket
Mode) only allow one active consumer per token. If two NemoClaw sandboxes..."
into separate lines for each sentence, and likewise break the line starting "To
fix, remove the messaging channel from the other sandbox (`nemoclaw remove
<other-sandbox>` or rerun onboarding on it with the channel disabled). Current
NemoClaw warns at `nemoclaw onboard` time when another sandbox already has the
same channel enabled, but sandboxes created before that check was added may
still be in a conflict loop." into individual-sentence lines; ensure punctuation
is preserved and wrap each sentence on its own source line to satisfy the "one
sentence per line" guideline.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b23190cb-9518-4141-b17d-7debf65a7587

📥 Commits

Reviewing files that changed from the base of the PR and between dbc805c and 6d2cede.

📒 Files selected for processing (9)

.agents/skills/nemoclaw-user-reference/references/troubleshooting.md
docs/reference/troubleshooting.md
src/lib/inventory-commands.test.ts
src/lib/inventory-commands.ts
src/lib/messaging-conflict.test.ts
src/lib/messaging-conflict.ts
src/lib/onboard.ts
src/lib/registry.ts
src/nemoclaw.ts

coderabbitai · 2026-04-16T20:11:21Z

+        "    Run `nemoclaw remove <sandbox>` on whichever sandbox should stop polling, or rerun onboarding with the channel disabled.",
+      );


⚠️ Potential issue | 🟠 Major

Fix remediation command in overlap warning text.

The message suggests nemoclaw remove <sandbox>, but the CLI exposes destroy, not remove. This sends users to a non-existent command during recovery.

Suggested text fix

log( - " Run `nemoclaw remove <sandbox>` on whichever sandbox should stop polling, or rerun onboarding with the channel disabled.", + " Run `nemoclaw <sandbox> destroy` on whichever sandbox should stop polling, or rerun onboarding with the channel disabled.", );

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

" Run `nemoclaw remove <sandbox>` on whichever sandbox should stop polling, or rerun onboarding with the channel disabled.",

);

" Run `nemoclaw <sandbox> destroy` on whichever sandbox should stop polling, or rerun onboarding with the channel disabled.",

);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/lib/inventory-commands.ts` around lines 129 - 130, The overlap warning message currently instructs users to run the non-existent command "nemoclaw remove <sandbox>"; update the string to instruct "Run `nemoclaw destroy <sandbox>`..." so the remediation command matches the CLI; locate the string "Run `nemoclaw remove <sandbox>` on whichever sandbox should stop polling, or rerun onboarding with the channel disabled." in src/lib/inventory-commands.ts and replace "remove" with "destroy" (and adjust any corresponding unit tests or snapshots that assert this message).

coderabbitai

♻️ Duplicate comments (1)

src/nemoclaw.ts (1)

1057-1065: ⚠️ Potential issue | 🟠 Major

Guard overlap backfill so status can’t fail on non-critical diagnostics.

backfillAndFindOverlaps() still lets exceptions bubble (e.g., require/probe/registry errors), which can break nemoclaw status. This path should degrade to [] instead of aborting status output.

Suggested hardening

 function backfillAndFindOverlaps() {
-  const {
-    backfillMessagingChannels,
-    findAllOverlaps,
-  } = require("./lib/messaging-conflict");
-  const { providerExistsInGateway } = require("./lib/onboard");
-  backfillMessagingChannels(registry, { providerExists: providerExistsInGateway });
-  return findAllOverlaps(registry);
+  try {
+    const { backfillMessagingChannels, findAllOverlaps } = require("./lib/messaging-conflict");
+    const { providerExistsInGateway } = require("./lib/onboard");
+    backfillMessagingChannels(registry, { providerExists: providerExistsInGateway });
+    return findAllOverlaps(registry);
+  } catch {
+    return [];
+  }
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/nemoclaw.ts` around lines 1057 - 1065, Wrap the body of
backfillAndFindOverlaps (which requires ./lib/messaging-conflict and
./lib/onboard and calls backfillMessagingChannels(registry, { providerExists:
providerExistsInGateway }) and findAllOverlaps(registry)) in a try/catch so any
exception (require/probe/registry errors) is caught and the function returns an
empty array instead of throwing; log or noop the error as appropriate and ensure
the function signature and return type remain the same (still returning the
result of findAllOverlaps(registry) on success, [] on any failure).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/nemoclaw.ts`:
- Around line 1057-1065: Wrap the body of backfillAndFindOverlaps (which
requires ./lib/messaging-conflict and ./lib/onboard and calls
backfillMessagingChannels(registry, { providerExists: providerExistsInGateway })
and findAllOverlaps(registry)) in a try/catch so any exception
(require/probe/registry errors) is caught and the function returns an empty
array instead of throwing; log or noop the error as appropriate and ensure the
function signature and return type remain the same (still returning the result
of findAllOverlaps(registry) on success, [] on any failure).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 75c77505-2dd1-4102-b022-8f3de9f17f57

📥 Commits

Reviewing files that changed from the base of the PR and between 6d2cede and c52c65a.

📒 Files selected for processing (2)

src/lib/messaging-conflict.ts
src/nemoclaw.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/lib/messaging-conflict.ts

- Use console/\$ prompt syntax in troubleshooting docs (repo convention). - Correct remediation command in onboard + status warnings: nemoclaw does not expose 'remove'; the sandbox-scoped action is 'destroy'. - Do not persist partial backfill results when a probe throws — writing a partial/empty messagingChannels list set the field and prevented all future retries, permanently hiding real overlaps. - Swallow unexpected failures inside backfillAndFindOverlaps so a flaky gateway cannot break nemoclaw status for a non-critical warning path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

wscurran · 2026-04-16T20:26:36Z

✨ Thanks for submitting this PR that proposes a fix for the onboard process to detect messaging bot-token conflicts, which could help prevent issues with multiple sandboxes using the same token.

Possibly related open issues:

#1953 [NemoClaw][brev]Telegram bridge: multiple sandboxes with same bot token cause 409 polling conflict — all bridges fail silently

wscurran · 2026-04-16T20:26:36Z

✨ Thanks for submitting this PR that proposes a fix for the onboard process to detect messaging bot-token conflicts, which could help prevent issues with multiple sandboxes using the same token.

Possibly related open issues:

#1953 [NemoClaw][brev]Telegram bridge: multiple sandboxes with same bot token cause 409 polling conflict — all bridges fail silently

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

docs/reference/troubleshooting.md (1)
449-465: Split this paragraph to one sentence per source line.

Line 449 and Line 464 each contain multiple sentences on one source line, which makes future docs diffs harder to review.

As per coding guidelines, "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/reference/troubleshooting.md` around lines 449 - 465, The paragraph that
starts with "Bot tokens for Telegram (`getUpdates`), Discord (gateway), and
Slack (Socket Mode) only allow one active consumer per token." and the paragraph
that ends with "Current NemoClaw warns at `nemoclaw onboard` time when another
sandbox already has the same channel enabled, but sandboxes created before that
check was added may still be in a conflict loop." should be rewritten so each
sentence occupies its own source line (one sentence per line), preserving the
code blocks and inline commands like `nemoclaw status`, `openshell term
<sandbox-name>`, `tail -f /tmp/gateway.log`, and `nemoclaw <other-sandbox>
destroy` exactly as-is. Ensure no sentences are merged on the same source line
anywhere in those paragraphs.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/inventory-commands.ts`:
- Around line 119-151: The code reads the sandboxes snapshot into `sandboxes`
before calling `deps.backfillAndFindOverlaps()`, so any updates to a sandbox's
`messagingChannels` by that callback are not seen when computing `defaultEntry`
later; after `deps.backfillAndFindOverlaps()` returns (inside the if-block where
it's called), refresh or reassign `sandboxes` (or re-fetch via the appropriate
dependency) so the subsequent `const defaultEntry = sandboxes.find((sb) =>
sb.name === defaultSandbox)` and the
`deps.checkMessagingBridgeHealth(defaultSandbox, channels)` call use the
up-to-date sandbox state; specifically update the `sandboxes` variable or
recompute `defaultEntry` after backfill before running the bridge health checks.

In `@src/lib/messaging-conflict.ts`:
- Around line 17-20: The probe currently collapses execution failures into a
false "provider absent" signal; update the ConflictProbe.providerExists contract
and its callers so failures are distinct from a definitive false: change
providerExists to return boolean | "error" (or similar tri-state) and update
backfillMessagingChannels to treat the "error" result as an execution error
(skip or propagate and do not persist messagingChannels: []), not as provider
absent; also update providerExistsInGateway and other callers to return "error"
when the openshell/gateway command fails instead of returning false (this
applies to the providerExists usage around backfillMessagingChannels and the
related logic in the 51-76 region).

In `@src/lib/onboard.ts`:
- Around line 2702-2728: Compute the actual configured channels
(activeMessagingChannels) before the conflict check and use that instead of the
raw UI selection (enabledChannels) when calling findChannelConflicts and when
persisting to the registry; specifically, move the logic that derives
activeMessagingChannels (the filter that removes channels with missing/empty
tokens or that were skipped) above the block that calls
backfillMessagingChannels and findChannelConflicts, then pass
activeMessagingChannels into findChannelConflicts(sandboxName,
activeMessagingChannels, registry) and use activeMessagingChannels for the later
registry write (instead of enabledChannels) so warnings, aborts, and persisted
state all reflect the real attached providers.

---

Nitpick comments:
In `@docs/reference/troubleshooting.md`:
- Around line 449-465: The paragraph that starts with "Bot tokens for Telegram
(`getUpdates`), Discord (gateway), and Slack (Socket Mode) only allow one active
consumer per token." and the paragraph that ends with "Current NemoClaw warns at
`nemoclaw onboard` time when another sandbox already has the same channel
enabled, but sandboxes created before that check was added may still be in a
conflict loop." should be rewritten so each sentence occupies its own source
line (one sentence per line), preserving the code blocks and inline commands
like `nemoclaw status`, `openshell term <sandbox-name>`, `tail -f
/tmp/gateway.log`, and `nemoclaw <other-sandbox> destroy` exactly as-is. Ensure
no sentences are merged on the same source line anywhere in those paragraphs.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b685d14b-b1fd-489d-97f6-8f8ee1e4624d

📥 Commits

Reviewing files that changed from the base of the PR and between c52c65a and 5c9374d.

📒 Files selected for processing (7)

.agents/skills/nemoclaw-user-reference/references/troubleshooting.md
docs/reference/troubleshooting.md
src/lib/inventory-commands.ts
src/lib/messaging-conflict.test.ts
src/lib/messaging-conflict.ts
src/lib/onboard.ts
src/nemoclaw.ts

✅ Files skipped from review due to trivial changes (1)

.agents/skills/nemoclaw-user-reference/references/troubleshooting.md

🚧 Files skipped from review as they are similar to previous changes (2)

src/nemoclaw.ts
src/lib/messaging-conflict.test.ts

coderabbitai · 2026-04-16T20:28:45Z

+export interface ConflictProbe {
+  // Returns true when a provider with this exact name exists in OpenShell.
+  providerExists: (name: string) => boolean;
+}


⚠️ Potential issue | 🟠 Major

Don't collapse probe failures into “provider absent”.

backfillMessagingChannels() only skips the write when providerExists() throws, but the current onboard caller passes providerExistsInGateway(), which returns false for any non-zero openshell provider get. That means a transient gateway failure can still persist messagingChannels: [], permanently suppressing later backfill retries and hiding real overlaps.

Possible fix

export interface ConflictProbe { - // Returns true when a provider with this exact name exists in OpenShell. - providerExists: (name: string) => boolean; + // Distinguish "missing" from probe/execution failures. + providerExists: (name: string) => "present" | "absent" | "error"; } ... for (const channel of KNOWN_CHANNELS) { const providerName = `${entry.name}${PROVIDER_SUFFIXES[channel]}`; - try { - if (probe.providerExists(providerName)) discovered.push(channel); - } catch { + const state = probe.providerExists(providerName); + if (state === "present") discovered.push(channel); + if (state === "error") { // Partial results can't be persisted: writing a partial/empty list // sets messagingChannels, preventing future retries and permanently // hiding real overlaps. Skip the write so we retry on next call. probeFailed = true; break; - } + } }

Then update callers to return "error" for command/execution failures instead of folding them into false.

Also applies to: 51-76

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/lib/messaging-conflict.ts` around lines 17 - 20, The probe currently collapses execution failures into a false "provider absent" signal; update the ConflictProbe.providerExists contract and its callers so failures are distinct from a definitive false: change providerExists to return boolean | "error" (or similar tri-state) and update backfillMessagingChannels to treat the "error" result as an execution error (skip or propagate and do not persist messagingChannels: []), not as provider absent; also update providerExistsInGateway and other callers to return "error" when the openshell/gateway command fails instead of returning false (this applies to the providerExists usage around backfillMessagingChannels and the related logic in the 51-76 region).

coderabbitai · 2026-04-16T20:28:45Z

+  if (Array.isArray(enabledChannels) && enabledChannels.length > 0) {
+    const {
+      backfillMessagingChannels,
+      findChannelConflicts,
+    } = require("./messaging-conflict");
+    backfillMessagingChannels(registry, { providerExists: providerExistsInGateway });
+    const conflicts = findChannelConflicts(sandboxName, enabledChannels, registry);
+    if (conflicts.length > 0) {
+      for (const { channel, sandbox } of conflicts) {
+        console.log(
+          `  ⚠ Sandbox '${sandbox}' already has ${channel} enabled. Bot tokens only allow one sandbox to poll — continuing will break both bridges.`,
+        );
+      }
+      if (isNonInteractive()) {
+        console.error(
+          "  Aborting: resolve the messaging channel conflict above or run `nemoclaw <sandbox> destroy` on the other sandbox.",
+        );
+        process.exit(1);
+      }
+      const answer = (await promptOrDefault("  Continue anyway? [y/N]: ", null, "n"))
+        .trim()
+        .toLowerCase();
+      if (answer !== "y" && answer !== "yes") {
+        console.log("  Aborting sandbox creation.");
+        process.exit(1);
+      }
+    }


⚠️ Potential issue | 🟠 Major

Use the actually configured channels here, not the raw UI selection.

If the user selects a channel and then skips its token prompt, enabledChannels still contains that name. Line 2708 can then warn or abort on a conflict for a provider that will never be attached, and Line 3201 persists the same false positive into the registry, which also prevents later backfill from correcting it.

Possible fix

- const conflicts = findChannelConflicts(sandboxName, enabledChannels, registry); + const conflicts = findChannelConflicts(sandboxName, activeMessagingChannels, registry); ... - messagingChannels: Array.isArray(enabledChannels) ? enabledChannels : [], + messagingChannels: activeMessagingChannels,

This needs the activeMessagingChannels calculation moved above the conflict check so both the warning path and the registry write use the same source of truth.

Also applies to: 3201-3201

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 2702 - 2728, Compute the actual configured channels (activeMessagingChannels) before the conflict check and use that instead of the raw UI selection (enabledChannels) when calling findChannelConflicts and when persisting to the registry; specifically, move the logic that derives activeMessagingChannels (the filter that removes channels with missing/empty tokens or that were skipped) above the block that calls backfillMessagingChannels and findChannelConflicts, then pass activeMessagingChannels into findChannelConflicts(sandboxName, activeMessagingChannels, registry) and use activeMessagingChannels for the later registry write (instead of enabledChannels) so warnings, aborts, and persisted state all reflect the real attached providers.

Conflict, ConflictProbe, and ConflictRegistry are used only as internal parameter/return types within messaging-conflict.ts — no other module imports them. Mirrors the earlier KNOWN_CHANNELS export cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three correctness fixes surfaced by CodeRabbit after the first round: 1. Probe collapse: providerExistsInGateway uses ignoreError:true and returns false for any non-zero status, including gateway unreachable. The previous backfill guard only skipped partial writes when the probe threw — but our probe never threw, it just returned false. Change ConflictProbe to a tri-state contract (present/absent/error) and add an upfront openshell liveness check in both onboard and status call sites; probe returns "error" whenever the gateway is unreachable, so a transient failure no longer persists messagingChannels: [] and masks real overlaps. 2. Phantom channels: the onboard conflict check used the raw UI toggle list as enabledChannels, which includes channels the user selected but then skipped the token prompt for. Filter to channels with an actual token before the check so we don't warn about — or persist — phantoms. The registry write was already using the precise activeMessagingChannels computed from messagingTokenDefs further down. 3. Stale snapshot in status: showStatusCommand read `sandboxes` before backfillAndFindOverlaps ran, so a just-populated messagingChannels on the default sandbox wasn't visible to the subsequent bridge-health check on first run after upgrade. Re-fetch after backfill. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

♻️ Duplicate comments (1)

src/nemoclaw.ts (1)

1071-1074: ⚠️ Potential issue | 🟠 Major

Don't fold provider get failures into "absent" here.

The upfront sandbox list check only proves the gateway was reachable once. If openshell provider get later fails due to a transport/auth/gateway error, this still returns "absent" and lets backfillMessagingChannels() persist messagingChannels: [], which hides real overlaps until the registry is edited again. Please classify non-NotFound failures from result.output as "error" instead of treating every non-zero exit as absence.

Suggested hardening

   return {
     providerExists: (name) => {
       if (!isGatewayAlive()) return "error";
       const result = captureOpenshell(["provider", "get", name], { ignoreError: true });
-      return result.status === 0 ? "present" : "absent";
+      if (result.status === 0) return "present";
+      const output = stripAnsi(result.output || "");
+      if (/\bNotFound\b|\bNot Found\b|provider .* not found|provider does not exist/i.test(output)) {
+        return "absent";
+      }
+      return "error";
     },
   };
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/nemoclaw.ts` around lines 1071 - 1074, The providerExists function
currently treats any non-zero exit from captureOpenshell(["provider","get",
name]) as "absent", which hides transport/auth/gateway errors and lets
backfillMessagingChannels persist messagingChannels: []; modify providerExists
to inspect the failing result from captureOpenshell (the returned object from
captureOpenshell) and only return "absent" when the output indicates a true
NotFound/does-not-exist case; for any other non-NotFound error (transport, auth,
gateway, unexpected status) return "error" so callers like
backfillMessagingChannels can avoid persisting an empty messagingChannels array
and surface the failure instead.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/nemoclaw.ts`:
- Around line 1071-1074: The providerExists function currently treats any
non-zero exit from captureOpenshell(["provider","get", name]) as "absent", which
hides transport/auth/gateway errors and lets backfillMessagingChannels persist
messagingChannels: []; modify providerExists to inspect the failing result from
captureOpenshell (the returned object from captureOpenshell) and only return
"absent" when the output indicates a true NotFound/does-not-exist case; for any
other non-NotFound error (transport, auth, gateway, unexpected status) return
"error" so callers like backfillMessagingChannels can avoid persisting an empty
messagingChannels array and surface the failure instead.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e4b665c2-ba3a-45ba-9c0e-dcf678a2ba51

📥 Commits

Reviewing files that changed from the base of the PR and between 5c9374d and 30c77d9.

📒 Files selected for processing (5)

src/lib/inventory-commands.ts
src/lib/messaging-conflict.test.ts
src/lib/messaging-conflict.ts
src/lib/onboard.ts
src/nemoclaw.ts

🚧 Files skipped from review as they are similar to previous changes (2)

src/lib/inventory-commands.ts
src/lib/onboard.ts

ericksoa

Looks great — well-structured, thorough tests, and the tri-state probe design is a nice touch. Approving.

One remaining CodeRabbit nit: in nemoclaw.ts:1071-1074, providerExists treats any non-zero exit as "absent" rather than distinguishing NotFound from transport/gateway errors. The liveness guard covers the main failure case, but it'd be nice to tighten that up in a follow-up PR.

cjagwani self-assigned this Apr 16, 2026

cjagwani requested a review from ericksoa April 16, 2026 20:08

coderabbitai Bot reviewed Apr 16, 2026

View reviewed changes

NVIDIA deleted a comment from github-actions Bot Apr 16, 2026

coderabbitai Bot reviewed Apr 16, 2026

View reviewed changes

jyaunches mentioned this pull request Apr 16, 2026

fix(onboard): propagate rotated messaging credentials to sandbox L7 proxy #1967

Merged

6 tasks

cjagwani and others added 2 commits April 16, 2026 13:19

Merge branch 'main' into fix/messaging-bridge-conflict-1953

a8d2760

wscurran added bug Something fails against expected or documented behavior NemoClaw CLI labels Apr 16, 2026

coderabbitai Bot reviewed Apr 16, 2026

View reviewed changes

cjagwani and others added 3 commits April 16, 2026 13:29

Merge branch 'main' into fix/messaging-bridge-conflict-1953

cb8cef9

coderabbitai Bot reviewed Apr 16, 2026

View reviewed changes

ericksoa approved these changes Apr 16, 2026

View reviewed changes

ericksoa merged commit fcdfeed into main Apr 16, 2026
15 checks passed

wscurran added area: cli Command line interface, flags, terminal UX, or output bug-fix PR fixes a bug or regression and removed NemoClaw CLI bug Something fails against expected or documented behavior labels Jun 3, 2026

coderabbitai Bot mentioned this pull request Jun 10, 2026

fix(messaging): warn/block second Slack Socket Mode sandbox on a shared gateway #5039

Merged

5 tasks

		" Run `nemoclaw remove <sandbox>` on whichever sandbox should stop polling, or rerun onboarding with the channel disabled.",
		);

Conversation

cjagwani commented Apr 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Scope notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

wscurran commented Apr 16, 2026

Uh oh!

wscurran commented Apr 16, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

ericksoa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cjagwani commented Apr 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 16, 2026 •

edited

Loading