fix(onboard): detect messaging bot-token conflicts across sandboxes (#1953)#1979
Conversation
…1953) Telegram/Discord/Slack bot tokens only allow one consumer. When two NemoClaw sandboxes resolved the same token from credentials.json, both bridges entered a silent 409 loop visible only in /tmp/gateway.log. nemoclaw status reported "running" while no messages were delivered. - Persist messagingChannels on SandboxEntry so NemoClaw knows which channels each sandbox uses. - At onboard, warn and prompt [y/N] before creating a sandbox that would duplicate another sandbox's messaging channel; hard-exit in non-interactive. - At status, list every cross-sandbox channel overlap and continue to tail /tmp/gateway.log for Telegram's 409 signature on the default sandbox. - Backfill messagingChannels for pre-existing sandboxes by probing OpenShell for the expected *-telegram/discord/slack-bridge provider names, so users upgrading from earlier versions are protected on first run. - Document the conflict pattern and recovery steps in troubleshooting docs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds cross-sandbox messaging-channel tracking, backfill, and conflict detection; surfaces overlap and gateway 409 conflict diagnostics in onboarding and status flows; records selected channels in the registry; includes troubleshooting docs and health-check helpers that inspect /tmp/gateway.log. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant CLI as NemoClaw CLI
participant Registry as Registry
participant ConflictLib as messaging-conflict
participant Probe as provider probe
participant Gateway as Sandbox Gateway
participant Prompt as User Prompt
User->>CLI: nemoclaw onboard --enable-channels
CLI->>Registry: listSandboxes()
Registry-->>CLI: sandboxes + default
CLI->>ConflictLib: backfillMessagingChannels(registry, probe)
ConflictLib->>Probe: providerExists("sandboxName"+suffix)
Probe->>Gateway: openshell provider get (cached gateway liveness)
Gateway-->>Probe: present/absent or error
Probe-->>ConflictLib: present/absent/error
ConflictLib->>Registry: updateSandbox(messagingChannels) (if no probe errors)
Registry-->>ConflictLib: ok
CLI->>ConflictLib: findChannelConflicts(currentSandbox, enabledChannels)
ConflictLib-->>CLI: conflicts[]
alt conflicts found
CLI->>CLI: log warnings per channel referencing /tmp/gateway.log
alt interactive
CLI->>Prompt: Continue anyway? (y/N)
Prompt-->>CLI: yes/no
alt no
CLI-->>User: abort (exit 1)
else yes
CLI->>Registry: registerSandbox(messagingChannels)
end
else non-interactive
CLI-->>User: abort (exit 1)
end
else no conflicts
CLI->>Registry: registerSandbox(messagingChannels)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Reuse the existing src/lib/onboard.ts providerExistsInGateway instead of maintaining a second copy in nemoclaw.ts, and remove the unused export on messaging-conflict KNOWN_CHANNELS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)
3193-3201:⚠️ Potential issue | 🟠 MajorPersist the channels that were actually configured, not the raw selections.
enabledChannelsis the UI selection, not the final bridge set. If a user selectstelegramand then skips the token prompt,activeMessagingChannelsstays empty but Line 3201 still records["telegram"]. That causes laterstatus/overlap checks to warn about a bridge that was never attached, and the earlier conflict prompt has the same false-positive behavior. Use the filtered, token-backed channel list here and feed the same list into the conflict check above.Suggested fix
registry.registerSandbox({ name: sandboxName, model: model || null, provider: provider || null, gpuEnabled: !!gpu, agent: agent ? agent.name : null, agentVersion: fromDockerfile ? null : effectiveAgent.expectedVersion || null, dangerouslySkipPermissions: dangerouslySkipPermissions || undefined, - messagingChannels: Array.isArray(enabledChannels) ? enabledChannels : [], + messagingChannels: activeMessagingChannels, });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 3193 - 3201, The code is persisting the raw UI selection (enabledChannels) into registry.registerSandbox instead of the token-backed, filtered list that actually got configured (e.g., activeMessagingChannels/activeMessagingBridgeSet); update the call to registry.registerSandbox to set messagingChannels to the filtered active list (ensure you use the same variable used for the bridge attachment logic, e.g., activeMessagingChannels or activeMessagingBridgeSet) and also pass that same filtered list into the earlier conflict/status check so both persistence and checks reflect the real configured channels rather than the raw UI selections.
🧹 Nitpick comments (1)
docs/reference/troubleshooting.md (1)
449-465: Split the new prose into one sentence per source line.Lines 449 and 464 each pack multiple sentences onto one line, which makes future doc diffs harder to review.
As per coding guidelines, "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/reference/troubleshooting.md` around lines 449 - 465, The two source lines containing multiple sentences should be split so each sentence occupies its own source line: break the line beginning "Bot tokens for Telegram (`getUpdates`), Discord (gateway), and Slack (Socket Mode) only allow one active consumer per token. If two NemoClaw sandboxes..." into separate lines for each sentence, and likewise break the line starting "To fix, remove the messaging channel from the other sandbox (`nemoclaw remove <other-sandbox>` or rerun onboarding on it with the channel disabled). Current NemoClaw warns at `nemoclaw onboard` time when another sandbox already has the same channel enabled, but sandboxes created before that check was added may still be in a conflict loop." into individual-sentence lines; ensure punctuation is preserved and wrap each sentence on its own source line to satisfy the "one sentence per line" guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/reference/troubleshooting.md`:
- Around line 453-456: Replace the fenced CLI code block tagged as ```bash with
a ```console block and add the prompt prefixes: change the two lines "openshell
term <sandbox-name>" and "tail -f /tmp/gateway.log" to include a leading "$ "
prompt so they read "$ openshell term <sandbox-name>" and "$ tail -f
/tmp/gateway.log"; ensure the fence opener becomes ```console and the fence
closer remains ```.
In `@src/lib/inventory-commands.ts`:
- Around line 129-130: The overlap warning message currently instructs users to
run the non-existent command "nemoclaw remove <sandbox>"; update the string to
instruct "Run `nemoclaw destroy <sandbox>`..." so the remediation command
matches the CLI; locate the string "Run `nemoclaw remove <sandbox>` on whichever
sandbox should stop polling, or rerun onboarding with the channel disabled." in
src/lib/inventory-commands.ts and replace "remove" with "destroy" (and adjust
any corresponding unit tests or snapshots that assert this message).
In `@src/lib/messaging-conflict.ts`:
- Around line 61-68: The current loop calls providerExists(providerName) and
swallows probe errors but still calls registry.updateSandbox(entry.name, {
messagingChannels: discovered }), which allows a transient probe failure to
persist an empty/partial list; instead, track if any providerExists call throws
(e.g., set a boolean probeError flag during the catch), and only call
registry.updateSandbox(entry.name, { messagingChannels: discovered }) when no
probe errors occurred (or when you have verified all probes completed
successfully); reference the existing symbols providerExists, discovered,
registry.updateSandbox, and entry.name and add a brief log when skipping the
update due to probe errors.
In `@src/nemoclaw.ts`:
- Around line 1062-1069: backfillAndFindOverlaps currently lets exceptions
bubble up from backfillMessagingChannels or findAllOverlaps which can break the
non-critical `nemoclaw status` command; wrap the calls in a try/catch in the
backfillAndFindOverlaps function, call backfillMessagingChannels(registry, {
providerExists: providerExistsInGateway }) and then findAllOverlaps(registry)
inside the try, and on any exception log or warn (using the existing logger) and
return an empty array so status remains usable; keep references to
backfillMessagingChannels, findAllOverlaps, registry, and
providerExistsInGateway so reviewers can locate the change.
---
Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 3193-3201: The code is persisting the raw UI selection
(enabledChannels) into registry.registerSandbox instead of the token-backed,
filtered list that actually got configured (e.g.,
activeMessagingChannels/activeMessagingBridgeSet); update the call to
registry.registerSandbox to set messagingChannels to the filtered active list
(ensure you use the same variable used for the bridge attachment logic, e.g.,
activeMessagingChannels or activeMessagingBridgeSet) and also pass that same
filtered list into the earlier conflict/status check so both persistence and
checks reflect the real configured channels rather than the raw UI selections.
---
Nitpick comments:
In `@docs/reference/troubleshooting.md`:
- Around line 449-465: The two source lines containing multiple sentences should
be split so each sentence occupies its own source line: break the line beginning
"Bot tokens for Telegram (`getUpdates`), Discord (gateway), and Slack (Socket
Mode) only allow one active consumer per token. If two NemoClaw sandboxes..."
into separate lines for each sentence, and likewise break the line starting "To
fix, remove the messaging channel from the other sandbox (`nemoclaw remove
<other-sandbox>` or rerun onboarding on it with the channel disabled). Current
NemoClaw warns at `nemoclaw onboard` time when another sandbox already has the
same channel enabled, but sandboxes created before that check was added may
still be in a conflict loop." into individual-sentence lines; ensure punctuation
is preserved and wrap each sentence on its own source line to satisfy the "one
sentence per line" guideline.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b23190cb-9518-4141-b17d-7debf65a7587
📒 Files selected for processing (9)
.agents/skills/nemoclaw-user-reference/references/troubleshooting.mddocs/reference/troubleshooting.mdsrc/lib/inventory-commands.test.tssrc/lib/inventory-commands.tssrc/lib/messaging-conflict.test.tssrc/lib/messaging-conflict.tssrc/lib/onboard.tssrc/lib/registry.tssrc/nemoclaw.ts
| " Run `nemoclaw remove <sandbox>` on whichever sandbox should stop polling, or rerun onboarding with the channel disabled.", | ||
| ); |
There was a problem hiding this comment.
Fix remediation command in overlap warning text.
The message suggests nemoclaw remove <sandbox>, but the CLI exposes destroy, not remove. This sends users to a non-existent command during recovery.
Suggested text fix
log(
- " Run `nemoclaw remove <sandbox>` on whichever sandbox should stop polling, or rerun onboarding with the channel disabled.",
+ " Run `nemoclaw <sandbox> destroy` on whichever sandbox should stop polling, or rerun onboarding with the channel disabled.",
);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| " Run `nemoclaw remove <sandbox>` on whichever sandbox should stop polling, or rerun onboarding with the channel disabled.", | |
| ); | |
| " Run `nemoclaw <sandbox> destroy` on whichever sandbox should stop polling, or rerun onboarding with the channel disabled.", | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/inventory-commands.ts` around lines 129 - 130, The overlap warning
message currently instructs users to run the non-existent command "nemoclaw
remove <sandbox>"; update the string to instruct "Run `nemoclaw destroy
<sandbox>`..." so the remediation command matches the CLI; locate the string
"Run `nemoclaw remove <sandbox>` on whichever sandbox should stop polling, or
rerun onboarding with the channel disabled." in src/lib/inventory-commands.ts
and replace "remove" with "destroy" (and adjust any corresponding unit tests or
snapshots that assert this message).
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/nemoclaw.ts (1)
1057-1065:⚠️ Potential issue | 🟠 MajorGuard overlap backfill so
statuscan’t fail on non-critical diagnostics.
backfillAndFindOverlaps()still lets exceptions bubble (e.g., require/probe/registry errors), which can breaknemoclaw status. This path should degrade to[]instead of aborting status output.Suggested hardening
function backfillAndFindOverlaps() { - const { - backfillMessagingChannels, - findAllOverlaps, - } = require("./lib/messaging-conflict"); - const { providerExistsInGateway } = require("./lib/onboard"); - backfillMessagingChannels(registry, { providerExists: providerExistsInGateway }); - return findAllOverlaps(registry); + try { + const { backfillMessagingChannels, findAllOverlaps } = require("./lib/messaging-conflict"); + const { providerExistsInGateway } = require("./lib/onboard"); + backfillMessagingChannels(registry, { providerExists: providerExistsInGateway }); + return findAllOverlaps(registry); + } catch { + return []; + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nemoclaw.ts` around lines 1057 - 1065, Wrap the body of backfillAndFindOverlaps (which requires ./lib/messaging-conflict and ./lib/onboard and calls backfillMessagingChannels(registry, { providerExists: providerExistsInGateway }) and findAllOverlaps(registry)) in a try/catch so any exception (require/probe/registry errors) is caught and the function returns an empty array instead of throwing; log or noop the error as appropriate and ensure the function signature and return type remain the same (still returning the result of findAllOverlaps(registry) on success, [] on any failure).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/nemoclaw.ts`:
- Around line 1057-1065: Wrap the body of backfillAndFindOverlaps (which
requires ./lib/messaging-conflict and ./lib/onboard and calls
backfillMessagingChannels(registry, { providerExists: providerExistsInGateway })
and findAllOverlaps(registry)) in a try/catch so any exception
(require/probe/registry errors) is caught and the function returns an empty
array instead of throwing; log or noop the error as appropriate and ensure the
function signature and return type remain the same (still returning the result
of findAllOverlaps(registry) on success, [] on any failure).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 75c77505-2dd1-4102-b022-8f3de9f17f57
📒 Files selected for processing (2)
src/lib/messaging-conflict.tssrc/nemoclaw.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/messaging-conflict.ts
- Use console/\$ prompt syntax in troubleshooting docs (repo convention). - Correct remediation command in onboard + status warnings: nemoclaw does not expose 'remove'; the sandbox-scoped action is 'destroy'. - Do not persist partial backfill results when a probe throws — writing a partial/empty messagingChannels list set the field and prevented all future retries, permanently hiding real overlaps. - Swallow unexpected failures inside backfillAndFindOverlaps so a flaky gateway cannot break nemoclaw status for a non-critical warning path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
✨ Thanks for submitting this PR that proposes a fix for the onboard process to detect messaging bot-token conflicts, which could help prevent issues with multiple sandboxes using the same token. Possibly related open issues: |
1 similar comment
|
✨ Thanks for submitting this PR that proposes a fix for the onboard process to detect messaging bot-token conflicts, which could help prevent issues with multiple sandboxes using the same token. Possibly related open issues: |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
docs/reference/troubleshooting.md (1)
449-465: Split this paragraph to one sentence per source line.Line 449 and Line 464 each contain multiple sentences on one source line, which makes future docs diffs harder to review.
As per coding guidelines, "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/reference/troubleshooting.md` around lines 449 - 465, The paragraph that starts with "Bot tokens for Telegram (`getUpdates`), Discord (gateway), and Slack (Socket Mode) only allow one active consumer per token." and the paragraph that ends with "Current NemoClaw warns at `nemoclaw onboard` time when another sandbox already has the same channel enabled, but sandboxes created before that check was added may still be in a conflict loop." should be rewritten so each sentence occupies its own source line (one sentence per line), preserving the code blocks and inline commands like `nemoclaw status`, `openshell term <sandbox-name>`, `tail -f /tmp/gateway.log`, and `nemoclaw <other-sandbox> destroy` exactly as-is. Ensure no sentences are merged on the same source line anywhere in those paragraphs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/inventory-commands.ts`:
- Around line 119-151: The code reads the sandboxes snapshot into `sandboxes`
before calling `deps.backfillAndFindOverlaps()`, so any updates to a sandbox's
`messagingChannels` by that callback are not seen when computing `defaultEntry`
later; after `deps.backfillAndFindOverlaps()` returns (inside the if-block where
it's called), refresh or reassign `sandboxes` (or re-fetch via the appropriate
dependency) so the subsequent `const defaultEntry = sandboxes.find((sb) =>
sb.name === defaultSandbox)` and the
`deps.checkMessagingBridgeHealth(defaultSandbox, channels)` call use the
up-to-date sandbox state; specifically update the `sandboxes` variable or
recompute `defaultEntry` after backfill before running the bridge health checks.
In `@src/lib/messaging-conflict.ts`:
- Around line 17-20: The probe currently collapses execution failures into a
false "provider absent" signal; update the ConflictProbe.providerExists contract
and its callers so failures are distinct from a definitive false: change
providerExists to return boolean | "error" (or similar tri-state) and update
backfillMessagingChannels to treat the "error" result as an execution error
(skip or propagate and do not persist messagingChannels: []), not as provider
absent; also update providerExistsInGateway and other callers to return "error"
when the openshell/gateway command fails instead of returning false (this
applies to the providerExists usage around backfillMessagingChannels and the
related logic in the 51-76 region).
In `@src/lib/onboard.ts`:
- Around line 2702-2728: Compute the actual configured channels
(activeMessagingChannels) before the conflict check and use that instead of the
raw UI selection (enabledChannels) when calling findChannelConflicts and when
persisting to the registry; specifically, move the logic that derives
activeMessagingChannels (the filter that removes channels with missing/empty
tokens or that were skipped) above the block that calls
backfillMessagingChannels and findChannelConflicts, then pass
activeMessagingChannels into findChannelConflicts(sandboxName,
activeMessagingChannels, registry) and use activeMessagingChannels for the later
registry write (instead of enabledChannels) so warnings, aborts, and persisted
state all reflect the real attached providers.
---
Nitpick comments:
In `@docs/reference/troubleshooting.md`:
- Around line 449-465: The paragraph that starts with "Bot tokens for Telegram
(`getUpdates`), Discord (gateway), and Slack (Socket Mode) only allow one active
consumer per token." and the paragraph that ends with "Current NemoClaw warns at
`nemoclaw onboard` time when another sandbox already has the same channel
enabled, but sandboxes created before that check was added may still be in a
conflict loop." should be rewritten so each sentence occupies its own source
line (one sentence per line), preserving the code blocks and inline commands
like `nemoclaw status`, `openshell term <sandbox-name>`, `tail -f
/tmp/gateway.log`, and `nemoclaw <other-sandbox> destroy` exactly as-is. Ensure
no sentences are merged on the same source line anywhere in those paragraphs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b685d14b-b1fd-489d-97f6-8f8ee1e4624d
📒 Files selected for processing (7)
.agents/skills/nemoclaw-user-reference/references/troubleshooting.mddocs/reference/troubleshooting.mdsrc/lib/inventory-commands.tssrc/lib/messaging-conflict.test.tssrc/lib/messaging-conflict.tssrc/lib/onboard.tssrc/nemoclaw.ts
✅ Files skipped from review due to trivial changes (1)
- .agents/skills/nemoclaw-user-reference/references/troubleshooting.md
🚧 Files skipped from review as they are similar to previous changes (2)
- src/nemoclaw.ts
- src/lib/messaging-conflict.test.ts
| export interface ConflictProbe { | ||
| // Returns true when a provider with this exact name exists in OpenShell. | ||
| providerExists: (name: string) => boolean; | ||
| } |
There was a problem hiding this comment.
Don't collapse probe failures into “provider absent”.
backfillMessagingChannels() only skips the write when providerExists() throws, but the current onboard caller passes providerExistsInGateway(), which returns false for any non-zero openshell provider get. That means a transient gateway failure can still persist messagingChannels: [], permanently suppressing later backfill retries and hiding real overlaps.
Possible fix
export interface ConflictProbe {
- // Returns true when a provider with this exact name exists in OpenShell.
- providerExists: (name: string) => boolean;
+ // Distinguish "missing" from probe/execution failures.
+ providerExists: (name: string) => "present" | "absent" | "error";
}
...
for (const channel of KNOWN_CHANNELS) {
const providerName = `${entry.name}${PROVIDER_SUFFIXES[channel]}`;
- try {
- if (probe.providerExists(providerName)) discovered.push(channel);
- } catch {
+ const state = probe.providerExists(providerName);
+ if (state === "present") discovered.push(channel);
+ if (state === "error") {
// Partial results can't be persisted: writing a partial/empty list
// sets messagingChannels, preventing future retries and permanently
// hiding real overlaps. Skip the write so we retry on next call.
probeFailed = true;
break;
- }
+ }
}Then update callers to return "error" for command/execution failures instead of folding them into false.
Also applies to: 51-76
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/messaging-conflict.ts` around lines 17 - 20, The probe currently
collapses execution failures into a false "provider absent" signal; update the
ConflictProbe.providerExists contract and its callers so failures are distinct
from a definitive false: change providerExists to return boolean | "error" (or
similar tri-state) and update backfillMessagingChannels to treat the "error"
result as an execution error (skip or propagate and do not persist
messagingChannels: []), not as provider absent; also update
providerExistsInGateway and other callers to return "error" when the
openshell/gateway command fails instead of returning false (this applies to the
providerExists usage around backfillMessagingChannels and the related logic in
the 51-76 region).
| if (Array.isArray(enabledChannels) && enabledChannels.length > 0) { | ||
| const { | ||
| backfillMessagingChannels, | ||
| findChannelConflicts, | ||
| } = require("./messaging-conflict"); | ||
| backfillMessagingChannels(registry, { providerExists: providerExistsInGateway }); | ||
| const conflicts = findChannelConflicts(sandboxName, enabledChannels, registry); | ||
| if (conflicts.length > 0) { | ||
| for (const { channel, sandbox } of conflicts) { | ||
| console.log( | ||
| ` ⚠ Sandbox '${sandbox}' already has ${channel} enabled. Bot tokens only allow one sandbox to poll — continuing will break both bridges.`, | ||
| ); | ||
| } | ||
| if (isNonInteractive()) { | ||
| console.error( | ||
| " Aborting: resolve the messaging channel conflict above or run `nemoclaw <sandbox> destroy` on the other sandbox.", | ||
| ); | ||
| process.exit(1); | ||
| } | ||
| const answer = (await promptOrDefault(" Continue anyway? [y/N]: ", null, "n")) | ||
| .trim() | ||
| .toLowerCase(); | ||
| if (answer !== "y" && answer !== "yes") { | ||
| console.log(" Aborting sandbox creation."); | ||
| process.exit(1); | ||
| } | ||
| } |
There was a problem hiding this comment.
Use the actually configured channels here, not the raw UI selection.
If the user selects a channel and then skips its token prompt, enabledChannels still contains that name. Line 2708 can then warn or abort on a conflict for a provider that will never be attached, and Line 3201 persists the same false positive into the registry, which also prevents later backfill from correcting it.
Possible fix
- const conflicts = findChannelConflicts(sandboxName, enabledChannels, registry);
+ const conflicts = findChannelConflicts(sandboxName, activeMessagingChannels, registry);
...
- messagingChannels: Array.isArray(enabledChannels) ? enabledChannels : [],
+ messagingChannels: activeMessagingChannels,This needs the activeMessagingChannels calculation moved above the conflict check so both the warning path and the registry write use the same source of truth.
Also applies to: 3201-3201
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/onboard.ts` around lines 2702 - 2728, Compute the actual configured
channels (activeMessagingChannels) before the conflict check and use that
instead of the raw UI selection (enabledChannels) when calling
findChannelConflicts and when persisting to the registry; specifically, move the
logic that derives activeMessagingChannels (the filter that removes channels
with missing/empty tokens or that were skipped) above the block that calls
backfillMessagingChannels and findChannelConflicts, then pass
activeMessagingChannels into findChannelConflicts(sandboxName,
activeMessagingChannels, registry) and use activeMessagingChannels for the later
registry write (instead of enabledChannels) so warnings, aborts, and persisted
state all reflect the real attached providers.
Conflict, ConflictProbe, and ConflictRegistry are used only as internal parameter/return types within messaging-conflict.ts — no other module imports them. Mirrors the earlier KNOWN_CHANNELS export cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three correctness fixes surfaced by CodeRabbit after the first round: 1. Probe collapse: providerExistsInGateway uses ignoreError:true and returns false for any non-zero status, including gateway unreachable. The previous backfill guard only skipped partial writes when the probe threw — but our probe never threw, it just returned false. Change ConflictProbe to a tri-state contract (present/absent/error) and add an upfront openshell liveness check in both onboard and status call sites; probe returns "error" whenever the gateway is unreachable, so a transient failure no longer persists messagingChannels: [] and masks real overlaps. 2. Phantom channels: the onboard conflict check used the raw UI toggle list as enabledChannels, which includes channels the user selected but then skipped the token prompt for. Filter to channels with an actual token before the check so we don't warn about — or persist — phantoms. The registry write was already using the precise activeMessagingChannels computed from messagingTokenDefs further down. 3. Stale snapshot in status: showStatusCommand read `sandboxes` before backfillAndFindOverlaps ran, so a just-populated messagingChannels on the default sandbox wasn't visible to the subsequent bridge-health check on first run after upgrade. Re-fetch after backfill. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/nemoclaw.ts (1)
1071-1074:⚠️ Potential issue | 🟠 MajorDon't fold
provider getfailures into"absent"here.The upfront
sandbox listcheck only proves the gateway was reachable once. Ifopenshell provider getlater fails due to a transport/auth/gateway error, this still returns"absent"and letsbackfillMessagingChannels()persistmessagingChannels: [], which hides real overlaps until the registry is edited again. Please classify non-NotFound failures fromresult.outputas"error"instead of treating every non-zero exit as absence.Suggested hardening
return { providerExists: (name) => { if (!isGatewayAlive()) return "error"; const result = captureOpenshell(["provider", "get", name], { ignoreError: true }); - return result.status === 0 ? "present" : "absent"; + if (result.status === 0) return "present"; + const output = stripAnsi(result.output || ""); + if (/\bNotFound\b|\bNot Found\b|provider .* not found|provider does not exist/i.test(output)) { + return "absent"; + } + return "error"; }, }; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nemoclaw.ts` around lines 1071 - 1074, The providerExists function currently treats any non-zero exit from captureOpenshell(["provider","get", name]) as "absent", which hides transport/auth/gateway errors and lets backfillMessagingChannels persist messagingChannels: []; modify providerExists to inspect the failing result from captureOpenshell (the returned object from captureOpenshell) and only return "absent" when the output indicates a true NotFound/does-not-exist case; for any other non-NotFound error (transport, auth, gateway, unexpected status) return "error" so callers like backfillMessagingChannels can avoid persisting an empty messagingChannels array and surface the failure instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/nemoclaw.ts`:
- Around line 1071-1074: The providerExists function currently treats any
non-zero exit from captureOpenshell(["provider","get", name]) as "absent", which
hides transport/auth/gateway errors and lets backfillMessagingChannels persist
messagingChannels: []; modify providerExists to inspect the failing result from
captureOpenshell (the returned object from captureOpenshell) and only return
"absent" when the output indicates a true NotFound/does-not-exist case; for any
other non-NotFound error (transport, auth, gateway, unexpected status) return
"error" so callers like backfillMessagingChannels can avoid persisting an empty
messagingChannels array and surface the failure instead.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: e4b665c2-ba3a-45ba-9c0e-dcf678a2ba51
📒 Files selected for processing (5)
src/lib/inventory-commands.tssrc/lib/messaging-conflict.test.tssrc/lib/messaging-conflict.tssrc/lib/onboard.tssrc/nemoclaw.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/lib/inventory-commands.ts
- src/lib/onboard.ts
ericksoa
left a comment
There was a problem hiding this comment.
Looks great — well-structured, thorough tests, and the tri-state probe design is a nice touch. Approving.
One remaining CodeRabbit nit: in nemoclaw.ts:1071-1074, providerExists treats any non-zero exit as "absent" rather than distinguishing NotFound from transport/gateway errors. The liveness guard covers the main failure case, but it'd be nice to tighten that up in a follow-up PR.
Summary
Closes #1953. Telegram/Discord/Slack bot tokens only allow one consumer per token, globally — enforced by the platform, not by NemoClaw. Two NemoClaw sandboxes resolving the same token from
credentials.jsonboth start polling, each kicks the other off, and neither delivers messages.nemoclaw statusstill reports the bridge as running; the only evidence is a repeating 409 line inside/tmp/gateway.log.This PR adds three layers of defense inside NemoClaw. No OpenClaw changes needed.
What changed
[y/N]default No; hard-exit 1 in non-interactive mode.nemoclaw statuslists every cross-sandbox messaging-channel overlap so users who upgrade or click through the onboard prompt still see the problem.nemoclaw statustails/tmp/gateway.loginside the default sandbox and flags Telegram'sgetUpdates conflict/409 Conflictpattern asdegraded. Short timeout (3s) and silent on failure — it never breaks status.messagingChannelsfield onSandboxEntry. Pre-existing sandboxes have no record of their channels, so the check would miss conflicts after an upgrade; a lazy backfill probes OpenShell for${name}-{telegram,discord,slack}-bridgeproviders and fills the field. Runs on first onboard/status after upgrade./tmp/gateway.log, and how to recover.The conflict-detection logic is factored into a pure
src/lib/messaging-conflict.tswith dependency injection and unit tests — nochild_processoropenshellimports, so it's trivially testable.Scope notes
ORalternatives; this PR picks the detect-and-warn branch.Summary by CodeRabbit