Skip to content

Fix multi-account channel binding drift for #836#841

Merged
ashione merged 1 commit intoValueCell-ai:mainfrom
ashione:codex/issue-836-gateway-timeout
Apr 13, 2026
Merged

Fix multi-account channel binding drift for #836#841
ashione merged 1 commit intoValueCell-ai:mainfrom
ashione:codex/issue-836-gateway-timeout

Conversation

@ashione
Copy link
Copy Markdown
Contributor

@ashione ashione commented Apr 13, 2026

Summary

This PR fixes the routing/config drift behind issue #836 in multi-account channel setups.

What changed

  • fixed account-scoped channel rebinding so rebinding one account no longer removes sibling bindings on the same agent/channel type
  • migrated legacy channel-wide fallback routing into an explicit default account binding before saving or manually binding non-default accounts
  • added unit coverage for both the account-rebind path and the legacy manual-bind migration path
  • added an Electron E2E regression test that verifies newly added non-default Feishu accounts stay unassigned until the user explicitly binds an agent

Root cause

Issue #836 was not primarily a conversation-compression problem. The main failure mode was channel routing drift:

  • non-default channel accounts could still inherit legacy channel-wide fallback routing
  • rebinding one account could silently drop another account binding on the same agent
  • in multi-agent / multi-model setups, those routing errors surfaced as messages landing on the wrong agent/model and as intermittent gateway instability symptoms

Validation

  • pnpm test -- --run tests/unit/agent-config.test.ts tests/unit/channel-routes.test.ts
  • pnpm run typecheck
  • pnpm run build:vite
  • pnpm exec playwright test tests/e2e/channels-binding-regression.spec.ts

User impact

Users adding multiple channel accounts should no longer see new accounts silently route through the default agent, and rebinding one account should no longer disturb sibling account ownership.

@ashione ashione marked this pull request as ready for review April 13, 2026 05:42
Copilot AI review requested due to automatic review settings April 13, 2026 05:42
Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f94db5b249

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread electron/api/routes/channels.ts Outdated
Comment on lines +329 to +331
const defaultOwner =
agents.channelAccountOwners?.[`${channelType}:default`]
|| agents.channelOwners?.[channelType];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restrict legacy migration to actual channel-wide bindings

migrateLegacyChannelWideBinding uses agents.channelOwners[channelType] as a migration source even when no legacy channel-wide binding exists. In snapshots, channelOwners is always populated (including channels that only have account-scoped bindings), so a non-default bind/save can unexpectedly recreate a default account binding that the user had intentionally removed. For example, if only telegram:alt is bound and telegram:default is unassigned, binding telegram:alt2 will silently assign telegram:default to channelOwners, changing routing for default-account traffic.

Useful? React with 👍 / 👎.

Comment thread electron/utils/agent-config.ts Outdated
match.accountId = normalizedAccountId;
}
nextBindings.push({ agentId, match });
nextBindings.push({ agentId: normalizedAgentId || agentId, match });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve agent ID casing when storing bindings

The new write path lowercases agentId (normalizedAgentId) before persisting bindings, which can break legacy/migrated configs that contain mixed-case agent IDs. Other paths still do exact-ID comparisons (for example deleteAgentConfig removes bindings via binding.agentId === agentId), so after rebinding such an agent, deletion can leave stale bindings pointing to a non-existent owner. This regression is introduced by writing normalized IDs instead of the original configured ID.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes routing/config drift in multi-account channel setups by preventing sibling account bindings from being dropped, and by migrating legacy channel-wide fallback routing into explicit default-account bindings to avoid unintended inheritance.

Changes:

  • Adjust binding upsert logic to preserve sibling account bindings for the same agent/channel.
  • Add migration step to convert legacy channel-wide fallback into an explicit :default binding before saving/binding non-default accounts.
  • Add unit + Electron Playwright E2E regression coverage for these scenarios.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
electron/utils/agent-config.ts Updates binding upsert semantics for account-scoped updates and normalizes stored binding agent/account IDs.
electron/api/routes/channels.ts Introduces legacy channel-wide fallback migration and invokes it in relevant save/bind paths.
tests/unit/agent-config.test.ts Updates unit assertion to ensure sibling account bindings are preserved.
tests/unit/channel-routes.test.ts Adds unit tests for migration behavior during manual bind and non-default account saves.
tests/e2e/channels-binding-regression.spec.ts Adds an E2E regression test ensuring new non-default Feishu accounts remain unassigned until explicitly bound.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread electron/api/routes/channels.ts Outdated
Comment on lines +324 to +336
async function migrateLegacyChannelWideBinding(
channelType: string,
agentsSnapshot?: Awaited<ReturnType<typeof listAgentsSnapshot>>,
): Promise<void> {
const agents = agentsSnapshot ?? await listAgentsSnapshot();
const defaultOwner =
agents.channelAccountOwners?.[`${channelType}:default`]
|| agents.channelOwners?.[channelType];

if (defaultOwner) {
await assignChannelAccountToAgent(defaultOwner, channelType, 'default');
}

Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

migrateLegacyChannelWideBinding derives defaultOwner from agents.channelOwners[channelType], but channelOwners is always populated (it falls back to the default agent even when there is no legacy channel-wide binding). This can cause unintended side effects: calling this migration can bind the default account to an arbitrary “primaryOwner” (e.g., the owner of some other account) even when there was no legacy channel-wide fallback to migrate. Consider detecting an actual legacy channel-wide binding from the stored config/bindings (or other explicit signal) and only then converting it into an explicit :default binding; otherwise skip the migration to avoid silently rebinding default routing.

Suggested change
async function migrateLegacyChannelWideBinding(
channelType: string,
agentsSnapshot?: Awaited<ReturnType<typeof listAgentsSnapshot>>,
): Promise<void> {
const agents = agentsSnapshot ?? await listAgentsSnapshot();
const defaultOwner =
agents.channelAccountOwners?.[`${channelType}:default`]
|| agents.channelOwners?.[channelType];
if (defaultOwner) {
await assignChannelAccountToAgent(defaultOwner, channelType, 'default');
}
function isRecord(value: unknown): value is Record<string, unknown> {
return !!value && typeof value === 'object' && !Array.isArray(value);
}
function findExplicitLegacyChannelWideOwner(
value: unknown,
channelType: string,
seen = new Set<object>(),
): string | undefined {
if (!isRecord(value) || seen.has(value)) return undefined;
seen.add(value);
const channelOwners = value.channelOwners;
if (
isRecord(channelOwners)
&& Object.prototype.hasOwnProperty.call(channelOwners, channelType)
&& typeof channelOwners[channelType] === 'string'
&& channelOwners[channelType].trim() !== ''
) {
return channelOwners[channelType] as string;
}
for (const nestedValue of Object.values(value)) {
const owner = findExplicitLegacyChannelWideOwner(nestedValue, channelType, seen);
if (owner) return owner;
}
return undefined;
}
async function readExplicitLegacyChannelWideOwner(channelType: string): Promise<string | undefined> {
const config = await readOpenClawConfig();
return findExplicitLegacyChannelWideOwner(config, channelType);
}
async function migrateLegacyChannelWideBinding(
channelType: string,
agentsSnapshot?: Awaited<ReturnType<typeof listAgentsSnapshot>>,
): Promise<void> {
const agents = agentsSnapshot ?? await listAgentsSnapshot();
const explicitDefaultOwner = agents.channelAccountOwners?.[`${channelType}:default`];
const explicitLegacyOwner = await readExplicitLegacyChannelWideOwner(channelType);
const defaultOwner = explicitDefaultOwner || explicitLegacyOwner;
if (defaultOwner) {
await assignChannelAccountToAgent(defaultOwner, channelType, 'default');
}
if (!explicitLegacyOwner) return;

Copilot uses AI. Check for mistakes.
Comment thread electron/utils/agent-config.ts Outdated
match.accountId = normalizedAccountId;
}
nextBindings.push({ agentId, match });
nextBindings.push({ agentId: normalizedAgentId || agentId, match });
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upsertBindingsForChannel now persists agentId as the normalized (trimmed/lowercased) value (normalizedAgentId || agentId). Elsewhere (e.g., agent deletion) bindings are removed by exact binding.agentId === agentId, so if an existing config contains a non-normalized agent id (e.g., mixed case), this change can leave orphaned bindings that won’t be cleaned up when deleting the agent. To avoid this mismatch, either keep the original agentId string in persisted bindings (use normalization only for comparisons) or ensure agent IDs are normalized consistently across config entries and all binding cleanup paths.

Suggested change
nextBindings.push({ agentId: normalizedAgentId || agentId, match });
nextBindings.push({ agentId, match });

Copilot uses AI. Check for mistakes.
@ashione ashione force-pushed the codex/issue-836-gateway-timeout branch from f94db5b to 27eb7b7 Compare April 13, 2026 06:00
Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27eb7b7cda

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread electron/api/routes/channels.ts Outdated
Comment on lines +334 to +337
const defaultOwner = agents.channelAccountOwners?.[`${channelType}:default`] || legacyOwner;

if (defaultOwner) {
await assignChannelAccountToAgent(defaultOwner, channelType, 'default');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve agent ID casing during legacy binding migration

Fresh evidence: this new migration path now reads agents.channelAccountOwners and reuses it as defaultOwner. Those owner IDs are normalized to lowercase in snapshot building, so for mixed-case agent IDs (which are now explicitly supported in tests) assignChannelAccountToAgent can throw Agent "..." not found during migration because it checks IDs case-sensitively. In practice, saving or rebinding a non-default account with a legacy channel-wide binding will fail instead of migrating, leaving routing drift unresolved.

Useful? React with 👍 / 👎.

@ashione ashione force-pushed the codex/issue-836-gateway-timeout branch from 27eb7b7 to 389fd2b Compare April 13, 2026 06:53
Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 389fd2bd0e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

await assignChannelAccountToAgent(body.agentId, resolveStoredChannelType(body.channelType), body.accountId);
const storedChannelType = resolveStoredChannelType(body.channelType);
if (body.accountId !== 'default') {
await migrateLegacyChannelWideBinding(storedChannelType);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate target agent before mutating legacy bindings

For PUT /api/channels/binding, migrateLegacyChannelWideBinding runs before assignChannelAccountToAgent, so a request with an invalid/stale agentId returns 500 but can still rewrite bindings (create default owner and clear channel-wide fallback). This is a behavioral regression from the previous flow where failed binds were non-mutating, and it can silently change routing when clients send stale agent IDs (for example after an agent was deleted or renamed).

Useful? React with 👍 / 👎.

@ashione ashione force-pushed the codex/issue-836-gateway-timeout branch from 389fd2b to 5d3b99e Compare April 13, 2026 07:04
Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d3b99e17a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1203 to +1204
if (body.accountId !== 'default') {
await migrateLegacyChannelWideBinding(storedChannelType);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard missing account IDs before legacy migration

PUT /api/channels/binding now calls migrateLegacyChannelWideBinding whenever body.accountId !== 'default', but validateAccountIdOrReply currently accepts undefined account IDs (it only rejects empty trimmed strings). If a client omits accountId, this branch still runs, migrates/clears legacy bindings, and then assignChannelAccountToAgent crashes on accountId.trim(), returning 500 after a partial config mutation. This is a regression from the previous flow, where the same malformed request failed without rewriting bindings.

Useful? React with 👍 / 👎.

Comment thread electron/api/routes/channels.ts Outdated
Comment on lines +331 to +334
const defaultOwner = explicitDefaultOwner || legacyOwner;

if (defaultOwner) {
await assignChannelAccountToAgent(defaultOwner, channelType, 'default');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fallback to legacy owner when explicit default is stale

The migration prefers explicitDefaultOwner over legacyOwner unconditionally; if the explicit default binding points to a deleted/nonexistent agent, assignChannelAccountToAgent(defaultOwner, ...) throws and blocks all non-default binding saves/migrations for that channel. Because this path is meant to repair legacy routing drift, it should tolerate stale explicit defaults and fall back to the still-valid legacy owner instead of hard-failing.

Useful? React with 👍 / 👎.

@ashione ashione force-pushed the codex/issue-836-gateway-timeout branch from 5d3b99e to fedeb9c Compare April 13, 2026 07:15
Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fedeb9c351

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread electron/api/routes/channels.ts Outdated
Comment on lines +339 to +341
const defaultOwner = explicitDefaultOwner && validAgentIds.has(explicitDefaultOwner)
? explicitDefaultOwner
: legacyOwner;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip invalid legacy owners during fallback migration

Validate legacyOwner against the current agent list before reassigning default. Right now the ternary falls back to legacyOwner even when that agent was deleted, so assignChannelAccountToAgent(defaultOwner, ...) throws and the whole non-default bind/save request returns 500. This is a regression for configs that still contain stale channel-wide bindings (for example after manual edits or older mixed-case cleanup issues), because rebinding any non-default account on that channel is blocked instead of migrating and continuing.

Useful? React with 👍 / 👎.

@ashione ashione force-pushed the codex/issue-836-gateway-timeout branch from fedeb9c to dc481cf Compare April 13, 2026 07:24
@ashione ashione changed the title [codex] Fix multi-account channel binding drift for #836 Fix multi-account channel binding drift for #836 Apr 13, 2026
Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dc481cfea6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return;
}

await migrateLegacyChannelWideBinding(storedChannelType);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Refresh gateway after legacy migration on no-change saves

ensureScopedChannelBinding now calls migrateLegacyChannelWideBinding for non-default accounts, and that helper can rewrite bindings (assignChannelAccountToAgent + clearChannelBinding). When this runs via POST /api/channels/config's isSameConfigValues(...) path, the handler returns { noChange: true } without calling scheduleGatewayChannelSaveRefresh, so the file is mutated but the running gateway keeps old routing until a later restart/reload. This is a regression introduced by the new migration call and can leave users seeing unchanged live behavior after a “successful” save.

Useful? React with 👍 / 👎.

@ashione ashione merged commit b2c478d into ValueCell-ai:main Apr 13, 2026
5 checks passed
DigitalNomad-Chat added a commit to DigitalNomad-Chat/ClawX that referenced this pull request Apr 26, 2026
…overy (ValueCell-ai#855 ValueCell-ai#862 ValueCell-ai#867 ValueCell-ai#892 ValueCell-ai#841)

Channel health banner, diagnostics snapshot, gateway restart button,
status indicator styling, multi-account binding fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants