Skip to content

fix: prevent toolDiscovery from overwriting pinned channel registry#87333

Open
klioen wants to merge 3 commits into
openclaw:mainfrom
klioen:fix/tool-discovery-channel-registry-overwrite
Open

fix: prevent toolDiscovery from overwriting pinned channel registry#87333
klioen wants to merge 3 commits into
openclaw:mainfrom
klioen:fix/tool-discovery-channel-registry-overwrite

Conversation

@klioen

@klioen klioen commented May 27, 2026

Copy link
Copy Markdown

When tools.catalog is called from the frontend, it triggers ensureStandalonePluginToolRegistryLoaded with surface=channel and toolDiscovery=true. In toolDiscovery mode, the loaded registry contains no channels, but installStandaloneRegistry still overwrites the pinned channel registry with this empty registry, causing 'Outbound not configured for channel: feishu' errors.

Add a guard: if toolDiscovery is enabled and the registry has no channels, return early before the surface switch that would overwrite the channel registry.

Summary

What problem does this PR solve?

  • After gateway restart, any frontend page load calls tools.catalog, which triggers ensureStandalonePluginToolRegistryLoaded(surface="channel", toolDiscovery=true). In toolDiscovery mode the loaded registry has zero channels, but installStandaloneRegistry still calls pinActivePluginChannelRegistry with that empty registry, overwriting the pinned channel registry that was correctly set during gateway startup. This causes all outbound channel operations (feishu, discord, slack, etc.) to fail with "Outbound not configured for channel: ".

Why does this matter now?

  • This is a reliability regression: every time a user opens the frontend UI after a gateway restart, the channel registry is silently wiped. Any cron jobs, outbound messages, or channel-dependent features break until the next gateway restart — and even then, opening the frontend again re-triggers the wipe.

What is the intended outcome?

  • When toolDiscovery=true and the loaded registry has no channels, installStandaloneRegistry returns early after setActivePluginRegistry but before the switch (surface) block that would overwrite the pinned channel registry. The active registry is still updated (preserving tool discovery behavior), but the channel registry pin is left intact.

What is intentionally out of scope?

  • Changes to ensureStandalonePluginToolRegistryLoaded's hardcoded surface: "channel" parameter.
  • Changes to the cache key logic that causes toolDiscovery and full-mode loads to produce different cache keys.
  • Defensive fallbacks in getActivePluginChannelRegistry or the outbound delivery path.
  • The params.manifestRecord.manifest.id vs params.manifestRecord.id inconsistency in loader.ts (separate issue).

What does success look like?

  • After gateway restart, opening the frontend and calling tools.catalog no longer wipes the channel registry. Outbound channel operations (feishu, etc.) continue to work without requiring a second restart.

What should reviewers focus on?

  • Whether the guard condition toolDiscovery && registry.channels.length === 0 is the correct and complete check, or if there are other scenarios where a channel-less registry should still be allowed to overwrite the pin.
  • Whether returning after setActivePluginRegistry but before the switch (surface) block could break any downstream consumers that expect the channel pin to be updated even in toolDiscovery mode.
Summary guidance

This PR description is the contributor's durable explanation of the change. Write it for human maintainers first; ClawSweeper and Barnacle use the same text to understand intent, proof, risk, and current review state.

Describe the intent and outcome in 2-5 bullets. Avoid restating the diff; reviewers and bots can read the changed files.

If this PR fixes a plugin beta-release blocker, title it fix(<plugin-id>): beta blocker - <summary> and link the matching Beta blocker: <plugin-name> - <summary> issue labeled beta-blocker. Contributors cannot label PRs, so the title is the PR-side signal for maintainers and automation.

Linked context

Which issue does this close?

Closes #

Which issues, PRs, or discussions are related?

Related #

Was this requested by a maintainer or owner?

No — discovered through live debugging of a production incident.

Linked context guidance

Link the issue, PR, discussion, maintainer request, or owner request that explains why this PR should exist. Maintainer context helps reviewers and automation distinguish intended work from drive-by churn.

Real behavior proof

  • Behavior or issue addressed:
    tools.catalog standalone tool-discovery loads should not overwrite the startup-pinned channel registry with an empty registry.

  • Real environment tested:
    Real OpenClaw gateway setup with a configured feishu channel.

  • Exact steps or command run after this patch:

    1. Configure the feishu channel in a real OpenClaw environment.
    2. Run openclaw gateway call channels.status.
    3. Confirm channels.status returns the configured/running feishu channel snapshot.
    4. Run openclaw gateway call tools.catalog.
    5. Run openclaw gateway call channels.status again.
    6. Compare the channel snapshot before and after tools.catalog.
  • Evidence after fix:
    The same real environment shows the regression before the fix and stable channel state after the fix.

    Real terminal output before/after fix
    # Before fix: channels.status is healthy before tools.catalog
    $ openclaw gateway call channels.status
    {
      "channelOrder": ["feishu"],
      "channelLabels": {
        "feishu": "Feishu"
      },
      "channelMeta": [
        {
          "id": "feishu",
          "label": "Feishu",
          "detailLabel": "Lark/Feishu (飞书)"
        }
      ],
      "channels": {
        "feishu": {
          "configured": true,
          "running": true,
          "lastError": null
        }
      },
      "channelAccounts": {
        "feishu": [
          {
            "accountId": "a-mppjjfy4ydga2m",
            "configured": true,
            "running": true,
            "brand": "feishu"
          },
          {
            "accountId": "default",
            "configured": true,
            "running": true,
            "brand": "feishu"
          }
        ]
      },
      "channelDefaultAccountId": {
        "feishu": "a-mppjjfy4ydga2m"
      }
    }
    
    # Before fix: tools.catalog triggers the bug
    $ openclaw gateway call tools.catalog
    { "...": "tools catalog returned" }
    
    $ openclaw gateway call channels.status
    {
      "channelOrder": [],
      "channelLabels": {},
      "channelMeta": [],
      "channels": {},
      "channelAccounts": {},
      "channelDefaultAccountId": {}
    }
    
    # After fix: tools.catalog no longer wipes the pinned channel registry
    $ openclaw gateway call tools.catalog
    { "...": "tools catalog returned" }
    
    $ openclaw gateway call channels.status
    {
      "channelOrder": ["feishu"],
      "channelLabels": {
        "feishu": "Feishu"
      },
      "channelMeta": [
        {
          "id": "feishu",
          "label": "Feishu",
          "detailLabel": "Lark/Feishu (飞书)"
        }
      ],
      "channels": {
        "feishu": {
          "configured": true,
          "running": true,
          "lastError": null
        }
      },
      "channelAccounts": {
        "feishu": [
          {
            "accountId": "a-mppjjfy4ydga2m",
            "configured": true,
            "running": true,
            "brand": "feishu"
          },
          {
            "accountId": "default",
            "configured": true,
            "running": true,
            "brand": "feishu"
          }
        ]
      },
      "channelDefaultAccountId": {
        "feishu": "a-mppjjfy4ydga2m"
      }
    }
    
  • Observed result after fix:
    After the patch, tools.catalog no longer clears the pinned channel registry. channels.status continues to return the configured/running feishu channel snapshot after tools.catalog runs.

  • What was not tested:
    I did not include a screen recording or multi-host deployment proof in this PR body.

  • Proof limitations or environment constraints:
    This proof is from a real manually configured environment and focuses on the feishu path that reproduced the bug.

Tests and validation

Which commands did you run?

  • pnpm build — verified the patch compiles without errors
  • Manual integration test on live deployment with diagnostic logging

What regression coverage was added or updated?

  • None — this is a guard against a runtime state corruption that is difficult to reproduce in unit tests without mocking the entire plugin loading pipeline

What failed before this fix, if known?

  • All outbound channel operations (feishu message delivery, cron.run) failed with "Outbound not configured for channel: feishu" after frontend page load triggered tools.catalog

If no test was added, why not?

  • The bug requires a full gateway runtime with plugin loading, channel binding, and a frontend WebSocket call to reproduce. The existing test infrastructure for standalone-runtime-registry-loader does not cover the interaction between toolDiscovery mode and pinActivePluginChannelRegistry. Adding a meaningful unit test would require significant mocking of the plugin loading pipeline, which may not provide reliable regression coverage.
Testing guidance

List focused commands, not every incidental check. CI is useful support, but external PRs still need real behavior proof above when behavior changes.

Risk checklist

Did user-visible behavior change? (Yes/No)

No — the fix prevents silent state corruption; no user-facing API or output changes

Did config, environment, or migration behavior change? (Yes/No)

No

Did security, auth, secrets, network, or tool execution behavior change? (Yes/No)

No

What is the highest-risk area?

  • The guard could theoretically prevent a legitimate channel registry update in toolDiscovery mode. However, toolDiscovery mode by design does not load channel surfaces (runtimeChannel=false), so a registry with channels.length === 0 in toolDiscovery mode is expected behavior, not a missing update.

How is that risk mitigated?

  • The guard only applies when both conditions are true: toolDiscovery=true AND registry.channels.length === 0. In full mode (toolDiscovery=false), the channel registry is still updated normally. In toolDiscovery mode with a non-empty channels array (which should not happen but is theoretically possible), the registry would also be updated normally.
Risk guidance

Use this for author judgment that is not obvious from the diff. ClawSweeper can see touched files, but it cannot know which behavior you think is risky, why the risk is acceptable, or what mitigation reviewers should verify.

Current review state

What is the next action?

  • Awaiting maintainer review

What is still waiting on author, maintainer, CI, or external proof?

  • CI checks need to pass; maintainer review needed

Which bot or reviewer comments were addressed?

  • N/A — first submission
Review state guidance

Keep this as the durable state for review progress. If useful information appears in comments, fold the current next action or blocker back here so maintainers and ClawSweeper do not need to reconstruct state from comment history.

Real behavior proof (Test output)
$ pnpm test src/plugins/runtime/standalone-runtime-registry-loader.test.ts

 RUN  v4.1.7 /Users/bytedance/Code/openclaw

 ✓ |plugins| src/plugins/runtime/standalone-runtime-registry-loader.test.ts (3 tests) 14ms
     ✓ reuses a compatible gateway startup registry for gateway-bindable dispatch load options 10ms
     ✓ loads a fresh registry when dispatch config is not startup-compatible 2ms
     ✓ does not pin a tool-discovery registry when channels are empty 2ms

 Test Files  1 passed (1)
      Tests  3 passed (3)
   Start at  20:32:02
   Duration  1.15s (transform 692ms, setup 110ms, import 950ms, tests 14ms, environment 0ms)

[test] starting test/vitest/vitest.plugins.config.ts
[test] passed 1 Vitest shard in 3.45s

@openclaw-barnacle openclaw-barnacle Bot added size: XS triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 27, 2026
@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 11, 2026, 4:19 PM ET / 20:19 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

PR surface: Source +7, Tests +68. Total +75 across 2 files.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] No close action taken because the review did not complete.

Maintainer options:

  1. Decide the mitigation before merge
    Retry the Codex review after fixing the execution failure.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P1] Review did not complete, so no work-lane recommendation was made.
Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model internal, reasoning high; reviewed against 9827490f5f73.

Label changes

Label changes:

  • add rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
  • remove proof: sufficient: Current real behavior proof status is not_applicable, not sufficient.
  • remove P1: Current review triage priority is none.
  • remove rating: 🦞 diamond lobster: Current PR rating is rating: 🌊 off-meta tidepool, so this older rating label is no longer current.
  • remove merge-risk: 🚨 message-delivery: Current PR review selected no merge-risk labels.
  • remove merge-risk: 🚨 availability: Current PR review selected no merge-risk labels.
  • remove status: 👀 ready for maintainer look: Current PR status no longer selects a status label.

Label justifications:

  • rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
Evidence reviewed

PR surface:

Source +7, Tests +68. Total +75 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 7 0 +7
Tests 1 69 1 +68
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 76 1 +75

What I checked:

  • failure reason: retryable codex transport failure.
  • codex failure detail: Codex review failed for this PR with exit 1.
  • codex stderr: repo openclaw/clawsweeper --workflow "repair comment router" .

Likely related people:

  • unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@klioen

klioen commented May 27, 2026

Copy link
Copy Markdown
Author

@DmitryPogodaev Hi, could you help review this PR? It fixes a bug in your code

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 27, 2026
@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@DmitryPogodaev DmitryPogodaev left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior proof: the early-return guard at line 39 prevents installStandaloneRegistry from reaching the surface switch when toolDiscovery=true and registry.channels is empty. Before the fix, the switch would fall through to overwrite the pinned channel registry with an empty one, causing Outbound not configured for channel: feishu errors.

The fix is minimal and correctly scoped — it only short-circuits when both conditions hold (toolDiscovery AND no channels), so normal plugin loading is unaffected.

LGTM.

@klioen

klioen commented May 28, 2026

Copy link
Copy Markdown
Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 28, 2026
@klioen

klioen commented May 28, 2026

Copy link
Copy Markdown
Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. labels May 28, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 29, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 29, 2026
@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 3, 2026
@klioen

klioen commented Jun 3, 2026

Copy link
Copy Markdown
Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels Jun 8, 2026
@klioen klioen force-pushed the fix/tool-discovery-channel-registry-overwrite branch from a2b0901 to 25aedc0 Compare June 8, 2026 13:07
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 8, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 8, 2026
keliangliang added 3 commits June 8, 2026 21:49
When tools.catalog is called from the frontend, it triggers
ensureStandalonePluginToolRegistryLoaded with surface=channel and
toolDiscovery=true. In toolDiscovery mode, the loaded registry contains
no channels, but installStandaloneRegistry still overwrites the pinned
channel registry with this empty registry, causing 'Outbound not
configured for channel: feishu' errors.

Add a guard: if toolDiscovery is enabled and the registry has no
channels, return early before the surface switch that would overwrite
the channel registry.
@klioen klioen force-pushed the fix/tool-discovery-channel-registry-overwrite branch from 25aedc0 to a3b8244 Compare June 8, 2026 13:49
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 8, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 8, 2026
@klioen

klioen commented Jun 9, 2026

Copy link
Copy Markdown
Author

@frankekn Ready for merge. Please review when possible. Thank you!

@klioen

klioen commented Jun 11, 2026

Copy link
Copy Markdown
Author

@vincentkoc Ready for merge. Please review when possible. Thank you!

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. P1 High-priority user-facing bug, regression, or broken workflow. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants