Skip to content

fix(message): make channel capability probing secret-safe#47520

Closed
vrknetha wants to merge 1 commit into
openclaw:mainfrom
vrknetha:fix/message-tool-secretref-probing
Closed

fix(message): make channel capability probing secret-safe#47520
vrknetha wants to merge 1 commit into
openclaw:mainfrom
vrknetha:fix/message-tool-secretref-probing

Conversation

@vrknetha

Copy link
Copy Markdown
Contributor

Summary

  • Problem: the message tool could crash during schema construction when an unrelated channel plugin eagerly resolved account secrets and hit an unresolved SecretRef.
  • Why it matters: an explicit send like channel=whatsapp could fail because Telegram/Slack capability probing ran first and threw before the send path.
  • What changed: explicit sends now skip unrelated channel inspection, and message capability probing now degrades safely when a plugin cannot resolve secrets.
  • What did NOT change (scope boundary): actual message dispatch still throws real send-time errors; this PR only hardens capability/schema probing.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Explicit message sends no longer fail just because another configured channel has an unresolved token SecretRef.
  • The message tool schema/capability probing now falls back safely instead of crashing the tool.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (Yes)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:
    • This changes how unresolved secret-backed channel capability probes fail: instead of crashing the whole tool, probing falls back safely and logs once per key. Actual dispatch remains strict, so real send-time errors are still surfaced.

Repro + Verification

Environment

  • OS: macOS 15.2 (arm64)
  • Runtime/container: local OpenClaw dev build
  • Model/provider: N/A
  • Integration/channel (if any): WhatsApp explicit send with unrelated Telegram/Slack SecretRef failures present
  • Relevant config (redacted): Telegram/Slack tokens resolved through file:localcache: refs that were unresolved in the active runtime snapshot

Steps

  1. Configure an unrelated channel account with an unresolved SecretRef token.
  2. Keep another channel (for example WhatsApp) otherwise working.
  3. Invoke the message tool for an explicit send on the healthy channel.

Expected

  • Explicit sends should not inspect unrelated channel configs.
  • Capability probing should not crash the whole message tool.

Actual

  • Tool/schema construction probed unrelated channel plugins first and crashed with unresolved SecretRef errors.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • pnpm test -- src/infra/outbound/channel-selection.test.ts src/channels/plugins/message-actions.test.ts
    • pnpm build
    • restarted the local gateway after rebuild
    • message.send WhatsApp dry-run succeeded after the fix while unresolved SecretRef warnings remained on unrelated channels
  • Edge cases checked:
    • explicit sends skip unrelated channel inspection
    • capability probing falls back safely instead of throwing
    • real dispatch path was left strict
  • What you did not verify:
    • end-to-end live sends on every channel plugin
    • full repo test suite clean pass; pnpm test currently hits an unrelated existing failure in src/media/store.redirect.test.ts on file-mode expectations (384 vs 420)

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert this PR commit
  • Files/config to restore: src/channels/plugins/message-actions.ts, src/infra/outbound/channel-selection.ts
  • Known bad symptoms reviewers should watch for: message tool capability lists missing optional actions for a misconfigured channel instead of surfacing probe-time errors

Risks and Mitigations

  • Risk: safe capability fallback could hide a misconfigured channel during schema generation.
    • Mitigation: probe failures are still logged, and actual dispatch remains strict.

@greptile-apps

greptile-apps Bot commented Mar 15, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR hardens the message tool's capability probing and channel-selection paths against unresolved SecretRef errors thrown by unrelated channel plugins. It is a well-scoped, targeted bug fix: explicit/fallback channel sends no longer inspect unrelated plugins, and all action-probe calls are now wrapped in a deduplication-logged try/catch that falls back gracefully instead of crashing the whole tool. The actual dispatch path (dispatchChannelMessageAction) is intentionally left strict, so real send-time errors still surface.

Key changes and observations:

  • channel-selection.ts: For explicit and fallback resolutions, listConfiguredMessageChannels is no longer called (returns configured: []). All existing callers only consume .channel from the result, so this semantic change has no downstream impact.
  • message-actions.ts: runSafeActionProbe wraps every capability probe in a catch block and logs failures once per pluginId:probe:errorMessage key via a module-level Set. The deduplication is correct for production, but the Set is never cleared — future tests that spy on defaultRuntime.error may see unexpected silence after a first-run log.
  • The catch-all in runSafeActionProbe swallows all probe errors, not just SecretRef errors. A plugin programming bug (e.g. TypeError) after the first log will be silently suppressed on every subsequent probe call. Narrowing the tolerance to known/expected error types would keep the fix scoped to its stated purpose.
  • Test coverage is solid: both new test suites cover the exact failure scenarios described in the PR.

Confidence Score: 4/5

  • Safe to merge; the fix is logically correct and well-tested, with two minor style concerns around error tolerance scope and test-state isolation.
  • The core logic is correct: explicit/fallback channel sends now skip unrelated plugin inspection, and probe failures fall back safely. All callers only use .channel from the resolved selection, so the configured: [] change is safe. The two style concerns (catch-all error swallowing that could mask non-SecretRef bugs; module-level dedup Set that persists across tests) do not affect current correctness but could cause subtle issues as the codebase grows.
  • src/channels/plugins/message-actions.ts — review the catch-all error tolerance and module-level Set lifecycle.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/channels/plugins/message-actions.ts
Line: 101

Comment:
**Module-level error log Set is never cleared**

`loggedActionProbeErrors` is process-lifetime state that persists across the entire test suite run. The `afterEach` in `message-actions.test.ts` resets the plugin registry but not this Set. While the current tests pass (they don't spy on `defaultRuntime.error`), any future test that asserts `defaultRuntime.error` is called after a probe failure will silently fail if a previous test already inserted the same key.

Consider exporting a reset helper (behind a test flag or as a separate module export for test utilities), or clearing the Set as part of the test teardown:

```ts
// Exported only for test teardown
export function _resetActionProbeErrorLogForTest() {
  loggedActionProbeErrors.clear();
}
```

This keeps the deduplication intact for production while preserving test isolation.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/channels/plugins/message-actions.ts
Line: 103-115

Comment:
**Catch-all swallows non-SecretRef programming errors during probing**

`runSafeActionProbe` catches every thrown value, including bugs in plugin code (e.g. `TypeError: cannot read properties of undefined`). After the first occurrence, subsequent identical errors are silently dropped (deduplication key is in the module-level Set). This means a plugin developer could introduce a real programming error in `listActions` / `supportsButtons` / `supportsCards` and never see it again after the first restart.

Since the only intended tolerance is for unresolved `SecretRef` errors, consider narrowing the catch to a known error type or marker, and re-throwing unrecognised errors:

```ts
function isToleratedProbeError(err: unknown): boolean {
  const msg = err instanceof Error ? err.message : String(err);
  return msg.includes("SecretRef") || msg.includes("unresolved");
}

function runSafeActionProbe<T>(params: { ... }): T {
  try {
    return params.fn();
  } catch (err) {
    if (!isToleratedProbeError(err)) {
      throw err; // still surface real bugs
    }
    logActionProbeError(params.pluginId, params.probe, err);
    return params.fallback;
  }
}
```

This keeps the fix scoped to its stated purpose and avoids silently swallowing unexpected plugin errors.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 055251d

Comment thread src/channels/plugins/message-actions.ts
Comment thread src/channels/plugins/message-actions.ts
@vrknetha vrknetha force-pushed the fix/message-tool-secretref-probing branch from 8b35708 to 0666c84 Compare March 16, 2026 12:03
@vrknetha

Copy link
Copy Markdown
Contributor Author

Rebased onto latest main (ccba943). Resolved conflicts in all 3 files:

  • src/channels/plugins/message-actions.ts — merged upstream capability API with PR's SecretRef-safe probing
  • src/channels/plugins/message-actions.test.ts — kept dedup-log test with proper try/finally for spy cleanup
  • src/infra/outbound/channel-selection.ts — merged upstream availableExplicit path with PR's configured-channel optimization

All 20 tests in the changed files pass locally. The second fix commit (8b35708) was incorporated into the merge resolution and git dropped it as empty — changes are fully represented in the single rebased commit.

@joshavant

Copy link
Copy Markdown
Contributor

Thanks for the PR and for narrowing in on capability probing behavior. We landed #48728 as the broader cluster fix, which includes secret-safe probing plus scoped message SecretRef resolution and related doctor/status hardening. Closing this PR as superseded by #48728.

@joshavant joshavant closed this Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

message tool crashes on unrelated channel SecretRef failures during schema probing

2 participants