Skip to content

fix(gateway): mark unconfigured platforms as non-retryable to stop reconnect loop (#31049)#31057

Open
dskwe wants to merge 2 commits into
NousResearch:mainfrom
dskwe:fix/unconfigured-plugin-reconnect
Open

fix(gateway): mark unconfigured platforms as non-retryable to stop reconnect loop (#31049)#31057
dskwe wants to merge 2 commits into
NousResearch:mainfrom
dskwe:fix/unconfigured-plugin-reconnect

Conversation

@dskwe

@dskwe dskwe commented May 23, 2026

Copy link
Copy Markdown
Contributor

Problem

When a platform plugin (e.g. hermes-discord) is listed in config.yaml but has no credentials configured (no bot token), the gateway enters an infinite reconnect loop instead of skipping the platform silently.

Closes #31049

Root Cause

Platform adapters (Discord, Telegram, Slack) return False from connect() when credentials are missing, but don't call _set_fatal_error(retryable=False). The gateway's startup code treats a bare False return (no fatal error set) as a transient failure and queues the platform for background reconnection with growing backoff — even though missing credentials will never self-resolve.

Fix

Call _set_fatal_error(retryable=False) before returning False in credential/dependency checks. The gateway already handles non-retryable fatal errors correctly — it logs the error and skips reconnection.

Affected platforms:

  • Discord: missing discord.py or bot token
  • Telegram: missing python-telegram-bot or bot token
  • Slack: missing slack-bolt, SLACK_BOT_TOKEN, or SLACK_APP_TOKEN

No behavioral change for platforms that have valid credentials or encounter transient network errors.

Changes

  • plugins/platforms/discord/adapter.py_set_fatal_error before return False for missing dependency/token
  • gateway/platforms/telegram.py_set_fatal_error before return False for missing dependency/token
  • gateway/platforms/slack.py_set_fatal_error before return False for missing dependency/tokens

How to Test

# New regression tests
python -m pytest tests/gateway/test_discord_connect.py -k "no_token or no_discord_lib" -v
python -m pytest tests/gateway/test_telegram_connect.py -v
python -m pytest tests/gateway/test_slack_connect.py -v

dskwe added 2 commits May 24, 2026 01:41
…connect loop (NousResearch#31049)

When platform adapters (Discord, Telegram, Slack) have no credentials
configured (missing token/dependency), their connect() returns False
without setting a fatal error. The gateway treats this as a transient
failure and queues the platform for background reconnection, causing
infinite retry loops with growing backoff.

Fix: call _set_fatal_error(retryable=False) before returning False in
credential/dependency checks so the gateway recognizes these as
permanent misconfigurations and skips them silently instead of retrying.

Affected platforms:
- Discord: missing discord.py or bot token
- Telegram: missing python-telegram-bot or bot token
- Slack: missing slack-bolt, SLACK_BOT_TOKEN, or SLACK_APP_TOKEN
…error (NousResearch#31049)

7 new tests across three platforms:
- Discord (2): missing discord.py, missing bot token
- Telegram (2): missing python-telegram-bot, missing bot token
- Slack (3): missing slack-bolt, missing SLACK_BOT_TOKEN, missing SLACK_APP_TOKEN

All verify that connect() sets has_fatal_error=True and
fatal_error_retryable=False so the gateway skips reconnection.
@jsboige

jsboige commented May 23, 2026

Copy link
Copy Markdown

PR Review - #31057

Summary

This PR fixes an infinite reconnect loop when a platform plugin is listed in config.yaml but lacks credentials or its dependency is not installed. The fix inserts _set_fatal_error(retryable=False) calls before each return False in the early-exit paths of connect() for Discord, Telegram, and Slack adapters. The gateway already handles non-retryable fatal errors by logging and skipping reconnection, so this is a minimal, well-targeted fix.

Analysis

Correctness: The pattern _set_fatal_error(code, message, retryable=False) is already used extensively across the codebase (base.py, sms.py, wecom.py, weixin.py, whatsapp.py, irc, teams, line, google_chat, qqbot, feishu, yuanbao). The seven insertion points in this PR follow the exact same signature and placement convention. The fix is correct -- without it, connect() returns False without setting a fatal error, and the gateway's startup code interprets that as a transient failure eligible for background reconnection with exponential backoff.

Edge cases verified:

  • Slack has two separate token checks (bot token from config, app token from env var). Both are covered individually in the production code and in tests.
  • The ordering is correct: _set_fatal_error is called before return False, ensuring the error state is visible to the caller.
  • No risk of double-setting: the early returns are mutually exclusive (dependency check fires before token checks, and token checks return independently).

Issues Found

WARNING (1)

File Line (approx) Issue Recommendation
tests/gateway/test_slack_connect.py 46-52 test_no_bot_token_sets_non_retryable_fatal both sets token="" in config AND deletes SLACK_BOT_TOKEN env var. The config token is raw_token = self.config.token which is already empty, so the env var deletion is redundant but harmless. Consider removing monkeypatch.delenv("SLACK_BOT_TOKEN", raising=False) for clarity -- or add a comment explaining it guards against env leakage in test parallelism. Nit, non-blocking.

INFO (2)

File Suggestion
All three test files Tests use asyncio.get_event_loop().run_until_complete() which is deprecated since Python 3.10. If the project targets 3.10+, consider asyncio.run() or a pytest-asyncio fixture in future PRs. Not blocking.
gateway/platforms/slack.py The error message for the bot token check says "SLACK_BOT_TOKEN not set" but the value comes from self.config.token, not directly from the env var. If config loading maps the env var to config.token this is fine, but the message could be slightly misleading if config can come from other sources. Low priority.

Points Positifs

  • Minimal and surgical. 7 lines of production code added, 0 changed or removed. Zero risk of regression for working configurations.
  • Test coverage is thorough. 7 new tests covering all 7 insertion points, each verifying the three fatal-error properties (has_fatal_error, retryable=False, correct error code).
  • PR body is excellent. Clear problem statement, root cause analysis, affected platforms enumerated, and reproduction steps provided.
  • Consistent pattern. The _set_fatal_error approach is already used in 10+ other adapters, so this follows established convention.

CI Status (partial)

  • ruff + ty lint: PASS
  • e2e: PASS
  • nix (ubuntu-latest): PASS
  • Build amd64/arm64, tests (1-6), nix (macos): PENDING at time of review

Recommendation

  • APPROVE -- Clean, minimal fix with proper test coverage for a real user-facing bug. The WARNING is cosmetic and non-blocking.

@alt-glitch alt-glitch added type/bug Something isn't working comp/gateway Gateway runner, session dispatch, delivery comp/plugins Plugin system and bundled plugins platform/discord Discord bot adapter platform/telegram Telegram bot adapter platform/slack Slack app adapter P2 Medium — degraded but workaround exists labels May 23, 2026

@hal-blinc hal-blinc left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coach review: ready to merge. Verified clean merge state, no failing/pending checks, targeted review/tests passed or were confirmed by CI. Low-risk housekeeping fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery comp/plugins Plugin system and bundled plugins P2 Medium — degraded but workaround exists platform/discord Discord bot adapter platform/slack Slack app adapter platform/telegram Telegram bot adapter type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unconfigured platform plugins should skip silently instead of reconnecting forever

4 participants