
fix(gateway,plugins): make discord disable actually disable, stop reconnect storm (#30736) #30762

Open
xxxigm wants to merge 3 commits into NousResearch:main from xxxigm:fix/30736-discord-disabled-still-connects


xxxigm (Contributor) commented May 23, 2026

What does this PR do?

Fixes #30736: after a user ran hermes plugins disable platforms/discord, the gateway still loaded the discord adapter, logged "ERROR hermes_plugins.discord_platform.adapter: [Discord] No bot token configured" on every start AND every reconnect attempt, and queued discord into the reconnect watcher until the per-platform circuit breaker paused it after 10 failures.

The root cause is a plugin-key mismatch between the CLI and the loader. hermes plugins list and hermes plugins disable use the path-derived key platforms/discord, while PluginManager.discover_and_load scanned plugins/platforms/ with no category prefix and therefore registered the bundled adapter under its manifest name (discord-platform). The disabled check OR'ed the loader key with the manifest name, but both were discord-platform, so neither matched the saved platforms/discord entry and the plugin loaded as if nothing had happened. The bug affects every bundled platform plugin (discord, teams, irc, line, simplex, google_chat).
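To make the mismatch concrete, the sketch below models the two key forms and a disabled check that ORs the loader key with the manifest name. It is an illustration only: `loader_key_before_fix`, `loader_key_after_fix`, and `is_disabled` are hypothetical names, and only the key strings come from this description.

```python
# Hypothetical illustration of the key mismatch; not code from hermes_cli/plugins.py.


def loader_key_before_fix(manifest_name: str, rel_path: str) -> str:
    # plugins/platforms/ was scanned with no category prefix, so the
    # plugin was registered under its manifest name.
    return manifest_name


def loader_key_after_fix(manifest_name: str, rel_path: str) -> str:
    # The recursive scanner derives the key from the path under plugins/.
    return rel_path


def is_disabled(key: str, manifest_name: str, disabled: set) -> bool:
    # The disabled check ORs the loader key with the manifest name.
    return key in disabled or manifest_name in disabled


# What `hermes plugins disable platforms/discord` saves to plugins.disabled.
disabled = {"platforms/discord"}
manifest, path = "discord-platform", "platforms/discord"

before = is_disabled(loader_key_before_fix(manifest, path), manifest, disabled)
after = is_disabled(loader_key_after_fix(manifest, path), manifest, disabled)
print(before, after)  # False True
```

Before the fix both operands of the OR are `discord-platform`, so the check can never see the `platforms/discord` entry the CLI wrote.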

This PR fixes the root cause and hardens the gateway against the same class of misconfiguration so future drift cannot reproduce the original symptom.

Related Issue

Closes #30736

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • hermes_cli/plugins.py — fix bundled platform key derivation + add helper (+39 / −10):

    • Drop "platforms" from skip_names so the recursive scanner picks up plugins/platforms/ as a category and produces path-derived keys (platforms/discord, platforms/teams, …).
    • Remove the now-redundant scan_directory(plugins/platforms) block that would otherwise produce duplicate entries under both keys.
    • Add is_platform_plugin_disabled(name) helper that resolves both the new path-derived key and the legacy manifest-name key, so callers don't re-implement the matching logic.
  • gateway/run.py — skip disabled platform plugins cleanly at startup (+32):

    • Before _create_adapter, call is_platform_plugin_disabled(platform.value). If it returns true, log one INFO line that names the platform, the command to re-enable it (hermes plugins enable platforms/<name>), and the setting that silences the notice (platforms.<name>.enabled=false); mark the runtime status as disabled; and skip the adapter entirely, so there is no _failed_platforms entry, no reconnect-watcher activity, and no ERROR log spam.
    • The check is wrapped in try/except so a corrupt config can't crash startup; in that case the gateway falls through to the legacy adapter path.
  • tests/gateway/test_disabled_platform_plugin_30736.py — 17 new test cases across 3 classes (+472 lines):

    • TestBundledPlatformKeyAlignment — pins every bundled platform adapter under platforms/<name> so any future refactor that re-introduces the manifest-name key fails immediately.
    • TestDisableHonoursBothKeyForms — covers the new path-derived key AND the legacy manifest-name key (back-compat for hand-edited configs), plus is_platform_plugin_disabled() argument hardening ("", None).
    • TestGatewaySkipsDisabledPlatformPlugin — boots GatewayRunner with a sentinel _create_adapter that raises if invoked for the disabled platform; asserts the platform is skipped before adapter creation, a single actionable INFO log fires, no ERROR/WARNING leaks, _failed_platforms stays clean, the platform-config enabled=False short-circuit still runs first, and exceptions inside the lookup are caught.

How to Test

  1. Check out this branch and ensure .venv is set up: python3 -m venv .venv && source .venv/bin/activate && pip install -e ".[all,dev]"
  2. Run the new regression tests on their own:
    scripts/run_tests.sh tests/gateway/test_disabled_platform_plugin_30736.py -v
    
    Expected: 17 passed.
  3. Run the broader plugin + startup + discord adapter suite to confirm no cross-file regressions:
    scripts/run_tests.sh tests/hermes_cli/test_plugins.py tests/hermes_cli/test_plugins_cmd.py tests/gateway/test_runner_startup_failures.py tests/gateway/test_discord_connect.py tests/test_plugin_skills.py
    
    Expected: 196 passed.
  4. End-to-end manual verification of the original symptom:
    # Before the fix:
    hermes plugins disable platforms/discord
    hermes plugins list | grep discord     # prints `disabled`
    hermes gateway restart
    tail -f ~/.hermes/logs/gateway.log     # ERROR per startup + reconnect storm
    
    # After the fix:
    hermes plugins disable platforms/discord
    hermes plugins list | grep discord     # prints `disabled`
    hermes gateway restart
    tail -f ~/.hermes/logs/gateway.log
    # → single INFO: "Skipping platform 'discord': its plugin is disabled
    #    (run 'hermes plugins enable platforms/discord' to re-enable, or set
    #     platforms.discord.enabled=false in config.yaml to silence this notice)."
    # → no ERROR, no reconnect attempts, no circuit breaker.
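If you'd rather not eyeball `tail -f`, a small script can count the two messages that matter for step 4. This is a hypothetical helper, not part of the PR: it matches only the substrings quoted above and assumes nothing else about the log line format. The sample lines in it are made up for the demo.

```python
import os
import tempfile

NO_TOKEN = "[Discord] No bot token configured"
SKIP_NOTICE = "Skipping platform 'discord': its plugin is disabled"


def summarize_gateway_log(path):
    """Count no-token ERROR lines and skip notices in a gateway log."""
    counts = {"no_token_errors": 0, "skip_notices": 0}
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if "ERROR" in line and NO_TOKEN in line:
                counts["no_token_errors"] += 1
            if SKIP_NOTICE in line:
                counts["skip_notices"] += 1
    return counts


# Real usage (path from step 4):
#   summarize_gateway_log(os.path.expanduser("~/.hermes/logs/gateway.log"))
# Made-up sample lines; the real log format may differ.
sample = (
    "INFO gateway: Skipping platform 'discord': its plugin is disabled\n"
    "INFO gateway: other platforms started\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as fh:
    fh.write(sample)
print(summarize_gateway_log(fh.name))  # {'no_token_errors': 0, 'skip_notices': 1}
os.unlink(fh.name)
```

After the fix, each restart should add zero no-token errors and one skip notice.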

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(plugins): …, fix(gateway): …, test(gateway): …)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix (no unrelated commits)
  • I've run scripts/run_tests.sh tests/gateway/test_disabled_platform_plugin_30736.py and all tests pass
  • I've added tests for my changes (17 new test cases across 3 classes)
  • I've tested on my platform: macOS 15.2 (Darwin 24.6.0), Python 3.12

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — N/A (no user-facing CLI/config change)
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A (no new config keys)
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — tests are hermetic (monkeypatch + mocks, no real filesystem I/O outside tmp_path)
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

$ scripts/run_tests.sh tests/gateway/test_disabled_platform_plugin_30736.py -v
17 passed in 4.71s

$ scripts/run_tests.sh tests/hermes_cli/test_plugins.py tests/hermes_cli/test_plugins_cmd.py \
    tests/gateway/test_runner_startup_failures.py tests/gateway/test_discord_connect.py \
    tests/test_plugin_skills.py
196 passed in 5.18s

$ scripts/run_tests.sh tests/gateway/test_disabled_platform_plugin_30736.py \
    tests/gateway/test_runner_startup_failures.py tests/gateway/test_discord_connect.py \
    tests/gateway/test_discord_free_response.py tests/gateway/test_discord_channel_controls.py
90 passed in 6.63s

xxxigm added 3 commits May 23, 2026 12:05
…usResearch#30736)

`hermes plugins list` and `hermes plugins disable` always referred to
bundled platform plugins by their path-derived key (`platforms/discord`,
`platforms/teams`, …). But the loader scanned `plugins/platforms/` with
no category prefix, which keyed those same plugins by their manifest
name (`discord-platform`, `teams-platform`, …). The mismatch meant
`hermes plugins disable platforms/discord` saved `platforms/discord` to
`plugins.disabled`, the loader looked up `discord-platform`, no match
fired, and the plugin loaded as if nothing happened.

Drop `platforms` from `skip_names` and remove the redundant
`scan_directory(plugins/platforms)` so the recursive scanner picks the
directory up as a category. Bundled platform plugins now register under
`platforms/<name>`, matching what the CLI reports and what the disable
command writes.

Also expose `is_platform_plugin_disabled(name)` so the gateway runtime
can check either key form (path-derived or legacy manifest name)
without re-implementing the matching logic.
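The helper's matching rule might look roughly like this. The PR's `is_platform_plugin_disabled(name)` takes only the platform name; this hypothetical `is_disabled_either_form` takes the disabled list as a second argument so it runs standalone, and the `<name>-platform` legacy naming is inferred from the `discord-platform` / `teams-platform` examples rather than confirmed for every bundled plugin.

```python
def is_disabled_either_form(name, disabled):
    """Sketch: True if platform `name` (e.g. "discord") is disabled under
    either key form.

    Assumptions, not taken from the PR: `disabled` is the plugins.disabled
    list, and the legacy manifest name is always "<name>-platform".
    """
    # Argument hardening: "" and None (or any non-string) are never disabled.
    if not isinstance(name, str) or not name:
        return False
    path_key = f"platforms/{name}"   # what the CLI writes today
    legacy_key = f"{name}-platform"  # what hand-edited configs may contain
    return path_key in disabled or legacy_key in disabled


disabled = ["platforms/discord", "teams-platform"]
print(is_disabled_either_form("discord", disabled))  # True
print(is_disabled_either_form("teams", disabled))    # True
print(is_disabled_either_form("irc", disabled))      # False
print(is_disabled_either_form(None, disabled))       # False
```

Keeping both lookups in one function means the gateway and the loader cannot drift apart on which spellings count as "disabled".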
…Research#30736)

When a platform's `enabled` flag in config.yaml is still true but the
user has explicitly disabled the matching plugin (e.g.
`hermes plugins disable platforms/discord`), the startup loop now
short-circuits before `_create_adapter`. Without this guard the
adapter would be instantiated, `connect()` would fail with
"No bot token configured", the platform would be queued into the
reconnect watcher, and the user would see an ERROR line on every start
and every reconnect attempt, exactly the behaviour reported in NousResearch#30736.
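A toy model shows why the circuit breaker alone does not fix this. `ReconnectBreaker` is hypothetical and is not the gateway's reconnect watcher; only the threshold of 10 comes from the PR description. When the failure is a missing token, every attempt fails, so the breaker merely caps the noise at ten ERROR lines per run of the model, whereas skipping the adapter before it is created produces none.

```python
from collections import defaultdict


class ReconnectBreaker:
    """Toy per-platform circuit breaker: pause a platform after
    `threshold` consecutive failed connection attempts."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.failures = defaultdict(int)
        self.paused = set()

    def should_retry(self, platform):
        return platform not in self.paused

    def record_failure(self, platform):
        self.failures[platform] += 1
        if self.failures[platform] >= self.threshold:
            self.paused.add(platform)

    def record_success(self, platform):
        self.failures[platform] = 0
        self.paused.discard(platform)


def connect_without_token():
    # Mirrors the reported failure: with no token, every attempt fails.
    raise RuntimeError("[Discord] No bot token configured")


breaker = ReconnectBreaker()
error_lines = 0
while breaker.should_retry("discord"):
    try:
        connect_without_token()
    except RuntimeError:
        error_lines += 1  # the real gateway would log one ERROR line here
        breaker.record_failure("discord")
    else:
        breaker.record_success("discord")
        break
print(error_lines)  # 10
```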

The skip path:

* logs one INFO line that names the platform and the command needed to
  re-enable it (`hermes plugins enable platforms/<name>`) or to silence
  the notice (`platforms.<name>.enabled=false`),
* records the runtime status as `disabled` so `gateway status` /
  dashboards reflect the actual situation,
* never enters `_failed_platforms`, so the reconnect watcher never
  fires for it.

Wrapped in `try/except` so a corrupt config can't crash startup — the
gateway falls back to the existing adapter path in that case.
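The guard and its fallback can be sketched as a standalone loop. Everything here is hypothetical: `start_platforms`, the `"running"` status string, and the injected callables stand in for GatewayRunner internals the PR does not show. Only the order of checks, the INFO text (copied from the How to Test section), and the try/except fallback follow the description above.

```python
import logging

logger = logging.getLogger("gateway.sketch")

SKIP_MSG = (
    "Skipping platform '%s': its plugin is disabled "
    "(run 'hermes plugins enable platforms/%s' to re-enable, or set "
    "platforms.%s.enabled=false in config.yaml to silence this notice)."
)


def start_platforms(names, platform_enabled, plugin_disabled, create_adapter):
    """Hypothetical startup loop. Returns (adapters, status, failed), where
    `failed` stands in for the set handed to the reconnect watcher and
    `create_adapter` stands in for both adapter creation and connect()."""
    adapters, status, failed = {}, {}, set()
    for name in names:
        # The existing platforms.<name>.enabled=false short-circuit runs first.
        if not platform_enabled.get(name, True):
            continue
        try:
            skip = plugin_disabled(name)
        except Exception:
            # Corrupt config: don't crash startup, fall back to the adapter path.
            logger.debug("plugin lookup failed for %s", name, exc_info=True)
            skip = False
        if skip:
            logger.info(SKIP_MSG, name, name, name)
            status[name] = "disabled"
            continue  # no adapter, no `failed` entry, no reconnect attempts
        try:
            adapters[name] = create_adapter(name)
            status[name] = "running"
        except Exception as exc:
            logger.error("[%s] %s", name, exc)
            failed.add(name)
    return adapters, status, failed


def no_token_adapter(name):
    raise RuntimeError("No bot token configured")


adapters, status, failed = start_platforms(
    ["discord", "teams"],
    platform_enabled={"teams": False},
    plugin_disabled=lambda n: n == "discord",
    create_adapter=no_token_adapter,
)
print(status, failed)  # {'discord': 'disabled'} set()
```

Because the skip happens before `create_adapter`, a disabled platform never enters `failed`, so nothing downstream ever tries to reconnect it.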
…usResearch#30736)

17 tests across three classes:

* `TestBundledPlatformKeyAlignment` — pins every bundled platform
  adapter under `platforms/<name>` so any future refactor that
  reintroduces the manifest-name key fails immediately.
* `TestDisableHonoursBothKeyForms` — covers the new path-derived key
  AND the legacy manifest-name key (back-compat for hand-edited
  configs), plus `is_platform_plugin_disabled()` argument hardening.
* `TestGatewaySkipsDisabledPlatformPlugin` — boots `GatewayRunner` with
  a sentinel `_create_adapter` that raises if invoked for the disabled
  platform. Asserts the platform is skipped before adapter creation, a
  single actionable INFO log fires, no ERROR/WARNING leaks, the
  `_failed_platforms` reconnect queue stays clean, and that lookup
  failures fall through to the legacy adapter path.
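The sentinel technique behind `TestGatewaySkipsDisabledPlatformPlugin` can be shown in miniature with the standard library's `unittest` and `unittest.mock`. `start_one` is a hypothetical stand-in for one pass of the gateway loop, not `GatewayRunner`; the point is the pattern: a `Mock` whose `side_effect` raises, checked with `assert_not_called()`, plus `assertLogs` to pin the log levels. Since `start_one` swallows adapter errors, relying on the raise alone would not fail the test, which is why the sketch also asserts on the mock and on `failed`.

```python
import logging
import unittest
from unittest import mock

log = logging.getLogger("gateway.sketch.test")


def start_one(name, plugin_disabled, create_adapter, failed):
    """Hypothetical stand-in for one pass of the startup loop."""
    if plugin_disabled(name):
        log.info("Skipping platform '%s': its plugin is disabled", name)
        return None
    try:
        return create_adapter(name)
    except Exception as exc:
        log.error("[%s] %s", name, exc)
        failed.add(name)  # would be queued for the reconnect watcher
        return None


class TestSkipsDisabledPlatform(unittest.TestCase):
    def test_disabled_platform_never_reaches_adapter(self):
        # Sentinel adapter factory: calling it for a disabled platform is a bug.
        sentinel = mock.Mock(side_effect=AssertionError("adapter created"))
        failed = set()
        with self.assertLogs("gateway.sketch.test", level="INFO") as cm:
            start_one("discord", lambda n: True, sentinel, failed)
        sentinel.assert_not_called()
        self.assertEqual(failed, set())
        # Exactly one record at INFO or above, and it is INFO (no ERROR/WARNING).
        self.assertEqual([r.levelname for r in cm.records], ["INFO"])
```

Saved to a file, this runs with `python -m unittest`.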

grepsuzette commented May 24, 2026


Sick of discord. But this PR is a bit too big...
Ideologically and practically, rm -fr plugins/platforms/discord works just as well.

@melroy89


> Sick of discord. But this PR is a bit too big... Ideologically and practically, works as well rm -fr plugins/platforms/discord

I'm afraid a simple hermes update will restore the discord platform bot stuff.


aguysomewhere commented May 24, 2026


> Sick of discord. But this PR is a bit too big... Ideologically and practically, works as well rm -fr plugins/platforms/discord

So I tried this in my container and went out of my way to remove perms for the user:
[Screenshot 2026-05-24 at 13 46 45]

The files were restored when I brought the container up. I guess the init processes run as root. I would appreciate the merge, but could also go for something leaner. I have zero knowledge of this code base but would love for Discord to go tf away.

Thanks all...

Edit: for now I volume-mounted an empty directory as read-only, and that works, in case this helps anyone.
[Screenshot 2026-05-24 at 13 56 39]


Labels

  • comp/cli: CLI entry point, hermes_cli/, setup wizard
  • comp/gateway: Gateway runner, session dispatch, delivery
  • comp/plugins: Plugin system and bundled plugins
  • P2: Medium, degraded but workaround exists
  • platform/discord: Discord bot adapter
  • type/bug: Something isn't working


Development

Successfully merging this pull request may close these issues.

[Bug]: discord disabled, still trying to connect

5 participants