fix(gateway): trust adapter-owned access policy over env default-deny (#34515) by teknium1 · Pull Request #34572 · NousResearch/hermes-agent

teknium1 · 2026-05-29T10:54:22Z

Summary

Config-driven platform access policies (dm_policy / group_policy / allow_from / group_allow_from) for WeCom, Weixin, Yuanbao, and QQBot now work without also setting a PLATFORM_ALLOWED_USERS env var.

Root cause: these adapters enforce their access policy at intake — a message is dropped inside the adapter and never dispatched unless it already passed the policy. The gateway's env-based check (_is_user_authorized) ran afterward and, with no env allowlist set, fell through to an env-only default-deny — silently rejecting dm_policy: open and config-only allowlists the adapter had already authorized.

Salvage of #34515 (@Frowtek) — same bug, rebuilt fix. Instead of re-implementing each adapter's policy a second time in run.py (a duplicate decision surface that drifts), adapters that own their gate declare it via one capability flag and the gateway trusts it.

Changes

gateway/platforms/base.py: new enforces_own_access_policy property (default False) — the single contract.
gateway/platforms/{wecom,weixin,yuanbao,qqbot}: one-line override → True.
gateway/run.py: _adapter_enforces_own_access_policy() helper + trust branch in _is_user_authorized — when no env allowlist is set and the adapter owns its policy, honor the adapter's decision instead of default-denying. Env allowlists still win when set.
gateway/run.py: _get_unauthorized_dm_behavior reads config dm_policy so allowlist/disabled drop unauthorized DMs silently (no pairing-code leak) while pairing opts back in.
tests/gateway/test_config_driven_access_policy.py: 20 tests across the base contract, adapter overrides, gateway trust, env-precedence, and DM-behavior resolution.
scripts/release.py: AUTHOR_MAP entry for contributor credit.

Why not the original shape

The original PR copied each adapter's policy evaluation (prefix stripping, case-folding, per-group sender allowlists, the four policy verbs) into run.py — 437 lines, a second source of truth. This version is ~85 lines of logic and trusts the gate the adapter already ran. Adding a future own-policy adapter is one property override, not another copy.

Validation

Suite	Result
New `test_config_driven_access_policy.py`	20/20
`unauthorized_dm` + wecom/weixin/qqbot/platform_registry	337/337
`test_yuanbao_pipeline.py`	94/94
E2E (real adapter classes + gateway wiring)	own-policy→authorized, non-owning→denied, env-allowlist-excludes→denied, dm_policy resolution all correct

Closes #34515.

Infographic

…#34515) Config-driven platform policies (dm_policy / group_policy / allow_from / group_allow_from) for WeCom, Weixin, Yuanbao, and QQBot now work without also setting a PLATFORM_ALLOWED_USERS env var. These adapters enforce their access policy at intake — a message is dropped inside the adapter and never dispatched unless it already passed the policy. The gateway's env-based check (_is_user_authorized) ran afterward and, with no env allowlist set, fell through to an env-only default-deny — silently rejecting `dm_policy: open` and config-only allowlists the adapter had already authorized. Rather than re-implement each adapter's policy a second time in run.py (which would drift), adapters that own their gate now declare it via a new BasePlatformAdapter.enforces_own_access_policy property (default False). The gateway trusts that flag and skips the env-only default-deny for those platforms. Env allowlists still take precedence when set. Also resolves unauthorized DM behavior from config dm_policy so allowlist / disabled policies drop unauthorized DMs silently instead of leaking pairing codes, while an explicit pairing policy opts back in. Co-authored-by: Frowtek <frowte3k@gmail.com>

github-actions · 2026-05-29T10:55:17Z

🔎 Lint report: `hermes/hermes-0675370b` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9449 on HEAD, 9447 on base (🆕 +2)

🆕 New issues (2):

Rule	Count
`unresolved-import`	1
`invalid-argument-type`	1

First entries

tests/gateway/test_config_driven_access_policy.py:25: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/gateway/test_config_driven_access_policy.py:96: [invalid-argument-type] invalid-argument-type: Argument to function `BasePlatformAdapter.enforces_own_access_policy` is incorrect: Expected `BasePlatformAdapter`, found `object`

✅ Fixed issues: none

Unchanged: 4904 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

… tests _adapter_enforces_own_access_policy accessed self.adapters directly, but several auth tests build a bare GatewayRunner via object.__new__ without setting .adapters (pitfalls.md #17). Read it defensively with getattr so a missing/empty adapter map means "no adapter owns the policy" instead of raising AttributeError. Fixes 4 tests: test_feishu_bot_auth_bypass, test_discord_bot_auth_bypass (x2), test_signal::test_signal_in_allowlist_maps.

Two CI flakes surfaced on PR #34572 (both in files this PR doesn't touch; pre-existing host-dependent flakes): 1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL), falling back to proc.kill() only on ProcessLookupError/PermissionError/ OSError. When the fake PID happens to exist on a busy host, os.getpgid succeeds, os.killpg fires against an UNRELATED real process group, and proc.kill() is never reached -> flaky AssertionError (and a real risk of SIGKILLing an innocent process group from a unit test). Patch os.getpgid to raise ProcessLookupError so the fallback path runs deterministically and no real killpg is ever issued. 2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls the blocking conn.receive_bytes() with no exception guard. Once the child prints its winsize and exits, the PTY closes; on a missed-marker run the next recv blocks until the 30s pytest-timeout instead of failing fast. Add a try/except break (matching the working sibling tests) and bump the child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first. Verified: 4/4 pass across 3 consecutive runs; root cause for #1 reproduced (os.getpgid(1) succeeds -> old code skips proc.kill).

Two CI flakes surfaced on PR NousResearch#34572 (both in files this PR doesn't touch; pre-existing host-dependent flakes): 1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL), falling back to proc.kill() only on ProcessLookupError/PermissionError/ OSError. When the fake PID happens to exist on a busy host, os.getpgid succeeds, os.killpg fires against an UNRELATED real process group, and proc.kill() is never reached -> flaky AssertionError (and a real risk of SIGKILLing an innocent process group from a unit test). Patch os.getpgid to raise ProcessLookupError so the fallback path runs deterministically and no real killpg is ever issued. 2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls the blocking conn.receive_bytes() with no exception guard. Once the child prints its winsize and exits, the PTY closes; on a missed-marker run the next recv blocks until the 30s pytest-timeout instead of failing fast. Add a try/except break (matching the working sibling tests) and bump the child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first. Verified: 4/4 pass across 3 consecutive runs; root cause for NousResearch#1 reproduced (os.getpgid(1) succeeds -> old code skips proc.kill).

Two CI flakes surfaced on PR NousResearch#34572 (both in files this PR doesn't touch; pre-existing host-dependent flakes): 1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL), falling back to proc.kill() only on ProcessLookupError/PermissionError/ OSError. When the fake PID happens to exist on a busy host, os.getpgid succeeds, os.killpg fires against an UNRELATED real process group, and proc.kill() is never reached -> flaky AssertionError (and a real risk of SIGKILLing an innocent process group from a unit test). Patch os.getpgid to raise ProcessLookupError so the fallback path runs deterministically and no real killpg is ever issued. 2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls the blocking conn.receive_bytes() with no exception guard. Once the child prints its winsize and exits, the PTY closes; on a missed-marker run the next recv blocks until the 30s pytest-timeout instead of failing fast. Add a try/except break (matching the working sibling tests) and bump the child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first. Verified: 4/4 pass across 3 consecutive runs; root cause for NousResearch#1 reproduced (os.getpgid(1) succeeds -> old code skips proc.kill). (cherry picked from commit 1c53d39)

teknium1 added 2 commits May 29, 2026 03:59

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery platform/wecom WeCom / WeChat Work adapter platform/qqbot QQ Bot adapter labels May 29, 2026

teknium1 merged commit 1c53d39 into main May 29, 2026
25 checks passed

teknium1 deleted the hermes/hermes-0675370b branch May 29, 2026 11:22

teknium1 mentioned this pull request May 29, 2026

fix(gateway): honor config-driven access policies over legacy env fallbacks #34515

Closed

17 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(gateway): trust adapter-owned access policy over env default-deny (#34515)#34572

fix(gateway): trust adapter-owned access policy over env default-deny (#34515)#34572
teknium1 merged 3 commits into
mainfrom
hermes/hermes-0675370b

teknium1 commented May 29, 2026

Uh oh!

github-actions Bot commented May 29, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

teknium1 commented May 29, 2026

Summary

Changes

Why not the original shape

Validation

Infographic

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔎 Lint report: hermes/hermes-0675370b vs origin/main

ruff

ty (type checker)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented May 29, 2026 •

edited

Loading

🔎 Lint report: `hermes/hermes-0675370b` vs `origin/main`