fix(gateway): trust adapter-owned access policy over env default-deny (#34515)#34572
Merged
Conversation
…#34515) Config-driven platform policies (dm_policy / group_policy / allow_from / group_allow_from) for WeCom, Weixin, Yuanbao, and QQBot now work without also setting a PLATFORM_ALLOWED_USERS env var. These adapters enforce their access policy at intake — a message is dropped inside the adapter and never dispatched unless it already passed the policy. The gateway's env-based check (_is_user_authorized) ran afterward and, with no env allowlist set, fell through to an env-only default-deny — silently rejecting `dm_policy: open` and config-only allowlists the adapter had already authorized. Rather than re-implement each adapter's policy a second time in run.py (which would drift), adapters that own their gate now declare it via a new BasePlatformAdapter.enforces_own_access_policy property (default False). The gateway trusts that flag and skips the env-only default-deny for those platforms. Env allowlists still take precedence when set. Also resolves unauthorized DM behavior from config dm_policy so allowlist / disabled policies drop unauthorized DMs silently instead of leaking pairing codes, while an explicit pairing policy opts back in. Co-authored-by: Frowtek <frowte3k@gmail.com>
Contributor
🔎 Lint report:
|
| Rule | Count |
|---|---|
unresolved-import |
1 |
invalid-argument-type |
1 |
First entries
tests/gateway/test_config_driven_access_policy.py:25: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/gateway/test_config_driven_access_policy.py:96: [invalid-argument-type] invalid-argument-type: Argument to function `BasePlatformAdapter.enforces_own_access_policy` is incorrect: Expected `BasePlatformAdapter`, found `object`
✅ Fixed issues: none
Unchanged: 4904 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
… tests _adapter_enforces_own_access_policy accessed self.adapters directly, but several auth tests build a bare GatewayRunner via object.__new__ without setting .adapters (pitfalls.md #17). Read it defensively with getattr so a missing/empty adapter map means "no adapter owns the policy" instead of raising AttributeError. Fixes 4 tests: test_feishu_bot_auth_bypass, test_discord_bot_auth_bypass (x2), test_signal::test_signal_in_allowlist_maps.
Two CI flakes surfaced on PR #34572 (both in files this PR doesn't touch; pre-existing host-dependent flakes): 1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL), falling back to proc.kill() only on ProcessLookupError/PermissionError/ OSError. When the fake PID happens to exist on a busy host, os.getpgid succeeds, os.killpg fires against an UNRELATED real process group, and proc.kill() is never reached -> flaky AssertionError (and a real risk of SIGKILLing an innocent process group from a unit test). Patch os.getpgid to raise ProcessLookupError so the fallback path runs deterministically and no real killpg is ever issued. 2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls the blocking conn.receive_bytes() with no exception guard. Once the child prints its winsize and exits, the PTY closes; on a missed-marker run the next recv blocks until the 30s pytest-timeout instead of failing fast. Add a try/except break (matching the working sibling tests) and bump the child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first. Verified: 4/4 pass across 3 consecutive runs; root cause for #1 reproduced (os.getpgid(1) succeeds -> old code skips proc.kill).
17 tasks
KKT-OPT
pushed a commit
to KKT-OPT/hermes-agent
that referenced
this pull request
May 31, 2026
Two CI flakes surfaced on PR NousResearch#34572 (both in files this PR doesn't touch; pre-existing host-dependent flakes): 1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL), falling back to proc.kill() only on ProcessLookupError/PermissionError/ OSError. When the fake PID happens to exist on a busy host, os.getpgid succeeds, os.killpg fires against an UNRELATED real process group, and proc.kill() is never reached -> flaky AssertionError (and a real risk of SIGKILLing an innocent process group from a unit test). Patch os.getpgid to raise ProcessLookupError so the fallback path runs deterministically and no real killpg is ever issued. 2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls the blocking conn.receive_bytes() with no exception guard. Once the child prints its winsize and exits, the PTY closes; on a missed-marker run the next recv blocks until the 30s pytest-timeout instead of failing fast. Add a try/except break (matching the working sibling tests) and bump the child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first. Verified: 4/4 pass across 3 consecutive runs; root cause for NousResearch#1 reproduced (os.getpgid(1) succeeds -> old code skips proc.kill).
hechuyi
pushed a commit
to hechuyi/hermes-agent
that referenced
this pull request
Jun 6, 2026
Two CI flakes surfaced on PR NousResearch#34572 (both in files this PR doesn't touch; pre-existing host-dependent flakes): 1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL), falling back to proc.kill() only on ProcessLookupError/PermissionError/ OSError. When the fake PID happens to exist on a busy host, os.getpgid succeeds, os.killpg fires against an UNRELATED real process group, and proc.kill() is never reached -> flaky AssertionError (and a real risk of SIGKILLing an innocent process group from a unit test). Patch os.getpgid to raise ProcessLookupError so the fallback path runs deterministically and no real killpg is ever issued. 2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls the blocking conn.receive_bytes() with no exception guard. Once the child prints its winsize and exits, the PTY closes; on a missed-marker run the next recv blocks until the 30s pytest-timeout instead of failing fast. Add a try/except break (matching the working sibling tests) and bump the child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first. Verified: 4/4 pass across 3 consecutive runs; root cause for NousResearch#1 reproduced (os.getpgid(1) succeeds -> old code skips proc.kill). (cherry picked from commit 1c53d39)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Config-driven platform access policies (
dm_policy/group_policy/allow_from/group_allow_from) for WeCom, Weixin, Yuanbao, and QQBot now work without also setting aPLATFORM_ALLOWED_USERSenv var.Root cause: these adapters enforce their access policy at intake — a message is dropped inside the adapter and never dispatched unless it already passed the policy. The gateway's env-based check (
_is_user_authorized) ran afterward and, with no env allowlist set, fell through to an env-only default-deny — silently rejectingdm_policy: openand config-only allowlists the adapter had already authorized.Salvage of #34515 (@Frowtek) — same bug, rebuilt fix. Instead of re-implementing each adapter's policy a second time in
run.py(a duplicate decision surface that drifts), adapters that own their gate declare it via one capability flag and the gateway trusts it.Changes
gateway/platforms/base.py: newenforces_own_access_policyproperty (defaultFalse) — the single contract.gateway/platforms/{wecom,weixin,yuanbao,qqbot}: one-line override →True.gateway/run.py:_adapter_enforces_own_access_policy()helper + trust branch in_is_user_authorized— when no env allowlist is set and the adapter owns its policy, honor the adapter's decision instead of default-denying. Env allowlists still win when set.gateway/run.py:_get_unauthorized_dm_behaviorreads configdm_policysoallowlist/disableddrop unauthorized DMs silently (no pairing-code leak) whilepairingopts back in.tests/gateway/test_config_driven_access_policy.py: 20 tests across the base contract, adapter overrides, gateway trust, env-precedence, and DM-behavior resolution.scripts/release.py: AUTHOR_MAP entry for contributor credit.Why not the original shape
The original PR copied each adapter's policy evaluation (prefix stripping, case-folding, per-group sender allowlists, the four policy verbs) into
run.py— 437 lines, a second source of truth. This version is ~85 lines of logic and trusts the gate the adapter already ran. Adding a future own-policy adapter is one property override, not another copy.Validation
test_config_driven_access_policy.pyunauthorized_dm+ wecom/weixin/qqbot/platform_registrytest_yuanbao_pipeline.pyCloses #34515.
Infographic