[security] fix(gateway): require Graph webhook clientState#22353
[security] fix(gateway): require Graph webhook clientState#22353Hinotoi-agent wants to merge 3 commits into
Conversation
|
CI note: the The actual lint enforcement job passed, and the diff-report step in that same job reported no Ruff findings. The type-check diagnostics are advisory and the workflow log labels them as warnings / non-blocking. Local validation for this patch: python -m pytest tests/gateway/test_msgraph_webhook.py -q
python -m ruff check gateway/platforms/msgraph_webhook.py tests/gateway/test_msgraph_webhook.py
python -m py_compile gateway/platforms/msgraph_webhook.pyResult: |
e695014 to
444d4b1
Compare
|
Nice catch on the fail-open
Default-localhost (with explicit Happy to either:
|
|
Thanks, this makes sense. I agree the fail-open I think the secure-default shape should be:
I’m treating this as actionable for this PR rather than a separate follow-up, since it’s the same listener trust-boundary story. |
b6229a0 to
8071079
Compare
9b54c2c to
dfc519e
Compare
dfc519e to
8d3fbeb
Compare
…event silent built-in tool replacement The tool_override flag landed in v0.14.0 (NousResearch#26759) so plugins can replace a built-in tool with their own implementation. It works as advertised but there is no trust gate, so any enabled third-party plugin can silently override any built-in like shell_exec, write_file, or web_fetch and exfiltrate everything the agent invokes through it. The only trace is a DEBUG-level log line. Compare with ctx.llm (NousResearch#23194) which does gate the equivalent privilege escalation: overriding the provider requires plugins.entries.<id>.llm.allow_provider_override: true in config.yaml. The policy shape exists, it just was not extended to tool overrides. Fix: * Add PluginToolOverrideError(PermissionError) for the gate failure. * register_tool() now checks _tool_override_allowed(name) when override=True. Bundled plugins (manifest.source == 'bundled') are trusted by default. Every other source requires plugins.entries.<plugin_id>.allow_tool_override: true in config.yaml. * fail-closed: if config.yaml cannot be loaded for any reason, _tool_override_allowed returns False. Same posture as MSGraphWebhookAdapter.connect() in NousResearch#22353. Backwards compatibility: * Bundled plugins: no change (source == 'bundled' short-circuits the gate). * Third-party plugins not using override: no change (gate is only consulted when override=True). * Third-party plugins using override: registration fails until the operator opts in. The error message includes the exact config path to add, so the fix is one config edit away for legitimate use cases. Same migration path users went through for allow_provider_override after NousResearch#23194 landed. Regression tests: * tests/hermes_cli/test_plugins.py::test_register_tool_override_replaces_existing and ::test_register_tool_override_on_new_name_is_noop_path were written before the gate existed. Updated their test configs to include allow_tool_override: true under plugins.entries.<plugin_id>, mirroring how a legitimate operator would now grant the privilege. * New regression test ::test_register_tool_override_blocked_without_operator_opt_in exercises both the PluginManager-catches-error path (built-in tool is preserved, attacker plugin is skipped) and the direct-call path (PluginToolOverrideError is raised with a message that names the config key to set). Verified the test fails without this fix and passes with it. * All 73 tests in test_plugins.py continue to pass.
Summary
This PR hardens the Microsoft Graph webhook listener so externally reachable notification handling fails closed unless the receiver has a configured Graph
clientStateshared secret.client_statebefore themsgraph_webhooklistener starts accepting notification POSTs.clientState.Security issues covered
clientStaterequirement for Graph webhook notificationsmsgraph_webhookenabled but no shared secret could accept arbitrary well-formed notification POSTs that matched the configured resource allowlist and schedule them as internal Hermes events.0.0.0.0, making an incomplete webhook configuration easier to expose directly instead of behind an intentional reverse proxy/tunnel.Before this PR
_verify_client_state()returnedTruewhen no expectedclient_statewas configured.MSGraphWebhookAdapter.connect()could start the listener without a notification shared secret.0.0.0.0.client_state, but the code still allowed that insecure configuration.clientStatecomparisons but did not lock in fail-closed behavior for a missing expected secret.After this PR
connect()validates the security configuration before registering and starting the HTTP listener.client_statenow raises at startup instead of leaving a public POST endpoint in an unauthenticated mode._verify_client_state()returnsFalseif no expected secret exists, preserving fail-closed behavior if the adapter is constructed directly in tests or future code paths.127.0.0.1, keeping the listener local unless an operator explicitly chooses a wider bind address.Explicit operator choice
Secure default behavior:
Intentional public-bind behavior:
The listener can still be exposed through a reverse proxy, tunnel, or explicit public bind when the operator chooses that deployment model. The important change is that the webhook receiver no longer starts without the Graph
clientStatesecret that authenticates notification POSTs.Why this matters
Microsoft Graph webhooks are public HTTPS callbacks.
clientStateis Microsoft's documented shared-secret mechanism for letting a receiver distinguish legitimate Graph notifications from arbitrary internet POSTs. Accepting notifications without that secret turns the endpoint into an unauthenticated inbound event surface.The impact is integrity-focused: a forged notification can be accepted and scheduled as if it came from Graph. This PR does not claim code execution or data disclosure.
Attack flow
Affected code
clientStaterequirementgateway/platforms/msgraph_webhook.py,tests/gateway/test_msgraph_webhook.py,website/docs/user-guide/messaging/msgraph-webhook.mdgateway/platforms/msgraph_webhook.py,tests/gateway/test_msgraph_webhook.py,website/docs/user-guide/messaging/msgraph-webhook.mdRoot cause
Missing fail-closed
clientStaterequirement:Broad default listener bind address:
CVSS assessment
clientStaterequirementCVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:NRationale:
client_state.Safe reproduction steps
1. Missing
clientStatefail-closed behaviormsgraph_webhooklistener withoutclient_state._verify_client_state()returnTrue.client_stateis absent, and the POST verifier also rejects if the adapter is ever instantiated without one.2. Listener default bind address
hostoverride.0.0.0.0.127.0.0.1unless the operator explicitly configures a wider bind address.Expected vulnerable behavior
Before this PR, a deployment that enabled the Graph webhook listener without a
client_statecould accept arbitrary well-formed notification POSTs that matched the configured resource allowlist. The listener also defaulted to all interfaces, which made direct exposure easier when operators did not override the host.After this PR, that missing-secret configuration is rejected at startup, request-path verification fails closed, and the listener defaults to loopback.
Changes in this PR
_validate_security_config()and calls it fromMSGraphWebhookAdapter.connect()before the HTTP listener starts._verify_client_state()to returnFalsewhen no expected secret is configured.DEFAULT_HOSTfrom0.0.0.0to127.0.0.1.client_stateas required and describe the loopback default / explicit public-bind choice.sleep()mock seeing exactly one call.Files changed
gateway/platforms/msgraph_webhook.pytests/gateway/test_msgraph_webhook.pyclient_stateand default host behavior.tests/tools/test_mcp_stability.pytime.sleepcall-count assertion under full xdist CI.website/docs/user-guide/messaging/msgraph-webhook.mdclient_stateas required and clarifies loopback vs explicit public bind behavior.Maintainer impact
MSGRAPH_WEBHOOK_CLIENT_STATE/extra.client_stateare unaffected.host: "0.0.0.0"explicitly.Fix rationale
Requiring
client_stateat startup is the right boundary because it prevents an externally reachable webhook receiver from entering an unauthenticated notification mode. Keeping the same fail-closed check in_verify_client_state()provides defense in depth if future code constructs the adapter directly or bypassesconnect().Changing the default host to loopback makes accidental direct exposure less likely while preserving an explicit operator-controlled path for deployments that intentionally bind publicly or run behind network controls.
Type of change
Test plan
Executed with:
Results:
23 passedruff:All checks passed!py_compile: passedRemote CI note:
testcheck is currently failing and still needs separate CI triage; the PR-body update does not claim full remote CI is green.Disclosure notes