feat(egress): doctor + audit + Anthropic native, follow-up to #30179 (#35149)
Bartok9 wants to merge 1 commit into
Really appreciate the […]. One thing worth flagging for plugin authors building observability on top of Hermes: the […]. The audit log schema already has the right field for this: […]. Might be worth a note in the plugin authoring guide once this lands, something like "if you need to know which provider served a request, read […]".
…search#30179

Three composable features on top of the iron-proxy egress firewall:

1. hermes egress doctor: 11-check end-to-end health check with --json/--check/--no-network, brew-doctor-style fix-it hints, and credential redaction in failure messages.
2. hermes egress audit: structured audit log viewer (tail -f / grep --since / stats / export json|csv) with first-time-host and >5%-403 anomaly detection. Pure functions iter_audit_log / aggregate_audit_stats / detect_audit_anomalies for testability.
3. Anthropic native (x-api-key) support via 'hermes egress setup --with-anthropic'. Adds a parallel secrets rule matching x-api-key, mints a TokenMapping for api.anthropic.com, and removes Anthropic from the uncovered/blocked sets when opted in. Off by default.

TokenMapping gains an auth_header field (back-compatible default 'Authorization').

Tests: +50 (30 doctor, 11 audit, 9 anthropic) + 4 CLI dispatch tests. Existing iron-proxy suite unchanged (101 -> still pass). No new dependencies (stdlib + existing pyyaml/rich/openssl only).
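The two anomaly rules named in the commit message (first-time upstream hosts, and hosts with a 403 rate above 5%) can be illustrated with a small pure function. This is only a sketch under stated assumptions, not the PR's `detect_audit_anomalies`: the function name `detect_anomalies_sketch`, its parameters, and the return shape are hypothetical, and it assumes each audit record is a mapping with an `upstream_host` string and an integer `status`.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping


def detect_anomalies_sketch(
    known_hosts: set[str],
    records: Iterable[Mapping[str, object]],
    max_403_rate: float = 0.05,
) -> dict[str, list[str]]:
    """Flag first-time hosts and hosts whose 403 rate exceeds max_403_rate.

    Hypothetical helper: `known_hosts` is the set of hosts seen before the
    window under inspection; `records` are the parsed audit entries inside it.
    """
    totals: Counter = Counter()
    forbidden: Counter = Counter()
    for rec in records:
        host = rec.get("upstream_host")
        if not host:
            # Records without a host (for example, unparsed lines) are skipped.
            continue
        host = str(host)
        totals[host] += 1
        if rec.get("status") == 403:  # assumes status is logged as an integer
            forbidden[host] += 1

    first_time = sorted(h for h in totals if h not in known_hosts)
    high_403 = sorted(h for h in totals if forbidden[h] / totals[h] > max_403_rate)
    return {"first_time_hosts": first_time, "high_403_hosts": high_403}
```

Keeping the check free of I/O, as the commit message does with its pure functions, means it can be tested against plain lists of dicts without a running proxy or a log file on disk.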
Opened by Bartok at Daniel's request, in response to Teknium's invitation for a security review of the iron-proxy egress work — Teknium asked Catalin (@catalinmpit), who had deployed Hermes on a hardened Hetzner VPS (Tailscale/UFW/Cloudflare/fail2ban), to look over PR #30179. This is a new, complementary PR, not a review of #30179.
This PR answers that by adding operability and closing one of the largest LLM-specific scope cuts in #30179 — without trading away security or ease of use. Every feature is opt-in or read-only, stdlib-only, and composes with #30179 rather than rewriting it.
Why this complements #30179
PR #30179 deliberately scoped out three things to keep the core landing tight:
- […] `status` + tail logs to know if egress was actually wired.
- Anthropic native (`x-api-key`) was left as warn-only "uncovered": the single biggest LLM-specific gap, since Anthropic doesn't use `Authorization: Bearer`.

This PR fills exactly those three gaps. It builds on `feat/iron-proxy` as a follow-up.

What lands
1. `hermes egress doctor`: end-to-end health check

A single read-only command that runs 11 checks (binary, CA expiry, config parse, mappings/env, daemon liveness, listening socket, per-host reachability, token-swap correctness, uncovered providers, docker DNS, SSRF/IMDS guard) and prints `brew doctor`-style fix-it hints. Flags: `--json` (stable `{checks[], summary{}}`, exit 1 on any fail), `--check NAME` (repeatable), `--no-network` (CI/hermetic). Credentials in failure messages are redacted to ≤4 chars.

2. `hermes egress audit`: structured audit viewer

`tail [-n N] [-f]`, `grep PATTERN --since`, `stats --since`, `export --format json|csv`. Anomaly detection in `stats` surfaces first-time upstream hosts (catches a quiet DNS-rebind to a newly-allowlisted host) and hosts with >5% 403 rate. `-f` uses a 250ms polling loop (no new dependency) and survives log rotation. `--since` takes `30m`/`2h`/`7d`/`1w`, `today`, or ISO-8601.

3. Anthropic native (`x-api-key`): opt-in via `--with-anthropic`

Emits a parallel `secrets` rule with `match_headers: ["x-api-key"]`, mints a `TokenMapping` scoped to `api.anthropic.com`, and removes `ANTHROPIC_API_KEY` from the uncovered/blocked sets when opted in. Off by default: the key may be used via OpenRouter (Bearer) rather than the native endpoint, and we won't break that flow silently. `TokenMapping` gains an `auth_header` field with a back-compatible `"Authorization"` default; `mappings.json` round-trips it and old files load cleanly.

New surfaces
- `run_doctor()`, `DoctorReport`, `DoctorCheck`, `DOCTOR_CHECK_NAMES`
- `iter_audit_log`, `aggregate_audit_stats`, `detect_audit_anomalies`, `parse_since`, `audit_log_path`
- `discover_xapikey_mappings`, `auth_header` on `TokenMapping`
- `hermes egress doctor`, `hermes egress audit {tail,grep,stats,export}`, `hermes egress setup --with-anthropic`
- `website/docs/.../iron-proxy.md`, `cli-commands.md`

Failure modes considered
- […] `datetime`, `socket`, `ssl`, `urllib`, `concurrent.futures`, `csv`, `re`) plus existing `pyyaml`/`rich`/`openssl`. CA expiry is read via `openssl x509 -enddate`, not the `cryptography` package.
- `doctor` and `audit` never start/stop/rewrite anything. No auto-rotate, no auto-restart.
- `run_doctor` returns a single clear "not supported on Windows" fail, mirroring `_platform_asset_name`.
- `_redact_secret` caps failure-message tokens at last-4-chars; there's a test asserting the proxy token never appears in a 403 token-swap failure detail.
- […] `hermes proxy` OAuth aggregator, did not bump `_IRON_PROXY_VERSION`, did not rewrite any file from #30179 (feat(egress): iron-proxy credential-injection firewall for sandboxes) beyond additive edits.

Validation
`hermes egress doctor --no-network` exit code

Coverage gaps (real ones, not aspirational)
- `doctor`'s `reachability` and `token-swap` checks are exercised only with a monkeypatched `_https_head_via_proxy`; there's no live-network integration test (by design; the existing E2E test is gated behind a marker and uses a real binary). The actual HTTP-proxy wiring of `_https_head_via_proxy` (ProxyHandler + unverified SSL context) is not asserted end-to-end here.
- `audit tail -f` follow-loop is not unit-tested (it's an infinite poll loop); only the non-follow path is covered.
- `docker-dns` check has no test that actually runs Docker; it's skipped/mocked. Behavior on a real Linux host with `--add-host` is unverified by CI.
- […] `{_raw}` but the `stats`/anomaly aggregations key on `upstream_host`/`status`, which would then be empty.

Ambiguity flags (decisions made without confirming)
- `auth_header` on `TokenMapping` rather than a separate parallel list, chosen so a single mapping set can mix Bearer + x-api-key and `build_proxy_config` stays a straight loop. This changes the dataclass signature (additive, defaulted) and the `mappings.json` schema (additive key, back-compat read).
- `with_anthropic` persisted in `config.yaml` under `proxy.with_anthropic` so `doctor` and `start` agree on whether Anthropic is covered. New config key.
- […] `brew doctor`.
- `token-swap` semantics: treats 401 (and 200/404) as "swap fired, reached upstream" and 403 as "proxy refused / rule broken." This is the documented contract but depends on the proxy returning 403 specifically on allowlist/rule failure.
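The `token-swap` contract in the last flag can be written down as a small status classifier. This is a sketch of the rule as described here, not the PR's code: `SwapOutcome` and `classify_token_swap` are hypothetical names, and the `INCONCLUSIVE` bucket for every other status is an assumption, since the description only specifies 200/401/404 and 403.

```python
from enum import Enum


class SwapOutcome(Enum):
    """Hypothetical labels for the result of a token-swap probe."""

    SWAP_FIRED = "swap fired, reached upstream"
    PROXY_REFUSED = "proxy refused / rule broken"
    INCONCLUSIVE = "inconclusive"  # assumption: not specified in the description


def classify_token_swap(status: int) -> SwapOutcome:
    """Map the HTTP status of a probe sent through the proxy to an outcome."""
    if status in (200, 401, 404):
        # The upstream answered, so the request passed the proxy's rules.
        # A 401 still counts: it comes from the upstream, not from the proxy.
        return SwapOutcome.SWAP_FIRED
    if status == 403:
        # Treated as the proxy's own refusal (allowlist or rule failure).
        return SwapOutcome.PROXY_REFUSED
    # Anything else (5xx, redirects, ...) is left undecided in this sketch.
    return SwapOutcome.INCONCLUSIVE
```

Written this way, the dependency called out in the flag is easy to see: an upstream that itself answers 403 would be reported as `PROXY_REFUSED`, so the check is only as reliable as the assumption that 403 originates from the proxy.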