Skip to content

Commit fa4e87b

Browse files
committed
fix(egress): v3 round — GodsBoy/stephenschoettler/arshkumarsingh findings
GodsBoy 2nd-round P1 (all 4 addressed): - _detect_docker_bridge_ip: replace `ip.count('.') == 3` heuristic with ipaddress.IPv4Address validation + reject unspecified/loopback/multicast/ reserved/link-local/global addresses. Hostile `ip` shim on PATH used to be able to inject 0.0.0.0 here and re-open INADDR_ANY binding. - cmd_setup credential_source preservation: re-running `hermes egress setup` without --from-bitwarden no longer silently downgrades a previous bitwarden config back to env. Require --no-bitwarden to switch explicitly; otherwise preserve the existing mode and surface the decision. - fail_on_uncovered_providers docstring/default mismatch: docstring used to claim default=True; behavior was default=False. Resolved by truth-in-advertising — docstring now correctly states default=False — AND splitting providers into a strict LLM-specific tier (_LLM_SPECIFIC_NON_BEARER_PROVIDERS, used by start blocking) and a generic uncovered tier (used by wizard warnings). Generic cloud creds (AWS_*, GOOGLE_APPLICATION_CREDENTIALS) no longer trip refuse-start for operators using terraform/gcloud alongside Hermes. New discover_blocked_providers() returns the strict subset. - start_proxy poll-loop must verify listening before pidfile: previously fell through deadline-expired as success and wrote a pidfile for a non-listening daemon. Refactored into a do-while shape, require `listening=True` for success, kill the child + unlink the pidfile on failure paths. GodsBoy 2nd-round P2 (the worth-keeping subset): - O_NOFOLLOW + 0o600 + st_uid check on iron-proxy.log open (symmetric with the pidfile and audit-log paths the same PR hardens). - pidfile O_EXCL: refactored pidfile-write into _write_pidfile_safely which uses O_EXCL to detect concurrent starts. EEXIST with a live pid means "another start in progress" — refuse with actionable message; EEXIST with a dead pid means "stale crash" — unlink and retry once. Discriminates rather than racing. - _VERSION_CACHE: invalidate on install_iron_proxy success; don't cache empty stdout (would poison `hermes egress status` for the lifetime of the process if first probe hit a corrupt binary). - ensure_audit_log now RAISES on OSError instead of swallowing it as a warning. Previous behavior let the daemon create the file under the default umask, exactly the world-readable scenario the helper was built to prevent. cmd_setup catches the new RuntimeError and surfaces "✗" with the actionable message. - SIGINT/SIGTERM handler scoped around the start_proxy poll loop: Ctrl-C while waiting for `hermes egress start` no longer leaks an orphan daemon with the port bound. Handler kills the child + unlinks the pidfile before re-raising. - pidfile written IMMEDIATELY after Popen, BEFORE the listening verification. Parent dying during the poll loop now leaves a pidfile pointing at the orphan so the next `hermes egress stop` can clean up. Failure paths in the poll loop explicitly unlink. - _DEFAULT_UPSTREAM_DENY_CIDRS: add ::ffff:0:0/96 (IPv4-mapped IPv6 — closes the v6-resolved IMDS bypass), 100.64.0.0/10 (CGNAT / cloud overlays / K8s pod networks), 198.18.0.0/15 (RFC2544 benchmark). - _NON_BEARER_PROVIDERS split into LLM-specific (Anthropic / Azure / Gemini — block when strict) vs generic-cloud (AWS_*, GCP appdefault — warn-only). - docker.py except narrowing: load_config can raise yaml.YAMLError on a malformed config.yaml, not just ImportError. Two callsites (collision check + precedence resolution) now catch yaml.YAMLError via a sentinel `import yaml` and fail-safe to enforced mode. GodsBoy 2nd-round P3: - _reset_for_tests: was a no-op claiming symmetry with bitwarden; now actually clears _VERSION_CACHE and _proxy_nonce so in-process callers (notebooks, pytest -p no:xdist) don't see state leakage. - tests/test_iron_proxy_cli.py: replaced hardcoded Path("/tmp/...") with hermes_home/-derived fixtures. Matches the same cleanup we did for test_iron_proxy.py in the previous round. - --rotate-tokens confirmation gate: when there are existing tokens, prompt for "rotate" confirmation (skipped when stdin isn't a tty so CI/scripted use still works) AND back up the mappings to a timestamped sibling before overwriting. Surface a no-op note when rotate is requested with no existing tokens. stephenschoettler (runtime-boundary review): - #1 BWS silent degrade at proxy start: when credential_source=bitwarden but the BWS access token or project_id is missing OR the fetch returns no values for mapped providers, raise instead of silently falling back to host env. cmd_start also pre-checks at the wizard layer for actionable error messages. Opt-in escape hatch via new `proxy.allow_env_fallback: true` config for migration scenarios. - #2 docker_env collision detection extended: `docker_env: {OPENROUTER_API_KEY: sk-real}` in config.yaml with enforce_on_docker: true now raises just like an HTTPS_PROXY collision would. The collision check pulls mapped provider names from load_mappings() at call time. - #3 PID nonce persisted to disk: cross-CLI-invocation stale-pidfile defense now works. start_proxy writes the nonce next to the pidfile (sibling 0o600), stop_proxy reads it back via _read_persisted_nonce() and uses it as a _pid_alive signal in the new process. Falls back to argv0 basename matching when the file is missing (legacy install). arshkumarsingh: - #1 NODE_OPTIONS append-merge: egress dict no longer sets NODE_OPTIONS directly (would clobber the operator's --max-old-space-size etc.). Carry the egress flag in a sentinel key _HERMES_EGRESS_NODE_OPTIONS_APPEND; DockerEnvironment merges into the existing NODE_OPTIONS in env_args computation with de-duplication. - #2 docs: structured per-request audit log is at audit.log, not iron-proxy.log (the latter is daemon stdout/stderr). Diagram and step-7 text corrected; both file roles are now documented separately. Tests - Added 12 new tests in test_iron_proxy.py covering bridge-IP rejection (parametrized over 8 dangerous inputs), default deny-list adjacency (IPv4-mapped-v6 + CGNAT), blocked-providers strict-subset property, _pid_proc_starttime parser with paren-containing comm, stop_proxy SIGKILL suppression on starttime drift, _reset_for_tests clear behavior, iron_proxy_version don't-cache-empty, NODE_OPTIONS sentinel verification, ensure_audit_log raise-on-OSError, and persisted-nonce roundtrip. - Added 1 new test in test_iron_proxy_cli.py covering cmd_start BWS-token-missing fail-loud. - All 100 tests in test_iron_proxy + test_iron_proxy_cli pass; all 78 tests in test_docker_environment + test_config still pass. Acknowledged but not addressed: - GodsBoy P3 dead-code `extra_env` kwarg: kept (removing is a breaking change for any out-of-tree caller; the kwarg is documented and works). - Residual risks GodsBoy called out: iron-proxy in-memory secret zeroisation (Go-binary territory, out of scope); _PROXY_SUBPROCESS_ENV _ALLOWLIST cosmetic gaps (RUST_LOG, GOMAXPROCS); follow-up.
1 parent 4833acf commit fa4e87b

7 files changed

Lines changed: 1086 additions & 105 deletions

File tree

0 commit comments

Comments
 (0)