Commit 906b1da

docs(egress): comprehensive expansion (setup, config, troubleshooting, internals reference)

Pre-v3 the egress docs were 175 lines covering the basics: quick start, slash commands, security model, failure modes. After three rounds of PR review we added a half-dozen new config knobs, two new flags, a strict/warn tier split for uncovered providers, persisted-nonce cross-process defense, audit-log + log-file separation, NODE_OPTIONS append-merge, docker_env collision detection, etc., none of which the user-facing doc reflected.

This commit closes that gap end-to-end:

website/docs/user-guide/egress/iron-proxy.md (175 → 567 lines)
- Configuration section expanded with every new knob: fail_on_uncovered_providers, allow_env_fallback, upstream_deny_cidrs.
- Tables for default allowed hosts + default deny CIDRs.
- Bind policy section (loopback + docker bridge, NOT 0.0.0.0) with the operator-facing "why can't I hit the proxy from my LAN" answer.
- Uncovered providers section with the strict tier (Anthropic / Azure / Gemini: block when fail_on_uncovered_providers=true) vs warn tier (AWS, GCP appdefault: present on every dev laptop, never block).
- Bitwarden integration expanded: rotation semantics, fail-loud at start, the allow_env_fallback escape hatch, --no-bitwarden flag, the preserve-existing-source rule on plain re-setup.
- Slash commands section with --no-bitwarden, --rotate-tokens, and the token-rotation operator playbook (confirmation gate, backup file naming, restart-required caveat).
- State directory layout table covering all 9 files we create + their modes.
- Audit log vs daemon log distinction (the arshkumarsingh #2 fix that motivated the corrected diagram).
- CA distribution into the sandbox: full table of injected env vars, the Python/curl REPLACE vs Node ADD asymmetry caveat with the NODE_OPTIONS=--use-openssl-ca mitigation.
- docker_env collision detection: what gets blocked, what gets warned, the migration escape hatch.
- PID + nonce defense section explaining how iron-proxy.nonce works cross-CLI and the SIGKILL-suppress-on-recycle path.
- Security model expanded with the new defenses (IPv4-mapped-v6 IMDS bypass closure, env-var leakage prevention, LAN-peer-with-token-leak coverage).
- Failure modes extended for every new refuse-start path.
- Troubleshooting section (180 new lines) with grep-friendly error matchers for each common failure: BWS token missing, uncovered provider refused, port collision, slow bind, 403 from proxy, SSL verification errors inside the sandbox, 401 from upstreams, address-in-use orphan recovery, per-request audit log inspection.

website/docs/getting-started/quickstart.md
- One-paragraph mention of the egress proxy under "Sandboxed terminal" so operators discover the feature when they enable Docker isolation.

website/docs/reference/cli-commands.md
- Top-level command table now lists `hermes egress` alongside `hermes proxy` (different purpose, different direction; call it out).
- New `## hermes egress` section with full subcommand syntax, common flows (first-time setup, switching credential source, rotating tokens, adding upstream), and diagnostic shortcuts.

website/docs/reference/environment-variables.md
- New "Egress proxy (sandbox-injected)" section documenting every env var the Docker backend injects: HERMES_EGRESS_PROXY, HERMES_PROXY_TOKEN_<NAME>, HTTPS_PROXY/HTTP_PROXY/NO_PROXY, REQUESTS_CA_BUNDLE/SSL_CERT_FILE/CURL_CA_BUNDLE/NODE_EXTRA_CA_CERTS, NODE_OPTIONS append-merge, HERMES_IRON_PROXY_NONCE.
- Also fixes a stale layout issue with the Persistent Shell table that had two trailing rows getting orphaned in the v3 commit.

website/docs/developer-guide/egress-internals.md (NEW, 363 lines)
- Module layout map (which file owns what).
- Full lifecycle walkthrough for install / setup / start / stop with the actual function calls in order.
- "Security invariants" section enumerating every load-bearing property with the regression test name that guards it. These are the rules contributors must preserve when touching the module:
  - filesystem perms (0o700 dir, 0o600 secrets, O_NOFOLLOW everywhere)
  - subprocess env minimisation (no os.environ.copy)
  - bind policy (loopback + docker bridge, never 0.0.0.0)
  - default deny CIDR coverage
  - audit log fail-loud
  - bitwarden fail-loud
  - docker_env collision detection
  - PID recycling defense
  - token preservation on re-setup
  - credential_source preservation
- Extension points: adding a bearer-token provider, adding a non-bearer provider, wiring iron-proxy into a non-Docker backend, subscribing to per-request audit events.
- Testing recipe (hermetic + E2E + CLI smoke).

website/sidebars.ts
- New `developer-guide/egress-internals` entry under Developer Guide → Internals (alongside acp-internals, cron-internals, trajectory-format).

Build verification
- `cd website && npm install && npx docusaurus build` succeeds locally.
- All three new pages render to static HTML in all three locales (en + zh-Hans + ko).
- No new broken links or broken anchors introduced (pre-existing warnings on translation stubs are unrelated).
1 parent fa4e87b commit 906b1da

6 files changed

Lines changed: 783 additions & 17 deletions

website/docs/developer-guide/egress-internals.md

Lines changed: 295 additions & 0 deletions
@@ -0,0 +1,295 @@
---
sidebar_position: 14
title: "Egress proxy internals"
description: "How the iron-proxy egress firewall integrates with Hermes: module layout, lifecycle, security invariants, and extension points"
---

# Egress proxy internals

This page covers the architecture of the egress credential-injection firewall (`hermes egress` / iron-proxy) from a contributor / plugin author's perspective. End-user setup + usage docs live at [Egress proxy](../user-guide/egress/iron-proxy.md).

The threat model and high-level design are summarised on the user page; this page is about *how* it's wired, where the security-relevant code lives, and what invariants you have to preserve if you touch it.

## Module layout

```text
agent/proxy_sources/iron_proxy.py     Core: binary install, CA gen, config build,
                                      subprocess lifecycle, mappings I/O, PID/nonce
                                      defense. Pure-function surface where possible.

hermes_cli/proxy_cli.py               Wizard + slash command handlers.
                                      `hermes egress {install,setup,start,stop,
                                      status,disable,config}`. Wires the
                                      core module into argparse.

hermes_cli/main.py:_dispatch_egress   Top-level subparser dispatcher.
                                      dest='egress_command' (intentionally
                                      disjoint from the inbound OAuth
                                      `hermes proxy` subparser, which uses
                                      dest='proxy_command').

hermes_cli/config.py: proxy schema    The `proxy:` block in DEFAULT_CONFIG.
                                      Adding a knob means: add it here, add a
                                      wizard prompt or `setdefault` in
                                      proxy_cli.cmd_setup, and document it
                                      in the user-guide page.

tools/environments/docker.py
  _egress_proxy_args_for_docker()     Builds the volume_args / env_overrides /
                                      host_args triple that the Docker backend
                                      injects when `proxy.enabled: true`.

  DockerEnvironment.__init__          Docker-side merge logic: collision
                                      detection against critical egress vars,
                                      NODE_OPTIONS append-merge via the
                                      _HERMES_EGRESS_NODE_OPTIONS_APPEND
                                      sentinel, enforce_on_docker precedence.

tests/test_iron_proxy.py              Hermetic tests (~70). Binary install
                                      path, config build, mappings I/O,
                                      subprocess lifecycle, docker arg builder,
                                      deny CIDR defaults, bind policy, CA
                                      TOCTOU, ensure_audit_log behaviour, etc.

tests/test_iron_proxy_cli.py          CLI handler unit tests (~20). Argparse
                                      wiring, fail-loud paths, BWS refresh
                                      wire-up, dest='egress_command'
                                      regression guard.

tests/test_iron_proxy_e2e.py          Live E2E (gated on HERMES_RUN_E2E=1).
                                      Real iron-proxy binary, real curl,
                                      end-to-end token swap verified.
```

## Lifecycle

```text
hermes egress install
  -> agent.proxy_sources.iron_proxy.install_iron_proxy(force=...)
       Downloads pinned tarball + checksums.txt from GitHub Releases.
       SHA-256 verification before extraction.
       tarfile.extract(..., filter="data") on Python 3.12+ (PEP 706);
         falls back to plain extract on older Python with member-name
         sanitisation via _pick_tar_member.
       Stage into ~/.hermes/bin/.iron-proxy_XXXX, chmod 755, os.replace
         to ~/.hermes/bin/iron-proxy (atomic).
       _VERSION_CACHE.pop(target) so a forced reinstall re-probes
         --version on next call.

hermes egress setup [--from-bitwarden | --no-bitwarden] [--rotate-tokens]
  -> proxy_cli.cmd_setup
       Step 1. find_iron_proxy(install_if_missing=False) -> install if absent.
       Step 2. ensure_ca_cert()
                 Run openssl genrsa + req via subprocess.
                 Write CA key via os.open(O_WRONLY|O_CREAT|O_TRUNC|O_NOFOLLOW, 0o600)
                   + os.replace. Never exists on disk under default umask.
                 Write CA cert with 0o644 (public).
       Step 3. discover_provider_mappings() or pull names from BWS via
                 fetch_bitwarden_secrets() when --from-bitwarden.
               merge_mappings(existing=load_mappings(), discovered,
                 rotate=args.rotate_tokens) preserves prior
                 tokens unless --rotate-tokens is passed.
               discover_uncovered_providers() and surface warnings.
       Step 4. ensure_audit_log(audit_log_path)   # raises on OSError
               build_proxy_config(...) with defaults applied at the call site
                 (deny CIDRs default, bind policy from _default_http_listen).
               write_proxy_config(cfg)   # atomic via .tmp + os.replace, 0o600
               write_mappings(mappings)  # atomic, 0o600
       Step 5. proxy_cfg["enabled"] = True; credential_source preservation logic
                 (do NOT silently downgrade bitwarden -> env on re-run);
               save_config(cfg).

hermes egress start
  -> proxy_cli.cmd_start
       Pre-checks (refuse-start path):
         - proxy.fail_on_uncovered_providers? -> discover_blocked_providers()
         - credential_source=bitwarden? -> pre-validate access_token_env + project_id
  -> iron_proxy.start_proxy(
         refresh_secrets_from_bitwarden=...,
         bitwarden_config=...,
     )
       existing=_read_pid(); if alive, idempotent return.
       _build_proxy_subprocess_env(...): ALLOWLIST + mapped real_env_names,
         strip HTTPS_PROXY/etc. to avoid recursion, optional BWS refresh
         (raises on missing values unless allow_env_fallback=true).
       Plant nonce: _proxy_nonce = sha256(urandom(16)); env[NONCE_ENV] = ...
       Open log_path via O_NOFOLLOW + 0o600 + st_uid check.
       Popen with stdin=DEVNULL, stdout=log_fd, stderr=STDOUT,
         start_new_session=True (POSIX).
       Close parent's log_fd in finally.
       _write_pidfile_safely(pidfile, proc.pid)
         O_EXCL + O_NOFOLLOW + uid check + persisted nonce sidecar.
         FileExistsError -> discriminate live vs stale, retry once if stale.
       Install SIGINT/SIGTERM handlers (main-thread only).
       Poll loop (do-while shape):
         while True:
             if proc.poll() is not None: tail log + unlink pidfile + raise
             if _port_listening("127.0.0.1", tunnel_port): break
             if time.time() >= deadline: break   (do-while: checked AFTER first probe)
             time.sleep(0.1)
       If not listening at exit: _kill_and_wait(proc) + unlink pidfile + raise.

hermes egress stop
  -> iron_proxy.stop_proxy
       _read_pid + _pid_alive guard.
       starttime_before = _pid_proc_starttime(pid)   # Linux only; None elsewhere
       os.kill(pid, SIGTERM)
       Wait up to 5s for graceful exit.
       After grace: re-check starttime + _pid_alive.
         If recycled (starttime drift OR _pid_alive False), DO NOT SIGKILL.
         Otherwise os.kill(pid, _KILL_SIGNAL).
       _cleanup_state_files: unlink pidfile + nonce sibling.
```
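
The start-up poll loop in the walkthrough has a do-while shape: the port is probed at least once, even if the deadline has already passed. A minimal Python sketch of that shape follows. The names `port_listening` and `wait_for_listen` are hypothetical stand-ins, not the module's real helpers; the sketch omits the `proc.poll()` early-exit check and uses `time.monotonic()` where the walkthrough mentions `time.time()`.

```python
import socket
import time
from typing import Callable


def port_listening(host: str, port: int, timeout: float = 0.2) -> bool:
    """Return True if a TCP connect to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_listen(
    host: str,
    port: int,
    deadline_s: float,
    interval_s: float = 0.1,
    probe: Callable[[str, int], bool] = port_listening,
) -> bool:
    """Probe until the port answers or the deadline passes (hypothetical helper).

    The deadline is checked AFTER the probe, so a zero-second deadline
    still yields exactly one probe.
    """
    deadline = time.monotonic() + deadline_s
    while True:
        if probe(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Passing the probe as a parameter is a choice made for this sketch only; it lets a test drive the loop without opening a socket.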

## Security invariants

These are the load-bearing properties. If you touch the module, you must preserve them. Where there's a regression test, it's named.

### Filesystem perms

| Path | Mode | Test |
|---|---|---|
| `~/.hermes/proxy/` (dir) | `0o700` | `test_proxy_state_dir_is_0o700` |
| `ca.key` | `0o600` | `test_ca_key_created_with_0o600` |
| `ca.crt` | `0o644` | (implicit; chmod call in `ensure_ca_cert`) |
| `proxy.yaml` | `0o600` | (chmod after atomic rename in `write_proxy_config`) |
| `mappings.json` | `0o600` | (chmod after atomic rename in `write_mappings`) |
| `iron-proxy.pid` | `0o600` | (`os.open(..., 0o600)` mode in `_write_pidfile_safely`) |
| `iron-proxy.nonce` | `0o600` | (`os.open(..., 0o600)` mode in `_write_pidfile_safely`) |
| `audit.log` | `0o600` | `test_ensure_audit_log_creates_with_0o600` |
| `iron-proxy.log` | `0o600` | (`os.open(..., 0o600)` + `fchmod`) |

All write paths use `os.open(O_WRONLY | O_CREAT | O_NOFOLLOW, 0o600)` + `os.fstat().st_uid` check. `shutil.copy2` + `os.chmod` is forbidden because it leaks a default-umask window.
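
As an illustration of that write pattern, here is a minimal POSIX-only sketch. `write_secret_file` is a hypothetical name, not a function from `iron_proxy.py`, and the added `O_TRUNC` and `fchmod` steps are assumptions borrowed from the CA-key and log-file rows rather than a description of every write path.

```python
import os
import stat
import tempfile


def write_secret_file(path: str, data: bytes) -> None:
    """Create `path` as 0o600 from the first byte and refuse symlinks (sketch)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_NOFOLLOW
    # The mode is applied at creation time, so there is no window in which
    # the file exists with default-umask permissions.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as fh:
        # Refuse to write into a pre-planted file owned by another user.
        if os.fstat(fh.fileno()).st_uid != os.getuid():
            raise PermissionError(f"{path} is not owned by the current user")
        # Tighten the mode in case the file already existed with looser bits.
        os.fchmod(fh.fileno(), 0o600)
        fh.write(data)


# Usage: write into a scratch directory and inspect the resulting mode.
scratch = tempfile.mkdtemp()
target = os.path.join(scratch, "ca.key")
write_secret_file(target, b"example-key-material")
mode = stat.S_IMODE(os.stat(target).st_mode)
```

By contrast, a copy followed by `os.chmod` creates the file first and restricts it second, which is the window the invariant rules out.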

### Subprocess env minimisation

`_build_proxy_subprocess_env` MUST NOT use `os.environ.copy()`. The allowlist is `_PROXY_SUBPROCESS_ENV_ALLOWLIST` (PATH, HOME, locale, etc.) plus the env names referenced by `load_mappings()`. Everything else stays on the host.

Regression: `test_subprocess_env_strips_unrelated_secrets`, `test_subprocess_env_strips_proxy_recursion_vars`, `test_subprocess_env_keeps_infrastructure_vars`.
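
The allowlist approach can be sketched as below. This is an illustration under stated assumptions: `ENV_ALLOWLIST`, `RECURSION_VARS`, and `build_minimal_env` are hypothetical names, and the concrete entries are guesses at what "PATH, HOME, locale, etc." and "HTTPS_PROXY/etc." might cover, not the real constants.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical allowlist; the real one is _PROXY_SUBPROCESS_ENV_ALLOWLIST.
ENV_ALLOWLIST = ("PATH", "HOME", "LANG", "LC_ALL", "TMPDIR")

# Hypothetical set of proxy vars dropped so the daemon does not proxy itself.
RECURSION_VARS = frozenset(
    {"HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY", "NO_PROXY",
     "https_proxy", "http_proxy", "all_proxy", "no_proxy"}
)


def build_minimal_env(
    host_env: Mapping[str, str], mapped_names: Iterable[str]
) -> Dict[str, str]:
    """Build the child env from an allowlist instead of copying the host env."""
    wanted = (set(ENV_ALLOWLIST) | set(mapped_names)) - RECURSION_VARS
    # Start from empty and copy in only what is wanted; an unrelated secret
    # is excluded by default rather than by remembering to strip it.
    return {name: host_env[name] for name in wanted if name in host_env}
```

Starting from an empty dict is the point of the invariant: a denylist over `os.environ.copy()` would leak any secret nobody thought to list.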

### Bind policy

`_default_http_listen` returns loopback + (Linux only) the docker bridge IP. Never `0.0.0.0`, never `:PORT` (INADDR_ANY).

`_detect_docker_bridge_ip` validates via `ipaddress.IPv4Address` and rejects `is_unspecified` / `is_loopback` / `is_multicast` / `is_reserved` / `is_link_local` / `is_global`. A hostile `ip` shim on PATH cannot inject `0.0.0.0`.

Regression: `test_default_bind_is_loopback_not_zero_zero`, `test_detect_docker_bridge_ip_rejects_dangerous` (parametrized over 8 attack inputs).
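
The validation step can be sketched with the standard-library `ipaddress` module. `validate_bridge_ip` is a hypothetical name for illustration; how the real `_detect_docker_bridge_ip` obtains the candidate string is not shown here.

```python
import ipaddress
from typing import Optional


def validate_bridge_ip(candidate: str) -> Optional[str]:
    """Return the address if it is a plausible bridge IP, else None (sketch)."""
    try:
        ip = ipaddress.IPv4Address(candidate.strip())
    except ValueError:
        # Not a dotted-quad IPv4 address at all (garbage, IPv6, empty string).
        return None
    if (
        ip.is_unspecified    # 0.0.0.0 would mean "bind everywhere"
        or ip.is_loopback
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_link_local
        or ip.is_global      # a publicly routable address is never a bridge
    ):
        return None
    return str(ip)
```

For example, `validate_bridge_ip("172.17.0.1")` returns the address unchanged, while `"0.0.0.0"`, `"8.8.8.8"`, and `"169.254.169.254"` are all rejected.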

### Default deny CIDRs

`_DEFAULT_UPSTREAM_DENY_CIDRS` covers loopback (v4 + v6), link-local (incl. IMDS at 169.254.169.254 and the IPv4-mapped-v6 form), RFC1918, IPv6 ULA, CGNAT, and the RFC2544 benchmark range. `build_proxy_config(..., upstream_deny_cidrs=None)` MUST emit the default; only an explicit empty list opts out.

Regression: `test_default_deny_cidrs_present_when_unspecified`, `test_default_deny_includes_ipv4_mapped_v6`.
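
To see why the IPv4-mapped-v6 form matters, the sketch below checks addresses against an illustrative deny list. `DENY_CIDRS` is an assumed subset built from the well-known ranges named above, not the contents of `_DEFAULT_UPSTREAM_DENY_CIDRS`, and `is_denied` is a hypothetical helper. Note the swapped-in technique: the sketch unwraps mapped addresses before matching, whereas the text describes the mapped form being listed in the deny CIDRs themselves.

```python
import ipaddress
from typing import Sequence

# Illustrative subset only; the real list lives in _DEFAULT_UPSTREAM_DENY_CIDRS.
DENY_CIDRS = (
    "127.0.0.0/8", "::1/128",            # loopback
    "169.254.0.0/16", "fe80::/10",       # link-local (IMDS is 169.254.169.254)
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918
    "fc00::/7",                          # IPv6 ULA
    "100.64.0.0/10",                     # CGNAT
    "198.18.0.0/15",                     # RFC2544 benchmark range
)


def is_denied(addr: str, deny: Sequence[str] = DENY_CIDRS) -> bool:
    """Return True if `addr` falls inside any denied network (sketch)."""
    ip = ipaddress.ip_address(addr)
    # ::ffff:a.b.c.d reaches the same host as a.b.c.d, so compare the
    # embedded IPv4 address instead of the IPv6 wrapper.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    for cidr in deny:
        net = ipaddress.ip_network(cidr)
        if ip.version == net.version and ip in net:
            return True
    return False
```

Without the unwrap (or an explicit mapped-form entry), `::ffff:169.254.169.254` would match none of the IPv4 networks and slip past the IMDS block.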

### Audit log fail-loud

`ensure_audit_log` raises `RuntimeError` on any `OSError`. Swallowing the failure would let the daemon create the file under the default umask, defeating the privacy promise. `cmd_setup` catches the RuntimeError and surfaces a clear error to the operator.

Regression: `test_ensure_audit_log_raises_on_immutable_parent`.

### Bitwarden mode fail-loud

When `credential_source: bitwarden` AND `proxy.allow_env_fallback: false` (default):
- Missing access token env var -> `cmd_start` refuses.
- Missing `project_id` -> `cmd_start` refuses.
- `bws secret list` returns no values for one or more mapped providers -> `_build_proxy_subprocess_env` raises.

Falling back to host env in BW mode reintroduces exactly the staleness bug the BW path is meant to defeat.

Regression: `test_cmd_start_refuses_when_bitwarden_token_missing` (CLI layer); strict-mode assertions in `_build_proxy_subprocess_env` (daemon layer).

### docker_env collision detection

When `enforce_on_docker: true`, `docker_env` overrides on any of the egress-controlling vars (HTTPS_PROXY, SSL_CERT_FILE, NODE_EXTRA_CA_CERTS, etc.) OR any mapped `real_env_name` (OPENROUTER_API_KEY, etc.) raises `RuntimeError` BEFORE the container starts.

Regression: `test_docker_env_collision_with_proxy_raises_when_enforce`.
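
A rough sketch of the collision check, assuming hypothetical names throughout: `CRITICAL_EGRESS_VARS` and `find_docker_env_collisions` are not the identifiers used in `DockerEnvironment.__init__`, and returning the collisions for the caller to warn about when enforcement is off is an assumption about the non-enforcing path.

```python
from typing import Iterable, List, Mapping

# Illustrative set; the text names HTTPS_PROXY, SSL_CERT_FILE and
# NODE_EXTRA_CA_CERTS "etc.", so the full membership here is a guess.
CRITICAL_EGRESS_VARS = frozenset(
    {"HTTPS_PROXY", "HTTP_PROXY", "NO_PROXY", "REQUESTS_CA_BUNDLE",
     "SSL_CERT_FILE", "CURL_CA_BUNDLE", "NODE_EXTRA_CA_CERTS"}
)


def find_docker_env_collisions(
    docker_env: Mapping[str, str],
    mapped_real_env_names: Iterable[str],
    enforce: bool,
) -> List[str]:
    """Return user overrides that clash with egress wiring; raise if enforcing."""
    protected = CRITICAL_EGRESS_VARS | set(mapped_real_env_names)
    collisions = sorted(name for name in docker_env if name in protected)
    if collisions and enforce:
        # Fail before the container starts so a misconfigured sandbox
        # never runs with the proxy silently bypassed.
        raise RuntimeError(
            "docker_env overrides egress-controlled vars: " + ", ".join(collisions)
        )
    return collisions
```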

### PID recycling defense

`_pid_alive` MUST consult either the in-process `_proxy_nonce` (same-process case) OR the on-disk `iron-proxy.nonce` (cross-CLI case) before trusting an `argv[0]` basename match. `stop_proxy` MUST re-check `/proc/<pid>/stat` starttime before SIGKILL and suppress the signal on starttime drift.

Regression: `test_stop_proxy_suppresses_sigkill_on_pid_recycle`, `test_pid_proc_starttime_parses_comm_with_parens`, `test_persisted_nonce_roundtrip`.
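
Parsing `/proc/<pid>/stat` is subtle because the second field (`comm`) is wrapped in parentheses and may itself contain spaces or parentheses. A minimal sketch of a parser that tolerates this, with hypothetical names (`parse_starttime`, `safe_to_sigkill`) that are not the module's `_pid_proc_starttime`:

```python
from typing import Optional


def parse_starttime(stat_line: str) -> Optional[int]:
    """Extract field 22 (starttime) from a /proc/<pid>/stat line (sketch)."""
    # comm is field 2 and ends at the LAST ')' on the line, so split there
    # rather than on whitespace from the start.
    close = stat_line.rfind(")")
    if close == -1:
        return None
    fields = stat_line[close + 1:].split()
    # fields[0] is field 3 (state), so field 22 sits at index 19.
    if len(fields) < 20:
        return None
    try:
        return int(fields[19])
    except ValueError:
        return None


def safe_to_sigkill(before: Optional[int], after: Optional[int], alive: bool) -> bool:
    """Only escalate if the PID is alive and its starttime did not drift."""
    return alive and before == after
```

On platforms without `/proc`, both starttime values would be `None` and the decision falls back to the liveness check alone, which matches the "Linux only; None elsewhere" note in the lifecycle walkthrough.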

### Token preservation on re-setup

`merge_mappings(existing, discovered, rotate=False)` MUST return prior tokens for providers that overlap. Re-running `hermes egress setup` cannot silently 401 running sandboxes. `--rotate-tokens` is the explicit opt-in.

Regression: `test_merge_mappings_preserves_existing_tokens`, `test_merge_mappings_rotate_mints_fresh_tokens`.
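
The preservation rule can be sketched as follows. The data shape is an assumption made for illustration: mappings are modelled as a plain dict from env var name to proxy token, `merge_tokens` is a hypothetical stand-in for `merge_mappings`, and the `hpx_` token prefix is invented. What happens to previously mapped providers that are no longer discovered is not specified above; this sketch simply drops them.

```python
import secrets
from typing import Dict, Iterable


def merge_tokens(
    existing: Dict[str, str], discovered: Iterable[str], rotate: bool = False
) -> Dict[str, str]:
    """Keep prior tokens for overlapping providers unless rotation is requested."""
    merged: Dict[str, str] = {}
    for name in discovered:
        if not rotate and name in existing:
            # Running sandboxes already hold this token; reuse it so a
            # plain re-setup does not start producing 401s.
            merged[name] = existing[name]
        else:
            merged[name] = "hpx_" + secrets.token_urlsafe(24)
    return merged
```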

### `credential_source` preservation

`cmd_setup` MUST NOT downgrade `credential_source: bitwarden` to `env` on re-run without an explicit `--no-bitwarden` flag. Running `hermes egress setup` (no flag) preserves whatever was previously configured.

Tested via the `cmd_setup` flow in CLI tests (the bitwarden-preservation path is exercised when `--from-bitwarden` is followed by a plain `setup` re-run).

## Extension points

### Adding a new bearer-token provider

`_BEARER_PROVIDERS` in `iron_proxy.py` maps env var name -> tuple of upstream hosts. Adding an entry makes it discoverable by `discover_provider_mappings()`; the wizard mints a token for it automatically when the env var is present.

```python
from typing import Dict, Tuple

_BEARER_PROVIDERS: Dict[str, Tuple[str, ...]] = {
    # ... existing entries ...
    "MY_PROVIDER_API_KEY": ("api.myprovider.com",),
}
```

Also update `_DEFAULT_ALLOWED_HOSTS` so the proxy allows the upstream by default. Run `test_discover_provider_mappings_*` to confirm.

### Adding a new non-bearer provider

If the provider uses `x-api-key` / SigV4 / OAuth-from-SDK / etc., iron-proxy's `secrets` transform cannot swap it. Add the env var to `_NON_BEARER_PROVIDERS` so the wizard warns about it. If the provider is LLM-specific enough that you want `fail_on_uncovered_providers: true` to actually block it, also add it to `_LLM_SPECIFIC_NON_BEARER_PROVIDERS`.

```python
from typing import Tuple

_NON_BEARER_PROVIDERS: Tuple[str, ...] = (
    # ... existing entries ...
    "MY_X_API_KEY_PROVIDER",
)

_LLM_SPECIFIC_NON_BEARER_PROVIDERS: Tuple[str, ...] = (
    # ... existing entries ...
    "MY_X_API_KEY_PROVIDER",
)
```

### Wiring iron-proxy into a non-Docker backend

`_egress_proxy_args_for_docker` is Docker-specific. Backends that want similar wiring need their own analogue that:

1. Reads `load_config().get("proxy", {})`; returns empty args if `enabled` is false.
2. Calls `iron_proxy.get_status()`; surfaces `enforce` semantics on `configured` / `pid` / `listening` / `ca_cert_path` failure paths.
3. Calls `iron_proxy.load_mappings()`; refuses to mount if empty AND `enforce_on_docker: true`.
4. Sets the seven env vars (HTTPS_PROXY, NO_PROXY, REQUESTS_CA_BUNDLE, SSL_CERT_FILE, CURL_CA_BUNDLE, NODE_EXTRA_CA_CERTS, HERMES_EGRESS_PROXY) and the per-mapping `HERMES_PROXY_TOKEN_<NAME>` vars.
5. Distributes the CA cert into the sandbox at a path the runtime will trust (typically `/etc/ssl/certs/hermes-egress-ca.crt`).
6. Implements collision detection against the user's backend-specific env config.

The Docker implementation is ~150 lines; expect similar volume for Modal / Daytona / SSH.

### Subscribing to per-request audit events

iron-proxy writes line-delimited JSON to `~/.hermes/proxy/audit.log`. A plugin / external watcher can tail the file and react to allowlist denials, secret swaps, or upstream errors. The schema is documented at [docs.iron.sh/audit](https://docs.iron.sh/audit).
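
A watcher mostly needs a tolerant line-by-line JSON reader. The sketch below makes no assumption about the event schema (field names are left to the linked documentation); `iter_audit_events` is a hypothetical helper, and the sample records are invented purely to exercise the parser.

```python
import io
import json
from typing import Any, Dict, Iterator, TextIO


def iter_audit_events(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one dict per well-formed JSON line, skipping anything else."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # A tailing reader can observe a half-written final line.
            continue
        if isinstance(event, dict):
            yield event


# Usage with an in-memory stream; the keys below are made up for the demo.
sample = io.StringIO('{"n": 1}\n\nnot json\n{"n": 2}\n')
events = list(iter_audit_events(sample))
```

In a real watcher the stream would be the opened audit log, re-read as new lines arrive; that follow logic is left out here.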

## Testing

```bash
# Hermetic suite (no network, no real binary)
scripts/run_tests.sh tests/test_iron_proxy.py tests/test_iron_proxy_cli.py

# Live E2E (real binary, real curl, real CONNECT tunnel)
HERMES_RUN_E2E=1 scripts/run_tests.sh tests/test_iron_proxy_e2e.py

# Live PTY smoke against `hermes egress`
HERMES_HOME=/tmp/hermes-egress-test python3 -m hermes_cli.main egress --help
HERMES_HOME=/tmp/hermes-egress-test python3 -m hermes_cli.main egress setup --help
```

The CLI uses argparse, so `--help` is a good first probe for "did my new flag register correctly".

## See also

- User-facing setup + troubleshooting: [Egress proxy](../user-guide/egress/iron-proxy.md)
- Docker backend internals: [Docker](../user-guide/docker.md)
- Bitwarden Secrets Manager integration: [`hermes secrets bitwarden`](../user-guide/secrets/bitwarden.md)
- CLI command reference: [`hermes egress`](../reference/cli-commands.md#hermes-egress)
- Sandbox-injected environment variables: [Egress proxy (sandbox-injected)](../reference/environment-variables.md#egress-proxy-sandbox-injected)

website/docs/getting-started/quickstart.md

Lines changed: 2 additions & 0 deletions
@@ -256,6 +256,8 @@ hermes config set terminal.backend docker # Docker isolation
 hermes config set terminal.backend ssh # Remote server
 ```
 
+For Docker sandboxes, you can also enable the **egress credential-injection proxy** so the sandbox never sees your real API keys, only opaque proxy tokens that work exclusively from behind a local TLS-intercepting daemon. See [Egress proxy](../user-guide/egress/iron-proxy.md). Setup is `hermes egress setup && hermes egress start`; the Docker backend wires everything up automatically once `proxy.enabled` flips on.
+
 ### Voice mode
 
 ```bash
