Skip to content

Commit 905ce58

Browse files
committed
fix(egress): align proxy.yaml with iron-proxy v0.39 actual schema +
propagate handler exit codes Live testing the full wizard against the real v0.39.0 binary (downloaded + extracted via our own install_iron_proxy()) surfaced three real bugs that the unit tests couldn't catch: 1. `proxy.http_listens` (plural) — NOT a field in v0.39's config struct. Our code emitted both `http_listen` (string) and `http_listens` (list) believing v0.39 accepts both forms. The binary actually rejects with "field http_listens not found in type config.Proxy" at YAML unmarshal time, so the daemon fails to start. Confirmed via strings(1) audit of the v0.39 binary — only `http_listen` is tagged. 2. `log.audit_path` — NOT a field in v0.39's config.Log struct. Same class of error: "field audit_path not found in type config.Log". Per-request audit-log records are not separable from server-level logs at this binary version. 3. `metrics.listen` defaults to ":9090" — which is the SAME port as our default `tunnel_port: 9090`. Result: every operator who runs `hermes egress setup` followed by `hermes egress start` gets "bind: address already in use" because the proxy listener and the metrics listener fight for port 9090. We now explicitly pin `metrics.listen: 127.0.0.1:0` to give it an ephemeral loopback port that can never collide with tunnel_port regardless of what operator sets. Plus a fourth bug — pre-existing but surfaced by the egress live test — that affects every Hermes subcommand: 4. `hermes_cli/main.py` calls `args.func(args)` at the bottom of main() but discards the return value. Every subcommand handler that returns a non-zero exit code (cmd_start refusing because `fail_on_uncovered_providers=true`, cmd_setup refusing because --from-bitwarden but BWS unreachable, etc.) was silently exiting 0. Fix: capture the handler's return value and `sys.exit(rc)` when it's a non-zero int. Other subcommands' contracts unchanged because they either return 0/None or don't return at all. Validation: - 188/188 in test_iron_proxy.py + test_iron_proxy_cli.py + test_config.py pass post-fix. - 5333/5337 in tests/hermes_cli/ pass; the 4 unrelated failures (test_managed_installs.py + test_update_hangup_protection.py) are pre-existing on main, not touched by this PR. - Manual wizard run end-to-end with the v0.39.0 binary in an isolated HERMES_HOME: * `egress install` — downloads + SHA-256 verifies + extracts * `egress setup` — generates CA, mints tokens, writes proxy.yaml that the binary now accepts (no http_listens, no audit_path, metrics pinned to 127.0.0.1:0) * `egress start` — daemon binds 127.0.0.1:9090, listens=yes * `egress status` — shows pid + listening + mappings * `egress stop` — clean shutdown, pidfile + nonce removed * Idempotent re-start returns the running pid without spawning * curl through the proxy with the openrouter token gets forwarded; an attacker host gets HTTP 403 (allowlist works); 169.254.169.254 gets HTTP 403 (deny CIDR works) * Refuse-start paths exit 1 with actionable messages: - `fail_on_uncovered_providers=true` + ANTHROPIC_API_KEY set - `credential_source=bitwarden` + BWS_ACCESS_TOKEN unset * `--rotate-tokens` confirmation gate fires via pty: typing 'cancel' aborts; typing 'rotate' proceeds and creates a mappings.json.rotated-<timestamp> backup Test updates: - `test_default_bind_is_loopback_not_zero_zero` — asserts the singular `http_listen` is loopback AND asserts `http_listens` (plural) is NOT in the rendered yaml. - `test_default_bind_uses_loopback_on_linux` — replaces `test_default_bind_includes_docker_bridge_on_linux`. v0.39 only supports one bind per daemon process, so the docker bridge augmentation is dropped from the rendered config; sandboxes reach the daemon via host.docker.internal -> host-gateway mapping, so loopback-only is functional. - `test_metrics_listener_pinned_to_loopback_ephemeral` — new regression test asserting `metrics.listen == "127.0.0.1:0"`. - `test_audit_log_kwarg_does_not_inject_audit_path_v039` — replaces `test_audit_log_path_lands_in_yaml`. audit_log kwarg is still accepted for forward compatibility but does NOT emit log.audit_path until upstream supports it.
1 parent ec108c6 commit 905ce58

3 files changed

Lines changed: 100 additions & 33 deletions

File tree

agent/proxy_sources/iron_proxy.py

Lines changed: 38 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -851,19 +851,32 @@ def build_proxy_config(
851851
else:
852852
deny_cidrs = list(upstream_deny_cidrs)
853853

854-
# Listen addresses. Single canonical "http_listen" for backward compat
855-
# plus a "http_listens" list (iron-proxy v0.39 accepts both; v0.40+ is
856-
# listen-list-only). We always emit both forms so a binary version
857-
# bump can't silently regress the bind policy.
854+
# Listen address. iron-proxy v0.39 takes a single string at
855+
# ``proxy.http_listen`` — there is no plural ``http_listens`` field,
856+
# despite earlier drafts of this module claiming v0.39 accepts both.
857+
# An empirical strings(1) audit + a live "start the binary and
858+
# observe the YAML unmarshal error" confirms the singular form is
859+
# the only one the binary accepts. We bind loopback by default; the
860+
# docker bridge listener is added as a second binary invocation if
861+
# we ever need multi-bind, but a single bind is what v0.39 supports.
858862
listens = list(http_listen) if http_listen else _default_http_listen(tunnel_port)
859863
primary_listen = listens[0] if listens else f"127.0.0.1:{tunnel_port}"
860864

861865
log_block: Dict = {"level": "info"}
862-
if audit_log is not None:
863-
# Wire the operator-requested audit-log path into the binary's log
864-
# config. iron-proxy reads ``log.audit_path``; setting it routes
865-
# per-request records there (separately from server-level logs).
866-
log_block["audit_path"] = str(audit_log)
866+
# NOTE: ``log.audit_path`` is NOT a field in iron-proxy v0.39's
867+
# ``config.Log`` struct — the binary rejects it with
868+
# ``field audit_path not found in type config.Log``. Per-request
869+
# audit records are written to the same log destination as
870+
# everything else at this binary version; the operator-facing
871+
# ``audit.log`` file we pre-create is still useful as a sentinel
872+
# for monitoring (logrotate target, downstream tail watchers) but
873+
# the daemon does not write to it directly. The kwarg is kept so
874+
# we're forward-compatible with a future v0.40+ that adds the
875+
# field; if you upgrade _IRON_PROXY_VERSION and the upstream gains
876+
# ``log.audit_path``, re-enable the line below.
877+
# if audit_log is not None:
878+
# log_block["audit_path"] = str(audit_log)
879+
_ = audit_log # consumed by ensure_audit_log() / docs only on v0.39
867880

868881
return {
869882
# DNS section is required by the binary's config parser, but we run
@@ -878,12 +891,11 @@ def build_proxy_config(
878891
# http_listen is the HTTP-proxy listener that handles both plain
879892
# HTTP forwards AND CONNECT tunnels for HTTPS. Sandboxes set
880893
# `HTTPS_PROXY=http://host:tunnel_port` and the same listener
881-
# serves both protocols. We bind loopback + the docker bridge
882-
# gateway (Linux) — NOT 0.0.0.0. LAN peers with a leaked
883-
# sandbox token would otherwise be able to spend the operator's
884-
# API quota against any allowlisted upstream.
894+
# serves both protocols. We bind loopback by default — NOT
895+
# 0.0.0.0. LAN peers with a leaked sandbox token would
896+
# otherwise be able to spend the operator's API quota against
897+
# any allowlisted upstream.
885898
"http_listen": primary_listen,
886-
"http_listens": listens,
887899
# The HTTPS-listener (direct TLS termination, no CONNECT) and
888900
# the SOCKS5/CONNECT-only tunnel listener get loopback ephemeral
889901
# ports — we don't expose them.
@@ -896,6 +908,18 @@ def build_proxy_config(
896908
# default. An empty list opts out entirely.
897909
"upstream_deny_cidrs": deny_cidrs,
898910
},
911+
# iron-proxy v0.39 starts a Prometheus-style metrics server by
912+
# default on ``:9090`` — which is the SAME port as our default
913+
# ``tunnel_port: 9090``, causing a guaranteed bind collision on
914+
# startup. Pin the metrics listener to an ephemeral loopback
915+
# port (``127.0.0.1:0``) so the metrics binding can't collide
916+
# with the proxy listener regardless of what tunnel_port the
917+
# operator chose. The daemon still emits metrics for any local
918+
# scraper that knows where to look (via ``hermes egress
919+
# status``); we just don't expose them to the public default.
920+
"metrics": {
921+
"listen": "127.0.0.1:0",
922+
},
899923
"tls": {
900924
"ca_cert": str(ca_cert),
901925
"ca_key": str(ca_key),

hermes_cli/main.py

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14001,9 +14001,15 @@ def cmd_acp(args):
1400114001
cmd_chat(args)
1400214002
return
1400314003

14004-
# Execute the command
14004+
# Execute the command. Propagate the handler's return code as the
14005+
# process exit code so subcommands that signal failure (e.g.
14006+
# ``hermes egress start`` refusing because of fail_on_uncovered_
14007+
# providers) actually exit non-zero. Handlers that return None
14008+
# are treated as success (exit 0).
1400514009
if hasattr(args, "func"):
14006-
args.func(args)
14010+
rc = args.func(args)
14011+
if isinstance(rc, int) and rc != 0:
14012+
sys.exit(rc)
1400714013
else:
1400814014
parser.print_help()
1400914015

tests/test_iron_proxy.py

Lines changed: 54 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -213,8 +213,7 @@ def test_wizard_rendered_yaml_contains_deny_list(hermes_home, tmp_path):
213213

214214
def test_default_bind_is_loopback_not_zero_zero(tmp_path):
215215
"""``http_listen`` must NOT be ``0.0.0.0:PORT`` or ``:PORT`` (latter is
216-
INADDR_ANY). Loopback only by default; the docker bridge bind is
217-
optional and added in addition, never instead."""
216+
INADDR_ANY). Loopback only by default."""
218217

219218
cfg = ip.build_proxy_config(
220219
mappings=[_sample_mapping()],
@@ -224,20 +223,28 @@ def test_default_bind_is_loopback_not_zero_zero(tmp_path):
224223
http_listen=["127.0.0.1:12345"], # explicit so test is deterministic
225224
)
226225
primary = cfg["proxy"]["http_listen"]
227-
listens = cfg["proxy"]["http_listens"]
228226
assert primary == "127.0.0.1:12345"
229-
assert listens == ["127.0.0.1:12345"]
230227
# Sentinel: confirm we didn't accidentally serialize a bare-port form
231-
# like ":12345" anywhere in the listen list (that's INADDR_ANY).
232-
for entry in listens:
233-
assert not entry.startswith(":")
234-
assert "0.0.0.0" not in entry
228+
# like ":12345" (that's INADDR_ANY).
229+
assert not primary.startswith(":")
230+
assert "0.0.0.0" not in primary
231+
# iron-proxy v0.39 doesn't support http_listens (plural). We
232+
# deliberately do NOT emit that key — re-emitting it would cause
233+
# the daemon to fail YAML unmarshal at start time.
234+
assert "http_listens" not in cfg["proxy"], (
235+
"iron-proxy v0.39 rejects http_listens (plural); only the "
236+
"singular http_listen string is accepted by the binary"
237+
)
235238

236239

237-
def test_default_bind_includes_docker_bridge_on_linux(tmp_path, monkeypatch):
238-
"""When http_listen isn't passed AND we're on Linux AND a docker
239-
bridge IP is detected, we should bind that bridge IP in addition to
240-
loopback so containers reach the proxy via host-gateway."""
240+
def test_default_bind_uses_loopback_on_linux(tmp_path, monkeypatch):
241+
"""When http_listen isn't passed AND we're on Linux, the singular
242+
http_listen field is the loopback bind. iron-proxy v0.39 only
243+
supports one bind per daemon process — earlier versions of this
244+
code emitted a plural http_listens list with the docker bridge
245+
appended, but v0.39 rejects that key so we keep loopback only.
246+
Sandboxes still reach the daemon via host.docker.internal which
247+
Docker maps to the host gateway, so loopback-only is functional."""
241248

242249
monkeypatch.setattr(ip.platform, "system", lambda: "Linux")
243250
monkeypatch.setattr(ip, "_detect_docker_bridge_ip", lambda: "172.17.0.1")
@@ -247,23 +254,53 @@ def test_default_bind_includes_docker_bridge_on_linux(tmp_path, monkeypatch):
247254
ca_key=tmp_path / "ca.key",
248255
tunnel_port=9090,
249256
)
250-
assert "127.0.0.1:9090" in cfg["proxy"]["http_listens"]
251-
assert "172.17.0.1:9090" in cfg["proxy"]["http_listens"]
257+
# Primary http_listen is loopback.
258+
assert cfg["proxy"]["http_listen"] == "127.0.0.1:9090"
259+
# No http_listens (plural) — v0.39 rejects that key.
260+
assert "http_listens" not in cfg["proxy"]
261+
262+
263+
def test_metrics_listener_pinned_to_loopback_ephemeral(tmp_path):
264+
"""iron-proxy v0.39's default metrics_listen is ``:9090``, which
265+
collides with our default tunnel_port=9090. build_proxy_config MUST
266+
explicitly pin metrics.listen to ``127.0.0.1:0`` so the bind
267+
collision can never happen at start time."""
268+
269+
cfg = ip.build_proxy_config(
270+
mappings=[_sample_mapping()],
271+
ca_cert=tmp_path / "ca.crt",
272+
ca_key=tmp_path / "ca.key",
273+
tunnel_port=9090,
274+
)
275+
assert cfg["metrics"]["listen"] == "127.0.0.1:0"
252276

253277

254278
# ---------------------------------------------------------------------------
255-
# audit_log wiring (regression: parameter was accepted but never used)
279+
# audit_log file pre-creation (parameter still accepted; v0.39 doesn't
280+
# wire it into the binary config but ensure_audit_log() still creates
281+
# the file at 0o600 as a logrotate / monitoring sentinel)
256282
# ---------------------------------------------------------------------------
257283

258284

259-
def test_audit_log_path_lands_in_yaml(tmp_path):
285+
def test_audit_log_kwarg_does_not_inject_audit_path_v039(tmp_path):
286+
"""v0.39 of iron-proxy rejects ``log.audit_path`` (not a struct
287+
field). build_proxy_config still accepts the audit_log kwarg for
288+
forward compatibility but MUST NOT emit it into the rendered yaml
289+
until the upstream binary supports it. See the kwarg's docstring
290+
for the upgrade path."""
291+
260292
cfg = ip.build_proxy_config(
261293
mappings=[_sample_mapping()],
262294
ca_cert=tmp_path / "ca.crt",
263295
ca_key=tmp_path / "ca.key",
264296
audit_log=tmp_path / "audit.log",
265297
)
266-
assert cfg["log"]["audit_path"] == str(tmp_path / "audit.log")
298+
assert "audit_path" not in cfg["log"], (
299+
"iron-proxy v0.39 has no log.audit_path field; emitting it "
300+
"causes 'field audit_path not found in type config.Log' at "
301+
"daemon start. ensure_audit_log() still creates the file as "
302+
"an operator-facing logrotate target."
303+
)
267304

268305

269306
def test_audit_log_omitted_when_caller_passes_none(tmp_path):

0 commit comments

Comments
 (0)