feat(egress): CA rotation (hermes egress rotate-ca) — follow-up to #30179#35188
Open
Bartok9 wants to merge 1 commit into
Open
feat(egress): CA rotation (hermes egress rotate-ca) — follow-up to #30179#35188Bartok9 wants to merge 1 commit into
hermes egress rotate-ca) — follow-up to #30179#35188Bartok9 wants to merge 1 commit into
Conversation
Closes the v1 scope cut in PR NousResearch#30179 — "the CA is a 10-year self-signed cert. Rotation is manual for now." `hermes egress rotate-ca`: 1. Archives the live ca.crt to ~/.hermes/proxy/ca-archive/ca-<stamp>.crt BEFORE replacing it, so a crash mid-rotation never leaves a missing live cert. Archive dir keeps the 5 most recent; older pruned. 2. Mints a fresh CA + key via ensure_ca_cert(force=True) — same atomic 0o600-key + os.replace path as first-boot generation. 3. If the daemon is running and --no-restart isn't set: stop_proxy() then start_proxy(refresh_secrets_from_bitwarden=..., bitwarden_config=...) so the live proxy picks up the new signing key AND keeps honouring the operator's existing credential-source choice. 4. Appends a structured record to ~/.hermes/proxy/rotation-history.jsonl (ts / old+new SHA-256 fingerprints / reason / operator / subject / valid_until). Flags: --dry-run preview (new subject, archive path, sandboxes to restart) with no file changes --reason free-text reason recorded in rotation-history.jsonl --no-restart mint + archive but leave the daemon on the old key (staged rollouts) --force skip the running-sandbox confirmation prompt The companion `ca-rotation` doctor check that warns on rotation age and a `verify-trust` subcommand that inspects sandbox CAs are documented as planned follow-ups — they land once the broader `doctor` / `audit` work (currently on a separate stacked branch as a different PR) merges upstream. Stdlib-only. No new dependencies. Bugbot review fixes baked in (caught on the original fork PR, retargeted to upstream here as one clean commit): * Daemon restart after rotation now propagates the operator's credential-source choice (refresh_secrets_from_bitwarden, bitwarden_config) from config.yaml through rotate_ca to start_proxy. A bare start_proxy() call would have silently degraded a Bitwarden-mode proxy to "running but unable to inject credentials" — caught by Cursor Bugbot (medium severity). Regression test added: test_rotate_passes_bitwarden_args_through_to_start_proxy. * The duplicated _ca_not_before / _ca_not_after pair that Bugbot flagged (low severity / refactor) is resolved by simple elision — _ca_not_before was only used by the companion doctor check that's deferred to the doctor-stack follow-up, so it doesn't ship here at all. _ca_not_after stays as the single openssl-shelling date helper. Validation: 109 passed, 1 skipped in 3.88 s (tests/test_iron_proxy*.py — full iron-proxy suite, the skip is the E2E test that needs HERMES_RUN_E2E=1) 8/8 in tests/test_iron_proxy_rotate_ca.py specifically: - rotate with no daemon: writes + archives, no stop/start called - rotate with running daemon: stop_proxy then start_proxy in order - --dry-run: zero mtime changes (filesystem stat-before/after) - --reason recorded with valid 64-hex fingerprints (old != new) - archive pruning: 7 archives in → only the 5 most recent retained - --no-restart: daemon left alone (no stop/start calls) - bitwarden args propagated through rotate_ca to start_proxy (regression test for Bugbot finding) - fingerprint redaction: failures in fingerprint reads never echo a full SHA-256 in error messages — first-8/last-8 only Author: Bartok9 (Daniel Pike), opened in response to Teknium's invitation to @catalinmpit for a security review of NousResearch#30179. Complements NousResearch#30179 by turning the manual openssl-by-hand rotation step into a first-class audited command.
Contributor
Author
Polish — improved description bullets• |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why this complements #30179
#30179 explicitly cut one corner in its Scope cuts (v1) section:
This PR closes that gap: a first-class, audited rotation workflow so operators never have to hand-run
openssl genrsa, and the recovery path is exercised in CI instead of being first attempted under pressure.A companion host-hardening survey PR (
hermes egress harden) is open separately against the same base.What lands
hermes egress rotate-ca:ca.crtto~/.hermes/proxy/ca-archive/ca-<YYYYMMDD-HHMMSS>.crtbefore touching the live cert. Archive dir keeps the 5 most recent; older ones are pruned.ensure_ca_cert(force=True)— the same atomic0o600-key write +os.replacepath as first-boot generation.--no-restartisn't set and the daemon is alive:stop_proxy()thenstart_proxy(refresh_secrets_from_bitwarden=..., bitwarden_config=...)so the live proxy picks up the new signing key and keeps honouring the operator's existing credential-source choice.~/.hermes/proxy/rotation-history.jsonl:{"ts": "2026-05-30T05:32:00Z", "old_fingerprint_sha256": "...", "new_fingerprint_sha256": "...", "reason": "annual rotation", "operator": "alice", "subject": "/CN=hermes iron-proxy CA", "valid_until": "2036-05-28T05:32:00Z"}Flags:
--dry-run--reason TEXTrotation-history.jsonl(defaults tonull).--no-restart--forcehermes egress rotate-calists labeledhermes.sandbox=truecontainers and prompts by default when any are running — containers mountca.crtat create time, so a sandbox started before the rotation keeps trusting the old CA until recreated.New surfaces
hermes egress rotate-ca--dry-run,--reason,--no-restart,--forcerotate_ca()/plan_ca_rotation()__all__RotationPlanlast_rotation_entry()/list_hermes_sandboxes()~/.hermes/proxy/rotation-history.jsonl0o644)~/.hermes/proxy/ca-archive/0o700, last-5 retentionFailure modes considered
list_hermes_sandboxes()returns[], never raises_redact_fingerprint; rotation still completes (recordsnullfingerprint)os.replace)rotate-calists them and prompts by default (bypass with--force); console output tells the operator to restart them--dry-runplan_ca_rotation()is read-only; verified by stat-before/after in tests--no-restartrotation-history.jsonllinelast_rotation_entry()scans from the end, skips unparseable linesBugbot review fixes baked in
🟡 Medium severity — daemon restart silently degraded Bitwarden-mode proxies
The original implementation called
start_proxy()bare in the rotation restart path. Sincestart_proxydefaultsrefresh_secrets_from_bitwarden=Falseandbitwarden_config=None, an operator oncredential_source: bitwardenwould have come back up with a proxy that's running but unable to inject credentials — the worst kind of silent degradation. Caught by Cursor Bugbot on the original cross-fork iteration.Fix:
rotate_ca()now acceptsrefresh_secrets_from_bitwardenandbitwarden_configkwargs and forwards them tostart_proxy().cmd_rotate_cacomputes these fromproxy_cfgidentically tocmd_start(credential_source: bitwarden+secrets.bitwarden.enabled→ propagate). Regression test added:test_rotate_passes_bitwarden_args_through_to_start_proxy.🟢 Low severity — duplicated
_ca_not_before/_ca_not_afterBugbot flagged a ~25-line duplication between two openssl-shelling date helpers. Resolved by simple elision —
_ca_not_beforewas only used by the companion doctor check that's deferred to a follow-up PR (it needs aDoctorChecktype from a separate stack)._ca_not_afterstays as the single date helper in this PR.Validation
The 1 skip is the existing E2E test gated behind
HERMES_RUN_E2E=1(unchanged).feat/iron-proxy)tests/test_iron_proxy_rotate_ca.py)8/8 in the new test file:
stop_proxy/start_proxycalledstop_proxythenstart_proxy, in order--dry-run: zero mtime changes (filesystem stat-before/after)--reasonrecorded with valid 64-hex fingerprints (old ≠ new)--no-restart: daemon left alone (no stop/start calls)rotate_catostart_proxyManual end-to-end smoke (isolated
HERMES_HOME):--dry-runchanged nothing; realrotate-ca --reason "annual rotation"archived the old cert, minted a 10-year CA, and wrote a parseable history line.Coverage gaps
verify-trustsubcommand (verifying live sandboxes still trust the new CA after restart). Deliberately deferred — documented as a known follow-up in the user guide.rotate_caengine is fully tested, the prompt is a thin CLI wrapper).ca-rotationdoctor check that warns on rotation age depends on aDoctorChecktype that lives on a separate stacked branch (broaderdoctor/auditwork). Lands once that work merges upstream.Ambiguity flags
refresh_secrets_from_bitwarden,bitwarden_config) torotate_ca()rather than introducing aRotationOptionsdataclass. Two args, mirrorsstart_proxysignature, keeps the call site readable. If you want a single options object, that's a refactor we can do here or later._CA_ARCHIVE_KEEPconstant.input(), which means non-TTY callers (CI, scripts) have to pass--force. Documented in the help text. Open to flipping to "default proceed,--promptto ask" if that fits the project conventions better.Attribution
Opened by Bartok9 (Daniel Pike) at Daniel's request, in response to Teknium's invitation on X to @catalinmpit for a security review of #30179. Teknium's prompt was essentially "can we get a security review of how Hermes' egress proxy holds up in a real deployment?" — and this PR turns the manual
opensslrotation step #30179 mentions into a first-class audited command.