Install GitHub App PEM at boot from AWS Secrets Manager by vigo-agent[bot] · Pull Request #7 · AmbulnzLLC/hermes-agent

vigo-agent · 2026-05-20T14:02:26Z

What & why

Wires the AmbulnzLLC pilot's GitHub App private key into the container at boot so Hermes's built-in GitHubAuth._try_github_app() (in tools/skills_hub.py) can mint installation tokens automatically. With this in place, outbound git / GitHub API auth — hermes skills tap add AmbulnzLLC/<repo>, gh api, raw git clone https://x-access-token:${TOKEN}@github.com/... — Just Works inside a pilot pod, no per-session setup.

Companion to PR #6 (the ambulnz-github-app-auth skill, which documents the manual path for ad-hoc agent sessions). This PR does the provisioning half: opt-in via env var, no code changes elsewhere.

How it works

1. Operator sets GITHUB_APP_PEM_SECRET_ID (and optional GITHUB_APP_PEM_DEST_PATH, default /opt/data/secrets/github-app.pem) on the pod.
2. docker/entrypoint.sh (root section, before the gosu hermes drop) invokes docker/install_github_app_pem.py via the venv's Python.
3. The script fetches the named secret from AWS Secrets Manager, extracts PRIVATE_KEY from the JSON envelope, normalizes any literal \n two-byte sequences to real newlines (known secret-format quirk), validates the PEM header, and atomically writes to the destination at mode 0400 owned by hermes:hermes.
4. Operator separately sets GITHUB_APP_ID, GITHUB_APP_INSTALLATION_ID, and GITHUB_APP_PRIVATE_KEY_PATH (pointing at the file we just wrote). Hermes's existing skills-hub auth does the rest (JWT mint, install-token exchange, 3500s caching), all unchanged.
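
The fetch-and-validate part of the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of docker/install_github_app_pem.py: the names `PemInstallError`, `fetch_secret_string`, and `extract_pem` are hypothetical, the client is any object exposing boto3's `get_secret_value(SecretId=...)`, and the exact checks (a `-----BEGIN` prefix, a single trailing newline) are guesses at what "validates the PEM header" means in the real script.

```python
import json

PEM_HEADER = "-----BEGIN"  # assumption: any PEM block type is accepted


class PemInstallError(Exception):
    """Hypothetical error type for failures that should crash the boot."""


def fetch_secret_string(client, secret_id: str) -> str:
    """Fetch the secret through a boto3-style Secrets Manager client."""
    response = client.get_secret_value(SecretId=secret_id)
    secret = response.get("SecretString")
    if not secret:
        raise PemInstallError("secret has no SecretString")
    return secret


def extract_pem(secret_string: str) -> str:
    """Pull PRIVATE_KEY out of the JSON envelope and normalize it."""
    try:
        payload = json.loads(secret_string)
    except json.JSONDecodeError as exc:
        raise PemInstallError(f"secret is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise PemInstallError("secret JSON is not an object")
    key = payload.get("PRIVATE_KEY")
    if not isinstance(key, str) or not key.strip():
        raise PemInstallError("PRIVATE_KEY is missing or empty")
    # Turn literal backslash-n pairs into real newlines, then make sure
    # the text ends with exactly one newline (assumed normalization).
    pem = key.replace("\\n", "\n").strip() + "\n"
    if not pem.startswith(PEM_HEADER):
        raise PemInstallError("PRIVATE_KEY does not look like a PEM")
    return pem
```

Passing the client in as an argument keeps the parsing logic testable without AWS access; in a real pod the caller would presumably build it with `boto3.client("secretsmanager")`.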

Contract

GITHUB_APP_PEM_SECRET_ID unset → silent no-op. Generic / non-pilot deploys are not burdened. Same shape as seed_admin_config.py and HERMES_AUTH_JSON_BOOTSTRAP.
GITHUB_APP_PEM_SECRET_ID set → install is REQUIRED. Any failure (boto3 missing, AWS API error, malformed JSON, missing PRIVATE_KEY field, empty PRIVATE_KEY, non-PEM garbage, write failure) returns non-zero, and set -e in the entrypoint then crashes the boot. Loud > silent.
Idempotent. Re-running with the same secret content skips the rewrite (mtime-stable). Mode/owner are still re-asserted to correct any drift.
Atomic write. Tmp file + os.replace so a crash mid-write leaves the previous PEM intact.
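
The idempotency and atomic-write rules above can be sketched together. The function name `install_pem` and its return value are assumptions made for illustration, and the chown to hermes:hermes is left out because it needs root; only the tmp-file-plus-`os.replace` pattern and the "skip when unchanged, still re-assert mode" behavior come from the description.

```python
import os
import tempfile
from pathlib import Path


def install_pem(dest: Path, pem: str, mode: int = 0o400) -> bool:
    """Write pem to dest atomically; return True if the file was rewritten."""
    data = pem.encode("utf-8")
    dest.parent.mkdir(parents=True, exist_ok=True)

    if dest.exists() and dest.read_bytes() == data:
        # Content already current: leave mtime alone, but re-assert the mode.
        os.chmod(dest, mode)
        return False

    # The temp file lives in the destination directory so that os.replace
    # is a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.chmod(tmp_name, mode)
        os.replace(tmp_name, dest)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True
```

Because the rename only happens after the new bytes are flushed and the mode is set, a crash at any earlier point leaves the previous PEM untouched.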

Threat model

PEM is plaintext at rest. An attacker with shell as hermes already has the pod's IRSA / instance role and can call secretsmanager:GetSecretValue directly, so chmod 0400 + per-pod filesystem isolation is sufficient for the pilot. Stronger protection (KMS-encrypted on-disk store, in-memory-only key) is out of scope here and would require corresponding code changes in _try_github_app().

Files

| File | Purpose |
| --- | --- |
| `docker/install_github_app_pem.py` | New. Standalone install script. Mirrors the `seed_admin_config.py` design (env-driven, opt-in, crash-loud, silent no-op when unset). |
| `docker/entrypoint.sh` | +22 lines after the SOUL.md block in the root section. Calls the script via `${INSTALL_DIR}/.venv/bin/python3` because the venv is not yet activated and `boto3` lives in the `bedrock` extra. |
| `tests/scripts/test_install_github_app_pem.py` | New. 13 tests: silent no-op (unset, empty), happy path (install + 0400 + PEM trailer normalization), `\n` literal normalization, idempotency, region propagation from `AWS_REGION`, and every crash-loud failure mode (AWS error, missing `SecretString`, invalid JSON, missing `PRIVATE_KEY`, empty `PRIVATE_KEY`, non-PEM, non-object payload). |

Verification

Unit tests: pytest tests/scripts/test_install_github_app_pem.py → 13 passed.
Shell syntax: bash -n docker/entrypoint.sh clean.
Live smoke in a Vigo dev pod against the real vigo-github-app-key secret in us-west-2:
- First run: fetches secret → writes 1679-byte PEM at 0400 hermes:hermes → load_pem_private_key (from cryptography.hazmat.primitives.serialization) parses it cleanly as a 2048-bit RSA private key.
- Second run: detects matching contents, skips the write (mtime unchanged), re-asserts mode/owner.
- Hermes's _try_github_app() mints a ghs_… installation token from the on-disk PEM.

Operator runbook (AmbulnzLLC pilot)

Set on the pod:

GITHUB_APP_PEM_SECRET_ID=arn:aws:secretsmanager:us-west-2:<acct>:secret:vigo-github-app-key-XXXXXX
GITHUB_APP_ID=3780171
GITHUB_APP_INSTALLATION_ID=134007007
GITHUB_APP_PRIVATE_KEY_PATH=/opt/data/secrets/github-app.pem
AWS_REGION=us-west-2

(Optional override: GITHUB_APP_PEM_DEST_PATH if you want the PEM somewhere other than the default.)

The pod's IRSA role needs secretsmanager:GetSecretValue on the specific secret ARN and nothing else. Boot the pod; the entrypoint logs `[install_github_app_pem] installed PEM at /opt/data/secrets/github-app.pem (0400 hermes:hermes)` once on first start, then `... already current; re-asserting mode/owner` on subsequent boots.
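
One easy mistake with the variables above is pointing GITHUB_APP_PRIVATE_KEY_PATH somewhere other than where the PEM is written. A small preflight check could catch that before deploy. The helper below is purely hypothetical (nothing like `check_pilot_env` exists in the PR); it only encodes the relationships stated in the runbook.

```python
from typing import List, Mapping

DEFAULT_DEST = "/opt/data/secrets/github-app.pem"
REQUIRED_WHEN_ENABLED = (
    "GITHUB_APP_ID",
    "GITHUB_APP_INSTALLATION_ID",
    "GITHUB_APP_PRIVATE_KEY_PATH",
)


def check_pilot_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means consistent."""
    if not env.get("GITHUB_APP_PEM_SECRET_ID"):
        return []  # feature is off, nothing to check

    problems = [
        f"{name} is not set" for name in REQUIRED_WHEN_ENABLED if not env.get(name)
    ]
    # The key path Hermes reads must be the path the install script writes.
    dest = env.get("GITHUB_APP_PEM_DEST_PATH") or DEFAULT_DEST
    key_path = env.get("GITHUB_APP_PRIVATE_KEY_PATH")
    if key_path and key_path != dest:
        problems.append(
            f"GITHUB_APP_PRIVATE_KEY_PATH ({key_path}) differs from install path ({dest})"
        )
    return problems
```

Calling `check_pilot_env(os.environ)` in a deploy script would be one way to use it; an empty result means the variables agree with each other, not that the secret or IAM policy is correct.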

Out of scope

Provisioning the GitHub App itself (already done — App ID 3780171, Installation 134007007).
Rotation. The secret can be rotated externally; the next pod restart picks up the new key. No live-reload — App tokens cache for 3500s in-process via the existing _try_github_app() path, so a restart picks up the new key within ~1h regardless.
Stronger at-rest protection (KMS/in-memory). See Threat model above.
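
The 3500s in-process caching mentioned under Rotation can be pictured as a small time-to-live cache. This is a sketch of the general technique, not Hermes's `_try_github_app()` code: the class name, the injected `mint` callable, and the injectable clock are all assumptions made so the example runs without GitHub access.

```python
import time
from typing import Callable, Optional

TOKEN_TTL_SECONDS = 3500  # figure taken from the PR text


class InstallationTokenCache:
    """Hypothetical TTL cache around a token-minting function."""

    def __init__(
        self,
        mint: Callable[[], str],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._mint = mint
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, minting a new one once the TTL has passed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._mint()
            self._expires_at = now + TOKEN_TTL_SECONDS
        return self._token
```

A cache like this lives only in process memory, so it starts empty after every pod restart.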

Adds an opt-in entrypoint hook that fetches a GitHub App private-key PEM from AWS Secrets Manager and drops it on disk before the gosu drop, so Hermes's built-in GitHubAuth._try_github_app() (tools/skills_hub.py) can mint installation tokens for outbound git/GitHub API auth (private taps, gh CLI, raw `git clone`).

Triggered by GITHUB_APP_PEM_SECRET_ID; silent no-op when unset (generic deploys aren't burdened). When set, install is REQUIRED and any failure crashes the boot via `set -e`: a misconfigured pilot pod must crash-loop, not serve traffic without working GitHub auth.

- docker/install_github_app_pem.py
  Standalone script (mirrors seed_admin_config.py's design). Reads GITHUB_APP_PEM_SECRET_ID, fetches the secret via boto3, extracts PRIVATE_KEY from the JSON envelope, normalizes literal \n two-byte sequences to real newlines (a known secret-format quirk), validates the PEM header, and atomically writes to GITHUB_APP_PEM_DEST_PATH (default /opt/data/secrets/github-app.pem) at mode 0400 owned by hermes:hermes. Idempotent: skips the rewrite when the destination already matches.
- docker/entrypoint.sh
  Calls the script from the root section after the SOUL.md block. Uses /opt/hermes/.venv/bin/python3 directly because boto3 lives in the bedrock extra and the venv is not yet activated at this point in the entrypoint.
- tests/scripts/test_install_github_app_pem.py
  13 tests covering silent no-op, happy path, region selection, \n normalization, idempotency, and every crash-loud failure mode (AWS API error, missing SecretString, invalid JSON, missing PRIVATE_KEY, empty PRIVATE_KEY, non-PEM garbage, non-object payload).

Smoke-tested end-to-end against the real AmbulnzLLC vigo-github-app-key secret in us-west-2: fetches, normalizes, and writes a 2048-bit RSA PEM at 0400 hermes:hermes that cryptography's load_pem_private_key parses cleanly; the second run is an mtime-stable no-op.

github-actions · 2026-05-20T14:03:16Z

🔎 Lint report: `feat/ambulnz-github-app-pem-install` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8349 on HEAD, 8344 on base (🆕 +5)

🆕 New issues (5):

| Rule | Count |
| --- | --- |
| `unresolved-import` | 3 |
| `unresolved-attribute` | 2 |

First entries

tests/scripts/test_seed_skills.py:62: [unresolved-attribute] unresolved-attribute: Unresolved attribute `do_install` on type `ModuleType`
tests/scripts/test_seed_skills.py:64: [unresolved-attribute] unresolved-attribute: Unresolved attribute `skills_hub` on type `ModuleType`
tests/scripts/test_seed_taps.py:26: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/scripts/test_install_github_app_pem.py:27: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/scripts/test_seed_skills.py:28: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`

✅ Fixed issues: none

Unchanged: 4351 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Pairs with install_github_app_pem.py: that commit drops the GitHub App PEM at boot so Hermes can mint installation tokens for outbound git/API auth. This commit closes the loop by registering the tap entries that use those tokens, so the AmbulnzLLC pilot's shared-skills repo (and any private skill repos in similar deployments) appears in 'hermes skills browse / search / inspect / install' without operators having to run 'hermes skills tap add' after every deploy.

Behavior:
- HERMES_DEFAULT_TAPS unset/empty -> silent no-op (generic deploys)
- Set -> each entry is normalized and appended to taps.json if missing. Format is comma- or newline-separated 'owner/repo[@path]' or full URL.
- Idempotent: re-running on the same env doesn't duplicate entries and doesn't rewrite an already-correct file.
- Preserves user-added taps; only appends, never deletes. This is a SEED, not a lock: operators can 'hermes skills tap remove' afterwards, and the entry is re-seeded on the next boot.
- Refuses to clobber a malformed taps.json (non-zero exit), so a hand-edited file with a typo crash-loops rather than silently losing the user's customizations.
- Refuses URLs with embedded credentials (user:token@host); those belong in the App PEM, not in taps.json on the data volume.
- Atomic write (tmp + rename) so a crash mid-write leaves the previous taps.json intact.

Wired into entrypoint.sh after the gosu drop so the resulting taps.json is owned by the hermes user from the start (it lives at $HERMES_HOME/skills/.hub/taps.json, which the hermes user already owns).

Tests cover: parsing of all entry forms, separator handling, invalid entries, auth-URL rejection, no-op-on-empty, idempotence, dedup against trailing-slash variants, preservation of user-added taps, malformed-file crash, wrong-shape-file crash, multi-entry seeding, and atomic-write cleanup.
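
The parsing and append-only rules described for HERMES_DEFAULT_TAPS can be sketched as below. This is an illustration, not seed_taps.py: the function names are hypothetical, and taps are modeled as plain strings because the real taps.json schema is not shown here.

```python
import re
from typing import List
from urllib.parse import urlsplit


def split_entries(raw: str) -> List[str]:
    """Split a comma- or newline-separated env value into trimmed entries."""
    return [part.strip() for part in re.split(r"[,\n]", raw or "") if part.strip()]


def has_embedded_credentials(entry: str) -> bool:
    """True for URLs of the form scheme://user:token@host/..."""
    if "://" not in entry:
        return False  # 'owner/repo[@path]' shorthand, not a URL
    parts = urlsplit(entry)
    return parts.username is not None or parts.password is not None


def normalize_tap(entry: str) -> str:
    """Canonical form used for dedup: no trailing slash."""
    if has_embedded_credentials(entry):
        # Deliberately do not echo the entry: it contains a secret.
        raise ValueError("tap URL must not embed credentials")
    return entry.rstrip("/")


def merge_taps(existing: List[str], raw_env: str) -> List[str]:
    """Append missing taps; never reorder or delete existing ones."""
    seen = {tap.rstrip("/") for tap in existing}
    merged = list(existing)
    for entry in split_entries(raw_env):
        tap = normalize_tap(entry)
        if tap not in seen:
            seen.add(tap)
            merged.append(tap)
    return merged
```

Running `merge_taps` twice with the same environment value returns the same list the second time, which is the property that lets a seeding script skip the rewrite.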

Pairs with seed_taps.py: that registers private skill repos as taps; this commit closes the loop by installing specific skills from those taps (or any other source 'hermes skills install' accepts: well-known indexes, direct SKILL.md URLs, etc.) so the AmbulnzLLC pilot's required skills are present on every pod immediately after boot, instead of the operator having to run 'hermes skills install' for each one post-deploy.

Behavior:
- HERMES_DEFAULT_SKILLS unset/empty -> silent no-op (generic deploys)
- Set -> each entry is fed to hermes_cli.skills_hub.do_install with skip_confirm=True (boot is non-interactive) and force=False. The admin-baked path explicitly does NOT bypass scan verdicts; that's a manual decision.
- Format: comma- or newline-separated identifiers (e.g. 'AmbulnzLLC/hermes-shared-skills/skills/data-engineering/airflow-dag') with optional '@<name>' suffix for URL-sourced skills whose SKILL.md has no 'name:' frontmatter field.
- Idempotent: skills already in lock.json (matched by recorded identifier, trailing-slash insensitive) are skipped without invoking the installer.
- Non-fatal: a single failed install (network blip, scan blocked, malformed SKILL.md, do_install calling sys.exit) emits a warning and the script exits 0 so boot continues. A bad private-skill repo must not crash-loop the pod and take chat down with it.
- Refuses URLs with embedded credentials; those belong in the App PEM, not in HERMES_DEFAULT_SKILLS (visible in process tables and pod logs).
- Malformed lock.json is warned about, not fatal; installs will still attempt and surface their own errors.

Wired into entrypoint.sh after seed_taps.py so any tap-relative identifiers resolve. Runs as the hermes user (post-gosu) so installed skill files end up with correct ownership without an extra chown.

This is a SEED, not a lock. After boot the hermes user can 'hermes skills uninstall <name>' for ad-hoc debugging, but on the next container boot the skill will be re-installed. Admin desired-state wins across reboots.

Tests cover: parsing of bare/URL forms, '@<name>' suffix, separator handling, invalid entries, auth-URL rejection, no-op-on-empty, idempotence via lockfile match, trailing-slash dedup, partial-failure continuation, SystemExit catch, missing lockfile, and malformed-lockfile non-fatal handling. 17 tests, all passing.
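
The non-fatal, lockfile-aware install loop described above might look roughly like this. It is a sketch under assumptions: `seed_skills` is a hypothetical name, `install` stands in for whatever wraps the real installer, and `installed` is the set of identifiers already recorded in the lockfile.

```python
import sys
from typing import Callable, Iterable, List


def seed_skills(
    identifiers: Iterable[str],
    installed: Iterable[str],
    install: Callable[[str], None],
) -> List[str]:
    """Install each missing identifier; return the ones that failed."""
    already = {item.rstrip("/") for item in installed}
    failed: List[str] = []
    for identifier in identifiers:
        if identifier.rstrip("/") in already:
            continue  # recorded in the lockfile: skip without calling install
        try:
            install(identifier)
        except (Exception, SystemExit) as exc:
            # SystemExit is caught explicitly because an installer that calls
            # sys.exit() would otherwise terminate the whole boot script.
            print(f"warning: could not install {identifier}: {exc!r}", file=sys.stderr)
            failed.append(identifier)
    return failed
```

The caller would log the returned failures and still exit 0, which matches the "warn and continue" contract; KeyboardInterrupt is intentionally not swallowed.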

Admin-curated entries in HERMES_DEFAULT_SKILLS are reviewed out-of-band before being added to the seed list, so the scanner's per-pod caution verdicts shouldn't block them at boot. Without this, any default skill that trips scan heuristics (e.g. ambulnz-github-app-auth, which the scanner flags for env-var reads and a redacted git URL example) is silently dropped on every fresh pod, requiring an operator to manually re-run `hermes skills install --force` for each one. The forbid-force comment in the module docstring is replaced with the new contract, including the escape hatch (drop the entry from HERMES_DEFAULT_SKILLS if you want scan verdicts respected for it).

Hawk Newton and others added 4 commits May 20, 2026 10:45

Install bedrock

675d964

hawknewton merged commit bfb2c4f into main May 20, 2026
15 of 18 checks passed

hawknewton merged 5 commits into main from feat/ambulnz-github-app-pem-install
