
Install GitHub App PEM at boot from AWS Secrets Manager #7

Merged: hawknewton merged 5 commits into main from feat/ambulnz-github-app-pem-install on May 20, 2026

vigo-agent (bot) commented May 20, 2026

What & why

Wires the AmbulnzLLC pilot's GitHub App private key into the container at boot so Hermes's built-in GitHubAuth._try_github_app() (in tools/skills_hub.py) can mint installation tokens automatically. With this in place, outbound git / GitHub API auth — hermes skills tap add AmbulnzLLC/<repo>, gh api, raw git clone https://x-access-token:${TOKEN}@github.com/... — Just Works inside a pilot pod, no per-session setup.

Companion to PR #6 (the ambulnz-github-app-auth skill, which documents the manual path for ad-hoc agent sessions). This PR does the provisioning half: opt-in via env var, no code changes elsewhere.

How it works

  1. Operator sets GITHUB_APP_PEM_SECRET_ID (and optional GITHUB_APP_PEM_DEST_PATH, default /opt/data/secrets/github-app.pem) on the pod.
  2. docker/entrypoint.sh (root section, before the gosu hermes drop) invokes docker/install_github_app_pem.py via the venv's Python.
  3. The script fetches the named secret from AWS Secrets Manager, extracts PRIVATE_KEY from the JSON envelope, normalizes any literal \n two-byte sequences to real newlines (known secret-format quirk), validates the PEM header, and atomically writes to the destination at mode 0400 owned by hermes:hermes.
  4. Operator separately sets GITHUB_APP_ID, GITHUB_APP_INSTALLATION_ID, and GITHUB_APP_PRIVATE_KEY_PATH (pointing at the file we just wrote). Hermes's existing skills-hub auth does the rest — JWT mint, install-token exchange, 3500s caching — all unchanged.
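
Step 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of docker/install_github_app_pem.py: the helper name `extract_pem` and the exact header check are hypothetical, and only the `PRIVATE_KEY` field name and the literal `\n` normalization are taken from the description above.

```python
import json


def extract_pem(secret_string: str) -> str:
    """Pull a PEM out of a Secrets Manager JSON envelope (hypothetical helper)."""
    payload = json.loads(secret_string)  # malformed JSON raises a ValueError subclass
    if not isinstance(payload, dict):
        raise ValueError("secret payload is not a JSON object")
    pem = payload.get("PRIVATE_KEY")
    if not isinstance(pem, str) or not pem.strip():
        raise ValueError("PRIVATE_KEY is missing or empty")
    # Turn literal backslash-n pairs into real newlines and end with one newline.
    pem = pem.replace("\\n", "\n").strip() + "\n"
    # Assumed validation: accept any "-----BEGIN ... PRIVATE KEY-----" first line.
    first_line = pem.splitlines()[0]
    if not (first_line.startswith("-----BEGIN ") and first_line.endswith("PRIVATE KEY-----")):
        raise ValueError("payload does not look like a PEM private key")
    return pem
```

Every failure surfaces as a `ValueError`, which a caller could map to a non-zero exit code.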

Contract

  • GITHUB_APP_PEM_SECRET_ID unset → silent no-op. Generic / non-pilot deploys are not burdened. Same shape as seed_admin_config.py and HERMES_AUTH_JSON_BOOTSTRAP.
  • GITHUB_APP_PEM_SECRET_ID set → install is REQUIRED. Any failure (boto3 missing, AWS API error, malformed JSON, missing PRIVATE_KEY field, empty PRIVATE_KEY, non-PEM garbage, write failure) returns non-zero, and set -e in the entrypoint then crashes the boot. Loud > silent.
  • Idempotent. Re-running with the same secret content skips the rewrite (mtime-stable). Mode/owner are still re-asserted to correct any drift.
  • Atomic write. Tmp file + os.replace so a crash mid-write leaves the previous PEM intact.
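
The contract above might look like this in code. This is a sketch only: `install_pem`, `main`, and the injected `fetch_pem` callable are hypothetical names, the warning text is illustrative, and the chown to hermes:hermes is left out because it needs root.

```python
import os
import sys
import tempfile


def install_pem(pem: str, dest: str, mode: int = 0o400) -> bool:
    """Write pem to dest atomically; return True only if the file was rewritten."""
    data = pem.encode()
    try:
        with open(dest, "rb") as fh:
            unchanged = fh.read() == data
    except FileNotFoundError:
        unchanged = False
    if not unchanged:
        dest_dir = os.path.dirname(dest) or "."
        os.makedirs(dest_dir, exist_ok=True)
        # The temp file shares a directory with dest so os.replace stays on one filesystem.
        fd, tmp = tempfile.mkstemp(dir=dest_dir, prefix=".pem-")
        try:
            with os.fdopen(fd, "wb") as fh:
                fh.write(data)
            os.chmod(tmp, mode)
            os.replace(tmp, dest)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
    # Re-assert the mode even when the content matched, to correct drift.
    os.chmod(dest, mode)
    return not unchanged


def main(env: dict, fetch_pem) -> int:
    secret_id = env.get("GITHUB_APP_PEM_SECRET_ID", "").strip()
    if not secret_id:
        return 0  # opt-in: unset or empty means silent no-op
    dest = env.get("GITHUB_APP_PEM_DEST_PATH") or "/opt/data/secrets/github-app.pem"
    try:
        install_pem(fetch_pem(secret_id), dest)
    except Exception as exc:
        print(f"PEM install failed: {exc}", file=sys.stderr)
        return 1  # non-zero exit lets `set -e` in the entrypoint abort the boot
    return 0
```

Skipping the rewrite when the bytes already match is what keeps the mtime stable across boots.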

Threat model

PEM is plaintext at rest. An attacker with shell as hermes already has the pod's IRSA / instance role and can call secretsmanager:GetSecretValue directly, so chmod 0400 + per-pod filesystem isolation is sufficient for the pilot. Stronger protection (KMS-encrypted on-disk store, in-memory-only key) is out of scope here and would require corresponding code changes in _try_github_app().

Files

| File | Purpose |
| --- | --- |
| docker/install_github_app_pem.py | New. Standalone install script. Mirrors the seed_admin_config.py design (env-driven, opt-in, crash-loud, silent no-op when unset). |
| docker/entrypoint.sh | +22 lines after the SOUL.md block in the root section. Calls the script via ${INSTALL_DIR}/.venv/bin/python3 because the venv is not yet activated and boto3 lives in the bedrock extra. |
| tests/scripts/test_install_github_app_pem.py | New. 13 tests: silent no-op (unset, empty), happy path (install + 0400 + PEM trailer normalization), \n literal normalization, idempotency, region propagation from AWS_REGION, and every crash-loud failure mode (AWS error, missing SecretString, invalid JSON, missing PRIVATE_KEY, empty PRIVATE_KEY, non-PEM, non-object payload). |

Verification

  • Unit tests: pytest tests/scripts/test_install_github_app_pem.py → 13 passed.
  • Shell syntax: bash -n docker/entrypoint.sh clean.
  • Live smoke in a Vigo dev pod against the real vigo-github-app-key secret in us-west-2:
    • First run: fetches secret → writes 1679-byte PEM at 0o400 hermes:hermes; cryptography.hazmat.primitives.serialization.load_pem_private_key parses it cleanly as a 2048-bit RSA private key.
    • Second run: detects matching contents, skips the write (mtime unchanged), re-asserts mode/owner.
    • Hermes's _try_github_app() mints a ghs_… installation token from the on-disk PEM.

Operator runbook (AmbulnzLLC pilot)

Set on the pod:

GITHUB_APP_PEM_SECRET_ID=arn:aws:secretsmanager:us-west-2:<acct>:secret:vigo-github-app-key-XXXXXX
GITHUB_APP_ID=3780171
GITHUB_APP_INSTALLATION_ID=134007007
GITHUB_APP_PRIVATE_KEY_PATH=/opt/data/secrets/github-app.pem
AWS_REGION=us-west-2

(Optional override: GITHUB_APP_PEM_DEST_PATH if you want the PEM somewhere other than the default.)

The pod's IRSA role needs secretsmanager:GetSecretValue on the specific secret ARN — that's it. Boot the pod; the entrypoint logs [install_github_app_pem] installed PEM at /opt/data/secrets/github-app.pem (0400 hermes:hermes) once on first start, then ... already current; re-asserting mode/owner on subsequent boots.
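
One easy mistake with this runbook is pointing GITHUB_APP_PRIVATE_KEY_PATH at a different file than the one the installer writes. A small preflight check along these lines could catch that before deploy. It is purely illustrative: `preflight` is a hypothetical helper, not part of this PR, and it only inspects the variable names listed above.

```python
DEFAULT_DEST = "/opt/data/secrets/github-app.pem"
REQUIRED = (
    "GITHUB_APP_PEM_SECRET_ID",
    "GITHUB_APP_ID",
    "GITHUB_APP_INSTALLATION_ID",
    "GITHUB_APP_PRIVATE_KEY_PATH",
)


def preflight(env):
    """Return a list of problems; an empty list means the env looks consistent."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    written = env.get("GITHUB_APP_PEM_DEST_PATH") or DEFAULT_DEST
    read = env.get("GITHUB_APP_PRIVATE_KEY_PATH")
    if read and read != written:
        problems.append(f"PEM is written to {written} but read from {read}")
    return problems
```

It takes any mapping, so it can be called with `dict(os.environ)` inside the pod or with a dict built from the deployment manifest.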

Out of scope

  • Provisioning the GitHub App itself (already done — App ID 3780171, Installation 134007007).
  • Rotation. The secret can be rotated externally, and the next pod restart picks up the new key. There is no live reload: a running pod keeps the PEM it installed at boot, and installation tokens are cached in-process for 3500s (~1h) via the existing _try_github_app() path.
  • Stronger at-rest protection (KMS/in-memory). See Threat model above.

Adds an opt-in entrypoint hook that fetches a GitHub App private-key PEM
from AWS Secrets Manager and drops it on disk before the gosu drop, so
Hermes's built-in GitHubAuth._try_github_app() (tools/skills_hub.py) can
mint installation tokens for outbound git/GitHub API auth (private taps,
gh CLI, raw `git clone`).

Triggered by GITHUB_APP_PEM_SECRET_ID; silent no-op when unset (generic
deploys aren't burdened).  When set, install is REQUIRED and any failure
crashes the boot via `set -e` — a misconfigured pilot pod must
crash-loop, not serve traffic without working GitHub auth.

- docker/install_github_app_pem.py
    Standalone script (mirrors seed_admin_config.py's design).  Reads
    GITHUB_APP_PEM_SECRET_ID, fetches the secret via boto3, extracts
    PRIVATE_KEY from the JSON envelope, normalizes literal \n two-byte
    sequences to real newlines (a known secret-format quirk), validates
    the PEM header, atomically writes to GITHUB_APP_PEM_DEST_PATH
    (default /opt/data/secrets/github-app.pem) at mode 0400 owned by
    hermes:hermes.  Idempotent: skips the rewrite when the destination
    already matches.

- docker/entrypoint.sh
    Calls the script from the root section after the SOUL.md block.
    Uses /opt/hermes/.venv/bin/python3 directly because boto3 lives in
    the bedrock extra and the venv is not yet activated at this point
    in the entrypoint.

- tests/scripts/test_install_github_app_pem.py
    13 tests covering silent no-op, happy path, region selection, \n
    normalization, idempotency, and every crash-loud failure mode (AWS
    API error, missing SecretString, invalid JSON, missing PRIVATE_KEY,
    empty PRIVATE_KEY, non-PEM garbage, non-object payload).

Smoke-tested end-to-end against the real AmbulnzLLC vigo-github-app-key
secret in us-west-2: fetches, normalizes, writes a 2048-bit RSA PEM at
0400 hermes:hermes that cryptography.load_pem_private_key parses
cleanly; second run is a mtime-stable no-op.

github-actions Bot commented May 20, 2026

🔎 Lint report: feat/ambulnz-github-app-pem-install vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8349 on HEAD, 8344 on base (🆕 +5)

🆕 New issues (5):

| Rule | Count |
| --- | --- |
| unresolved-import | 3 |
| unresolved-attribute | 2 |

First entries:
tests/scripts/test_seed_skills.py:62: [unresolved-attribute] unresolved-attribute: Unresolved attribute `do_install` on type `ModuleType`
tests/scripts/test_seed_skills.py:64: [unresolved-attribute] unresolved-attribute: Unresolved attribute `skills_hub` on type `ModuleType`
tests/scripts/test_seed_taps.py:26: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/scripts/test_install_github_app_pem.py:27: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/scripts/test_seed_skills.py:28: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`

✅ Fixed issues: none

Unchanged: 4351 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Hawk Newton and others added 4 commits May 20, 2026 10:45
Pairs with install_github_app_pem.py: that commit drops the GitHub App
PEM at boot so Hermes can mint installation tokens for outbound git/API
auth.  This commit closes the loop by registering the tap entries that
use those tokens, so the AmbulnzLLC pilot's shared-skills repo (and any
private skill repos in similar deployments) appears in 'hermes skills
browse / search / inspect / install' without operators having to run
'hermes skills tap add' after every deploy.

Behavior:
- HERMES_DEFAULT_TAPS unset/empty -> silent no-op (generic deploys)
- Set -> each entry is normalized and appended to taps.json if missing.
  Format is comma- or newline-separated 'owner/repo[@path]' or full URL.
- Idempotent: re-running on the same env doesn't duplicate entries and
  doesn't rewrite an already-correct file.
- Preserves user-added taps; only appends, never deletes.  This is a
  SEED, not a lock — operators can 'hermes skills tap remove' afterwards
  and the entry is re-seeded on the next boot.
- Refuses to clobber a malformed taps.json (non-zero exit), so a
  hand-edited file with a typo crash-loops rather than silently losing
  the user's customizations.
- Refuses URLs with embedded credentials (user:token@host) — those
  belong in the App PEM, not in taps.json on the data volume.
- Atomic write (tmp + rename) so a crash mid-write leaves the previous
  taps.json intact.
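
The parsing and append-only rules above could be sketched like this. Assumptions are flagged loudly: `parse_entries` and `merge_taps` are hypothetical names, the owner/repo regex is a guess at what counts as valid, and taps are modelled as a plain list of strings because the real taps.json schema is not shown here.

```python
import re
from urllib.parse import urlsplit


def parse_entries(raw: str) -> list[str]:
    """Split a comma- or newline-separated tap list into normalized entries."""
    entries: list[str] = []
    for item in re.split(r"[,\n]", raw):
        item = item.strip().rstrip("/")
        if not item:
            continue
        if "://" in item:
            parts = urlsplit(item)
            if parts.username or parts.password:
                # Do not echo the entry: it contains the credential.
                raise ValueError("tap URL must not embed credentials")
        elif not re.fullmatch(r"[\w.-]+/[\w.-]+(@[\w./-]+)?", item):
            raise ValueError(f"invalid tap entry: {item!r}")
        if item not in entries:
            entries.append(item)
    return entries


def merge_taps(existing: list[str], wanted: list[str]) -> list[str]:
    """Append missing entries only; never delete or reorder what is already there."""
    seen = {e.rstrip("/") for e in existing}
    return existing + [w for w in wanted if w not in seen]
```

When `merge_taps` returns a list equal to `existing`, the caller can skip the write entirely, which is what makes a re-run on the same environment a no-op.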

Wired into entrypoint.sh after the gosu drop so the resulting
taps.json is owned by the hermes user from the start (it lives at
$HERMES_HOME/skills/.hub/taps.json which the hermes user already owns).

Tests cover: parsing of all entry forms, separator handling, invalid
entries, auth-URL rejection, no-op-on-empty, idempotence, dedup against
trailing-slash variants, preservation of user-added taps, malformed-file
crash, wrong-shape-file crash, multi-entry seeding, and atomic-write
cleanup.

Pairs with seed_taps.py: that registers private skill repos as taps;
this commit closes the loop by installing specific skills from those
taps (or any other source 'hermes skills install' accepts — well-known
indexes, direct SKILL.md URLs, etc.) so the AmbulnzLLC pilot's required
skills are present on every pod immediately after boot, instead of the
operator having to run 'hermes skills install' for each one post-deploy.

Behavior:
- HERMES_DEFAULT_SKILLS unset/empty -> silent no-op (generic deploys)
- Set -> each entry is fed to hermes_cli.skills_hub.do_install with
  skip_confirm=True (boot is non-interactive) and force=False.  The
  admin-baked path explicitly does NOT bypass scan verdicts; that's a
  manual decision.
- Format: comma- or newline-separated identifiers (e.g.
  'AmbulnzLLC/hermes-shared-skills/skills/data-engineering/airflow-dag')
  with optional '@<name>' suffix for URL-sourced skills whose SKILL.md
  has no 'name:' frontmatter field.
- Idempotent: skills already in lock.json (matched by recorded
  identifier, trailing-slash insensitive) are skipped without invoking
  the installer.
- Non-fatal: a single failed install (network blip, scan blocked,
  malformed SKILL.md, do_install calling sys.exit) emits a warning and
  the script exits 0 so boot continues.  A bad private-skill repo must
  not crash-loop the pod and take chat down with it.
- Refuses URLs with embedded credentials — those belong in the App PEM,
  not in HERMES_DEFAULT_SKILLS (visible in process tables and pod logs).
- Malformed lock.json is warned about, not fatal — installs will still
  attempt and surface their own errors.
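
The idempotent, non-fatal loop described above might be structured as follows. This is a sketch under assumptions: `seed_skills` is a hypothetical name, and the installer is passed in as a plain callable rather than calling hermes_cli.skills_hub.do_install directly, since its full signature is not shown here.

```python
import sys
from typing import Callable, Iterable


def seed_skills(
    wanted: Iterable[str],
    installed: set[str],
    install: Callable[[str], None],
) -> list[str]:
    """Install each missing skill; return the identifiers that failed."""
    have = {ident.rstrip("/") for ident in installed}
    failed: list[str] = []
    for ident in wanted:
        if ident.rstrip("/") in have:
            continue  # already recorded in the lockfile: skip the installer
        try:
            install(ident)
        except (Exception, SystemExit) as exc:
            # SystemExit is not an Exception subclass, so it is caught explicitly.
            print(f"warning: could not install {ident}: {exc!r}", file=sys.stderr)
            failed.append(ident)
    return failed
```

A caller that exits 0 whatever `failed` contains reproduces the "boot continues" behavior, while still leaving a warning in the logs for each skipped skill.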

Wired into entrypoint.sh after seed_taps.py so any tap-relative
identifiers resolve.  Runs as the hermes user (post-gosu) so installed
skill files end up with correct ownership without an extra chown.

This is a SEED, not a lock.  After boot the hermes user can
'hermes skills uninstall <name>' for ad-hoc debugging — but on the next
container boot the skill will be re-installed.  Admin desired-state
wins across reboots.

Tests cover: parsing of bare/URL forms, '@<name>' suffix, separator
handling, invalid entries, auth-URL rejection, no-op-on-empty,
idempotence via lockfile match, trailing-slash dedup, partial-failure
continuation, SystemExit catch, missing lockfile, and malformed-lockfile
non-fatal handling.  17 tests, all passing.

Admin-curated entries in HERMES_DEFAULT_SKILLS are reviewed out-of-band
before being added to the seed list, so the scanner's per-pod caution
verdicts shouldn't block them at boot. Without this, any default skill
that trips scan heuristics (e.g. ambulnz-github-app-auth, which the
scanner flags for env-var reads and a redacted git URL example) is
silently dropped on every fresh pod, requiring an operator to manually
re-run `hermes skills install --force` for each one.

The forbid-force comment in the module docstring is replaced with the
new contract, including the escape hatch (drop the entry from
HERMES_DEFAULT_SKILLS if you want scan verdicts respected for it).
@hawknewton hawknewton merged commit bfb2c4f into main May 20, 2026
15 of 18 checks passed