Install GitHub App PEM at boot from AWS Secrets Manager #7
Merged
Adds an opt-in entrypoint hook that fetches a GitHub App private-key PEM
from AWS Secrets Manager and drops it on disk before the gosu drop, so
Hermes's built-in GitHubAuth._try_github_app() (tools/skills_hub.py) can
mint installation tokens for outbound git/GitHub API auth (private taps,
gh CLI, raw `git clone`).
Triggered by GITHUB_APP_PEM_SECRET_ID; silent no-op when unset (generic
deploys aren't burdened). When set, install is REQUIRED and any failure
crashes the boot via `set -e` — a misconfigured pilot pod must
crash-loop, not serve traffic without working GitHub auth.
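The opt-in contract above can be sketched as a small function whose return value becomes the process exit code that `set -e` reacts to. This is an illustration under assumptions, not the script's actual code: `main`, its `install` parameter, and the `FATAL` wording are hypothetical names chosen here, and treating an empty value like an unset one is also an assumption. Only `GITHUB_APP_PEM_SECRET_ID` and the no-op/required behavior come from the description.

```python
import sys
from typing import Callable, Mapping


def main(env: Mapping[str, str], install: Callable[[str], None]) -> int:
    """Return the exit code for the PEM-install hook.

    `install` is a hypothetical callable that fetches and writes the PEM
    for the given secret ID and raises on any failure.
    """
    secret_id = env.get("GITHUB_APP_PEM_SECRET_ID", "").strip()
    if not secret_id:
        # Unset (or empty, by assumption): silent no-op for generic deploys.
        return 0
    try:
        install(secret_id)
    except Exception as exc:
        # Any failure is loud: report it and exit non-zero so that
        # `set -e` in the entrypoint aborts the boot.
        print(f"[install_github_app_pem] FATAL: {exc}", file=sys.stderr)
        return 1
    return 0
```

A real script would call something like `sys.exit(main(os.environ, install_fn))`; passing the environment and the installer in as arguments keeps the control flow testable without AWS access.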
- docker/install_github_app_pem.py
  Standalone script (mirrors seed_admin_config.py's design). Reads
  GITHUB_APP_PEM_SECRET_ID, fetches the secret via boto3, extracts
  PRIVATE_KEY from the JSON envelope, normalizes literal \n two-byte
  sequences to real newlines (a known secret-format quirk), validates
  the PEM header, and atomically writes to GITHUB_APP_PEM_DEST_PATH
  (default /opt/data/secrets/github-app.pem) at mode 0400 owned by
  hermes:hermes. Idempotent: skips the rewrite when the destination
  already matches.
- docker/entrypoint.sh
  Calls the script from the root section after the SOUL.md block.
  Uses /opt/hermes/.venv/bin/python3 directly because boto3 lives in
  the bedrock extra and the venv is not yet activated at this point
  in the entrypoint.
- tests/scripts/test_install_github_app_pem.py
  13 tests covering silent no-op, happy path, region selection, \n
  normalization, idempotency, and every crash-loud failure mode (AWS
  API error, missing SecretString, invalid JSON, missing PRIVATE_KEY,
  empty PRIVATE_KEY, non-PEM garbage, non-object payload).
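The envelope handling and the failure modes listed above can be sketched roughly as follows. The function name `extract_pem` and the exact header check are assumptions made for illustration; the real script may validate the PEM differently. The sketch takes the already-fetched `SecretString` as input so it runs without boto3.

```python
import json


def extract_pem(secret_string: str) -> str:
    """Pull a PEM out of a JSON envelope shaped like {"PRIVATE_KEY": "..."}."""
    try:
        payload = json.loads(secret_string)
    except json.JSONDecodeError as exc:
        raise ValueError(f"secret is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("secret JSON must be an object")
    key = payload.get("PRIVATE_KEY")
    if not isinstance(key, str) or not key.strip():
        raise ValueError("PRIVATE_KEY is missing or empty")
    # Turn literal backslash-n (two characters) into real newlines.
    pem = key.replace("\\n", "\n").strip() + "\n"
    # Header check only (assumed form); it does not prove the key is valid.
    if not pem.startswith("-----BEGIN ") or "PRIVATE KEY-----" not in pem:
        raise ValueError("PRIVATE_KEY does not look like a PEM private key")
    return pem
```

Every rejected input raises `ValueError`, which a caller can turn into a non-zero exit so the boot fails loudly.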
Smoke-tested end-to-end against the real AmbulnzLLC vigo-github-app-key
secret in us-west-2: fetches, normalizes, writes a 2048-bit RSA PEM at
0400 hermes:hermes that cryptography's load_pem_private_key parses
cleanly; a second run is an mtime-stable no-op.
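The atomic, idempotent write described in this commit could look something like the sketch below. `write_pem` is a hypothetical helper name, and the ownership change to hermes:hermes is left out because it needs root; the temp-file-plus-`os.replace` pattern is the standard way to get an atomic swap on one filesystem.

```python
import os
import tempfile


def write_pem(pem: str, dest: str) -> bool:
    """Write `pem` to `dest` at mode 0400; return True if the file changed."""
    data = pem.encode()
    try:
        with open(dest, "rb") as fh:
            if fh.read() == data:
                # Already current: leave content and mtime alone, but
                # re-assert the mode in case it drifted.
                os.chmod(dest, 0o400)
                return False
    except FileNotFoundError:
        pass
    directory = os.path.dirname(dest) or "."
    os.makedirs(directory, exist_ok=True)
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".pem-")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.chmod(tmp, 0o400)
        os.replace(tmp, dest)  # atomic swap; a crash before this keeps the old file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return True
```

Comparing the existing bytes before writing is what keeps the second run mtime-stable.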
🔎 Lint report:
| Rule | Count |
|---|---|
| unresolved-import | 3 |
| unresolved-attribute | 2 |
First entries
tests/scripts/test_seed_skills.py:62: [unresolved-attribute] unresolved-attribute: Unresolved attribute `do_install` on type `ModuleType`
tests/scripts/test_seed_skills.py:64: [unresolved-attribute] unresolved-attribute: Unresolved attribute `skills_hub` on type `ModuleType`
tests/scripts/test_seed_taps.py:26: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/scripts/test_install_github_app_pem.py:27: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/scripts/test_seed_skills.py:28: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
✅ Fixed issues: none
Unchanged: 4351 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
Pairs with install_github_app_pem.py: that commit drops the GitHub App PEM at boot so Hermes can mint installation tokens for outbound git/API auth. This commit closes the loop by registering the tap entries that use those tokens, so the AmbulnzLLC pilot's shared-skills repo (and any private skill repos in similar deployments) appears in 'hermes skills browse / search / inspect / install' without operators having to run 'hermes skills tap add' after every deploy.
Behavior:
- HERMES_DEFAULT_TAPS unset/empty -> silent no-op (generic deploys).
- Set -> each entry is normalized and appended to taps.json if missing. Format is comma- or newline-separated 'owner/repo[@path]' or full URL.
- Idempotent: re-running on the same env doesn't duplicate entries and doesn't rewrite an already-correct file.
- Preserves user-added taps; only appends, never deletes. This is a SEED, not a lock: operators can 'hermes skills tap remove' afterwards and the entry is re-seeded on the next boot.
- Refuses to clobber a malformed taps.json (non-zero exit), so a hand-edited file with a typo crash-loops rather than silently losing the user's customizations.
- Refuses URLs with embedded credentials (user:token@host); those belong in the App PEM, not in taps.json on the data volume.
- Atomic write (tmp + rename) so a crash mid-write leaves the previous taps.json intact.
Wired into entrypoint.sh after the gosu drop so the resulting taps.json is owned by the hermes user from the start (it lives at $HERMES_HOME/skills/.hub/taps.json, which the hermes user already owns).
Tests cover: parsing of all entry forms, separator handling, invalid entries, auth-URL rejection, no-op-on-empty, idempotence, dedup against trailing-slash variants, preservation of user-added taps, malformed-file crash, wrong-shape-file crash, multi-entry seeding, and atomic-write cleanup.
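The parsing and append-only rules in this commit message can be illustrated with a short sketch. The names `parse_tap_entries` and `seed` are hypothetical, and modeling taps.json as a flat list of strings is an assumption made here for brevity; the real file schema is not shown in this PR.

```python
from urllib.parse import urlsplit


def parse_tap_entries(raw: str) -> list[str]:
    """Split a comma- or newline-separated tap list and normalize entries."""
    entries: list[str] = []
    for item in raw.replace("\n", ",").split(","):
        item = item.strip().rstrip("/")
        if not item:
            continue
        if "://" in item:
            parts = urlsplit(item)
            if parts.username or parts.password:
                # Do not echo the URL: it would leak the credential into logs.
                raise ValueError("tap URL must not embed credentials")
        if item not in entries:
            entries.append(item)
    return entries


def seed(existing: list[str], wanted: list[str]) -> list[str]:
    """Append-only merge: keep user-added taps, add only what is missing."""
    seen = {e.rstrip("/") for e in existing}
    return existing + [w for w in wanted if w not in seen]
```

Because `seed` never removes or reorders existing entries, an unchanged result means the caller can skip the write entirely, which matches the idempotence rule above.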
Pairs with seed_taps.py: that commit registers private skill repos as taps; this commit closes the loop by installing specific skills from those taps (or any other source 'hermes skills install' accepts: well-known indexes, direct SKILL.md URLs, etc.) so the AmbulnzLLC pilot's required skills are present on every pod immediately after boot, instead of the operator having to run 'hermes skills install' for each one post-deploy.
Behavior:
- HERMES_DEFAULT_SKILLS unset/empty -> silent no-op (generic deploys).
- Set -> each entry is fed to hermes_cli.skills_hub.do_install with skip_confirm=True (boot is non-interactive) and force=False. The admin-baked path explicitly does NOT bypass scan verdicts; that's a manual decision.
- Format: comma- or newline-separated identifiers (e.g. 'AmbulnzLLC/hermes-shared-skills/skills/data-engineering/airflow-dag') with optional '@<name>' suffix for URL-sourced skills whose SKILL.md has no 'name:' frontmatter field.
- Idempotent: skills already in lock.json (matched by recorded identifier, trailing-slash insensitive) are skipped without invoking the installer.
- Non-fatal: a single failed install (network blip, scan blocked, malformed SKILL.md, do_install calling sys.exit) emits a warning and the script exits 0 so boot continues. A bad private-skill repo must not crash-loop the pod and take chat down with it.
- Refuses URLs with embedded credentials; those belong in the App PEM, not in HERMES_DEFAULT_SKILLS (visible in process tables and pod logs).
- Malformed lock.json is warned about, not fatal; installs will still attempt and surface their own errors.
Wired into entrypoint.sh after seed_taps.py so any tap-relative identifiers resolve. Runs as the hermes user (post-gosu) so installed skill files end up with correct ownership without an extra chown.
This is a SEED, not a lock. After boot the hermes user can 'hermes skills uninstall <name>' for ad-hoc debugging, but on the next container boot the skill will be re-installed. Admin desired-state wins across reboots.
Tests cover: parsing of bare/URL forms, '@<name>' suffix, separator handling, invalid entries, auth-URL rejection, no-op-on-empty, idempotence via lockfile match, trailing-slash dedup, partial-failure continuation, SystemExit catch, missing lockfile, and malformed-lockfile non-fatal handling. 17 tests, all passing.
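The idempotent, non-fatal install loop described in this commit message might be sketched as below. `seed_skills` and its parameters are hypothetical; in particular `install` stands in for whatever wrapper calls the real installer, whose full signature is not shown in this PR. The one detail worth illustrating is that `SystemExit` does not derive from `Exception`, so it has to be caught by name.

```python
from typing import Callable, Iterable


def seed_skills(
    wanted: Iterable[str],
    installed: Iterable[str],
    install: Callable[[str], None],
) -> list[str]:
    """Install missing skills; return the identifiers that failed."""
    have = {i.rstrip("/") for i in installed}
    failed: list[str] = []
    for ident in wanted:
        if ident.rstrip("/") in have:
            continue  # already recorded in the lockfile: skip the installer
        try:
            install(ident)
        except (Exception, SystemExit) as exc:
            # One bad skill must not stop the rest or fail the boot.
            print(f"warning: install of {ident} failed: {exc!r}")
            failed.append(ident)
    return failed
```

A caller following the contract above would log the failures and still exit 0.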
Admin-curated entries in HERMES_DEFAULT_SKILLS are reviewed out-of-band before being added to the seed list, so the scanner's per-pod caution verdicts shouldn't block them at boot. Without this, any default skill that trips scan heuristics (e.g. ambulnz-github-app-auth, which the scanner flags for env-var reads and a redacted git URL example) is silently dropped on every fresh pod, requiring an operator to manually re-run `hermes skills install --force` for each one. The forbid-force comment in the module docstring is replaced with the new contract, including the escape hatch (drop the entry from HERMES_DEFAULT_SKILLS if you want scan verdicts respected for it).
What & why
Wires the AmbulnzLLC pilot's GitHub App private key into the container at boot so Hermes's built-in `GitHubAuth._try_github_app()` (in `tools/skills_hub.py`) can mint installation tokens automatically. With this in place, outbound git / GitHub API auth (`hermes skills tap add AmbulnzLLC/<repo>`, `gh api`, raw `git clone https://x-access-token:${TOKEN}@github.com/...`) Just Works inside a pilot pod, no per-session setup.
Companion to PR #6 (the `ambulnz-github-app-auth` skill, which documents the manual path for ad-hoc agent sessions). This PR does the provisioning half: opt-in via env var, no code changes elsewhere.
How it works
- Set `GITHUB_APP_PEM_SECRET_ID` (and optional `GITHUB_APP_PEM_DEST_PATH`, default `/opt/data/secrets/github-app.pem`) on the pod.
- `docker/entrypoint.sh` (root section, before the `gosu hermes` drop) invokes `docker/install_github_app_pem.py` via the venv's Python.
- The script fetches the secret via boto3, extracts `PRIVATE_KEY` from the JSON envelope, normalizes any literal `\n` two-byte sequences to real newlines (known secret-format quirk), validates the PEM header, and atomically writes to the destination at mode `0400` owned by `hermes:hermes`.
- Also set `GITHUB_APP_ID`, `GITHUB_APP_INSTALLATION_ID`, and `GITHUB_APP_PRIVATE_KEY_PATH` (pointing at the file we just wrote). Hermes's existing skills-hub auth does the rest (JWT mint, install-token exchange, 3500s caching), all unchanged.
Contract
- `GITHUB_APP_PEM_SECRET_ID` unset → silent no-op. Generic / non-pilot deploys are not burdened. Same shape as `seed_admin_config.py` and `HERMES_AUTH_JSON_BOOTSTRAP`.
- `GITHUB_APP_PEM_SECRET_ID` set → install is REQUIRED. Any failure (boto3 missing, AWS API error, malformed JSON, missing `PRIVATE_KEY` field, empty `PRIVATE_KEY`, non-PEM garbage, write failure) returns non-zero, and `set -e` in the entrypoint crashes the boot. Loud > silent.
- Atomic write via `os.replace` so a crash mid-write leaves the previous PEM intact.
Threat model
PEM is plaintext at rest. An attacker with shell as `hermes` already has the pod's IRSA / instance role and can call `secretsmanager:GetSecretValue` directly, so `chmod 0400` + per-pod filesystem isolation is sufficient for the pilot. Stronger protection (KMS-encrypted on-disk store, in-memory-only key) is out of scope here and would require corresponding code changes in `_try_github_app()`.
Files
- `docker/install_github_app_pem.py`: standalone script that mirrors the `seed_admin_config.py` design (env-driven, opt-in, crash-loud, silent no-op when unset).
- `docker/entrypoint.sh`: calls the script from the root section, using `${INSTALL_DIR}/.venv/bin/python3` because the venv is not yet activated and `boto3` lives in the `bedrock` extra.
- `tests/scripts/test_install_github_app_pem.py`: 13 tests covering silent no-op, happy path, `\n` literal normalization, idempotency, region propagation from `AWS_REGION`, and every crash-loud failure mode (AWS error, missing `SecretString`, invalid JSON, missing `PRIVATE_KEY`, empty `PRIVATE_KEY`, non-PEM, non-object payload).
Verification
- `pytest tests/scripts/test_install_github_app_pem.py` → 13 passed.
- `bash -n docker/entrypoint.sh` clean.
- Smoke-tested end-to-end against the real `vigo-github-app-key` secret in `us-west-2`:
  - The PEM is written at `0o400 hermes:hermes` → `cryptography.hazmat.primitives.serialization.load_pem_private_key` parses it cleanly as a 2048-bit RSA private key.
  - A second run is a no-op (`mtime` unchanged) and re-asserts mode/owner.
  - `_try_github_app()` mints a `ghs_…` installation token from the on-disk PEM.
Operator runbook (AmbulnzLLC pilot)
Set on the pod:
(Optional override: `GITHUB_APP_PEM_DEST_PATH` if you want the PEM somewhere other than the default.)
The pod's IRSA role needs `secretsmanager:GetSecretValue` on the specific secret ARN; that's it. Boot the pod; the entrypoint logs `[install_github_app_pem] installed PEM at /opt/data/secrets/github-app.pem (0400 hermes:hermes)` once on first start, then `... already current; re-asserting mode/owner` on subsequent boots.
Out of scope
- 3780171, Installation 134007007).
- `_try_github_app()` path, so a restart picks up the new key within ~1h regardless.