rfc(auth): rethink per-codebase GitHub PAT injection — env vars are the wrong primitive #1467

@133Felix

Problem

For GitHub auth, Archon currently relies on per-codebase env vars stored in remote_agent_codebase_env_vars and unconditionally injected as GH_TOKEN (and friends) into the Claude/Codex/Pi subprocess process.env. This works for the happy path but is the wrong primitive on four axes:

1. Plain-text leakage surface

Once a PAT is in process.env, every Bash tool call inside the agent run can read it:

echo $GH_TOKEN              # trivially exfiltratable
git remote -v               # may print credential-rewritten URLs
env | grep -i token         # one tool call from full disclosure

The model itself doesn't have to be malicious — Claude can quote shell output back into chat verbatim, or chain a tool call that prints env in error-handling. There is no narrow channel between "give git access to the token" and "give the agent free read of the token."

2. Cross-repo scope confusion

The token loaded for a workflow run is keyed off conversation.codebase_id (orchestrator-agent.ts:843). When the agent pivots to a different registered repo mid-conversation (e.g. cd /.archon/workspaces/<other-org>/<repo>/source && git pull), the spawned subprocess still inherits the bound codebase's env — wrong scope, often wrong identity. Real example surfaced last week: 133Felix-bound conversation → git pull on nashtrader repo → 403, because the personal-PAT default leaked through after the per-codebase one didn't match.

This is structurally unfixable inside the env-var model. The token's scope is the conversation, but the operation's scope is the working directory — and there's no contract between the two.
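To make the missing contract concrete, here is a minimal sketch of resolving a credential from the remote URL of the directory being operated on instead of from the conversation. The `CredentialRef` type, both function names, and the per-owner map are hypothetical and not existing Archon code; only github.com remotes are handled.

```typescript
// Hypothetical shape: one credential reference per GitHub owner (org or user).
type CredentialRef = { owner: string; source: "pat" | "app-installation" };

// Matches https://github.com/owner/repo(.git), git@github.com:owner/repo(.git)
// and ssh://git@github.com/owner/repo(.git).
const GITHUB_REMOTE =
  /^(?:https:\/\/(?:[^@\/]+@)?github\.com\/|git@github\.com:|ssh:\/\/git@github\.com\/)([^\/]+)\/([^\/]+?)(?:\.git)?\/?$/;

export function parseGitHubRemote(remoteUrl: string): { owner: string; repo: string } | null {
  const match = GITHUB_REMOTE.exec(remoteUrl.trim());
  if (!match) return null;
  const [, owner, repo] = match;
  if (!owner || !repo) return null;
  return { owner, repo };
}

// Look the credential up by the owner of the remote that is actually being touched.
// Returning null (instead of falling back to a global default) is deliberate:
// a missing credential should fail loudly rather than leak another identity.
export function resolveCredential(
  remoteUrl: string,
  byOwner: Map<string, CredentialRef>,
): CredentialRef | null {
  const parsed = parseGitHubRemote(remoteUrl);
  if (!parsed) return null;
  // GitHub owner names are case-insensitive, so normalize the lookup key.
  return byOwner.get(parsed.owner.toLowerCase()) ?? null;
}
```

With a lookup like this, the 133Felix/nashtrader case above would either pick the credential registered for the repo's owner or stop with "no credential for this owner", rather than silently trying the personal default.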

3. At-rest plain-text in DB

remote_agent_codebase_env_vars.value is a TEXT column. No encryption-at-rest, no KMS wrapping, no rotation hook. Anyone with read access to the Postgres database has every PAT for every registered codebase.

4. Setup-time and runtime lifecycle is broken for the server-mode use case

The PAT-as-env-var model assumes a fixed, known-at-setup-time universe of repos and orgs. That is not the actual shape of how Archon gets used:

At first install, the user provides a PAT. Naturally, that PAT is whatever GitHub identity the user happened to be logged in as — a personal-account fine-grained PAT, scoped to the orgs/repos they personally have selected at PAT-creation time. This becomes the implicit "global default" via ~/.archon/.env GH_TOKEN.

Then a second org enters the picture — say the user gets contractor access to <some-org>/, or starts a side project under a different account, or is a member of an org that requires fine-grained PATs to be approved by an org admin (so a personal blanket PAT can't even cover it). What does the user have to do today?

  1. Generate a second PAT, scoped to the new org
  2. Register the codebase in Archon (UI / /register-project)
  3. Manually upsert a per-codebase env var GH_TOKEN for that codebase (UI or PUT /api/codebases/:id/env)
  4. Hope no flow re-uses the global GH_TOKEN from process.env for this org's repos
  5. Repeat the entire dance every time the PAT expires (fine-grained PATs cap at 1 year)

In server mode (long-running Docker container, e.g. on homeserverai), the consequences compound:

  • Token rotation = either editing the DB live or restarting the container (which wipes in-flight conversations and re-runs container init)
  • An expired org-PAT means silent 403s for every workflow against that codebase until somebody notices
  • New repo onboarded into an existing org? User has to remember to either widen the existing PAT (re-issue it on GitHub) or accept it'll fail with an unhelpful 404/403 and dig out which token covers which repo
  • The user can't reason about "which credential is in flight" by inspecting process.env in the running container — it's whatever the global default was at last container build, plus per-codebase overrides applied at request time

Cross-org operations within one conversation are structurally impossible. Today a Telegram conversation is bound to exactly one codebase, and that codebase has exactly one GH_TOKEN. If the agent legitimately needs to e.g. open a PR in <org-A>/repo-1 referencing context in <org-B>/repo-2, there's no path — the env-var primitive can't carry two scoped identities at once into one subprocess.

A GitHub App neutralizes all of this because the trust relationship inverts: the user installs the App on each org/repo (one-time, in the GitHub UI), and from then on Archon mints short-lived per-installation tokens on demand, scoped exactly to the operation. New repos appear on the Archon side via the installation_repositories webhook with zero user-side ceremony. Tokens expire ~1h, so rotation is automatic. Removal is a single click in GitHub. Server restarts are unrelated to credential state.
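For reference, the mint step is small. The sketch below signs a short-lived App JWT (RS256) and exchanges it for an installation token through GitHub's REST endpoint `POST /app/installations/{installation_id}/access_tokens`. The function names, the injectable `Signer` type, and the `User-Agent` value are assumptions for illustration; how Archon would load the private key and look up installation IDs is left open.

```typescript
import { createSign } from "node:crypto";

// A signer turns the JWT signing input into an RS256 signature.
// Injecting it keeps key handling swappable (PEM on disk, KMS, test double).
type Signer = (signingInput: string) => Buffer;

const b64url = (text: string): string => Buffer.from(text, "utf8").toString("base64url");

// Signer backed by the App's PEM-encoded private key.
export function rsaSigner(privateKeyPem: string): Signer {
  return (signingInput) => createSign("RSA-SHA256").update(signingInput).sign(privateKeyPem);
}

// Build the App JWT: issued 60 s in the past to tolerate clock drift,
// expiring 9 minutes from now (GitHub rejects lifetimes over 10 minutes).
export function createAppJwt(
  appId: string,
  sign: Signer,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): string {
  const header = b64url(JSON.stringify({ alg: "RS256", typ: "JWT" }));
  const claims = b64url(JSON.stringify({ iat: nowSeconds - 60, exp: nowSeconds + 540, iss: appId }));
  const signingInput = `${header}.${claims}`;
  return `${signingInput}.${sign(signingInput).toString("base64url")}`;
}

// Exchange the App JWT for an installation access token (valid for about one hour).
export async function mintInstallationToken(
  appJwt: string,
  installationId: number,
): Promise<{ token: string; expiresAt: string }> {
  const res = await fetch(
    `https://api.github.com/app/installations/${installationId}/access_tokens`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${appJwt}`,
        Accept: "application/vnd.github+json",
        "User-Agent": "archon-sketch", // placeholder value
      },
    },
  );
  if (!res.ok) throw new Error(`installation token request failed: HTTP ${res.status}`);
  const body = (await res.json()) as { token: string; expires_at: string };
  return { token: body.token, expiresAt: body.expires_at };
}
```

Usage would look like `mintInstallationToken(createAppJwt(appId, rsaSigner(pem)), installationId)`. The only long-lived secret on the server is the App private key; everything handed to git is a token that dies on its own.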

This is the real cost of the env-var primitive — not just the leakage surface (axes 1–3 above), but the operational cost of every "I added another org" event. For a single-developer tool that wants to scale from "my one repo" to "my repos + a contracting client + a community project I help maintain" without becoming a config-management exercise, the lifecycle question is at least as load-bearing as the security question.

What GitHub actually offers

For server-side automation against GitHub repos, the canonical mechanisms are (in roughly increasing order of properness):

  1. GIT_ASKPASS helper script: git invokes a small script on demand; the script reads the right token from a vault and prints it on stdout. The token never enters process.env. Trivial to implement, and it closes leakage axis 1 immediately.
  2. gh auth login --with-token + credential helper: the token is persisted in the OS keychain / secretstorage / a file, and git auth is handled by the helper. Same property: the token is never in env.
  3. GitHub App + installation tokens: the server holds an App private key and mints short-lived (~1h) installation tokens per repo on demand. Tokens are limited to the permissions granted to the App, rotate automatically, and are scoped per installation (per org or per repo). This is the same model as the GITHUB_TOKEN in GitHub Actions (an installation access token) and as App-based bots such as Dependabot and the hosted Renovate app.

Of these:

  • (1) is a 50-line patch and a strict improvement over today
  • (2) requires adapter wiring per platform but unblocks gh CLI parity
  • (3) is the real architectural answer for multi-repo / multi-org Archon and would obsolete most of the per-codebase-PAT flow

Proposal

Tracking issue for moving GitHub auth off raw env-var injection, in three steps that can land independently:

Phase A — GIT_ASKPASS shim (low effort, immediate win)

  • New helper at packages/git/src/credentials.ts (or similar) that writes a per-run askpass script and sets GIT_ASKPASS=<path> for git operations only. No GH_TOKEN in the subprocess env at all.
  • Migrate the git pull / git push / syncWorkspaceBeforeCreate paths in @archon/git and @archon/isolation to use it.
  • Keep existing per-codebase env-var table as the backing store — we change the injection mechanism, not the storage model.
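A rough sketch of what the Phase A helper could look like, assuming a POSIX shell is available and that the token is written to a 0600 file in a private per-run directory. `askpassScript`, `writeAskpass`, and `gitEnv` are hypothetical names, not the eventual API of packages/git. The `x-access-token` username is the one GitHub documents for installation tokens; as far as I know GitHub ignores the username when the password is a PAT, but treat that as an assumption.

```typescript
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Env vars that must not reach the git subprocess once askpass is in place.
const TOKEN_ENV_KEYS = ["GH_TOKEN", "GITHUB_TOKEN"];

// Shell script body. git calls it with the prompt text as $1,
// once for the username and once for the password.
export function askpassScript(tokenFile: string): string {
  const quoted = "'" + tokenFile.replace(/'/g, "'\\''") + "'"; // POSIX single-quote escaping
  return [
    "#!/bin/sh",
    'case "$1" in',
    "  Username*) echo x-access-token ;;",
    `  *) cat ${quoted} ;;`,
    "esac",
    "",
  ].join("\n");
}

// Write the token and the script into a fresh private directory.
// mkdtempSync creates the directory with mode 0700.
export function writeAskpass(token: string): { dir: string; askpassPath: string } {
  const dir = mkdtempSync(join(tmpdir(), "archon-askpass-"));
  const tokenFile = join(dir, "token");
  const askpassPath = join(dir, "askpass.sh");
  writeFileSync(tokenFile, token + "\n", { mode: 0o600 });
  writeFileSync(askpassPath, askpassScript(tokenFile), { mode: 0o700 });
  return { dir, askpassPath };
}

// Env for git operations only: token vars removed, askpass wired in,
// terminal prompting disabled so a missing credential fails fast.
export function gitEnv(
  base: Record<string, string | undefined>,
  askpassPath: string,
): Record<string, string | undefined> {
  const env: Record<string, string | undefined> = {
    ...base,
    GIT_ASKPASS: askpassPath,
    GIT_TERMINAL_PROMPT: "0",
  };
  for (const key of TOKEN_ENV_KEYS) delete env[key];
  return env;
}
```

A caller would spawn git with `env: gitEnv(process.env, askpassPath)` and remove `dir` once the run ends. Two caveats: git consults configured credential helpers before it falls back to GIT_ASKPASS, so the call sites may also need `-c credential.helper=`; and the token still exists on disk for the duration of the run, so this narrows the channel rather than eliminating it.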

Phase B — credential-helper / gh auth setup-git integration

  • Equivalent treatment for the gh CLI tool calls Claude/Codex make.
  • Probably wants a sandboxed-per-run ~/.gitconfig so concurrent workflows don't trample each other.
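One possible shape for the per-run sandbox, assuming git 2.32 or newer (for `GIT_CONFIG_GLOBAL`) and the gh CLI's `GH_CONFIG_DIR` variable. The function names and the directory layout are hypothetical; the config content mirrors what `gh auth setup-git` writes, with the helper list reset first so no inherited helper is consulted.

```typescript
import { join } from "node:path";

// Content for a per-run global gitconfig. The empty "helper =" line resets
// any helpers inherited from system config before gh's helper is added.
export function perRunGitConfig(ghBinary: string = "gh"): string {
  return [
    '[credential "https://github.com"]',
    "\thelper =",
    `\thelper = !${ghBinary} auth git-credential`,
    "",
  ].join("\n");
}

// Point git and gh at run-private config locations so concurrent
// workflows never read or overwrite each other's auth state.
export function perRunGhEnv(
  base: Record<string, string | undefined>,
  runDir: string,
): Record<string, string | undefined> {
  return {
    ...base,
    GIT_CONFIG_GLOBAL: join(runDir, "gitconfig"),
    GH_CONFIG_DIR: join(runDir, "gh"),
  };
}
```

The caller would write `perRunGitConfig()` to the `GIT_CONFIG_GLOBAL` path, seed `GH_CONFIG_DIR` via `gh auth login --with-token`, and delete `runDir` afterwards. Whether the agent's own Bash calls should inherit this env is exactly the open question of Phase B.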

Phase C — GitHub App support (real architectural fix)

  • New auth.github.app config block (App ID + private key path + optional installation map).
  • Provider integration mints short-lived installation tokens lazily, per repo, per workflow run.
  • Per-codebase PAT becomes opt-in fallback ("I don't want to install a GitHub App"), not the default.
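"Lazily, per repo, per workflow run" could be as simple as a cache keyed by installation ID that re-mints shortly before expiry. The sketch below assumes an injected `mint` function (hypothetical; in practice a wrapper around the GitHub App token exchange) and an injectable clock so it can be tested without waiting an hour.

```typescript
type InstallationToken = { token: string; expiresAtMs: number };
type Minter = (installationId: number) => Promise<InstallationToken>;

export class InstallationTokenCache {
  private readonly entries = new Map<number, InstallationToken>();
  private readonly mint: Minter;
  private readonly now: () => number;
  private readonly refreshMarginMs: number;

  constructor(
    mint: Minter,
    now: () => number = () => Date.now(),
    refreshMarginMs: number = 5 * 60_000, // refresh 5 minutes before expiry
  ) {
    this.mint = mint;
    this.now = now;
    this.refreshMarginMs = refreshMarginMs;
  }

  // Return a cached token if it is still comfortably valid, otherwise mint a new one.
  // Note: two concurrent callers can both miss and mint twice; that is harmless
  // for installation tokens but could be deduplicated with an in-flight map.
  async get(installationId: number): Promise<string> {
    const cached = this.entries.get(installationId);
    if (cached && cached.expiresAtMs - this.now() > this.refreshMarginMs) {
      return cached.token;
    }
    const fresh = await this.mint(installationId);
    this.entries.set(installationId, fresh);
    return fresh.token;
  }
}
```

Because the cache lives in memory only, a server restart just means the next workflow run mints a new token. No credential state needs to survive the container.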

Files affected (Phase A scaffold)

  • packages/git/ — new credentials helper module
  • packages/isolation/src/providers/worktree.ts — use askpass for clone / fetch
  • packages/workflows/src/dag-executor.ts — strip GH_TOKEN etc. from inherited env on bash/script nodes that don't need it (most don't)
  • packages/core/src/orchestrator/orchestrator-agent.ts:851 — stop unconditionally injecting GH_TOKEN into the provider env; route GitHub-specific creds through the askpass path instead
  • packages/server/src/routes/api.ts (/api/codebases/:id/env): UI hint that token-shaped values get the askpass treatment
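For the env-stripping and "token-shaped" parts of this list, a heuristic along these lines might be enough as a starting point. The key list and function names are assumptions; the value pattern is based on GitHub's published token prefixes (ghp_, gho_, ghu_, ghs_, ghr_, github_pat_) and is a heuristic, not a validator.

```typescript
// Env keys that carry GitHub credentials for git and the gh CLI.
const CREDENTIAL_KEYS = new Set([
  "GH_TOKEN",
  "GITHUB_TOKEN",
  "GH_ENTERPRISE_TOKEN",
  "GITHUB_ENTERPRISE_TOKEN",
]);

// Heuristic match on GitHub's token prefixes; lengths are lower bounds only.
const GITHUB_TOKEN_PATTERN = /^(?:gh[pousr]_[A-Za-z0-9]{36,}|github_pat_[A-Za-z0-9_]{22,})$/;

export function looksLikeGitHubToken(value: string): boolean {
  return GITHUB_TOKEN_PATTERN.test(value.trim());
}

// Copy of env with GitHub credentials removed, whether they sit under a
// well-known key or under an arbitrary key with a token-shaped value.
export function scrubGitHubCredentials(
  env: Record<string, string | undefined>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue;
    if (CREDENTIAL_KEYS.has(key) || looksLikeGitHubToken(value)) continue;
    out[key] = value;
  }
  return out;
}
```

The same `looksLikeGitHubToken` check could drive the UI hint on the env endpoint, so a value pasted under a custom key still gets flagged.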

Out of scope

  • Provider API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY) — those are different shape, often legitimately env-var-shaped per SDK contract. This issue is specifically about git/GitHub credentials.
  • Encryption-at-rest of the env-var table — adjacent improvement, separate issue if pursued.

Labels: P1 (high priority; address soon, next in queue), architecture (architectural changes and design), feature (new functionality, planned)
