[Feature]: load_hermes_dotenv() uses override=True, breaking 12-factor env precedence and creating a credential-rotation footgun #18705

@vtbjbb-alt

Problem or Use Case

Summary

hermes_cli/env_loader.py:load_hermes_dotenv() loads ~/.hermes/.env
with override=True, so values in .env override anything already
present in os.environ — including values intentionally injected by
systemd EnvironmentFile=, Docker secrets, Kubernetes env, and
CI/CD pipelines.

This is the opposite of 12-factor config precedence
and the opposite of python-dotenv's own default. It makes credential
rotation in production deployments brittle: a key rotated in the
runtime environment can be silently replaced by a stale one from the file.

Code reference

hermes_cli/env_loader.py (around line 164):

if user_env.exists():
    _load_dotenv_with_fallback(user_env, override=True)   # ← here
    loaded.append(user_env)

if project_env_path and project_env_path.exists():
    _load_dotenv_with_fallback(project_env_path, override=not loaded)  # ← sensible
    loaded.append(project_env_path)

The project_env_path block uses override=not loaded (it overrides
only when the loaded list is empty, i.e. no user_env was loaded),
which is conservative and correct. The user_env block uses
unconditional override=True.
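
The difference between the two override values can be sketched with a small stdlib-only stand-in for the loader. This is an illustration, not Hermes or python-dotenv code: apply_env_file is a hypothetical helper that mimics the override semantics described above, and EXAMPLE_ONLY_IN_FILE is a made-up variable name.

```python
def apply_env_file(file_values, environ, override):
    """Merge parsed .env values into environ, mimicking the override flag.

    Hypothetical stand-in for the real loader: with override=False, keys
    already present in environ are left alone; with override=True, the
    value from the file replaces them.
    """
    for key, value in file_values.items():
        if override or key not in environ:
            environ[key] = value
    return environ


# Value injected by the runtime (systemd, Docker, Kubernetes, CI).
runtime_env = {"OPENAI_API_KEY": "new-key"}
# Stale value left behind in the .env file, plus a key set only in the file.
dotenv_values = {"OPENAI_API_KEY": "old-key", "EXAMPLE_ONLY_IN_FILE": "1"}

with_override = apply_env_file(dotenv_values, dict(runtime_env), override=True)
without_override = apply_env_file(dotenv_values, dict(runtime_env), override=False)

print(with_override["OPENAI_API_KEY"])     # old-key: the file wins
print(without_override["OPENAI_API_KEY"])  # new-key: the runtime wins
```

In both modes a key that exists only in the file is still loaded, so override=False does not stop .env from supplying defaults.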

Real-world failure mode (Hermes 0.12.0, commit f903cee)

  1. Operator stores OPENAI_API_KEY in a managed secret store and
    injects it via systemd EnvironmentFile=/path/to/secrets.env.
    Verified: tr '\0' '\n' < /proc/<MainPID>/environ | grep OPENAI_API_KEY
    shows the new key.

  2. Operator rotates the key (revoke old, write new to secret store,
    restart hermes-dashboard.service). Main process env updated.

  3. User opens Web Chat. Hermes spawns a run_agent subprocess.
    Subprocess inherits parent env (new key), then immediately calls
    load_hermes_dotenv() per cli.py:92.

  4. Stale OPENAI_API_KEY=<old-key> line remains in ~/.hermes/.env
    from a months-old hermes setup run. Because override=True,
    the stale key wins in the subprocess.

  5. OpenAI returns 401 and the chat fails, even though the main
    process env holds the correct key.

The operator has effectively no way to know this is happening
without reading the source — hermes update doesn't migrate
~/.hermes/.env, hermes setup doesn't warn about staleness, and
OpenAI's 401 message redacts the middle of the key, so the
old-key-prefix in errors.log is easy to miss.
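
Until the default changes, one way an operator might detect this situation is to compare the file against the process environment. The sketch below is an assumption-laden illustration: parse_simple_dotenv and shadowed_keys are hypothetical helpers, the parser handles only plain KEY=VALUE lines (not the full python-dotenv syntax), and it must be run with the same environment as the service to be meaningful.

```python
import os
from pathlib import Path


def parse_simple_dotenv(text):
    """Parse KEY=VALUE lines; a simplified parser, not python-dotenv's."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def shadowed_keys(dotenv_text, environ):
    """Return keys whose .env value differs from the value already in environ."""
    file_values = parse_simple_dotenv(dotenv_text)
    return sorted(
        key for key, value in file_values.items()
        if key in environ and environ[key] != value
    )


if __name__ == "__main__":
    env_path = Path.home() / ".hermes" / ".env"
    if env_path.exists():
        for key in shadowed_keys(env_path.read_text(), os.environ):
            # Print only the key name, never the secret value.
            print(f"{key}: .env value differs from process environment")
```

Any key this reports is one where override=True would replace the runtime value with the file value.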

Suggested fix

Change to override=False:

if user_env.exists():
    _load_dotenv_with_fallback(user_env, override=False)
    loaded.append(user_env)

Rationale:

  • 12-factor: runtime config (env vars) takes precedence over file config
  • python-dotenv's own default is override=False; the current code
    explicitly opts out of that default
  • Aligns with the project_env block's existing override=not loaded
  • Removes a whole class of "I rotated the key but it's not taking
    effect" bugs

Backward-compat note

Current docstring says:

"~/.hermes/.env overrides stale shell-exported values when present."

This protects users from a forgotten export OPENAI_API_KEY=... in
their ~/.bashrc. That is a reasonable goal, but the wrong default for
non-interactive deployments.

Two safer alternatives:

A. Just flip the default — most users in production deploy via
systemd/docker/k8s where the runtime injection is intentional.

B. Add an opt-in toggle, e.g. os.getenv("HERMES_DOTENV_OVERRIDE", "0") == "1",
for users who really want the old behavior.

I'd suggest A; B if backward-compat is critical.
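
A minimal sketch of option B, assuming the toggle is read once at load time. dotenv_override_enabled is a hypothetical helper name; only the HERMES_DOTENV_OVERRIDE variable and the "1" comparison come from the proposal above.

```python
import os


def dotenv_override_enabled(environ=None):
    """Return True only when HERMES_DOTENV_OVERRIDE is exactly "1".

    Hypothetical helper: any other value, or an unset variable, keeps the
    proposed default where the runtime environment wins over the file.
    """
    if environ is None:
        environ = os.environ
    return environ.get("HERMES_DOTENV_OVERRIDE", "0") == "1"


print(dotenv_override_enabled({}))                               # False
print(dotenv_override_enabled({"HERMES_DOTENV_OVERRIDE": "1"}))  # True
```

The call site would then presumably pass override=dotenv_override_enabled() instead of a hard-coded True, so the old behavior stays available without being the default.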

Environment

  • Hermes 0.12.0 (hermes-agent commit f903ceec)
  • Python 3.11.15
  • Linux (Ubuntu 24.04), systemd-managed deployment

Proposed Solution

Alternatives Considered

No response

Feature Type

New tool

Scope

None

Contribution

  • I'd like to implement this myself and submit a PR

Debug Report (optional)

Metadata

Assignees

No one assigned

    Labels

    P2: Medium, degraded but workaround exists
    area/config: Config system, migrations, profiles
    comp/cli: CLI entry point, hermes_cli/, setup wizard
    type/bug: Something isn't working
