security: force redaction on in gateway mode, protect config.yaml writes #8738

Open
Dylanwooo wants to merge 1 commit into NousResearch:main from Dylanwooo:fix/redact-secrets-gateway-hardening

@Dylanwooo

Summary

The security.redact_secrets: false config option disables all secret redaction globally — logs, tool output, LLM context, and outbound messages — with a single boolean flip. This creates two concrete risks:

  1. Gateway multi-user exposure: In gateway mode (Telegram/Discord/Slack), disabling redaction leaks API keys, bot tokens, and credentials into LLM context that can be reflected to any connected user.
  2. Prompt injection → persistent config write: ~/.hermes/config.yaml is not protected by approval.py, so an LLM manipulated via prompt injection can write security: { redact_secrets: false } to config.yaml. The change takes effect on next startup, silently disabling all 11 redaction call sites.

What this PR does

1. Force redaction ON in gateway mode (agent/redact.py)

Gateway serves multiple users over messaging platforms — secrets in LLM context risk being reflected to any connected user. The config option is now ignored in gateway mode:

_IS_GATEWAY = os.getenv("HERMES_GATEWAY_SESSION", "").lower() in ("1", "true", "yes")
_USER_DISABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() in ("0", "false", "no", "off")
_REDACT_ENABLED = True if _IS_GATEWAY else not _USER_DISABLED

When redaction is disabled (CLI mode), or when the disable setting is overridden (gateway mode), a WARNING is now logged so administrators know the effective state.
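The decision and its warning can be sketched as a single pure function. This is an illustrative sketch, not the code in agent/redact.py: the function name `resolve_redaction`, the logger name, and the warning wording are assumptions; only the two environment variable names and their accepted values come from the snippet above.

```python
import logging
import os

logger = logging.getLogger("redact_sketch")  # hypothetical logger name

_TRUTHY = ("1", "true", "yes")
_FALSY = ("0", "false", "no", "off")


def resolve_redaction(env):
    """Return (enabled, warning_message_or_None) for an environment mapping."""
    is_gateway = env.get("HERMES_GATEWAY_SESSION", "").lower() in _TRUTHY
    user_disabled = env.get("HERMES_REDACT_SECRETS", "").lower() in _FALSY
    if is_gateway and user_disabled:
        # Gateway mode wins: the user's request to disable is ignored.
        return True, "redact_secrets=false ignored: gateway mode forces redaction on"
    if user_disabled:
        # CLI mode: honor the setting, but make the state visible.
        return False, "secret redaction is DISABLED"
    return True, None


# Evaluate once, the way an import-time snapshot would.
enabled, warning = resolve_redaction(os.environ)
if warning:
    logger.warning(warning)
```

Taking the environment as a parameter keeps the logic testable without patching `os.environ`.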

2. Protect config.yaml from unreviewed writes (tools/approval.py)

approval.py already protects ~/.hermes/.env from shell writes (tee, redirection). This PR extends the same protection to ~/.hermes/config.yaml, which contains security settings (redact_secrets, tirith_enabled, etc.) that control the agent's own safety mechanisms.
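A rough sketch of what such a check might look like follows. The regular expression and the `needs_approval` helper are hypothetical and written only for illustration; the actual `_SENSITIVE_WRITE_TARGET` pattern in tools/approval.py may be structured differently and cover more cases.

```python
import re

# Hypothetical patterns, not the ones in tools/approval.py.
_HERMES_HOME = r"(?:~|\$HOME|\$\{HOME\})/\.hermes/"
_SENSITIVE_TARGET = _HERMES_HOME + r"(?:\.env|config\.yaml)\b"

# A write is either a shell redirection (> or >>) or a tee invocation,
# optionally with flags such as -a, pointing at a sensitive target.
_WRITE_RE = re.compile(
    r"(?:>>?\s*|\btee\b(?:\s+-\w+)*\s+)[\"']?" + _SENSITIVE_TARGET
)


def needs_approval(command: str) -> bool:
    """Return True if the shell command appears to write to a protected file."""
    return _WRITE_RE.search(command) is not None
```

Under these assumptions, `echo x > ~/.hermes/config.yaml` and `cat f | tee -a ~/.hermes/config.yaml` are flagged, while a write to `./project/config.yaml` or a plain `cat ~/.hermes/config.yaml` is not.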

3. Clarify gateway comment (gateway/run.py)

Documents why the config value is still read even though gateway forces redaction — so redact.py can log the override warning.

Why this doesn't affect agent capability

Redaction operates at the output text layer, not the permission layer:

  • The agent can still read any file, run any command, make any API call — unchanged
  • Environment variables used by subprocesses ($OPENAI_API_KEY in curl, etc.) are the real values from the process env, never touched by redaction
  • The partial mask (sk-pro...l012 — first 6 + last 4 chars) is sufficient for the LLM to confirm a key exists, distinguish between different keys, and diagnose format/provider errors
  • CLI mode still respects redact_secrets: false for local debugging
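The partial mask described above can be illustrated with a few lines of Python. The helper name `mask_secret`, the sample key, and the behavior for short values are assumptions made for this sketch; only the "first 6 + last 4 characters" shape comes from the description.

```python
def mask_secret(value: str, head: int = 6, tail: int = 4) -> str:
    """Keep the first `head` and last `tail` characters, hide the rest."""
    # Assumption: values too short to mask partially are hidden entirely.
    if len(value) <= head + tail:
        return "***"
    return f"{value[:head]}...{value[-tail:]}"


# Redaction only affects text that is displayed or logged.
env = {"OPENAI_API_KEY": "sk-proj-abcdefghijkl012"}  # made-up sample key
shown = mask_secret(env["OPENAI_API_KEY"])  # "sk-pro...l012"

# The value a subprocess would receive from the environment is untouched.
real = env["OPENAI_API_KEY"]
```

Two different keys still produce distinguishable masks, which is what lets the LLM tell them apart without seeing either in full.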

Design notes

The existing import-time snapshot (redact.py:16-18) correctly prevents runtime export mutations from disabling redaction mid-session — this shows the threat was anticipated. This PR closes two remaining gaps:

  • The config.yaml persistence path (write config → restart → redaction off)
  • Gateway mode not being treated as a stricter trust boundary than CLI mode
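The difference between the snapshot protection and the persistence gap can be shown with a small stand-in. The `RedactionState` class is hypothetical and only models a flag computed once at import time; a plain dict stands in for the process environment that a restart would repopulate from config.yaml.

```python
_FALSY = ("0", "false", "no", "off")


class RedactionState:
    """Stand-in for a module-level flag that is computed once at import."""

    def __init__(self, env):
        # Snapshot: the environment is read here and never re-read.
        self.enabled = env.get("HERMES_REDACT_SECRETS", "").lower() not in _FALSY


env = {}                      # session starts with redaction on by default
state = RedactionState(env)

# A mid-session mutation (for example, an injected `export`) has no effect
# on the already-taken snapshot.
env["HERMES_REDACT_SECRETS"] = "false"
still_enabled = state.enabled

# A persisted setting is picked up by the next process, which takes a
# fresh snapshot. This is the restart path the PR guards against.
after_restart = RedactionState(env).enabled
```

Here `still_enabled` stays true while `after_restart` is false, which is why the snapshot alone does not cover a write that survives a restart.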

Tests

12 new tests across 2 files:

  • tests/agent/test_redact.py (TestGatewayForcesRedaction): 4 tests verifying gateway override, env var variations, CLI fallback, and default behavior
  • tests/tools/test_approval.py (TestTeePattern + TestSensitiveRedirectConfigYaml + TestSensitiveRedirectPattern): 8 tests verifying config.yaml write detection via tee, redirect, and append, and that unrelated config.yaml paths are not flagged

All existing tests pass unchanged (47 redact + 126 approval = 173 total).

The `security.redact_secrets: false` config option disables all secret
redaction globally — logs, tool output, LLM context, and outbound
messages — with a single boolean flip. This creates two risks:

1. Gateway multi-user exposure: in gateway mode, disabling redaction
   leaks API keys and credentials into LLM context that can be
   reflected to any connected user.

2. Prompt injection persistence: ~/.hermes/config.yaml was not
   protected by the approval system, so an LLM manipulated via
   prompt injection could write `security: { redact_secrets: false }`
   and silently disable all 11 redaction call sites on next startup.

Changes:
- agent/redact.py: detect HERMES_GATEWAY_SESSION and force
  _REDACT_ENABLED=True regardless of config; log warnings when
  redaction is disabled or overridden
- tools/approval.py: add ~/.hermes/config.yaml to _SENSITIVE_WRITE_TARGET
  so shell writes require the same approval as .env
- gateway/run.py: clarify comment on why config value is still read
- tests: 12 new tests covering gateway override logic and config.yaml
  approval detection
@alt-glitch added the type/security (Security vulnerability or hardening), P1 (High: major feature broken, no workaround), comp/gateway (Gateway runner, session dispatch, delivery), and area/config (Config system, migrations, profiles) labels on Apr 28, 2026