Summary
Claude Code will read and echo the contents of secret-bearing files (.env, .dev.vars, credential files) into the conversation transcript, even when the user's CLAUDE.md contains explicit prohibitions against doing so. The model's safety reflex fires on output it has already produced, not on commands it is about to run, so the violation is detected only after the secret has already been written to chat history — at which point rotation is the only remedy.
Reproduction
- User has a global CLAUDE.md containing rules like: "NEVER output secret values into chat. NEVER read .env files unless strictly necessary, and never reproduce any values in your response."
- User asks Claude Code to help populate a .dev.vars file with API tokens.
- User accidentally pastes a malformed value (e.g. a curl example command) instead of a bare token.
- Claude Code attempts to diagnose by running a grep -n search against the file. grep -n prints the full matching line, including the real API token, into the tool result, which is rendered into the conversation transcript and persisted in history.
- Claude Code then notices the violation, apologizes, and instructs the user to rotate the exposed credential.
This is reproducible with any inspection command that can echo file contents: Read, cat, head, tail, grep without -c, sed/awk patterns that print full lines, etc. The model knows the rules but does not pre-screen its own commands against them.
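The split between value-echoing and value-stripping inspection can be shown with a minimal sketch (pure-Python stand-ins for grep -n and grep -c; the file contents and token are fabricated placeholders, not real values):

```python
# Minimal illustration: a line-printing inspector leaks the secret value,
# while a count-only inspector reveals only that a match exists.
DEV_VARS = 'API_TOKEN="cf_fake_token_1234567890abcdef"\nDEBUG="true"\n'

def grep_n(pattern: str, text: str) -> str:
    """Stand-in for grep -n: emits full matching lines with line numbers."""
    return "\n".join(
        f"{i}:{line}"
        for i, line in enumerate(text.splitlines(), start=1)
        if pattern in line
    )

def grep_c(pattern: str, text: str) -> str:
    """Stand-in for grep -c: emits only the match count."""
    return str(sum(pattern in line for line in text.splitlines()))

leaky = grep_n("API_TOKEN", DEV_VARS)  # contains the token value
safe = grep_c("API_TOKEN", DEV_VARS)   # contains only "1"
```

Any inspection command in the first category puts the secret into the tool result; any in the second answers the same diagnostic question ("is the value present, and where?") without echoing it.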
Why the existing safeguards don't work
- CLAUDE.md instructions are advisory — they shape intent but don't block tool calls at execution time.
- The model's safety check runs on generated output it can see, not on the side effects of tool calls it's about to issue.
- "Operational reflexes" (e.g. defaulting to grep -n for line numbers) override the rule because the rule is stored as guidance, not as a command-level filter.
- The remediation flow ("apologize and ask the user to rotate") fires only after the credential is already in conversation history, server logs, and any downstream telemetry. By then the damage is done.
Proposed fix
A defense-in-depth approach at the harness level, not the prompt level:
- Path-based pre-execution filter. When a Bash/Read/Grep tool call targets a path matching a known secret pattern (.env*, .dev.vars*, *.pem, *.key, credentials*.json, *secret*, *token*), require the command to be on an allowlist of value-stripping operations (wc -l, grep -c, stat, length-only awk patterns, blind Edit with literal old_string). Block everything else by default.
- Output-side scrubber. Before tool results are appended to the conversation, run a regex pass over their stdout looking for high-entropy strings and known token prefixes (sk-, sk-ant-, cfut_, ghp_, xox[bp]-, AWS access keys, JWT shapes, etc.), and redact each match with a <REDACTED:reason> placeholder. The model should see the redaction marker, not the value.
- First-class "this file contains secrets" annotation. When the model has reason to believe a file contains secrets (filename, prior context, user statement), surface that as a structured flag in its working state, not just as a soft instruction in the prompt.
- Make the pre-execution filter overridable but loud. A user can explicitly opt-out for a one-shot command, but the override should be per-command, logged, and never the default.
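The path-based pre-execution filter above can be sketched as follows. All names (SECRET_GLOBS, allow_tool_call, the allowlist contents) are illustrative assumptions, not an existing Claude Code API:

```python
# Sketch of a path-based pre-execution filter: a tool call touching a
# secret-pattern path is allowed only if its command starts with an
# allowlisted value-stripping operation. Deny by default; an explicit
# per-command override is possible but should be logged and never implicit.
import fnmatch
import posixpath

SECRET_GLOBS = [".env*", ".dev.vars*", "*.pem", "*.key",
                "credentials*.json", "*secret*", "*token*"]
VALUE_STRIPPING = ["wc -l", "grep -c", "stat"]  # allowlist of safe operations

def touches_secret_path(path: str) -> bool:
    name = posixpath.basename(path)
    return any(fnmatch.fnmatch(name, g) for g in SECRET_GLOBS)

def allow_tool_call(command: str, path: str, override: bool = False) -> bool:
    """Return True if the harness should execute this tool call."""
    if override:  # explicit one-shot opt-out; log it upstream
        return True
    if not touches_secret_path(path):
        return True
    return any(command.strip().startswith(op) for op in VALUE_STRIPPING)
```

With this in place, `allow_tool_call("wc -l .dev.vars", ".dev.vars")` passes while `allow_tool_call("grep -n API_TOKEN .dev.vars", ".dev.vars")` is blocked before any output exists to leak.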
The key principle: the model should not be trusted to police its own tool calls against secrets. The harness should enforce it.
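The output-side scrubber described above could look like the following sketch. The regexes and redaction format are illustrative; a production version would need a maintained ruleset and an entropy heuristic for unprefixed secrets:

```python
# Sketch of an output-side scrubber: replace likely secret values in tool
# stdout with <REDACTED:reason> markers before the result reaches the
# conversation. Pattern list is illustrative, not exhaustive.
import re

TOKEN_PATTERNS = [
    (re.compile(r"\bsk-ant-[A-Za-z0-9_-]{10,}"), "anthropic-key"),
    (re.compile(r"\bsk-[A-Za-z0-9_-]{10,}"), "api-key"),
    (re.compile(r"\bghp_[A-Za-z0-9]{20,}"), "github-pat"),
    (re.compile(r"\bxox[bp]-[A-Za-z0-9-]{10,}"), "slack-token"),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "aws-access-key"),
    (re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "jwt"),
]

def scrub(stdout: str) -> str:
    """Replace likely secret values with <REDACTED:reason> markers."""
    for pattern, reason in TOKEN_PATTERNS:
        stdout = pattern.sub(f"<REDACTED:{reason}>", stdout)
    return stdout
```

For example, `scrub('1:API_TOKEN="ghp_abcdefghij0123456789"')` yields `1:API_TOKEN="<REDACTED:github-pat>"`: the model still learns that line 1 holds a token of the expected shape, which is enough to diagnose a malformed value without ever seeing it.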
User impact
This is not a hypothetical. Real reproduction in a real session today:
- A Cloudflare API token was written to chat history while diagnosing a malformed .dev.vars file.
- The token had to be rotated immediately, requiring re-issuance, re-pasting into the file, and re-verification.
- The user had to interrupt their actual work (deploying a Cloudflare Worker) to deal with the cleanup.
- The exposed value exists in conversation history, transmission logs, and any retained telemetry between the user's machine and Anthropic's servers — beyond the user's direct control to scrub.
For developers using Claude Code on client work, the consequences scale fast: a leaked production credential during a paid engagement is a trust-and-reputation event, not just a chore. The implicit cost of this class of bug — minutes per incident, multiplied across the developer base, multiplied by the percentage of incidents that involve real production secrets — is substantial. Anthropic should consider whether affected users are owed credit for time lost to cleanup caused by the agent's own failure to honor its documented safety rules, in addition to fixing the underlying bug.
Environment
- Claude Code CLI on Windows 11 (MINGW64 / Git Bash)
- Model: Claude Opus 4.6 (1M context)
- Project has a global CLAUDE.md with explicit secret-handling rules
- User confirmed reproducibility immediately after the incident