Skip to content

feat: {{secret:NAME}} substitution in headers + body (#29)#19

Closed
bglusman wants to merge 18 commits intomainfrom
feat/substitution-body-headers
Closed

feat: {{secret:NAME}} substitution in headers + body (#29)#19
bglusman wants to merge 18 commits intomainfrom
feat/substitution-body-headers

Conversation

@bglusman
Copy link
Copy Markdown
Owner

Summary

Extends the URL-path substitution already shipped in PR #17 to cover
the two other places a {{secret:NAME}} ref can appear in an
outbound request:

  • Header values — every non-hop-by-hop header value runs through
    the new resolve_and_substitute() helper. Header NAMES are never
    substituted (can't usefully parameterize them; substituting into a
    key would let a secret leak as a header name in middle-box logs).
  • Body — content-type-gated:
    • application/json, application/*+json,
      application/x-www-form-urlencoded, text/* get full
      find-and-substitute.
    • Everything else (multipart, binary) takes the defensive
      raw-bytes path per RFC §11.8: memchr-style scan for
      {{secret:; if found, block the request; otherwise pass
      through unchanged. An agent cannot smuggle a ref through an
      unsupported content-type.

Real bug caught by the tests

While writing the JSON-body test, upstream kept returning
IncompleteBody. Substitution can change body length, but the
gateway was forwarding the client's original Content-Length
header — mismatch. Dropped content-length from the
forwarded-header list
so reqwest recomputes from the actual
payload. Latent bug in the existing intercept pipeline that the
URL-only path didn't trigger.

Stacks on PR #17

Depends on the substitution engine + URL wiring in
refactor/consolidate-onecli. Review PR #17 first; this branch's
diff against main includes those.

Test plan

  • 4 new given/when/then behavioral tests
    (crates/security-proxy/tests/substitution_body_headers.rs) —
    header substitution, JSON body substitution, unsupported-CT
    block, unresolvable-ref block.
  • All use real in-process gateway + wiremock upstream via
    reqwest::Proxy — exercises the full intercept pipeline.
  • Workspace cargo test green (39 security-proxy + 24
    onecli-client + others)
  • Clippy clean, fmt clean

Follow-up

  • Per-destination allowlist (§11.1). Needs a config shape change.
  • Streaming / chunked bodies: buffered substitution doesn't scale
    indefinitely; add a body-size cap if a use case needs it.

🤖 Generated with Claude Code

bglusman and others added 18 commits April 24, 2026 00:17
Per #28 consolidation planning. Both files are dead weight:

- policy.rs: the file itself is tagged deprecated by a prior author
  ("This module is being replaced by clashd sidecar + zeroclawed-policy-plugin.
  For new code, use the clashd HTTP API directly or the plugin hook.
  Fails closed."). No consumers in the workspace.
- bench.rs: placeholder iterations with TODO-style bodies
  ("// Simulate credential lookup and injection
   // This is a placeholder for actual benchmark").
  Not wired to any `cargo bench` target that actually runs.

Subsequent commits will continue #28: merge main.rs (the onecli service
binary) into security-proxy, rename the crate to reflect its
post-consolidation role as a shared library.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Running notes captured while inventorying security-proxy and
onecli-client for the merge. Each finding is non-obvious and changes
what the consolidation PR needs to look like:

1. Dual env-var conventions (`ZEROGATE_KEY_*` vs `<NAME>_API_KEY`) —
   users setting one won't be found by code reading the other.
2. Provider detection by host vs URL-path — both useful; unified
   gateway accepts both.
3. Auth-header scheme inconsistency — onecli-client only emits Bearer
   (plus brave), which breaks anthropic (wants x-api-key). Likely
   hidden because nobody tested anthropic through onecli.
4. Config schema divergence — agents.json richer than onecli's
   convention-inferred names; keep agents.json as canonical.
5. Resolver-vs-cache — onecli resolves on demand, security-proxy
   caches at startup only. Rotation is silently broken today.
6. Hardcoded `vault.enjyn.com` fallback — flagged for cleanup in
   rename pass; fix during consolidation since the file is already
   being touched.
7. Policy enforcement is aspirational — clashd is not called on every
   request; gateway path lacks the policy hook.
8. Substitution is greenfield — not a migration; #28 should land a
   stub module so #29 is a focused body-fill.
9. (meta) LLM-written tests need independent review — recorded here
   alongside task #34 and as a process note for #29/#30/#31 TDD.
10. e2e onecli_proxy.rs tests opt-out when onecli isn't running —
    repoint to security-proxy in #28.

Plus a running punch-list of the concrete changes the consolidation
PR should make.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Security-proxy now shares onecli-client's vault resolver rather than
its own parallel startup-time env scan. Per-request, on cache miss,
it consults env → fnox → vaultwarden and caches the result. This
fixes the silently-broken rotation behaviour documented in
`docs/rfcs/consolidation-findings.md` finding #5: previously, a
rotated key added to env or fnox after security-proxy started was
invisible until restart.

Changes:
- Add `onecli-client = { path = "../onecli-client" }` as a path dep.
- New `CredentialInjector::ensure_cached(provider) -> bool` that
  returns fast when the DashMap already has the provider, else calls
  `onecli_client::vault::get_secret(provider)` and inserts on success.
- Public wrapper `detect_provider_pub(host)` so the proxy hot path can
  determine the provider name without duplicating the pattern table.
- Proxy.rs resolves + caches before calling inject() on each request.
- Two behavior tests covering cache-hit (no resolver call) and
  nothing-resolves (cache stays clean, no `Bearer ""` footgun).
  Written in given/when/then doc form as per today's test-quality
  review discussion.

Left for follow-up commits on this branch:
- Merge onecli-client's `/vault/:secret` + `/proxy/:provider` routes
  into security-proxy (move the handlers, then delete main.rs).
- Reconcile env-var conventions (ZEROGATE_KEY_* vs <NAME>_API_KEY —
  finding #1).
- Per-provider auth header schemes (finding #3 — onecli only does
  Bearer; security-proxy has anthropic's x-api-key path).
- Remove hardcoded `vault.enjyn.com` fallback (finding #6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#6)

Per CLAUDE.md "Hard-coded fallback URLs": env vars for infra endpoints
should be mandatory, not silently defaulted to a specific host. The
prior default leaked a deployment-specific domain to anyone reading
the repo and would silently hit the wrong vault server when users
forgot to set the env.

Also: previously the function only bailed when the token was empty —
if the user set ONECLI_VAULT_TOKEN but forgot ONECLI_VAULT_URL, we'd
try to GET `/api/ciphers` against an empty base, producing a confusing
reqwest error. Now both are required; the bail message names exactly
which one is missing.

Existing tests still pass (24 in onecli-client lib). Note that
`fnox_failure_falls_through_to_vault_error` on feat/fnox-integration
asserts against the old "No ONECLI_VAULT_TOKEN" substring; its
alternative-phrase matcher ("not found") covers the new message, so
that test should still pass after merge. Worth verifying when the
branches reconcile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… audit

The test-quality audit subagent (see `docs/rfcs/test-quality-audit.md`)
surfaced two real security bugs where substring-contains semantics let
carefully-crafted URLs bypass the intended controls:

1. detect_provider — `host.contains("openai.com")` matched
   `api.openai.com.evil.example`, meaning an attacker-registered domain
   could trigger OpenAI-credential injection for their own server.
   Fixed by replacing the contains list with a `(domain, provider)` table
   that matches on DNS-label boundary: host is the domain exactly, OR
   ends with `.<domain>`. Added `detect_provider_rejects_lookalike_suffix_hosts`
   (6 lookalike cases) and `_accepts_real_subdomains` (positive
   companion) using given/when/then doc style.

2. check_bypassed — `url.contains(pattern)` matched
   `https://evil.com/?redirect=localhost`, smuggling a request past
   scanning because the bypass-list string appeared anywhere in the
   URL. Fixed by parsing the URL, extracting the host, and matching
   patterns against the host only; wildcards are now anchored regexes
   over the full host rather than free-floating substrings. Failure
   on URL parse is fail-closed (do not bypass). Added
   `check_bypassed_rejects_host_string_embedded_in_path` with 4
   smuggling cases.

Also from the audit:
- Deleted `scanner_test.rs` — was never wired into `lib.rs` so the 6
  tests inside were dead code that made the coverage look better than
  it was. If the worthwhile pieces (concurrent_scans, large_payload)
  matter, rewrite them inline in scanner.rs.
- Deleted `test_scan_performance_sanity` — <1s threshold is flaky by
  design on shared CI runners and tested nothing about correctness.
  Perf assertions belong in a criterion benchmark.

Also lands: `docs/rfcs/test-quality-audit.md` — the full audit with
KEEP/REWRITE/DELETE triage for 47 tests across onecli-client and
security-proxy, plus missing-coverage gaps and top-6 follow-up priorities.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…reen

Per the test-quality audit flagging `integration.rs` as silent-green:
if the gateway wasn't running, the Err arm did `println!` and passed,
so an outage looked identical to a successful block. Now: `.expect(...)`
on the Err arm with a clear message telling the developer to start the
gateway first.

The tests remain `#[ignore]` so CI still skips them; only
`cargo test -- --ignored` exercises this path, and in that mode a
network failure is a test failure rather than a green herring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…icy docs

Two small cleanups driven by the test audit and consolidation findings:

1. Finding #1: `load_from_env` now emits a tracing::warn log whenever a
   `ZEROGATE_KEY_*` env var is encountered, nudging operators to the
   standard `<NAME>_API_KEY` convention that the shared resolver
   understands. The old path still populates the cache so existing
   deployments don't break; a future PR removes it.

2. Audit finding: `ensure_cached`'s doc-comment claimed "first-write-wins"
   but the covered `test_add_overwrites` test proves `add()` is
   last-write-wins. Both are true — `ensure_cached` itself never
   overwrites, `add` always does. The doc is rewritten to state the
   contract precisely: rotation via direct `add()` takes effect
   immediately, rotation via env/fnox/vault does NOT until the cache
   entry is cleared or the service restarts. TTL/invalidation flagged
   as follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oute

Part of #28 consolidation. Moves the `/vault/:secret` direct-read
endpoint from `onecli-client/src/main.rs` (the onecli binary) into
security-proxy as a step toward deleting the onecli binary entirely.

Also extracts the axum router construction into a dedicated
`security_proxy::router::build_app(state)` function. Before this, the
router was inline in main.rs and tests either (a) couldn't exercise
the real wiring or (b) had to duplicate it (with drift risk). Now the
binary and tests call the same builder.

Two behavioral tests in `tests/vault_route.rs`:

1. `vault_route_returns_resolved_secret` — given `T1_API_KEY=hello`,
   a GET /vault/t1 returns 200 with the resolved token. Catches route
   not registered / handler JSON shape / resolver not consulted / the
   `<NAME>_API_KEY` naming convention drifting.
2. `vault_route_returns_404_for_missing_secret_with_bland_message` —
   catches a footgun specifically: the handler must not render
   `e.to_string()` in the error body, because resolver errors can
   name the env vars probed and the vault URL (shape-of-store
   leakage).

Both written in given/when/then doc style; each exercises the real
server via an in-process listener on an ephemeral port.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tests

Two improvements landing together because they're linked:

1. **Default config was not self-consistent.** `mitm_enabled: true`
   with `ca_cert_path: None, ca_key_path: None` — MITM requires a CA
   to issue leaf certs, so starting with the Default config would
   fail at startup. Flipped to `mitm_enabled: false` so defaults
   actually work; operators who want MITM set both mitm=true AND
   provide the CA paths. (Surfaced by the new self-consistency test
   below — the test *caught its own motivating bug*.)

2. **Rewrote three tautological tests from the audit:**
   - `test_default_config` pinned every constant from the Default impl
     → replaced with `default_bypass_list_includes_loopback` (a real
     invariant) and `default_config_is_self_consistent_for_mitm`
     (the test that found the bug above).
   - `test_config_serialization` only compared `port` → replaced with
     `config_roundtrips_through_json_preserving_every_field` using
     derived PartialEq. Now catches `skip_serializing_if`/field-rename
     regressions instead of only port regressions.
   - `test_verdict_serialization` used brittle `.contains()` substring
     assertions → replaced with `verdict_roundtrips_preserving_each_variant`
     which structurally compares each variant after roundtrip.

Added `PartialEq, Eq` to `GatewayConfig` to enable structural compare
in the roundtrip test (all field types were already PartialEq).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mock-backed behavioral

Per test-quality audit 2026-04-24: the previous 7 tests in client.rs
could not fail. Specific offenders:
- `test_client_creation_with_valid_config`: `client.is_ok()` on an
  infallible reqwest builder.
- `test_client_creation_with_custom_config`: duplicate of the above
  with trivial config differences.
- `test_client_get_routes_through_proxy`: comment admitted "we can't
  inspect RequestBuilder internals", test asserted nothing.
- `test_client_post_routes_through_proxy`: same.
- `test_request_builder_method_mapping`: `let _ = get_req` for four
  methods — zero assertions.
- `test_client_url_trailing_slash_stripped`: comment described the
  expected behaviour, body did not check it.
- `test_client_debug_format`: weak assertion on Debug output.

Replaced with 6 tokio+wiremock behavioral tests that send real HTTP
and assert on what the proxy saw:

- `get_forwards_target_url_and_agent_id_to_proxy` — verifies the core
  contract: `X-Target-URL` and `X-OneCLI-Agent-ID` headers are set to
  the expected values. `.expect(1)` catches regressions that would
  send zero or multiple requests.
- `post_preserves_method_through_proxy` — same for POST, different
  status code so the response plumbing is also exercised.
- `request_method_is_preserved_for_get_post_put_delete` — mounts a
  mock per verb, confirms each one is received correctly.
- `trailing_slash_on_proxy_url_does_not_double_up_path` — configures
  the proxy URL with a trailing slash, asserts the inbound path at
  the proxy is `/` (single slash), not `//`.
- `debug_format_does_not_expose_common_credential_fields` — keeps the
  positive assertions (url/agent_id visible) AND adds negative
  assertions that reject tokens a future field-addition might leak
  (`api_key`, `secret`, `token`, `password`, `Bearer`). If someone
  later adds a credential-bearing field and `#[derive(Debug)]` picks
  it up, this fails.
- `newly_built_client_can_send_requests` — one consolidated liveness
  test that replaces the two `is_ok()` construction tests, and
  actually proves the client can send a request.

All 6 tests run under `cargo test` (no ignore). wiremock is already a
dev-dep — no Cargo.toml changes needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core of task #29 per docs/rfcs/agent-secret-gateway.md §3. Not yet
wired into the proxy hot path — this commit lands the engine as a
standalone module with a solid behavioral test suite so the wiring
commit (next) is a focused, small change.

## API shape

Two-phase, sync:

- `find_refs(input) -> Result<HashSet<String>, SubstitutionError>` —
  extract the set of unique ref names. Fails on malformed or nested
  refs. Caller uses this to drive async resolution (env → fnox →
  vaultwarden, possibly in parallel).
- `substitute(input, resolved_map) -> Result<Cow<str>, SubstitutionError>` —
  render. Returns `Cow::Borrowed(input)` when no refs are present
  (zero allocation). Errors if any ref isn't in the map.

Sync because async closures bind lifetimes we can't easily satisfy; a
two-phase API is cleaner and lets the caller do whatever async pattern
they prefer (join_all, cache, policy-check per name, etc.).

## Contract pinned by tests

13 tests covering:

- no-refs fast-path returns Borrowed (zero alloc)
- single ref substituted; surrounding text preserved
- multiple refs substituted independently
- unresolvable ref → Unresolvable error (fail-closed: caller MUST
  reject the request rather than forwarding the literal name to the
  upstream)
- nested refs rejected at parse time
- resolver output is NOT re-scanned (single pass, no recursion)
- unterminated ref rejected as Malformed (must not treat
  everything-to-EOF as a secret name)
- empty and invalid-char names rejected as Malformed
- UTF-8 multibyte characters preserved around refs
- find_refs deduplicates
- fast-path activates even on input containing a single `{`

## Next (separate commit)

Wire `substitute` into `proxy.rs::intercept` so outbound URL, headers,
and JSON body are passed through it before forwarding upstream. That
commit will need the per-destination allowlist per §11.1 of the RFC
("substitute ANTHROPIC only toward *.anthropic.com") — it's the
security-critical knob.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-2 test quality audit (docs/rfcs/test-quality-audit.md) flagged a
CLAUDE.md violation: `crates/zeroclawed/src/auth.rs` and five other
files included what appeared to be real Telegram numeric IDs
(`8465871195` tied to an `owner`-role identity, `15555550002` for a
`user`-role identity) in fixtures, examples, and tests.

CLAUDE.md "Never commit these to this repo" is explicit that real
chat identifiers (Matrix handles, Discord user-ids, Telegram chat ids)
tied to specific users must not appear in this public repo, even in
test code.

Replaced globally:
- `8465871195` → `7000000001`
- `15555550002` → `7000000002`

Files touched:
- crates/zeroclawed/src/auth.rs (test fixtures + several test bodies)
- crates/zeroclawed/src/commands.rs (test fixtures)
- crates/zeroclawed/src/config.rs (test TOML + assert)
- crates/zeroclawed/src/channels/telegram.rs (test fixtures)
- crates/zeroclawed/src/channels/whatsapp.rs (test fixtures)
- crates/zeroclawed/examples/config.toml (example for external users)

All 33 zeroclawed tests still pass. No production resolver code was
touched — only test fixtures and an example — so the running
production service (which reads the real user config from
~/.zeroclawed/config.toml) is unaffected.

Also includes on this branch (pre-existing diff from prior commits):
- Wiring `{{secret:NAME}}` substitution into the proxy intercept
  pipeline. Substitutes URL refs before bypass-check and forwarding;
  fail-closed on any unresolvable ref so names don't leak to upstream.
- Round-2 audit findings appended to the audit doc (400+ lines).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-2 test quality audit flagged `config/validator.rs` as the
single highest-impact coverage gap: ~290 lines of runtime-error-class
validation (duplicate IDs, dangling agent references, invalid proxy
fields) with a commented-out `mod tests` TODO. This commit closes
the gap with 7 behavioral tests in given/when/then form.

Tests added (all inline, since zeroclawed is a binary-only crate and
has no library surface for integration tests):

- `baseline_minimum_config_validates_clean` — positive baseline so
  future refactors can't silently reject previously-valid input.
- `duplicate_agent_id_is_an_error` — the most common config typo
  (paste-rename where body changes but id doesn't).
- `duplicate_identity_id_is_an_error` — same for identities.
- `malformed_proxy_bind_is_an_error` — catches a class that would
  otherwise fail at startup with an opaque tokio bind error.
- `zero_proxy_timeout_is_an_error` — a zero timeout means hangs
  forever, never intended; must be caught at config-load time.
- `routing_rule_default_to_nonexistent_agent_is_an_error` — the
  "dangling reference" class that silently becomes "agent unavailable"
  at runtime.
- `routing_rule_allowed_list_with_nonexistent_agent_is_an_error` —
  same but for the `allowed_agents` branch (separate code path,
  would not be exercised by the default_agent test alone).

Strategy: each negative test starts from MIN_VALID (a known-good
fixture) and prepends/appends exactly ONE targeted deviation, so any
failure isolates to the invariant under test.

397 existing tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's gitleaks scan tripped on two classes of known-good content
introduced by the #28/#29 commits on this branch:

1. Adversarial tests in `check_bypassed_rejects_host_string_embedded_in_path`
   that INTENTIONALLY contain `user:pass@evil.com` URLs and
   `192.168.1.42`-style private IPs embedded in request paths —
   the whole point of those tests is to prove the scanner rejects
   exactly those shapes. Allowlisting specific values (not the
   generic ranges) so the smuggling-test inputs don't trip the rule
   they exist to test against.

2. `docs/rfcs/test-quality-audit.md` and similar design docs that
   discuss secret patterns at length (e.g. example keys in
   illustrative audit snippets). Adding `docs/rfcs/.*\.md$` to the
   path allowlist — human-authored prose shouldn't be entropy-scanned
   against the same bar as production source/configs. The same
   tradeoff already exists for CLAUDE.md.

Still caught (verified):
- generic 192.168.0.0/16, 10.0.0.0/8, 100.64.0.0/10 ranges
- basic-auth in URLs *outside* the specific user:pass@evil.com literal
- Bearer tokens, URL basic auth, and default gitleaks ruleset

Verified locally:
  gitleaks detect --source . --log-opts="e938d7e2^..HEAD"
  → no leaks found (vs 10+ previously)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-2 audit flagged Http as the only branch in `is_retryable()`
without test coverage. The existing tests cover Unreachable,
RateLimited, PolicyDenied, CredentialNotFound, ApprovalRequired, and
Config — but not Http, even though `is_retryable` explicitly
includes it.

The test builds a real reqwest error by connecting to 127.0.0.1:1
(nothing is ever listening there), converts via the `From` impl,
then asserts both the variant shape and retry behavior. Kept to one
test because the intent is to guard the contract, not enumerate
every possible reqwest error mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-2 audit flagged retry.rs as covering only first-try-success and
all-tries-fail — missing exactly the two branches that retry-logic
exists to handle:

1. `retries_until_transient_resolves` — operation fails twice then
   succeeds on attempt 3. Asserts both the return value is
   propagated from the successful attempt AND the call count is 3.
   Catches a regression where the loop short-circuits on first
   success but mishandles the value, or re-attempts after success.

2. `non_retryable_error_aborts_immediately` — operation returns a
   non-retryable PolicyDenied with max_retries=10. Asserts exactly
   one call. Catches a regression where the loop ignores the
   strategy's is_retryable() verdict.

Base delays set to 1ms so the tests complete in milliseconds instead
of the default 100ms × retries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot review surfaced several real issues. Fixing them as a batch
since they cluster into three themes:

**Security — info-leak paths**

- crates/onecli-client/src/vault.rs: actually remove the
  `https://vault.enjyn.com` hardcoded fallback. The prior commit
  added a `.is_empty()` guard but left the fallback in place, so the
  guard was dead code (URL would never be empty). Now both URL and
  token come from env directly with no default; missing either →
  bail with a clear message naming the specific variable. Left a
  "do not re-add" comment pointing at CLAUDE.md so the next
  contributor doesn't reintroduce a "helpful" default.
- crates/security-proxy/src/proxy.rs: substitution-failure 403
  response now returns a bland "Request rejected" body; the detailed
  error text is logged server-side at `warn!` so operators can debug
  but never echoed back to the (possibly untrusted) client. The old
  `Request rejected: {e}` path would disclose the ref name the agent
  used, which env vars were probed, etc. — shape-of-store info.
- crates/security-proxy/src/router.rs: vault_handler downgraded from
  `warn!(error = %e, ...)` to `debug!(secret = %name, ...)`. Same
  threat: the resolver's `e.to_string()` can include vault URL and
  probed env var names; logging it at `warn!` keeps those in ops
  dashboards as a side-channel. Debug-level means the information is
  available when investigating but not shipped to every log
  aggregator.

**Security — regex-metacharacter bypass**

- crates/security-proxy/src/proxy.rs `host_matches_pattern`: the
  prior build was `.replace('.', r"\.").replace('*', ".*")` which
  left regex metacharacters (`?`, `+`, `(`, `[`, `{`, `|`, `^`, `$`)
  active if present in user-supplied bypass patterns. Depending on
  the pattern this either widened matches (allow-too-much) or broke
  compilation (allow-nothing). Rebuilt as a true glob: split on
  `*`, `regex::escape` each literal segment, join with `[^.]*`
  (wildcard doesn't cross DNS-label boundaries). New
  `BypassMatcher` enum keeps the pre-compiled regex cached per
  pattern; a follow-up should precompile the whole list at
  `SecurityProxy::new` time, but the builder is already correct.

**Tests — hermetic + readiness**

- crates/security-proxy/tests/vault_route.rs: replace fixed 50ms
  sleep with a `/health` readiness poll (deadline 2s, 5ms between
  attempts). Fixed sleeps flake on loaded CI. Also updated the
  safety comment on `await_holding_lock` to correctly describe why
  the std-Mutex-across-await pattern is actually safe in these
  tests (multi-thread runtime, but no inner tasks re-acquire the
  lock), and named the follow-up options (`tokio::sync::Mutex` or
  `serial_test` crate) rather than asserting incorrect "single
  threaded" semantics.
- crates/security-proxy/src/credentials.rs
  `ensure_cached_returns_false_when_nothing_resolves`: explicitly
  clear `ONECLI_VAULT_URL/TOKEN` before running, and derive the
  provider name from `std::process::id()` so the test can't collide
  with any real secret that might exist in the dev/CI env.

**Docs — rotation + resolver chain**

- crates/security-proxy/src/credentials.rs: expanded the
  `ensure_cached` doc to be explicit about the limitations (first
  resolve wins; no re-resolve; no TTL; concurrent-callers race is
  benign because the resolver is deterministic). Says plainly that
  rotation at runtime requires either calling `add()` directly or a
  TTL/invalidation mechanism that doesn't exist yet.
- crates/security-proxy/src/router.rs: rewrite the `/vault/:secret`
  doc to reference "the shared resolver" rather than naming
  "env → fnox → vaultwarden". That chain is feature-flagged and
  varies by branch; consumers of the router shouldn't bake in a
  specific backend set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds on the URL-path substitution already in PR #17. Now handles the
two other places an agent can write a `{{secret:NAME}}` ref:

**Headers**
Every non-hop-by-hop header value runs through a new
`resolve_and_substitute()` helper before the upstream request is
built. Header NAMES are never substituted — agents can't usefully
parameterize them and substituting into a key would let a secret
leak as a header name that some middle box logs verbatim.
Fail-closed on any unresolvable ref, same semantics as URL.

**Body** — content-type-gated:
- `application/json`, `application/*+json`,
  `application/x-www-form-urlencoded`, and `text/*` get full
  find-and-substitute.
- Everything else (multipart, binary, unset) takes the defensive
  raw-bytes path: a `memchr`-style scan for `{{secret:`. If found,
  the request is blocked (per RFC §11.8 — an agent must not be
  able to smuggle a ref through an unsupported content-type to
  bypass substitution). If no opener is present, the body passes
  through unchanged.

**Real bug fixed while writing the behavioral tests**
The JSON-body test kept failing with an `IncompleteBody` error at
upstream — substitution can change the body byte length, but the
original `Content-Length` header was being forwarded unchanged.
Fix: strip `content-length` from the forwarded header list so
reqwest recomputes it from the actual payload. This is a latent
bug in the existing intercept pipeline that the URL-only path just
happened not to trigger (URL length changes don't touch body
length), so it's a good thing these tests caught it.

**Tests (4 new, given/when/then)**
- `header_value_with_ref_is_substituted_before_forward` — sets
  env, sends request with `X-Api-Key: {{secret:…}}` through the
  gateway, asserts upstream sees the substituted value.
- `json_body_with_ref_is_substituted_before_forward` — same for
  a POST with a JSON body.
- `ref_in_unsupported_content_type_body_is_blocked` —
  `application/octet-stream` with a ref in the bytes; expect 403,
  upstream never hit (`wiremock.expect(0)`).
- `unresolvable_ref_in_body_blocks_request` — ref in JSON body
  that nothing resolves; expect 403 with error body that does NOT
  include the ref name (same fail-closed discipline as URL path).

All four use a real in-process gateway + wiremock upstream via the
`reqwest::Proxy` client pattern, so they exercise the full intercept
pipeline end-to-end.

Remaining follow-up work:
- Per-destination allowlist (§11.1): "ANTHROPIC_API_KEY may only be
  substituted when forwarding to *.anthropic.com". Needs a config
  shape change; filing as a separate PR.
- Streaming / chunked bodies: the current code buffers the full
  body to substitute; bodies larger than available memory would be
  a problem in theory. In practice we're proxying model-API calls
  with bounded bodies. Add a config cap later if it matters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 24, 2026 16:13
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Extends the security-proxy / onecli-client consolidation work by wiring {{secret:NAME}} substitution into additional outbound request surfaces (headers + body), adding behavioral tests around the full intercept pipeline, and tightening several security/test-quality gaps discovered during the audit.

Changes:

  • Add header-value and content-type-gated body substitution (with fail-closed behavior) to the security-proxy intercept pipeline.
  • Introduce /vault/:secret route and in-process router construction (build_app) to support end-to-end tests.
  • Add/upgrade tests and docs from the audit: validator coverage in zeroclawed, retry/error/client behavior tests in onecli-client, and remove hardcoded vault URL fallback.

Reviewed changes

Copilot reviewed 29 out of 30 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
docs/rfcs/test-quality-audit.md Adds a comprehensive test-quality audit writeup and prioritized follow-ups.
docs/rfcs/consolidation-findings.md Captures consolidation findings and decisions relevant to resolver/substitution/policy wiring.
crates/zeroclawed/src/config/validator.rs Adds behavioral tests covering key config validation invariants.
crates/zeroclawed/src/config.rs Replaces real-looking Telegram IDs in embedded test TOML + assertions.
crates/zeroclawed/src/commands.rs Updates test fixtures to use placeholder Telegram IDs.
crates/zeroclawed/src/channels/whatsapp.rs Updates test fixtures to use placeholder Telegram IDs.
crates/zeroclawed/src/channels/telegram.rs Updates test fixtures and assertions to use placeholder Telegram IDs.
crates/zeroclawed/src/auth.rs Updates auth tests to use placeholder Telegram IDs.
crates/zeroclawed/examples/config.toml Updates example config to use placeholder Telegram IDs.
crates/security-proxy/tests/vault_route.rs Adds in-process behavioral tests for /vault/:secret.
crates/security-proxy/tests/substitution_body_headers.rs Adds end-to-end tests for header/body substitution and fail-closed behavior.
crates/security-proxy/tests/integration.rs Fixes “silent-green” ignored integration tests by failing on network errors.
crates/security-proxy/src/substitution.rs Introduces the two-phase substitution engine (find_refs + substitute) and unit tests.
crates/security-proxy/src/scanner_test.rs Removes dead/unwired test module file.
crates/security-proxy/src/scanner.rs Removes flaky perf-threshold test.
crates/security-proxy/src/router.rs Extracts router construction and adds /vault/:secret handler.
crates/security-proxy/src/proxy.rs Wires substitution into intercept flow; tightens bypass matching; adds raw scan for unsupported body types.
crates/security-proxy/src/main.rs Switches binary to use router::build_app.
crates/security-proxy/src/lib.rs Exposes router + substitution modules and re-exports build_app.
crates/security-proxy/src/credentials.rs Adds on-demand resolver-backed caching, safer provider detection, and new tests.
crates/security-proxy/src/config.rs Makes defaults self-consistent, improves roundtrip tests, and derives PartialEq/Eq for config.
crates/security-proxy/Cargo.toml Adds onecli-client as a dependency for shared secret resolution.
crates/onecli-client/src/vault.rs Removes hardcoded vault URL fallback and requires URL+token for vaultwarden fallback.
crates/onecli-client/src/retry.rs Adds missing behavioral retry tests (transient-resolves, non-retryable aborts).
crates/onecli-client/src/policy.rs Deletes deprecated policy module stub.
crates/onecli-client/src/error.rs Adds test coverage for Http(_) retryability using a real reqwest error.
crates/onecli-client/src/client.rs Replaces non-asserting tests with wiremock-backed behavioral tests.
crates/onecli-client/src/bench.rs Removes unused placeholder benchmark module.
Cargo.lock Records new dependency edges (security-proxy → onecli-client).
.gitleaks.toml Expands allowlists for RFC docs and specific adversarial fixtures to reduce false positives.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +29 to +33
use security_proxy::config::GatewayConfig;
use security_proxy::proxy::SecurityProxy;

static ENV_MUTEX: Mutex<()> = Mutex::new(());

Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests serialize env mutation with a file-local ENV_MUTEX, but env is process-global and other test modules in the workspace also mutate ONECLI_* vars (e.g. substitution_body_headers.rs and credentials.rs). With Rust’s default parallel test runner, this can still race and flake. Consider a single shared global lock (e.g., in a test util module) or using serial_test so all env-mutating tests across the crate serialize on the same mechanism.

Copilot uses AI. Check for mistakes.
Comment on lines +29 to +33
static ENV_MUTEX: Mutex<()> = Mutex::new(());

fn set_env<K: AsRef<std::ffi::OsStr>, V: AsRef<std::ffi::OsStr>>(key: K, value: V) {
unsafe { std::env::set_var(key, value) }
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests serialize env mutation with a file-local ENV_MUTEX, but env is process-global and other test modules in the workspace also mutate ONECLI_* vars (e.g. vault_route.rs and credentials.rs). With Rust’s default parallel test runner, this can still race and flake. Consider a single shared global lock (e.g., in a test util module) or using serial_test so all env-mutating tests across the crate serialize on the same mechanism.

Copilot uses AI. Check for mistakes.
Comment on lines +51 to +56
match onecli_client::vault::get_secret(&secret_name).await {
Ok(token) => Json(serde_json::json!({
"status": "ok",
"secret": secret_name,
"token": token,
}))
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/vault/:secret returns the resolved secret token to any caller with no authentication/authorization guard. If this service is reachable by untrusted clients, this endpoint becomes a direct secret exfiltration vector; consider disabling the route by default, restricting it to localhost/admin-only, or requiring an auth token/mTLS before returning the secret value.

Copilot uses AI. Check for mistakes.
Comment on lines +349 to +357
async fn ensure_cached_returns_false_when_nothing_resolves() {
// Safety: env mutation in tests is inherently process-global;
// using unsafe to satisfy Rust 2024's edition semantics. The
// specific variables we touch aren't read by concurrent tests
// in this module.
unsafe {
std::env::remove_var("ONECLI_VAULT_TOKEN");
std::env::remove_var("ONECLI_VAULT_URL");
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test mutates ONECLI_VAULT_URL/ONECLI_VAULT_TOKEN without coordinating with the env-mutation mutex used by other security-proxy tests. Since env is process-global and tests run in parallel by default, this can cause flaky behavior (e.g., vault_route/substitution tests unexpectedly taking the vault fallback path). Consider using the same shared global env lock as the other tests, or switching all env-mutating tests to serial_test.

Copilot uses AI. Check for mistakes.
Comment on lines 154 to 158
// Bypass check
if self.check_bypassed(&target_url) {
info!("Bypassing: {}", target_url);
return Ok(self.forward_upstream(req, &target_url).await);
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bypass returns early before header/body substitution. That means a bypassed request can still forward a literal {{secret:...}} token in the body (and won’t fail-closed on unresolvable refs), which contradicts the substitution contract and can leak ref names to bypassed upstreams. Consider performing header/body substitution (and the raw-scan fail-closed check) before the bypass early-return, or applying the same substitution logic inside forward_upstream.

Suggested change
// Bypass check
if self.check_bypassed(&target_url) {
info!("Bypassing: {}", target_url);
return Ok(self.forward_upstream(req, &target_url).await);
}
// IMPORTANT: do not bypass yet. Bypassed requests must still
// pass through the same header/body substitution and fail-closed
// checks as non-bypassed requests so unresolved `{{secret:...}}`
// refs are never forwarded upstream.

Copilot uses AI. Check for mistakes.
Comment on lines +413 to +421
/// Zero-allocation fast path: if `find_refs` returns an empty set,
/// we return without a resolver round-trip. The cost of substitution
/// is thus proportional to the number of refs, not the size of the
/// input string.
async fn resolve_and_substitute(&self, input: &str) -> Result<String, String> {
let names = crate::substitution::find_refs(input).map_err(|e| e.to_string())?;
if names.is_empty() {
return Ok(input.to_string());
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve_and_substitute is documented as a “zero-allocation fast path”, but the empty-ref branch still allocates via input.to_string(). Consider returning a Cow<'_, str> (mirroring substitution::substitute) or otherwise avoid allocating when no refs are present, especially since this helper is called for every header value and sometimes for large bodies.

Copilot uses AI. Check for mistakes.
Comment on lines +199 to +205
for (k, v) in &original_headers {
match self.resolve_and_substitute(v).await {
Ok(new_v) => substituted_headers.push((k.clone(), new_v)),
Err(e) => {
warn!("BLOCKED: header substitution failed: {}", e);
return Ok(blocked_response("Request rejected"));
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Header substitution currently clones every header name (k.clone()) and always allocates a new String for the value even when no refs are present. This adds avoidable overhead on every request. Consider iterating original_headers by value (move) so keys aren’t cloned, and only replacing the value when find_refs finds refs (otherwise keep the original v unchanged).

Suggested change
for (k, v) in &original_headers {
match self.resolve_and_substitute(v).await {
Ok(new_v) => substituted_headers.push((k.clone(), new_v)),
Err(e) => {
warn!("BLOCKED: header substitution failed: {}", e);
return Ok(blocked_response("Request rejected"));
}
for (k, v) in original_headers {
if v.contains("{{") {
match self.resolve_and_substitute(&v).await {
Ok(new_v) => substituted_headers.push((k, new_v)),
Err(e) => {
warn!("BLOCKED: header substitution failed: {}", e);
return Ok(blocked_response("Request rejected"));
}
}
} else {
substituted_headers.push((k, v));

Copilot uses AI. Check for mistakes.
Comment on lines +200 to +205
match self.resolve_and_substitute(v).await {
Ok(new_v) => substituted_headers.push((k.clone(), new_v)),
Err(e) => {
warn!("BLOCKED: header substitution failed: {}", e);
return Ok(blocked_response("Request rejected"));
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Substitution failures are logged at warn! with the full error string (which currently includes the secret ref name and may include vault/env probing details). This undermines the “don’t disclose store shape” goal and can leak sensitive ref names into shared logs. Consider logging a redacted message at warn (no ref name / underlying error) and emitting the detailed error only at debug/trace with appropriate log scoping.

Copilot uses AI. Check for mistakes.
Comment on lines +519 to +525
/// Compile a bypass pattern once. In the hot path today this still
/// runs per-request (called from `host_matches_pattern`); a
/// follow-up should precompile the whole list at `SecurityProxy`
/// construction, but doing that cleanly requires changing
/// `GatewayConfig` shape. For now the regex builder itself is
/// correct and escape-safe; perf follows.
fn compile_bypass_pattern(pattern: &str) -> BypassMatcher {
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host_matches_pattern compiles a new regex::Regex for every wildcard bypass pattern on every request. That can become a noticeable CPU cost under load (regex compilation is much more expensive than matching). Consider precompiling bypass matchers once when building SecurityProxy/GatewayConfig (e.g., store Vec<BypassMatcher> alongside the raw strings) and only doing is_match in the hot path.

Copilot uses AI. Check for mistakes.
@bglusman
Copy link
Copy Markdown
Owner Author

Codex integration sweep note: I reviewed the inline comments on this PR. GitHub rejected direct inline replies for these older/outdated review comments with HTTP 422, so I am responding top-level instead: 3138987119, 3138987156, 3138987178, 3138987209, 3138987235, 3138987265, 3138987285, 3138987314, 3138987339.\n\nI did not edit this branch. Items that overlap the secure/fnox/host-agent/digest integration work are addressed in draft PR #38 (codex-integration-code), including stdin-based fnox set, bounded fnox waits, whitespace-safe !secure parsing, identity-aware !secure audit logs, valid-input host-agent properties, real WhatsApp HMAC verification, loopback OneCLI default bind, and race-free digest temp paths. Remaining PR-specific findings stay actionable for this branch owner or a follow-up.

@bglusman
Copy link
Copy Markdown
Owner Author

Triage round (see .claude/pr-triage.md) flagged two LEGIT-FIX items here. Higher-priority one (vault route exfil + 0.0.0.0 default bind) addressed in #39 stacked on this branch. The bypass-before-substitution finding is deferred — the clean fix needs the dest_host parameter added in #22, so it'll land via the integration branch instead of as a standalone.

bglusman added a commit that referenced this pull request Apr 25, 2026
…ning (#44)

Squash-merge of integration/super-combined — 4 weeks of feature work + cross-PR security
fixes + codex agent's hardening, all green CI (14/14 checks).

## Features landing
- **fnox secret-resolver integration** (#15) + FnoxClient subprocess wrapper (#21)
- **Adversarial commit-reviewer + mechanical pre-commit gate** (#18)
- **{{secret:NAME}} substitution engine** in security-proxy URL/headers/body (#19)
- **Per-secret destination allowlist** (#22) — RFC §11.1 attack defense
- **!secure chat commands** (set/list) on Telegram (#20), Matrix (#28), WhatsApp (#31)
- **zeroclawed-mcp** scaffold — agent-facing secret discovery server (#23)
- **install.sh wires MCP** into Claude Code agent configs (#26)
- **zeroclawed-secret-paste** — localhost web UI for one-shot secret input (#34)
- **Bulk paste UI** — .env-style multi-secret onboarding with per-line results
- **LAN-friendly defaults** — bind 0.0.0.0 + RFC 1918 Origin acceptance
- **WhatsApp HMAC verification** (was always-true placeholder before — codex hardening)

## Security fixes folded in
- /vault/:secret bearer auth + 127.0.0.1 default bind (#39)
- URL-embedded secrets honor destination allowlist (#41)
- Paste-flow: bearer URL only at debug, fnox set via stdin not argv (#40)
- Paste-flow: graceful shutdown, exit-on-submit, reject Origin: null (#43)
- Subprocess timeouts + kill_on_drop on FnoxClient
- BrokenPipe-tolerant stdin write (Linux CI surface)
- Header-value log redaction
- OneCLI bound to 127.0.0.1 by default
- Sanitized real API token + Telegram IDs from sample configs (#36)

## Architecture / refactors
- Consolidated onecli binary into security-proxy (#17)
- Hardcoded vault URL removed from onecli-client
- security-proxy resolver wired into hot path
- Extracted build_app router; migrated /vault/:secret route
- !secure parser uses split_whitespace (was splitn), audit-logs invocations

## Test coverage added
- security-proxy substitution engine + body/headers tests
- onecli-client retry + Http(_) variant + adversarial fallthrough suite
- onecli-client client.rs rewritten from tautologies to wiremock-backed
- config/validator coverage (was zero, now 290-line module covered)
- 16 zeroclawed-secret-paste tests including bulk-mode cases

## Docs / RFCs
- agent-secret-gateway holistic architecture
- consolidation-findings (what #28 must address)
- secret-input-web-ui RFC (input-only, new-by-default)
- browser-harness integration spike
- test-quality-audit Round 1+2+3 (host-agent + zeroclawed priority files)

## Codex agent's hardening cherry-picks
- Subprocess timeouts on fnox calls
- map_spawn_error helper
- Validator hardening + atomic-counter digest race fix
- WhatsApp HMAC implementation + tests
- proxy header-value log redaction

CI: all 14 checks green at squash time.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@bglusman
Copy link
Copy Markdown
Owner Author

Subsumed by #44 (squashed to 9ed51fbc on main). All commits from this branch are present in the squash. Closing as redundant rather than merging again.

@bglusman bglusman closed this Apr 25, 2026
bglusman added a commit that referenced this pull request Apr 25, 2026
…ning (#44)

Squash-merge of integration/super-combined — 4 weeks of feature work + cross-PR security
fixes + codex agent's hardening, all green CI (14/14 checks).

## Features landing
- **fnox secret-resolver integration** (#15) + FnoxClient subprocess wrapper (#21)
- **Adversarial commit-reviewer + mechanical pre-commit gate** (#18)
- **{{secret:NAME}} substitution engine** in security-proxy URL/headers/body (#19)
- **Per-secret destination allowlist** (#22) — RFC §11.1 attack defense
- **!secure chat commands** (set/list) on Telegram (#20), Matrix (#28), WhatsApp (#31)
- **zeroclawed-mcp** scaffold — agent-facing secret discovery server (#23)
- **install.sh wires MCP** into Claude Code agent configs (#26)
- **zeroclawed-secret-paste** — localhost web UI for one-shot secret input (#34)
- **Bulk paste UI** — .env-style multi-secret onboarding with per-line results
- **LAN-friendly defaults** — bind 0.0.0.0 + RFC 1918 Origin acceptance
- **WhatsApp HMAC verification** (was always-true placeholder before — codex hardening)

## Security fixes folded in
- /vault/:secret bearer auth + 127.0.0.1 default bind (#39)
- URL-embedded secrets honor destination allowlist (#41)
- Paste-flow: bearer URL only at debug, fnox set via stdin not argv (#40)
- Paste-flow: graceful shutdown, exit-on-submit, reject Origin: null (#43)
- Subprocess timeouts + kill_on_drop on FnoxClient
- BrokenPipe-tolerant stdin write (Linux CI surface)
- Header-value log redaction
- OneCLI bound to 127.0.0.1 by default
- Sanitized real API token + Telegram IDs from sample configs (#36)

## Architecture / refactors
- Consolidated onecli binary into security-proxy (#17)
- Hardcoded vault URL removed from onecli-client
- security-proxy resolver wired into hot path
- Extracted build_app router; migrated /vault/:secret route
- !secure parser uses split_whitespace (was splitn), audit-logs invocations

## Test coverage added
- security-proxy substitution engine + body/headers tests
- onecli-client retry + Http(_) variant + adversarial fallthrough suite
- onecli-client client.rs rewritten from tautologies to wiremock-backed
- config/validator coverage (was zero, now 290-line module covered)
- 16 zeroclawed-secret-paste tests including bulk-mode cases

## Docs / RFCs
- agent-secret-gateway holistic architecture
- consolidation-findings (what #28 must address)
- secret-input-web-ui RFC (input-only, new-by-default)
- browser-harness integration spike
- test-quality-audit Round 1+2+3 (host-agent + zeroclawed priority files)

## Codex agent's hardening cherry-picks
- Subprocess timeouts on fnox calls
- map_spawn_error helper
- Validator hardening + atomic-counter digest race fix
- WhatsApp HMAC implementation + tests
- proxy header-value log redaction

CI: all 14 checks green at squash time.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
bglusman added a commit that referenced this pull request Apr 26, 2026
V1 of `.github/copilot-instructions.md` was ~970 words and read more like
documentation than reviewer guidance. Two issues that hurt signal:

1. **Length** — Copilot's per-repo instructions read window is ~4000
   chars; v1 was over that, so the trailing past-mistake list and
   skip-list were getting truncated.
2. **Format** — long bulleted exposition reads less like a rule and
   more like prose, which Copilot treats as background context rather
   than as constraints to apply.

V2 changes:

- Cuts to ~3500 chars by condensing the prioritization tiers and
  removing per-class HIGH/MED/LOW exposition (the priority-order list
  carries the same info in 5 lines).
- Leads with a single review philosophy line ("if uncertain, do not
  comment"), the highest-leverage rule borrowed from deno's
  copilot-instructions.md.
- Names specific past Copilot noise patterns from this repo's PR
  history (env-mutex/serial_test repeated 8+ times across #19/#22/#23;
  dead-doc-reference 4x across #20/#23/#25) so the "don't repeat
  across PRs" rule has teeth.
- Cross-references a new path-scoped file at
  `.github/instructions/rust.instructions.md` (`applyTo: "**/*.rs"`),
  which carries the Rust-specific review nits (`#[expect]` over
  `#[allow]`, `// SAFETY:` requirement, `Mutex` across `.await`,
  `select!` cancellation safety, `kill_on_drop`, `&str` over
  `&String`, `LazyLock<Regex>` for hot paths, etc.).

Path-scoped instructions are loaded only when a PR touches a file
matching `applyTo`, so Rust-specific rules don't burn the global
4000-char budget on PRs that only touch docs / TOML / shell.
bglusman added a commit that referenced this pull request Apr 27, 2026
…de (#56)

* chore(.github): add copilot-instructions.md to tune PR-review behavior

GitHub Copilot supports per-repo review instructions at
.github/copilot-instructions.md (≤2 pages, applied to every Copilot
PR review automatically). Adds calciforge-specific guidance to
improve signal-to-noise:

- Skip what pre-commit already gates (fmt, clippy, gitleaks)
- Prioritize HIGH-severity classes that bit us in past reviews:
  secret leakage in logs, substitution-boundary correctness,
  unwrap/expect outside tests, missing unsafe around set_var
  (edition 2024), blocking I/O in async, auth bypass paths
- Tell Copilot what's NOT a bug despite looking like one:
  {{secret:NAME}} sentinel syntax, post-history-scrub fake test
  values, FnoxClient subprocess-by-design, clashd/zeroclaw_*
  upstream references, mixed Rust edition (known)
- Past-mistake checklist (6 classes from real findings that
  landed and were caught later — substitution-after-bypass,
  None dest_host, bearer-in-info-log, fnox set argv leak,
  0.0.0.0 default, hardcoded fallback URLs)
- Skip even-if-correct: 'consider adding tests' without
  specifics, rename suggestions vs. functional convention,
  feature-creep proposals

Cross-references AGENTS.md (host-agent coding standards) and
CLAUDE.md (public-repo secret discipline) so Copilot follows
both. 83 lines, well under the documented 2-page cap.

* chore(.github): tighten copilot-instructions + add path-scoped Rust file

V1 of `.github/copilot-instructions.md` was ~970 words and read more like
documentation than reviewer guidance. Two issues that hurt signal:

1. **Length** — Copilot's per-repo instructions read window is ~4000
   chars; v1 was over that, so the trailing past-mistake list and
   skip-list were getting truncated.
2. **Format** — long bulleted exposition reads less like a rule and
   more like prose, which Copilot treats as background context rather
   than as constraints to apply.

V2 changes:

- Cuts to ~3500 chars by condensing the prioritization tiers and
  removing per-class HIGH/MED/LOW exposition (the priority-order list
  carries the same info in 5 lines).
- Leads with a single review philosophy line ("if uncertain, do not
  comment"), the highest-leverage rule borrowed from deno's
  copilot-instructions.md.
- Names specific past Copilot noise patterns from this repo's PR
  history (env-mutex/serial_test repeated 8+ times across #19/#22/#23;
  dead-doc-reference 4x across #20/#23/#25) so the "don't repeat
  across PRs" rule has teeth.
- Cross-references a new path-scoped file at
  `.github/instructions/rust.instructions.md` (`applyTo: "**/*.rs"`),
  which carries the Rust-specific review nits (`#[expect]` over
  `#[allow]`, `// SAFETY:` requirement, `Mutex` across `.await`,
  `select!` cancellation safety, `kill_on_drop`, `&str` over
  `&String`, `LazyLock<Regex>` for hot paths, etc.).

Path-scoped instructions are loaded only when a PR touches a file
matching `applyTo`, so Rust-specific rules don't burn the global
4000-char budget on PRs that only touch docs / TOML / shell.

* chore(.github): restore AGENTS.md + CLAUDE.md cross-refs in copilot instructions

Verified GitHub's copilot-instructions docs do not specify the ~4000-char
read window I'd assumed in the previous commit — that was the older
Copilot Chat feature, not the Copilot code-review one. With no real
length pressure, the AGENTS.md / CLAUDE.md pointers (dropped in v2 to
save chars) are worth restoring. CLAUDE.md's "never commit these" list
is exactly the kind of leakage Copilot should be enforcing on diff.

* docs: split AGENTS.md into workspace-wide root + host-agent crate file

The root `AGENTS.md` was titled "Calciforge Host-Agent" and carried
host-agent-specific build/architecture rules — at the repo root, where
agents (Claude Code, Codex, Copilot cloud agent, OpenClaw) read it as
workspace-wide guidance. The mismatch meant agents working in any
non-host-agent crate were getting irrelevant rules ("ZFS snapshot
delegation", "mTLS CN→Unix user") and missing the actually-shared
ones (substitution-boundary order, sentinel string contract,
public-repo secret discipline pointer).

Restructure:

- Move the existing host-agent content verbatim to
  `crates/host-agent/AGENTS.md` (`git mv` so history is preserved).
- New root `AGENTS.md` covers the whole workspace: crate inventory,
  project vocabulary, mandatory rules every agent must follow
  (CLAUDE.md secret discipline, pre-commit gate, sentinel contract,
  substitution boundary order, no-secret-values-in-logs, fnox stdin
  mode), workspace build/test commands, and pointers into per-area
  files (`crates/host-agent/AGENTS.md`, `docs/rfcs/`,
  `docs/security-gateway.md`, `docs/model-gateway.md`).

Cross-refs `.github/copilot-instructions.md` and
`.github/instructions/rust.instructions.md` so agents that find
AGENTS.md first can pick up the Copilot-specific tuning if relevant.

Pairs with the copilot-instructions tightening earlier on this branch.

* fix(.github): correct gitleaks allowlist description in copilot-instructions

Copilot's review caught a real factual error in v2: the line claimed
specific literals (`+15555550100`, `7000000001`, `eyJ0eXAi...`) were
allowlisted in `.gitleaks.toml`. They aren't — `7000000001` is even
used in non-allowlisted source (`crates/calciforge/src/auth.rs`). The
real allowlist mechanism is path-based (`tests/**/fixtures/`,
`docs/rfcs/*.md`, lockfiles, etc.) plus a small regex list (loopback,
RFC 5737, a few inherited-from-main values).

Replace the misleading "these specific literals are allowlisted" claim
with an accurate description of how the allowlist actually works, so
Copilot doesn't downgrade real findings on the assumption they fall
under a non-existent literal-match exemption.

Pleasingly meta: this is exactly the "verify against the codebase
before commenting" rule from the same file working as intended on the
PR that introduced the file.
@bglusman bglusman deleted the feat/substitution-body-headers branch May 1, 2026 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants