docs: test-quality-audit Round 3 (host-agent + zeroclawed priority files) by bglusman · Pull Request #37 · bglusman/calciforge

bglusman · 2026-04-25T03:18:48Z

Subagent-driven Round 3 of the test audit, +334 lines on top of Rounds 1+2 baseline. 21 files audited (all 19 host-agent inline test modules + 2 zeroclawed priority files: config.rs, context.rs).

Top findings worth escalating:

Real-looking 64-hex API token in config.rs SAMPLE_CONFIG — already filed as separate fix (PR security: sanitize real API token + Telegram ID missed by previous passes #36)
adapters/exec.rs has zero tests on the privileged validate/execute decision tree — largest single coverage gap in host-agent
auth/adapter.rs tests fictional types contradicting production policy (test says "Full autonomy never requires approval", production says it CANNOT bypass always_ask) — passing tests describing wrong behavior
approval/mod.rs has only the negative path tested — full happy path / replay rejection / expiration / target-mismatch all uncovered
identity_plugin.rs:203 tests str::starts_with, not the actual guard
error.rs IntoResponse mapping unfenced — both tests are non-panic checks
context.rs is the positive model — recommended reference pattern

Out-of-scope-but-flagged for next round: commands.rs (44 tests, 2158 LOC), install/ssh.rs, install/cli.rs, install/executor.rs, adapters/cli.rs.

Two real-looking secrets the audit discussed are redacted in the doc itself (truncated middle) so the doc doesn't trip gitleaks on the very thing it's flagging.

🤖 Generated with Claude Code

Consolidates all three rounds of the test-quality audit into one file: \`docs/rfcs/test-quality-audit.md\` (906 lines, +334 from Round 2's 572-line baseline). ## Round 3 scope (subagent-driven) Files audited (21 total): **Host-agent (all 19 inline test modules)**: error.rs, audit.rs, metrics.rs, rate_limit.rs, perm_warn.rs, config.rs, auth/{adapter, identity}.rs, adapters/{exec,systemd,registry,pct,git,zfs}.rs, zfs/mod.rs, approval/{signal,identity_plugin,token,mod}.rs. **Zeroclawed priority files**: config.rs (18 tests), context.rs (19 tests). **Files NOT audited (in-scope but past the 25-min time cap)**: \`commands.rs\` (44 tests, 2158 LOC), \`install/ssh.rs\`, \`install/cli.rs\`, \`install/executor.rs\`, \`adapters/cli.rs\`. Documented in the round's scope-coverage summary so the next pass can pick up cleanly. ## Top findings worth escalating 1. **Real-looking 64-hex API token in \`config.rs\` SAMPLE_CONFIG** (lines 807, 923) — \`zc_4f5c220eec86…\`. Round 2 sanitized the Telegram IDs in this file but missed the token. **Filed as separate fix in security/sanitize-config-rs-secrets branch (PR #36).** 2. **\`adapters/exec.rs\` has zero tests on the privileged validate/execute decision tree** — the largest single coverage gap in host-agent. Stdlib-tautology tests only. 3. **\`auth/adapter.rs\` tests fictional types** that contradict production policy (test-local says "Full autonomy never requires approval"; production at \`config.rs:447\` says "Full autonomy CANNOT bypass \`always_ask\`"). Tests are passing while describing wrong behavior. 4. **\`approval/mod.rs\` has only the negative path tested** — the full create→Signal-confirm→consume happy path, token-replay rejection, expiration, target-mismatch, and not-yet-approved branches are all uncovered. Largest authorization-flow gap in the crate. 5. **\`approval/identity_plugin.rs:203\` \`test_relative_path_rejected\`** tests \`str::starts_with\`, not the actual guard. Comment admits it. 6. **\`error.rs\` IntoResponse mapping** is unfenced — both tests are \`let _ = err.into_response()\` non-panic checks. 7. **\`context.rs\` is a positive example** — 19 uniformly behavioral, multi-assertion, edge-case tests. Recommended as the reference pattern for the codebase's other test modules. ## Why a separate branch Lands the docs separately from any code changes so the audit findings ship without waiting on the consolidation refactor (PR #17) or the ongoing FnoxClient/MCP work to merge. PR #17's commit \`b2614c9a\` introduces the file with Rounds 1+2 — when both branches land, the merge is straightforward (this branch's content strictly extends). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Copilot wasn't able to review any files in this pull request.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

bglusman · 2026-04-25T04:46:09Z

Codex integration sweep: reviewed this open PR during the cross-PR pass. I did not find inline review comments to answer here, and I did not edit this branch. Any test-quality recommendations that overlap host-agent validator coverage are represented in draft integration PR #38.

bglusman · 2026-04-25T18:38:08Z

Subsumed by #44 (squashed to 9ed51fbc on main). All commits from this branch are present in the squash. Closing as redundant rather than merging again.

Copilot AI review requested due to automatic review settings April 25, 2026 03:18

Copilot AI reviewed Apr 25, 2026

View reviewed changes

This was referenced Apr 25, 2026

test(host-agent): add property coverage for command validators #27

Merged

[codex] Add test quality review #33

Closed

[integration] All non-conflicting code changes — evaluation target #42

Closed

bglusman closed this Apr 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: test-quality-audit Round 3 (host-agent + zeroclawed priority files)#37

docs: test-quality-audit Round 3 (host-agent + zeroclawed priority files)#37
bglusman wants to merge 1 commit intomainfrom
docs/test-audit-round-3

bglusman commented Apr 25, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

bglusman commented Apr 25, 2026

Uh oh!

bglusman commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bglusman commented Apr 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

bglusman commented Apr 25, 2026

Uh oh!

bglusman commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants