
Commit 90c61dd

Aureliolo and claude committed
chore: add /codebase-audit skill for deep parallel codebase auditing
Adds a reusable Claude Code skill that orchestrates 20+ specialized agents to audit the entire codebase (or a targeted scope) for issues.

Key features:
- Architecture research phase BEFORE launching audit agents
- Parallel agent swarm (each agent searches for one issue type)
- Mandatory validation pass (32% false positive rate observed)
- Skepticism for "100% clean" results triggers deeper investigation
- Work package grouping by code proximity, not severity
- Existing issue deduplication (twice: in prompts + before creation)
- Configurable scope (full, src/, web/, cli/, docs/, .github/)

Methodology refined from a battle-tested audit session that produced 30 validated findings across 11 work packages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent e2d86a3 commit 90c61dd

1 file changed

Lines changed: 395 additions & 0 deletions

---
description: "Deep codebase audit: launches specialized parallel agents to find issues, validates findings, groups into work packages, and creates GitHub issues"
argument-hint: "<scope: full | src/ | web/ | cli/ | docs/ | .github/> [--report-only] [--skip-issues] [--quick]"
allowed-tools: ["Agent", "Bash", "Read", "Glob", "Grep", "Edit", "Write", "WebFetch", "WebSearch", "AskUserQuestion", "mcp__github__issue_write", "mcp__github__issue_read", "mcp__github__list_issues", "mcp__github__search_issues"]
---

# /codebase-audit -- Deep Codebase Audit

Launch a swarm of specialized agents to find issues across the entire codebase (or a targeted scope), validate all findings against actual code, group them into developer-friendly work packages, and optionally create GitHub issues.
## Key Principles (from battle-tested sessions)

1. **Never present unvalidated findings** -- validation is mandatory before ANY output to the user
2. **Research architecture BEFORE auditing** -- agents that don't understand the system produce false positives
3. **Skepticism is required** -- "100% clean" results are suspicious and trigger deeper investigation
4. **Group by code proximity, NOT severity** -- work packages are what a developer would naturally fix together
5. **No meta/tracking issues** -- every finding is a real issue or part of a real work package
6. **Existing issue dedup happens TWICE** -- once in agent prompts, once after validation
7. **Fix everything valid** -- no deferring, no "out of scope", no "future work"

---
## Phase 0: Parse Arguments & Determine Scope

Parse the user's argument to determine audit scope:

| Argument | Scope | Agent Categories |
|----------|-------|------------------|
| `full` (default) | Entire codebase | All categories |
| `src/` or `src/synthorg/` | Python backend only | Python-focused categories |
| `web/` | Vue dashboard only | Frontend categories |
| `cli/` | Go CLI only | Go categories |
| `docs/` or `site/` | Documentation/site | Docs/content categories |
| `.github/` or `ci` | CI/CD only | CI/workflow categories |
| `--report-only` | Any scope | Skip issue creation, report only |
| `--skip-issues` | Any scope | Same as `--report-only` |

If no argument given, default to `full`.
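A minimal sketch of this parsing logic, assuming the scope strings from the table above; the function name and return shape are hypothetical:

```python
# Illustrative only: how the scope/flag parsing described above could look.
SCOPES = {"full", "src/", "src/synthorg/", "web/", "cli/", "docs/", "site/", ".github/", "ci"}

def parse_audit_args(args: list[str]) -> tuple[str, bool]:
    """Return (scope, report_only). Unknown tokens fall back to 'full'."""
    scope = "full"
    report_only = False
    for arg in args:
        if arg in ("--report-only", "--skip-issues"):
            report_only = True  # both flags skip issue creation
        elif arg in SCOPES:
            scope = arg
    return scope, report_only
```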
---

## Phase 1: Gather Context

**This phase is CRITICAL. Agents without context produce false positives.**

### Step 1a: Fetch existing GitHub issues

```bash
gh issue list --repo OWNER/REPO --state open --limit 200 --json number,title,labels
```

Parse into a compact reference list: `#N: title [labels]`. This list is passed to EVERY audit agent.
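Turning the `gh` JSON output into the compact `#N: title [labels]` form could look like this (the field names match the `--json` flags above; the function name is illustrative):

```python
import json

def compact_issue_list(gh_json: str) -> list[str]:
    """Format `gh issue list --json number,title,labels` output as '#N: title [labels]'."""
    issues = json.loads(gh_json)
    return [
        f"#{i['number']}: {i['title']} [{', '.join(l['name'] for l in i.get('labels', []))}]"
        for i in issues
    ]
```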
### Step 1b: Research project architecture

Read key architectural files to build the context that agents need. At minimum:

1. **CLAUDE.md** (already in context) -- project conventions, code standards, testing rules
2. **Observability stack** -- read `src/synthorg/observability/__init__.py`, `_logger.py`, `sinks.py`, `setup.py`, `correlation.py` to understand logging architecture, sink routing, and the correlation ID system
3. **DI/wiring** -- read `src/synthorg/api/auto_wire.py` and `src/synthorg/api/lifecycle.py` to understand service initialization
4. **Testing setup** -- read `conftest.py` files and the test config section of `pyproject.toml`
5. **Design spec pointer** -- read `docs/DESIGN_SPEC.md` to know which spec pages exist

Produce an **Architecture Brief** (200-400 words) summarizing:

- Logging: how it works, sink routing rules, correlation IDs
- DI: how services are wired, lifecycle phases
- Testing: markers, parallelism, async mode, coverage requirements
- Key conventions: immutability, error handling, vendor-agnostic naming

This brief is injected into every agent's prompt.

### Step 1c: Identify scope-specific files

If scope is targeted (not `full`), glob the target directory to understand what's there.

---
## Phase 2: Select & Launch Audit Agents

### Agent Roster

Select agents based on scope. Each agent searches for ONE type of issue only.

#### Python Backend Agents (scope includes `src/`)

| Agent | What It Searches For |
|-------|---------------------|
| `missing-logging` | Business logic modules without `get_logger`, error paths that don't log before raising, state transitions without INFO logging, missing DEBUG at decision points |
| `event-constants` | Log calls using raw strings instead of event constants from `observability/events/` |
| `silent-errors` | Bare `except:`, `except Exception: pass`, catch blocks that swallow without logging |
| `test-coverage` | Public modules with no corresponding test file, empty test files |
| `flaky-tests` | Unmocked time, real asyncio.sleep in tests, timing-dependent assertions, skipped tests |
| `wiring-lifecycle` | Incorrectly wired services, missing DI, lifecycle gaps, incomplete protocol implementations |
| `security-gaps` | Hardcoded secrets, missing auth guards, injection vectors, SSRF, XSS |
| `dead-code` | Unreachable functions, unused imports, orphaned modules |
| `todo-fixme` | Unresolved TODOs that should be tracked as issues |
| `spec-drift` | Implementation diverging from design spec behavior |
| `api-consistency` | REST endpoint issues: wrong status codes, missing validation, inconsistent patterns |
| `async-patterns` | Bare create_task, missing await, blocking in async, race conditions |
| `immutability` | Mutable defaults, in-place mutation of frozen models, missing deepcopy |
| `missing-validation` | System boundary inputs without validation (API params, config loading, external data) |
| `type-hints` | Missing return types, bare Any, missing NotBlankStr on identifiers |
| `vendor-names` | Real vendor names used where generic names should be (per CLAUDE.md rules) |
| `observability-gaps` | Sink routing gaps, correlation ID propagation drops, missing event constant modules |

#### Frontend Agents (scope includes `web/`)

| Agent | What It Searches For |
|-------|---------------------|
| `vue-dashboard` | Broken API refs, missing error handling, console.log in prod, TypeScript gaps, a11y |

#### Go CLI Agents (scope includes `cli/`)

| Agent | What It Searches For |
|-------|---------------------|
| `go-cli` | Ignored errors, resource leaks, missing error wrapping, cross-platform issues |

#### Infrastructure Agents (scope includes `.github/` or `docker/`)

| Agent | What It Searches For |
|-------|---------------------|
| `docker-infra` | Dockerfile issues, compose config, port security, healthchecks |
| `ci-workflows` | Missing timeouts, script injection, permissions gaps, silent failures |

#### Documentation Agents (scope includes `docs/` or `site/`)

| Agent | What It Searches For |
|-------|---------------------|
| `docs-consistency` | Broken links, outdated info, wrong commands, inconsistent terminology |
| `landing-site` | SEO gaps, broken links, a11y issues, missing error pages |

#### Cross-Cutting Agents (always included)

| Agent | What It Searches For |
|-------|---------------------|
| `dependency-issues` | Unused deps, missing deps, version conflicts across all package managers |
| `docstring-gaps` | Public classes/functions missing Google-style docstrings |
### Agent Prompt Template
140+
141+
Every agent receives this structure:
142+
143+
```
144+
## Task
145+
You are searching the codebase for ONE specific type of issue: {ISSUE_TYPE}.
146+
147+
## Architecture Context
148+
{ARCHITECTURE_BRIEF from Phase 1b}
149+
150+
## Existing Open Issues (do NOT report these)
151+
{ISSUE_LIST from Phase 1a}
152+
153+
## Scope
154+
Search: {SCOPE_DIRECTORIES}
155+
156+
## Rules
157+
1. For each finding, report: file path, line number, what's wrong, operational impact
158+
2. Cross-reference against the existing issues list -- only report what's NOT already tracked
159+
3. BE SKEPTICAL of your own findings -- verify each one by reading the actual code
160+
4. If you find ZERO issues, state that clearly but also explain what you checked
161+
5. Do NOT flag things that are intentional design decisions (check comments, docstrings)
162+
6. Rate each finding: CONFIRMED (verified in code) or LIKELY (needs validation)
163+
164+
## What to Search For
165+
{CATEGORY-SPECIFIC INSTRUCTIONS}
166+
```
167+
168+
### Launch
169+
170+
Launch ALL selected agents in parallel using the Agent tool with `run_in_background: true`. Give each a descriptive `name` for tracking.
171+
172+
Track agent count and report to user: "Launched N audit agents in parallel. Waiting for results..."
173+
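The launch pattern is a fan-out/gather over independent tasks. As an analogy only (the real mechanism is the Agent tool with `run_in_background: true`; `run_audit_agent` is a hypothetical stand-in), in asyncio terms:

```python
import asyncio

async def run_audit_agent(name: str, prompt: str) -> dict:
    # Hypothetical stand-in for one background agent; real work happens here.
    await asyncio.sleep(0)
    return {"agent": name, "findings": []}

async def launch_all(prompts: dict[str, str]) -> list[dict]:
    """Launch every selected agent concurrently and wait for all results."""
    tasks = [run_audit_agent(name, p) for name, p in prompts.items()]
    return await asyncio.gather(*tasks)
```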
---

## Phase 3: Collect & Deduplicate

As agents complete, collect their findings. Once ALL agents have reported:

### Step 3a: Check for suspicious clean results

If any agent reported zero findings, flag it:

- "Agent `{name}` found zero issues. This may be accurate, or the agent may have been too shallow."
- These categories are candidates for Phase 5 (deep dive).

### Step 3b: Merge all findings into a single list

Combine findings from all agents into one flat list with columns:

- Source agent
- File path : line number
- Category
- Description
- Agent's self-assessed confidence (CONFIRMED / LIKELY)

### Step 3c: Deduplicate

- Multiple agents may flag the same line/issue (e.g., security + validation both flag missing input checks)
- Merge duplicates, keep the most detailed description
- Remove findings that match existing open GitHub issues (by file path + description similarity)
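A minimal sketch of the merge-and-keep-most-detailed rule, assuming each finding is a dict with `file`, `line`, and `description` keys (the shape is an assumption, not a defined schema):

```python
def dedupe_findings(findings: list[dict]) -> list[dict]:
    """Merge findings that hit the same file:line, keeping the longest description."""
    merged: dict[tuple[str, int], dict] = {}
    for f in findings:
        key = (f["file"], f["line"])
        # Longest description is a crude proxy for "most detailed".
        if key not in merged or len(f["description"]) > len(merged[key]["description"]):
            merged[key] = f
    return list(merged.values())
```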
---

## Phase 4: Validate Findings

**MANDATORY. Never skip this phase.**

Launch validation agents in parallel. Each validation agent gets a batch of 8-12 findings and is instructed to:

1. Read the actual source file at the reported line number
2. Verify the issue exists as described
3. Check if the "issue" is actually intentional (read comments, docstrings, related code)
4. Check if CI/build/tests handle it in a way the audit agent missed
5. Classify each finding:
   - **CONFIRMED** -- verified in code, real issue
   - **LIKELY CONFIRMED** -- code suggests the issue but edge case unclear
   - **LIKELY FALSE** -- probably not a real issue (explain why)
   - **FALSE POSITIVE** -- definitely not an issue (explain why)
6. For intentional patterns (e.g., graceful shutdown error swallowing), mark as "CONFIRMED but INTENTIONAL" -- these are excluded from work packages
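Batching for validation can be as simple as fixed-size chunks, with the size chosen inside the 8-12 range the text specifies (purely illustrative):

```python
def batch(findings: list, size: int = 10) -> list[list]:
    """Split findings into validation batches of roughly 8-12 items each."""
    return [findings[i:i + size] for i in range(0, len(findings), size)]
```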
### Validation Agent Prompt Template

```
Validate these audit findings by reading the ACTUAL SOURCE CODE.
For each finding, determine: CONFIRMED, LIKELY CONFIRMED, LIKELY FALSE, or FALSE POSITIVE.

For each:
1. Read the file at the reported line number
2. Quote the actual code
3. Check if it's intentional (read surrounding comments, docstrings)
4. Check if CI, tests, or build pipelines handle it
5. Give a clear verdict with evidence

{BATCH OF FINDINGS}
```
### After validation

- Remove all FALSE POSITIVE and LIKELY FALSE findings
- Keep CONFIRMED and LIKELY CONFIRMED
- Mark CONFIRMED-but-INTENTIONAL as excluded (note in report but don't create issues)
- Calculate the false positive rate: `removed / total`
- Report: "Validated N findings. Removed M false positives (X%). K confirmed findings remain."
---

## Phase 5: Deep Dive on Suspicious Clean Results

For each audit category that found ZERO issues in Phase 2:

1. **Research the relevant architecture first** -- read the actual implementation files to understand how the system works
2. **Craft a targeted, informed prompt** -- include specific architectural details (e.g., "the observability stack uses structlog with 8 sinks routed by logger name prefix via _SINK_ROUTING in sinks.py")
3. **Launch a second agent** with the enriched prompt and explicit instructions: "The first audit found nothing. Dig deeper. Check specific functions, look for subtle gaps, verify edge cases."
4. **Validate any new findings** (same as Phase 4)
5. Add validated findings to the main list

Skip this phase if the user passed `--quick` or if the zero-finding categories are genuinely well-covered (e.g., a dependencies audit finding nothing is believable).

---
## Phase 6: Present Validated Findings

Present the validated, deduplicated findings to the user. Format:

### Summary Table

```
| # | Finding | File:Line | Category | Verdict |
|---|---------|-----------|----------|---------|
| 1 | Description | path:123 | category | CONFIRMED |
| ... | ... | ... | ... | ... |
```
### Statistics

- Total findings: N
- False positives removed: M (X%)
- Confirmed: N1, Likely confirmed: N2
- Intentional (excluded): N3
- Categories with zero findings: list

### User Gate

Ask the user:

1. **"Proceed to group into work packages and create issues" (Recommended)**
2. "Show me the full detail for each finding first"
3. "Export as markdown report only (no issues)"

---
## Phase 7: Group into Work Packages

**Group by code proximity, NOT by severity.**

### Grouping Rules

1. **Same directory/module** -- findings touching the same `src/synthorg/<module>/` go together
2. **Same file** -- multiple findings in one file always go in the same package
3. **Dependency chain** -- if fixing A requires fixing B first, bundle them
4. **Same developer context** -- what would a developer naturally fix in one sitting?
5. **Target medium scope** -- each package should be a meaningful PR, not too small (1 finding) or too large (15+ findings)
6. **Never group by severity** -- a HIGH and a LOW in the same file go together; a HIGH and a HIGH in different modules do NOT
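As a first pass, grouping rule 1 (same directory/module) can be sketched by bucketing validated findings on a directory prefix; the finding shape and the `depth` default are assumptions for illustration:

```python
from collections import defaultdict
from pathlib import PurePosixPath

def group_by_module(findings: list[dict], depth: int = 3) -> dict[str, list[dict]]:
    """First-pass proximity grouping: bucket findings by directory prefix
    (e.g. src/synthorg/<module>), never by severity."""
    packages: dict[str, list[dict]] = defaultdict(list)
    for f in findings:
        parts = PurePosixPath(f["file"]).parent.parts[:depth]
        packages["/".join(parts) or "."].append(f)
    return dict(packages)
```

Rules 3-5 (dependency chains, developer context, package sizing) still need judgment on top of this mechanical bucketing.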
### Common Groupings

These patterns recur across audits:

- **API controller sweep** -- validation, response patterns, auth hardening (all in `api/controllers/`)
- **Observability fixes** -- sink routing, correlation, event constants (all in `observability/`)
- **Test quality** -- flaky fixes + missing coverage (all in `tests/`)
- **CI hardening** -- timeouts, permissions, script safety (all in `.github/`)
- **Documentation** -- content fixes across `docs/` and `site/`
- **Language-specific** -- Go fixes together, Vue fixes together

### Standalone features

If a finding is a "feature not yet implemented" (spec drift with a TODO/stub), it can be its own issue if it has medium+ scope. Do NOT create meta/tracking issues -- each issue must be implementable on its own.
### Present to User

Show the proposed work packages:

```
## Proposed Work Packages (N total)

### WP1: Name
| # | Finding |
|---|---------|
| 1 | ... |
| 2 | ... |
**Rationale:** Why these go together.

### WP2: Name
...
```

Ask: "Create issues for all N work packages? Or adjust groupings first?"

---
## Phase 8: Final Issue Dedup & Creation

### Step 8a: Final dedup against existing issues

Before creating, do one final check:

```bash
gh issue list --repo OWNER/REPO --state open --limit 200 --json number,title,labels
```

For each work package, search for title/description overlap with existing issues. If a finding is already covered by an existing issue, either:

- Remove it from the work package
- Note "extends #NNN" in the new issue body
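A rough title-overlap check for this final dedup might use `difflib` from the standard library; the 0.75 threshold is an illustrative choice, not a project value:

```python
from difflib import SequenceMatcher
from typing import Optional

def overlaps_existing(title: str, existing_titles: list[str], threshold: float = 0.75) -> Optional[str]:
    """Return the existing issue title that most closely matches `title`,
    or None if nothing clears the similarity threshold."""
    best, best_score = None, 0.0
    for t in existing_titles:
        score = SequenceMatcher(None, title.lower(), t.lower()).ratio()
        if score > best_score:
            best, best_score = t, score
    return best if best_score >= threshold else None
```

Description similarity would need a richer comparison; this only screens titles.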
### Step 8b: Create issues

For each work package, create a GitHub issue with:

- **Title**: `<type>: <concise description>` (matching commit convention: fix, feat, chore, docs, test)
- **Body**:
  - `## Summary` -- 1-2 sentences
  - `## Findings` -- table of findings with file:line, description
  - `## Files to Modify` -- list of files that need changes
  - Design spec references if applicable
- **Labels**: appropriate type/scope/spec labels

Use the `mcp__github__issue_write` tool or `gh issue create` via Bash.

**IMPORTANT**: Never use em-dashes or non-ASCII punctuation in issue bodies (project convention).
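If driving `gh issue create` from a script, the invocation for one work package could be assembled like this (repo and labels are placeholders; in practice the list would be passed to `subprocess.run`):

```python
def build_issue_command(repo: str, title: str, body: str, labels: list[str]) -> list[str]:
    """Assemble the `gh issue create` invocation for one work package."""
    cmd = ["gh", "issue", "create", "--repo", repo, "--title", title, "--body", body]
    for label in labels:
        cmd += ["--label", label]  # one --label flag per label
    return cmd
```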
### Step 8c: Report

Present the complete list of created issues:

```
| WP | Issue | Title |
|----|-------|-------|
| 1 | #NNN | ... |
| ... | ... | ... |
```

---
## Rules

1. **NEVER present unvalidated findings to the user.** Validation (Phase 4) is mandatory.
2. **ALWAYS research architecture before auditing.** Phase 1b is not optional.
3. **Be skeptical of clean results.** Zero findings triggers the Phase 5 deep dive.
4. **Group by code proximity, NEVER by severity.** What files does a developer touch together?
5. **No meta/tracking issues.** Every issue must be directly implementable.
6. **Dedup twice.** Once in agent prompts (Phase 2), once before issue creation (Phase 8a).
7. **All agents run in parallel.** Never launch agents sequentially when they're independent.
8. **Agent prompts include architecture context.** Never launch a "blind" agent.
9. **Intentional patterns are not bugs.** Graceful shutdown error swallowing, defensive cleanup, etc. are valid patterns -- exclude them from issues but note them in the report.
10. **Respect project conventions.** Read CLAUDE.md, use correct commands (`uv run python -m pytest`, not `uv run pytest`), no vendor names, etc.
11. **Default to creating issues.** Unless the user passes `--report-only`, the skill creates issues.
12. **Never push code.** This skill audits and creates issues -- it does not fix code.
