| 1 | +--- |
| 2 | +description: "Deep codebase audit: launches specialized parallel agents to find issues, validates findings, groups into work packages, and creates GitHub issues" |
| 3 | +argument-hint: "<scope: full | src/ | web/ | cli/ | docs/ | .github/> [--report-only] [--skip-issues]" |
| 4 | +allowed-tools: ["Agent", "Bash", "Read", "Glob", "Grep", "Edit", "Write", "WebFetch", "WebSearch", "AskUserQuestion", "mcp__github__issue_write", "mcp__github__issue_read", "mcp__github__list_issues", "mcp__github__search_issues"] |
| 5 | +--- |
| 6 | + |
| 7 | +# /codebase-audit -- Deep Codebase Audit |
| 8 | + |
| 9 | +Launch a swarm of specialized agents to find issues across the entire codebase (or a targeted scope), validate all findings against actual code, group into developer-friendly work packages, and optionally create GitHub issues. |
| 10 | + |
| 11 | +## Key Principles (from battle-tested sessions) |
| 12 | + |
| 13 | +1. **Never present unvalidated findings** -- validation is mandatory before ANY output to the user |
| 14 | +2. **Research architecture BEFORE auditing** -- agents that don't understand the system produce false positives |
| 15 | +3. **Skepticism is required** -- "100% clean" results are suspicious and trigger deeper investigation |
| 16 | +4. **Group by code proximity, NOT severity** -- work packages are what a developer would naturally fix together |
| 17 | +5. **No meta/tracking issues** -- every finding is a real issue or part of a real work package |
| 18 | +6. **Existing issue dedup happens TWICE** -- once in agent prompts (Phase 2), once before issue creation (Phase 8a) |
| 19 | +7. **Act on everything valid** -- every validated finding lands in a work package; no deferring, no "out of scope", no "future work" |
| 20 | + |
| 21 | +--- |
| 22 | + |
| 23 | +## Phase 0: Parse Arguments & Determine Scope |
| 24 | + |
| 25 | +Parse the user's argument to determine audit scope: |
| 26 | + |
| 27 | +| Argument | Scope | Agent Categories | |
| 28 | +|----------|-------|------------------| |
| 29 | +| `full` (default) | Entire codebase | All categories | |
| 30 | +| `src/` or `src/synthorg/` | Python backend only | Python-focused categories | |
| 31 | +| `web/` | Vue dashboard only | Frontend categories | |
| 32 | +| `cli/` | Go CLI only | Go categories | |
| 33 | +| `docs/` or `site/` | Documentation/site | Docs/content categories | |
| 34 | +| `.github/` or `ci` | CI/CD only | CI/workflow categories | |
| 35 | +| `--report-only` | Any scope | Skip issue creation, report only | |
| 36 | +| `--skip-issues` | Any scope | Same as --report-only | |
| 37 | + |
| 38 | +If no argument given, default to `full`. |
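A minimal sketch of the argument parsing described above, assuming the argument arrives as one whitespace-separated string. The names `AuditArgs`, `parse_audit_args`, and the short category labels are hypothetical, introduced only for illustration; the scope aliases and flags come from the table.

```python
from dataclasses import dataclass

# Scope aliases from the table above; the category labels on the right
# are illustrative shorthand, not names defined by the skill.
SCOPE_ALIASES = {
    "full": "full",
    "src/": "python",
    "src/synthorg/": "python",
    "web/": "frontend",
    "cli/": "go",
    "docs/": "docs",
    "site/": "docs",
    ".github/": "ci",
    "ci": "ci",
}
# Both flags mean the same thing: report findings, create no issues.
REPORT_ONLY_FLAGS = {"--report-only", "--skip-issues"}


@dataclass(frozen=True)
class AuditArgs:
    scope: str           # raw scope token, e.g. "web/"
    category: str        # which agent categories to select
    create_issues: bool  # False when a report-only flag was passed


def parse_audit_args(raw: str) -> AuditArgs:
    """Split the raw argument string into a scope and flags."""
    tokens = raw.split()
    flags = {t for t in tokens if t.startswith("--")}
    positional = [t for t in tokens if not t.startswith("--")]
    scope = positional[0] if positional else "full"  # default scope
    if scope not in SCOPE_ALIASES:
        raise ValueError(f"unknown scope: {scope!r}")
    return AuditArgs(scope, SCOPE_ALIASES[scope], not (flags & REPORT_ONLY_FLAGS))
```

For example, `parse_audit_args("web/ --report-only")` selects the frontend categories and disables issue creation, while an empty string falls back to `full`.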
| 39 | + |
| 40 | +--- |
| 41 | + |
| 42 | +## Phase 1: Gather Context |
| 43 | + |
| 44 | +**This phase is CRITICAL. Agents without context produce false positives.** |
| 45 | + |
| 46 | +### Step 1a: Fetch existing GitHub issues |
| 47 | + |
| 48 | +```bash |
| 49 | +gh issue list --repo OWNER/REPO --state open --limit 200 --json number,title,labels |
| 50 | +``` |
| 51 | + |
| 52 | +Parse into a compact reference list: `#N: title [labels]`. This list is passed to EVERY audit agent. |
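One way to build that reference list, sketched under the assumption that the JSON printed by the command above has been captured as a string. The helper name `format_issue_list` is hypothetical; each label is read from its `name` field, which is how `gh` serializes labels.

```python
import json


def format_issue_list(gh_json: str) -> list[str]:
    """Turn `gh issue list --json number,title,labels` output into
    compact `#N: title [labels]` lines."""
    lines = []
    for issue in json.loads(gh_json):
        # Each label is an object; only its name is needed here.
        labels = ", ".join(label["name"] for label in issue.get("labels", []))
        lines.append(f"#{issue['number']}: {issue['title']} [{labels}]")
    return lines
```

Joining the returned lines with newlines gives a block that can be pasted into each agent prompt.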
| 53 | + |
| 54 | +### Step 1b: Research project architecture |
| 55 | + |
| 56 | +Read key architectural files to build context that agents need. At minimum: |
| 57 | + |
| 58 | +1. **CLAUDE.md** (already in context) -- project conventions, code standards, testing rules |
| 59 | +2. **Observability stack** -- read `src/synthorg/observability/__init__.py`, `_logger.py`, `sinks.py`, `setup.py`, `correlation.py` to understand logging architecture, sink routing, correlation ID system |
| 60 | +3. **DI/wiring** -- read `src/synthorg/api/auto_wire.py` and `src/synthorg/api/lifecycle.py` to understand service initialization |
| 61 | +4. **Testing setup** -- read `conftest.py` files, `pyproject.toml` test config section |
| 62 | +5. **Design spec pointer** -- read `docs/DESIGN_SPEC.md` to know which spec pages exist |
| 63 | + |
| 64 | +Produce an **Architecture Brief** (200-400 words) summarizing: |
| 65 | +- Logging: how it works, sink routing rules, correlation IDs |
| 66 | +- DI: how services are wired, lifecycle phases |
| 67 | +- Testing: markers, parallelism, async mode, coverage requirements |
| 68 | +- Key conventions: immutability, error handling, vendor-agnostic naming |
| 69 | + |
| 70 | +This brief is injected into every agent's prompt. |
| 71 | + |
| 72 | +### Step 1c: Identify scope-specific files |
| 73 | + |
| 74 | +If scope is targeted (not `full`), glob the target directory to understand what's there. |
| 75 | + |
| 76 | +--- |
| 77 | + |
| 78 | +## Phase 2: Select & Launch Audit Agents |
| 79 | + |
| 80 | +### Agent Roster |
| 81 | + |
| 82 | +Select agents based on scope. Each agent searches for ONE type of issue only. |
| 83 | + |
| 84 | +#### Python Backend Agents (scope includes `src/`) |
| 85 | + |
| 86 | +| Agent | What It Searches For | |
| 87 | +|-------|---------------------| |
| 88 | +| `missing-logging` | Business logic modules without `get_logger`, error paths that don't log before raising, state transitions without INFO logging, missing DEBUG at decision points | |
| 89 | +| `event-constants` | Log calls using raw strings instead of event constants from `observability/events/` | |
| 90 | +| `silent-errors` | Bare `except:`, `except Exception: pass`, catch blocks that swallow without logging | |
| 91 | +| `test-coverage` | Public modules with no corresponding test file, empty test files | |
| 92 | +| `flaky-tests` | Unmocked time, real asyncio.sleep in tests, timing-dependent assertions, skipped tests | |
| 93 | +| `wiring-lifecycle` | Incorrectly wired services, missing DI, lifecycle gaps, protocol implementations incomplete | |
| 94 | +| `security-gaps` | Hardcoded secrets, missing auth guards, injection vectors, SSRF, XSS | |
| 95 | +| `dead-code` | Unreachable functions, unused imports, orphaned modules | |
| 96 | +| `todo-fixme` | Unresolved TODOs that should be tracked as issues | |
| 97 | +| `spec-drift` | Implementation diverging from design spec behavior | |
| 98 | +| `api-consistency` | REST endpoint issues: wrong status codes, missing validation, inconsistent patterns | |
| 99 | +| `async-patterns` | Bare create_task, missing await, blocking in async, race conditions | |
| 100 | +| `immutability` | Mutable defaults, in-place mutation of frozen models, missing deepcopy | |
| 101 | +| `missing-validation` | System boundary inputs without validation (API params, config loading, external data) | |
| 102 | +| `type-hints` | Missing return types, bare Any, missing NotBlankStr on identifiers | |
| 103 | +| `vendor-names` | Real vendor names used where generic names should be (per CLAUDE.md rules) | |
| 104 | +| `observability-gaps` | Sink routing gaps, correlation ID propagation drops, missing event constant modules | |
| 105 | + |
| 106 | +#### Frontend Agents (scope includes `web/`) |
| 107 | + |
| 108 | +| Agent | What It Searches For | |
| 109 | +|-------|---------------------| |
| 110 | +| `vue-dashboard` | Broken API refs, missing error handling, console.log in prod, TypeScript gaps, a11y | |
| 111 | + |
| 112 | +#### Go CLI Agents (scope includes `cli/`) |
| 113 | + |
| 114 | +| Agent | What It Searches For | |
| 115 | +|-------|---------------------| |
| 116 | +| `go-cli` | Ignored errors, resource leaks, missing error wrapping, cross-platform issues | |
| 117 | + |
| 118 | +#### Infrastructure Agents (scope includes `.github/` or `docker/`) |
| 119 | + |
| 120 | +| Agent | What It Searches For | |
| 121 | +|-------|---------------------| |
| 122 | +| `docker-infra` | Dockerfile issues, compose config, port security, healthchecks | |
| 123 | +| `ci-workflows` | Missing timeouts, script injection, permissions gaps, silent failures | |
| 124 | + |
| 125 | +#### Documentation Agents (scope includes `docs/` or `site/`) |
| 126 | + |
| 127 | +| Agent | What It Searches For | |
| 128 | +|-------|---------------------| |
| 129 | +| `docs-consistency` | Broken links, outdated info, wrong commands, inconsistent terminology | |
| 130 | +| `landing-site` | SEO gaps, broken links, a11y issues, missing error pages | |
| 131 | + |
| 132 | +#### Cross-Cutting Agents (always included) |
| 133 | + |
| 134 | +| Agent | What It Searches For | |
| 135 | +|-------|---------------------| |
| 136 | +| `dependency-issues` | Unused deps, missing deps, version conflicts across all package managers | |
| 137 | +| `docstring-gaps` | Public classes/functions missing Google-style docstrings | |
| 138 | + |
| 139 | +### Agent Prompt Template |
| 140 | + |
| 141 | +Every agent receives this structure: |
| 142 | + |
| 143 | +``` |
| 144 | +## Task |
| 145 | +You are searching the codebase for ONE specific type of issue: {ISSUE_TYPE}. |
| 146 | + |
| 147 | +## Architecture Context |
| 148 | +{ARCHITECTURE_BRIEF from Phase 1b} |
| 149 | + |
| 150 | +## Existing Open Issues (do NOT report these) |
| 151 | +{ISSUE_LIST from Phase 1a} |
| 152 | + |
| 153 | +## Scope |
| 154 | +Search: {SCOPE_DIRECTORIES} |
| 155 | + |
| 156 | +## Rules |
| 157 | +1. For each finding, report: file path, line number, what's wrong, operational impact |
| 158 | +2. Cross-reference against the existing issues list -- only report what's NOT already tracked |
| 159 | +3. BE SKEPTICAL of your own findings -- verify each one by reading the actual code |
| 160 | +4. If you find ZERO issues, state that clearly but also explain what you checked |
| 161 | +5. Do NOT flag things that are intentional design decisions (check comments, docstrings) |
| 162 | +6. Rate each finding: CONFIRMED (verified in code) or LIKELY (needs validation) |
| 163 | + |
| 164 | +## What to Search For |
| 165 | +{CATEGORY-SPECIFIC INSTRUCTIONS} |
| 166 | +``` |
| 167 | + |
| 168 | +### Launch |
| 169 | + |
| 170 | +Launch ALL selected agents in parallel using the Agent tool with `run_in_background: true`. Give each a descriptive `name` for tracking. |
| 171 | + |
| 172 | +Track agent count and report to user: "Launched N audit agents in parallel. Waiting for results..." |
| 173 | + |
| 174 | +--- |
| 175 | + |
| 176 | +## Phase 3: Collect & Deduplicate |
| 177 | + |
| 178 | +As agents complete, collect their findings. Once ALL agents have reported: |
| 179 | + |
| 180 | +### Step 3a: Check for suspicious clean results |
| 181 | + |
| 182 | +If any agent reported zero findings, flag it: |
| 183 | +- "Agent `{name}` found zero issues. This may be accurate or the agent may have been too shallow." |
| 184 | +- These categories are candidates for Phase 5 (deep dive). |
| 185 | + |
| 186 | +### Step 3b: Merge all findings into a single list |
| 187 | + |
| 188 | +Combine findings from all agents into one flat list with columns: |
| 189 | +- Source agent |
| 190 | +- File path : line number |
| 191 | +- Category |
| 192 | +- Description |
| 193 | +- Agent's self-assessed confidence (CONFIRMED / LIKELY) |
| 194 | + |
| 195 | +### Step 3c: Deduplicate |
| 196 | + |
| 197 | +- Multiple agents may flag the same line/issue (e.g., security + validation both flag missing input checks) |
| 198 | +- Merge duplicates, keep the most detailed description |
| 199 | +- Remove findings that match existing open GitHub issues (by file path + description similarity) |
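The merge step could look roughly like the sketch below. It assumes two findings are duplicates when they share a file path and line number, and it approximates "most detailed" by description length; both are simplifying assumptions, and the `Finding` type and `merge_duplicates` name are hypothetical. Matching against existing GitHub issues by description similarity is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    agent: str        # source agent, e.g. "security-gaps"
    path: str         # file path
    line: int         # line number
    category: str
    description: str
    confidence: str   # "CONFIRMED" or "LIKELY"


def merge_duplicates(findings: list[Finding]) -> list[Finding]:
    """Keep one finding per (path, line), preferring the longest description."""
    best: dict[tuple[str, int], Finding] = {}
    for finding in findings:
        key = (finding.path, finding.line)
        current = best.get(key)
        if current is None or len(finding.description) > len(current.description):
            best[key] = finding
    # dicts preserve insertion order, so output follows first-seen order
    return list(best.values())
```

Two distinct problems on the same line would collapse into one entry under this heuristic, so a real pass may also need to compare categories or descriptions before merging.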
| 200 | + |
| 201 | +--- |
| 202 | + |
| 203 | +## Phase 4: Validate Findings |
| 204 | + |
| 205 | +**MANDATORY. Never skip this phase.** |
| 206 | + |
| 207 | +Launch validation agents in parallel. Each validation agent gets a batch of 8-12 findings and is instructed to: |
| 208 | + |
| 209 | +1. Read the actual source file at the reported line number |
| 210 | +2. Verify the issue exists as described |
| 211 | +3. Check if the "issue" is actually intentional (read comments, docstrings, related code) |
| 212 | +4. Check if CI/build/tests handle it in a way the audit agent missed |
| 213 | +5. Classify each finding: |
| 214 | + - **CONFIRMED** -- verified in code, real issue |
| 215 | + - **LIKELY CONFIRMED** -- code suggests the issue but edge case unclear |
| 216 | + - **LIKELY FALSE** -- probably not a real issue (explain why) |
| 217 | + - **FALSE POSITIVE** -- definitely not an issue (explain why) |
| 218 | +6. For intentional patterns (e.g., graceful shutdown error swallowing), mark as "CONFIRMED but INTENTIONAL" -- these are excluded from work packages |
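Splitting findings into validation batches can be sketched as plain fixed-size chunking. The default of 10 is an assumption chosen from the middle of the 8-12 range, and `batch_findings` is a hypothetical helper name.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def batch_findings(findings: Sequence[T], batch_size: int = 10) -> list[list[T]]:
    """Chunk findings into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [
        list(findings[start:start + batch_size])
        for start in range(0, len(findings), batch_size)
    ]
```

The last batch may fall below 8 findings; merging it into the previous batch is one option when the combined size stays within 12.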
| 219 | + |
| 220 | +### Validation Agent Prompt Template |
| 221 | + |
| 222 | +``` |
| 223 | +Validate these audit findings by reading the ACTUAL SOURCE CODE. |
| 224 | +For each finding, determine: CONFIRMED, LIKELY CONFIRMED, LIKELY FALSE, or FALSE POSITIVE. |
| 225 | + |
| 226 | +For each: |
| 227 | +1. Read the file at the reported line number |
| 228 | +2. Quote the actual code |
| 229 | +3. Check if it's intentional (read surrounding comments, docstrings) |
| 230 | +4. Check if CI, tests, or build pipelines handle it |
| 231 | +5. Give a clear verdict with evidence |
| 232 | + |
| 233 | +{BATCH OF FINDINGS} |
| 234 | +``` |
| 235 | + |
| 236 | +### After validation |
| 237 | + |
| 238 | +- Remove all FALSE POSITIVE and LIKELY FALSE findings |
| 239 | +- Keep CONFIRMED and LIKELY CONFIRMED |
| 240 | +- Mark CONFIRMED-but-INTENTIONAL as excluded (note in report but don't create issues) |
| 241 | +- Calculate false positive rate: `removed / total` |
| 242 | +- Report: "Validated N findings. Removed M false positives (X%). K remaining confirmed findings." |
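The bookkeeping above can be sketched as follows, assuming each validated finding carries one verdict string from Phase 4. `summarize_validation` is a hypothetical helper. Verdicts outside the two sets, such as the intentional-pattern marker, count toward the total but not toward either bucket.

```python
from collections import Counter

REMOVED = {"FALSE POSITIVE", "LIKELY FALSE"}
KEPT = {"CONFIRMED", "LIKELY CONFIRMED"}


def summarize_validation(verdicts: list[str]) -> str:
    """Compute the false positive rate (removed / total) and format the report line."""
    counts = Counter(verdicts)
    total = len(verdicts)
    removed = sum(counts[v] for v in REMOVED)
    remaining = sum(counts[v] for v in KEPT)
    rate = removed / total * 100 if total else 0.0  # avoid dividing by zero
    return (
        f"Validated {total} findings. "
        f"Removed {removed} false positives ({rate:.0f}%). "
        f"{remaining} remaining confirmed findings."
    )
```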
| 243 | + |
| 244 | +--- |
| 245 | + |
| 246 | +## Phase 5: Deep Dive on Suspicious Clean Results |
| 247 | + |
| 248 | +For each audit category whose Phase 2 agent reported ZERO issues (flagged in Step 3a): |
| 249 | + |
| 250 | +1. **Research the relevant architecture first** -- read the actual implementation files to understand how the system works |
| 251 | +2. **Craft a targeted, informed prompt** -- include specific architectural details (e.g., "the observability stack uses structlog with 8 sinks routed by logger name prefix via _SINK_ROUTING in sinks.py") |
| 252 | +3. **Launch a second agent** with the enriched prompt and explicit instructions: "The first audit found nothing. Dig deeper. Check specific functions, look for subtle gaps, verify edge cases." |
| 253 | +4. **Validate any new findings** (same as Phase 4) |
| 254 | +5. Add validated findings to the main list |
| 255 | + |
| 256 | +Skip this phase if the user passed `--quick` or if the zero-finding categories are genuinely well-covered (e.g., dependencies audit finding nothing is believable). |
| 257 | + |
| 258 | +--- |
| 259 | + |
| 260 | +## Phase 6: Present Validated Findings |
| 261 | + |
| 262 | +Present the validated, deduplicated findings to the user. Format: |
| 263 | + |
| 264 | +### Summary Table |
| 265 | + |
| 266 | +``` |
| 267 | +| # | Finding | File:Line | Category | Verdict | |
| 268 | +|---|---------|-----------|----------|---------| |
| 269 | +| 1 | Description | path:123 | category | CONFIRMED | |
| 270 | +| ... | ... | ... | ... | ... | |
| 271 | +``` |
| 272 | + |
| 273 | +### Statistics |
| 274 | + |
| 275 | +- Total findings: N |
| 276 | +- False positives removed: M (X%) |
| 277 | +- Confirmed: N1, Likely confirmed: N2 |
| 278 | +- Intentional (excluded): N3 |
| 279 | +- Categories with zero findings: list |
| 280 | + |
| 281 | +### User Gate |
| 282 | + |
| 283 | +Ask the user: |
| 284 | +1. **"Proceed to group into work packages and create issues" (Recommended)** |
| 285 | +2. "Show me the full detail for each finding first" |
| 286 | +3. "Export as markdown report only (no issues)" |
| 287 | + |
| 288 | +--- |
| 289 | + |
| 290 | +## Phase 7: Group into Work Packages |
| 291 | + |
| 292 | +**Group by code proximity, NOT by severity.** |
| 293 | + |
| 294 | +### Grouping Rules |
| 295 | + |
| 296 | +1. **Same directory/module** -- findings touching the same `src/synthorg/<module>/` go together |
| 297 | +2. **Same file** -- multiple findings in one file always go in the same package |
| 298 | +3. **Dependency chain** -- if fixing A requires fixing B first, bundle them |
| 299 | +4. **Same developer context** -- what would a developer naturally fix in one sitting? |
| 300 | +5. **Target medium scope** -- each package should be a meaningful PR, not too small (1 finding) or too large (15+ findings) |
| 301 | +6. **Never group by severity** -- a HIGH and LOW in the same file go together; a HIGH and HIGH in different modules do NOT |
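As a first pass, rules 1 and 2 can be approximated by bucketing findings on a directory prefix. This is only a sketch: the depth of 3 is an assumption that happens to match `src/synthorg/<module>/`, the helper names are hypothetical, and rules 3-5 (dependency chains, developer context, package size) still need judgment on top.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def package_key(path: str, depth: int = 3) -> str:
    """Return the leading directories of a path, at most `depth` levels."""
    directories = PurePosixPath(path).parts[:-1]  # drop the file name
    return "/".join(directories[:depth]) or "."


def group_by_proximity(paths: list[str]) -> dict[str, list[str]]:
    """Bucket file paths by shared directory prefix; severity plays no role."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        groups[package_key(path)].append(path)
    return dict(groups)
```

Findings in the same file always share a key, so rule 2 holds by construction; shallow trees such as `.github/` may need a smaller depth to end up in a single package.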
| 302 | + |
| 303 | +### Common Groupings |
| 304 | + |
| 305 | +These patterns recur across audits: |
| 306 | +- **API controller sweep** -- validation, response patterns, auth hardening (all in `api/controllers/`) |
| 307 | +- **Observability fixes** -- sink routing, correlation, event constants (all in `observability/`) |
| 308 | +- **Test quality** -- flaky fixes + missing coverage (all in `tests/`) |
| 309 | +- **CI hardening** -- timeouts, permissions, script safety (all in `.github/`) |
| 310 | +- **Documentation** -- content fixes across `docs/` and `site/` |
| 311 | +- **Language-specific** -- Go fixes together, Vue fixes together |
| 312 | + |
| 313 | +### Standalone features |
| 314 | + |
| 315 | +If a finding is a "feature not yet implemented" (spec drift with TODO/stub), it can be its own issue if medium+ scope. Do NOT create meta/tracking issues -- each issue must be implementable on its own. |
| 316 | + |
| 317 | +### Present to User |
| 318 | + |
| 319 | +Show the proposed work packages: |
| 320 | + |
| 321 | +``` |
| 322 | +## Proposed Work Packages (N total) |
| 323 | + |
| 324 | +### WP1: Name |
| 325 | +| # | Finding | |
| 326 | +|---|---------| |
| 327 | +| 1 | ... | |
| 328 | +| 2 | ... | |
| 329 | +**Rationale:** Why these go together. |
| 330 | + |
| 331 | +### WP2: Name |
| 332 | +... |
| 333 | +``` |
| 334 | + |
| 335 | +Ask: "Create issues for all N work packages? Or adjust groupings first?" |
| 336 | + |
| 337 | +--- |
| 338 | + |
| 339 | +## Phase 8: Final Issue Dedup & Creation |
| 340 | + |
| 341 | +### Step 8a: Final dedup against existing issues |
| 342 | + |
| 343 | +Before creating, do one final check: |
| 344 | + |
| 345 | +```bash |
| 346 | +gh issue list --repo OWNER/REPO --state open --limit 200 --json number,title,labels |
| 347 | +``` |
| 348 | + |
| 349 | +For each work package, search for title/description overlap with existing issues. If a finding is already covered by an existing issue, either: |
| 350 | +- Remove it from the work package |
| 351 | +- Note "extends #NNN" in the new issue body |
| 352 | + |
| 353 | +### Step 8b: Create issues |
| 354 | + |
| 355 | +For each work package, create a GitHub issue with: |
| 356 | + |
| 357 | +- **Title**: `<type>: <concise description>` (matching commit convention: fix, feat, chore, docs, test) |
| 358 | +- **Body**: |
| 359 | + - `## Summary` -- 1-2 sentences |
| 360 | + - `## Findings` -- table of findings with file:line, description |
| 361 | + - `## Files to Modify` -- list of files that need changes |
| 362 | + - Design spec references if applicable |
| 363 | +- **Labels**: appropriate type/scope/spec labels |
| 364 | + |
| 365 | +Use the `mcp__github__issue_write` tool or `gh issue create` via Bash. |
| 366 | + |
| 367 | +**IMPORTANT**: Never use em-dashes or non-ASCII punctuation in issue bodies (project convention). |
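A sketch of assembling the body sections listed above and enforcing the ASCII rule before submission. `build_issue_body` and its parameters are hypothetical; the section headings come from the list in this step.

```python
def build_issue_body(summary: str, findings: list[tuple[str, str]], files: list[str]) -> str:
    """Assemble the issue body and reject any non-ASCII character."""
    rows = "\n".join(f"| {location} | {description} |" for location, description in findings)
    file_list = "\n".join(f"- {name}" for name in files)
    body = (
        f"## Summary\n{summary}\n\n"
        "## Findings\n"
        "| File:Line | Description |\n"
        "|-----------|-------------|\n"
        f"{rows}\n\n"
        f"## Files to Modify\n{file_list}\n"
    )
    if not body.isascii():
        # Report the offending characters so they can be replaced by hand.
        offenders = sorted({ch for ch in body if not ch.isascii()})
        raise ValueError(f"non-ASCII characters in issue body: {offenders}")
    return body
```

Failing loudly keeps the decision about how to replace a character with the author, rather than rewriting punctuation silently.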
| 368 | + |
| 369 | +### Step 8c: Report |
| 370 | + |
| 371 | +Present the complete list of created issues: |
| 372 | + |
| 373 | +``` |
| 374 | +| WP | Issue | Title | |
| 375 | +|----|-------|-------| |
| 376 | +| 1 | #NNN | ... | |
| 377 | +| ... | ... | ... | |
| 378 | +``` |
| 379 | + |
| 380 | +--- |
| 381 | + |
| 382 | +## Rules |
| 383 | + |
| 384 | +1. **NEVER present unvalidated findings to the user.** Validation (Phase 4) is mandatory. |
| 385 | +2. **ALWAYS research architecture before auditing.** Phase 1b is not optional. |
| 386 | +3. **Be skeptical of clean results.** Zero findings triggers Phase 5 deep dive. |
| 387 | +4. **Group by code proximity, NEVER by severity.** What files does a developer touch together? |
| 388 | +5. **No meta/tracking issues.** Every issue must be directly implementable. |
| 389 | +6. **Dedup twice.** Once in agent prompts (Phase 2), once before issue creation (Phase 8a). |
| 390 | +7. **All agents run in parallel.** Never launch agents sequentially when they're independent. |
| 391 | +8. **Agent prompts include architecture context.** Never launch a "blind" agent. |
| 392 | +9. **Intentional patterns are not bugs.** Graceful shutdown error swallowing, defensive cleanup, etc. are valid patterns -- exclude from issues but note in report. |
| 393 | +10. **Respect project conventions.** Read CLAUDE.md, use correct commands (`uv run python -m pytest`, not `uv run pytest`), no vendor names, etc. |
| 394 | +11. **Default to creating issues.** Unless the user passes `--report-only` or `--skip-issues`, the skill creates issues. |
| 395 | +12. **Never push code.** This skill audits and creates issues -- it does not fix code. |