feat(#141): add /debug skill - structured hypothesis-driven debugging #142

Merged
atlas-apex merged 2 commits into dev from feature/GH-141-debug-skill
May 1, 2026
@atlas-apex

Collaborator

Summary

Adds the /debug skill at .claude/skills/debug/SKILL.md. Closes #141.

A methodology skill that enforces five disciplines on debug sessions:

  1. Capture the symptom precisely (exact URL, exact response, exact step)
  2. Read the architecture before guessing — map every layer the failing operation touches, file by file
  3. Form a hypothesis ladder — 3–5 candidate causes, each with an explicit evidence test that confirms or refutes it
  4. Gather evidence first, fix second
  5. Verify the fix against the original symptom evidence — re-run the same curl / browser repro from step 4. Unit tests verify code correctness, not feature correctness.
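Steps 3 and 4 above can be sketched as a small data structure. This is a minimal illustration, not the contents of SKILL.md: the `Hypothesis` class, `build_ladder`, `gather_evidence`, and the three example causes are all hypothetical names invented for this sketch, and the checks are stubbed with lambdas where a real session would run a curl or browser repro.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hypothesis:
    """One rung of the ladder: a candidate cause plus the test that settles it."""
    cause: str
    evidence_test: str                # human-readable description of the check
    check: Callable[[], bool]         # returns True when the evidence confirms the cause
    confirmed: Optional[bool] = None  # stays None until the check has been run


def build_ladder(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Reject ladders outside the 3-5 range described in step 3."""
    if not 3 <= len(hypotheses) <= 5:
        raise ValueError("a ladder needs 3-5 candidate causes")
    return hypotheses


def gather_evidence(ladder: list[Hypothesis]) -> list[Hypothesis]:
    """Step 4: run every evidence test before any fix is attempted."""
    for h in ladder:
        h.confirmed = h.check()
    return [h for h in ladder if h.confirmed]


# Hypothetical ladder for an OAuth-style failure; the checks are stubs.
ladder = build_ladder([
    Hypothesis("CDN serves a stale redirect",
               "response headers show a cached entry", lambda: False),
    Hypothesis("callback URL mismatch",
               "provider error names the redirect URI", lambda: True),
    Hypothesis("session cookie dropped",
               "Set-Cookie absent from the callback response", lambda: False),
])
confirmed = gather_evidence(ladder)
print([h.cause for h in confirmed])
```

Every rung carries its own check, so a hypothesis without an evidence test cannot be constructed at all, and no fix is chosen until the whole ladder has been run.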

Stack appendices (Web + Desktop) carry the stack-specific bits — surface-evidence requirements (step 1), architecture-surface maps (step 2), and evidence-tests cookbooks (step 4). The methodology body stays portable across stacks; appendices are where stack-specific knowledge accumulates over time.

Web appendix covers browser routing, framework configs (Next/Nuxt/Vite), SPA-fallback layers, CDN, origin, the shared API client, backend handlers, and auth providers.

Desktop appendix covers Electron / Tauri / native-shell concerns: app entry points, IPC bridges, native modules, auto-updater, sandbox / entitlements, code signing, crash reports.

Includes a "When NOT to use" section so the methodology overhead isn't imposed on simple cases (typos, off-by-ones, greenfield exploration), and a maintainer note ("don't add an appendix until you have a real session's worth of patterns to seed it") so we don't ship speculative half-finished tables.

Why this skill exists

A real OAuth debug session in a managed project saw three sequential fixes chase adjacent symptoms because each was hypothesis-then-fix without evidence in between. Each cycle costs deploys + approvals + user patience. This skill is the "never do that again" guardrail — and the methodology is generic enough that any stack benefits.

Testing

  1. Markdown lint: single file, standard markdown; the markdown-lint workflow at .github/workflows/markdownlint.yml should pass
  2. Lychee link-check — every link is either an in-repo relative path (no network roundtrip) or an external https:// link to commonly-stable resources; Link Check workflow should pass
  3. No new hooks / no new scripts — pure skill addition, no shellcheck exposure
  4. Skill discovery — Claude Code will pick up the new skill at runtime; no settings.json change required (skills are auto-discovered from .claude/skills/<name>/SKILL.md)
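The path convention in item 4 can be illustrated with a short directory scan. This is only a sketch of the `.claude/skills/<name>/SKILL.md` layout, not Claude Code's actual discovery logic, which this PR does not describe; `discover_skills` is a hypothetical helper, and the demo runs against a throwaway directory.

```python
import tempfile
from pathlib import Path


def discover_skills(repo_root: Path) -> dict[str, Path]:
    """Map each skill name to its SKILL.md under .claude/skills/<name>/."""
    skills_dir = repo_root / ".claude" / "skills"
    return {
        skill_file.parent.name: skill_file
        for skill_file in sorted(skills_dir.glob("*/SKILL.md"))
    }


# Demo against a temporary directory that mimics the layout from this PR.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    skill_path = root / ".claude" / "skills" / "debug" / "SKILL.md"
    skill_path.parent.mkdir(parents=True)
    skill_path.write_text("# /debug\n")
    found = discover_skills(root)
    found_names = sorted(found)
    print(found_names)
```

Because the lookup is driven by directory layout alone, adding a skill is a pure file addition, which matches the "no settings.json change required" claim above.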

Per the apexyard release-cut model, this targets dev. After merge, the next release tag picks this up and adopters get the skill via /update.

Closes #141

Glossary

| Term | Definition |
| --- | --- |
| Hypothesis ladder | 3–5 candidate causes ordered by probability, each with a specific evidence test that would prove or disprove it. The discipline is: if you can't write a confirm/refute test for a hypothesis, it's not a hypothesis, it's a vibe. Drop it. |
| Architecture surface | The set of files representing every layer the failing operation traverses, from user input to the persistence/external boundary. Read each one before forming hypotheses; most "tricky" bugs live in the layer the agent never opened. |
| Evidence-tests cookbook | A stack-specific table mapping common hypothesis classes (server returns wrong status, CDN cache stale, redirect loop, auth header missing, etc.) to the exact command or observation that confirms or refutes them. |
| Surface evidence | Concrete artifacts captured at the moment of failure: curl -I output, browser DevTools redirect chain, console errors, the exact URL in the address bar. The skill refuses to advance past step 1 without these. |
| Stack appendix | A self-contained section at the bottom of the skill carrying the stack-specific surface-evidence requirements + architecture-surface map + evidence-tests cookbook for one stack class (Web, Desktop, …). New appendices land only after a real session in that stack class produces a useful pattern. |
| Compounding issue | A new symptom that appears immediately after a fix lands. Not a separate bug; almost always a second-order effect of the same root cause or the fix's side effect. The skill explicitly warns against treating these as separate bugs. |
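An evidence-tests cookbook, as defined in the glossary, is essentially a lookup from hypothesis class to the observation that settles it. The sketch below assumes a plain dictionary; the entries are illustrative guesses built from standard curl flags (`-I`, `-s`, `-L`, `-v`), not the contents of the skill's Web appendix, and `WEB_COOKBOOK` and `evidence_test_for` are hypothetical names.

```python
# Illustrative cookbook: hypothesis class -> observation that settles it.
# Entries are assumptions for this sketch, not copied from the skill's appendix.
WEB_COOKBOOK: dict[str, str] = {
    "server returns wrong status":
        "curl -I <url> and read the status line",
    "redirect loop":
        "curl -sIL <url> and look for a repeating Location header",
    "CDN cache stale":
        "curl -I <url> and compare the Age and Cache-Control headers with the deploy time",
    "auth header missing":
        "curl -v <url> and inspect the outgoing request headers",
}


def evidence_test_for(hypothesis_class: str) -> str:
    """Return the recorded evidence test, or fail loudly when none exists."""
    try:
        return WEB_COOKBOOK[hypothesis_class]
    except KeyError:
        raise LookupError(
            f"no evidence test recorded for {hypothesis_class!r}; "
            "write one or drop the hypothesis"
        ) from None


print(evidence_test_for("redirect loop"))
```

Failing on an unknown class mirrors the glossary's rule for the hypothesis ladder: a candidate cause without a confirm/refute test should be dropped rather than acted on.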
