feat(commands): add 6 lifecycle slash commands #1611
Add Osmani-pattern lifecycle commands for Claude Code:

- /spec: problem to requirements with CVA analysis
- /plan: specs to milestones with risk coverage
- /build: TDD implementation with atomic commits
- /test: layered testing with hypothesis debugging
- /review: 5-axis review (architecture, security, quality, tests, standards)
- /ship: pre-flight validation delegating to /push-pr

Remove unused workflow/ commands (0-init through 9-sync). Update AGENTS.md and CLAUDE.md routing. Exclude .claude/commands/ from markdownlint MD041.

Closes #1609

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR Validation Report
Note ✅ Status: PASS
Description Validation
PR Standards
QA Validation
⚡ Warnings
Powered by PR Validation workflow
Code Review
This pull request replaces the legacy numbered workflow commands with a new lifecycle-based command set including /spec, /plan, /build, /test, /review, and /ship, while updating documentation and linting configurations. A critical issue was identified regarding the removal of the legacy commands, which will break existing PowerShell tests that hardcode references to the old command names.
I am having trouble creating individual review comments. Click here to see my feedback.
.claude/commands/workflow/0-init.md (1-48)
The removal of the legacy workflow/ commands will break the existing test suite. Multiple test files, including tests/Invoke-WorkflowCommand.Tests.ps1 (line 40) and tests/WorkflowHelpers.Tests.ps1 (line 52), contain hardcoded references to these specific command names (e.g., 0-init, 1-plan, 4-security). To keep the CI/CD pipeline healthy, these tests must either be updated to validate the new lifecycle commands (/spec, /plan, etc.) or be removed if the underlying workflow logic is being retired.
References
- When analyzing a diff for breaking changes, consider the full context of the file, not just the changed lines, to ensure that removals are not compensated for elsewhere.
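The breaking-change concern above could be checked mechanically before merging. The following is a minimal sketch in Python, not part of this PR. It assumes the six legacy names taken from the removed files, and `find_legacy_references` is a hypothetical helper that receives file contents as a dict so it does not depend on the real repository layout. The sample path in the usage block is illustrative only.

```python
import re

# Legacy numbered command names, taken from the files this PR removes.
LEGACY_COMMANDS = ["0-init", "1-plan", "2-impl", "3-qa", "4-security", "9-sync"]

# Match a legacy name only when it is not embedded in a longer token
# (for example "10-init" or "1-planner" must not match).
_LEGACY_RE = re.compile(
    r"(?<![\w-])(" + "|".join(re.escape(n) for n in LEGACY_COMMANDS) + r")(?![\w-])"
)


def find_legacy_references(files):
    """Return (path, line_number, command_name) for each legacy reference.

    `files` maps a path to its text content (hypothetical input shape).
    """
    hits = []
    for path, text in sorted(files.items()):
        for line_number, line in enumerate(text.splitlines(), start=1):
            for match in _LEGACY_RE.finditer(line):
                hits.append((path, line_number, match.group(1)))
    return hits


if __name__ == "__main__":
    # Illustrative content only; not copied from the real test files.
    sample = {
        "tests/Example.Tests.ps1": "It 'runs 0-init' {\n  Invoke-Thing '4-security'\n}\n",
        "docs/notes.md": "Use /spec and /plan instead.\n",
    }
    for hit in find_legacy_references(sample):
        print(hit)
```

Running a scan like this over the tests/ directory would list every hardcoded legacy name that still needs to be updated or removed.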
AI Quality Gate Review
Warning
Walkthrough
This PR was reviewed by six AI agents in parallel, analyzing different aspects of the changes:
Review Summary
💡 Quick Access: Click on individual agent jobs (e.g., "🔒 security Review", "🧪 qa Review") in the workflow run to see detailed findings and step summaries.
Security Review Details
Now let me check if any old workflow commands are being removed and examine the changes to AGENTS.md and CLAUDE.md:
Let me check if the workflow commands were removed as stated in the PR description:
The workflow directory no longer exists as expected. Let me verify the model alias used in the prompt-engineer skill (the memory mentioned model aliases must be claude-{opus|sonnet|haiku}-4-5):
The prompt-engineer SKILL.md has
Now let me complete my security analysis. I've reviewed all the files in this PR:
Security Review Summary
PR Type Classification: PROMPT/DOCS/CONFIG
Changed files:
Findings
Analysis
Prompt Injection Surface Review:
Secret Detection:
Configuration Security:
Workflow Skill Deprecation:
Recommendations
Verdict
{
"verdict": "PASS",
"message": "No security vulnerabilities found. PR adds lifecycle commands with proper agent delegation. One minor non-security frontmatter issue noted.",
"agent": "security",
"timestamp": "2026-04-11T06:41:00.000Z",
"findings": [
{
"severity": "low",
"category": "misconfiguration",
"description": "Invalid model alias claude-sonnet-4-6 not in VALID_MODEL_ALIASES (claude-{opus|sonnet|haiku}-4-5 or dated snapshots)",
"location": ".claude/skills/prompt-engineer/SKILL.md:7",
"cwe": "N/A",
"recommendation": "Change model: claude-sonnet-4-6 to model: claude-sonnet-4-5"
}
]
}
QA Review Details
Now I have enough context to perform the QA review. Let me analyze the PR systematically.
QA Review: PR #1611
PR Type Classification
Classification Rationale: All changed files are markdown (
Test Coverage Assessment
N/A - DOCS only PR
Per PR Type Detection rules: "Files without executable logic do not require tests." The 6 new lifecycle command files are Claude Code slash command definitions (markdown with YAML frontmatter). They contain no executable code paths requiring unit tests. The PR description confirms validation approach:
Quality Concerns
Investigation of model alias concern: The repository memory states "Skill frontmatter validator only accepts model aliases claude-{opus|sonnet|haiku}-4-5". However, this is a DOCS-only PR and the PR passed CI (
Regression Risk Assessment
Evidence
Validation Summary
{
"verdict": "PASS",
"message": "DOCS-only PR replacing workflow commands with lifecycle commands; no executable code requires tests",
"agent": "qa",
"timestamp": "2026-04-11T06:42:00Z",
"findings": [
{
"severity": "low",
"category": "code-quality",
"description": "Model alias claude-sonnet-4-6 in prompt-engineer SKILL.md may not match validator expectations",
"location": ".claude/skills/prompt-engineer/SKILL.md:7",
"recommendation": "Verify skill frontmatter validator accepts 4-6 aliases or revert to claude-sonnet-4-5"
}
]
}
Analyst Review Details
Let me search for the correct repository and examine the local files directly. Now let me check if the old workflow commands were removed: Good, the workflow directory was removed. Let me verify the documentation alignment:
PR #1611 Analysis: Lifecycle Slash Commands
Code Quality Score
Overall: 4.75/5
Impact Assessment
Findings
Recommendations
Verdict
Detailed Assessment
Strengths:
Architecture:
{
"verdict": "WARN",
"message": "Clean lifecycle command refactoring; verify model alias claude-sonnet-4-6 validity",
"agent": "analyst",
"timestamp": "2026-04-11T06:41:00Z",
"findings": [
{
"severity": "low",
"category": "consistency",
"description": "prompt-engineer/SKILL.md uses model claude-sonnet-4-6 but repository memory indicates only 4-5 aliases are validated",
"location": ".claude/skills/prompt-engineer/SKILL.md:7",
"recommendation": "Verify claude-sonnet-4-6 is added to scripts/validation/skill_frontmatter.py:35-53 accepted aliases or revert to claude-sonnet-4-5"
}
]
}
Architect Review Details
Let me check the deleted workflow commands to understand what's being removed:
Design Quality Assessment
Overall Design Score: 4.8/5
Architectural Concerns
Breaking Change Assessment
Technical Debt Analysis
ADR Assessment
Recommendations
Verdict
The PR introduces 6 well-structured lifecycle commands that follow established patterns (ADR-030 skills superiority, ADR-006 thin workflows). Each command has:
The deprecation of numbered workflow commands is properly communicated. Markdownlint exclusions for MD041 (first line heading) are justified because command files use frontmatter.
{
"verdict": "PASS",
"message": "Well-designed lifecycle command system following ADR-030 skill patterns with clean separation of concerns",
"agent": "architect",
"timestamp": "2026-04-11T06:42:00Z",
"findings": [
{
"severity": "low",
"category": "tech-debt",
"description": "Model alias claude-sonnet-4-6 not in validated alias list per repository memory",
"location": ".claude/skills/prompt-engineer/SKILL.md:7",
"recommendation": "Change to claude-sonnet-4-5 to match skill_frontmatter.py validated aliases"
}
]
}
DevOps Review Details
Now I have enough context to perform the DevOps review. Let me check if the workflow directory still exists or if the old files were properly removed: Now I have enough information to complete the DevOps review.
DevOps Review: PR #1611
PR Scope Detection
PR TYPE: DOCS/CONFIG/TEMPLATE hybrid. Running gates: Schema validation, Template review. This PR does not modify:
Pipeline Impact Assessment
CI/CD Quality Checks
Template Assessment
New Lifecycle Commands: Adequate
All 6 new commands (
Removed Commands: workflow/0-init through workflow/9-sync removed as expected per PR description.
Workflow SKILL.md: Properly marked DEPRECATED with clear migration guidance.
Findings
Configuration Review
.markdownlint-cli2.yaml changes:
Automation Opportunities
The lifecycle commands already leverage skills and agents appropriately. No redundant patterns detected.
Recommendations
Verdict
{
"verdict": "WARN",
"message": "Model alias claude-sonnet-4-6 in prompt-engineer/SKILL.md is invalid per skill frontmatter validator; use claude-sonnet-4-5",
"agent": "devops",
"timestamp": "2026-04-11T06:41:30.000Z",
"findings": [
{
"severity": "low",
"category": "templates",
"description": "Skill frontmatter uses model claude-sonnet-4-6 which is not in the validator allowed list",
"location": ".claude/skills/prompt-engineer/SKILL.md:7",
"recommendation": "Change model to claude-sonnet-4-5 per scripts/validation/skill_frontmatter.py allowed model list"
}
]
}
Roadmap Review Details
Now let me check the issue referenced and see the workflow files that were removed: Now I have enough context to provide my strategic alignment assessment.
Strategic Alignment Assessment
Feature Completeness
Impact Analysis
KANO Classification
Performance Feature: The more intuitive the command names, the more satisfied users become. Users expected to learn numbered commands before; now they get semantic names that match their mental model.
Concerns
Positive Observations
Recommendations
Verdict
{
"verdict": "PASS",
"message": "Lifecycle commands deliver measurable user value by replacing numbered workflow commands with semantic names that match developer mental models",
"agent": "roadmap",
"timestamp": "2026-04-11T06:41:00Z",
"findings": [
{
"severity": "low",
"category": "documentation",
"description": "No explicit migration path for users of deprecated /0-init through /4-security commands",
"location": ".claude/commands/",
"recommendation": "Consider adding migration table to CHANGELOG for existing users"
}
]
}
Run Details
Powered by AI Quality Gate workflow |
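Most of the agent findings above hinge on one question: whether `claude-sonnet-4-6` passes the skill frontmatter validator. The real check is said to live in scripts/validation/skill_frontmatter.py and is not shown in this PR, so the sketch below only illustrates the rule as the reviews describe it (claude-{opus|sonnet|haiku}-4-5 or dated snapshots). The regex, the function name `is_valid_model_alias`, and the `-YYYYMMDD` snapshot suffix are all assumptions.

```python
import re

# Assumed shape of the accepted aliases, based only on the review text:
# claude-{opus|sonnet|haiku}-4-5, optionally followed by a dated snapshot.
# The "-YYYYMMDD" suffix format is an assumption, not taken from the repository.
_ALIAS_RE = re.compile(r"claude-(opus|sonnet|haiku)-4-5(-\d{8})?")


def is_valid_model_alias(alias):
    """Return True if `alias` matches the assumed allowed pattern."""
    return _ALIAS_RE.fullmatch(alias) is not None


if __name__ == "__main__":
    print(is_valid_model_alias("claude-sonnet-4-5"))  # True
    print(is_valid_model_alias("claude-sonnet-4-6"))  # False
```

Under this assumed rule the alias flagged at .claude/skills/prompt-engineer/SKILL.md:7 would be rejected, which matches the WARN verdicts. Whether the actual validator behaves this way has to be confirmed against the script itself.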
Pull request overview
Adds a new “lifecycle” set of Osmani-style slash commands under .claude/commands/ and updates top-level routing docs/config to prefer them over the older numbered workflow commands.
Changes:
- Added 6 lifecycle slash commands: /spec, /plan, /build, /test, /review, /ship.
- Removed the legacy numbered workflow command markdown files (.claude/commands/workflow/*).
- Updated routing documentation (CLAUDE.md, AGENTS.md) and adjusted markdownlint config for command files.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| CLAUDE.md | Updates routing rules to point at the new lifecycle commands. |
| AGENTS.md | Replaces “workflow” command references with lifecycle command list. |
| .markdownlint-cli2.yaml | Excludes .claude/commands/** from markdownlint (intended to address MD041). |
| .claude/commands/spec.md | New /spec command for requirements + acceptance criteria generation. |
| .claude/commands/plan.md | New /plan command for milestone/task decomposition and risk planning. |
| .claude/commands/build.md | New /build command for TDD implementation + atomic commits workflow. |
| .claude/commands/test.md | New /test command for layered testing + debugging guidance. |
| .claude/commands/review.md | New /review command for 5-axis review orchestration. |
| .claude/commands/ship.md | New /ship command for pre-flight checks + PR creation. |
| .claude/commands/workflow/0-init.md | Removed legacy numbered workflow command. |
| .claude/commands/workflow/1-plan.md | Removed legacy numbered workflow command. |
| .claude/commands/workflow/2-impl.md | Removed legacy numbered workflow command. |
| .claude/commands/workflow/3-qa.md | Removed legacy numbered workflow command. |
| .claude/commands/workflow/4-security.md | Removed legacy numbered workflow command. |
| .claude/commands/workflow/9-sync.md | Removed legacy numbered workflow command. |
Note
Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the
Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough
Replaces numbered workflow with six lifecycle slash commands (/spec, /plan, /build, /test, /review, /ship); adds new command docs, deletes legacy workflow files, updates AGENTS.md and CLAUDE.md routing, updates markdownlint ignores, and bumps one skill model metadata. No code/API changes.
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI as "Lifecycle Command (/spec/.../ship)"
participant Agent as "Task(subagent)"
participant Skill as "Skill(s)"
participant GitCI as "Git / CI"
User->>CLI: invoke /spec or /plan or /build or /test or /review or /ship
CLI->>Agent: Task(subagent_type="spec-generator"/"planner"/"implementer"/"qa"/"architect"/"devops")
Agent->>Skill: call Skill(...) (cva-analysis, milestone-planner, code-qualities-assessment, quality-grades, pipeline-validator, etc.)
Agent->>GitCI: read refs / create PRs / run pipeline checks
Skill-->>Agent: analysis / plan / verdicts
Agent-->>CLI: structured output (milestones, tasks, tests, findings, PR)
CLI-->>User: report (structured artifacts, PASS/FAIL, PR link)
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 4
🧹 Nitpick comments (2)
.claude/commands/plan.md (2)
1-5: Add `model: opus` to frontmatter.
Planning with 5-axis evaluation, pre-mortem, and risk analysis is complex reasoning. Specify opus model in frontmatter.
📝 Proposed fix
 ---
 description: Plan how to build it. Decompose specs into milestones with dependencies and risk mitigations. Run after /spec.
 allowed-tools: Task, Skill, Read, Glob, Grep
 argument-hint: [spec-output-or-issue-number]
+model: opus
 ---
As per coding guidelines: "Slash command frontmatter must specify the `model` field with appropriate Claude model selection based on task complexity (haiku for simple, sonnet for standard, opus for complex reasoning)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/plan.md around lines 1 - 5, The frontmatter at the top of the command definition is missing the required model selection; update the YAML frontmatter block (the leading --- section) to include the key "model: opus" so the command explicitly uses the opus Claude model for complex reasoning tasks; ensure the new "model: opus" line is added alongside the existing keys (description, allowed-tools, argument-hint) in the frontmatter.
7-36: Consider adding `ultrathink` keyword for deep planning analysis.
Planning with 5-axis evaluation, pre-mortem, dependency ordering, and risk analysis benefits from extended thinking. Add the `ultrathink` keyword to activate up to 31,999 tokens for deep reasoning.
💡 Example placement
 Invoke the milestone-planner and task-decomposer agents.
+ultrathink
+
 Plan how to build: $ARGUMENTS
As per coding guidelines: "Slash command files with complex reasoning tasks (e.g., architectural design, multi-step debugging, trade-off analysis, edge case analysis) SHOULD include the `ultrathink` keyword in the prompt text to activate extended thinking mode."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/plan.md around lines 7 - 36, Add the ultrathink keyword to the command prompt so the planner uses extended reasoning; specifically update the prompt text that begins "Invoke the planner and execution-plans skills." / "Plan how to build: $ARGUMENTS" to include the literal token ultrathink (e.g., prefix or append the invocation line with ultrathink) so the planner/subagent receives the extended-thinking mode for deep planning analysis across the five axes and pre-mortem steps.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/commands/plan.md:
- Around line 7-11: Update the wording on the first line to use the same agent
naming used later: replace "planner and execution-plans skills" with a phrase
that references the agent subagent_type names (e.g., "milestone-planner and
task-decomposer agents") or explicitly clarify the mapping, so the invocation
lines that call Task(subagent_type="milestone-planner") and
Task(subagent_type="task-decomposer") match the descriptive text.
In @.claude/commands/review.md:
- Line 4: The frontmatter key "argument-hint" is too broad (currently
"argument-hint: [branch-or-pr-number]") while the implementation only handles
branch diffs via "git diff main...HEAD"; update the contract to match the code
by changing the argument-hint to only accept branch identifiers (e.g.,
"argument-hint: [branch]") or alternatively implement PR resolution logic that
maps a PR number to a branch/commit before running "git diff main...HEAD";
ensure references to "argument-hint", the prompt body, and any validation logic
enforce the new narrower contract so the prompt and implementation remain
consistent.
In @.markdownlint-cli2.yaml:
- Around line 141-142: Remove the blanket exclusion "- \".claude/commands/**\""
and instead add a targeted rule override that disables only MD041 for files
under the ".claude/commands/**" pattern (or more specific pattern like
".claude/commands/**/*.md") so other markdownlint rules (e.g., MD040) still run;
update the adjacent comment to accurately state that only MD041 is being
suppressed for slash command files with YAML frontmatter. Ensure the override
references the rule ID MD041 and the exact pattern ".claude/commands/**" (or a
more specific glob) so the change is narrowly scoped and self-describing.
In `@CLAUDE.md`:
- Around line 42-48: Clarify that the listed lifecycle actions (the bullets that
say "invoke spec", "invoke plan", "invoke build", "invoke test", "invoke
review", "invoke ship", "invoke analyze") are skill/tool invocations rather than
generic command routing: update the wording to explicitly distinguish "command
routing" (agent-level or CLI commands) from "skill routing" (calls into the
Skill tool), e.g., change each "invoke X" to "use Skill:X" or "call Skill X" and
add a short prefix sentence like "Use the Skill tool for the following lifecycle
actions; do not treat these as plain command routing." Ensure the change touches
the bullet list entries and the surrounding explanatory sentence so readers
cannot confuse invoking a skill with issuing a command.
---
Nitpick comments:
In @.claude/commands/plan.md:
- Around line 1-5: The frontmatter at the top of the command definition is
missing the required model selection; update the YAML frontmatter block (the
leading --- section) to include the key "model: opus" so the command explicitly
uses the opus Claude model for complex reasoning tasks; ensure the new "model:
opus" line is added alongside the existing keys (description, allowed-tools,
argument-hint) in the frontmatter.
- Around line 7-36: Add the ultrathink keyword to the command prompt so the
planner uses extended reasoning; specifically update the prompt text that begins
"Invoke the planner and execution-plans skills." / "Plan how to build:
$ARGUMENTS" to include the literal token ultrathink (e.g., prefix or append the
invocation line with ultrathink) so the planner/subagent receives the
extended-thinking mode for deep planning analysis across the five axes and
pre-mortem steps.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 080b12e8-428b-4f87-b38f-98e8eaabe6ae
📒 Files selected for processing (15)
- .claude/commands/build.md
- .claude/commands/plan.md
- .claude/commands/review.md
- .claude/commands/ship.md
- .claude/commands/spec.md
- .claude/commands/test.md
- .claude/commands/workflow/0-init.md
- .claude/commands/workflow/1-plan.md
- .claude/commands/workflow/2-impl.md
- .claude/commands/workflow/3-qa.md
- .claude/commands/workflow/4-security.md
- .claude/commands/workflow/9-sync.md
- .markdownlint-cli2.yaml
- AGENTS.md
- CLAUDE.md
💤 Files with no reviewable changes (6)
- .claude/commands/workflow/9-sync.md
- .claude/commands/workflow/4-security.md
- .claude/commands/workflow/2-impl.md
- .claude/commands/workflow/1-plan.md
- .claude/commands/workflow/3-qa.md
- .claude/commands/workflow/0-init.md
- Fix CLAUDE.md skill routing to reference actual skills (analyze, github, pr-comment-responder, reflect, session-end, quality-grades) and separate lifecycle commands (/spec, /plan, /build, /test, /review, /ship) into their own section
- Add @CLAUDE.md import to spec.md, plan.md, test.md, and ship.md for consistent project context loading
- Add Write and Edit to test.md allowed-tools to enable writing tests and fixing code as documented in steps 5-6
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/commands/plan.md:
- Around line 1-5: Update the slash-command frontmatter to include the required
"model" key and revise the description field to use trigger phrasing;
specifically add a "model" entry alongside existing keys (allowed-tools,
argument-hint) and change the description line (the "description" frontmatter
entry) from declarative wording to start with "Use when ..." so it validates as
a slash-command (adjust the text "Plan how to build it..." to a trigger form).
In @.claude/commands/test.md:
- Line 2: Update the frontmatter description in .claude/commands/test.md to
follow the trigger-based pattern: replace the current generic "description" text
with a sentence that begins with "Use when..." (or an explicit trigger like "Run
when...") that states the conditions under which the command should be invoked
(e.g., "Use when you want to run layered tests after /build to prove changes and
debug failures"). Edit the description field in the file (the frontmatter key
"description") to this trigger-style phrasing so it clearly specifies WHEN to
run the command.
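The two findings above (a missing `model` key and a description that does not use trigger phrasing) are easy to check automatically. The following is a minimal sketch, assuming flat `key: value` frontmatter as used by the command files in this PR. `parse_frontmatter` and `check_command_frontmatter` are hypothetical helpers, not the repository's validator, and the parser is deliberately not a full YAML implementation.

```python
def parse_frontmatter(text):
    """Parse simple `key: value` frontmatter delimited by '---' lines.

    Handles only flat string values; this is not a general YAML parser.
    Returns an empty dict when the frontmatter is missing or unterminated.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter found


def check_command_frontmatter(text):
    """Return a list of problems corresponding to the two review findings."""
    fields = parse_frontmatter(text)
    problems = []
    if "model" not in fields:
        problems.append("missing 'model' field")
    if not fields.get("description", "").startswith("Use when"):
        problems.append("description does not start with 'Use when'")
    return problems
```

Applied to the plan.md frontmatter shown in this PR, the check reports both problems. A file with a `model` entry and a description beginning "Use when" returns an empty list.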
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 312701be-4d35-423f-b28c-10749511b7ab
📒 Files selected for processing (5)
- .claude/commands/plan.md
- .claude/commands/ship.md
- .claude/commands/spec.md
- .claude/commands/test.md
- CLAUDE.md
✅ Files skipped from review due to trivial changes (2)
- .claude/commands/spec.md
- .claude/commands/ship.md
🚧 Files skipped from review as they are similar to previous changes (1)
- CLAUDE.md
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Overly broad markdownlint ignore disables all rules for commands
  - Replaced the broad `.claude/commands/**` ignore with explicit list of only the 6 new lifecycle command files, preserving full linting for pre-existing command files.
- ✅ Fixed: Ship command invokes commands as if they were skills
  - Changed instructions to correctly reference `/validate-pr-description` and `/push-pr` as slash commands instead of incorrectly calling them skills via the Skill tool.
Preview (48d9d4876e)
diff --git a/.claude/commands/build.md b/.claude/commands/build.md
new file mode 100644
--- /dev/null
+++ b/.claude/commands/build.md
@@ -1,0 +1,48 @@
+---
+description: Build incrementally. Implement changes in thin vertical slices with TDD and atomic commits. Run after /plan.
+allowed-tools: Task, Skill, Read, Write, Edit, Glob, Grep, Bash(git add:*), Bash(git commit:*), Bash(git status:*), Bash(git diff:*), Bash(python3:*)
+argument-hint: [plan-step-or-task-description]
+---
+
+@CLAUDE.md
+
+Invoke the code-qualities-assessment and golden-principles skills.
+
+Build: $ARGUMENTS
+
+Use Task(subagent_type="implementer") as the primary agent. If no argument provided, check for recent /plan output or ask what to build.
+
+Evaluate across all 5 axes:
+
+1. **Test-first discipline** - Red before green before refactor. No code without a failing test.
+2. **Commit atomicity** - Each commit is one logical change, rollback-safe.
+3. **Code quality** - Cohesion, coupling, encapsulation, testability, non-redundancy.
+4. **Complexity budget** - Cyclomatic complexity <=10. Methods <=60 lines. No nesting.
+5. **Standards compliance** - Golden principles, style enforcement, naming conventions.
+
+## Software Hierarchy of Needs
+
+Design emerges bottom-up. Enforce qualities before reaching for patterns.
+
+1. Qualities: Cohesion, Coupling, DRY, Encapsulation, Testability
+2. Principles: Open-Closed, Encapsulate by Policy, Separation of Concerns
+3. Practices: Programming by Intention, State Always Private, CVA
+4. Wisdom: Design to interfaces, Favor delegation over inheritance, Encapsulate what varies, Separate use from creation
+5. Patterns: Only when the problem demands it. Three similar lines beat a premature abstraction.
+
+## Process
+
+1. Read the task or plan step
+2. Write a failing test (red)
+3. Write the minimum code to pass (green)
+4. Refactor toward the hierarchy of needs (refactor)
+5. Run golden-principles and taste-lints before committing
+6. Commit atomically with conventional message
+7. Repeat for next slice
+
+## Guardrails
+
+- Hard to test? Fix the design, not the test. Indicates tight coupling or weak encapsulation.
+- Ask "how would I test this?" even without tests.
+- Every method should read like a sentence (Programming by Intention).
+- Favor delegation over inheritance. A makes B, or A uses B. Never both.
diff --git a/.claude/commands/plan.md b/.claude/commands/plan.md
new file mode 100644
--- /dev/null
+++ b/.claude/commands/plan.md
@@ -1,0 +1,47 @@
+---
+description: Plan how to build it. Decompose specs into milestones with dependencies and risk mitigations. Run after /spec.
+allowed-tools: Task, Skill, Read, Glob, Grep
+argument-hint: [spec-output-or-issue-number]
+---
+
+@CLAUDE.md
+
+Invoke the planner and execution-plans skills.
+
+Plan how to build: $ARGUMENTS
+
+Use Task(subagent_type="milestone-planner") for milestone breakdown, then Task(subagent_type="task-decomposer") for atomic tasks. If no argument provided, check for recent /spec output or ask what to plan.
+
+Evaluate across all 5 axes:
+
+1. **Scope integrity** - Nothing unnecessary, nothing missing
+2. **Dependency ordering** - Can tasks execute in the stated sequence?
+3. **Risk coverage** - All P0 risks have mitigations
+4. **Estimate confidence** - Complexity-based sizing (S/M/L), not time-based
+5. **Reversibility** - Which steps are hard to undo?
+
+## Principles
+
+- **Programming by Intention**: Sergeant methods direct workflow. Each task should read like an intent, not an implementation.
+- **OODA Loop**: Observe (read the spec), Orient (map to existing code), Decide (sequence tasks), Act (commit the plan). Faster loops win.
+- **First Principles**: Question the requirement, try to delete the step, then optimize, then speed up, then automate. Never automate something that should not exist.
+
+## Process
+
+1. Read the spec or issue
+2. Map sub-problems to existing code (what already exists?)
+3. Break into milestones with clear exit criteria
+4. Decompose milestones into atomic tasks (each independently verifiable)
+5. Sequence by dependencies, flag parallel opportunities
+6. Run pre-mortem on the plan itself
+7. Route to Task(subagent_type="critic") for validation
+
+## Output
+
+Structured plan with:
+
+- Milestones (numbered, with exit criteria)
+- Tasks per milestone (atomic, with acceptance criteria)
+- Dependency graph (what blocks what)
+- Risk register (risk, likelihood, mitigation)
+- Deferred items (explicitly out of scope for this plan)
diff --git a/.claude/commands/review.md b/.claude/commands/review.md
new file mode 100644
--- /dev/null
+++ b/.claude/commands/review.md
@@ -1,0 +1,49 @@
+---
+description: Review before merge. Five-axis code review across architecture, security, quality, tests, and standards. Run after /test.
+allowed-tools: Task, Skill, Read, Glob, Grep, Bash(git diff:*), Bash(git log:*), Bash(git status:*)
+argument-hint: [branch-or-pr-number]
+---
+
+@CLAUDE.md
+
+Invoke the analyze, code-qualities-assessment, and security-scan skills.
+
+Review the current changes across all five axes: $ARGUMENTS
+
+If no argument, review the current branch diff against main.
+
+Sequential evaluation order:
+
+1. **Architecture** - Follows existing patterns? Clean boundaries? Right abstraction level? Coupling intentional? ADR conformance?
+2. **Security** - Input validated? Secrets safe? Auth checked? OWASP top 10? STRIDE threats? CWE scan? (Use Task(subagent_type="security"))
+3. **Code quality** - Score all 5 qualities: cohesion, coupling, encapsulation, testability, non-redundancy. Cyclomatic complexity <=10? Methods <=60 lines?
+4. **Test completeness** - Every new code path has a test? Failure paths covered? Acceptance criteria verified?
+5. **Standards** - Golden principles, taste lints, style enforcement, naming conventions
+
+## Principles
+
+- **Design to interfaces**: Review signatures from the consumer perspective. Hidden implementation details should stay hidden.
+- **Encapsulate what varies**: If the diff introduces variation, is it encapsulated? Or scattered?
+- **Chesterton's Fence**: Before removing code, verify you understand why it existed.
+- **Principle of Least Privilege**: New permissions, scopes, or access? Challenge each one.
+
+## Process
+
+1. Read the diff (git diff main...HEAD)
+2. Architecture pass: Task(subagent_type="architect") evaluates structure
+3. Security pass: Task(subagent_type="security") evaluates threats
+4. Quality pass: invoke code-qualities-assessment skill
+5. Test pass: Task(subagent_type="qa") evaluates coverage
+6. Standards pass: invoke golden-principles and taste-lints skills
+7. Synthesize findings
+
+## Output
+
+Categorize each finding as **Critical**, **Important**, or **Suggestion**.
+
+Structured review with:
+
+- Finding (what is wrong)
+- Location (file:line)
+- Severity (Critical/Important/Suggestion)
+- Fix (specific recommendation)
diff --git a/.claude/commands/ship.md b/.claude/commands/ship.md
new file mode 100644
--- /dev/null
+++ b/.claude/commands/ship.md
@@ -1,0 +1,44 @@
+---
+description: Ship it. Pre-flight validation, CI check, and PR creation. Run after /review.
+allowed-tools: Task, Skill, Read, Glob, Grep, Bash(git diff:*), Bash(git log:*), Bash(git status:*), Bash(git push:*), Bash(python3:*)
+argument-hint: [target-branch]
+---
+
+@CLAUDE.md
+
+Invoke the pipeline-validator skill.
+
+Ship the current branch: $ARGUMENTS
+
+Use Task(subagent_type="devops") as the primary agent. Default target is main unless specified.
+
+Pre-flight checks (all must pass):
+
+1. **Pipeline health** - All CI checks green? No suppressed failures? Run pipeline-validator.
+2. **Security posture** - Final security-scan clean? No new CWE findings? No secrets in diff?
+3. **Review complete** - Has /review been run? Any unresolved Critical findings?
+4. **Tests passing** - All tests green? No skipped tests without justification?
+5. **Standards clean** - Golden principles and taste lints pass?
+
+## Principles
+
+- **Faster is safer**: Small, frequent shipments reduce blast radius. Ship early.
+- **No deliberate debt**: If it is not ready, do not ship it. Fix it or defer it.
+- **Observability first**: If you cannot measure it, you cannot ship it safely.
+
+## Process
+
+1. Run pre-flight checks (all 5 above)
+2. If any check fails: report what failed, why, and how to fix. Stop.
+3. If all pass: validate PR description (run /validate-pr-description command)
+4. Create PR via /push-pr command
+5. Report: what shipped, PR link, any warnings
+
+## Output
+
+Ship report:
+
+- Pre-flight results (pass/fail per check)
+- PR link (if created)
+- Warnings (non-blocking concerns)
+- Next steps (monitoring, follow-up items)
diff --git a/.claude/commands/spec.md b/.claude/commands/spec.md
new file mode 100644
--- /dev/null
+++ b/.claude/commands/spec.md
@@ -1,0 +1,45 @@
+---
+description: Define what to build. Transform a problem into testable requirements with acceptance criteria.
+allowed-tools: Task, Skill, Read, Glob, Grep
+argument-hint: [problem-statement-or-issue-number]
+---
+
+@CLAUDE.md
+
+Invoke the cva-analysis and decision-critic skills.
+
+Define what to build for: $ARGUMENTS
+
+Use Task(subagent_type="spec-generator") to produce requirements. If no argument provided, ask what problem to solve.
+
+Evaluate across all 5 axes:
+
+1. **Problem clarity** - Is the right problem being solved? Could a reframing yield 10x impact?
+2. **Requirement testability** - Can each requirement be verified pass/fail?
+3. **Completeness** - No gaps between problem statement and acceptance criteria?
+4. **Traceability** - REQ to DESIGN to TASK linkage established?
+5. **Feasibility** - Buildable within constraints? Existing code to leverage?
+
+## Principles
+
+- **CVA**: Identify commonalities first, then variabilities, then relationships. Greatest risk is the wrong abstraction.
+- **YAGNI**: Only specify what is needed now. Speculative requirements create waste.
+- **Separation of Concerns**: Each requirement addresses one concern. Mixed concerns signal a missing decomposition.
+
+## Process
+
+1. Clarify the problem (what, who, why, constraints)
+2. Search for existing solutions in the codebase (grep for related patterns)
+3. Apply CVA: what is common across use cases? What varies?
+4. Write requirements as testable acceptance criteria
+5. Run pre-mortem: what fails first?
+6. Run decision-critic: challenge assumptions before committing
+
+## Output
+
+Structured requirements with:
+
+- Problem statement (1-2 sentences)
+- Acceptance criteria (numbered, testable)
+- Out of scope (explicit exclusions)
+- Open questions (unresolved unknowns)
diff --git a/.claude/commands/test.md b/.claude/commands/test.md
new file mode 100644
--- /dev/null
+++ b/.claude/commands/test.md
@@ -1,0 +1,47 @@
+---
+description: Prove it works. Run layered tests and debug failures with hypothesis-driven investigation. Run after /build.
+allowed-tools: Task, Skill, Read, Write, Edit, Glob, Grep, Bash(git diff:*), Bash(git status:*), Bash(python3:*), Bash(pytest:*), Bash(npm test:*)
+argument-hint: [component-or-failure-description]
+---
+
+@CLAUDE.md
+
+Invoke the code-qualities-assessment and quality-grades skills.
+
+Test: $ARGUMENTS
+
+Use Task(subagent_type="qa") as the primary agent. For security testing, also invoke Task(subagent_type="security"). If no argument provided, test the current branch diff against main.
+
+Evaluate across all 5 axes:
+
+1. **Unit coverage** - Each method in isolation, dependencies injected
+2. **Integration coverage** - Contracts between components verified
+3. **Acceptance coverage** - Each requirement has a passing test
+4. **Security coverage** - OWASP top 10 scenarios exercised
+5. **Failure coverage** - Error paths tested, chaos hypotheses validated
+
+## Principles
+
+- **Testability is design feedback**: Hard to test means poor encapsulation, tight coupling, Law of Demeter violation, weak cohesion, or procedural code.
+- **Tests are proof**: A passing test is evidence. A missing test is a gap in knowledge.
+- **Hypothesis-driven debugging**: When a test fails, form a hypothesis before changing code. Verify the hypothesis. Then fix.
+
+## Process
+
+1. Identify what changed (git diff against main)
+2. Map changes to test coverage: which tests cover this code?
+3. Run existing tests first (catch regressions)
+4. Identify coverage gaps: new code paths without tests
+5. Write missing tests (unit first, then integration)
+6. For failures: hypothesis, verify, fix (never change code without understanding why)
+7. Run security-scan for CWE patterns
+8. Report: passing, failing, gaps, recommendations
+
+## Output
+
+Structured test report:
+
+- Tests run (count, pass/fail)
+- Coverage gaps (specific files and functions)
+- Security findings (CWE references)
+- Recommendations (what to add, what to fix)
diff --git a/.claude/commands/workflow/0-init.md b/.claude/commands/workflow/0-init.md
deleted file mode 100644
--- a/.claude/commands/workflow/0-init.md
+++ /dev/null
@@ -1,48 +1,0 @@
----
-description: Session initialization - enforce ADR-007 memory-first architecture at session start. Loads project context, creates session log, and declares current branch via Invoke-Init.ps1.
-argument-hint: [--session-number N] [--objective "text"]
-allowed-tools:
- - Bash(pwsh .claude/skills/workflow/scripts/*)
- - Bash(pwsh .claude/skills/session-init/scripts/*)
- - Bash(git:*)
- - Bash(ls:*)
- - Read
-model: sonnet
----
-
-# /0-init — Session Initialization
-
-Enforce ADR-007 memory-first architecture at session start.
-
-## Context
-
-Recent sessions: !`ls -1 .agents/sessions/ | tail -5`
-
-Current branch: !`git branch --show-current`
-
-## Invocation
-
-```bash
-pwsh .claude/skills/workflow/scripts/Invoke-Init.ps1 $ARGUMENTS
-```
-
-## What This Command Does
-
-1. **Load project context** — initializes session state via Agent Orchestration MCP (graceful fallback if unavailable)
-2. **Load initial instructions** — read AGENTS.md for current project rules
-3. **Read HANDOFF.md** — load prior session context (read-only)
-4. **Surface prior context** — retrieves relevant session history via Agent Orchestration MCP (graceful fallback if unavailable)
-5. **Create session log** — via `New-SessionLog.ps1`
-6. **Declare current branch** — output git branch for orientation
-7. **Record evidence** — persist session state (graceful fallback if unavailable)
-
-## Arguments
-
-- `--session-number N`: Optional. Auto-detected from `.agents/sessions/`.
-- `--objective "text"`: Optional. Derived from branch name if omitted.
-
-## Related
-
-- Protocol: `.agents/SESSION-PROTOCOL.md`
-- ADR-007: `.agents/architecture/ADR-007-memory-first-architecture.md`
-- Session Init Skill: `.claude/skills/session-init/SKILL.md`
\ No newline at end of file
diff --git a/.claude/commands/workflow/1-plan.md b/.claude/commands/workflow/1-plan.md
deleted file mode 100644
--- a/.claude/commands/workflow/1-plan.md
+++ /dev/null
@@ -1,54 +1,0 @@
----
-description: Planning phase - route task to planner (default), architect (--arch), or roadmap→high-level-advisor chain (--strategic).
-argument-hint: [--arch] [--strategic] <task-description>
-allowed-tools:
- - Bash(pwsh .claude/skills/workflow/scripts/*)
- - Bash(git:*)
- - Bash(ls:*)
- - Read
- - mcp__agent_orchestration__invoke_agent
- - mcp__agent_orchestration__track_handoff
- - mcp__agent_orchestration__get_routing_recommendation
-model: sonnet
----
-
-# /1-plan — Planning Phase
-
-Route a planning task to the appropriate agent.
-
-## Context
-
-Current branch: !`git branch --show-current`
-
-Recent commits: !`git log --oneline -5`
-
-Planning artifacts: !`ls -1 .agents/planning/ 2>/dev/null | tail -10`
-
-## Invocation
-
-```bash
-pwsh .claude/skills/workflow/scripts/Invoke-Plan.ps1 $ARGUMENTS
-```
-
-## Variants
-
-| Flag | Agent | Use When |
-|------|-------|----------|
-| *(none)* | `planner` | Standard feature/task planning |
-| `--arch` | `architect` | Design decisions, ADR-worthy choices |
-| `--strategic` | `roadmap → high-level-advisor` | Roadmap, epics, strategic alignment |
-
-## Arguments
-
-- `--arch`: Use architect agent instead of planner.
-- `--strategic`: Chain roadmap agent → high-level-advisor.
-- Remaining text: Task description passed to agent.
-
-## Output
-
-Planning artifacts stored in `.agents/planning/`.
-
-## Related
-
-- ADR-013: `.agents/architecture/ADR-013-agent-orchestration-mcp.md`
-- Agent Orchestration Spec: `.agents/specs/agent-orchestration-mcp-spec.md`
\ No newline at end of file
diff --git a/.claude/commands/workflow/2-impl.md b/.claude/commands/workflow/2-impl.md
deleted file mode 100644
--- a/.claude/commands/workflow/2-impl.md
+++ /dev/null
@@ -1,49 +1,0 @@
----
-description: Implementation phase - invoke implementer agent (default), or run full sequential chain (--full), or parallel execution of implementer+qa+security (--parallel).
-argument-hint: [--full] [--parallel] <implementation-task>
-allowed-tools:
- - Bash(pwsh .claude/skills/workflow/scripts/*)
- - Bash(git:*)
- - Bash(ls:*)
- - Read
- - mcp__agent_orchestration__invoke_agent
- - mcp__agent_orchestration__track_handoff
- - mcp__agent_orchestration__start_parallel_execution
- - mcp__agent_orchestration__aggregate_parallel_results
- - mcp__agent_orchestration__resolve_conflict
-model: sonnet
----
-
-# /2-impl — Implementation Phase
-
-Invoke the implementer agent, optionally chaining QA and security.
-
-## Context
-
-Planning artifacts: !`ls -1 .agents/planning/ 2>/dev/null | tail -10`
-
-Current branch: !`git branch --show-current`
-
-## Invocation
-
-```bash
-pwsh .claude/skills/workflow/scripts/Invoke-Impl.ps1 $ARGUMENTS
-```
-
-## Execution Modes
-
-| Flag | Mode | Description |
-|------|------|-------------|
-| *(none)* | Default | Implementer agent only |
-| `--full` | Sequential | implementer → qa → security |
-| `--parallel` | Parallel | implementer + parallel(qa, security) |
-
-## Arguments
-
-- `--full`: Run full sequential chain after implementation.
-- `--parallel`: Run QA and security in parallel after implementation.
-- Remaining text: Implementation task description.
-
-## Related
-
-- ADR-013: `.agents/architecture/ADR-013-agent-orchestration-mcp.md`
\ No newline at end of file
diff --git a/.claude/commands/workflow/3-qa.md b/.claude/commands/workflow/3-qa.md
deleted file mode 100644
--- a/.claude/commands/workflow/3-qa.md
+++ /dev/null
@@ -1,45 +1,0 @@
----
-description: Quality assurance - invoke QA agent, validate test coverage, check acceptance criteria, and report results.
-argument-hint: [--coverage-threshold N] <verification-scope>
-allowed-tools:
- - Bash(pwsh .claude/skills/workflow/scripts/*)
- - Bash(git:*)
- - Bash(ls:*)
- - Read
- - mcp__agent_orchestration__invoke_agent
- - mcp__agent_orchestration__track_handoff
-model: sonnet
----
-
-# /3-qa — Quality Assurance
-
-Invoke the QA agent and validate implementation quality.
-
-## Context
-
-Implementation artifacts: !`ls -1 .agents/sessions/ | tail -3`
-
-Current branch: !`git branch --show-current`
-
-## Invocation
-
-```bash
-pwsh .claude/skills/workflow/scripts/Invoke-QA.ps1 $ARGUMENTS
-```
-
-## What This Command Does
-
-1. Invoke `qa` agent via Agent Orchestration MCP
-2. Validate test coverage against threshold (default: 80%)
-3. Check acceptance criteria from planning artifacts
-4. Report pass/fail with details
-5. Track handoff back to orchestrator
-
-## Arguments
-
-- `--coverage-threshold N`: Minimum coverage percentage (default: 80).
-- Remaining text: Verification scope.
-
-## Related
-
-- ADR-006: `.agents/architecture/ADR-006-thin-workflows-testable-modules.md`
\ No newline at end of file
diff --git a/.claude/commands/workflow/4-security.md b/.claude/commands/workflow/4-security.md
deleted file mode 100644
--- a/.claude/commands/workflow/4-security.md
+++ /dev/null
@@ -1,46 +1,0 @@
----
-description: Security review - invoke security agent with OWASP Top 10 check, secret detection, and dependency audit.
-argument-hint: [--owasp-only] [--secrets-only] <security-scope>
-allowed-tools:
- - Bash(pwsh .claude/skills/workflow/scripts/*)
- - Bash(git:*)
- - Bash(ls:*)
- - Read
- - mcp__agent_orchestration__invoke_agent
- - mcp__agent_orchestration__track_handoff
-model: opus
----
-
-# /4-security — Security Review
-
-Comprehensive security assessment using the security agent.
-
-## Context
-
-Implementation artifacts: !`ls -1 .agents/sessions/ | tail -3`
-
-Current branch: !`git branch --show-current`
-
-## Invocation
-
-```bash
-pwsh .claude/skills/workflow/scripts/Invoke-Security.ps1 $ARGUMENTS
-```
-
-## What This Command Does
-
-1. Invoke `security` agent via Agent Orchestration MCP (model: opus per ADR-013)
-2. OWASP Top 10 check (skipped with `--secrets-only`)
-3. Secret detection scan (skipped with `--owasp-only`)
-4. Dependency audit for known vulnerabilities
-5. Generate security report with findings
-
-## Arguments
-
-- `--owasp-only`: Run only OWASP Top 10 check.
-- `--secrets-only`: Run only secret detection.
-- Remaining text: Security scope.
-
-## Related
-
-- ADR-013: `.agents/architecture/ADR-013-agent-orchestration-mcp.md`
\ No newline at end of file
diff --git a/.claude/commands/workflow/9-sync.md b/.claude/commands/workflow/9-sync.md
deleted file mode 100644
--- a/.claude/commands/workflow/9-sync.md
+++ /dev/null
@@ -1,116 +1,0 @@
----
-description: Auto-generate session documentation. Queries session history, generates workflow diagrams, updates session logs, and syncs memory. Use at the end of any workflow to capture what happened.
-model: sonnet
-argument-hint: [--dry-run]
-allowed-tools:
- - Bash(python .claude/skills/workflow/scripts/*)
- - Bash(git:*)
- - mcp__serena__*
- - mcp__forgetful__*
----
-
-# /9-sync — Auto-Documentation & Memory Sync
-
-Generate comprehensive session documentation automatically.
-
-## Overview
-
-This command closes the workflow loop by documenting what happened during a session. It:
-
-1. Collects session history (agents invoked, tools used, files changed)
-2. Generates a workflow sequence diagram (Mermaid)
-3. Extracts key decisions and artifacts
-4. Appends documentation to the session log
-5. Syncs context to Serena memory for cross-session persistence
-6. Suggests retrospective learnings
-
-## Execution Steps
-
-### Step 1: Gather Session Context
-
-Collect the current session state:
-
-```bash
-# Get current branch and recent commits
-git log --oneline -20 --since="$(date -d '8 hours ago' --iso-8601)" 2>/dev/null || git log --oneline -20
-
-# Get files changed in this session
-git diff --stat HEAD~10..HEAD 2>/dev/null || git diff --stat main..HEAD
-
-# Get current session log if it exists
-ls -t .agents/sessions/*.json 2>/dev/null | head -1
-```
-
-### Step 2: Generate Session Documentation
-
-Run the sync script to produce the session documentation:
-
-```bash
-python .claude/skills/workflow/scripts/sync_session_documentation.py $ARGUMENTS
-```
-
-This script will:
-
-- Scan git history for session commits
-- Identify agents referenced in commit messages
-- Generate a Mermaid sequence diagram
-- Produce a structured session summary
-
-### Step 3: Extract Decisions and Artifacts
-
-From the session context, identify:
-
-- **Decisions made**: ADRs created/modified, design choices documented
-- **Artifacts created**: New files, modified scripts, PRs opened
-- **Issues referenced**: GitHub issues addressed or discovered
-- **Risks identified**: Any blockers or concerns raised
-
-### Step 4: Update Session Log
-
-Append the sync output to the current session log in `.agents/sessions/`. The entry MUST include:
-
-| Field | Description |
-|-------|-------------|
-| `agents_invoked` | Ordered list of agents used (with duration estimates) |
-| `decisions_made` | Key decisions with rationale |
-| `artifacts_created` | Files, commits, issues, PRs |
-| `workflow_diagram` | Mermaid sequence diagram |
-| `retrospective_learnings` | Suggested improvements |
-
-### Step 5: Sync to Memory Systems
-
-Update persistent memory for cross-session context:
-
-1. **Serena**: Store key decisions and outcomes via `mcp__serena__save_memory`
-2. **Forgetful**: Record learnings via `mcp__forgetful__save_memory`
-
-### Step 6: Suggest Retrospective Learnings
-
-Based on the session, suggest:
-
-- What went well (patterns to repeat)
-- What could improve (process gaps)
-- What to watch for (emerging risks)
-
-## Arguments
-
-| Argument | Description |
-|----------|-------------|
-| `--dry-run` | Preview documentation without writing to session log |
-
-## Output
-
-The command produces a session sync report with sections for Workflow Diagram (Mermaid),
-Agents Invoked, Decisions Made, Artifacts Created, and Retrospective Learnings.
-
-## Dependencies
-
-- Session State MCP (`agents://history` resource) — graceful fallback to git history when unavailable
-- Serena MCP — for memory persistence
-- Forgetful MCP — for learning extraction
-
-## Related
-
-- [SESSION-PROTOCOL.md](../../../.agents/SESSION-PROTOCOL.md) — Session requirements
-- [ADR-007: Memory-First Architecture](../../../.agents/architecture/ADR-007-memory-first-architecture.md)
-- [PRD: Workflow Orchestration Enhancement](../../../.agents/planning/prd-workflow-orchestration-enhancement.md)
\ No newline at end of file
diff --git a/.markdownlint-cli2.yaml b/.markdownlint-cli2.yaml
--- a/.markdownlint-cli2.yaml
+++ b/.markdownlint-cli2.yaml
@@ -137,3 +137,12 @@
# CLAUDE.md files are managed by the claude-mem plugin which prepends <claude-mem-context>
# HTML tags, violating MD033 and MD041. These are tool-managed metadata, not authored markdown.
- "**/CLAUDE.md"
+
+ # New lifecycle command files have only YAML frontmatter, no H1 heading. MD041 not applicable.
+ # Only the 6 new lifecycle commands are excluded; other command files retain full linting.
+ - ".claude/commands/spec.md"
+ - ".claude/commands/plan.md"
+ - ".claude/commands/build.md"
+ - ".claude/commands/test.md"
+ - ".claude/commands/review.md"
+ - ".claude/commands/ship.md"
diff --git a/AGENTS.md b/AGENTS.md
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -42,7 +42,7 @@
|PRs: GitHub|Reviews: pr-comment-responder|Conflicts: merge-resolver
|Session: session-init, session-end|CI fix: session-log-fixer|Push: /push-pr
|Security: security-detection|Quality: analyze|Learn: reflect
-|Workflow: workflow (0-init, 1-plan, 2-impl, 3-qa, 4-security)
+|Lifecycle: /spec, /plan, /build, /test, /review, /ship
### ADR Review (BLOCKING)
diff --git a/CLAUDE.md b/CLAUDE.md
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -39,11 +39,19 @@
The skill has specialized workflows that produce better results than ad-hoc answers.
Key routing rules:
-- Bugs, errors, "why is this broken", 500 errors → invoke analyze
-- Ship, deploy, push, create PR → invoke github
-- QA, test the site, find bugs → invoke qa (subagent_type)
-- Code review, check my diff → invoke pr-comment-responder
-- Weekly retro → invoke reflect
-- Architecture review → invoke analyze
-- Save progress, checkpoint, resume → invoke session-end
-- Code quality, health check → invoke quality-grades
+- Bugs, errors, "why is this broken" → invoke analyze skill
+- PRs, issues, GitHub operations → invoke github skill
+- PR review threads, comment triage → invoke pr-comment-responder skill
+- Weekly retro → invoke reflect skill
+- Save progress, checkpoint → invoke session-end skill
+- Code quality, health check → invoke quality-grades skill
+
+## Lifecycle commands
+
+For development lifecycle phases, use these slash commands (not skills):
+- Define requirements, "what should we build" → /spec
+- Plan work, break down tasks, estimate → /plan
+- Implement, code, build features → /build
+- Test, prove it works, debug failures → /test
+- Review code, check my diff → /review
+- Ship, deploy, push, create PR → /ship
…command
- .markdownlint-cli2.yaml: Replace the overly broad .claude/commands/** ignore with a specific list of the 6 new lifecycle command files. This preserves full linting for pre-existing command files (push-pr.md, pr-review.md, validate-pr-description.md, etc.).
- .claude/commands/ship.md: Correct references to validate-pr-description and push-pr. These are slash commands (.claude/commands/*.md), not skills (.claude/skills/*/SKILL.md). Update instructions to use command invocation syntax instead of Skill tool references.
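The effect of narrowing the ignore list can be sketched in a few lines of Python. This is an illustration only, not how markdownlint-cli2 evaluates globs: the helper names `ignored_by_broad` and `ignored_by_explicit` are hypothetical, and the broad `.claude/commands/**` pattern is approximated by a simple prefix check.

```python
# Explicit ignore entries, copied from the .markdownlint-cli2.yaml hunk in this PR.
LIFECYCLE_IGNORES = {
    ".claude/commands/spec.md",
    ".claude/commands/plan.md",
    ".claude/commands/build.md",
    ".claude/commands/test.md",
    ".claude/commands/review.md",
    ".claude/commands/ship.md",
}

# Assumption: ".claude/commands/**" is approximated as "anything under this directory".
BROAD_PREFIX = ".claude/commands/"


def ignored_by_broad(path: str) -> bool:
    """Hypothetical helper: would the old broad ignore have skipped this file?"""
    return path.startswith(BROAD_PREFIX)


def ignored_by_explicit(path: str) -> bool:
    """Hypothetical helper: does the new explicit list skip this file?"""
    return path in LIFECYCLE_IGNORES


for path in (".claude/commands/ship.md", ".claude/commands/push-pr.md"):
    print(path, "broad:", ignored_by_broad(path), "explicit:", ignored_by_explicit(path))
```

Under the explicit list, push-pr.md is no longer skipped, which matches the stated intent of keeping full linting for pre-existing command files.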
Review Triage Required
Note: Priority: NORMAL. Human approval required before bot responds.
Powered by PR Maintenance workflow - Add triage:approved label
…cle commands
- Add role personas to every Task(subagent_type) invocation
- Move skill invocations from top-level directives into process steps
- Add empty $ARGUMENTS guards to all commands
- Add @CLAUDE.md import to all 6 for consistency
- Add PR type classification to /test (skip irrelevant gates)
- Add structured verdict output format to /test and /ship
- Add Write to spec.md and plan.md allowed-tools
- Add Bash(uv:*) to build.md and test.md, Bash(gh:*) to ship.md
- Fix review.md hard-coded main branch reference
- Expand test.md from 5 axes to 6 quality gates with multi-agent dispatch
- Update prompt-engineer skill to claude-sonnet-4-6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
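The structured /ship verdict mentioned above follows an "any failure blocks shipping" rule. A minimal Python sketch of that gating, assuming the five check names from the ship.md report template; the function `render_ship_report` and its input shape are hypothetical and are not part of the PR:

```python
# Check names taken from the ship.md report template in this PR.
CHECKS = ["Pipeline", "Security", "Review", "Tests", "Standards"]


def render_ship_report(results, pr_link=""):
    """Hypothetical helper. `results` maps check name -> (passed, evidence)."""
    lines = ["PRE-FLIGHT:"]
    for name in CHECKS:
        passed, evidence = results[name]
        status = "PASS" if passed else "FAIL"
        label = f"{name}:"
        lines.append(f"  {label:<10} {status} ({evidence})")
    # Any single failure blocks shipping.
    shipped = all(results[name][0] for name in CHECKS)
    verdict = "SHIPPED" if shipped else "BLOCKED"
    lines.append("")
    lines.append(f"RESULT: {verdict}")
    # A PR link is only reported when the branch actually shipped.
    lines.append(f"PR: {pr_link if shipped else ''}".rstrip())
    return "\n".join(lines)


example = {name: (True, "ok") for name in CHECKS}
example["Tests"] = (False, "2 failing")
print(render_ship_report(example, pr_link="(link)"))
```

The sketch omits the WARNINGS and NEXT fields of the template to stay short.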
Spec-to-Implementation Validation
Caution: ❌ Final Verdict: FAIL
What is Spec Validation? This validation checks that the implementation matches the specifications.
Powered by AI Spec Validator workflow
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Test command hardcodes `origin/main` as base branch
  - Updated test.md Step 0 to detect base branch from `gh pr view --json baseRefName` with fallback to main, and added Bash(gh:*) to allowed-tools.
- ✅ Fixed: Review command lacks permission to run `gh` CLI
  - Added Bash(gh:*) to allowed-tools in review.md to enable the documented base branch detection via gh pr view.
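The base branch detection described in these fixes could look roughly like the following Python sketch. It assumes the GitHub CLI is on PATH and that `gh pr view --json baseRefName` prints a JSON object with a `baseRefName` key; the function name `detect_base_branch` and the injectable `run` parameter are hypothetical, added so the logic can be exercised without calling `gh`.

```python
import json
import subprocess


def detect_base_branch(run=subprocess.run, fallback="main"):
    """Return the PR's base branch, or `fallback` when it cannot be determined."""
    try:
        result = run(
            ["gh", "pr", "view", "--json", "baseRefName"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # gh is not installed or not executable.
        return fallback
    if result.returncode != 0:
        # For example: no PR exists for the current branch.
        return fallback
    try:
        return json.loads(result.stdout).get("baseRefName") or fallback
    except (json.JSONDecodeError, AttributeError):
        # Output was not the expected JSON object.
        return fallback
```

Calling `detect_base_branch()` with no arguments shells out to `gh`. Any failure path returns `main`, mirroring the fallback described in the fix.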
Preview (8301c93bd9)
diff --git a/.claude/commands/build.md b/.claude/commands/build.md
new file mode 100644
--- /dev/null
+++ b/.claude/commands/build.md
@@ -1,0 +1,42 @@
+---
+description: Build incrementally. Implement changes in thin vertical slices with TDD and atomic commits. Run after /plan.
+allowed-tools: Task, Skill, Read, Write, Edit, Glob, Grep, Bash(*)
+argument-hint: [plan-step-or-task-description]
+---
+
+@CLAUDE.md
+
+Build: $ARGUMENTS
+
+If $ARGUMENTS is empty, check for recent /plan output in the conversation. If none found, ask the user what to build.
+
+## Agent
+
+Task(subagent_type="implementer"): You are a senior engineer. Discover the project's tech stack, coding patterns, and test conventions by reading the codebase. Build in thin vertical slices. Test-first when the project has tests. Commit atomically.
+
+For each slice:
+
+1. Read the task
+2. Understand the existing code patterns (read related files, check test conventions)
+3. Write a failing test if the project has a test framework
+4. Write the minimum code to pass
+5. Refactor toward quality (cohesion, encapsulation, simplicity)
+6. Commit with a conventional message
+
+## Quality Signals
+
+After implementation, invoke Skill(skill="code-qualities-assessment") to score the result.
+
+The agent should self-check:
+
+- Is this hard to test? That indicates a design problem, not a test problem.
+- Does every method read like a sentence? (Programming by Intention)
+- Is coupling intentional or accidental?
+- Would a stranger understand this code without asking questions?
+
+## Guardrails
+
+- Atomic commits. Each commit is one logical change, rollback-safe.
+- No code without understanding the existing patterns first.
+- Favor delegation over inheritance. A makes B, or A uses B. Never both.
+- Three similar lines beat a premature abstraction.
diff --git a/.claude/commands/plan.md b/.claude/commands/plan.md
new file mode 100644
--- /dev/null
+++ b/.claude/commands/plan.md
@@ -1,0 +1,45 @@
+---
+description: Plan how to build it. Decompose specs into milestones with dependencies and risk mitigations. Run after /spec.
+allowed-tools: Task, Skill, Read, Write, Glob, Grep
+argument-hint: [spec-output-or-issue-number]
+---
+
+@CLAUDE.md
+
+Plan: $ARGUMENTS
+
+If $ARGUMENTS is empty, check for recent /spec output in the conversation. If none found, ask the user what to plan.
+
+## Process
+
+1. Read the spec or issue
+2. Map sub-problems to existing code (what already exists? use Grep/Glob to verify)
+3. Task(subagent_type="milestone-planner"): You are a project planner. Break the spec into milestones with clear exit criteria. Each milestone is independently shippable. Sequence by dependencies. Flag parallel opportunities.
+4. Task(subagent_type="task-decomposer"): You are a work breakdown specialist. Decompose each milestone into atomic tasks. Each task is independently verifiable with a clear done definition. Size by complexity (S/M/L), not time.
+5. Invoke Skill(skill="execution-plans") to persist the plan as a versioned artifact.
+6. Task(subagent_type="analyst"): You are a risk analyst. Run a pre-mortem on this plan. What fails first? What dependencies are fragile? What assumptions are untested?
+7. Task(subagent_type="critic"): You are a plan reviewer. Validate: is scope complete? Can tasks execute in the stated sequence? Are estimates credible? Is anything missing?
+
+## Evaluation Axes
+
+1. **Scope integrity** - Nothing unnecessary, nothing missing
+2. **Dependency ordering** - Can tasks execute in the stated sequence?
+3. **Risk coverage** - All P0 risks have mitigations
+4. **Estimate confidence** - Complexity-based sizing (S/M/L), not time-based
+5. **Reversibility** - Which steps are hard to undo?
+
+## Principles
+
+- **Programming by Intention**: Each task should read like an intent, not an implementation detail.
+- **OODA Loop**: Observe (read the spec), Orient (map to existing code), Decide (sequence tasks), Act (commit the plan). Faster loops win.
+- **First Principles**: Question the requirement, try to delete the step, then optimize, then speed up, then automate. Never automate something that should not exist.
+
+## Output
+
+Structured plan:
+
+- **Milestones** (numbered, with exit criteria)
+- **Tasks per milestone** (atomic, with acceptance criteria and S/M/L sizing)
+- **Dependency graph** (what blocks what, what can run in parallel)
+- **Risk register** (risk, likelihood, impact, mitigation)
+- **Deferred items** (explicitly out of scope for this plan)
diff --git a/.claude/commands/review.md b/.claude/commands/review.md
new file mode 100644
--- /dev/null
+++ b/.claude/commands/review.md
@@ -1,0 +1,80 @@
+---
+description: Review before merge. Five-axis code review across architecture, security, quality, tests, and standards. Run after /test.
+allowed-tools: Task, Skill, Read, Glob, Grep, Bash(git diff:*), Bash(git log:*), Bash(git status:*), Bash(gh:*)
+argument-hint: [branch-or-pr-number]
+---
+
+@CLAUDE.md
+
+Review: $ARGUMENTS
+
+If no argument, review the current branch diff against the base branch. Detect the base branch from `gh pr view --json baseRefName` or fall back to `main`.
+
+## Process
+
+Run axes sequentially. Each axis produces findings categorized as Critical, Important, or Suggestion.
+
+1. Read the diff (git diff against detected base branch)
+2. **Architecture pass**: Task(subagent_type="architect")
+3. **Security pass**: Task(subagent_type="security")
+4. **Quality pass**: Invoke Skill(skill="code-qualities-assessment")
+5. **Test pass**: Task(subagent_type="qa")
+6. **Standards pass**: Invoke Skill(skill="golden-principles") and Skill(skill="taste-lints")
+7. Synthesize findings across all axes
+
+## Axis 1: Architecture
+
+Task(subagent_type="architect"): You are a software architect reviewing for structural integrity. Check ADR conformance in .agents/architecture/. Evaluate from the consumer perspective, not the implementer perspective. Findings must cite file:line.
+
+- Follows existing patterns? Clean boundaries? Right abstraction level?
+- Coupling intentional? Cohesion strong?
+- ADR conformance? Any decisions that need a new ADR?
+
+## Axis 2: Security
+
+Invoke Skill(skill="security-scan") for CWE pattern detection.
+
+Task(subagent_type="security"): You are a security auditor. Assume every input is malicious. Reference CWE numbers. Evaluate:
+
+- Input validated? Secrets safe? Auth checked?
+- OWASP top 10? STRIDE threats?
+- New permissions, scopes, or access? Challenge each one (Principle of Least Privilege).
+
+## Axis 3: Code Quality
+
+Invoke Skill(skill="code-qualities-assessment") to score all 5 qualities: cohesion, coupling, encapsulation, testability, non-redundancy.
+
+- Cyclomatic complexity <=10? Methods <=60 lines?
+- DRY violations? Premature abstractions?
+
+## Axis 4: Test Completeness
+
+Task(subagent_type="qa"): You are a QA engineer verifying coverage. For every new code path in the diff, verify a corresponding test exists. Flag gaps with specific file:line references.
+
+- Every new code path has a test? Failure paths covered?
+- Acceptance criteria verified?
+
+## Axis 5: Standards
+
+Invoke Skill(skill="golden-principles") and Skill(skill="taste-lints").
+
+- Golden principle violations? Naming conventions?
+- Style enforcement? Consistency with existing patterns?
+
+## Principles
+
+- **Design to interfaces**: Review signatures from the consumer perspective. Hidden implementation details should stay hidden.
+- **Encapsulate what varies**: If the diff introduces variation, is it encapsulated? Or scattered?
+- **Chesterton's Fence**: Before removing code, verify you understand why it existed.
+- **Principle of Least Privilege**: New permissions, scopes, or access? Challenge each one.
+
+## Output
+
+Categorize each finding as **Critical**, **Important**, or **Suggestion**.
+
+Per-finding format:
+
+- Finding (what is wrong)
+- Location (file:line)
+- Severity (Critical/Important/Suggestion)
+- Fix (specific recommendation)
diff --git a/.claude/commands/ship.md b/.claude/commands/ship.md
new file mode 100644
--- /dev/null
+++ b/.claude/commands/ship.md
@@ -1,0 +1,53 @@
+---
+description: Ship it. Pre-flight validation, CI check, and PR creation. Run after /review.
+allowed-tools: Task, Skill, Read, Glob, Grep, Bash(git diff:*), Bash(git log:*), Bash(git status:*), Bash(git push:*), Bash(gh:*), Bash(python3:*)
+argument-hint: [target-branch]
+---
+
+@CLAUDE.md
+
+Ship: $ARGUMENTS
+
+Default target is main unless specified. If $ARGUMENTS names a different branch, use that as the target.
+
+## Pre-flight Checks
+
+Task(subagent_type="devops"): You are a release engineer. Run all 5 pre-flight checks below. Report pass/fail for each with specific evidence. Any failure blocks shipping.
+
+1. **Pipeline health** - Invoke Skill(skill="pipeline-validator"). All CI checks green? No suppressed failures?
+2. **Security posture** - Invoke Skill(skill="security-scan"). No new CWE findings? No secrets in diff?
+3. **Review complete** - Has /review been run on this branch? Any unresolved Critical findings? Check review logs.
+4. **Tests passing** - All tests green? No skipped tests without justification?
+5. **Standards clean** - Invoke Skill(skill="golden-principles") and Skill(skill="taste-lints"). Both pass?
+
+## Process
+
+1. Run all 5 pre-flight checks
+2. If any check fails: report what failed, why, and how to fix. Stop.
+3. If all pass: invoke Skill(skill="validate-pr-description") to validate PR metadata
+4. Create PR: invoke Skill(skill="push-pr") to commit, push, and open PR
+5. Report: what shipped, PR link, any warnings
+
+## Principles
+
+- **Faster is safer**: Small, frequent shipments reduce blast radius. Ship early.
+- **No deliberate debt**: If it is not ready, do not ship it. Fix it or defer it.
+- **Observability first**: If you cannot measure it, you cannot ship it safely.
+
+## Output
+
+Ship report:
+
+```text
+PRE-FLIGHT:
+ Pipeline: PASS|FAIL (evidence)
+ Security: PASS|FAIL (evidence)
+ Review: PASS|FAIL (evidence)
+ Tests: PASS|FAIL (evidence)
+ Standards: PASS|FAIL (evidence)
+
+RESULT: SHIPPED|BLOCKED
+PR: [link if created]
+WARNINGS: [non-blocking concerns]
+NEXT: [monitoring, follow-up items]
+```
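Reviewer sketch (not part of the diff): the blocking rule in `ship.md` ("Any failure blocks shipping") and the ship report layout can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not how Claude Code evaluates the command. The names `CHECKS`, `is_blocked`, and `render_ship_report` are hypothetical, and treating a check that never ran as a failure is an assumption the command file does not state.

```python
# Hypothetical illustration of the ship.md report; none of these names exist in the PR.
CHECKS = ["Pipeline", "Security", "Review", "Tests", "Standards"]


def is_blocked(results):
    """Return True if any of the 5 pre-flight checks failed.

    `results` maps a check name to (passed, evidence).
    Assumption: a check with no recorded result counts as a failure.
    """
    return any(not results.get(name, (False, "not run"))[0] for name in CHECKS)


def render_ship_report(results, pr_link=None, warnings=(), next_steps=()):
    """Render the text report in the layout shown in ship.md."""
    lines = ["PRE-FLIGHT:"]
    for name in CHECKS:
        passed, evidence = results.get(name, (False, "not run"))
        status = "PASS" if passed else "FAIL"
        label = name + ":"
        lines.append(f"  {label:<10} {status} ({evidence})")
    lines.append("")
    lines.append("RESULT: " + ("BLOCKED" if is_blocked(results) else "SHIPPED"))
    lines.append("PR: " + (pr_link or "(none)"))
    lines.append("WARNINGS: " + ("; ".join(warnings) or "(none)"))
    lines.append("NEXT: " + ("; ".join(next_steps) or "(none)"))
    return "\n".join(lines)


if __name__ == "__main__":
    demo = {name: (True, "ok") for name in CHECKS}
    demo["Tests"] = (False, "2 failing")
    print(render_ship_report(demo))
```

A single failing check is enough to flip the result to `BLOCKED`, which mirrors step 2 of the Process section: report the failure and stop before any PR is created.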
diff --git a/.claude/commands/spec.md b/.claude/commands/spec.md
new file mode 100644
--- /dev/null
+++ b/.claude/commands/spec.md
@@ -1,0 +1,45 @@
+---
+description: Define what to build. Transform a problem into testable requirements with acceptance criteria.
+allowed-tools: Task, Skill, Read, Write, Glob, Grep
+argument-hint: [problem-statement-or-issue-number]
+---
+
+@CLAUDE.md
+
+Spec: $ARGUMENTS
+
+If $ARGUMENTS is empty, ask the user what problem to solve. Do not proceed without a problem statement.
+
+## Process
+
+1. Clarify the problem (what, who, why, constraints)
+2. Search for existing solutions in the codebase (grep for related patterns)
+3. Invoke Skill(skill="cva-analysis"): identify commonalities across use cases, then variabilities, then relationships
+4. Write requirements as testable acceptance criteria
+5. Task(subagent_type="analyst"): You are a requirements analyst. Your job is to find gaps, ambiguities, and untestable requirements. For each requirement, ask: can this be verified pass/fail? Flag anything vague.
+6. Invoke Skill(skill="decision-critic"): challenge assumptions before committing
+7. Task(subagent_type="critic"): You are a skeptical reviewer. Run a pre-mortem: assume this spec ships and fails. What broke first? What was missing?
+
+## Evaluation Axes
+
+1. **Problem clarity** - Is the right problem being solved? Could a reframing yield 10x impact?
+2. **Requirement testability** - Can each requirement be verified pass/fail?
+3. **Completeness** - No gaps between problem statement and acceptance criteria?
+4. **Traceability** - REQ to DESIGN to TASK linkage established?
+5. **Feasibility** - Buildable within constraints? Existing code to leverage?
+
+## Principles
+
+- **CVA**: Identify commonalities first, then variabilities, then relationships. Greatest risk is the wrong abstraction.
+- **YAGNI**: Only specify what is needed now. Speculative requirements create waste.
+- **Separation of Concerns**: Each requirement addresses one concern. Mixed concerns signal a missing decomposition.
+
+## Output
+
+Structured requirements document:
+
+- **Problem statement** (1-2 sentences)
+- **Acceptance criteria** (numbered, each independently testable as pass/fail)
+- **Out of scope** (explicit exclusions to prevent creep)
+- **Open questions** (unresolved unknowns with owners)
+- **CVA summary** (what is common, what varies, what relationships exist)
diff --git a/.claude/commands/test.md b/.claude/commands/test.md
new file mode 100644
--- /dev/null
+++ b/.claude/commands/test.md
@@ -1,0 +1,144 @@
+---
+description: Prove it works. Multi-dimensional quality validation across functional, non-functional, security, DevOps, DX, and observability. Run after /build.
+allowed-tools: Task, Skill, Read, Glob, Grep, Bash(git diff:*), Bash(git status:*), Bash(git log:*), Bash(gh:*), Bash(python3:*), Bash(pytest:*), Bash(npm test:*), Bash(uv:*), Bash(pester:*)
+argument-hint: [component-or-failure-description]
+---
+
+@CLAUDE.md
+
+Test: $ARGUMENTS
+
+If $ARGUMENTS is empty, test the current branch diff against the base branch.
+
+## Step 0: Classify PR Type
+
+Detect the base branch from `gh pr view --json baseRefName` or fall back to `main`. Run `git diff origin/<base-branch> --name-only` and classify changed files:
+
+| Type | Patterns | Gates to Run |
+|------|----------|--------------|
+| CODE | *.py, *.ps1, *.ts, *.js, *.cs | All 6 gates |
+| WORKFLOW | *.yml in .github/workflows/ | Gates 1, 3, 4 |
+| CONFIG | *.json, *.yaml (non-workflow) | Gates 3, 4 |
+| DOCS | *.md, *.txt, *.rst | Gate 5 only |
+| MIXED | Combination | Apply per-file rules |
+
+Print: `PR TYPE: [type]. Running gates: [list].`
+
+Skip non-applicable gates. Do not waste agent invocations on irrelevant dimensions.
+
+## Gate 1: Functional Testing
+
+Invoke Skill(skill="code-qualities-assessment") for quality baseline.
+
+Task(subagent_type="qa"): You are a senior QA engineer. Your job is to catch issues that will cause production incidents. Be skeptical. Cite specific file:line evidence for every finding. Evaluate:
+
+1. **Unit coverage** - Each method in isolation, dependencies injected. Every new function has at least 1 test.
+2. **Integration coverage** - Contracts between components verified. Cross-module boundaries exercised.
+3. **Acceptance coverage** - Each requirement has a passing test. Map to acceptance criteria from /spec output.
+4. **Edge cases** - Null/empty/boundary values, invalid types, concurrent access where applicable.
+5. **Error paths** - Every catch/error branch tested. No silent swallowing. Resources cleaned up on failure.
+6. **Regression risk** - High-risk areas (auth, data persistence, payments) require full coverage regardless of change size.
+
+Output: `VERDICT: PASS|WARN|CRITICAL_FAIL` with findings array.
+
+## Gate 2: Non-Functional Testing
+
+Task(subagent_type="analyst"): You are a performance and reliability engineer. Focus on failure modes, not the happy path. Use measurable criteria, not subjective judgments. Evaluate:
+
+1. **Performance** - No N+1 queries, no O(n*m) in hot paths, no blocking calls in async context.
+2. **Scalability** - Will this bottleneck under load? Connection pooling, caching strategy, pagination.
+3. **Reliability** - Retry logic, circuit breakers, graceful degradation. Failure modes tested.
+4. **Complexity** - Cyclomatic complexity <=10. Methods <=60 lines. No deep nesting.
+5. **Maintainability** - Readability, naming clarity, consistency with existing patterns.
+
+Output: `VERDICT: PASS|WARN|CRITICAL_FAIL` with findings array.
+
+## Gate 3: Security Testing
+
+Invoke Skill(skill="security-scan") for CWE pattern detection.
+
+Task(subagent_type="security"): You are a security auditor performing OWASP Top 10 review. Assume every input is malicious. Reference CWE numbers for every finding. Evaluate:
+
+1. **Injection** - Shell (CWE-78), XSS (CWE-79), SQL (CWE-89). No string interpolation in queries.
+2. **Authentication** - Session handling, credential storage, token validation.
+3. **Secrets** - No hardcoded API keys, passwords, tokens in diff. Secrets via environment only.
+4. **Input validation** - All user-facing inputs validated. LLM output treated as untrusted.
+5. **Dependencies** - New packages reviewed for known vulnerabilities. Versions pinned.
+
+Output: `VERDICT: PASS|WARN|CRITICAL_FAIL` with findings array including CWE references.
+
+## Gate 4: DevOps Testing
+
+Task(subagent_type="devops"): You are a build and release engineer. Focus on pipeline safety, reproducibility, and supply chain security. Evaluate:
+
+1. **Pipeline impact** - Do changes affect CI/CD? Are workflow files valid YAML?
+2. **Actions security** - Pinned to SHA? Permissions scoped minimally? No secrets in logs?
+3. **Shell quality** - Input sanitization, exit code handling, error propagation.
+4. **Build reproducibility** - Deterministic builds, locked dependencies, no floating versions.
+5. **Artifact integrity** - Correct upload/download, retention policy, no sensitive data in artifacts.
+
+Output: `VERDICT: PASS|WARN|CRITICAL_FAIL` with findings array.
+
+## Gate 5: Developer Experience (DX)
+
+Task(subagent_type="critic"): You are a developer advocate reviewing from the consumer perspective. Would a new contributor understand this code? Would the API frustrate or delight? Evaluate:
+
+1. **API ergonomics** - Consumer perspective. Are signatures intuitive? Error messages helpful?
+2. **Documentation** - Is changed behavior documented? Are code comments accurate (not stale)?
+3. **Debuggability** - Can a developer diagnose failures from logs alone? Stack traces preserved?
+4. **Onboarding** - Would a new contributor understand this code? Are conventions followed?
+5. **Tooling** - Does this work with existing linters, formatters, IDE support?
+
+Output: `VERDICT: PASS|WARN|CRITICAL_FAIL` with findings array.
+
+## Gate 6: Observability and Monitoring
+
+Task(subagent_type="architect"): You are an SRE reviewing production readiness. If this code fails at 3am, can oncall diagnose it without reading the source? Evaluate:
+
+1. **Logging** - Are meaningful events logged? Structured logging with correlation IDs?
+2. **Metrics** - Are SLIs defined for new features? Latency, error rate, throughput tracked?
+3. **Alerting** - Would failures trigger alerts? Are thresholds appropriate?
+4. **Tracing** - Are distributed traces propagated? Span context preserved across boundaries?
+5. **Health checks** - New services have liveness/readiness probes? Degradation detectable?
+
+Output: `VERDICT: PASS|WARN|CRITICAL_FAIL` with findings array.
+
+## Principles
+
+- **Testability is design feedback**: Hard to test means poor encapsulation, tight coupling, Law of Demeter violation, weak cohesion, or procedural code.
+- **Tests are proof**: A passing test is evidence. A missing test is a gap in knowledge.
+- **Hypothesis-driven debugging**: When a test fails, form a hypothesis before changing code. Verify the hypothesis. Then fix.
+- **Defense in depth**: Assume the happy path works. Focus on failure modes.
+
+## Process
+
+1. Identify what changed (git diff against base branch)
+2. Classify PR type (Step 0). Skip non-applicable gates.
+3. Run applicable gates sequentially. Each gate dispatches its own agent.
+4. If any gate produces CRITICAL_FAIL: continue remaining gates (findings are additive). Mark overall verdict as CRITICAL_FAIL immediately.
+5. For test failures: hypothesis, verify, fix (never change code without understanding why)
+6. Invoke Skill(skill="quality-grades") to synthesize gate verdicts into overall quality score.
+
+## Output
+
+Each gate MUST produce a verdict line and findings array:
+
+```text
+GATE: [name]
+VERDICT: PASS|WARN|CRITICAL_FAIL
+FINDINGS:
+- [SEVERITY] (file:line) description — recommendation
+```
+
+Synthesize into overall report:
+
+| Gate | Verdict | Findings | Evidence |
+|------|---------|----------|----------|
+| Functional | PASS/WARN/CRITICAL_FAIL | Count | file:line citations |
+| Non-Functional | PASS/WARN/CRITICAL_FAIL | Count | file:line citations |
+| Security | PASS/WARN/CRITICAL_FAIL | Count | CWE references |
+| DevOps | PASS/WARN/CRITICAL_FAIL | Count | file:line citations |
+| DX | PASS/WARN/CRITICAL_FAIL | Count | file:line citations |
+| Observability | PASS/WARN/CRITICAL_FAIL | Count | file:line citations |
+
+**Overall verdict**: CRITICAL_FAIL if any gate fails. WARN if any gate warns. PASS if all gates pass.
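Reviewer sketch (not part of the diff): Step 0 and the overall-verdict rule in `test.md` amount to a small lookup plus a reduction, which the following Python illustrates. It is a sketch under assumptions, not the command's implementation. The function names are hypothetical, "Apply per-file rules" for MIXED is read here as the union of each file's gates, and files matching no pattern in the table are assumed to select no gates.

```python
# Hypothetical illustration of test.md Step 0; names and fallbacks are assumptions.
GATES_BY_TYPE = {
    "CODE": {1, 2, 3, 4, 5, 6},
    "WORKFLOW": {1, 3, 4},
    "CONFIG": {3, 4},
    "DOCS": {5},
    "OTHER": set(),  # assumption: unlisted file types trigger no gates
}


def classify_file(path):
    """Map one changed path to a type from the Step 0 table."""
    if path.startswith(".github/workflows/") and path.endswith(".yml"):
        return "WORKFLOW"
    if path.endswith((".py", ".ps1", ".ts", ".js", ".cs")):
        return "CODE"
    if path.endswith((".json", ".yaml")):
        return "CONFIG"
    if path.endswith((".md", ".txt", ".rst")):
        return "DOCS"
    return "OTHER"


def gates_for(paths):
    """Return (pr_type, sorted gate numbers) for a list of changed paths."""
    types = {classify_file(p) for p in paths}
    gates = set()
    for file_type in types:
        gates |= GATES_BY_TYPE[file_type]
    known = types - {"OTHER"}
    if len(known) == 1:
        pr_type = next(iter(known))
    elif known:
        pr_type = "MIXED"
    else:
        pr_type = "OTHER"
    return pr_type, sorted(gates)


def overall_verdict(verdicts):
    """CRITICAL_FAIL if any gate fails, WARN if any warns, else PASS."""
    if "CRITICAL_FAIL" in verdicts:
        return "CRITICAL_FAIL"
    if "WARN" in verdicts:
        return "WARN"
    return "PASS"


if __name__ == "__main__":
    pr_type, gates = gates_for(["README.md", "config/settings.json"])
    print(f"PR TYPE: {pr_type}. Running gates: {gates}.")
```

Under this reading, a docs-only PR runs Gate 5 alone, while a PR touching both docs and JSON config runs Gates 3, 4, and 5.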
diff --git a/.claude/commands/workflow/0-init.md b/.claude/commands/workflow/0-init.md
deleted file mode 100644
--- a/.claude/commands/workflow/0-init.md
+++ /dev/null
@@ -1,48 +1,0 @@
----
-description: Session initialization - enforce ADR-007 memory-first architecture at session start. Loads project context, creates session log, and declares current branch via Invoke-Init.ps1.
-argument-hint: [--session-number N] [--objective "text"]
-allowed-tools:
- - Bash(pwsh .claude/skills/workflow/scripts/*)
- - Bash(pwsh .claude/skills/session-init/scripts/*)
- - Bash(git:*)
- - Bash(ls:*)
- - Read
-model: sonnet
----
-
-# /0-init — Session Initialization
-
-Enforce ADR-007 memory-first architecture at session start.
-
-## Context
-
-Recent sessions: !`ls -1 .agents/sessions/ | tail -5`
-
-Current branch: !`git branch --show-current`
-
-## Invocation
-
-```bash
-pwsh .claude/skills/workflow/scripts/Invoke-Init.ps1 $ARGUMENTS
-```
-
-## What This Command Does
-
-1. **Load project context** — initializes session state via Agent Orchestration MCP (graceful fallback if unavailable)
-2. **Load initial instructions** — read AGENTS.md for current project rules
-3. **Read HANDOFF.md** — load prior session context (read-only)
-4. **Surface prior context** — retrieves relevant session history via Agent Orchestration MCP (graceful fallback if unavailable)
-5. **Create session log** — via `New-SessionLog.ps1`
-6. **Declare current branch** — output git branch for orientation
-7. **Record evidence** — persist session state (graceful fallback if unavailable)
-
-## Arguments
-
-- `--session-number N`: Optional. Auto-detected from `.agents/sessions/`.
-- `--objective "text"`: Optional. Derived from branch name if omitted.
-
-## Related
-
-- Protocol: `.agents/SESSION-PROTOCOL.md`
-- ADR-007: `.agents/architecture/ADR-007-memory-first-architecture.md`
-- Session Init Skill: `.claude/skills/session-init/SKILL.md`
\ No newline at end of file
diff --git a/.claude/commands/workflow/1-plan.md b/.claude/commands/workflow/1-plan.md
deleted file mode 100644
--- a/.claude/commands/workflow/1-plan.md
+++ /dev/null
@@ -1,54 +1,0 @@
----
-description: Planning phase - route task to planner (default), architect (--arch), or roadmap→high-level-advisor chain (--strategic).
-argument-hint: [--arch] [--strategic] <task-description>
-allowed-tools:
- - Bash(pwsh .claude/skills/workflow/scripts/*)
- - Bash(git:*)
- - Bash(ls:*)
- - Read
- - mcp__agent_orchestration__invoke_agent
- - mcp__agent_orchestration__track_handoff
- - mcp__agent_orchestration__get_routing_recommendation
-model: sonnet
----
-
-# /1-plan — Planning Phase
-
-Route a planning task to the appropriate agent.
-
-## Context
-
-Current branch: !`git branch --show-current`
-
-Recent commits: !`git log --oneline -5`
-
-Planning artifacts: !`ls -1 .agents/planning/ 2>/dev/null | tail -10`
-
-## Invocation
-
-```bash
-pwsh .claude/skills/workflow/scripts/Invoke-Plan.ps1 $ARGUMENTS
-```
-
-## Variants
-
-| Flag | Agent | Use When |
-|------|-------|----------|
-| *(none)* | `planner` | Standard feature/task planning |
-| `--arch` | `architect` | Design decisions, ADR-worthy choices |
-| `--strategic` | `roadmap → high-level-advisor` | Roadmap, epics, strategic alignment |
-
-## Arguments
-
-- `--arch`: Use architect agent instead of planner.
-- `--strategic`: Chain roadmap agent → high-level-advisor.
-- Remaining text: Task description passed to agent.
-
-## Output
-
-Planning artifacts stored in `.agents/planning/`.
-
-## Related
-
-- ADR-013: `.agents/architecture/ADR-013-agent-orchestration-mcp.md`
-- Agent Orchestration Spec: `.agents/specs/agent-orchestration-mcp-spec.md`
\ No newline at end of file
diff --git a/.claude/commands/workflow/2-impl.md b/.claude/commands/workflow/2-impl.md
deleted file mode 100644
--- a/.claude/commands/workflow/2-impl.md
+++ /dev/null
@@ -1,49 +1,0 @@
----
-description: Implementation phase - invoke implementer agent (default), or run full sequential chain (--full), or parallel execution of implementer+qa+security (--parallel).
-argument-hint: [--full] [--parallel] <implementation-task>
-allowed-tools:
- - Bash(pwsh .claude/skills/workflow/scripts/*)
- - Bash(git:*)
- - Bash(ls:*)
- - Read
- - mcp__agent_orchestration__invoke_agent
- - mcp__agent_orchestration__track_handoff
- - mcp__agent_orchestration__start_parallel_execution
- - mcp__agent_orchestration__aggregate_parallel_results
- - mcp__agent_orchestration__resolve_conflict
-model: sonnet
----
-
-# /2-impl — Implementation Phase
-
-Invoke the implementer agent, optionally chaining QA and security.
-
-## Context
-
-Planning artifacts: !`ls -1 .agents/planning/ 2>/dev/null | tail -10`
-
-Current branch: !`git branch --show-current`
-
-## Invocation
-
-```bash
-pwsh .claude/skills/workflow/scripts/Invoke-Impl.ps1 $ARGUMENTS
-```
-
-## Execution Modes
-
-| Flag | Mode | Description |
-|------|------|-------------|
-| *(none)* | Default | Implementer agent only |
-| `--full` | Sequential | implementer → qa → security |
-| `--parallel` | Parallel | implementer + parallel(qa, security) |
-
-## Arguments
-
-- `--full`: Run full sequential chain after implementation.
-- `--parallel`: Run QA and security in parallel after implementation.
-- Remaining text: Implementation task description.
-
-## Related
-
-- ADR-013: `.agents/architecture/ADR-013-agent-orchestration-mcp.md`
\ No newline at end of file
diff --git a/.claude/commands/workflow/3-qa.md b/.claude/commands/workflow/3-qa.md
deleted file mode 100644
--- a/.claude/commands/workflow/3-qa.md
+++ /dev/null
@@ -1,45 +1,0 @@
----
-description: Quality assurance - invoke QA agent, validate test coverage, check acceptance criteria, and report results.
-argument-hint: [--coverage-threshold N] <verification-scope>
-allowed-tools:
- - Bash(pwsh .claude/skills/workflow/scripts/*)
- - Bash(git:*)
- - Bash(ls:*)
- - Read
- - mcp__agent_orchestration__invoke_agent
- - mcp__agent_orchestration__track_handoff
-model: sonnet
----
-
-# /3-qa — Quality Assurance
-
-Invoke the QA agent and validate implementation quality.
-
-## Context
-
-Implementation artifacts: !`ls -1 .agents/sessions/ | tail -3`
-
-Current branch: !`git branch --show-current`
-
-## Invocation
-
-```bash
-pwsh .claude/skills/workflow/scripts/Invoke-QA.ps1 $ARGUMENTS
-```
-
-## What This Command Does
-
-1. Invoke `qa` agent via Agent Orchestration MCP
-2. Validate test coverage against threshold (default: 80%)
-3. Check acceptance criteria from planning artifacts
-4. Report pass/fail with details
-5. Track handoff back to orchestrator
-
-## Arguments
-
-- `--coverage-threshold N`: Minimum coverage percentage (default: 80).
-- Remaining text: Verification scope.
-
-## Related
-
-- ADR-006: `.agents/architecture/ADR-006-thin-workflows-testable-modules.md`
\ No newline at end of file
diff --git a/.claude/commands/workflow/4-security.md b/.claude/commands/workflow/4-security.md
deleted file mode 100644
--- a/.claude/commands/workflow/4-security.md
+++ /dev/null
@@ -1,46 +1,0 @@
----
-description: Security review - invoke security agent with OWASP Top 10 check, secret detection, and dependency audit.
-argument-hint: [--owasp-only] [--secrets-only] <security-scope>
-allowed-tools:
- - Bash(pwsh .claude/skills/workflow/scripts/*)
- - Bash(git:*)
- - Bash(ls:*)
- - Read
- - mcp__agent_orchestration__invoke_agent
- - mcp__agent_orchestration__track_handoff
-model: opus
----
-
-# /4-security — Security Review
-
-Comprehensive security assessment using the security agent.
-
-## Context
-
-Implementation artifacts: !`ls -1 .agents/sessions/ | tail -3`
-
-Current branch: !`git branch --show-current`
-
-## Invocation
-
-```bash
-pwsh .claude/skills/workflow/scripts/Invoke-Security.ps1 $ARGUMENTS
-```
-
-## What This Command Does
-
-1. Invoke `security` agent via Agent Orchestration MCP (model: opus per ADR-013)
-2. OWASP Top 10 check (skipped with `--secrets-only`)
-3. Secret detection scan (skipped with `--owasp-only`)
-4. Dependency audit for known vulnerabilities
-5. Generate security report with findings
-
-## Arguments
-
-- `--owasp-only`: Run only OWASP Top 10 check.
-- `--secrets-only`: Run only secret detection.
-- Remaining text: Security scope.
-
-## Related
-
-- ADR-013: `.agents/architecture/ADR-013-agent-orchestration-mcp.md`
\ No newline at end of file
diff --git a/.claude/commands/workflow/9-sync.md b/.claude/commands/workflow/9-sync.md
deleted file mode 100644
--- a/.claude/commands/workflow/9-sync.md
+++ /dev/null
@@ -1,116 +1,0 @@
----
-description: Auto-generate session documentation. Queries session history, generates workflow diagrams, updates session logs, and syncs memory. Use at the end of any workflow to capture what happened.
-model: sonnet
-argument-hint: [--dry-run]
-allowed-tools:
- - Bash(python .claude/skills/workflow/scripts/*)
- - Bash(git:*)
- - mcp__serena__*
- - mcp__forgetful__*
----
-
-# /9-sync — Auto-Documentation & Memory Sync
-
-Generate comprehensive session documentation automatically.
-
-## Overview
-
-This command closes the workflow loop by documenting what happened during a session. It:
-
-1. Collects session history (agents invoked, tools used, files changed)
-2. Generates a workflow sequence diagram (Mermaid)
-3. Extracts key decisions and artifacts
-4. Appends documentation to the session log
-5. Syncs context to Serena memory for cross-session persistence
-6. Suggests retrospective learnings
-
-## Execution Steps
-
-### Step 1: Gather Session Context
-
-Collect the current session state:
-
-```bash
-# Get current branch and recent commits
-git log --oneline -20 --since="$(date -d '8 hours ago' --iso-8601)" 2>/dev/null || git log --oneline -20
-
-# Get files changed in this session
-git diff --stat HEAD~10..HEAD 2>/dev/null || git diff --stat main..HEAD
-
-# Get current session log if it exists
-ls -t .agents/sessions/*.json 2>/dev/null | head -1
-```
-
-### Step 2: Generate Session Documentation
-
-Run the sync script to produce the session documentation:
-
-```bash
-python .claude/skills/workflow/scripts/sync_session_documentation.py $ARGUMENTS
-```
-
-This script will:
-
-- Scan git history for session commits
-- Identify agents referenced in commit messages
-- Generate a Mermaid sequence diagram
-- Produce a structured session summary
-
-### Step 3: Extract Decisions and Artifacts
-
-From the session context, identify:
-
-- **Decisions made**: ADRs created/modified, design choices documented
-- **Artifacts created**: New files, modified scripts, PRs opened
-- **Issues referenced**: GitHub issues addressed or discovered
-- **Risks identified**: Any blockers or concerns raised
-
-### Step 4: Update Session Log
-
-Append the sync output to the current session log in `.agents/sessions/`. The entry MUST include:
-
-| Field | Description |
-|-------|-------------|
... diff truncated: showing 800 of 919 lines
Pull Request
Summary
Replace legacy numbered workflow commands with 6 lifecycle slash commands following the Osmani agent-skills pattern. Commands are stack-agnostic, platform-agnostic, and host-agnostic.
Added:
- .claude/commands/spec.md - Define what to build
- .claude/commands/plan.md - Plan how to build
- .claude/commands/build.md - Build incrementally
- .claude/commands/test.md - Multi-dimensional quality validation
- .claude/commands/review.md - Five-axis code review
- .claude/commands/ship.md - Pre-flight validation and PR creation

Deleted:
- .claude/commands/workflow/0-init.md
- .claude/commands/workflow/1-plan.md
- .claude/commands/workflow/2-impl.md
- .claude/commands/workflow/3-qa.md
- .claude/commands/workflow/4-security.md
- .claude/commands/workflow/9-sync.md

Modified:
- AGENTS.md - Lifecycle reference replaces workflow reference
- CLAUDE.md - Separate skill routing from lifecycle commands
- .markdownlint-cli2.yaml - Exclude lifecycle commands from linting
- .claude/skills/workflow/SKILL.md - Marked DEPRECATED
- .claude/skills/prompt-engineer/SKILL.md - Model bumped to claude-sonnet-4-6

Specification References
Changes
Type of Change
Testing
Agent Review
Security Review
Other Agent Reviews
Checklist
Related Issues
Closes #1609