Spec-driven development is a methodology that treats specifications as executable contracts from which AI agents derive code, preventing architectural drift through automated enforcement rather than passive documentation.
TL;DR
Spec-driven development (SDD) turns specifications from passive documentation into executable contracts that constrain what AI agents generate. A good spec defines six elements: outcomes, scope boundaries, constraints, prior decisions, task breakdown, and verification criteria. SDD catches architectural violations and API contract drift that unit tests structurally cannot, and pairs with multi-agent patterns (Coordinator, Implementor, Verifier) to scale across parallel work.
Most teams I've worked with discover SDD reactively: AI-generated code passes unit tests but violates architectural patterns, breaks API integration contracts, or introduces security anti-patterns that surface only in production. The arXiv paper "Spec-Driven Development: From Code to Contract in the Age of AI" (Feb 2026) frames the core distinction: traditional specs are read by humans, while SDD specs execute as validation gates.
See how Intent turns executable specs into enforced architectural contracts, with semantic dependency mapping across 400,000+ files.
Free tier available · VS Code extension · Takes 2 minutes
Why Spec-Driven Development Matters Now
Three forces converged in 2025-2026 that make SDD the workflow I default to when AI-generated code needs to survive in production.
AI code generation works at scale, and so do its vulnerabilities. LLMs generate vulnerable code at rates ranging from 9.8% to 42.1% across benchmarks (Yan et al., 2025), and surviving AI-introduced issues in production repositories had topped 110,000 by February 2026. SDD embeds executable specifications as active validation gates against exactly these failures.
Compliance requirements now treat specifications as evidence. The EU AI Act requires high-risk AI systems to comply with obligations starting August 2, 2026, with fines of up to €15 million or 3% of global annual turnover for non-compliance with high-risk obligations.
Distributed architectures demand formal governance. Deloitte's State of AI 2026 reports that only one in five companies has a mature governance model for autonomous AI agents. Without structured specifications governing cross-service coordination, I've watched teams hit compounding integration failures as their multi-repository architectures scale.
The Data-Backed Case: Why AI-Generated Code Needs Specification Gates
A SonarQube analysis of five LLMs generating Java code (arXiv, Aug 2025) found that over 70% of Llama 3.2 90B's detected vulnerabilities were rated BLOCKER severity, and roughly two-thirds of GPT-4o's and OpenCoder-8B's were rated BLOCKER or CRITICAL. The pattern repeats across the literature.
Pearce et al. (IEEE S&P, 2023) found roughly 40% of programs generated in security-sensitive contexts contained vulnerabilities. Yan et al. (2025) put the range at 9.8% to 42.1% across their benchmarks. A catalog by Fu et al. (ACM TOSEM, 2025) identified 43 CWEs across three AI code-generation tools. By February 2026, a large-scale empirical study (arXiv, 2026) had counted more than 110,000 surviving AI-introduced issues in production repositories.
These findings match what I've seen in practice. Unit tests verify individual functions; they don't catch architectural violations, API contract drift, or security anti-patterns that emerge across service boundaries. SDD specifications operate at the system level, catching defect classes unit tests structurally cannot.
How SDD Differs from PRDs, Design Docs, TDD, and BDD
An SDD spec isn't a PRD or a design doc with a new label. The distinction I keep coming back to: a PRD or design doc is written for human readers who can interpret ambiguity and fill gaps from organizational context. AI agents fill gaps too, but not in the way you'd want. Without explicit scope, agents make assumptions and head in the wrong direction fast.
| Artifact | Primary Reader | How Ambiguity Is Resolved | Update Cadence |
|---|---|---|---|
| PRD | Product and engineering humans | Conversation, tribal knowledge | Infrequent, often stale |
| Design Doc | Engineering peers | Shared context, review comments | Point-in-time artifact |
| SDD Spec | AI agent + CI pipeline | Explicit constraints + verification rules | Living document, updated as work progresses |
SDD also operates at a different architectural layer than the code-level methodologies I work with day-to-day. These distinctions matter because they let me integrate SDD alongside TDD and BDD rather than replace either.
| Dimension | TDD | BDD | Vibe Coding | SDD |
|---|---|---|---|---|
| Primary artifact | Unit tests | Given-When-Then scenarios | Natural language prompts | Executable specifications |
| Scope | Individual function correctness | Cross-functional behavior | Full application generation | System-wide architectural contracts |
| Validation mechanism | Automated test suites | Human-referenced documentation | Manual review (if any) | Build fails on spec divergence |
| AI governance | None built-in | None built-in | None built-in | Constitutional constraints and checkpoints |
| Where truth lives | Test suite | Workshop artifacts | Prompt history | Versioned specification |
TDD drives interface design through red-green-refactor cycles at the unit level. I keep TDD for implementation verification and layer SDD on top for architectural constraints.
BDD creates Given-When-Then scenarios through cross-functional workshops. SDD can incorporate these scenarios, but with executability: BDD scenarios often exist as documentation that teams reference, while SDD transforms them into executable validation gates.
Vibe coding uses AI models to build applications from natural language prompts with minimal structured review. The MSR '26 study (arXiv, Nov 2025) of Cursor AI adoption across 807 GitHub repositories found transient velocity gains alongside persistent code complexity increases. SDD defines constraints up front to prevent that drift.
The Six Elements of a Good Spec
A spec for an AI agent needs to answer six questions. Leave any of them open and the agent will answer them for you, in ways you won't like.
1. Outcomes that define done. Not "build an auth flow." Something closer to: "A user can sign up with email/password, receive a verification email, and log in without error. The session persists across page refreshes." Outcome statements force clarity that feature names don't.
2. In-scope and explicitly out-of-scope boundaries. The out-of-scope list matters at least as much as the in-scope list. Agents expand scope if you don't close the door on it. "OAuth is out of scope for this task" is not obvious to an agent that has learned that auth systems usually include OAuth.
3. Constraints and assumptions. Existing tech stack decisions, third-party API limits, performance requirements. If it affects implementation choices and isn't obvious from the codebase alone, it belongs in the spec. Pairing specs with an AGENTS.md file gives the agent persistent project context alongside task-specific scope.
4. Decisions already made. If you've chosen the database schema or the encryption library, say so. Agents that don't know a decision has been made will make their own. Document your decisions before delegating the work.
5. Task breakdown. One of the biggest AI failure modes is asking for too much in one shot. A breakdown into discrete sub-tasks lets individual agents work on each one, verify as they go, and operate in parallel when they're not touching the same files.
6. Verification criteria. Acceptance criteria and verification steps. Not "does it work" but: what tests pass and what edge cases are handled. This is what the verifier uses. If you're running an adversarial agent pattern (below), the verification plan is what it checks against.
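Here's a minimal sketch of the six elements as a typed structure, using the auth example from element 1. The field names are illustrative assumptions, not the schema of any particular SDD tool:

```typescript
// A minimal sketch of the six elements as a typed structure.
// Field names are illustrative, not taken from any particular SDD tool.
interface SubTask {
  id: string;
  description: string;
  files: string[]; // declared file footprint enables safe parallel agents
  acceptance: string[];
}

interface TaskSpec {
  outcomes: string[]; // observable results that define "done"
  inScope: string[];
  outOfScope: string[]; // closes the door on scope expansion
  constraints: string[]; // stack decisions, API limits, performance targets
  decisionsMade: string[]; // choices the agent must not revisit
  tasks: SubTask[]; // discrete units agents can verify and parallelize
  verification: string[]; // what the Verifier checks against
}

// The auth example from element 1, filled in as a spec object.
const authSpec: TaskSpec = {
  outcomes: [
    "A user can sign up with email/password, receive a verification email, and log in without error.",
    "The session persists across page refreshes.",
  ],
  inScope: ["email/password signup", "email verification", "session persistence"],
  outOfScope: ["OAuth", "SSO", "password reset"], // explicit, so the agent doesn't add them
  constraints: ["use the existing Postgres schema", "p95 login latency under 300ms"],
  decisionsMade: ["argon2id for password hashing", "httpOnly session cookies"],
  tasks: [
    {
      id: "auth-1",
      description: "signup endpoint",
      files: ["src/routes/signup.ts"],
      acceptance: ["duplicate email returns 409"],
    },
  ],
  verification: ["all acceptance criteria pass", "no files modified outside declared footprints"],
};
```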
Core SDD Patterns: Spec-First, Spec-Anchored, and Spec-as-Source
When I'm adopting SDD with a team, I pick one of three patterns based on context. Each represents a different level of specification authority over code generation.
| Pattern | Specification Role | Code Role | Best For |
|---|---|---|---|
| Spec-First | Guides and constrains AI output | Primary deliverable | Teams beginning SDD adoption |
| Spec-Anchored | Governs with checkpoints and constitutional constraints | Validated deliverable | Enterprise teams needing audit trails |
| Spec-as-Source | Literal source code | Generated artifact | API-first domains with mature tooling |
Spec-first development is where I start most teams. Specs come before code and constrain what AI agents generate, while code remains the primary deliverable.
Spec-anchored development adds governance layers, constitutional constraints, and supervision checkpoints. I reach for this pattern when regulatory requirements demand audit trails, multiple teams coordinate across services, or AI-generated code needs human approval before merging. The follow-on Constitutional SDD paper (arXiv, Feb 2026) formalizes it, embedding security constraints with explicit CWE vulnerability mappings.
Spec-as-source development is the furthest end of the spectrum: specs literally become source code. The ThoughtWorks Technology Radar (Volume 33, 2025) places SDD in the "Assess" ring and warns of "a bias toward heavy up-front specification and big-bang releases" as an antipattern.
The Adversarial Agent Pattern
The most underused pattern in spec-driven development is assigning a separate agent to check the work rather than trusting the implementing agent to self-verify.
The structure: a Coordinator breaks down the spec and delegates tasks to Implementor sub-agents. Each Implementor works from its own sub-spec. A Verifier agent then checks the output against the spec before marking the work complete. The Implementors and the Verifier have opposing goals. One is optimizing for completing the task, the other for finding failures.
Implementing agents are optimistic about their own output. A separate Verifier has a cleaner signal. The pattern forces the spec to contain explicit verification criteria, which improves the spec itself. It also makes parallel agent workflows safer: multiple Implementors can run simultaneously while the Verifier catches conflicts before they merge.
Sub-agents update the spec in real time as they progress, so the Coordinator always has a current picture of where things stand.
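A minimal sketch of that loop in TypeScript, where `runAgent` is a hypothetical stand-in for whatever agent runtime you actually use:

```typescript
// A sketch of the Coordinator/Implementor/Verifier loop. `runAgent` is a
// hypothetical stand-in for whatever agent runtime you actually use.
interface SubSpec {
  id: string;
  description: string;
  acceptance: string[];
}

async function runAgent(role: "implementor" | "verifier", prompt: string): Promise<string> {
  throw new Error("wire this up to your agent platform"); // placeholder
}

async function executeSpec(tasks: SubSpec[], verification: string[]): Promise<string[]> {
  // Implementors run in parallel; safe only when their file footprints don't overlap.
  const outputs = await Promise.all(
    tasks.map((t) => runAgent("implementor", `Implement exactly this sub-spec:\n${JSON.stringify(t)}`)),
  );

  // The Verifier optimizes for the opposite goal: finding failures against the spec.
  const verdict = await runAgent(
    "verifier",
    `Check these outputs against the criteria ${JSON.stringify(verification)}:\n${outputs.join("\n---\n")}`,
  );
  if (verdict.startsWith("FAIL")) {
    throw new Error(`Verifier rejected the work: ${verdict}`);
  }
  return outputs;
}
```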
See how Intent coordinates Implementor and Verifier agents through shared living specs, across 400,000+ files.
Free tier available · VS Code extension · Takes 2 minutes
When to Use a Spec, and When Not To
Most spec-driven development guides skip this part, which makes them read like a pitch. The caveat: not every task needs a detailed spec. Spec overhead is a real cost, and I've paid it on small fixes where one prompt to one agent would have been faster.
| Write a Spec When | Skip the Spec When |
|---|---|
| Work spans multiple agent sessions | Work is exploratory or experimental |
| Multiple services or repositories are involved | A single prompt can produce usable output |
| Reversing a wrong interpretation is expensive | Output can be reviewed in under five minutes |
| Compliance or audit trail is required | Prototype is meant to be thrown away |
| Review will require real attention (component logic, end-to-end flows) | Change is mechanical or low-risk |
The trigger I use: if I'd be annoyed to have the agent interpret requirements differently than I meant, I write the spec. If I could fix the output in a quick follow-up prompt, I skip the spec and prompt directly.
How Spec Kit Enforces Specifications
GitHub Spec Kit is the open-source scaffolding I recommend for teams getting started. It's a Python CLI with 88k stars and 129 releases through April 2026, supporting 28 named AI agent platforms. The workflow runs through four commands: /speckit.specify captures business context and success criteria, /speckit.plan translates specs into architectural decisions, /speckit.tasks decomposes plans into testable units, and /speckit.implement runs AI agents under those constraints.
The payoff is what InfoQ analysis emphasizes: "With AI-generated code, a code issue is an outcome of a gap in the specification. Because of non-determinism in AI generation, that gap keeps resurfacing in different forms whenever the code is regenerated." Here's what that looks like concretely.
Before SDD (without spec): A payment endpoint ships without an idempotency constraint. Retry logic creates duplicate charges in production. The team patches the code, but the next AI regeneration cycle reintroduces the same vulnerability because no specification encodes the constraint.
After SDD (with spec):
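One way to encode that gate, as a minimal sketch: a Node/TypeScript CI script that parses the OpenAPI document and fails the build when the charges endpoint lacks an idempotency requirement. The file path, endpoint, and header name are assumptions for illustration:

```typescript
// A minimal sketch of a CI spec gate: fail the build when the OpenAPI
// document's POST /charges operation has no required Idempotency-Key
// header. File path, endpoint, and header name are illustrative.
import { readFileSync } from "node:fs";

const api = JSON.parse(readFileSync("openapi.json", "utf8"));
const chargesPost = api.paths?.["/charges"]?.post;

const hasIdempotencyKey = (chargesPost?.parameters ?? []).some(
  (p: { name?: string; in?: string; required?: boolean }) =>
    p.in === "header" && p.name?.toLowerCase() === "idempotency-key" && p.required === true,
);

if (!hasIdempotencyKey) {
  console.error("Spec gate failed: POST /charges must require an Idempotency-Key header.");
  process.exit(1); // non-zero exit fails the CI job before code reaches review
}
```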
The build fails before code reaches review whenever any AI agent generates a charges endpoint without idempotency enforcement.
SDD Tooling Comparison
The spec-driven development tooling I evaluate with teams spans open-source frameworks, API specification platforms, and enterprise-grade control planes.
| Tool | Spec Formats | CI/CD Enforcement | AI Agent Compatible | Best For |
|---|---|---|---|---|
| GitHub Spec Kit | Markdown/structured | Via agent workflows | 28 platforms | Teams adopting SDD workflows with AI agents |
| SwaggerHub / API Hub | OpenAPI, AsyncAPI | CLI + Git integration | MCP Server | API-first teams needing lifecycle management |
| Postman Spec Hub | OpenAPI, multi-protocol | GitHub sync, CI runner | MCP servers; Claude plugin | Full API lifecycle with governance |
| Spectral | OpenAPI, AsyncAPI, JSON Schema | CLI exit codes | Indirect | API linting and standards enforcement |
| PactFlow | Pact + OpenAPI | can-i-deploy gating | Partial | Contract testing across service boundaries |
| Specmatic | OpenAPI (executable) | Yes | Agent-ready | Executable API contract enforcement |
| TypeSpec | TypeSpec → OpenAPI | Via downstream toolchain | Yes (generates OpenAPI) | Azure/Microsoft ecosystem teams |
InfoQ notes the limitation I hit most often with enterprise teams: current tools "typically keep specs co-located with code in a single repository," while "modern architectures span microservices, shared libraries and infrastructure repositories."
Intent's Context Engine addresses this gap by processing 400,000+ files through semantic dependency graph analysis, making multi-repository coordination viable at enterprise scale. Intent carries SOC 2 Type II and ISO/IEC 42001 certifications, the first AI coding assistant with ISO/IEC 42001 for AI-specific governance.
What This Looks Like in Practice
The six-element framework applied to a real project: a 10-page product site, fully comped out in Figma with a design system covering navigation, landing pages, feature pages, pricing, and docs layout. The kind of project that takes a couple of days for an experienced frontend engineer, longer if you're learning the component library.
Step 1: Figma MCP Pulls the Design System Into Context
With the Figma MCP connected to Intent, the Coordinator agent reads the Figma file before writing a single line of the spec. It pulls component specs (button variants, card layouts, typography scale), layout constraints (grid structure, spacing tokens, breakpoints), and the page hierarchy. The Coordinator writes that design context into the spec, so the agents implementing each page work from the same component library and spacing rules the designer defined.
Without structured design context, agents invent their own component patterns. One agent builds a card with 16px padding; another uses 24px. One agent creates a button variant that doesn't exist in the design system. Connecting the Figma MCP eliminates that divergence before code generation starts.

The Coordinator extracts brand colors from the Figma design system and writes them into the spec alongside acceptance criteria, non-goals, and assumptions.
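As an illustration, the design-context fragment the Coordinator writes into the spec might look like the following sketch. All token names and values here are invented:

```typescript
// An invented illustration of the design-context fragment the Coordinator
// might write into the spec from the Figma MCP. All names and values are
// made up for the sketch.
const designContext = {
  colors: { primary: "#1A56DB", surface: "#F9FAFB", text: "#111827" },
  spacing: { cardPadding: 24, sectionGap: 64 }, // px, from Figma spacing tokens
  breakpoints: { sm: 640, md: 768, lg: 1024 }, // match the Figma frames
  components: ["Button/primary", "Button/ghost", "Card/feature", "Nav/top"],
} as const;
```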
Step 2: The Coordinator Decomposes by Page
The Coordinator generated a spec with page-level task assignments: one task per page, each with its own acceptance criteria, component requirements, and layout constraints pulled from the Figma context. Page-level decomposition works for parallel execution here because pages in a static site don't write to the same files. The Coordinator assigned shared components (navigation, footer) to a dedicated task that ran first, so page-level agents could import from a stable component library rather than each building their own.
The spec included constraints: responsive breakpoints matching the Figma frames, design token values for colors and spacing, and which shared components each page needed to import. Each Implementor agent received a self-contained task contract with enough context to build its page without needing to read the other agents' work.
Step 3: Parallel Agents Execute in Isolated Worktrees
Sub-agents handled pages in parallel, each in its own git worktree. The navigation and footer agent finished first. Page agents picked up the shared components and built against them. The Verifier checked each page against its spec criteria: correct components used, design tokens applied, responsive behavior matching the Figma breakpoints.
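A minimal sketch of the worktree setup, shelling out to git from Node; branch and directory names are illustrative:

```typescript
// A sketch of per-agent worktree isolation via Node's child_process;
// branch and directory names are illustrative.
import { execFileSync } from "node:child_process";

const pageTasks = ["landing", "pricing", "docs"];

for (const page of pageTasks) {
  const dir = `../worktrees/${page}`;
  // Each agent gets a private working copy on its own branch, so parallel
  // edits never collide in a shared checkout.
  execFileSync("git", ["worktree", "add", "-b", `page/${page}`, dir, "main"]);
}
// Once the Verifier passes a page, its branch merges back and the worktree
// is cleaned up with `git worktree remove <dir>`.
```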
The project hit 95% completion in about 45 minutes across a few sessions. The remaining 5% was fine-detail work: spacing tweaks and hover state refinements.

Three landing page tasks completed in parallel, with four more queued. The Coordinator confirms page tasks can run simultaneously since they write to separate files, with the Verifier running after all pages are built.
The Designer Handoff
The designer on the project had never used Git. In a standard workflow, that 5% of fine-tuning would have gone back to the engineer as a list of revision requests, each requiring a context switch between Figma and code. Instead, the designer opened the project in Intent and iterated on the details without engineering support. Adjusting a spacing value or swapping a color token didn't require understanding the codebase, because the spec-driven structure had organized the project into discrete, navigable components tied to specific Figma frames.
That handoff wouldn't work with prompt-driven output. When agents produce monolithic code from a vague prompt, only the person who wrote the prompt can make sense of the result. When agents build from a structured spec with page-level decomposition, the project is navigable by anyone who can read the spec.
Why Context Precision Matters More Than Context Volume
The other takeaway from this workflow: the less irrelevant context each agent carried, the better the output. A million tokens of codebase context is not the advantage it appears to be. Agents perform better with precise, task-relevant context than with broad exposure to the full repository. The Figma MCP + spec combination works because it narrows each agent's context to the design constraints and acceptance criteria for its specific task, rather than flooding it with the entire codebase.
Model Tiering Inside Spec-Driven Workflows
The Coordinator/Implementor/Verifier pattern lets you assign different models to different roles based on what each role needs.
Writing the spec deserves the most capable model available. Errors in the spec propagate through everything downstream, so underinvesting here is the most expensive mistake you can make.
Implementing works well with a mid-range model running moderate thinking (Sonnet-class, GPT-5.1-Codex). You don't need the most expensive model for execution once the spec is solid.
Verifying needs a fast model. The Verifier checks specific criteria against specific output: it needs accuracy and low cost, not deep reasoning.
Multi-agent workflows run more model calls than single-agent prompting. Spending top-model allocation on every sub-task multiplies cost without proportional benefit. Getting the allocation backward (cheap model on the spec, expensive model on implementation) costs more in correction loops than the tiering saves.
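A sketch of what that tiering might look like as configuration; the model identifiers are placeholders for whatever top-tier, mid-range, and fast models your platform exposes, not real API names:

```typescript
// A sketch of role-based model tiering; identifiers are placeholders,
// not real model or API names.
const modelTiers = {
  coordinator: { model: "top-tier-reasoning-model", why: "spec errors propagate downstream" },
  implementor: { model: "mid-range-model", why: "execution is routine once the spec is solid" },
  verifier: { model: "fast-low-cost-model", why: "checks specific criteria; accuracy over depth" },
} as const;

type AgentRole = keyof typeof modelTiers;

function modelFor(role: AgentRole): string {
  return modelTiers[role].model;
}
```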
Brownfield Adoption: Applying SDD to Existing Codebases
Brownfield SDD is categorically different from greenfield, and I approach it in three phases. The foundational SDD paper (arXiv, Feb 2026) articulates why: "By extracting specs from legacy code, teams can verify that modernization efforts preserve required functionality while eliminating undocumented behaviors."
Phase 1: Reconstruct existing behavior before writing new specs. AI-assisted reverse engineering works best starting from visible artifacts (UI elements, binaries, data lineage), enriching them incrementally, and maintaining traceability back to source.
Phase 2: Spec the area of change, not the whole system. Trying to retroactively spec entire systems is impractical. The InfoQ enterprise adoption analysis is explicit: "the spec needs to be most granular near the area of change." Each bug fix, feature addition, or refactoring becomes an opportunity to add specifications for the code being touched.
Phase 3: Enforce specs in CI incrementally. Preventing drift from accumulating is more practical than periodically reconciling diverged specifications.
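A minimal sketch of what incremental enforcement can mean in practice: gate only the files touched in the current change, so specs accumulate near the area of change rather than blocking the whole repository at once. The specced-path prefix (src/payments/) is invented for the example:

```typescript
// A sketch of incremental spec enforcement: gate only files touched in
// the current change. The specced-path prefix (src/payments/) is invented.
import { execFileSync } from "node:child_process";

const changed = execFileSync("git", ["diff", "--name-only", "origin/main...HEAD"], {
  encoding: "utf8",
})
  .split("\n")
  .filter(Boolean);

// Only paths that already have a spec are gated; everything else passes
// until a spec gets added the next time that code is touched.
const specced = changed.filter((f) => f.startsWith("src/payments/"));
if (specced.length > 0) {
  console.log(`Running spec gates for: ${specced.join(", ")}`);
  // ...invoke spec validation for just these files here
}
```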
As InfoQ acknowledges: "SDD does not remove complexity; it simply relocates it." Specifications inherit all the properties of source code: technical debt, cross-team coupling, and architectural gravity. Intent supports enterprise-scale SDD adoption by processing 400,000+ files through architectural analysis.
Limitations of Spec-Driven Development
SDD isn't right for every context. Here's where I've seen it struggle or not earn its overhead:
- Exploratory work: SDD struggles when requirements can't be known upfront. R&D work and experimentation benefit from lighter approaches.
- Rapid prototyping: When the timeline to first user feedback is measured in days, SDD's upfront specification requirements create expensive regeneration cycles.
- Small teams and high-change environments: For teams of 2-5 developers, specification overhead can consume a disproportionate amount of development time.
- Legacy systems requiring extensive documentation: Creating specifications accurate enough for AI generation requires reverse-engineering years of implicit business logic. A known limitation in Spec Kit (GitHub issue #1191) is that the workflow is optimized for net-new feature creation, making it difficult to update existing specifications.
Start Enforcing Specs Before Your Next AI-Generated Deployment
Spec-driven development shifts specifications from passive documentation to executable build gates that enforce architectural contracts across every code generation cycle. LLMs optimize narrowly for functional correctness. Enterprise systems need architectural consistency and regulatory compliance on top of that, and SDD closes the gap.
Start with a Spec-First pattern on a single service with an existing OpenAPI contract, integrate GitHub Spec Kit into your CI/CD pipeline, and expand to Spec-Anchored governance as multi-team coordination grows. For teams managing multi-repository architectures, Intent's Context Engine processes 400,000+ files through semantic dependency graph analysis, backed by SOC 2 Type II and ISO/IEC 42001 governance.
See how Intent's living specs keep parallel agents aligned as your plan evolves.
Free tier available · VS Code extension · Takes 2 minutes
Written by

Molisha Shah
Molisha is an early GTM and Customer Champion at Augment Code, where she focuses on helping developers understand and adopt modern AI coding practices. She writes about clean code principles, agentic development environments, and how teams are restructuring their workflows around AI agents. She holds a degree in Business and Cognitive Science from UC Berkeley.
