Spec-driven development is a methodology that treats specifications as executable contracts from which AI agents derive code, preventing architectural drift through automated enforcement rather than passive documentation.
TL;DR
Spec-driven development (SDD) turns specifications from passive documentation into executable contracts that constrain what AI agents generate. A good spec defines six elements: outcomes, scope boundaries, constraints, prior decisions, task breakdown, and verification criteria. SDD catches architectural violations and API contract drift that unit tests structurally cannot, and pairs with multi-agent patterns (Coordinator, Implementor, Verifier) to scale across parallel work.
Most teams I've worked with discover SDD reactively: AI-generated code passes unit tests but violates architectural patterns, breaks API integration contracts, or introduces security anti-patterns that surface only in production. The arXiv paper "Spec-Driven Development: From Code to Contract in the Age of AI" (Feb 2026) frames the core distinction: traditional specs are read by humans, while SDD specs execute as validation gates.
See how Intent turns executable specs into enforced architectural contracts, with semantic dependency mapping across 400,000+ files.
Free tier available · VS Code extension · Takes 2 minutes
Why Spec-Driven Development Matters Now
Three forces converged in 2025-2026 that make SDD the workflow I default to when AI-generated code needs to survive in production.
AI code generation works at scale, and so do its vulnerabilities. LLMs generate vulnerable code at rates ranging from 9.8% to 42.1% across benchmarks (Yan et al., 2025), and surviving AI-introduced issues in production repositories had topped 110,000 by February 2026. SDD embeds executable specifications as active validation gates against exactly these failures.
Compliance requirements now treat specifications as evidence. The EU AI Act requires high-risk AI systems to comply with obligations starting August 2, 2026, with fines of up to €15 million or 3% of global annual turnover for non-compliance with high-risk obligations.
Distributed architectures demand formal governance. Deloitte's State of AI 2026 reports that only one in five companies has a mature governance model for autonomous AI agents. Without structured specifications governing cross-service coordination, I've watched teams hit compounding integration failures as their multi-repository architectures scale.
The Data-Backed Case: Why AI-Generated Code Needs Specification Gates
A SonarQube analysis of five LLMs generating Java code (arXiv, Aug 2025) found that over 70% of Llama 3.2 90B's detected vulnerabilities were rated BLOCKER severity, and roughly two-thirds of GPT-4o's and OpenCoder-8B's were rated BLOCKER or CRITICAL. The pattern repeats across the literature.
Pearce et al. (IEEE S&P, 2023) found roughly 40% of programs generated in security-sensitive contexts contained vulnerabilities. Yan et al. (2025) put the range at 9.8% to 42.1% across their benchmarks. A catalog by Fu et al. (ACM TOSEM, 2025) identified 43 CWEs across three AI code-generation tools. By February 2026, a large-scale empirical study (arXiv, 2026) had counted more than 110,000 surviving AI-introduced issues in production repositories.
These findings match what I've seen in practice. Unit tests verify individual functions; they don't catch architectural violations, API contract drift, or security anti-patterns that emerge across service boundaries. SDD specifications operate at the system level, catching defect classes unit tests structurally cannot.
How SDD Differs from PRDs, Design Docs, TDD, and BDD
An SDD spec isn't a PRD or a design doc with a new label. The distinction I keep coming back to: a PRD or design doc is written for human readers who can interpret ambiguity and fill gaps from organizational context. AI agents fill gaps too, but not in the way you'd want. Without explicit scope, agents make assumptions and head in the wrong direction fast.
| Artifact | Primary Reader | How Ambiguity Is Resolved | Update Cadence |
|---|---|---|---|
| PRD | Product and engineering humans | Conversation, tribal knowledge | Infrequent, often stale |
| Design Doc | Engineering peers | Shared context, review comments | Point-in-time artifact |
| SDD Spec | AI agent + CI pipeline | Explicit constraints + verification rules | Living document, updated as work progresses |
SDD also operates at a different architectural layer than the code-level methodologies I work with day-to-day. These distinctions matter because they let me integrate SDD alongside TDD and BDD rather than replace either.
| Dimension | TDD | BDD | Vibe Coding | SDD |
|---|---|---|---|---|
| Primary artifact | Unit tests | Given-When-Then scenarios | Natural language prompts | Executable specifications |
| Scope | Individual function correctness | Cross-functional behavior | Full application generation | System-wide architectural contracts |
| Validation mechanism | Automated test suites | Human-referenced documentation | Manual review (if any) | Build fails on spec divergence |
| AI governance | None built-in | None built-in | None built-in | Constitutional constraints and checkpoints |
| Where truth lives | Test suite | Workshop artifacts | Prompt history | Versioned specification |
TDD drives interface design through red-green-refactor cycles at the unit level. I keep TDD for implementation verification and layer SDD on top for architectural constraints.
BDD creates Given-When-Then scenarios through cross-functional workshops. SDD can incorporate these scenarios, but with executability: BDD scenarios often exist as documentation that teams reference, while SDD transforms them into executable validation gates.
Vibe coding uses AI models to build applications from natural language prompts with minimal structured review. The MSR '26 study (arXiv, Nov 2025) of Cursor AI adoption across 807 GitHub repositories found transient velocity gains alongside persistent code complexity increases. SDD defines constraints up front to prevent that drift.
The Six Elements of a Good Spec
A spec for an AI agent needs to answer six questions. Leave any of them open and the agent will answer them for you, in ways you won't like.
1. Outcomes that define done. Not "build an auth flow." Something closer to: "A user can sign up with email/password, receive a verification email, and log in without error. The session persists across page refreshes." Outcome statements force clarity that feature names don't.
2. In-scope and explicitly out-of-scope boundaries. The out-of-scope list matters at least as much as the in-scope list. Agents expand scope if you don't close the door on it. "OAuth is out of scope for this task" is not obvious to an agent that has learned that auth systems usually include OAuth.
3. Constraints and assumptions. Existing tech stack decisions, third-party API limits, performance requirements. If it affects implementation choices and isn't obvious from the codebase alone, it belongs in the spec. Pairing specs with an AGENTS.md file gives the agent persistent project context alongside task-specific scope.
4. Decisions already made. If you've chosen the database schema or the encryption library, say so. Agents that don't know a decision has been made will make their own. Document your decisions before delegating the work.
5. Task breakdown. One of the biggest AI failure modes is asking for too much in one shot. A breakdown into discrete sub-tasks lets individual agents work on each one, verify as they go, and operate in parallel when they're not touching the same files.
6. Verification criteria. Acceptance criteria and verification steps. Not "does it work" but: what tests pass and what edge cases are handled. This is what the verifier uses. If you're running an adversarial agent pattern (below), the verification plan is what it checks against.
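Here's a minimal sketch of the six elements as a typed structure, using the auth example from element 1. The field names are illustrative assumptions, not the schema of any particular SDD tool:

```typescript
// A minimal sketch of the six elements as a typed structure.
// Field names are illustrative, not taken from any particular SDD tool.
interface SubTask {
  id: string;
  description: string;
  files: string[]; // declared file footprint enables safe parallel agents
  acceptance: string[];
}

interface TaskSpec {
  outcomes: string[]; // observable results that define "done"
  inScope: string[];
  outOfScope: string[]; // closes the door on scope expansion
  constraints: string[]; // stack decisions, API limits, performance targets
  decisionsMade: string[]; // choices the agent must not revisit
  tasks: SubTask[]; // discrete units agents can verify and parallelize
  verification: string[]; // what the Verifier checks against
}

// The auth example from element 1, filled in as a spec object.
const authSpec: TaskSpec = {
  outcomes: [
    "A user can sign up with email/password, receive a verification email, and log in without error.",
    "The session persists across page refreshes.",
  ],
  inScope: ["email/password signup", "email verification", "session persistence"],
  outOfScope: ["OAuth", "SSO", "password reset"], // explicit, so the agent doesn't add them
  constraints: ["use the existing Postgres schema", "p95 login latency under 300ms"],
  decisionsMade: ["argon2id for password hashing", "httpOnly session cookies"],
  tasks: [
    {
      id: "auth-1",
      description: "signup endpoint",
      files: ["src/routes/signup.ts"],
      acceptance: ["duplicate email returns 409"],
    },
  ],
  verification: ["all acceptance criteria pass", "no files modified outside declared footprints"],
};
```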
Core SDD Patterns: Spec-First, Spec-Anchored, and Spec-as-Source
When I'm adopting SDD with a team, I pick one of three patterns based on context. Each represents a different level of specification authority over code generation.
| Pattern | Specification Role | Code Role | Best For |
|---|---|---|---|
| Spec-First | Guides and constrains AI output | Primary deliverable | Teams beginning SDD adoption |
| Spec-Anchored | Governs with checkpoints and constitutional constraints | Validated deliverable | Enterprise teams needing audit trails |
| Spec-as-Source | Literal source code | Generated artifact | API-first domains with mature tooling |
Spec-first development is where I start most teams. Specs come before code and constrain what AI agents generate, while code remains the primary deliverable.
Spec-anchored development adds governance layers, constitutional constraints, and supervision checkpoints. I reach for this pattern when regulatory requirements demand audit trails, multiple teams coordinate across services, or AI-generated code needs human approval before merging. The follow-on Constitutional SDD paper (arXiv, Feb 2026) formalizes it, embedding security constraints with explicit CWE vulnerability mappings.
Spec-as-source development is the furthest end of the spectrum: specs literally become source code. The ThoughtWorks Technology Radar (Volume 33, 2025) places SDD in the "Assess" ring and warns of "a bias toward heavy up-front specification and big-bang releases" as an antipattern.
The Adversarial Agent Pattern
The most underused pattern in spec-driven development is assigning a separate agent to check the work rather than trusting the implementing agent to self-verify.
The structure: a Coordinator breaks down the spec and delegates tasks to Implementor sub-agents. Each Implementor works from its own sub-spec. A Verifier agent then checks the output against the spec before marking the work complete. The Implementors and the Verifier have opposing goals. One is optimizing for completing the task, the other for finding failures.
Implementing agents are optimistic about their own output. A separate Verifier has a cleaner signal. The pattern forces the spec to contain explicit verification criteria, which improves the spec itself. It also makes parallel agent workflows safer: multiple Implementors can run simultaneously while the Verifier catches conflicts before they merge.
Sub-agents update the spec in real time as they progress, so the Coordinator always has a current picture of where things stand.
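A minimal sketch of that loop in TypeScript, where `runAgent` is a hypothetical stand-in for whatever agent runtime you actually use:

```typescript
// A sketch of the Coordinator/Implementor/Verifier loop. `runAgent` is a
// hypothetical stand-in for whatever agent runtime you actually use.
interface SubSpec {
  id: string;
  description: string;
  acceptance: string[];
}

async function runAgent(role: "implementor" | "verifier", prompt: string): Promise<string> {
  throw new Error("wire this up to your agent platform"); // placeholder
}

async function executeSpec(tasks: SubSpec[], verification: string[]): Promise<string[]> {
  // Implementors run in parallel; safe only when their file footprints don't overlap.
  const outputs = await Promise.all(
    tasks.map((t) => runAgent("implementor", `Implement exactly this sub-spec:\n${JSON.stringify(t)}`)),
  );

  // The Verifier optimizes for the opposite goal: finding failures against the spec.
  const verdict = await runAgent(
    "verifier",
    `Check these outputs against the criteria ${JSON.stringify(verification)}:\n${outputs.join("\n---\n")}`,
  );
  if (verdict.startsWith("FAIL")) {
    throw new Error(`Verifier rejected the work: ${verdict}`);
  }
  return outputs;
}
```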
See how Intent coordinates Implementor and Verifier agents through shared living specs, across 400,000+ files.
Free tier available · VS Code extension · Takes 2 minutes
When to Use a Spec, and When Not To
Most spec-driven development guides skip this part, which makes them read like a pitch. The caveat: not every task needs a detailed spec. Spec overhead is a real cost, and I've paid it on small fixes where one prompt to one agent would have been faster.
| Write a Spec When | Skip the Spec When |
|---|---|
| Work spans multiple agent sessions | Work is exploratory or experimental |
| Multiple services or repositories are involved | A single prompt can produce usable output |
| Reversing a wrong interpretation is expensive | Output can be reviewed in under five minutes |
| Compliance or audit trail is required | Prototype is meant to be thrown away |
| Review will require real attention (component logic, end-to-end flows) | Change is mechanical or low-risk |
The trigger I use: if I'd be annoyed to have the agent interpret requirements differently than I meant, I write the spec. If I could fix the output in a quick follow-up prompt, I skip the spec and prompt directly.
How Spec Kit Enforces Specifications
GitHub Spec Kit is the open-source scaffolding I recommend for teams getting started. It's a Python CLI with 88k stars and 129 releases through April 2026, supporting 28 named AI agent platforms. The workflow runs through four commands: /speckit.specify captures business context and success criteria, /speckit.plan translates specs into architectural decisions, /speckit.tasks decomposes plans into testable units, and /speckit.implement runs AI agents under those constraints.
The payoff is what InfoQ analysis emphasizes: "With AI-generated code, a code issue is an outcome of a gap in the specification. Because of non-determinism in AI generation, that gap keeps resurfacing in different forms whenever the code is regenerated." Here's what that looks like concretely.
Before SDD (without spec): A payment endpoint ships without an idempotency constraint. Retry logic creates duplicate charges in production. The team patches the code, but the next AI regeneration cycle reintroduces the same vulnerability because no specification encodes the constraint.
After SDD (with spec):
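One way to encode that gate, as a minimal sketch: a Node/TypeScript CI script that parses the OpenAPI document and fails the build when the charges endpoint lacks an idempotency requirement. The file path, endpoint, and header name are assumptions for illustration:

```typescript
// A minimal sketch of a CI spec gate: fail the build when the OpenAPI
// document's POST /charges operation has no required Idempotency-Key
// header. File path, endpoint, and header name are illustrative.
import { readFileSync } from "node:fs";

const api = JSON.parse(readFileSync("openapi.json", "utf8"));
const chargesPost = api.paths?.["/charges"]?.post;

const hasIdempotencyKey = (chargesPost?.parameters ?? []).some(
  (p: { name?: string; in?: string; required?: boolean }) =>
    p.in === "header" && p.name?.toLowerCase() === "idempotency-key" && p.required === true,
);

if (!hasIdempotencyKey) {
  console.error("Spec gate failed: POST /charges must require an Idempotency-Key header.");
  process.exit(1); // non-zero exit fails the CI job before code reaches review
}
```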
The build fails before code reaches review whenever any AI agent generates a charges endpoint without idempotency enforcement.
SDD Tooling Comparison
The spec-driven development tooling I evaluate with teams spans open-source frameworks, API specification platforms, and enterprise-grade control planes.
| Tool | Spec Formats | CI/CD Enforcement | AI Agent Compatible | Best For |
|---|---|---|---|---|
| GitHub Spec Kit | Markdown/structured | Via agent workflows | 28 platforms | Teams adopting SDD workflows with AI agents |
| SwaggerHub / API Hub | OpenAPI, AsyncAPI | CLI + Git integration | MCP Server | API-first teams needing lifecycle management |
| Postman Spec Hub | OpenAPI, multi-protocol | GitHub sync, CI runner | MCP servers; Claude plugin | Full API lifecycle with governance |
| Spectral | OpenAPI, AsyncAPI, JSON Schema | CLI exit codes | Indirect | API linting and standards enforcement |
| PactFlow | Pact + OpenAPI | can-i-deploy gating | Partial | Contract testing across service boundaries |
| Specmatic | OpenAPI (executable) | Yes | Agent-ready | Executable API contract enforcement |
| TypeSpec | TypeSpec → OpenAPI | Via downstream toolchain | Yes (generates OpenAPI) | Azure/Microsoft ecosystem teams |
InfoQ notes the limitation I hit most often with enterprise teams: current tools "typically keep specs co-located with code in a single repository," while "modern architectures span microservices, shared libraries and infrastructure repositories."
Intent's Context Engine addresses this gap by processing 400,000+ files through semantic dependency graph analysis, making multi-repository coordination viable at enterprise scale. Intent carries SOC 2 Type II and ISO/IEC 42001 certifications, the first AI coding assistant with ISO/IEC 42001 for AI-specific governance.
What This Looks Like in Practice
The six-element framework applied to a real project: a 10-page product site, fully comped out in Figma with a design system covering navigation, landing pages, feature pages, pricing, and docs layout. The kind of project that takes a couple of days for an experienced frontend engineer, longer if you're learning the component library.
Step 1: Figma MCP Pulls the Design System Into Context
With the Figma MCP connected to Intent, the Coordinator agent reads the Figma file before writing a single line of the spec. It pulls component specs (button variants, card layouts, typography scale), layout constraints (grid structure, spacing tokens, breakpoints), and the page hierarchy. The Coordinator writes that design context into the spec, so the agents implementing each page work from the same component library and spacing rules the designer defined.
Without structured design context, agents invent their own component patterns. One agent builds a card with 16px padding; another uses 24px. One agent creates a button variant that doesn't exist in the design system. Connecting the Figma MCP eliminates that divergence before code generation starts.

The Coordinator extracts brand colors from the Figma design system and writes them into the spec alongside acceptance criteria, non-goals, and assumptions.
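As an illustration, the design-context fragment the Coordinator writes into the spec might look like the following sketch. All token names and values here are invented:

```typescript
// An invented illustration of the design-context fragment the Coordinator
// might write into the spec from the Figma MCP. All names and values are
// made up for the sketch.
const designContext = {
  colors: { primary: "#1A56DB", surface: "#F9FAFB", text: "#111827" },
  spacing: { cardPadding: 24, sectionGap: 64 }, // px, from Figma spacing tokens
  breakpoints: { sm: 640, md: 768, lg: 1024 }, // match the Figma frames
  components: ["Button/primary", "Button/ghost", "Card/feature", "Nav/top"],
} as const;
```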
Step 2: The Coordinator Decomposes by Page
The Coordinator generated a spec with page-level task assignments: one task per page, each with its own acceptance criteria, component requirements, and layout constraints pulled from the Figma context. Page-level decomposition works for parallel execution here because pages in a static site don't write to the same files. The Coordinator assigned shared components (navigation, footer) to a dedicated task that ran first, so page-level agents could import from a stable component library rather than each building their own.
The spec included constraints: responsive breakpoints matching the Figma frames, design token values for colors and spacing, and which shared components each page needed to import. Each Implementor agent received a self-contained task contract with enough context to build its page without needing to read the other agents' work.
Step 3: Parallel Agents Execute in Isolated Worktrees
Sub-agents handled pages in parallel, each in its own git worktree. The navigation and footer agent finished first. Page agents picked up the shared components and built against them. The Verifier checked each page against its spec criteria: correct components used, design tokens applied, responsive behavior matching the Figma breakpoints.
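A minimal sketch of the worktree setup, shelling out to git from Node; branch and directory names are illustrative:

```typescript
// A sketch of per-agent worktree isolation via Node's child_process;
// branch and directory names are illustrative.
import { execFileSync } from "node:child_process";

const pageTasks = ["landing", "pricing", "docs"];

for (const page of pageTasks) {
  const dir = `../worktrees/${page}`;
  // Each agent gets a private working copy on its own branch, so parallel
  // edits never collide in a shared checkout.
  execFileSync("git", ["worktree", "add", "-b", `page/${page}`, dir, "main"]);
}
// Once the Verifier passes a page, its branch merges back and the worktree
// is cleaned up with `git worktree remove <dir>`.
```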
The project hit 95% completion in about 45 minutes across a few sessions. The remaining 5% was fine-detail work: spacing tweaks and hover state refinements.

Three landing page tasks completed in parallel, with four more queued. The Coordinator confirms page tasks can run simultaneously since they write to separate files, with the Verifier running after all pages are built.
The Designer Handoff
The designer on the project had never used Git. In a standard workflow, that 5% of fine-tuning would have gone back to the engineer as a list of revision requests, each requiring a context switch between Figma and code. Instead, the designer opened the project in Intent and iterated on the details without engineering support. Adjusting a spacing value or swapping a color token didn't require understanding the codebase, because the spec-driven structure had organized the project into discrete, navigable components tied to specific Figma frames.
That handoff wouldn't work with prompt-driven output. When agents produce monolithic code from a vague prompt, only the person who wrote the prompt can make sense of the result. When agents build from a structured spec with page-level decomposition, the project is navigable by anyone who can read the spec.
Why Context Precision Matters More Than Context Volume
The other takeaway from this workflow: the less irrelevant context each agent carried, the better the output. A million tokens of codebase context is not the advantage it appears to be. Agents perform better with precise, task-relevant context than with broad exposure to the full repository. The Figma MCP + spec combination works because it narrows each agent's context to the design constraints and acceptance criteria for its specific task, rather than flooding it with the entire codebase.
Model Tiering Inside Spec-Driven Workflows
The Coordinator/Implementor/Verifier pattern lets you assign different models to different roles based on what each role needs.
Writing the spec deserves the most capable model available. Errors in the spec propagate through everything downstream, so underinvesting here is the most expensive mistake you can make.
Implementing works well with a mid-range model running moderate thinking (Sonnet-class, GPT-5.1-Codex). You don't need the most expensive model for execution once the spec is solid.
Verifying needs a fast model. The Verifier checks specific criteria against specific output: it needs accuracy and low cost, not deep reasoning.
Multi-agent workflows run more model calls than single-agent prompting. Spending top-model allocation on every sub-task multiplies cost without proportional benefit. Getting the allocation backward (cheap model on the spec, expensive model on implementation) costs more in correction loops than the tiering saves.
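A sketch of what that tiering might look like as configuration; the model identifiers are placeholders for whatever top-tier, mid-range, and fast models your platform exposes, not real API names:

```typescript
// A sketch of role-based model tiering; identifiers are placeholders,
// not real model or API names.
const modelTiers = {
  coordinator: { model: "top-tier-reasoning-model", why: "spec errors propagate downstream" },
  implementor: { model: "mid-range-model", why: "execution is routine once the spec is solid" },
  verifier: { model: "fast-low-cost-model", why: "checks specific criteria; accuracy over depth" },
} as const;

type AgentRole = keyof typeof modelTiers;

function modelFor(role: AgentRole): string {
  return modelTiers[role].model;
}
```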
Brownfield Adoption: Applying SDD to Existing Codebases
Brownfield SDD is categorically different from greenfield, and I approach it in three phases. The foundational SDD paper (arXiv, Feb 2026) articulates why: "By extracting specs from legacy code, teams can verify that modernization efforts preserve required functionality while eliminating undocumented behaviors."
Phase 1: Reconstruct existing behavior before writing new specs. AI-assisted reverse engineering works best starting from visible artifacts (UI elements, binaries, data lineage), enriching them incrementally, and maintaining traceability back to source.
Phase 2: Spec the area of change, not the whole system. Trying to retroactively spec entire systems is impractical. The InfoQ enterprise adoption analysis is explicit: "the spec needs to be most granular near the area of change." Each bug fix, feature addition, or refactoring becomes an opportunity to add specifications for the code being touched.
Phase 3: Enforce specs in CI incrementally. Preventing drift from accumulating is more practical than periodically reconciling diverged specifications.
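A minimal sketch of what incremental enforcement can mean in practice: gate only the files touched in the current change, so specs accumulate near the area of change rather than blocking the whole repository at once. The specced-path prefix (src/payments/) is invented for the example:

```typescript
// A sketch of incremental spec enforcement: gate only files touched in
// the current change. The specced-path prefix (src/payments/) is invented.
import { execFileSync } from "node:child_process";

const changed = execFileSync("git", ["diff", "--name-only", "origin/main...HEAD"], {
  encoding: "utf8",
})
  .split("\n")
  .filter(Boolean);

// Only paths that already have a spec are gated; everything else passes
// until a spec gets added the next time that code is touched.
const specced = changed.filter((f) => f.startsWith("src/payments/"));
if (specced.length > 0) {
  console.log(`Running spec gates for: ${specced.join(", ")}`);
  // ...invoke spec validation for just these files here
}
```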
As InfoQ acknowledges: "SDD does not remove complexity; it simply relocates it." Specifications inherit all the properties of source code: technical debt, cross-team coupling, and architectural gravity. Intent supports enterprise-scale SDD adoption by processing 400,000+ files through architectural analysis.
Limitations of Spec-Driven Development
SDD isn't right for every context. Here's where I've seen it struggle or not earn its overhead:
- Exploratory work: SDD struggles when requirements can't be known upfront. R&D work and experimentation benefit from lighter approaches.
- Rapid prototyping: When the timeline to first user feedback is measured in days, SDD's upfront specification requirements create expensive regeneration cycles.
- Small teams and high-change environments: For teams of 2-5 developers, specification overhead can consume a disproportionate amount of development time.
- Legacy systems requiring extensive documentation: Creating specifications accurate enough for AI generation requires reverse-engineering years of implicit business logic. A known limitation in Spec Kit (GitHub issue #1191) is that the workflow is optimized for net-new feature creation, making it difficult to update existing specifications.
Start Enforcing Specs Before Your Next AI-Generated Deployment
Spec-driven development shifts specifications from passive documentation to executable build gates that enforce architectural contracts across every code generation cycle. LLMs optimize narrowly for functional correctness. Enterprise systems need architectural consistency and regulatory compliance on top of that, and SDD closes the gap.
Start with a Spec-First pattern on a single service with an existing OpenAPI contract, integrate GitHub Spec Kit into your CI/CD pipeline, and expand to Spec-Anchored governance as multi-team coordination grows. For teams managing multi-repository architectures, Intent's Context Engine processes 400,000+ files through semantic dependency graph analysis, backed by SOC 2 Type II and ISO/IEC 42001 governance.
See how Intent's living specs keep parallel agents aligned as your plan evolves.
Free tier available · VS Code extension · Takes 2 minutes
Written by

Molisha Shah
Molisha is an early GTM and Customer Champion at Augment Code, where she focuses on helping developers understand and adopt modern AI coding practices. She writes about clean code principles, agentic development environments, and how teams are restructuring their workflows around AI agents. She holds a degree in Business and Cognitive Science from UC Berkeley.
