idea-factory turns Claude Code into a virtual startup.
You're the CEO — describe what you want in one line, and a team of AI agents builds it.
v7.1 (2026-04-11) is a hotfix over v7 that removes blocking PreToolUse hooks which were stalling downstream autonomous workflows (see #2). templates/settings.json now ships with defaultMode: "bypassPermissions" and an empty PreToolUse — the deny list still blocks destructive ops (rm -rf /, sudo *, .env*, credentials, secrets).
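The resulting file roughly follows Claude Code's settings schema; a minimal sketch of the shape described above (the exact deny patterns in templates/settings.json may differ):

```json
{
  "permissions": {
    "defaultMode": "bypassPermissions",
    "deny": [
      "Bash(rm -rf /*)",
      "Bash(sudo:*)",
      "Read(.env*)",
      "Read(**/credentials*)",
      "Read(**/secrets*)"
    ]
  },
  "hooks": {
    "PreToolUse": []
  }
}
```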
v7 (2026-04-04) was the harness engineering overhaul: 11 battle-tested patterns from market-dashboard-v5 — Quality Ratchet, Protected Files, 5-Stage PR Pipeline, 6-Gate Deploy Consensus, CONTRACT FAQ, agent depth guidance, and more. See HARNESS-GUIDE.md for the full changelog.
v6.1 laid the foundation: 4-reviewer gate, isolated worktrees, two-pass evaluation, Playwright MCP, phase handoffs, Codex Gate. Informed by Anthropic's harness engineering research (March 2026).
| Feature | What it does |
|---|---|
| 4-Reviewer Gate | architect + critic + code-reviewer + qa-tester review in parallel (was 3) |
| Fresh Context Isolation | Every reviewer runs in an isolated worktree — no self-praise bias |
| Two-Pass Evaluation | Pass 1: adversarial defect hunt (mandatory). Pass 2: structured scoring (optional, fresh context). Eliminates "Evaluator Leniency" |
| Playwright MCP | qa-tester opens the live app in a real browser, clicks buttons, fills forms — not just code review |
| Phase Handoff Documents | MVP, Harden, Ship phases produce handoff docs preserving context across transitions |
| Codex Gate | Optional cross-model review via OpenAI Codex CLI for a second opinion on diffs |
| Safety via deny-list | permissions.deny blocks destructive ops (rm -rf /, sudo *) and sensitive reads (.env*, credentials, secrets). No blocking PreToolUse hooks — autonomous workflows stay zero-friction. |
| CLAUDE.md 80-Line Limit | Generated project CLAUDE.md stays under 80 lines (HumanLayer research: compliance drops beyond ~150 instructions) |
| Fix-Loop Circuit Breaker | Same failure 3 times = stop and escalate to CEO (no more infinite token-burning loops) |
| HARNESS-GUIDE.md | New design document explaining every architectural decision with evidence |
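The fix-loop circuit breaker from the table can be sketched in a few lines of Python. This is illustrative only — the class and method names are hypothetical; the real logic lives in the generated hooks:

```python
from collections import Counter

class FixLoopBreaker:
    """Stop retrying once the same failure signature appears `limit` times."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.failures = Counter()

    def record(self, signature: str) -> bool:
        """Record one failure; return True when it's time to escalate to the CEO."""
        self.failures[signature] += 1
        return self.failures[signature] >= self.limit
```

Keying on a failure *signature* (e.g. the failing test name plus error code) rather than a raw retry count means three different errors keep the loop running, while the same error three times stops it.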
One command. A complete MVP in under an hour.
$ claude
> /start-company freelancer income/expense auto-manager app
[ANALYZE] analyst + architect analyzing in parallel...
→ Service: CashFreel (캐시프릴)
→ Type: SaaS — Freelancer tax prediction
→ Team: PM + Developer + Designer
[SCAFFOLD] Creating project from templates...
→ CLAUDE.md, agents, hooks, settings ✓
→ git init ✓
[KICKOFF] CEO, 4 quick questions:
1. Design feel? → Toss style (minimal, big numbers)
2. MVP scope? → Tax prediction + income/expense tracking
3. Revenue? → Free first, decide later
4. Income scope? → Domestic + international
[BUILD] ralph loop running MVP stories...
✅ MVP-001: Next.js + Tailwind + shadcn/ui
✅ MVP-002: Income registration (KRW + USD + EUR)
✅ MVP-003: Expense tracking + auto-categorization
✅ MVP-004: Real-time tax dashboard + charts
✅ MVP-005: Cash flow report + CSV export
[VALIDATE] 4 independent reviewers (isolated worktrees):
✅ architect (opus): no structural blockers for Phase 2
✅ critic (opus): no essence drift detected
✅ code-reviewer (opus): 0 critical, 2 medium (non-blocking)
✅ qa-tester (playwright): all 7 flows pass in real browser
→ MVP complete. Phase 2 ready when you are.
Result: CashFreel now has a working prototype. Next phase: connect real tax APIs, add authentication, harden security. The CEO didn't write a single line of code.
Vibe coding is fast, but chaotic. You get code — not a product.
Real startups don't just have developers. They have process: a PM who says "no", a designer who researches before drawing, a QA who breaks things on purpose, and a critic who asks "but why?"
idea-factory gives you both: the speed of AI + the discipline of a real team.
| Tool | Approach | You need to be |
|---|---|---|
| Vibe coding | "Just build it" | A developer |
| gstack | Engineering team | A developer |
| idea-factory | Full startup team | Just the CEO |
You: /start-company a portfolio tracker for busy investors
ANALYZE ──────── Two agents dissect your idea in parallel
│ (market fit, tech stack, team composition)
▼
SCAFFOLD ─────── Project created from templates, not from scratch
│
▼
KICKOFF ──────── 3-5 plain-language questions — no jargon, just choices
│
▼
BUILD MVP ────── Mock data first. Core flow only.
│ Every feature checked: "Does this serve the Why?"
▼
VALIDATE ─────── 4 reviewers in isolated worktrees (defects first):
│ Architect + Critic + Code-Reviewer + QA (Playwright)
│ ↳ critical defect? fix and retry. all clear? CEO confirms.
▼
HARDEN ──────── Real APIs, tests, security — only after MVP is validated
│
▼
SHIP ────────── Deploy + retrospective
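The VALIDATE step's retry logic can be sketched as follows — the function names here are hypothetical, and the real orchestration is the ralph loop:

```python
def validate(run_reviewers, fix, max_rounds=3):
    """Run the reviewer gate; fix critical defects and retry, escalate past max_rounds."""
    for _ in range(max_rounds):
        defects = run_reviewers()  # architect + critic + code-reviewer + qa-tester, in parallel
        critical = [d for d in defects if d["severity"] == "critical"]
        if not critical:
            return "ceo-confirm"   # all clear -> CEO confirms the phase
        for d in critical:
            fix(d)                 # address each critical defect, then re-run the gate
    return "escalate"              # same failures keep recurring -> stop and ask the CEO
```

Note the cap on rounds: the gate is a loop, not a promise, so it needs the same circuit-breaker discipline as everything else.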
Most AI tools rush to connect APIs and deploy. We do the opposite.
| Phase | What happens | Real APIs? | Deploy? |
|---|---|---|---|
| 1 — Prototype | Mock data, core flow, validate the "wow" | No | No |
| 2 — Harden | Real APIs, error handling, tests, security | Yes | No |
| 3 — Ship | Deploy after security audit passes | Yes | Yes |
Why? Because connecting a payment API before knowing if anyone wants your product is a waste of everyone's time.
Every feature is checked against your service's "Why":
essence.md
├── One-Line Definition: what this is
├── Why This Exists: the problem it solves
├── Wow Factor: what makes users go "wow"
├── Differentiator: what competitors don't do
└── Key Metric: the one number that matters
- After every story: does this serve the Why?
- At every gate: is the codebase drifting from the vision?
- Drift too far → the system flags it and suggests a pivot
# One-liner
curl -fsSL https://raw.githubusercontent.com/gguloadoong/idea-factory/main/install.sh | bash
# Or clone locally
git clone https://github.com/gguloadoong/idea-factory.git
cd idea-factory && bash install.sh

| Required | Optional |
|---|---|
| Claude Code CLI | oh-my-claudecode (for ralph autonomous loop) |
| Node.js 18+ | Gemini CLI (external perspective) |
| Git | |
/start-company a portfolio tracking app for busy investors
That's it. The system will:
- Analyze your idea and form a minimum team
- Set up the project with proper structure
- Ask you 3-5 simple questions
- Start building autonomously
| It asks | It doesn't ask |
|---|---|
| Design feel (A/B/C choices) | Tech stack decisions |
| MVP scope | Architecture choices |
| Revenue model | Code review results |
| "Is this the right direction?" | Bug fixes |
| API keys when actually needed | Anything it can decide |
idea-factory/
├── skills/start-company/ # The trigger (/start-company)
│ ├── SKILL.md # execution flow (current: v7.1)
│ └── HARNESS-GUIDE.md # design decisions + evidence (22 KB)
│
├── templates/ # Scaffold copied into each new project
│ │ # ── Core (every install gets these) ──
│ ├── CLAUDE.md.tmpl # project constitution (80-line limit)
│ ├── settings.json # permissions + deny-list (safety baseline)
│ ├── agents/ # 7 roles: pm · developer · designer · architect · critic · code-reviewer · qa-tester
│ ├── hooks/ # 18+ hooks: safety / quality / governance / loop-breaker
│ ├── documents/ # PRD · essence · CONTRACT · handoff · quality-baseline
│ ├── scripts/ # codex-review-gate · copy-drift · contract · temporal-lint
│ ├── .github/workflows/ # CI + PR labeling
│ │ # ── Advanced (opt-in patterns) ──
│ ├── contract-rules/ # CONTRACT FAQ rules (drift guardrails)
│ ├── gate-presets/ # 6-Gate Deploy Consensus configs
│ ├── gate-rules.yml # per-stage gate rules
│ ├── ratchet.yml.tmpl # Quality Ratchet (regression floor)
│ ├── protected-files.yml # protected-files hook allow-list
│ ├── .protected-files.tmpl # (template of above for downstreams)
│ ├── handoff-checklist.md.tmpl # phase handoff checklist
│ ├── research-report.md.tmpl # researcher agent output template
│ ├── experiments/ # numerical tuning harness (v8)
│ ├── cron-bot/ # scheduled bot scaffold (v8)
│ ├── settings-extensions/ # per-project-type settings overlays
│ ├── lints/temporal-leakage/ # date/time hardcoding lint
│ ├── workflows/ # opinionated ralph/ulw workflows
│ ├── .githooks/ # pre-commit hooks for downstreams
│ ├── .coderabbit.yaml # CodeRabbit review config
│ ├── COPIED-FROM.md.tmpl # template provenance stamp
│ └── vercel.json # Vercel preview-deployment safety defaults
│
├── scripts/ # Meta-utilities (NOT copied to downstreams)
│ ├── sync-downstream.sh # push template updates to N downstream repos
│ ├── sync-lib.py # sync library
│ ├── audit-backlog.py # v8 backlog tracking
│ ├── check-contract.sh # CONTRACT FAQ drift check
│ ├── check-copy-drift.sh # template-vs-copy drift check
│ ├── lint-temporal-leakage.sh # date lint
│ ├── merge-settings.sh # settings.json merge helper
│ ├── record-failure.sh # failure capture for learning layer
│ ├── run-gate.sh # gate orchestrator
│ ├── tuning-gate.sh # numerical tuning gate
│ └── validate-handoff.sh # handoff validator
│
├── sync-manifest.json # managed / computed / customized classification
├── downstream-registry.json # downstream repos this factory tracks
├── install.sh # one-command installer
├── tests/ # harness invariant tests
│
├── docs/ # project documentation
│   ├── ko/ # Korean-language guides
│ └── research/ # internal research memos
│
├── CLAUDE.md # working-in-this-repo rules (Claude Code)
├── AGENTS.md # OMC entry point
├── ARCHITECTURE.md # system architecture overview
├── CHANGELOG.md # version history
├── CONTRIBUTING.md # contribution guide
├── SECURITY.md # security policy
├── CODE_OF_CONDUCT.md # community standards
└── LICENSE # MIT
For the rationale behind every design decision, see HARNESS-GUIDE.md.
| Decision | Why |
|---|---|
| Templates, not generation | Creating 30 files from scratch wastes the context window on boilerplate |
| ralph as backbone | Post-condition chaining between skills is unreliable; a state-machine loop isn't |
| 4 isolated reviewers | Same-session role-play isn't real analysis; worktree-isolated agents with fresh context are |
| Defects first, scores second | Scoring alone triggers "Evaluator Leniency" (AI gives 9/10). Defect hunt first forces honesty |
| Playwright MCP for QA | Code review alone misses UI bugs; real browser interaction catches what humans catch |
| essence.md as North Star | Without it, features drift from the original vision within 2 sprints |
| CLAUDE.md under 80 lines | AI compliance drops beyond ~150 instructions; system prompt uses ~50, leaving ~100 for the project |
| Fix-loop circuit breaker | Without a cap, agents burn tokens in infinite retry loops on the same error |
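The "defects first, scores second" row can be made concrete. A sketch with hypothetical callables (SKILL.md defines the actual reviewer prompts):

```python
def two_pass_review(artifact, hunt_defects, score):
    """Pass 1: mandatory adversarial defect hunt. Pass 2: optional scoring, fresh context."""
    defects = hunt_defects(artifact)   # pass 1: actively look for reasons to reject
    if any(d["severity"] == "critical" for d in defects):
        return {"verdict": "block", "defects": defects}
    # Scoring only happens after the hunt comes up clean, so a friendly
    # 9/10 can never paper over an unexamined critical defect.
    return {"verdict": "pass", "defects": defects, "score": score(artifact)}
```

Ordering is the whole trick: asking "what's wrong?" before "how good is it?" keeps the scoring pass from anchoring the review.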
/start-company a pet health management app
/start-company subscription meal delivery for seniors
/start-company hospital booking and report automation tool
/start-company freelancer income/expense auto-tracker
/start-company AI-powered study planner for college students

- gstack — Sprint pipeline, meta-skills
- Citadel — Single entry point routing
- oh-my-claudecode — Agent orchestration
- everything-claude-code — Cross-platform skills
PRs welcome! Whether it's new agent templates, better hooks, or translations.
MIT