An enforcement-first harness template for the Claude Code + Codex CLI pair.
A reusable starting point for teams who want their AI coding assistants to follow rules, not just hope they do. Hooks block destructive shell commands, gate code-path edits behind a TDD ledger, and run the same verification on both Claude Code and Codex CLI.
If you have ever asked an AI to "be careful" and watched it overwrite a config file anyway, this is the answer: replace politely-worded prompts with executable guards.
A directory layout + hook scripts + rule documents you copy into a new (or existing) repository so that Claude Code and Codex CLI behave like team members under the same playbook.
What you get out of the box:
- 12-agent topology (all active by default) — main-orchestrator (Claude), executor-agent
(Codex), codex-final-reviewer (Codex), quality-agent (Claude), fleet-doc-steward (governance),
plus 7 specialists active out of the box: CWE auditor, dependency auditor, UI reviewer,
pipeline validator, ontology validator, runtime-contract reviewer, template-sync validator.
Each agent declares its execution backend in frontmatter so the orchestrator knows exactly
which CLI subprocess to use. Trim per family by editing
active_agents in config/repos/<slug>.json when a project does not need a specialist.
- 11-skill library — design, verify, code-review, testing, ui-design, governance, knowledge, automation, efficiency, bluebricks, commit. Skills load on demand when the request matches a trigger keyword, so unused skills cost no tokens.
- Per-family JSON registry — config/repo-agent-management.json catalogs agents and skills. When you fork and add repositories, each family gets its own config/repos/<slug>.json with per-family agent topology, skill pack, and specialist overrides.
- Pre-tool-use guard — denies destructive shell patterns and protected paths before the tool runs.
- Post-edit checks — flag debug statements and credential leaks immediately after every Edit/Write.
- Composite TDD ledger — implementation edits are blocked unless tasks/tdd.json has a planning entry covering the file.
- Pre-commit verification — the planning entry's verification commands must pass before the commit lands.
- Session lifecycle — auto-loaded plan/lessons/memory at session start, auto-saved snapshots at session end, auto-handoff at compact.
- Dual CLI parity — the same hooks fire from both Claude Code (.claude/settings.json) and Codex CLI (.codex/hooks.json). The wire format is shared, so you author once.
What this is not: a runtime, a framework, or a service. There is no daemon. There is no SaaS. The harness is just files in your repo. If you delete the directory, your project goes back to behaving like it did before.
Claude Code and Codex CLI overlap in capability but differ in token budget, scoping, and review style. Most non-trivial work benefits from using both. Whichever CLI you open becomes the control-plane main (requirements, planning, design, orchestration, judgment); delegated sub-agent execution stays Codex-centered (code writing, TDD, review). The two mains share one contract — the rules, hooks, memory, and architecture apply identically no matter which CLI you launched. This template assumes you will run both — and pins the rules so they cannot drift apart.
The hook events shared by both CLIs — PreToolUse, PostToolUse, PreCompact,
SessionStart, Stop, PermissionRequest — get the same script. Claude Code's additional
events (TaskCreated, TaskCompleted, StopFailure) get Claude-only enforcement.
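The event split above can be modeled as a small routing table. This is a hypothetical sketch, not code shipped with the template: the event names come from this README, while the function name and return shape are illustrative. It also makes the arithmetic behind the "9 hook surfaces" in .claude/settings.json explicit (6 shared events plus 3 Claude-only events).

```python
# Hypothetical sketch: which CLI config files register a hook for each event.
# Event names are taken from the README; the routing helper is illustrative.
SHARED_EVENTS = frozenset({
    "PreToolUse", "PostToolUse", "PreCompact",
    "SessionStart", "Stop", "PermissionRequest",
})
CLAUDE_ONLY_EVENTS = frozenset({"TaskCreated", "TaskCompleted", "StopFailure"})


def config_files_for(event: str) -> list[str]:
    """Return the config files that should wire a hook for `event`."""
    if event in SHARED_EVENTS:
        # One script, registered in both CLI surfaces.
        return [".claude/settings.json", ".codex/hooks.json"]
    if event in CLAUDE_ONLY_EVENTS:
        # Claude-only enforcement.
        return [".claude/settings.json"]
    raise ValueError(f"unknown hook event: {event}")


if __name__ == "__main__":
    for name in sorted(SHARED_EVENTS | CLAUDE_ONLY_EVENTS):
        print(f"{name:18} -> {', '.join(config_files_for(name))}")
```

Because a deny-list fix lives in the shared script rather than in either config file, both CLIs pick it up without a copy.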
```bash
# 1. Clone or copy the template into your repo.
git clone https://github.com/youngjin39/claude-codex-harness.git my-project
cd my-project
# 2. Run the setup script.
./setup.sh
# 3. Open the project in Claude Code (run from the project root).
claude
# 4. Or in Codex CLI.
codex
```

The setup script:
- Makes the hook scripts executable
- Creates an empty tasks/tdd.json if one is not present
- Prints a one-line summary of what just got installed
Both CLIs will pick up the hooks on next launch. No daemon, no background process.
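For readers who want the setup steps spelled out, here is a rough Python approximation of what setup.sh is described as doing. It is not the script itself: the *.sh glob under .claude/hooks/ and the {"changes": []} shape of the empty ledger are assumptions made for illustration.

```python
import json
import stat
from pathlib import Path


def bootstrap(repo: Path) -> list[str]:
    """Approximate the documented setup steps; return a log of actions taken."""
    log: list[str] = []
    # Step 1: make every hook script executable (assumes *.sh under .claude/hooks/).
    for script in sorted((repo / ".claude" / "hooks").glob("*.sh")):
        mode = script.stat().st_mode
        script.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        log.append(f"chmod +x {script.relative_to(repo)}")
    # Step 2: create an empty ledger only if one is not already present.
    ledger = repo / "tasks" / "tdd.json"
    if not ledger.exists():
        ledger.parent.mkdir(parents=True, exist_ok=True)
        # Hypothetical empty shape; the real ledger schema may differ.
        ledger.write_text(json.dumps({"changes": []}, indent=2) + "\n")
        log.append("created tasks/tdd.json")
    # Step 3: one-line summary.
    print(f"harness setup: {len(log)} action(s)")
    return log
```

Running it twice is safe: the second run re-applies the executable bit but leaves an existing ledger untouched.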
| Agent | Backend | Role | Purpose |
|---|---|---|---|
| main-orchestrator | Claude | control_plane | Entry point, task classification, orchestration |
| executor-agent | Codex | execution | Codex-lane code writing and TDD execution |
| codex-final-reviewer | Codex | review | Final design-vs-code consistency check (read-only) |
| quality-agent | Claude | review | Fallback quality review, tie-break synthesis (read-only) |
| Agent | Backend | Role | Purpose |
|---|---|---|---|
| fleet-doc-steward | Claude | governance | CLAUDE.md / AGENTS.md central governance |
| Agent | Backend | Scope | Purpose |
|---|---|---|---|
| cwe-auditor | Claude | code_app, hybrid_pipeline | CWE-pattern static security scan |
| dep-auditor | Claude | code_app, hybrid_pipeline | Dependency drift and license audit |
| ui-reviewer | Claude | code_app | UI component and accessibility review |
| pipeline-validator | Codex | hybrid_pipeline | Data pipeline schema validation |
| ontology-validator | Claude | content_workspace | Content taxonomy and ontology check |
| runtime-contract-reviewer | Claude | infra_runtime | Exception class and public API contract check |
| template-sync-validator | Claude | template_transitional | Public template sync sanitize validation |
The execution_backend field in each agent's frontmatter is the single declarative surface that
tells the orchestrator whether to dispatch via the Codex CLI subprocess or a direct Claude agent
session. The agent loader (tools/agent_loader) validates this frontmatter on demand.
Dispatch rule (ADR-09): Any agent declaring execution_backend: codex must be dispatched
via the codex exec subprocess pattern, NOT via Claude's direct Agent tool. Violation logs are
written to tasks/log/dispatch-log.jsonl.
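A minimal sketch of the ADR-09 check, assuming flat `key: value` frontmatter between `---` fences. The real parser lives in tools/agent_loader and may validate more fields; the route labels "codex exec" and "claude-agent" and the JSONL record fields below are hypothetical.

```python
import json
import time
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Collect flat `key: value` pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def choose_dispatch(agent_md: str) -> str:
    """Map the declared execution_backend to a dispatch route label."""
    backend = parse_frontmatter(agent_md).get("execution_backend")
    if backend == "codex":
        return "codex exec"  # ADR-09: Codex agents must go through the subprocess
    if backend == "claude":
        return "claude-agent"  # hypothetical label for a direct Claude session
    raise ValueError(f"missing or unknown execution_backend: {backend!r}")


def check_dispatch(agent: str, agent_md: str, used_route: str, log_path: Path) -> bool:
    """Return True if the route matches; otherwise append a JSONL violation record."""
    expected = choose_dispatch(agent_md)
    if used_route == expected:
        return True
    log_path.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": time.time(), "agent": agent, "expected": expected, "used": used_route}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return False
```

Failing loudly on a missing backend, rather than defaulting to one, matches the README's point that the orchestrator should never have to guess at runtime.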
| Skill | Trigger keywords | Absorbs legacy slugs |
|---|---|---|
| design | design, brainstorm, architecture, plan, interview | brainstorming, writing-plans, deep-interview, + more |
| verify | verify, done check, proof, spec check, audit | verification, verify-against-spec, self-audit, review-code |
| code-review | review, PR, quality, merge check | — |
| testing | test, TDD, unit test, integration test | — |
| ui-design | UI, UX, interface, wireframe, component spec | ux-ui-design |
| governance | CLAUDE.md, AGENTS.md, fleet governance, project doctor | fleet-instruction-doc-ops, project-doctor, + more |
| knowledge | knowledge, wiki, ingest, knowledge graph | knowledge-ingest, knowledge-lint |
| automation | runner, long-running, background, monitor, browser | runner, browser-automation |
| efficiency | token efficiency, AI readiness, cost analysis | improve-token-efficiency, ai-readiness-cartography |
| bluebricks | code, debug, refactor, architecture, module | ai-ready-bluebricks-development |
| commit | commit, git, save changes | git-commit |
Skills load only when triggered. Body lives at .claude/skills/<name>/SKILL.md.
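One plausible way to implement trigger matching is a case-insensitive whole-word search over the keywords in the table. The matching rule here is an assumption (the real loader may rank, weight, or stem keywords); the keyword lists are copied from the table.

```python
import re

# Keyword lists copied from the skill table; the matching rule is an assumption.
TRIGGERS = {
    "design": ["design", "brainstorm", "architecture", "plan", "interview"],
    "verify": ["verify", "done check", "proof", "spec check", "audit"],
    "code-review": ["review", "PR", "quality", "merge check"],
    "testing": ["test", "TDD", "unit test", "integration test"],
    "ui-design": ["UI", "UX", "interface", "wireframe", "component spec"],
    "governance": ["CLAUDE.md", "AGENTS.md", "fleet governance", "project doctor"],
    "knowledge": ["knowledge", "wiki", "ingest", "knowledge graph"],
    "automation": ["runner", "long-running", "background", "monitor", "browser"],
    "efficiency": ["token efficiency", "AI readiness", "cost analysis"],
    "bluebricks": ["code", "debug", "refactor", "architecture", "module"],
    "commit": ["commit", "git", "save changes"],
}


def skills_for(request: str) -> list[str]:
    """Return every skill with at least one whole-word keyword hit, in table order."""
    hits: list[str] = []
    for skill, words in TRIGGERS.items():
        for word in words:
            if re.search(rf"\b{re.escape(word)}\b", request, re.IGNORECASE):
                hits.append(skill)
                break  # one hit is enough to load this skill
    return hits
```

A request such as "refactor the architecture" loads both design and bluebricks, since the table lists architecture under both; a request with no keyword loads nothing and costs no tokens.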
The registry uses a per-family JSON split (ADR-15 v3.7):
```text
config/
  repo-agent-management.json         # root catalog (agents, skills, templates)
  repo-agent-management.schema.json
  repos/                             # one file per family (empty in template)
    <your-repo>.json                 # per-family entry (add when you fork)
```
Each per-family file declares:
- active_agents — which agents are enabled (subset of catalog)
- active_skills — which skill groups are enabled
- agent_overrides.add_specialists — opt-in specialists beyond the template default
- agent_overrides.scope_patterns_overrides — per-specialist file-scope narrowing
- orchestration_profile — standard / bounded / minimal
To validate the registry:
python3 scripts/verify_repo_agent_management.py.
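The verifier's internals are not described here, so the following is only a sketch of the kind of per-family consistency check it might perform. It assumes, hypothetically, that the root catalog exposes flat "agents" and "skills" name lists.

```python
VALID_PROFILES = {"standard", "bounded", "minimal"}


def check_family(entry: dict, catalog: dict) -> list[str]:
    """Return consistency errors for one config/repos/<slug>.json entry."""
    errors: list[str] = []
    known_agents = set(catalog.get("agents", []))  # assumed catalog shape
    known_skills = set(catalog.get("skills", []))  # assumed catalog shape
    # active_agents must be a subset of the catalog.
    for agent in entry.get("active_agents", []):
        if agent not in known_agents:
            errors.append(f"unknown agent: {agent}")
    # active_skills must name skill groups the catalog knows about.
    for skill in entry.get("active_skills", []):
        if skill not in known_skills:
            errors.append(f"unknown skill: {skill}")
    # orchestration_profile is optional here but must be one of three values.
    profile = entry.get("orchestration_profile")
    if profile is not None and profile not in VALID_PROFILES:
        errors.append(f"invalid orchestration_profile: {profile}")
    return errors
```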
```text
├── CLAUDE.md                 # Claude Code workspace rules (orchestration, role policy, gates)
├── AGENTS.md                 # Codex CLI mirror — same rules, Codex-flavored
├── ARCHITECTURE.md           # component map — Conductor / Engine / Worker layers
├── setup.sh                  # one-command bootstrap
├── README.md                 # (this file)
├── LICENSE                   # MIT
├── CONTRIBUTING.md           # how to extend the template
│
├── .claude/                  # Claude Code surface
│   ├── settings.json         # hook + permission config (9 hook surfaces)
│   ├── hooks/                # shell scripts (PreToolUse, PostToolUse, ...)
│   ├── skills/               # 11 trigger-loaded skill groups
│   └── agents/               # 12 sub-agent personas
│
├── .codex/                   # Codex CLI surface
│   ├── hooks.json            # 6-trigger mirror of .claude/settings.json
│   └── agents/               # 12 .toml mirrors of .claude/agents/*.md
│
├── .ai-harness/              # the rules (CLI-agnostic)
│   ├── common-ai-rules.md    # loaded on every task
│   ├── development-ai-rules.md  # loaded on code tasks
│   ├── deny-list.yaml        # destructive patterns the hook blocks
│   ├── tdd-matrix.md         # the 12-category TDD ledger spec
│   ├── session-closeout.md   # end-of-session checklist
│   └── failure-patterns.md   # recurring AI mistakes worth pinning
│
├── config/                   # agent-management registry
│   ├── repo-agent-management.json         # root catalog
│   ├── repo-agent-management.schema.json  # JSONSchema
│   └── repos/                # per-family entries (empty in template)
│
├── tools/                    # harness tooling
│   ├── catalog_loader.py     # ADR-15 v3.7 per-family catalog aggregator
│   ├── agent_loader/         # ADR-09 frontmatter parser + validator
│   └── profile_compiler/     # role-policy compiler stub (extend for your fleet)
│
├── scripts/
│   └── verify_repo_agent_management.py  # registry verifier
│
├── tasks/                    # the working ledger
│   ├── plan.md               # current phase summary
│   ├── tdd.json              # composite TDD ledger (the gate)
│   ├── change_log.md         # what changed and why
│   ├── lessons.md            # patterns promoted to rules
│   ├── sessions/             # session snapshots
│   └── handoffs/             # inter-session handoffs
│
├── docs/                     # prose memory + generated md projections
│   ├── memory-map.md         # keyword → file index (generated from .mir/memory.db)
│   └── decisions/            # ADRs
│
├── .mir/                     # canonical memory DB (.mir/memory.db, gitignored)
│
└── examples/                 # short walk-throughs
```
Before Claude Code or Codex CLI runs Bash, Edit, Write, or apply_patch, the hook reads
the deny-list and:
- blocks patterns marked severity: block (e.g. rm -rf /, git push --force)
- warns on severity: warn
- exits 0 otherwise
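The three-way decision can be sketched as follows. The rule fields mirror the id / pattern / severity / reason layout of .ai-harness/deny-list.yaml, but the specific regexes are illustrative, not the shipped list. Returning exit code 2 for a block follows Claude Code's convention for blocking PreToolUse hooks; that Codex CLI reads it the same way is an assumption based on the shared wire format described above.

```python
import re
from dataclasses import dataclass


@dataclass
class DenyRule:
    id: str
    pattern: str   # regex, as in .ai-harness/deny-list.yaml
    severity: str  # "block" or "warn"
    reason: str


# Illustrative rules only; the real list lives in .ai-harness/deny-list.yaml.
RULES = [
    DenyRule("rm-root", r"\brm\s+-rf\s+/(\s|$)", "block", "recursive delete of /"),
    DenyRule("force-push", r"\bgit\s+push\b.*\s(--force|-f)(\s|$)", "block", "rewrites shared history"),
    DenyRule("no-verify", r"--no-verify\b", "block", "bypasses hooks"),
    DenyRule("curl-pipe-sh", r"\bcurl\b.*\|\s*(ba)?sh\b", "warn", "pipes a remote script to a shell"),
]


def evaluate(command: str, rules: list[DenyRule] = RULES) -> tuple[int, list[str]]:
    """Return (exit_code, messages): 2 if any block rule matches, else 0."""
    messages: list[str] = []
    blocked = False
    for rule in rules:
        if re.search(rule.pattern, command):
            messages.append(f"[{rule.severity}] {rule.id}: {rule.reason}")
            blocked = blocked or rule.severity == "block"
    # Warnings are surfaced to the agent but do not stop the tool call.
    return (2 if blocked else 0), messages
```

Note that the force-push pattern deliberately lets --force-with-lease through and that --no-verify is itself a blocked pattern, so the gate cannot be bypassed by asking the agent to skip hooks.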
Code paths (tools/, src/, lib/) additionally require an active Codex session; direct
Claude Edit/Write to those paths is blocked.
After every Edit/Write, the hook scans the changed file for debug statements (console.log,
print( in non-test code) and credential-shaped strings (AWS keys, JWTs, etc.). Flags are
surfaced to the agent so it has a chance to clean up before commit.
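A sketch of the post-edit scan, assuming plain regex detection. The AKIA/ASIA prefix followed by 16 uppercase alphanumerics is the well-known shape of an AWS access key ID, and JWTs begin with a base64url-encoded "eyJ" header; the test-file heuristic and the flag format are hypothetical.

```python
import re
from pathlib import PurePath

# Debug-statement patterns per file suffix (illustrative subset).
DEBUG_PATTERNS = {
    ".js": re.compile(r"\bconsole\.log\("),
    ".ts": re.compile(r"\bconsole\.log\("),
    ".py": re.compile(r"^[ \t]*print\(", re.MULTILINE),
}
# Credential-shaped strings, checked in every file.
SECRET_PATTERNS = {
    "aws-access-key-id": re.compile(r"\b(AKIA|ASIA)[0-9A-Z]{16}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def is_test_file(path: str) -> bool:
    """Hypothetical heuristic: tests/ directory, test_*.py, or *.test.* files."""
    p = PurePath(path)
    return "tests" in p.parts or p.name.startswith("test_") or ".test." in p.name


def scan(path: str, text: str) -> list[str]:
    """Return `path:line: message` flags for debug statements and secrets."""
    flags: list[str] = []
    debug = DEBUG_PATTERNS.get(PurePath(path).suffix)
    if debug and not is_test_file(path):
        for m in debug.finditer(text):
            line = text.count("\n", 0, m.start()) + 1
            flags.append(f"{path}:{line}: debug statement")
    for name, rx in SECRET_PATTERNS.items():
        for m in rx.finditer(text):
            line = text.count("\n", 0, m.start()) + 1
            flags.append(f"{path}:{line}: possible {name}")
    return flags
```

Secrets are flagged even in test files, since a leaked key in a fixture is still a leaked key.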
Implementation files (anything under src/, app/, or lib/ ending in .py/.ts/.go/…)
are blocked from editing unless tasks/tdd.json contains a change entry whose targets list
includes the file. Planning is required before coding.
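The gate reduces to a membership test. This sketch assumes a ledger shaped like {"changes": [{"id": ..., "targets": [...]}]} and repo-relative POSIX paths; the suffix set is a placeholder for the open-ended extension list above, so extend it for your stack.

```python
from pathlib import PurePosixPath

IMPL_ROOTS = ("src", "app", "lib")
IMPL_SUFFIXES = {".py", ".ts", ".go"}  # assumption: extend to match your stack


def is_implementation(path: str) -> bool:
    """True for implementation files under src/, app/, or lib/."""
    p = PurePosixPath(path)
    return bool(p.parts) and p.parts[0] in IMPL_ROOTS and p.suffix in IMPL_SUFFIXES


def edit_allowed(path: str, ledger: dict) -> bool:
    """Allow non-implementation edits; otherwise require a covering ledger entry."""
    if not is_implementation(path):
        return True
    return any(path in change.get("targets", []) for change in ledger.get("changes", []))
```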
On git commit, the hook walks the changed files, finds the matching ledger entry, and runs its
categories.*.command strings. If any category marked pass fails its command, the commit is
blocked.
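A sketch of the commit-time walk, under the same assumed ledger shape, with each category stored as {"status": ..., "command": ...} under a categories key (the status field name is an assumption). Commands run through the shell via subprocess.run, and any non-zero exit is reported as a failure.

```python
import subprocess


def verify_commit(changed: list[str], ledger: dict) -> list[str]:
    """Return failure messages; an empty list means the commit may proceed."""
    failures: list[str] = []
    for change in ledger.get("changes", []):
        # Only entries whose targets overlap the staged files are relevant.
        if not set(change.get("targets", [])) & set(changed):
            continue
        for name, cat in change.get("categories", {}).items():
            if cat.get("status") != "pass":
                continue  # covered_existing / not_applicable run nothing
            cmd = cat["command"]
            result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
            if result.returncode != 0:
                failures.append(f"{change.get('id', '?')}/{name}: `{cmd}` exited {result.returncode}")
    return failures
```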
The ledger has 12 categories — unit, integration, e2e, browser, edge, architecture,
availability, load, soak, security, compatibility, transaction_locking. Each is either
pass (with a runnable command), covered_existing, or not_applicable (with a written reason).
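The per-category rule is mechanical enough to validate before any code is written. This sketch assumes each category is an object with a status field plus command (for pass) or reason (for not_applicable); those field names are assumptions, while the 12 category names and three statuses come from the text above.

```python
CATEGORIES = (
    "unit", "integration", "e2e", "browser", "edge", "architecture",
    "availability", "load", "soak", "security", "compatibility",
    "transaction_locking",
)
STATUSES = ("pass", "covered_existing", "not_applicable")


def validate_entry(categories: dict) -> list[str]:
    """Check one ledger entry's categories against the 12-category contract."""
    errors: list[str] = []
    for name in CATEGORIES:
        cat = categories.get(name)
        if cat is None:
            errors.append(f"{name}: missing")
            continue
        status = cat.get("status")
        if status == "pass" and not cat.get("command"):
            errors.append(f"{name}: pass requires a runnable command")
        elif status == "not_applicable" and not cat.get("reason"):
            errors.append(f"{name}: not_applicable requires a written reason")
        elif status not in STATUSES:
            errors.append(f"{name}: unknown status {status!r}")
    # Anything outside the fixed matrix is rejected rather than ignored.
    for name in sorted(set(categories) - set(CATEGORIES)):
        errors.append(f"{name}: not a ledger category")
    return errors
```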
Create config/repos/my-repo.json:
```json
{
  "slug": "my-repo",
  "display_name": "My Repo",
  "registry_path": "/path/to/my-repo",
  "profile_slug": "my-repo",
  "repository_type": "code_app",
  "rollout_class": "immediate_migrate",
  "overlay_archetype": "code_app",
  "status": "active",
  "management_template_id": "code_app",
  "management_mode": "harness-managed",
  "profile_source": {"kind": "live-profile", "path": ".mir/repo-profile.toml"},
  "managed_domains": [
    "central_ownership_contract",
    "repository_overlay",
    "generation_verification_pipeline",
    "operating_contract",
    "harness_structure",
    "harness_format",
    "agent_management"
  ],
  "fleet_management": {
    "active_target": true,
    "control_repo": false,
    "runtime_contract_exception": false,
    "diet_mode": "normal"
  },
  "exception_review": {
    "requires_repo_specific_runtime_review": false,
    "protected_categories": []
  },
  "evidence_trace": {
    "source_documents": [],
    "open_questions": [],
    "assumptions": []
  },
  "notes": [],
  "active_agents": ["main-orchestrator", "executor-agent", "codex-final-reviewer", "quality-agent"],
  "active_skills": ["design", "verify", "testing", "code-review", "bluebricks"]
}
```

Edit .ai-harness/deny-list.yaml to add or remove patterns the pre-tool-use hook blocks. Each
entry has id, pattern (regex), severity (block / warn), and reason.
Edit CLAUDE.md (the role policy table) and AGENTS.md (the Codex-side mirror).
Run python3 scripts/verify_repo_agent_management.py to confirm the registry is consistent.
This template is opinionated about one specific thing: enforcement. It exists because advisory rules in markdown — "please don't push to main", "remember to write tests" — are read by AI agents the same way humans read EULAs. Rules need to be code that runs, not text that gets glanced at.
Specifically, this template is the only one in the comparison table that:
- Wires both Claude Code and Codex CLI to the same hook scripts, so when you fix a deny-list pattern it fixes both CLIs without a copy.
- Gates implementation edits behind a typed TDD ledger (tasks/tdd.json), not a free-form list. The 12-category matrix is the contract.
- Carries a 12-agent catalog with declared execution backends so the orchestrator knows at
dispatch time whether to use claude or codex exec — no runtime guessing.
- Treats hook bypass attempts (e.g. --no-verify) as deny-list patterns themselves, so the gate cannot be lifted by inviting the agent to lift it.
- Ships in a form you can rip out. There is no runtime, no service, no schema migration.
Delete .claude/, .codex/, .ai-harness/, config/, tools/ and your repo behaves like a normal repo again.
Good fit:
- A repo where you run both Claude Code and Codex CLI and want them to stay coherent.
- A team where "please don't" notes have failed before.
- A project that can express its TDD plan in 12 categories before each change.
- A solo developer who wants the Saturday-morning AI-edit session to not destroy Friday-night's work.
Not a good fit if:
- You need a single-agent setup with no enforcement (use Claude Code default).
- You want a managed multi-agent platform (use Archon, the AutoGPT family, etc.).
- Your project does not have a TDD culture and cannot adopt one — the gates here will fight you the whole way.
- You need cross-language hooks beyond shell (the hook scripts are bash).
See CONTRIBUTING.md. Issues and PRs welcome — particularly adding new deny-list patterns, new skills, and new examples. Avoid implementation-specific code; this is a template, not a runtime.
MIT — see LICENSE. Use it, fork it, strip it for parts.