Two-tier autonomous research: experiments + meta-analysis
Built on Karpathy's autoresearch and ShinMegamiBoson's OpenPlanter. deep-loop combines both into a single system where one agent runs experiments and another periodically steps back to find what's working and generate smarter hypotheses.
deep-loop has two modes:
- ML mode: A Claude Code agent modifies train.py, runs 5-minute training jobs on a GPU, logs results, and iterates. Every 12 experiments (or after a significant improvement), OpenPlanter analyzes all results to identify patterns and propose better hypotheses.
- Domain research mode: A Claude Code agent picks questions from program.md, searches the web, synthesizes findings into report.md, and records entries in knowledge_index.tsv. Every 5 entries, the meta-analysis tier evaluates research quality and rewrites the strategy.
Both modes share the same core loop: experiment → record → meta-analyze → adapt → repeat.
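The trigger for the meta-analysis step can be sketched in a few lines of Python. This is only an illustration of the schedule described above, not the code in orchestrate.py: the `MetaSchedule` class and its method names are hypothetical, and whether the counter resets after an improvement-triggered run is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MetaSchedule:
    """Hypothetical helper: decides when the meta-analysis tier should run."""

    interval: int          # 12 experiments in ML mode, 5 entries in research mode
    last_meta_at: int = 0  # record count at the previous meta-analysis (assumed reset point)

    def due(self, n_records: int, significant_improvement: bool = False) -> bool:
        """Return True when meta-analysis should run for the current record count."""
        if significant_improvement:
            return True
        return n_records - self.last_meta_at >= self.interval

    def mark_done(self, n_records: int) -> None:
        """Remember the record count at which meta-analysis last ran."""
        self.last_meta_at = n_records


ml = MetaSchedule(interval=12)
research = MetaSchedule(interval=5)
```

In the loop, the agent would call `due(...)` after each recorded experiment or entry and `mark_done(...)` once the meta-analysis has finished.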
    ┌─────────────────────────────────────────────────────────────────┐
    │                         orchestrate.py                          │
    │                     (launches and monitors)                     │
    └─────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │                        Claude Code Agent                        │
    │                                                                 │
    │  reads: program.md (research direction / hypotheses)            │
    │  edits: train.py (ML mode) or report.md (research mode)         │
    │  runs:  uv run train.py (ML) or web search (research)           │
    │  logs:  results.tsv (ML) or knowledge_index.tsv (research)      │
    │                                                                 │
    │  At regular intervals:                                          │
    │  ─────────────────────────────────────────────────────────────  │
    │                                │                                │
    │                                ▼                                │
    │  ┌───────────────────────────────────────────────────────────┐  │
    │  │                      meta_analyze.py                      │  │
    │  │                                                           │  │
    │  │  reads:  results + git log + current state                │  │
    │  │  calls:  OpenPlanter or Anthropic API                     │  │
    │  │  writes: next-hypotheses.md / rewrites program.md         │  │
    │  └───────────────────────────────────────────────────────────┘  │
    │                                │                                │
    │                                ▼                                │
    │  Agent adapts strategy based on meta-analysis output            │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘
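As a rough illustration of the "reads: results" step in the diagram, the sketch below condenses an experiment log into a one-line summary that could be handed to the meta-analysis model. It is not the actual meta_analyze.py: the function name `summarize_results` is hypothetical, and the column names `val_bpb` and `description` are assumptions about the layout of results.tsv. It also assumes lower val_bpb is better.

```python
import csv
import io


def summarize_results(tsv_text: str) -> str:
    """Summarize a tab-separated experiment log (assumed columns: val_bpb, description)."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    scored = []
    for row in rows:
        try:
            scored.append((float(row["val_bpb"]), row["description"]))
        except (KeyError, TypeError, ValueError):
            continue  # skip crashed or malformed runs that have no numeric score
    if not scored:
        return "No completed experiments yet."
    # Assumption: lower validation bits-per-byte is better.
    best_bpb, best_desc = min(scored, key=lambda item: item[0])
    return f"{len(scored)} completed experiments; best val_bpb {best_bpb:.4f} ({best_desc})"
```

A summary like this keeps the prompt short while still telling the meta-analysis tier which change produced the best run so far.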
    # Clone and set up
    git clone https://github.com/kilroycreative/deep-loop && cd deep-loop
    ./setup.sh

    # Run in tmux
    tmux new-session -s deep-loop
    python orchestrate.py --tag exp1

    # Monitor from another terminal
    python orchestrate.py --status

For research mode:

- Edit `program.md`: replace `YOUR_TOPIC_HERE` with your research topic
- Run `/loop` in Claude Code (reads `.claude/commands/loop.md`)
- The agent will search, synthesize, and build `report.md` autonomously
- Every 5 entries, meta-analysis evaluates and adapts the strategy

| File | Role | Modify? |
|---|---|---|
| `program.md` | Research direction + strategy (mutable by meta-agent) | Set topic, then let meta-agent evolve |
| `CLAUDE.md` | Research constitution (invariants) | Rarely |
| `train.py` | Model + training loop (ML mode) | Agent only |
| `prepare.py` | Data pipeline + eval (from autoresearch) | NEVER |
| `orchestrate.py` | Main entry point, launches agent | No |
| `meta_analyze.py` | Meta-analysis integration | No |
| `notify.py` | Sends events on breakthroughs | No |
| `report.md` | Research output (research mode) | Agent only |
| `knowledge_index.tsv` | Research audit trail | Agent only |
| `process_log.md` | Meta-analysis methodology log | `meta_analyze.py` only |
| `results.tsv` | Experiment log (ML mode) | Agent only |
| `openplanter/` | OpenPlanter agent source | No |

For an overnight ML run (~12 hours, ~100+ experiments):
| Component | Cost |
|---|---|
| H100 compute (~12h @ $2/hr) | ~$24 |
| Claude Opus agent (experiments) | ~$20 |
| Claude Sonnet meta-analysis (~8 runs) | ~$0.50 |
| Total | ~$45 |
Domain research mode costs vary by topic breadth — roughly $5–15 of API usage per 20-entry research session.
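The overnight ML estimate is simple arithmetic, reproduced here as a small Python sketch so the numbers can be adjusted for a different GPU rate or run length. The function name and parameters are hypothetical. The defaults are the figures from the table, and the meta-analysis run count assumes one run per 12 experiments with no extra improvement-triggered runs.

```python
def estimate_overnight_cost(
    gpu_hours: float = 12,      # ~12 hour run
    gpu_rate: float = 2.00,     # H100 at ~$2/hr
    agent_cost: float = 20.00,  # Claude Opus agent (experiments)
    meta_cost: float = 0.50,    # Claude Sonnet meta-analysis, all runs combined
    experiments: int = 100,     # ~100 experiments overnight
    meta_interval: int = 12,    # meta-analysis every 12 experiments
) -> dict:
    """Return the cost breakdown for one overnight ML run, in US dollars."""
    gpu_cost = gpu_hours * gpu_rate
    return {
        "gpu": gpu_cost,
        "meta_runs": experiments // meta_interval,
        "total": gpu_cost + agent_cost + meta_cost,
    }


estimate = estimate_overnight_cost()
print(estimate)
```

With the default figures this gives $24 of compute, 8 meta-analysis runs, and a total of $44.50, which the table rounds to ~$45.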
When breakthroughs occur, notify.py pings OpenClaw:
    # ML mode: val_bpb threshold crossed
    python notify.py --event breakthrough --val 0.9891

    # Research mode: significant insight
    python notify.py --event breakthrough --val "insight: discovered dominant standard"

Command reference:

    # ML: Start experiment loop
    python orchestrate.py --tag <name>

    # ML: Check progress
    python orchestrate.py --status

    # ML: Run meta-analysis manually
    python orchestrate.py --meta-only

    # Research: Start autonomous research loop
    # (run /loop in Claude Code)

- autoresearch by Andrej Karpathy: the autonomous ML experiment loop that deep-loop's inner tier is built on
- OpenPlanter by ShinMegamiBoson: the meta-analysis agent that powers deep-loop's outer tier

MIT