Arbor
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
Arbor is an autonomous research agent that turns a long-horizon objective into a cumulative search. Give it a benchmark and a goal; it proposes hypotheses, edits code, runs real experiments, learns from the results, and keeps the improvements that hold up on held-out data.
Instead of one-shot attempts that forget what failed, Arbor grows a hypothesis tree: every idea becomes a branch — pruned if it fails, harvested if it works — and insights propagate back up the tree so later ideas start smarter.
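The tree mechanics described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Arbor's actual data model: the names `IdeaNode`, `record`, and `context` are hypothetical, and insights are plain strings here rather than LLM-written summaries.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IdeaNode:
    """One hypothesis in the tree (hypothetical structure, not Arbor's schema)."""

    title: str
    parent: Optional[IdeaNode] = None
    children: list[IdeaNode] = field(default_factory=list)
    status: str = "open"  # "open", "pruned", or "harvested"
    insights: list[str] = field(default_factory=list)

    def add_child(self, title: str) -> IdeaNode:
        """Branch a new idea off this one."""
        child = IdeaNode(title=title, parent=self)
        self.children.append(child)
        return child

    def record(self, succeeded: bool, insight: str) -> None:
        """Mark the outcome, then push the insight up to every ancestor."""
        self.status = "harvested" if succeeded else "pruned"
        node: Optional[IdeaNode] = self
        while node is not None:
            node.insights.append(insight)
            node = node.parent

    def context(self) -> list[str]:
        """Insights visible to this idea: its ancestors' plus its own, root first."""
        chain = []
        node: Optional[IdeaNode] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        seen: list[str] = []
        for ancestor in reversed(chain):
            for insight in ancestor.insights:
                if insight not in seen:  # an insight is stored at several levels
                    seen.append(insight)
        return seen
```

In this sketch a failed branch still leaves its insight on the shared parent, so a sibling idea created later reads it through `context()` before it starts.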
- Get running in minutes. Clone, `pip install -e .`, `arbor setup`, then `arbor`.
- Run your first study. Point Arbor at a benchmark and watch the Idea Tree grow.
- Understand the method. The arbor cycle, the Idea Tree, git isolation, and held-out discipline.
- Configure everything. Providers, budgets, timeouts, and human-in-the-loop modes.
Two cooperating agents
| Agent | Role |
|---|---|
| Coordinator | The research director. Maintains the Idea Tree, drives the search via the arbor cycle, and dispatches experiments. |
| Executor | The research engineer. Given one idea, it implements the code changes, runs the experiment in an isolated git worktree, and reports evidence. |
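The Executor's isolation step maps onto git's own `git worktree add` command. The sketch below shows one way to create a per-idea worktree on a dedicated branch. The `arbor/<idea-id>` branch naming, the sibling `-worktrees` directory, and both function names are assumptions made for illustration, not Arbor's actual layout.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, idea_id: str, base: str = "main") -> list[str]:
    """Build the git command that creates an isolated worktree for one idea.

    Branch and directory naming are hypothetical choices for this sketch.
    """
    branch = f"arbor/{idea_id}"
    path = repo.parent / f"{repo.name}-worktrees" / idea_id
    # `git worktree add -b <branch> <path> <commit-ish>` creates a new branch
    # starting at `base` and checks it out in a separate directory.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, idea_id: str, base: str = "main") -> Path:
    """Run the command and return the new worktree directory."""
    cmd = worktree_command(repo, idea_id, base)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if git fails
    return Path(cmd[-2])
```

Because the experiment's edits live on their own branch in their own directory, discarding a failed idea amounts to removing the worktree and deleting the branch, while the base branch stays as it was.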
Why Arbor
- Grows evidence, not logs. Results, failure modes, and distilled insights live in a persistent Idea Tree — not a scrollback buffer.
- Held-out discipline by default. Executors iterate on a dev split; only improvements that clear a configurable margin on a held-out test split are merged.
- Isolated, reversible experiments. Every experiment runs in its own git worktree on a dedicated branch. Your `main` is never touched until you merge.
- Backpropagated insight. After each experiment, an LLM abstracts what was learned and pushes it up the tree, so sibling and descendant ideas inherit hard-won context.
- Use any model. Anthropic, OpenAI / Responses API, or anything OpenAI-compatible through LiteLLM (DeepSeek, Gemini, Qwen, vLLM, Ollama, local gateways).
- Domain adaptation without code changes. A one-line `plugin:` retargets the agent; Skills are markdown playbooks loaded on demand.
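The held-out rule in the list above reduces to a small decision function. The following is a sketch under assumptions: the function name, the strict `>` comparison, and the `higher_is_better` flag are illustrative choices, and Arbor's real configuration keys may differ.

```python
def should_merge(
    baseline_test: float,
    candidate_test: float,
    margin: float = 0.0,
    higher_is_better: bool = True,
) -> bool:
    """Decide whether a candidate clears the margin on the held-out test split.

    Only test-split scores are compared here; dev-split scores guide the
    Executor's iteration but never trigger a merge in this sketch.
    """
    if higher_is_better:
        gain = candidate_test - baseline_test
    else:
        gain = baseline_test - candidate_test  # e.g. loss or error rate
    return gain > margin  # assumption: the margin must be strictly exceeded


# A 1.5-point accuracy gain clears a 1-point margin; a 0.5-point gain does not.
print(should_merge(0.80, 0.815, margin=0.01))
print(should_merge(0.80, 0.805, margin=0.01))
```

Keeping the gate on the test split only means an idea that overfits the dev split shows no held-out gain and is not merged.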
New here?
Start with Installation → Quickstart, then read How It Works to understand the moving parts.