Skip to content

Arbor

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Arbor is an autonomous research agent that turns a long-horizon objective into a cumulative search. Give it a benchmark and a goal; it proposes hypotheses, edits code, runs real experiments, learns from the results, and keeps the improvements that hold up on held-out data.

Instead of one-shot attempts that forget what failed, Arbor grows a hypothesis tree: every idea becomes a branch — pruned if it fails, harvested if it works — and insights propagate back up the tree so later ideas start smarter.

  • Get running in minutes

    Clone, pip install -e ., arbor setup, then arbor.

    Installation

  • Run your first study

    Point Arbor at a benchmark and watch the Idea Tree grow.

    Quickstart

  • Understand the method

    The arbor cycle, the Idea Tree, git isolation, and held-out discipline.

    How It Works

  • Configure everything

    Providers, budgets, timeouts, and human-in-the-loop modes.

    Configuration

Two cooperating agents

Agent Role
Coordinator The research director. Maintains the Idea Tree, drives the search via the arbor cycle, and dispatches experiments.
Executor The research engineer. Given one idea, it implements the code changes, runs the experiment in an isolated git worktree, and reports evidence.

Why Arbor

  • Grows evidence, not logs. Results, failure modes, and distilled insights live in a persistent Idea Tree — not a scrollback buffer.
  • Held-out discipline by default. Executors iterate on a dev split; only improvements that clear a configurable margin on a held-out test split are merged.
  • Isolated, reversible experiments. Every experiment runs in its own git worktree on a dedicated branch. Your main is never touched until you merge.
  • Backpropagated insight. After each experiment, an LLM abstracts what was learned and pushes it up the tree, so sibling and descendant ideas inherit hard-won context.
  • Use any model. Anthropic, OpenAI / Responses API, or anything OpenAI-compatible through LiteLLM (DeepSeek, Gemini, Qwen, vLLM, Ollama, local gateways).
  • Domain adaptation without code changes. A one-line plugin: retargets the agent; Skills are markdown playbooks loaded on demand.

New here?

Start with InstallationQuickstart, then read How It Works to understand the moving parts.