AiScientist: A File-as-Bus Research Lab

Long-horizon ML research needs File-as-Bus coordination, not just message handoffs.
Talk is cheap, show me your files.

AiScientist is built for long-horizon ML research engineering, where agents must maintain coherent progress across heterogeneous stages while preserving evolving project state over time.

Across a 24-hour autonomous run, AiScientist repeatedly implements, tests, keeps, and discards candidate ideas while pushing the running best upward. This trajectory shows long-horizon improvement through 78 experiment cycles, with diverse solution strategies explored along the way rather than a single lucky guess.

📰 News

2026-04-13: Initial public release of AiScientist.
2026-04-13: The public README now reflects the File-as-Bus thesis, hierarchical research team design, and long-horizon paper/MLE workflows.
Future updates will include benchmarks, release notes, and project milestones.

🔬 What AiScientist Is

AiScientist is an artifact-mediated virtual research lab for long-horizon ML research engineering. It treats long-horizon performance as a joint systems problem: agents must not only orchestrate the right expertise at the right stage, but also preserve evolving project state with enough fidelity for later decisions to stay coherent.

paper: given a paper markdown or bundle plus a GPU and time budget, AiScientist autonomously drives the full reproduction loop from reading and planning to implementation, experimentation, debugging, and final self-check.
mle: given an ML task plus a GPU and time budget, AiScientist autonomously conducts research for stronger solutions through repeated implementation-and-experiment cycles that improve the target metric over time.

File-as-Bus is the core coordination protocol. Instead of compressing progress into lossy conversational handoffs, AiScientist turns workspace files into the system of record for plans, code, experiments, logs, and validation artifacts.
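
To make the contrast with message handoffs concrete, here is a minimal sketch of the idea, not AiScientist's actual implementation. The `planner` and `implementer` functions and the plan contents are hypothetical; only the `agent/plan.md` location comes from the layout described in this README. One agent persists its full output to a workspace file and hands over only a path, and the next agent reads the file itself.

```python
from pathlib import Path
import tempfile


def planner(workspace: Path) -> str:
    """Hypothetical planning agent: writes the full plan to disk, returns a pointer."""
    plan_path = workspace / "agent" / "plan.md"
    plan_path.parent.mkdir(parents=True, exist_ok=True)
    plan_path.write_text(
        "# Plan\n1. Reproduce baseline\n2. Run ablation\n", encoding="utf-8"
    )
    # Only the relative path crosses the agent boundary, not the plan itself.
    return plan_path.relative_to(workspace).as_posix()


def implementer(workspace: Path, plan_ref: str) -> list[str]:
    """Hypothetical implementing agent: recovers the tasks from the file on disk."""
    text = (workspace / plan_ref).read_text(encoding="utf-8")
    # Keep only numbered lines such as "1. Reproduce baseline".
    return [line.split(". ", 1)[1] for line in text.splitlines() if line[:1].isdigit()]


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        return implementer(workspace, planner(workspace))
```

Because the plan lives in a file, a later agent or a human operator can reread it without relying on what an earlier prompt happened to summarize.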

A short look at AiScientist in motion.

AiScientist.mp4

✨ Why It Feels Different

Hierarchical Research Team

A hierarchical research team pairs a top-level Orchestrator with specialists and focused subagents to sustain coherent progress over multi-day workloads.

File-as-Bus Coordination

Agents coordinate through evolving workspace files instead of relying only on lossy message handoffs between prompts.

Workspace as System of Record

[Figure: the workspace as the system of record]

A permission-scoped workspace and compact workspace map keep plans, code, experiments, and validation as the durable source of truth for both agents and operators.

Thin Control over Thick State

The Orchestrator keeps control thin through stage-level directives, concise summaries, and a workspace map, while specialists progressively disclose thick state by reading task-relevant artifacts on demand.
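
The split between thin control and thick state can be sketched as follows. This is an illustrative assumption about the shape of the pattern, not the project's code: `experiment_specialist` and `orchestrate` are hypothetical names, and the detail lines are placeholders. The specialist appends everything it did to `agent/exp_log.md` and returns a one-line summary, so the controller's context grows by one line per directive.

```python
from pathlib import Path
import tempfile


def experiment_specialist(workspace: Path, directive: str) -> str:
    """Hypothetical specialist: thick detail goes to disk, a thin summary comes back."""
    log = workspace / "agent" / "exp_log.md"
    log.parent.mkdir(parents=True, exist_ok=True)
    details = [f"## {directive}"] + [f"step {i}: placeholder detail" for i in range(3)]
    with log.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(details) + "\n")
    return f"done: {directive} (details in agent/exp_log.md)"


def orchestrate(workspace: Path, directives: list[str]) -> list[str]:
    """Hypothetical controller: keeps only the summaries, never the full logs."""
    return [experiment_specialist(workspace, d) for d in directives]


def demo() -> tuple[list[str], str]:
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        summaries = orchestrate(workspace, ["run baseline", "run ablation"])
        log_text = (workspace / "agent" / "exp_log.md").read_text(encoding="utf-8")
        return summaries, log_text
```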

⚙️ How It Works

1. Stage the workspace. AiScientist stages the inputs into a permission-scoped workspace and builds a compact workspace map that acts as the lightweight entry point into the run state.
2. Launch the sandbox. A Docker sandbox mounts the workspace into canonical paths under /home, giving agents an isolated execution environment with shared persistent state.
3. Keep control thin. The Orchestrator makes stage-level decisions and delegates heavy work to specialists through the Agent-as-Tool pattern.
4. Keep state thick. Specialists and focused subagents coordinate through File-as-Bus artifacts: they read task-relevant files on demand and write back plans, code, experiments, logs, and validation results.
5. Leave an inspectable run behind. The run finishes with a workspace, logs, artifacts, and export bundle that can be resumed, validated, diffed, or audited without reconstructing state from memory.

This is the core shift from message handoffs to File-as-Bus coordination: control stays lightweight, while project state remains durable, readable, and reusable on disk.
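
One way to picture a compact workspace map is an index of relative paths and sizes that is cheap to hand to a controller, with file contents fetched only when a task needs them. The sketch below assumes that shape. The real map format is not specified in this README, and `build_workspace_map` and `read_on_demand` are hypothetical helpers.

```python
from pathlib import Path
import tempfile


def build_workspace_map(workspace: Path) -> dict[str, int]:
    """Map each file's workspace-relative path to its size in bytes."""
    return {
        p.relative_to(workspace).as_posix(): p.stat().st_size
        for p in sorted(workspace.rglob("*"))
        if p.is_file()
    }


def read_on_demand(workspace: Path, rel_path: str) -> str:
    """Load one artifact's content only when a task actually needs it."""
    return (workspace / rel_path).read_text(encoding="utf-8")


def demo() -> tuple[dict[str, int], str]:
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        (workspace / "agent").mkdir()
        (workspace / "agent" / "plan.md").write_text("step one\n", encoding="utf-8")
        (workspace / "agent" / "exp_log.md").write_text("", encoding="utf-8")
        workspace_map = build_workspace_map(workspace)
        return workspace_map, read_on_demand(workspace, "agent/plan.md")
```

The map stays small no matter how large the logs grow, which is the property that lets control stay lightweight.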

🧭 Two Tracks

AiScientist uses one control plane for two long-horizon workloads: paper reproduction and Kaggle-style MLE competitions.

| Track | Primary entrypoints | What the loop optimizes for | Validation endpoint |
| --- | --- | --- | --- |
| `paper` | `--paper-md`, `--zip` | turn paper context into a runnable reproduction through reading, planning, implementation, experimentation, debugging, and final self-check | final self-check plus `validation_report.json` |
| `mle` | exactly one of `--zip`, `--name`, `--workspace-zip`, `--competition-bundle-zip`, or `--data-dir` | search for stronger solutions through repeated implementation-and-experiment cycles that improve the target metric over time | submission-format or grading validation |

Both tracks share the same workspace model: durable files on disk become the common state that agents, operators, and validation flows can all inspect later.
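
The "exactly one of" rule for `mle` inputs can be illustrated with Python's standard `argparse` module. This is a sketch of the constraint only, not the actual `aisci` CLI code: the flag names come from the table above, while the parser name and the `selected_input` helper are hypothetical.

```python
import argparse

# Flag names taken from the mle row of the table above.
MLE_INPUT_FLAGS = (
    "--zip",
    "--name",
    "--workspace-zip",
    "--competition-bundle-zip",
    "--data-dir",
)


def build_mle_input_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mle-input-sketch")
    # required=True rejects zero inputs; the group itself rejects two or more.
    group = parser.add_mutually_exclusive_group(required=True)
    for flag in MLE_INPUT_FLAGS:
        group.add_argument(flag)
    return parser


def selected_input(argv: list[str]) -> tuple[str, str]:
    """Return the single (source, value) pair the caller supplied."""
    args = build_mle_input_parser().parse_args(argv)
    chosen = {key: value for key, value in vars(args).items() if value is not None}
    ((key, value),) = chosen.items()
    return key, value
```

With this sketch, `selected_input(["--zip", "competition.zip"])` succeeds, while passing two sources or none makes `argparse` exit with a usage error.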

Paper Track

paper is the paper-grounded long-horizon ML research track. Starting from --paper-md or a bundled --zip, AiScientist carries work across paper understanding, task planning, implementation, experimentation, debugging, and final self-check under a fixed compute and time budget.

MLE Track

mle is the competition-style long-horizon ML engineering track. Starting from the most self-contained --zip path or a prepared-cache --name, AiScientist iterates through implementation-and-experiment cycles to explore stronger solutions and continuously improve the target metric over time.

💾 What Lands On Disk

Each run leaves a concrete, inspectable tree under jobs/<job_id>/. The full job directory is the durable run record, but workspace/ is the agent-visible File-as-Bus: it is where plans, code, experiments, and submissions persist as the primary system of record for ongoing coordination.

jobs/<job_id>/
├── input/
├── workspace/                  # primary File-as-Bus / system of record
│   ├── paper/ or data/
│   ├── code/                    # mle
│   ├── submission/
│   │   ├── submission.csv
│   │   ├── submission_registry.jsonl
│   │   └── candidates/          # mle
│   └── agent/
│       ├── paper_analysis/ or analysis/
│       ├── prioritized_tasks.md
│       ├── plan.md
│       ├── impl_log.md
│       ├── exp_log.md
│       └── final_self_check.{md,json}   # paper
├── logs/                        # operator / trace layer
├── artifacts/                   # validation / champion reports
├── export/                      # packaged outputs
└── state/                       # host-side runtime metadata

The files inside workspace/ are the bus:

analysis becomes workspace/agent/paper_analysis/*.md for paper and workspace/agent/analysis/summary.md for mle
planning becomes workspace/agent/prioritized_tasks.md and, when needed, workspace/agent/plan.md
implementation and experiments become workspace/agent/impl_log.md and workspace/agent/exp_log.md
MLE candidate search becomes workspace/submission/submission.csv, workspace/submission/submission_registry.jsonl, and workspace/submission/candidates/
paper reproducibility becomes workspace/agent/final_self_check.md, workspace/agent/final_self_check.json, and workspace/submission/reproduce.sh

Outside the bus, the host still preserves logs/, artifacts/, and state/ so the run can be inspected, resumed, validated, exported, and audited later.
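
Since the bus is just files, a finished or in-progress job can be checked with a few lines of standard-library Python. The sketch below is an assumption-laden helper, not part of AiScientist: `missing_bus_files` is a hypothetical name, and the path lists are copied from the description above. A missing file may simply mean the run has not reached that stage yet.

```python
from pathlib import Path
import tempfile

# Paths copied from the layout above. plan.md is left out because the text
# says it is written only "when needed".
COMMON_FILES = [
    "workspace/agent/prioritized_tasks.md",
    "workspace/agent/impl_log.md",
    "workspace/agent/exp_log.md",
]
TRACK_FILES = {
    "paper": [
        "workspace/agent/final_self_check.md",
        "workspace/agent/final_self_check.json",
        "workspace/submission/reproduce.sh",
    ],
    "mle": [
        "workspace/agent/analysis/summary.md",
        "workspace/submission/submission.csv",
        "workspace/submission/submission_registry.jsonl",
    ],
}


def missing_bus_files(job_dir: Path, track: str) -> list[str]:
    """List the expected bus files that do not exist under job_dir."""
    expected = COMMON_FILES + TRACK_FILES[track]
    return [rel for rel in expected if not (job_dir / rel).is_file()]


def demo() -> list[str]:
    # Build a fake job directory that has only the shared agent files.
    with tempfile.TemporaryDirectory() as tmp:
        job_dir = Path(tmp)
        for rel in COMMON_FILES:
            path = job_dir / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("", encoding="utf-8")
        return missing_bus_files(job_dir, "mle")
```

In practice you would call `missing_bus_files(Path("jobs") / job_id, "paper")` or the `mle` equivalent against a real job directory.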

🚀 Quick Start

Environment Note
The current Dockerfiles are still tuned for our operator environment. Both docker/paper-agent.Dockerfile and docker/mle-agent.Dockerfile reference internal Ubuntu images and package mirrors. If you are outside that environment, replace those base-image and mirror lines before the first build. See the full notes in the Operator Guide.

Profile Note
The shipped LLM defaults are not symmetric: paper=glm-5, mle=gpt-5.4 in config/llm_profiles.yaml. If you only have OPENAI_API_KEY, run paper commands with --llm-profile gpt-5.4 and use AISCI_PAPER_DOCTOR_PROFILE=gpt-5.4 for paper doctor, or update the default profile locally.

The main README keeps only the shortest runnable happy path. For the full setup, GPU and Docker prerequisites, profile caveats, example scripts, and validation/resume flows, use the Operator Guide.

1. Configure the host

git clone https://github.com/AweAI-Team/AiScientist.git
cd AiScientist

cp .env.example .env
# Fill either OpenAI or Azure OpenAI credentials.
uv sync --dev

Host-side requirements:

Python 3.12+
Docker with a reachable daemon
uv
API credentials for at least one configured LLM backend
Optional NVIDIA GPUs if you want GPU-bound runs, with NVIDIA Container Toolkit configured for Docker
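
Before building images, a quick preflight along these lines can catch a missing tool early. It is a sketch, not a replacement for the built-in doctor commands: `check_host` is a hypothetical helper, and it only confirms that `docker` and `uv` are on `PATH`. It does not verify that the Docker daemon is reachable, that credentials are set, or that GPUs are configured.

```python
import shutil
import sys


def check_host(which=shutil.which, version=sys.version_info) -> dict[str, bool]:
    """Report which basic host-side requirements appear to be satisfied."""
    return {
        "python>=3.12": tuple(version[:2]) >= (3, 12),
        "docker on PATH": which("docker") is not None,
        "uv on PATH": which("uv") is not None,
    }


if __name__ == "__main__":
    for requirement, ok in check_host().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {requirement}")
```

The `which` and `version` parameters exist only so the check can be exercised without touching the real host.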

2. Build the default runtime images

If you are not supplying your own runtime images, build the defaults:

bash docker/build_paper_image.sh
bash docker/build_mle_image.sh

The intended local tags are:

aisci-paper:latest
aisci-mle:test

3. Run the built-in health checks

AISCI_PAPER_DOCTOR_PROFILE=gpt-5.4 uv run aisci paper doctor
uv run aisci mle doctor

If you use the shipped Azure-backed glm-5 paper profile, you can drop the AISCI_PAPER_DOCTOR_PROFILE override.

4. Launch one paper run

uv run aisci --env-file .env paper run \
  --paper-md /abs/path/to/paper.md \
  --image aisci-paper:latest \
  --llm-profile gpt-5.4 \
  --gpu-ids 0 \
  --time-limit 24h \
  --wait \
  --tui

5. Launch one MLE run

Pass exactly one input source. This example uses --zip; for a prepared cache, replace it with --name <competition-slug>.

uv run aisci --env-file .env mle run \
  --zip /abs/path/to/competition.zip \
  --image aisci-mle:test \
  --llm-profile gpt-5.4 \
  --gpu-ids 0 \
  --time-limit 12h \
  --wait \
  --tui

🔍 Inspect, Resume, and Validate

Highest-signal inspection commands:

uv run aisci jobs list
uv run aisci jobs show <job_id>
uv run aisci logs tail <job_id> --kind conversation
uv run aisci artifacts ls <job_id>
uv run aisci export <job_id>

For validation, resume, lifecycle helpers, and detailed troubleshooting, see the Operator Guide.

🗺️ Repo Map

config/                   shared LLM, image, and paper-subagent registries
docker/                   default paper and MLE runtime image recipes
scripts/                  example launch scripts
src/aisci_app/            CLI, job service, presentation, TUI
src/aisci_core/           job models, paths, store, exporter, runner
src/aisci_runtime_docker/ Docker session manager and image profile resolver
src/aisci_domain_paper/   paper-grounded long-horizon ML research engineering
src/aisci_domain_mle/     competition-style long-horizon ML engineering
tests/                    host-side regression tests

AiScientist is opinionated enough to run real work, but still transparent enough that you can inspect every file the lab leaves behind.

❤️ Acknowledgments

AiScientist builds on prior work in research automation, evaluation, and ML task environments, especially:

We are grateful to the authors and maintainers of these projects for making this line of work more concrete, reproducible, and comparable.

📄 License

Released under the MIT License. See LICENSE.

📬 Contact

For questions, collaboration, or bug reports, please open an issue or email 📧 gx.chen.chn@gmail.com.

If AiScientist is useful in your research or engineering workflow, consider starring 🌟 the repo and citing the project.

Quick Start · Two Tracks · Operator Guide
