Long-horizon ML research needs File-as-Bus coordination, not just message handoffs.
Talk is cheap, show me your files.
AiScientist is built for long-horizon ML research engineering, where agents must maintain coherent progress across heterogeneous stages while preserving evolving project state over time.
Across a 24-hour autonomous run, AiScientist repeatedly implements, tests, keeps, and discards candidate ideas while pushing the running best upward. This trajectory shows long-horizon improvement through 78 experiment cycles, with diverse solution strategies explored along the way rather than a single lucky guess.
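The keep-or-discard pattern behind this trajectory can be sketched in a few lines. This is an illustrative toy, not the AiScientist API: the random score stands in for a full implement-and-evaluate cycle, and the function names are hypothetical.

```python
# Illustrative sketch of a keep-or-discard search loop under a cycle budget.
# The random score is a stand-in for "implement idea, run experiment, measure";
# none of these names come from the AiScientist codebase.
import random

def run_cycles(num_cycles: int, seed: int = 0) -> list[float]:
    """Track the running-best metric across experiment cycles."""
    rng = random.Random(seed)
    best = float("-inf")
    history = []
    for _ in range(num_cycles):
        score = rng.random()   # stand-in for implement + evaluate
        if score > best:       # keep improvements, discard regressions
            best = score
        history.append(best)   # the running best never decreases
    return history

trajectory = run_cycles(78)
```

Discarded ideas cost a cycle but never pull the running best down, which is why the trajectory improves monotonically even when most candidates fail.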
- 2026-04-13: Initial public release of AiScientist.
- 2026-04-13: The public README now reflects the File-as-Bus thesis, hierarchical research team design, and long-horizon paper/MLE workflows.
- Future updates will include benchmarks, release notes, and project milestones.
AiScientist is an artifact-mediated virtual research lab for long-horizon ML research engineering. It treats long-horizon performance as a joint systems problem: agents must not only orchestrate the right expertise at the right stage, but also preserve evolving project state with enough fidelity for later decisions to stay coherent.
- `paper`: given a paper markdown or bundle plus a GPU and time budget, AiScientist autonomously drives the full reproduction loop from reading and planning to implementation, experimentation, debugging, and final self-check.
- `mle`: given an ML task plus a GPU and time budget, AiScientist autonomously conducts research for stronger solutions through repeated implementation-and-experiment cycles that improve the target metric over time.
File-as-Bus is the core coordination protocol. Instead of compressing progress into lossy conversational handoffs, AiScientist turns workspace files into the system of record for plans, code, experiments, logs, and validation artifacts.
A short look at AiScientist in motion.
- Stage the workspace. AiScientist stages the inputs into a permission-scoped workspace and builds a compact `workspace map` that acts as the lightweight entry point into the run state.
- Launch the sandbox. A Docker sandbox mounts the workspace into canonical paths under `/home`, giving agents an isolated execution environment with shared persistent state.
- Keep control thin. The `Orchestrator` makes stage-level decisions and delegates heavy work to specialists through the `Agent-as-Tool` pattern.
- Keep state thick. Specialists and focused subagents coordinate through `File-as-Bus` artifacts: they read task-relevant files on demand and write back plans, code, experiments, logs, and validation results.
- Leave an inspectable run behind. The run finishes with a workspace, logs, artifacts, and an export bundle that can be resumed, validated, diffed, or audited without reconstructing state from memory.
This is the core shift from message handoffs to File-as-Bus coordination: control stays lightweight, while project state remains durable, readable, and reusable on disk.
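The shift above can be sketched in a few lines: one agent writes a durable artifact to the shared workspace, and a later agent reads it back from disk instead of relying on a conversational handoff. The helper functions and the `agent/plan.md` content here are illustrative, not the AiScientist API; only the path layout mirrors the workspace described below.

```python
# Minimal File-as-Bus sketch: state lives in files, not in messages.
# write_artifact/read_artifact are illustrative helpers, not AiScientist code.
from pathlib import Path
import tempfile

def write_artifact(workspace: Path, rel: str, text: str) -> Path:
    path = workspace / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")  # durable, inspectable state
    return path

def read_artifact(workspace: Path, rel: str) -> str:
    return (workspace / rel).read_text(encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    # A "planner" specialist records its plan on the bus...
    write_artifact(ws, "agent/plan.md", "1. Baseline\n2. Ablation\n")
    # ...and an "implementer" later picks it up with no message handoff.
    plan = read_artifact(ws, "agent/plan.md")
```

Because the plan survives on disk, any later stage (or a human operator) can re-read, diff, or audit it without replaying the conversation that produced it.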
AiScientist uses one control plane for two long-horizon workloads: paper reproduction and Kaggle-style MLE competitions.
| Track | Primary entrypoints | What the loop optimizes for | Validation endpoint |
|---|---|---|---|
| `paper` | `--paper-md`, `--zip` | turn paper context into a runnable reproduction through reading, planning, implementation, experimentation, debugging, and final self-check | final self-check plus `validation_report.json` |
| `mle` | exactly one of `--zip`, `--name`, `--workspace-zip`, `--competition-bundle-zip`, or `--data-dir` | search for stronger solutions through repeated implementation-and-experiment cycles that improve the target metric over time | submission-format or grading validation |
Both tracks share the same workspace model: durable files on disk become the common state that agents, operators, and validation flows can all inspect later.
`paper` is the paper-grounded long-horizon ML research track. Starting from `--paper-md` or a bundled `--zip`, AiScientist carries work across paper understanding, task planning, implementation, experimentation, debugging, and final self-check under a fixed compute and time budget.
`mle` is the competition-style long-horizon ML engineering track. Starting from the most self-contained `--zip` path or a prepared-cache `--name`, AiScientist iterates through implementation-and-experiment cycles to explore stronger solutions and continuously improve the target metric over time.
Each run leaves a concrete, inspectable tree under `jobs/<job_id>/`. The full job directory is the durable run record, but `workspace/` is the agent-visible File-as-Bus: it is where plans, code, experiments, and submissions persist as the primary system of record for ongoing coordination.
```
jobs/<job_id>/
├── input/
├── workspace/                        # primary File-as-Bus / system of record
│   ├── paper/ or data/
│   ├── code/                         # mle
│   ├── submission/
│   │   ├── submission.csv
│   │   ├── submission_registry.jsonl
│   │   └── candidates/               # mle
│   └── agent/
│       ├── paper_analysis/ or analysis/
│       ├── prioritized_tasks.md
│       ├── plan.md
│       ├── impl_log.md
│       ├── exp_log.md
│       └── final_self_check.{md,json}  # paper
├── logs/                             # operator / trace layer
├── artifacts/                        # validation / champion reports
├── export/                           # packaged outputs
└── state/                            # host-side runtime metadata
```
The files inside `workspace/` are the bus:
- analysis becomes `workspace/agent/paper_analysis/*.md` for `paper` and `workspace/agent/analysis/summary.md` for `mle`
- planning becomes `workspace/agent/prioritized_tasks.md` and, when needed, `workspace/agent/plan.md`
- implementation and experiments become `workspace/agent/impl_log.md` and `workspace/agent/exp_log.md`
- MLE candidate search becomes `workspace/submission/submission.csv`, `workspace/submission/submission_registry.jsonl`, and `workspace/submission/candidates/`
- paper reproducibility becomes `workspace/agent/final_self_check.md`, `workspace/agent/final_self_check.json`, and `workspace/submission/reproduce.sh`
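Because the registry is a JSON Lines file, picking the current champion is a one-liner over its records. A minimal sketch, assuming each line carries at least a candidate path and a metric score; the field names `path` and `score` are assumptions about the schema, not documented by AiScientist.

```python
# Illustrative: selecting the champion from submission_registry.jsonl.
# JSON Lines is implied by the .jsonl extension; the "path" and "score"
# field names are assumed for this sketch, not taken from the real schema.
import json

def best_candidate(registry_text: str) -> dict:
    """Return the record with the highest score from JSONL registry text."""
    records = [json.loads(line) for line in registry_text.splitlines() if line.strip()]
    return max(records, key=lambda r: r["score"])

sample = "\n".join([
    json.dumps({"path": "candidates/a.csv", "score": 0.71}),
    json.dumps({"path": "candidates/b.csv", "score": 0.84}),
])
champion = best_candidate(sample)
```

An append-only JSONL registry keeps the full candidate history on the bus, so the champion can always be re-derived from disk rather than remembered by an agent.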
Outside the bus, the host still preserves `logs/`, `artifacts/`, and `state/` so the run can be inspected, resumed, validated, exported, and audited later.
The main README keeps only the shortest runnable happy path. For the full setup, GPU and Docker prerequisites, profile caveats, example scripts, and validation/resume flows, use the Operator Guide.
```bash
git clone https://github.com/AweAI-Team/AiScientist.git
cd AiScientist
cp .env.example .env
# Fill either OpenAI or Azure OpenAI credentials.
uv sync --dev
```

Host-side requirements:
- Python 3.12+
- Docker with a reachable daemon
- `uv`
- API credentials for at least one configured LLM backend
- Optional NVIDIA GPUs if you want GPU-bound runs, with the NVIDIA Container Toolkit configured for Docker
If you are not supplying your own runtime images, these are the intended local tags:
```bash
bash docker/build_paper_image.sh
bash docker/build_mle_image.sh
```

These scripts produce the local tags `aisci-paper:latest` and `aisci-mle:test`.
```bash
AISCI_PAPER_DOCTOR_PROFILE=gpt-5.4 uv run aisci paper doctor
uv run aisci mle doctor
```

If you use the shipped Azure-backed `glm-5` paper profile, you can drop the `AISCI_PAPER_DOCTOR_PROFILE` override.
```bash
uv run aisci --env-file .env paper run \
  --paper-md /abs/path/to/paper.md \
  --image aisci-paper:latest \
  --llm-profile gpt-5.4 \
  --gpu-ids 0 \
  --time-limit 24h \
  --wait \
  --tui
```

```bash
uv run aisci --env-file .env mle run \
  --zip /abs/path/to/competition.zip \
  --name <competition-slug> \
  --image aisci-mle:test \
  --llm-profile gpt-5.4 \
  --gpu-ids 0 \
  --time-limit 12h \
  --wait \
  --tui
```

Highest-signal inspection commands:
```bash
uv run aisci jobs list
uv run aisci jobs show <job_id>
uv run aisci logs tail <job_id> --kind conversation
uv run aisci artifacts ls <job_id>
uv run aisci export <job_id>
```

For validation, resume, lifecycle helpers, and detailed troubleshooting, see the Operator Guide.
```
config/                    shared LLM, image, and paper-subagent registries
docker/                    default paper and MLE runtime image recipes
scripts/                   example launch scripts
src/aisci_app/             CLI, job service, presentation, TUI
src/aisci_core/            job models, paths, store, exporter, runner
src/aisci_runtime_docker/  Docker session manager and image profile resolver
src/aisci_domain_paper/    paper-grounded long-horizon ML research engineering
src/aisci_domain_mle/      competition-style long-horizon ML engineering
tests/                     host-side regression tests
```
AiScientist is opinionated enough to run real work, but still transparent enough that you can inspect every file the lab leaves behind.
AiScientist builds on prior work in research automation, evaluation, and ML task environments.
We are grateful to the authors and maintainers of these projects for making this line of work more concrete, reproducible, and comparable.
Released under the MIT License. See LICENSE.
For questions, collaboration, or bug reports, please open an issue or email 📧 gx.chen.chn@gmail.com.




