Orchestra-o1 is a multi-agent orchestration framework that decomposes complex omnimodal tasks into parallel subtasks, delegating them to specialized SubAgents for execution. It features a MainAgent + SubAgent architecture where the MainAgent (orchestrator) plans and coordinates, while SubAgents execute specific subtasks using various tools.
- [2026.06] We uploaded the arXiv paper [arXiv] and released the Orchestra-o1-8B model weights [Model].
- [2026.05] Orchestra-o1 achieves 72.8% accuracy on the OmniGAIA benchmark, ranking 1st and outperforming the second-best method by 10.3 percentage points.
Figure 1. Overview of the Orchestra-o1 framework. The MainAgent orchestrates multi-turn interactions by decomposing omnimodal tasks into independent/dependent subtasks, creating specialized SubAgents with perception tools (image, audio, video analysis) and action tools (web search, page visit, code execution), and executing them in parallel. An online sub-agent specialization module handles sub-task preparation, model selection, tool integration, and memory allocation.
The paper is available at: [arXiv]
The trained Orchestra-o1-8B model weights are available at: [Model]
- Hierarchical Multi-Agent Architecture: MainAgent orchestrates task decomposition; SubAgents execute subtasks with specialized tools
- Parallel Subtask Execution: Independent subtasks run simultaneously, maximizing throughput
- Rich Tool Ecosystem: Web search, code execution, video/audio/image analysis, URL extraction
- GRPO Training: Train open-source models (Qwen3-8B) as MainAgent using Group Relative Policy Optimization with LLM-as-judge reward
- OmniGAIA Benchmark: Comprehensive evaluation on omnimodal question-answering tasks
Orchestra-o1/
├── bench_orchestra_o1_omnigaia.py              # Inference with commercial models (e.g., GPT-5)
├── bench_qwen/                                 # Inference with trained open-source models
│   ├── run_qwen3_8b_grpo.sh                    # One-click: vLLM + benchmark (Qwen3-8B GRPO)
│   ├── bench_qwen_omnigaia.py                  # Benchmark runner for Qwen3-8B
│   ├── eval_qwen.py                            # Evaluation report generator
│   ├── model_config_qwen.yaml                  # Model config (local vLLM + commercial APIs)
│   └── orchestra_o1_omnigaia_qwen_grpo.yaml    # Benchmark config
├── benchmark/                                  # Benchmark framework
│   ├── common/                                 # Runner, environment abstractions
│   └── omnigaia/                               # OmniGAIA benchmark implementation & tools
├── base/                                       # Base framework
│   ├── agent/                                  # Agent abstractions (BaseAgent, Memory, ReAct)
│   └── engine/                                 # LLM engine, cost monitoring, logging
├── orchestra_o1/                               # Core orchestration framework
│   ├── main_agent.py                           # MainAgent (orchestrator)
│   ├── config.py                               # Configuration loader
│   ├── prompts/                                # Prompt templates (OmniGAIA)
│   ├── runners/                                # Benchmark runners
│   ├── subagents/                              # SubAgent implementations (ReAct)
│   └── tools/                                  # Orchestration tools (delegate, complete, trace)
├── train_qwen3_8b/                             # GRPO Training
│   ├── grpo/                                   # GRPO training pipeline
│   │   ├── train_grpo_qwen3_8b.sh              # Training script (8×H20 GPUs)
│   │   └── reward_fn.py                        # LLM-as-judge multi-dimensional reward
│   └── ds_config.json                          # DeepSpeed config
├── config/                                     # Configuration files
│   ├── model_config.yaml                       # LLM API configuration
│   └── benchmarks/                             # Benchmark configurations
├── eval/                                       # Evaluation scripts
├── requirements.txt                            # Python dependencies
└── README.md
git clone https://github.com/zfkarl/Orchestra-o1.git
cd Orchestra-o1
# Install dependencies
pip install -r requirements.txt
# Create .env file and fill in your API keys (see "Environment Variables" section below)
touch .env

Edit config/model_config.yaml with your LLM API credentials:
models:
  "gpt-5":
    api_type: "openai"
    base_url: "https://api.openai.com/v1/"
    api_key: "your_api_key_here"

Download the OmniGAIA dataset and place it under data/OmniGAIA/.
python bench_orchestra_o1_omnigaia.py --config config/benchmarks/orchestra_o1_omnigaia.yaml

# One-click: starts vLLM server → waits for ready → runs benchmark → generates report
VLLM_MODEL_PATH=/path/to/your/grpo/checkpoint/huggingface \
CUDA_VISIBLE_DEVICES=0 \
bash bench_qwen/run_qwen3_8b_grpo.sh

Train Qwen3-8B as the MainAgent using GRPO (Group Relative Policy Optimization) with an LLM-as-judge reward function.
Figure 2. Orchestra-o1-8B training pipeline. (a) Training Data Curation: Starting from seed data, we run Orchestra-o1 (GPT-5) to collect trajectories, extract anchor facts across modalities, apply QA rewrites (pivot swapping, temporal shifting, etc.), and filter & verify to produce 1.2K high-quality training samples. (b) DA-GRPO Training: We reconstruct decision examples from expert trajectories, sample G candidate decisions from the base model (Qwen3-8B), score each on 4 dimensions via a rubric reward (format, action, tool, decision quality), compute relative advantages, and optimize with DA-GRPO to produce Orchestra-o1-8B.
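The "compute relative advantages" step in (b) can be illustrated with the standard GRPO normalization, where each candidate's reward is standardized against the other candidates sampled for the same prompt. This is a minimal sketch of plain GRPO only: the DA-GRPO variant may differ, and the function name, the `eps` value, and the example scores are illustrative assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against the mean and std of its own group.

    Standard GRPO-style normalization; not necessarily what DA-GRPO does.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the G candidates
    # eps keeps the division defined when every candidate scores the same.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rubric scores for G = 4 candidate decisions on one prompt.
advantages = group_relative_advantages([0.9, 0.5, 0.5, 0.1])
```

Candidates that score above the group mean receive a positive advantage and those below it a negative one, so the update depends on relative rather than absolute reward.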
- Hardware: 8× H20 (96GB) GPUs (single node)
- Software: verl framework
- Data: GPT-5 expert trajectories from OmniGAIA benchmark
cd train_qwen3_8b/grpo
# Launch GRPO training
VERL_DIR=/path/to/verl MODEL_PATH=/path/to/Qwen3-8B \
bash train_grpo_qwen3_8b.sh

The reward function (reward_fn.py) uses LLM-as-judge (claude-haiku-4-5) to evaluate 4 dimensions:
| Dimension | Weight | Range | Description |
|---|---|---|---|
| format_correct | 0.10 | {0, 1} | JSON format correctness |
| action_valid | 0.10 | {0, 1} | Action validity (delegate_task / complete) |
| tool_reasonable | 0.20 | [0, 1] | Tool selection & subtask assignment quality |
| decision_quality | 0.60 | [0, 1] | Overall decision quality (references GPT-5 expert) |
score = 0.10 × format + 0.10 × action + 0.20 × tool + 0.60 × decision ∈ [0, 1]
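The weighted sum above can be checked with a few lines of Python. This sketch assumes the judge has already returned the four per-dimension scores. The names `WEIGHTS` and `combine_scores` and the example scores are illustrative and are not taken from reward_fn.py.

```python
# Weights from the table above; they sum to 1.0, so the score stays in [0, 1].
WEIGHTS = {
    "format_correct": 0.10,
    "action_valid": 0.10,
    "tool_reasonable": 0.20,
    "decision_quality": 0.60,
}


def combine_scores(scores):
    """Return the weighted sum of the four rubric dimensions."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {scores[name]}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical judge output for one candidate decision.
example = {
    "format_correct": 1,
    "action_valid": 1,
    "tool_reasonable": 0.5,
    "decision_quality": 0.75,
}
score = combine_scores(example)  # 0.10 + 0.10 + 0.10 + 0.45
```

Because decision_quality carries 60% of the weight, a well-formatted but poorly reasoned decision can score at most 0.40.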
The MainAgent proceeds as follows:
- Analyze the question and omnimodal inputs
- Decompose into independent subtasks (Phase 1)
- Delegate all independent subtasks in parallel
- Evaluate results → sufficient? → complete; need more? → plan Phase 2
- Iterate until answer is found or budget exhausted
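The workflow above can be sketched as a loop that plans one phase, runs that phase's subtasks in parallel, and stops once the results suffice or the phase budget runs out. This is an illustrative sketch, not the code in main_agent.py: `orchestrate`, its callable parameters, and the toy stand-ins are all hypothetical, and a thread pool is only one possible way to run subtasks concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def orchestrate(question, plan_phase, run_subagent, is_sufficient, max_phases=3):
    """Plan a phase, run its subtasks in parallel, and repeat until done.

    The three callables are hypothetical stand-ins for LLM-backed components.
    """
    results = []
    for _ in range(max_phases):  # the phase budget
        subtasks = plan_phase(question, results)
        if not subtasks:
            break
        # Subtasks within one phase are independent, so they can run concurrently.
        with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
            results.extend(pool.map(run_subagent, subtasks))
        if is_sufficient(question, results):
            break
    return results


# Toy stand-ins: phase 1 issues two lookups, phase 2 one follow-up.
def toy_plan(question, results):
    return ["find A", "find B"] if not results else ["combine A and B"]


def toy_subagent(subtask):
    return f"done: {subtask}"


def toy_sufficient(question, results):
    return len(results) >= 3


trace = orchestrate("toy question", toy_plan, toy_subagent, toy_sufficient)
```

Dependent subtasks are handled by deferring them to a later phase, where the planner can see the earlier results.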
Each SubAgent follows the ReAct (Reasoning + Acting) paradigm:
- Think: Reason about the current state
- Act: Use tools (search, code execution, media analysis)
- Observe: Process tool outputs
- Repeat until task is complete
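A ReAct loop of this shape can be written compactly. The sketch below is illustrative only and does not reproduce the SubAgent implementation in the repository: `react_loop`, the `think` policy interface, and the toy calculator tool are assumptions made for the example.

```python
def react_loop(task, think, tools, max_steps=5):
    """Alternate Think, Act, Observe until `think` returns a final answer.

    `think` is a hypothetical policy: it maps (task, history) to either
    ("final", answer) or (tool_name, tool_input).
    """
    history = []
    for _ in range(max_steps):
        action, payload = think(task, history)  # Think
        if action == "final":
            return payload, history
        observation = tools[action](payload)  # Act
        history.append((action, payload, observation))  # Observe
    return None, history  # step budget exhausted without an answer


# Toy policy: call the calculator once, then report its observation.
def toy_think(task, history):
    if not history:
        return "calc", (6, 7)
    return "final", history[-1][2]


toy_tools = {"calc": lambda args: args[0] * args[1]}
answer, steps = react_loop("multiply 6 by 7", toy_think, toy_tools)
```

In a real SubAgent the `think` step would be an LLM call and the tools would be search, code execution, or media analysis. The step cap stops a stuck agent from looping forever.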
After running the benchmark, generate evaluation reports:
# For commercial model results
python eval/eval.py --main_agent gpt-5
# For Qwen3-8B GRPO results
python bench_qwen/eval_qwen.py --csv_path logs/omnigaia_qwen_8b_grpo/omnigaia_qwen_xxx.csv --main_agent qwen3-8b-grpo

Create a .env file in the project root directory with the following content:
# ======== OmniGAIA Benchmark Tools ========
# Jina API - used for web content extraction (ExtractUrlContentAction)
JINA_API_KEY=your_jina_api_key_here
# Serper API - used for Google Search (GoogleSearchAction)
SERPER_API_KEY=your_serper_api_key_here
SERPER_BASE_URL=https://google.serper.dev/search
# ======== LLM API Config ========
# Can also be configured in config/model_config.yaml
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1/

| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key (for GPT-5, GPT-4o, etc.) |
| OPENAI_BASE_URL | OpenAI API base URL |
| JINA_API_KEY | Jina API key (for web content extraction) |
| SERPER_API_KEY | Serper API key (for Google search) |
| SERPER_BASE_URL | Serper API base URL |
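Before launching a benchmark run, it can help to confirm that the variables above are set and no longer hold the `your_..._here` placeholders. The sketch below uses only the standard library and is not how the project itself loads `.env` (it may rely on a dedicated loader). `parse_env_text` and `missing_keys` are hypothetical helper names, and the sample values are made up.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or still hold a placeholder."""
    return [
        key for key in required
        if not values.get(key) or values[key].startswith("your_")
    ]


# Made-up sample mirroring the layout of the .env file shown above.
sample = """
# Serper API - used for Google Search
SERPER_API_KEY=abc123
SERPER_BASE_URL=https://google.serper.dev/search
JINA_API_KEY=your_jina_api_key_here
"""
parsed = parse_env_text(sample)
todo = missing_keys(parsed, ["JINA_API_KEY", "SERPER_API_KEY", "OPENAI_API_KEY"])
```

Here `todo` lists the keys that still need real values. To check an actual file, pass the contents of `.env` to `parse_env_text` instead of the sample string.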
| Parameter | Default | Description |
|---|---|---|
| TRAIN_BSZ | 24 | Training batch size |
| ROLLOUT_N | 8 | GRPO group size |
| MAX_PROMPT_LEN | 24576 | Max prompt length (tokens) |
| MAX_RESP_LEN | 4096 | Max response length (tokens) |
| LR | 5e-6 | Learning rate |
| EPOCHS | 5 | Number of training epochs |
| TP_SIZE | 8 | vLLM tensor parallel size |
This project is licensed under the MIT License - see the LICENSE file for details.
- verl: Reinforcement learning framework for LLM training
- vLLM: High-throughput LLM serving engine
- OmniGAIA: Omnimodal benchmark dataset
If you find Orchestra-o1 useful, please cite our paper:
@misc{zhang2026orchestrao1omnimodalagentorchestration,
title={Orchestra-o1: Omnimodal Agent Orchestration},
author={Fan Zhang and Vireo Zhang and Shengju Qian and Haoxuan Li and Hao Wu and Jinyang Wu and Donghao Zhou and Zhihong Zhu and Zheng Lian and Xin Wang and Pheng-Ann Heng},
year={2026},
eprint={2606.13707},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2606.13707},
}
