Orchestra-o1: Omnimodal Agent Orchestration

Orchestra-o1 is a multi-agent orchestration framework that decomposes complex omnimodal tasks into parallel subtasks, delegating them to specialized SubAgents for execution. It features a MainAgent + SubAgent architecture where the MainAgent (orchestrator) plans and coordinates, while SubAgents execute specific subtasks using various tools.

📰 News

[2026.06] We uploaded the arXiv paper [📄 arXiv] and released the Orchestra-o1-8B model weights [🤗 Model].
[2026.05] Orchestra-o1 achieves 72.8% accuracy on the OmniGAIA benchmark, ranking 1st and outperforming the second-best method by 10.3 percentage points.

🏗️ Architecture

Figure 1. Overview of the Orchestra-o1 framework. The MainAgent orchestrates multi-turn interactions by decomposing omnimodal tasks into independent/dependent subtasks, creating specialized SubAgents with perception tools (image, audio, video analysis) and action tools (web search, page visit, code execution), and executing them in parallel. An online sub-agent specialization module handles sub-task preparation, model selection, tool integration, and memory allocation.
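The caption distinguishes independent from dependent subtasks that are still "executed in parallel". One way to picture this is to group subtasks into waves. A subtask joins a wave once all of its dependencies have finished, and every subtask in a wave runs concurrently. This is an illustrative sketch only: `plan_waves`, `run_waves`, the dependency-dict format, and `fake_subagent` are hypothetical names, not the `orchestra_o1` package's API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Set

# Hypothetical representation: subtask id -> ids of the subtasks it depends on.
Deps = Dict[str, Set[str]]
SubAgentFn = Callable[[str, Dict[str, str]], Awaitable[str]]


def plan_waves(deps: Deps) -> List[List[str]]:
    """Group subtasks into waves; a subtask joins a wave once all its dependencies are done."""
    remaining = {task: set(parents) for task, parents in deps.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(t for t, parents in remaining.items() if parents <= done)
        if not ready:
            raise ValueError("cyclic or unsatisfiable subtask dependencies")
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves


async def run_waves(deps: Deps, run_subagent: SubAgentFn) -> Dict[str, str]:
    """Run all subtasks of a wave concurrently; later waves receive earlier results."""
    results: Dict[str, str] = {}
    for wave in plan_waves(deps):
        outputs = await asyncio.gather(*(run_subagent(t, dict(results)) for t in wave))
        results.update(zip(wave, outputs))
    return results


# Toy usage: a stand-in SubAgent that only reports which earlier results it received.
async def fake_subagent(task: str, context: Dict[str, str]) -> str:
    await asyncio.sleep(0)  # placeholder for real tool calls
    return f"{task} done (saw {sorted(context)})"


deps = {
    "transcribe_audio": set(),
    "caption_video": set(),
    "web_search": {"transcribe_audio"},  # depends on the audio transcript
}
print(plan_waves(deps))  # [['caption_video', 'transcribe_audio'], ['web_search']]
print(asyncio.run(run_waves(deps, fake_subagent))["web_search"])
# web_search done (saw ['caption_video', 'transcribe_audio'])
```

The wave grouping is conservative: a dependent subtask waits for its whole wave rather than only for its own parents. The trade-off is simplicity over maximal overlap.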

📄 Paper

The paper is available at: [📄 arXiv]

📦 Model Weights

The trained Orchestra-o1-8B model weights are available at: [🤗 Model]

✨ Key Features

🎯 Hierarchical Multi-Agent Architecture: MainAgent orchestrates task decomposition; SubAgents execute subtasks with specialized tools
⚡ Parallel Subtask Execution: Independent subtasks run simultaneously, maximizing throughput
🔧 Rich Tool Ecosystem: Web search, code execution, video/audio/image analysis, URL extraction
🧠 GRPO Training: Train open-source models (Qwen3-8B) as MainAgent using Group Relative Policy Optimization with LLM-as-judge reward
📊 OmniGAIA Benchmark: Comprehensive evaluation on omnimodal question-answering tasks

📁 Project Structure

Orchestra-o1/
├── bench_orchestra_o1_omnigaia.py    # 🚀 Inference with commercial models (e.g., GPT-5)
├── bench_qwen/                       # 🚀 Inference with trained open-source models
│   ├── run_qwen3_8b_grpo.sh          #    One-click: vLLM + benchmark (Qwen3-8B GRPO)
│   ├── bench_qwen_omnigaia.py        #    Benchmark runner for Qwen3-8B
│   ├── eval_qwen.py                  #    Evaluation report generator
│   ├── model_config_qwen.yaml        #    Model config (local vLLM + commercial APIs)
│   └── orchestra_o1_omnigaia_qwen_grpo.yaml  # Benchmark config
├── benchmark/                        #    Benchmark framework
│   ├── common/                       #    Runner, environment abstractions
│   └── omnigaia/                     #    OmniGAIA benchmark implementation & tools
├── base/                             #    Base framework
│   ├── agent/                        #    Agent abstractions (BaseAgent, Memory, ReAct)
│   └── engine/                       #    LLM engine, cost monitoring, logging
├── orchestra_o1/                     # 🎵 Core orchestration framework
│   ├── main_agent.py                 #    MainAgent (orchestrator)
│   ├── config.py                     #    Configuration loader
│   ├── prompts/                      #    Prompt templates (OmniGAIA)
│   ├── runners/                      #    Benchmark runners
│   ├── subagents/                    #    SubAgent implementations (ReAct)
│   └── tools/                        #    Orchestration tools (delegate, complete, trace)
├── train_qwen3_8b/                   # 🏋️ GRPO Training
│   ├── grpo/                         #    GRPO training pipeline
│   │   ├── train_grpo_qwen3_8b.sh    #    Training script (8×H20 GPUs)
│   │   └── reward_fn.py              #    LLM-as-judge multi-dimensional reward
│   └── ds_config.json                #    DeepSpeed config
├── config/                           #    Configuration files
│   ├── model_config.yaml             #    LLM API configuration
│   └── benchmarks/                   #    Benchmark configurations
├── eval/                             #    Evaluation scripts
├── requirements.txt                  #    Python dependencies
└── README.md

🚀 Quick Start

1. Installation

git clone https://github.com/zfkarl/Orchestra-o1.git
cd Orchestra-o1

# Install dependencies
pip install -r requirements.txt

# Create .env file and fill in your API keys (see "Environment Variables" section below)
touch .env

2. Configure Models

Edit config/model_config.yaml with your LLM API credentials:

models:
  "gpt-5":
    api_type: "openai"
    base_url: "https://api.openai.com/v1/"
    api_key: "your_api_key_here"

3. Prepare Dataset

Download the OmniGAIA dataset and place it under data/OmniGAIA/.

4. Run Inference

Mode A: Commercial Model (GPT-5) as MainAgent

python bench_orchestra_o1_omnigaia.py --config config/benchmarks/orchestra_o1_omnigaia.yaml

Mode B: Trained Open-Source Model (Qwen3-8B GRPO) as MainAgent

# One-click: starts the vLLM server → waits until it is ready → runs the benchmark → generates the report
VLLM_MODEL_PATH=/path/to/your/grpo/checkpoint/huggingface \
CUDA_VISIBLE_DEVICES=0 \
bash bench_qwen/run_qwen3_8b_grpo.sh
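The "waits until ready" step amounts to polling the model server until it answers. Below is a minimal Python sketch of that polling loop. It assumes the server listens on vLLM's default port 8000 and exposes a `/health` route. The port, route, timeouts, and function names are assumptions, and `run_qwen3_8b_grpo.sh` may perform the check differently.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str = "http://localhost:8000/health") -> bool:
    """Probe an HTTP endpoint; True only on a 200 response. URL and port are assumptions."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts and HTTP errors (HTTPError subclasses OSError) mean "not ready yet".
        return False


def wait_until_ready(probe: Callable[[], bool], timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Call `probe` every `interval_s` seconds until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)


# Toy usage with a stand-in probe that reports ready on its third call.
# Against a real server you would call: wait_until_ready(http_ok)
polls = []


def fake_probe() -> bool:
    polls.append(time.monotonic())
    return len(polls) >= 3


print(wait_until_ready(fake_probe, timeout_s=1.0, interval_s=0.01))  # True
```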

🏋️ GRPO Training

Train Qwen3-8B as the MainAgent using GRPO (Group Relative Policy Optimization) with an LLM-as-judge reward function.

Figure 2. Orchestra-o1-8B training pipeline. (a) Training Data Curation: Starting from seed data, we run Orchestra-o1 (GPT-5) to collect trajectories, extract anchor facts across modalities, apply QA rewrites (pivot swapping, temporal shifting, etc.), and filter & verify to produce 1.2K high-quality training samples. (b) DA-GRPO Training: We reconstruct decision examples from expert trajectories, sample G candidate decisions from the base model (Qwen3-8B), score each on 4 dimensions via a rubric reward (format, action, tool, decision quality), compute relative advantages, and optimize with DA-GRPO to produce Orchestra-o1-8B.
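The "compute relative advantages" step in (b) follows the GRPO idea: score G candidates for the same prompt, then standardize each reward against its own group, so no learned value function is needed. The sketch below shows only the vanilla GRPO advantage. The README does not describe how DA-GRPO modifies it, and the choice of population rather than sample standard deviation is an assumption, since implementations differ.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Vanilla GRPO advantage: standardize each reward within its group of G samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std instead
    return [(r - mu) / (sigma + eps) for r in rewards]


# G = 4 candidate decisions for the same prompt, scored by the rubric reward.
rewards = [1.0, 0.5, 0.5, 0.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.414, 0.0, 0.0, -1.414]
```

Candidates scoring above the group mean get a positive advantage and are reinforced. If all G candidates score the same, every advantage is zero and the group contributes no gradient.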

Prerequisites

Hardware: 8× H20 (96GB) GPUs (single node)
Software: verl framework
Data: GPT-5 expert trajectories from the OmniGAIA benchmark

Training Pipeline

cd train_qwen3_8b/grpo

# Launch GRPO training
VERL_DIR=/path/to/verl MODEL_PATH=/path/to/Qwen3-8B \
bash train_grpo_qwen3_8b.sh

Reward Function

The reward function (reward_fn.py) uses an LLM-as-judge (claude-haiku-4-5) to score each decision on 4 dimensions:

Dimension	Weight	Range	Description
format_correct	0.10	{0, 1}	JSON format correctness
action_valid	0.10	{0, 1}	Action validity (delegate_task / complete)
tool_reasonable	0.20	[0, 1]	Tool selection & subtask assignment quality
decision_quality	0.60 ★	[0, 1]	Overall decision quality, judged with the GPT-5 expert decision as reference

score = 0.10 × format + 0.10 × action + 0.20 × tool + 0.60 × decision ∈ [0, 1]
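The weighted combination above can be written out directly. This sketch assumes the judge has already returned the four per-dimension scores. It only checks the ranges in the table and does not model any gating or parsing that the real reward_fn.py may apply. `JudgeScores` and `rubric_reward` are hypothetical names.

```python
from dataclasses import dataclass

# Weights from the reward table; keys mirror the table's dimension names.
WEIGHTS = {
    "format_correct": 0.10,
    "action_valid": 0.10,
    "tool_reasonable": 0.20,
    "decision_quality": 0.60,
}


@dataclass
class JudgeScores:
    format_correct: int      # {0, 1}
    action_valid: int        # {0, 1}
    tool_reasonable: float   # [0, 1]
    decision_quality: float  # [0, 1]


def rubric_reward(s: JudgeScores) -> float:
    """Weighted sum of the four judge scores; stays in [0, 1] because the weights sum to 1."""
    if s.format_correct not in (0, 1) or s.action_valid not in (0, 1):
        raise ValueError("binary dimensions must be 0 or 1")
    if not (0.0 <= s.tool_reasonable <= 1.0 and 0.0 <= s.decision_quality <= 1.0):
        raise ValueError("graded dimensions must lie in [0, 1]")
    return sum(w * getattr(s, name) for name, w in WEIGHTS.items())


# Well-formed JSON and a valid action, a mediocre tool choice, a good overall decision.
score = rubric_reward(JudgeScores(1, 1, 0.5, 0.75))
print(round(score, 4))  # 0.75
```

Because decision_quality carries 60% of the weight, a well-formatted but poorly reasoned decision can earn at most 0.40.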

MainAgent Decision Flow

Analyze the question and its omnimodal inputs
Decompose the task into independent subtasks (Phase 1)
Delegate all independent subtasks in parallel
Evaluate the results: if they suffice, complete; if more information is needed, plan Phase 2
Iterate until the answer is found or the budget is exhausted
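The decision flow above can be sketched as a phase loop. The `plan`, `delegate`, and `evaluate` callables stand in for MainAgent LLM calls and SubAgent runs. Their signatures, the thread-pool parallelism, and the phase-count budget are all assumptions made for illustration, not the logic in `main_agent.py`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]  # (subtask, SubAgent result) pairs collected so far
PlanFn = Callable[[str, History], List[str]]
DelegateFn = Callable[[str], str]
EvaluateFn = Callable[[str, History], Optional[str]]


def orchestrate(question: str, plan: PlanFn, delegate: DelegateFn,
                evaluate: EvaluateFn, max_phases: int = 3) -> Tuple[Optional[str], History]:
    history: History = []
    for _ in range(max_phases):  # the phase budget
        subtasks = plan(question, history)  # independent subtasks for this phase
        if not subtasks:
            break
        # Delegate every subtask of the phase at once; map() keeps results in subtask order.
        with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
            results = list(pool.map(delegate, subtasks))
        history.extend(zip(subtasks, results))
        answer = evaluate(question, history)  # sufficient? -> complete
        if answer is not None:
            return answer, history
    return None, history  # budget exhausted or nothing left to plan


# Toy stand-ins: Phase 1 has two independent subtasks, Phase 2 one dependent subtask.
def toy_plan(question: str, history: History) -> List[str]:
    done = {task for task, _ in history}
    if not done:
        return ["identify speaker in audio", "read date in image"]
    if "look up speaker's birthplace" not in done:
        return ["look up speaker's birthplace"]
    return []


def toy_delegate(subtask: str) -> str:
    return f"result of {subtask}"


def toy_evaluate(question: str, history: History) -> Optional[str]:
    return history[-1][1] if len(history) >= 3 else None


answer, history = orchestrate("toy question", toy_plan, toy_delegate, toy_evaluate)
print(answer)        # result of look up speaker's birthplace
print(len(history))  # 3
```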

SubAgent (ReAct)

Each SubAgent follows the ReAct (Reasoning + Acting) paradigm:

Think: Reason about the current state
Act: Use tools (search, code execution, media analysis)
Observe: Process tool outputs
Repeat until task is complete
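A minimal sketch of this Think → Act → Observe loop follows. The `policy` callable stands in for the SubAgent's LLM, and the `Step` structure, tool registry, and step limit are hypothetical. The real SubAgent in `orchestra_o1/subagents/` may structure its turns differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    thought: str
    tool: Optional[str] = None    # tool to call (Act), or None
    tool_input: str = ""
    answer: Optional[str] = None  # set when the SubAgent decides it is done


Transcript = List[Tuple[str, str, str, str]]  # (thought, tool, tool_input, observation)
Policy = Callable[[str, Transcript], Step]


def react(task: str, policy: Policy, tools: Dict[str, Callable[[str], str]],
          max_steps: int = 8) -> Tuple[Optional[str], Transcript]:
    transcript: Transcript = []
    for _ in range(max_steps):
        step = policy(task, transcript)  # Think
        if step.answer is not None:
            return step.answer, transcript
        if step.tool in tools:
            observation = tools[step.tool](step.tool_input)  # Act
        else:
            observation = f"error: unknown tool {step.tool!r}"
        transcript.append((step.thought, str(step.tool), step.tool_input, observation))  # Observe
    return None, transcript  # step limit reached without an answer


# Toy usage: one tool call, then an answer taken from the observation.
def toy_policy(task: str, transcript: Transcript) -> Step:
    if not transcript:
        return Step("I should count the words.", tool="word_count", tool_input="the quick brown fox")
    return Step("The tool answered.", answer=transcript[-1][3])


tools = {"word_count": lambda text: str(len(text.split()))}
answer, transcript = react("How many words are in 'the quick brown fox'?", toy_policy, tools)
print(answer)  # 4
```

Unknown tools are reported back as observations rather than raised, so the policy gets a chance to recover on the next turn.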

📊 Evaluation

After running the benchmark, generate evaluation reports:

# For commercial model results
python eval/eval.py --main_agent gpt-5

# For Qwen3-8B GRPO results
python bench_qwen/eval_qwen.py --csv_path logs/omnigaia_qwen_8b_grpo/omnigaia_qwen_xxx.csv --main_agent qwen3-8b-grpo

🔧 Configuration

Environment Variables (`.env`)

Create a .env file in the project root directory with the following content:

# ======== OmniGAIA Benchmark Tools ========
# Jina API - used for web content extraction (ExtractUrlContentAction)
JINA_API_KEY=your_jina_api_key_here

# Serper API - used for Google Search (GoogleSearchAction)
SERPER_API_KEY=your_serper_api_key_here
SERPER_BASE_URL=https://google.serper.dev/search

# ======== LLM API Config ========
# Can also be configured in config/model_config.yaml
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1/

Variable	Description
`OPENAI_API_KEY`	OpenAI API key (for GPT-5, GPT-4o, etc.)
`OPENAI_BASE_URL`	OpenAI API base URL
`JINA_API_KEY`	Jina API key (for web content extraction)
`SERPER_API_KEY`	Serper API key (for Google search)
`SERPER_BASE_URL`	Serper API base URL
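Before a long benchmark run, it can help to check that every variable in the table is set and no longer holds the `your_..._here` template value. Below is a small standard-library sketch of such a check. The project itself may load `.env` with a different library, and `parse_env` and `missing_keys` are hypothetical helpers with deliberately simple parsing (no multiline values or escapes).

```python
from typing import Dict, Iterable, List, Tuple

REQUIRED_KEYS: Tuple[str, ...] = (
    "JINA_API_KEY", "SERPER_API_KEY", "SERPER_BASE_URL", "OPENAI_API_KEY", "OPENAI_BASE_URL",
)


def parse_env(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once so values containing '=' survive
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: Dict[str, str], required: Tuple[str, ...] = REQUIRED_KEYS) -> List[str]:
    """Keys that are absent, empty, or still hold a 'your_..._here' placeholder."""
    return [k for k in required if not env.get(k) or env[k].startswith("your_")]


# Usage on a partially filled file; with a real file: parse_env(open(".env"))
sample = """
# ======== OmniGAIA Benchmark Tools ========
JINA_API_KEY=jina_abc123
SERPER_API_KEY=your_serper_api_key_here
SERPER_BASE_URL=https://google.serper.dev/search
""".splitlines()
print(missing_keys(parse_env(sample)))  # ['SERPER_API_KEY', 'OPENAI_API_KEY', 'OPENAI_BASE_URL']
```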

Key Training Hyperparameters

Parameter	Default	Description
`TRAIN_BSZ`	24	Training batch size
`ROLLOUT_N`	8	GRPO group size
`MAX_PROMPT_LEN`	24576	Max prompt length (tokens)
`MAX_RESP_LEN`	4096	Max response length (tokens)
`LR`	5e-6	Learning rate
`EPOCHS`	5	Number of training epochs
`TP_SIZE`	8	vLLM tensor parallel size
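The defaults imply a per-step rollout budget that is worth checking against GPU memory. The arithmetic below assumes that `TRAIN_BSZ` counts prompts per step and that each prompt is expanded into `ROLLOUT_N` sampled responses, as verl's `data.train_batch_size` and rollout `n` settings do. How `train_grpo_qwen3_8b.sh` actually maps these variables is an assumption.

```python
# Defaults from the hyperparameter table above.
TRAIN_BSZ, ROLLOUT_N = 24, 8
MAX_PROMPT_LEN, MAX_RESP_LEN = 24576, 4096

# Assumption: TRAIN_BSZ = prompts per step, each sampled ROLLOUT_N times (the GRPO group).
responses_per_step = TRAIN_BSZ * ROLLOUT_N
max_tokens_per_sequence = MAX_PROMPT_LEN + MAX_RESP_LEN
# Upper bound only: real prompts and responses are usually shorter than the maximums.
max_tokens_per_step = responses_per_step * max_tokens_per_sequence

print(responses_per_step, max_tokens_per_sequence, max_tokens_per_step)  # 192 28672 5505024
```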

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

verl — Reinforcement learning framework for LLM training
vLLM — High-throughput LLM serving engine
OmniGAIA — Omnimodal benchmark dataset

📚 Citation

If you find Orchestra-o1 useful, please cite our paper:

@misc{zhang2026orchestrao1omnimodalagentorchestration,
      title={Orchestra-o1: Omnimodal Agent Orchestration}, 
      author={Fan Zhang and Vireo Zhang and Shengju Qian and Haoxuan Li and Hao Wu and Jinyang Wu and Donghao Zhou and Zhihong Zhu and Zheng Lian and Xin Wang and Pheng-Ann Heng},
      year={2026},
      eprint={2606.13707},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2606.13707}, 
}
