This document introduces the verl framework, its purpose as a reinforcement learning (RL) training library for large language models (LLMs), and provides a high-level overview of the system architecture. For detailed information about specific subsystems, see the child pages: System Architecture and HybridFlow Design, Key Innovations and Design Patterns, and Supported Algorithms and Models.
verl (Volcano Engine Reinforcement Learning) is a flexible, efficient, and production-ready framework for post-training LLMs using RL algorithms. It is the open-source implementation of the HybridFlow paper README.md26-27, presented at EuroSys 2025 README.md14.
The framework addresses the challenge of efficiently orchestrating complex RL training workflows that involve multiple distributed components: policy training, inference generation, reward computation, and value estimation. verl enables researchers and practitioners to compose and scale these components flexibly and efficiently.
Sources: README.md22-48 docs/index.rst1-22
verl provides a complete software stack for LLM post-training via reinforcement learning:
| Component | Purpose | Key Classes / Entities |
|---|---|---|
| Programming Model | Define RL algorithm dataflows | HybridFlow (single/multi-controller) docs/index.rst8-9 |
| Training Orchestration | Coordinate distributed execution | RayPPOTrainer verl/trainer/ppo/ray_trainer.py29 |
| Distributed Workers | Execute training/inference tasks | ActorRolloutRefWorker, TrainingWorker verl/trainer/main_ppo.py128-154 |
| Training Engines | Backend for model training | FSDPEngine, MegatronEngine, VeOmniEngine docs/index.rst91-93 docs/index.rst144 |
| Inference Engines | High-throughput generation | vLLM, SGLang, TensorRT-LLM docs/index.rst94-95 |
| Data Pipeline | Load and process training data | RLHFDataset, DataProto verl/trainer/ppo/ray_trainer.py36 docs/index.rst49 |
| Configuration System | Manage complex configurations | Hydra framework with OmegaConf verl/trainer/config/ppo_trainer.yaml1-10 |
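The configuration system in the table above composes nested configs from Hydra-style dotted command-line overrides. The helper below is a minimal, hypothetical re-implementation of that override mechanism for illustration; it is not Hydra/OmegaConf or verl's actual code, and the config values shown are made up.

```python
# Minimal sketch of how Hydra-style dotted overrides (e.g.
# "trainer.total_epochs=3") compose into a nested config dict.
# This is a stand-in for Hydra/OmegaConf, not verl's real machinery.

def apply_override(config: dict, override: str) -> dict:
    """Apply a single 'a.b.c=value' override to a nested dict in place."""
    dotted_key, raw_value = override.split("=", 1)
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Best-effort literal parsing: ints, then floats, else plain strings.
    try:
        value = int(raw_value)
    except ValueError:
        try:
            value = float(raw_value)
        except ValueError:
            value = raw_value
    node[keys[-1]] = value
    return config

# Hypothetical example values, in the spirit of verl's PPO configs.
config = {"actor_rollout_ref": {"model": {"path": "some/model"}}}
apply_override(config, "actor_rollout_ref.actor.optim.lr=1e-6")
apply_override(config, "trainer.total_epochs=3")
```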
The framework supports training on various hardware platforms including NVIDIA GPUs, AMD GPUs (ROCm), and Huawei Ascend NPUs docs/index.rst150-165
Sources: README.md24-48 docs/index.rst4-22 docs/start/install.rst10-31 verl/trainer/ppo/ray_trainer.py16-60
The HybridFlow programming model is the foundation of verl's flexibility. It combines two execution paradigms docs/index.rst8-9: a single-controller paradigm that expresses the RL algorithm's dataflow as sequential code, and a multi-controller paradigm that executes distributed computation across worker groups.
The RayPPOTrainer class manages worker groups through RayWorkerGroup verl/trainer/ppo/ray_trainer.py39. For details on the programming model, see System Architecture and HybridFlow Design.
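The split between a single controller and multiple worker groups can be illustrated with plain Python. The classes below are conceptual stand-ins for Ray remote actors and verl's worker groups, not the real `RayWorkerGroup` API:

```python
# Conceptual sketch of the HybridFlow split: a single controller script
# expresses the dataflow, while a worker group executes sharded
# computation. Plain classes stand in for Ray remote actors here.

class Worker:
    """One shard of a distributed role (actor, critic, ...)."""
    def __init__(self, rank: int):
        self.rank = rank

    def compute(self, shard):
        # Placeholder for a real forward/backward or generation kernel.
        return [x * 2 for x in shard]

class WorkerGroup:
    """Multi-controller side: scatter a batch to ranks, gather results."""
    def __init__(self, world_size: int):
        self.workers = [Worker(r) for r in range(world_size)]

    def run(self, batch):
        n = len(self.workers)
        shards = [batch[r::n] for r in range(n)]
        outs = [w.compute(s) for w, s in zip(self.workers, shards)]
        return [x for out in outs for x in out]

# Single-controller side: the distributed dataflow reads like
# sequential code, which is the point of the programming model.
group = WorkerGroup(world_size=2)
result = group.run([1, 2, 3, 4])
```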
Sources: README.md28-34 docs/index.rst6-10 docs/index.rst42-43 verl/trainer/ppo/ray_trainer.py16-39
The following diagram shows the verl system architecture, mapping high-level concepts to concrete code entities:
Sources: verl/trainer/ppo/ray_trainer.py17-52 verl/trainer/main_ppo.py109-154 verl/workers/engine_workers.py128-154
The RayPPOTrainer class serves as the central orchestrator verl/trainer/ppo/ray_trainer.py29 and is typically launched via a TaskRunner verl/trainer/main_ppo.py109-113. It initializes the Ray cluster, spawns worker groups (actor, critic, reward model) using ResourcePoolManager verl/trainer/ppo/ray_trainer.py39, and implements the main training loop, including rollout, reward extraction verl/trainer/ppo/ray_trainer.py52, and advantage computation verl/trainer/ppo/ray_trainer.py136-144.
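The shape of that orchestration loop can be sketched as follows. All functions here are hypothetical stubs standing in for the rollout, reward, advantage, and update stages; this is the control-flow skeleton, not `RayPPOTrainer.fit` itself:

```python
# Illustrative skeleton of the orchestration loop: generate rollouts,
# score them, estimate advantages, then update the policy. Every callable
# below is a toy stub, not verl's implementation.

def train_loop(num_steps, rollout, reward_fn, compute_advantages, update):
    history = []
    for step in range(num_steps):
        batch = rollout(step)                 # inference engine generates responses
        rewards = reward_fn(batch)            # rule-based or model-based scoring
        advantages = compute_advantages(rewards)
        metrics = update(batch, advantages)   # training engine optimizes the policy
        history.append(metrics)
    return history

# Toy stubs to exercise the control flow end to end.
history = train_loop(
    num_steps=2,
    rollout=lambda step: [f"resp-{step}-{i}" for i in range(4)],
    reward_fn=lambda batch: [1.0] * len(batch),
    compute_advantages=lambda rs: [r - sum(rs) / len(rs) for r in rs],
    update=lambda batch, adv: {"batch_size": len(batch)},
)
```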
Workers are Ray remote actors, each executing a specific role, such as ActorRolloutRefWorker for policy rollout and reference computation, alongside dedicated critic and reward-model workers.
The framework supports multiple training backends via configuration:
- FSDPEngine for sharding and memory management, supporting FSDP and FSDP2 docs/start/install.rst20
- MegatronEngine for model parallelism and scalability, often integrated via mbridge docs/start/install.rst20 setup.py60

verl also integrates with high-performance inference backends such as vLLM, SGLang, and TensorRT-LLM docs/index.rst94-95.
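Selecting a training backend from configuration amounts to a registry lookup keyed on a strategy string. The sketch below uses empty stand-in classes for the engines; verl's real construction path goes through its worker and config machinery, so treat the registry and `build_engine` helper as illustrative assumptions:

```python
# Hedged sketch of backend selection by configuration string.
# FSDPEngine / MegatronEngine here are empty placeholders, not
# verl's actual engine classes.

class FSDPEngine: ...
class MegatronEngine: ...

ENGINE_REGISTRY = {
    "fsdp": FSDPEngine,
    "fsdp2": FSDPEngine,      # FSDP2 shares the FSDP engine in this sketch
    "megatron": MegatronEngine,
}

def build_engine(strategy: str):
    try:
        return ENGINE_REGISTRY[strategy]()
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}") from None

engine = build_engine("fsdp2")
```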
Sources: verl/trainer/ppo/ray_trainer.py16-162 verl/trainer/main_ppo.py109-154 docs/start/install.rst12-31
The following diagram shows how data flows through a training iteration, associating system names with code entities:
Sources: verl/trainer/ppo/ray_trainer.py136-162 verl/trainer/ppo/core_algos.py70-85 verl/trainer/ppo/reward.py52
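The advantage-computation step in this dataflow is commonly implemented with Generalized Advantage Estimation (GAE). The function below is written from the standard GAE recurrence, not copied from core_algos.py, and uses pure Python for clarity:

```python
# Pure-Python sketch of Generalized Advantage Estimation (GAE):
#   delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
#   A_t     = delta_t + gamma * lam * A_{t+1}
# Illustrative of the advantage step in the PPO dataflow.

def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    advantages = [0.0] * len(rewards)
    last_adv = 0.0
    for t in reversed(range(len(rewards))):
        # Bootstrap with 0 after the final step of the trajectory.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    return advantages

# Toy trajectory: reward only at the last step.
adv = compute_gae(rewards=[0.0, 0.0, 1.0], values=[0.5, 0.6, 0.7])
```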
verl supports a wide range of algorithms and model architectures:
For details, see Supported Algorithms and Models.
Sources: README.md60-63 docs/index.rst71-84 verl/trainer/ppo/core_algos.py88-112
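Most of the policy-gradient algorithms in this family build on the clipped PPO surrogate objective. The function below states that objective in pure Python as a reference point; it is illustrative math, not verl's loss implementation:

```python
# Clipped PPO objective, negated for gradient descent:
#   L = -mean_t( min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t) )
# where r_t = exp(log pi(a_t) - log pi_old(a_t)).

import math

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_ratio=0.2):
    total = 0.0
    for lp, old_lp, adv in zip(log_probs, old_log_probs, advantages):
        ratio = math.exp(lp - old_lp)
        clipped = max(min(ratio, 1.0 + clip_ratio), 1.0 - clip_ratio)
        # Pessimistic bound: take the smaller of the two surrogates.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)

loss = ppo_clip_loss(
    log_probs=[-1.0, -1.2],
    old_log_probs=[-1.0, -1.0],
    advantages=[1.0, -0.5],
)
```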
To begin using verl:
- Install dependencies via install.sh docs/start/install.rst1-100
- Launch training through the main_ppo.py entry point verl/trainer/main_ppo.py36-46

For more information, see Getting Started.
Sources: docs/start/install.rst1-100 verl/trainer/main_ppo.py1-108 setup.py81-104