NexRL is a production-ready, distributed LLM post-training framework defined by its ultra-loosely-coupled philosophy. Its service-oriented architecture provides maximum flexibility and extensibility while maintaining clean abstractions and ease of use.
- [2026.02.03] We release support for On-Policy Distillation. Try it with the recipe.
- [2026.01.23] Weaver v1.1.0 is released with full fine-tuning support! Try full fine-tuning with Weaver using NexRL (recipe).
- [2026.01.21] We release a blog about the design and features of NexRL. Check it out!
- [2026.01.15] NexRL v1.0.0 is here! Train NexAU agents with zero code modification—just configs and evaluators. New training-service mode supports Weaver and Tinker APIs for effortless cloud training.
- [2025.11.18] NexRL goes open-source! Pre-release version now available.
- Training-as-a-Service & Rollout-as-a-Service: Unified API architecture that seamlessly supports different training and inference frameworks through service abstraction. Switch between training backends (FSDP, Megatron, etc.) and inference engines (SGLang, vLLM, TGI, etc.) without modifying your code.
- Decoupled Modular Architecture: Clean separation of concerns with well-defined interfaces and extensible components. Each module operates independently, enabling easy customization and maintenance.
- Zero-Code Agent-Training Support: Agents can seamlessly integrate with RL training without any RL-specific code modifications.
- Intelligent Resource Management: Configurable placement and co-location of services for optimal performance in distributed environments.
- Comprehensive Monitoring: Built-in activity tracking and health checking system for production deployments.
- Robust Error Handling: Centralized error reporting and recovery mechanisms for production reliability.
NexRL follows a modular architecture where components communicate through explicit interfaces and APIs:
Core Components:
- DataLoader: Provides training data (supports custom datasets)
- RolloutWorker: Executes environment interactions (your agent goes here!)
- TrajectoryPool: Manages trajectory collection and batching
- Trainer: Applies algorithm logic (e.g., GRPO) and coordinates training through service APIs
- WeightSyncController: Manages model weight synchronization between training and inference
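As an illustrative sketch of how these components compose, the toy loop below wires a data loader, rollout worker, trajectory pool, and trainer together. All class and method names here are hypothetical stand-ins, not NexRL's actual API:

```python
# Illustrative sketch only: the class and method names below are
# hypothetical stand-ins for NexRL's real components.

class DataLoader:
    """Yields prompts from a dataset."""
    def __init__(self, prompts):
        self.prompts = prompts
    def __iter__(self):
        return iter(self.prompts)

class RolloutWorker:
    """Runs the agent/environment interaction and returns one trajectory."""
    def rollout(self, prompt):
        return {"prompt": prompt, "response": f"answer to {prompt}", "reward": 1.0}

class TrajectoryPool:
    """Collects trajectories until a full training batch is available."""
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []
    def push(self, traj):
        self.buffer.append(traj)
    def pop_batch(self):
        if len(self.buffer) < self.batch_size:
            return None
        batch = self.buffer[:self.batch_size]
        self.buffer = self.buffer[self.batch_size:]
        return batch

class Trainer:
    """Applies the algorithm logic (e.g. GRPO) to each batch."""
    def __init__(self):
        self.steps = 0
    def train_step(self, batch):
        # A real trainer would call the Train Service's forward_backward() here.
        self.steps += 1

loader = DataLoader(["q1", "q2", "q3", "q4"])
worker, pool, trainer = RolloutWorker(), TrajectoryPool(batch_size=2), Trainer()
for prompt in loader:
    pool.push(worker.rollout(prompt))
    batch = pool.pop_batch()
    if batch is not None:
        trainer.train_step(batch)
print(trainer.steps)  # 2 (four prompts form two batches of two)
```

In the real framework these pieces run as separate, decoupled modules; the point of the sketch is only the data flow between them.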
Services:
- Inference Service: Adopts the standard OpenAI API as the unified interaction interface with inference engines. This API-centric design ensures that the upper-layer modules can interact with various inference engines (such as SGLang, vLLM, etc.) in a consistent manner, eliminating the need for code modifications when switching between different inference engines.
- Train Service: Utilizes standardized forward() and forward_backward() APIs to communicate with different training backends (including FSDP, Megatron, etc.). To achieve compatibility with diverse backends, we implement lightweight adapters tailored for each backend. These adapters translate the standardized API calls into backend-specific operations, enabling seamless switching of training backends without altering the core training logic.
- Agent Service: Provides a streamlined integration path for agents to participate in RL training. Agents can directly push generated trajectories into the TrajectoryPool through this service, eliminating the need for developers to rewrite or modify agent code to adapt to RL training requirements.
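The adapter idea behind the Train Service can be sketched as follows. The adapter and backend names below are illustrative inventions, not NexRL's actual classes; the point is that the core training logic only ever sees the standardized forward()/forward_backward() API:

```python
# Illustrative sketch of the Train Service adapter pattern.
# Class names and return values are hypothetical, not NexRL's real API.
from abc import ABC, abstractmethod

class TrainBackendAdapter(ABC):
    """Translates the standardized API calls into backend-specific operations."""
    @abstractmethod
    def forward(self, batch): ...
    @abstractmethod
    def forward_backward(self, batch): ...

class FSDPAdapter(TrainBackendAdapter):
    def forward(self, batch):
        return {"backend": "fsdp", "loss": 0.5}
    def forward_backward(self, batch):
        return {"backend": "fsdp", "loss": 0.5, "grads_applied": True}

class MegatronAdapter(TrainBackendAdapter):
    def forward(self, batch):
        return {"backend": "megatron", "loss": 0.5}
    def forward_backward(self, batch):
        return {"backend": "megatron", "loss": 0.5, "grads_applied": True}

def train_step(adapter: TrainBackendAdapter, batch):
    # Core training logic never branches on the backend type; swapping
    # FSDP for Megatron means swapping the adapter, nothing else.
    return adapter.forward_backward(batch)

print(train_step(FSDPAdapter(), [])["backend"])      # fsdp
print(train_step(MegatronAdapter(), [])["backend"])  # megatron
```

Swapping training backends then reduces to constructing a different adapter, which is exactly the "no changes to core training logic" property described above.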
- Python 3.12+
- CUDA 12.8+ (for GPU support)
- Ray 2.48+ (for distributed mode)
- kubectl installed and configured
- Access to a Kubernetes cluster
- Volcano Scheduler installed in the cluster
- High-performance network file system, e.g., GPFS
Check pyproject.toml for the full dependency list.
Install NexRL:
git clone git@github.com:nex-agi/NexRL.git
cd NexRL
# Full install (training + all core dependencies)
pip install -e ".[core]"
# Or lightweight install (CLI job submission only, no torch/ray/etc.)
pip install -e .
Zero-Setup (Quickest!)
Run immediately with built-in defaults:
nexrl -m self-hosted \
-c recipe/math/self_hosted.yaml \
  --run-nexrl
nexrl -m training-service \
  -c recipe/math/tinker.yaml \
  --run-nexrl
Uses public images (nexagi/nexrl:v1.4.0, lmsysorg/sglang:v0.5.4.post2) and /tmp storage - perfect for testing!
Development Setup
Use environment variables for quick configuration:
# Option 1: Use the provided setup script
source cli/setup_env.sh
# Option 2: Set variables manually
export NEXRL_STORAGE_PATH="/your/persistent/storage"
export NEXRL_WORKER_IMAGE="your-registry/nexrl:tag"
export WANDB_KEY="your-wandb-key"
# Then run
nexrl -m self-hosted -c recipe/your_recipe.yaml --run-nexrl
Production Setup
Configure cluster with custom images and persistent storage:
# Edit and apply ConfigMaps (one-time setup)
kubectl apply -f cli/setup/01-namespace.yaml
kubectl apply -f cli/setup/02-admin-config.yaml # Edit first!
kubectl apply -f cli/setup/03-user-config.yaml # Edit first!
# Run with production config
nexrl -m self-hosted \
-c recipe/single_turn_math_qwen_2a5_7b/single_turn_math_qwen2a5_7b.yaml \
  --run-nexrl --tag prod-v1
CLI Options:
- -m, --mode: self-hosted or training-service (required)
- -c, --train-config: Path to training YAML (required)
- -r, --run-nexrl: Auto-start training
- -t, --tag: Custom job tag
- --serving-only: [self-hosted] Only launch inference
- --no-serving: [self-hosted] Skip inference
Configuration Priority:
- Kubernetes ConfigMaps (production) → kubectl apply -f cli/setup/
- Environment Variables (development) → source cli/setup_env.sh or export NEXRL_*
- Built-in Defaults (testing) → public images, /tmp storage
Key Variables:
- NEXRL_STORAGE_PATH: Storage path (default: /tmp/nexrl)
- NEXRL_WORKER_IMAGE: Worker image (default: nexagi/nexrl:v1.4.0)
- NEXRL_CONTROLLER_IMAGE: Controller image (default: nexagi/nexrl:v1.4.0)
- NEXRL_INFERENCE_IMAGE: Inference image (default: lmsysorg/sglang:v0.5.4.post2)
- WANDB_KEY: WandB API key (optional)
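The precedence described above (environment variables override built-in defaults) can be sketched as a simple fall-through lookup. The defaults below mirror the table; the helper function itself is illustrative, not part of NexRL's CLI:

```python
# Illustrative sketch of default resolution; resolve() is not a real
# NexRL helper. ConfigMaps would take precedence in a cluster (omitted here).
import os

DEFAULTS = {
    "NEXRL_STORAGE_PATH": "/tmp/nexrl",
    "NEXRL_WORKER_IMAGE": "nexagi/nexrl:v1.4.0",
    "NEXRL_CONTROLLER_IMAGE": "nexagi/nexrl:v1.4.0",
    "NEXRL_INFERENCE_IMAGE": "lmsysorg/sglang:v0.5.4.post2",
}

def resolve(name: str) -> str:
    """Return the environment variable if set, otherwise the built-in default."""
    return os.environ.get(name, DEFAULTS[name])

os.environ["NEXRL_STORAGE_PATH"] = "/data/nexrl"  # a development override
print(resolve("NEXRL_STORAGE_PATH"))   # /data/nexrl
print(resolve("NEXRL_WORKER_IMAGE"))   # nexagi/nexrl:v1.4.0
```

This is why the zero-setup mode works out of the box: with nothing configured, every lookup falls through to a public image or /tmp storage.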
See also: cli/README.md for comprehensive documentation.
- User Guide: Complete guide for developing and integrating RL algorithms. Train NexAU agents with zero code modification—just provide configuration files and task-specific evaluators.
- Developer Guide: Comprehensive documentation on architecture, APIs, and advanced usage
- Configuration Examples: Ready-to-use training recipes for various models and tasks
- Test Suite: Testing guide and examples
This release represents a foundational version of NexRL, designed to demonstrate our loosely-coupled and service-oriented architecture. We are actively working on preparing the code for open source and will release more of our work soon, including:
- More model & agent support
- Additional training and inference backend integrations
- High-performance weight synchronization
- Post-training algorithm exploration
- More usability tools
- ...
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
NexRL aims for ultimate scalability and usability, fully embracing the open-source ecosystem to minimize code adaptation costs and improve experimental efficiency. NexRL is built upon several excellent open-source frameworks, including vLLM, SGLang, FSDP, Megatron, and VeRL (the adapter for the FSDP backend adopts the implementation from VeRL). Additionally, the zero-code agent development design of the Agent Service is inspired by Agent Lightning.
