Design doc (with details on how to configure fully async RL in SkyRL, and the implementation): https://skyrl.readthedocs.io/en/latest/tutorials/fully_async.html
This issue tracks support for fully async RL (synonymous with: in-flight weight updates, and multi-turn partial rollouts).
Recent literature and findings in the RL community (AReal, PipelineRL, ScaleRL, etc.) demonstrate the importance of asynchronous RL for agentic training.
SkyRL-train currently supports one-step-off-policy training, which allows training and rollout to run concurrently (AReal paper, figure 1 right): https://github.com/NovaSky-AI/SkyRL/tree/4d5ec4d13777ea1e3a36784201987ac77f6c6fb4/skyrl-train/examples/async
However, we would like to support more advanced schemes such as interruptible trajectories (or partial rollouts, where the same trajectory can be completed by multiple model versions) (AReal paper, figure 3).
We aim to support this feature for any `CustomGenerator` -- that is, it should work out of the box for all agent harnesses (e.g. MiniSWEAgent, Terminus, etc.).
We plan the following steps to achieve this:
- A1: Add `abort_generation` to `vllm_engine.py` that returns already completed tokens to `InferenceEngineClient`. Add `pause_generation()` and `resume_generation()` to `InferenceEngineClient`.
- A2: Support continued generation with `chat_completion` by adding a while loop that keeps re-sending the request to the underlying engine along with the previously generated tokens, following AReal's implementation: https://github.com/inclusionAI/AReaL/blob/ccba1bb709e0ef62ddc62b3701438ae427553385/areal/engine/vllm_remote.py#L234-L238
- A3: Add a fully async training loop that is generator-agnostic (verify with a `SkyRLGymGenerator` that uses an HTTP endpoint)
- A4: Support in-flight weight updates for `.generate()` so fully async works for any SkyRLGym example (e.g. DAPO, search-r1, etc.) -- identical to A2 but for `.generate()`
- A5: Add algorithmic corrections (and all the metadata, such as model versions, required for such corrections), and run experiments for validation
- A6: Support non-batched `completions` for async RL (this feature cannot be supported for batched `completions`, since some trajectories finish before others, breaking the API's semantics)
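To make A1 concrete, here is a minimal, runnable sketch of the pause/resume/abort surface on `InferenceEngineClient`. The real client talks to a vLLM engine; here the in-flight request tracking is faked with an in-memory dict (the `_Request` class and `track`/`wait_if_paused` helpers are hypothetical names, not SkyRL's actual internals), so only the control flow is illustrated.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class _Request:
    """Tracks one in-flight generation request and its tokens so far."""
    request_id: str
    generated_tokens: list[int] = field(default_factory=list)


class InferenceEngineClient:
    """Sketch of the A1 control surface (assumed shape, not SkyRL's code)."""

    def __init__(self) -> None:
        self._requests: dict[str, _Request] = {}
        self._can_generate = asyncio.Event()
        self._can_generate.set()  # set == generation allowed

    def track(self, request_id: str) -> _Request:
        """Register a new in-flight request (hypothetical helper)."""
        req = _Request(request_id)
        self._requests[request_id] = req
        return req

    async def pause_generation(self) -> None:
        # Subsequent engine steps block until resume_generation() is called,
        # e.g. while new weights are being broadcast.
        self._can_generate.clear()

    async def resume_generation(self) -> None:
        self._can_generate.set()

    async def abort_generation(self) -> dict[str, list[int]]:
        # Abort all in-flight requests and hand back the tokens each one
        # had already produced, so callers can later resume the trajectory.
        partial = {rid: req.generated_tokens for rid, req in self._requests.items()}
        self._requests.clear()
        return partial

    async def wait_if_paused(self) -> None:
        await self._can_generate.wait()
```

The key design point is that `abort_generation` does not discard work: the partially generated tokens are returned so a newer model version can continue them (step A2).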
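The continuation loop from A2 can be sketched as follows. `engine.generate` here is a stand-in for the underlying inference call (its `(new_tokens, finish_reason)` return shape and the `"abort"` finish reason are assumptions for illustration, loosely modeled on AReal's `vllm_remote.py` loop linked above).

```python
async def generate_with_continuation(engine, prompt_ids, max_tokens):
    """Keep re-submitting an interrupted trajectory until it finishes.

    Each round re-sends the prompt plus all tokens generated so far,
    so an abort (e.g. for an in-flight weight update) loses no work.
    """
    generated = []
    while True:
        new_tokens, finish_reason = await engine.generate(
            prompt_ids + generated,       # prompt plus tokens so far
            max_tokens - len(generated),  # shrink the budget each round
        )
        generated.extend(new_tokens)
        if finish_reason != "abort" or len(generated) >= max_tokens:
            # Finished normally (stop/length). Later rounds may have run
            # on a newer model version, which A5's corrections account for.
            return generated, finish_reason
```

Because the loop only depends on the engine interface, the same pattern applies unchanged to `.generate()` in step A4.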
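For A5, one common family of corrections in the async-RL literature (not necessarily the one SkyRL will adopt) is per-token truncated importance sampling, which reweights tokens generated by a stale policy. This is why the metadata mentioned in A5 matters: each token must record which model version produced it, along with its rollout-time log-probability.

```python
import math


def truncated_is_weight(logp_train: float, logp_rollout: float, cap: float = 2.0) -> float:
    """Per-token truncated importance ratio (illustrative, not SkyRL's code).

    logp_train:   log-prob of the token under the current model version
    logp_rollout: log-prob recorded at rollout time (hence the need to
                  store per-token model-version metadata)
    cap:          truncation threshold that bounds the variance
    """
    return min(math.exp(logp_train - logp_rollout), cap)
```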
TODO:
- Detail the design in this tracker, and make it into a doc
- Support `RemoteInferenceEngine`: [AsyncRL][1/N] Add abort_generation to vllm engine and pause/continue generation to client #537 (comment)
- Consider killing some codepaths in the rollout stack (e.g. the non-async vLLM engine)
- More robust abort handling, discussed here:
References:
