Design doc (with details on how to configure fully async RL in SkyRL, and the implementation): https://skyrl.readthedocs.io/en/latest/tutorials/fully_async.html
This issue tracks support for fully async RL (synonymous with: in-flight weight updates, and multi-turn partial rollouts).
Recent literature and findings in the RL community (AReal, PipelineRL, ScaleRL, etc.) demonstrate the importance of asynchronous RL for agentic training.
SkyRL-train currently supports one-step-off-policy training, which allows training and rollout to run concurrently (AReal paper, figure 1 right): https://github.com/NovaSky-AI/SkyRL/tree/4d5ec4d13777ea1e3a36784201987ac77f6c6fb4/skyrl-train/examples/async
However, we would like to support more advanced schemes such as interruptible trajectories (or partial rollouts, where the same trajectory can be completed by multiple model versions) (AReal paper, figure 3).
We aim to support this feature for any `CustomGenerator` -- that is, it should work out of the box for all agent harnesses (e.g. MiniSWEAgent, Terminus, etc.).
We plan the following steps to achieve this:
- A1: Add `abort_generation` to `vllm_engine.py` that returns already completed tokens to `InferenceEngineClient`. Add `pause_generation()` and `resume_generation()` to `InferenceEngineClient`.
- A2: Support continued generation with `chat_completion` by adding a while loop that keeps re-sending the request to the underlying engine along with the previously generated tokens, following AReal's implementation: https://github.com/inclusionAI/AReaL/blob/ccba1bb709e0ef62ddc62b3701438ae427553385/areal/engine/vllm_remote.py#L234-L238
- A3: Add a fully async training loop that is generator-agnostic (verify with a `SkyRLGymGenerator` that uses an HTTP endpoint)
- A4: Support in-flight weight updates for `.generate()` so fully async works for any SkyRLGym example (e.g. DAPO, search-r1, etc.) -- identical to A2 but for `.generate()`
- A5: Add algorithmic corrections (and all the metadata, such as model versions, required for such corrections), and run experiments for validation
- A6: Support non-batched `completions` for async RL (this feature cannot be supported for batched `completions`, since some trajectories finish before others, breaking the API's semantics)
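To make A1 concrete, here is a minimal, runnable sketch of the pause/resume/abort surface on `InferenceEngineClient`. The real client talks to a vLLM engine; here the in-flight request tracking is faked with an in-memory dict (the `_Request` class and `track`/`wait_if_paused` helpers are hypothetical names, not SkyRL's actual internals), so only the control flow is illustrated.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class _Request:
    """Tracks one in-flight generation request and its tokens so far."""
    request_id: str
    generated_tokens: list[int] = field(default_factory=list)


class InferenceEngineClient:
    """Sketch of the A1 control surface (assumed shape, not SkyRL's code)."""

    def __init__(self) -> None:
        self._requests: dict[str, _Request] = {}
        self._can_generate = asyncio.Event()
        self._can_generate.set()  # set == generation allowed

    def track(self, request_id: str) -> _Request:
        """Register a new in-flight request (hypothetical helper)."""
        req = _Request(request_id)
        self._requests[request_id] = req
        return req

    async def pause_generation(self) -> None:
        # Subsequent engine steps block until resume_generation() is called,
        # e.g. while new weights are being broadcast.
        self._can_generate.clear()

    async def resume_generation(self) -> None:
        self._can_generate.set()

    async def abort_generation(self) -> dict[str, list[int]]:
        # Abort all in-flight requests and hand back the tokens each one
        # had already produced, so callers can later resume the trajectory.
        partial = {rid: req.generated_tokens for rid, req in self._requests.items()}
        self._requests.clear()
        return partial

    async def wait_if_paused(self) -> None:
        await self._can_generate.wait()
```

The key design point is that `abort_generation` does not discard work: the partially generated tokens are returned so a newer model version can continue them (step A2).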
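The continuation loop from A2 can be sketched as follows. `engine.generate` here is a stand-in for the underlying inference call (its `(new_tokens, finish_reason)` return shape and the `"abort"` finish reason are assumptions for illustration, loosely modeled on AReal's `vllm_remote.py` loop linked above).

```python
async def generate_with_continuation(engine, prompt_ids, max_tokens):
    """Keep re-submitting an interrupted trajectory until it finishes.

    Each round re-sends the prompt plus all tokens generated so far,
    so an abort (e.g. for an in-flight weight update) loses no work.
    """
    generated = []
    while True:
        new_tokens, finish_reason = await engine.generate(
            prompt_ids + generated,       # prompt plus tokens so far
            max_tokens - len(generated),  # shrink the budget each round
        )
        generated.extend(new_tokens)
        if finish_reason != "abort" or len(generated) >= max_tokens:
            # Finished normally (stop/length). Later rounds may have run
            # on a newer model version, which A5's corrections account for.
            return generated, finish_reason
```

Because the loop only depends on the engine interface, the same pattern applies unchanged to `.generate()` in step A4.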
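For A5, one common family of corrections in the async-RL literature (not necessarily the one SkyRL will adopt) is per-token truncated importance sampling, which reweights tokens generated by a stale policy. This is why the metadata mentioned in A5 matters: each token must record which model version produced it, along with its rollout-time log-probability.

```python
import math


def truncated_is_weight(logp_train: float, logp_rollout: float, cap: float = 2.0) -> float:
    """Per-token truncated importance ratio (illustrative, not SkyRL's code).

    logp_train:   log-prob of the token under the current model version
    logp_rollout: log-prob recorded at rollout time (hence the need to
                  store per-token model-version metadata)
    cap:          truncation threshold that bounds the variance
    """
    return min(math.exp(logp_train - logp_rollout), cap)
```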
TODO:
- Detail the design in this tracker, and make it into a doc
- Support `RemoteInferenceEngine`: [AsyncRL][1/N] Add abort_generation to vllm engine and pause/continue generation to client #537 (comment)
- Consider killing some codepaths in the rollout stack (e.g. the non-async vLLM engine)
- More robust abort handling, discussed here:
References:
