I found a fork that attempts to implement multi-turn rollouts using vLLM. I think this would be broadly useful for training reasoning models that can reason across multiple turns of a conversation.
https://github.com/cfpark00/verl/tree/multi_turn_rollout