Checklist
Motivation
Using Server mode to generate Rollouts in Agentic RL training is a natural and necessary approach. However, an Agent Scaffold is typically designed only for compatibility with the OpenAI-compatible API interface, which makes it difficult to collect token IDs at the Agent Scaffold level, information that is essential for training. Additionally, tokenization is inherently coupled with the inference model, so it is logically sound to let the inference engine handle tokenization.
Thus, a `tokenize` endpoint is needed.
Related resources
Maybe refer to vLLM's `tokenize` endpoint: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#tokenizer-api
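For illustration, a minimal sketch of how an Agent Scaffold could call such an endpoint. The request/response shapes follow vLLM's documented Tokenizer API (`POST /tokenize` returning `tokens`, `count`, and `max_model_len`); whether the proposed endpoint mirrors this exactly is an assumption.

```python
import json
from urllib import request


def build_tokenize_request(model: str, prompt: str) -> dict:
    # Request body shape follows vLLM's Tokenizer API; field names are
    # assumptions for the endpoint proposed here.
    return {"model": model, "prompt": prompt, "add_special_tokens": True}


def parse_tokenize_response(body: dict) -> list[int]:
    # vLLM responds with {"tokens": [...], "count": N, "max_model_len": M};
    # only the token IDs are needed for collecting training data.
    return body["tokens"]


def tokenize(base_url: str, model: str, prompt: str) -> list[int]:
    """Call a vLLM-style /tokenize endpoint and return the token IDs."""
    payload = json.dumps(build_tokenize_request(model, prompt)).encode()
    req = request.Request(
        f"{base_url}/tokenize",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return parse_tokenize_response(json.load(resp))
```

With an endpoint like this, the Agent Scaffold can record exact token IDs for each turn of a Rollout instead of re-tokenizing text on the training side.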