Checklist
Motivation
The OpenAI API doesn't natively support prefilling an assistant response. vLLM and Aphrodite add a continue_final_message parameter for this, and having the same option in SGLang would give developers even more control.
This should be relatively easy to implement: when the flag is enabled, the final message in the conversation is an assistant message, and a generation is requested, the chat template simply must not close that turn with its EOS/end-of-turn token, so the model continues the message instead of starting a new one. The parameter was originally implemented under the exact same name in transformers' apply_chat_template, and later became a feature in vLLM and Aphrodite.
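For illustration, here is a minimal sketch of how the prefill flow looks today with transformers and against a vLLM/Aphrodite-style OpenAI-compatible server (the model name, server URL, and prefill text are placeholders; this assumes a transformers release that supports continue_final_message):

```python
from transformers import AutoTokenizer
from openai import OpenAI

messages = [
    {"role": "user", "content": "Write a haiku about GPUs."},
    # Partial assistant turn that the model should continue, not restart.
    {"role": "assistant", "content": "Silicon rivers flow"},
]

# transformers: continue_final_message=True leaves the final assistant turn
# open (no EOS / end-of-turn token appended), so generation resumes mid-message.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
prompt = tokenizer.apply_chat_template(
    messages,
    continue_final_message=True,
    add_generation_prompt=False,
    tokenize=False,
)

# vLLM / Aphrodite: the same flag is passed as an extra request field on the
# OpenAI-compatible chat endpoint via extra_body.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=messages,
    extra_body={
        "add_generation_prompt": False,
        "continue_final_message": True,
    },
)
print(resp.choices[0].message.content)  # continuation of "Silicon rivers flow"
```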
Related resources
https://huggingface.co/docs/transformers/main/en/chat_templating
https://github.com/aphrodite-engine/aphrodite-engine/blob/e64075b8937786311f6441fab5103f9ebf4e1dd8/aphrodite/endpoints/openai/protocol.py#L225-L233
https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#id7
No equivalent extra parameter appears in SGLang's OpenAI-compatible API docs (a rough sketch of what the request field could look like follows this list):
https://docs.sglang.ai/backend/openai_api_completions.html
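For comparison, vLLM and Aphrodite expose the flag as an extra field on their chat request schema (see the protocol.py link above). A paraphrased sketch, not copied from either project, of what an equivalent field in SGLang's chat request model could look like:

```python
from pydantic import BaseModel, Field

class ChatCompletionRequest(BaseModel):
    # ...standard OpenAI chat fields elided...
    add_generation_prompt: bool = True
    continue_final_message: bool = Field(
        default=False,
        description=(
            "If set, the final assistant message is treated as a prefix to be "
            "continued: the chat template must not append an end-of-turn/EOS "
            "token after it. Cannot be combined with add_generation_prompt=True."
        ),
    )
```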