[Feature] Prefill assistant response #3971

@RealWorga

Description

Checklist

Motivation

The OpenAI API doesn't natively support prefilling an assistant's response. vLLM and Aphrodite have additional support for continue_final_message, which SGLang would need in order to give developers even more control.

This should be relatively easy for someone to implement: when the last message in the conversation is an assistant message and this flag is enabled, the chat template simply must not close that turn with EOS, so the requested generation continues the partial assistant response instead of starting a new turn. The parameter was originally implemented under the exact same name in transformers, and later became a feature in vLLM and Aphrodite.
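To illustrate the intended behavior, here is a toy sketch (not SGLang's actual chat-template code; the role tags and EOS string are made up): when continue_final_message is set and the final message is from the assistant, the rendered prompt must end mid-turn rather than with EOS.

```python
# Toy chat-prompt renderer demonstrating the continue_final_message idea.
# All template details here are illustrative assumptions, not SGLang internals.

EOS = "</s>"

def render_prompt(messages, continue_final_message=False):
    """Render a minimal chat prompt from a list of {'role', 'content'} dicts."""
    parts = []
    for i, msg in enumerate(messages):
        is_last = i == len(messages) - 1
        parts.append(f"<|{msg['role']}|>{msg['content']}")
        # Normally every turn is closed with EOS; skip it for the final
        # assistant turn when prefill continuation is requested, so the
        # model keeps writing the same turn.
        if not (continue_final_message and is_last and msg["role"] == "assistant"):
            parts.append(EOS)
    return "".join(parts)

messages = [
    {"role": "user", "content": "Name a prime number."},
    {"role": "assistant", "content": "Sure, one prime number is"},
]

print(render_prompt(messages))                               # ends with </s>
print(render_prompt(messages, continue_final_message=True))  # ends mid-turn
```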

Related resources

https://huggingface.co/docs/transformers/main/en/chat_templating
https://github.com/aphrodite-engine/aphrodite-engine/blob/e64075b8937786311f6441fab5103f9ebf4e1dd8/aphrodite/endpoints/openai/protocol.py#L225-L233
https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#id7

I'm not seeing any support for this extra parameter in the SGLang docs:
https://docs.sglang.ai/backend/openai_api_completions.html
