[Feature] Prefill assistant response #3971

@RealWorga

Description

Checklist

Motivation

The OpenAI API doesn't natively support prefilling an assistant's response. vLLM and Aphrodite have additional support for continue_final_message, which SGLang would need in order to give developers even more control.

This should be relatively easy for someone to implement: when the last message in the conversation is an assistant message and this flag is enabled, the chat template simply must not close that turn with EOS, so the requested generation continues the partial assistant response instead of starting a new turn. The parameter was originally implemented under the exact same name in transformers, and later became a feature in vLLM and Aphrodite.
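To illustrate the intended behavior, here is a toy sketch (not SGLang's actual chat-template code; the role tags and EOS string are made up): when continue_final_message is set and the final message is from the assistant, the rendered prompt must end mid-turn rather than with EOS.

```python
# Toy chat-prompt renderer demonstrating the continue_final_message idea.
# All template details here are illustrative assumptions, not SGLang internals.

EOS = "</s>"

def render_prompt(messages, continue_final_message=False):
    """Render a minimal chat prompt from a list of {'role', 'content'} dicts."""
    parts = []
    for i, msg in enumerate(messages):
        is_last = i == len(messages) - 1
        parts.append(f"<|{msg['role']}|>{msg['content']}")
        # Normally every turn is closed with EOS; skip it for the final
        # assistant turn when prefill continuation is requested, so the
        # model keeps writing the same turn.
        if not (continue_final_message and is_last and msg["role"] == "assistant"):
            parts.append(EOS)
    return "".join(parts)

messages = [
    {"role": "user", "content": "Name a prime number."},
    {"role": "assistant", "content": "Sure, one prime number is"},
]

print(render_prompt(messages))                               # ends with </s>
print(render_prompt(messages, continue_final_message=True))  # ends mid-turn
```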

Related resources

https://huggingface.co/docs/transformers/main/en/chat_templating
https://github.com/aphrodite-engine/aphrodite-engine/blob/e64075b8937786311f6441fab5103f9ebf4e1dd8/aphrodite/endpoints/openai/protocol.py#L225-L233
https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#id7

I'm not seeing any support for this extra parameter in the SGLang docs:
https://docs.sglang.ai/backend/openai_api_completions.html
