add repetition penalty support #5703
Conversation
@merrymercy, I re-added the repetition penalty that was removed by #3988. I found this issue when comparing the sglang output with HF output when the repetition penalty was applied. Please help review it. Thank you.
Why do you need this? The OpenAI API does not provide this functionality. Why are the frequency and presence penalties not enough?
@merrymercy, I do offline generation for one of my use cases, which needs to align with HF output, and another use case mainly uses Python requests. Do you mean that if I use the frequency and presence penalties, I can get the same behavior as the repetition penalty method?
Can you disable repetition penalty for your HF use cases?

I can, but I think we shouldn't forbid this use case for our users.
Makes sense. We can merge this. Can you add some test cases here? sglang/test/srt/test_penalty.py Lines 62 to 70 in 094891c
Done.
@merrymercy can we merge it?
@merrymercy could you help review it? Thanks!
@merrymercy We initially used vllm for model inference, and recently we have been planning to integrate sglang, allowing users to choose between vllm and sglang for inference. However, since we previously set the repetition_penalty parameter when using vllm for some tasks, and sglang currently does not support this parameter, it is difficult for us to align the results between vllm and sglang. We hope to have the repetition penalty feature reintegrated into sglang. Thanks!
Is this going to be merged? The HF repetition penalty and the OpenAI frequency/presence penalties behave very differently (mainly because the HF repetition penalty takes the whole prefill context into account, while the frequency/presence penalties only consider the currently generated tokens), and it is quite a pain having to manually merge this into every SGLang version for our use case.
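To make the difference concrete, here is a minimal pure-Python sketch of the two penalty families (the function names are made up for illustration; real implementations operate on logit tensors):

```python
from collections import Counter

def hf_style_repetition_penalty(logits, context_ids, penalty):
    """Multiplicative penalty over ALL context tokens (prompt + generated
    output), mirroring the HF-style repetition penalty."""
    out = list(logits)
    for tok in set(context_ids):
        # Divide positive logits, multiply negative ones, so a repeated
        # token always becomes less likely.
        out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out

def openai_style_penalties(logits, output_ids, frequency_penalty, presence_penalty):
    """Additive penalties over GENERATED tokens only, as in the OpenAI API."""
    counts = Counter(output_ids)
    return [
        x - frequency_penalty * counts[i] - presence_penalty * (1 if counts[i] else 0)
        for i, x in enumerate(logits)
    ]
```

Because the first form also penalizes prompt tokens and scales multiplicatively rather than subtracting, there is in general no frequency/presence setting that reproduces it exactly, which is why the two are hard to align.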
cc @merrymercy
I support this PR. For small models or those without sufficient SFT, having a way to prevent the model from repeating itself is still quite necessary.
@XiaobingSuper Hi, currently the repetition penalty only considers the generated text, not the original input text. Both HF Transformers and vLLM consider the original input text as well. Modify as follows:
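The concrete diff is not included in the comment above; purely as an illustration (hypothetical function names, assuming token ids as plain lists), extending the penalized set to cover the prompt as well might look like this:

```python
def tokens_to_penalize(prompt_ids, output_ids, include_prompt=True):
    """Token ids the repetition penalty should apply to.

    HF Transformers and vLLM penalize tokens from the prompt AND the
    generated output; penalizing only output_ids diverges from them.
    """
    penalized = set(output_ids)
    if include_prompt:
        penalized |= set(prompt_ids)
    return penalized

def apply_repetition_penalty(logits, prompt_ids, output_ids, penalty):
    # Multiplicative penalty applied over the combined token set.
    out = list(logits)
    for tok in tokens_to_penalize(prompt_ids, output_ids):
        out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out
```

With `include_prompt=True` the output matches the HF/vLLM convention; with `include_prompt=False` it reproduces the generated-text-only behavior described in the comment.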
Inactive and duplicate of #21258
Motivation
This PR adds repetition penalty support.
Modifications
Checklist