
add repetition penalty support #5703

Closed
XiaobingSuper wants to merge 6 commits into sgl-project:main from XiaobingSuper:xiaobing/repetition_penalty

Conversation

XiaobingSuper commented Apr 24, 2025

Motivation

This PR adds repetition penalty support.
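For context, the repetition penalty (as defined in HF Transformers) rescales the logits of every token that has already appeared in the sequence, prompt included. A minimal standalone sketch of that rule, with illustrative names rather than the actual sglang code:

```python
import torch

def apply_repetition_penalty(
    logits: torch.Tensor,           # [vocab_size] next-token logits
    seen_token_ids: torch.Tensor,   # 1-D tensor of token ids already in the sequence
    penalty: float,                 # > 1.0 discourages repetition; 1.0 is a no-op
) -> torch.Tensor:
    scores = logits[seen_token_ids]
    # Positive logits shrink (divide); negative logits grow more negative (multiply).
    logits[seen_token_ids] = torch.where(scores > 0, scores / penalty, scores * penalty)
    return logits
```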

Modifications

Checklist

XiaobingSuper (Author) commented Apr 24, 2025

@merrymercy, I re-added the repetition penalty support that was removed in #3988. I found this issue while comparing sglang output against HF output with a repetition penalty applied. Please help review it. Thank you.

merrymercy (Contributor) commented Apr 26, 2025

Why do you need this? The OpenAI API does not provide this functionality. Why are the frequency and presence penalties not enough?

XiaobingSuper (Author) commented Apr 27, 2025

@merrymercy, one of my use cases does offline generation and needs to align with HF output; another mainly uses Python requests. Do you mean that using the frequency and presence penalties would give me the same behavior as the repetition penalty method?

@merrymercy (Contributor)

Can you disable repetition penalty for your HF use cases?

@XiaobingSuper (Author)

> Can you disable repetition penalty for your HF use cases?

I can, but I don't think we should rule out this use case for our users.

@merrymercy (Contributor)

Makes sense. We can merge this. Can you add some test cases here?

```python
def test_frequency_penalty(self):
    self.run_decode({"frequency_penalty": 2})

def test_min_new_tokens(self):
    self.run_decode({"min_new_tokens": 16})

def test_presence_penalty(self):
    self.run_decode({"presence_penalty": 2})
```

XiaobingSuper force-pushed the xiaobing/repetition_penalty branch from e78a8e4 to 9104dbd on April 27, 2025 06:13
@XiaobingSuper (Author)

> Makes sense. We can merge this. Can you add some test cases here?

Done.

@XiaobingSuper (Author)

@merrymercy can we merge it?

@XiaobingSuper (Author)

@merrymercy

@XiaobingSuper (Author)

@merrymercy, could you help review it? Thanks!

THU-LIJX commented Jun 6, 2025

@merrymercy We initially used vllm for model inference, and recently we planned to integrate sglang so that users can choose between vllm and sglang for inference. However, we previously set the repetition_penalty parameter for some tasks when using vllm, and since sglang currently does not support this parameter, it is difficult for us to align the results between vllm and sglang. We hope the repetition penalty feature can be reintegrated into sglang. Thanks!

syskn commented Aug 12, 2025

Is this going to be merged? The HF repetition penalty and the OpenAI frequency/presence penalties have highly different behavior (mainly because the HF repetition penalty takes the whole prefill context into account, while the frequency/presence penalties only take currently generated tokens into account), and it is quite a pain having to manually merge this for every SGLang version for our use case.
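To make the behavioral gap concrete, here is a minimal standalone sketch (all names are illustrative; this is not the sglang or HF implementation):

```python
import torch

def hf_repetition_penalty(logits, prompt_ids, output_ids, penalty):
    # Multiplicative penalty over every token seen so far, prompt included.
    seen = torch.cat([prompt_ids, output_ids]).unique()
    scores = logits[seen]
    logits[seen] = torch.where(scores > 0, scores / penalty, scores * penalty)
    return logits

def openai_freq_presence(logits, output_ids, freq_pen, pres_pen):
    # Additive penalties over generated tokens only; the prompt is ignored.
    counts = torch.bincount(output_ids, minlength=logits.numel()).to(logits.dtype)
    logits -= freq_pen * counts                          # grows with repeat count
    logits -= pres_pen * (counts > 0).to(logits.dtype)   # flat once a token appears
    return logits
```

The key asymmetry: the multiplicative penalty sees `prompt_ids`, while the additive penalties never do, so outputs diverge whenever the prompt shares tokens with likely continuations.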

@XiaobingSuper (Author)

cc @merrymercy

@junliu-mde (Contributor)

I support this PR. For small models, or those without sufficient SFT, having a way to keep the model from repeating itself is still quite necessary.

liguodongiot commented Oct 29, 2025

@XiaobingSuper Hi, currently the repetition penalty only considers the generated text, not the original input text. Both HF Transformers and vLLM also consider the original input text.

Modify as follows:

```python
import torch

class BatchedRepetitionPenalizer(_BatchedPenalizer):

    def _prepare(self):
        # Seed the cumulated penalties with the prompt tokens so that the
        # original input text is penalized too, matching HF/vLLM behavior.
        batch_cumulated_repetition_penalties = []
        for req in self.orchestrator.reqs():
            cumulated_repetition_penalties_lst = [1] * self.orchestrator.vocab_size
            for idx in req.origin_input_ids:
                cumulated_repetition_penalties_lst[idx] = req.sampling_params.repetition_penalty
            batch_cumulated_repetition_penalties.append(cumulated_repetition_penalties_lst)

        self.cumulated_repetition_penalties = torch.tensor(
            data=batch_cumulated_repetition_penalties,
            dtype=torch.float32,
            device=self.orchestrator.device,
        )

        # Per-request penalty values, shaped [batch_size, 1] for broadcasting.
        self.repetition_penalties = torch.tensor(
            data=[
                req.sampling_params.repetition_penalty
                for req in self.orchestrator.reqs()
            ],
            dtype=torch.float32,
            device=self.orchestrator.device,
        ).unsqueeze_(1)
```
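For completeness, the prepared tensor would then be consumed in the penalizer's apply step, roughly like the sketch below (modeled on the usual HF-style rule, not copied from sglang):

```python
def _apply(self, logits: torch.Tensor) -> torch.Tensor:
    # Divide positive logits by the cumulated penalty and multiply negative
    # ones, making every previously seen token less likely either way.
    return torch.where(
        logits > 0,
        logits / self.cumulated_repetition_penalties,
        logits * self.cumulated_repetition_penalties,
    )
```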

@hnyls2002 (Collaborator)

Inactive and a duplicate of #21258.

hnyls2002 closed this Mar 30, 2026