Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts #3988

Merged
merrymercy merged 26 commits into main from lianmin/many-improve on Mar 3, 2025
@merrymercy merrymercy commented Mar 2, 2025

  • Support penalty in overlap mode
  • Support chunked prefill + input logprob
  • Improve benchmark script and profiler
  • Rename "token_ids" to "output_ids" in the return value when using --skip-tokenizer-init
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
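
For client code that consumes responses produced with --skip-tokenizer-init, the rename in the last bullet means the generated token IDs are now found under "output_ids" rather than "token_ids". A minimal sketch of a reader that tolerates both names follows. It assumes the response is a plain dict with the key at the top level; the helper name and that layout are illustrative assumptions, not SGLang's API.

```python
from typing import Any, Dict, List


def get_output_ids(response: Dict[str, Any]) -> List[int]:
    """Return generated token IDs from a response dict (hypothetical layout)."""
    # Prefer the new key; fall back to the pre-rename key for older servers.
    for key in ("output_ids", "token_ids"):
        if key in response:
            return list(response[key])
    raise KeyError("response has neither 'output_ids' nor 'token_ids'")
```

Checking the new key first keeps the helper correct if a response ever carries both names during a transition.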

@merrymercy merrymercy changed the title Support penalty; return logprob with chunked prefill; improve benchmark scripts Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts Mar 2, 2025
Comment thread test/srt/test_mla.py
@merrymercy merrymercy merged commit 3f77ac7 into main Mar 3, 2025
@merrymercy merrymercy deleted the lianmin/many-improve branch March 3, 2025 08:05
merrymercy added a commit that referenced this pull request Mar 3, 2025
… improve benchmark scripts (#3988)

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
zhaochenyang20 pushed a commit that referenced this pull request Mar 4, 2025
@CSEEduanyu

Hi @merrymercy, why was repetition_penalty.py removed?

@XiaobingSuper

@merrymercy I have the same question: why was repetition_penalty.py removed? For now, how does it work when repetition_penalty is set?
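
For readers unfamiliar with the parameter, a repetition penalty is commonly applied by rescaling the logits of tokens that have already appeared: a positive logit is divided by the penalty and a negative logit is multiplied by it. The sketch below shows that common formulation only. It does not describe how this PR implements the feature, and the function name is hypothetical.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], seen_token_ids: Iterable[int], penalty: float
) -> List[float]:
    """Return a copy of logits with already-seen tokens made less likely."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    out = list(logits)
    for tok in set(seen_token_ids):
        # Dividing a positive logit or multiplying a negative one both lower it
        # when penalty > 1; penalty == 1.0 leaves the logits unchanged.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out
```

For example, with logits [2.0, -1.0, 0.5], seen tokens 0 and 1, and a penalty of 2.0, the first two entries become 1.0 and -2.0 while the third is untouched.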

@elvischenv elvischenv left a comment

Why remove TPOT?

Comment on lines 1068 to 1076
  print("{:<40} {:<10.2f}".format("P99 TTFT (ms):", metrics.p99_ttft_ms))
- print(
-     "{s:{c}^{n}}".format(s="Time per Output Token (excl. 1st token)", n=50, c="-")
- )
- print("{:<40} {:<10.2f}".format("Mean TPOT (ms):", metrics.mean_tpot_ms))
- print("{:<40} {:<10.2f}".format("Median TPOT (ms):", metrics.median_tpot_ms))
- print("{:<40} {:<10.2f}".format("P99 TPOT (ms):", metrics.p99_tpot_ms))
- print("{s:{c}^{n}}".format(s="Inter-token Latency", n=50, c="-"))
+ print("{s:{c}^{n}}".format(s="Inter-Token Latency", n=50, c="-"))
  print("{:<40} {:<10.2f}".format("Mean ITL (ms):", metrics.mean_itl_ms))
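
The header printed in the snippet above describes TPOT as time per output token excluding the first token. A minimal sketch of that metric under the usual definition, (end-to-end latency - TTFT) / (output tokens - 1), is shown here. The function and argument names are hypothetical, and this is not the benchmark script's own code.

```python
from typing import Optional


def tpot_ms(latency_s: float, ttft_s: float, output_len: int) -> Optional[float]:
    """Time per output token in milliseconds, excluding the first token."""
    # With zero or one output token there is no decode interval to average.
    if output_len <= 1:
        return None
    return (latency_s - ttft_s) / (output_len - 1) * 1000.0
```

Unlike inter-token latency, which is measured between consecutive streamed chunks, this value is a single per-request average, so the two can differ when tokens arrive in bursts.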