[Perf] Optimize Sampler Redundant Copy for Model Runner v2, 1.8% Throughput Improvement #35214
yewentao256 wants to merge 7 commits into
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Code Review
This pull request introduces a performance optimization in the Sampler by avoiding a redundant copy of the logits tensor when no processing is required. This is achieved by adding a new helper method, _needs_logits_processing, which checks if any logit modifications are necessary for the current batch. The expensive copy and subsequent processing are now conditionally executed, which should improve throughput in cases where no special sampling parameters are used. The implementation appears correct and effectively delivers the intended optimization. I have reviewed the changes and found no issues.
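The review describes the core idea: check once whether the batch needs any logits modification, and pay for the copy only in that case. Below is a minimal sketch of that pattern in plain Python. It assumes a simplified per-request parameter object that only covers temperature and top-k. `SamplingParamsLite`, `needs_logits_processing`, and `prepare_logits` are hypothetical names chosen for illustration. They are not the PR's `_needs_logits_processing`, whose exact conditions are not shown here. Nested lists stand in for the logits tensor.

```python
from dataclasses import dataclass
from typing import List

Logits = List[List[float]]  # one row of vocab scores per request (stand-in for a tensor)


@dataclass
class SamplingParamsLite:
    """Hypothetical, simplified per-request sampling settings."""
    temperature: float = 1.0  # 1.0 means "no scaling"; assumed > 0
    top_k: int = 0            # 0 means "top-k disabled"


def needs_logits_processing(batch: List[SamplingParamsLite]) -> bool:
    """True if at least one request in the batch would modify its logits."""
    return any(p.temperature != 1.0 or p.top_k > 0 for p in batch)


def prepare_logits(logits: Logits, batch: List[SamplingParamsLite]) -> Logits:
    """Return logits ready for sampling, copying only when something will be modified."""
    if not needs_logits_processing(batch):
        # Fast path: nothing changes the scores, so reuse the caller's buffer as-is.
        return logits

    # Slow path: copy first so the in-place edits below never touch the caller's logits.
    out = [row[:] for row in logits]
    for row, p in zip(out, batch):
        if p.temperature != 1.0:
            for j in range(len(row)):
                row[j] /= p.temperature
        if p.top_k > 0:
            # Mask every score below the k-th largest one.
            kth = sorted(row, reverse=True)[min(p.top_k, len(row)) - 1]
            for j in range(len(row)):
                if row[j] < kth:
                    row[j] = float("-inf")
    return out
```

With default parameters, `prepare_logits` returns the very same object it was given, so no copy is made. With `temperature=2.0`, it returns a new scaled buffer and leaves the original rows untouched.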
This pull request has merge conflicts that must be resolved before it can be merged.
Hi @WoosukKwon, if you prefer a version with a smaller diff, see commit …; the current version builds a single per-batch logits-processing plan in the caller and removes the redundant checks from the callee sampling ops. Behavior is unchanged, while readability improves and repeated condition evaluation is avoided. Could you please help review?
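The comment describes moving the decision into the caller. The caller builds one plan per batch, then invokes only the ops that plan enables, so the callee ops no longer re-evaluate the same batch-level conditions. The sketch below shows that structure. It assumes only temperature and top-k are in play. `LogitsPlan`, `build_plan`, `apply_temperature_`, `apply_top_k_`, and `process_logits` are hypothetical names, not the PR's actual code. Nested lists stand in for GPU tensors.

```python
from dataclasses import dataclass
from typing import List, Sequence

Logits = List[List[float]]  # one row of vocab scores per request


@dataclass(frozen=True)
class LogitsPlan:
    """Hypothetical per-batch plan: which logits ops this batch actually needs."""
    apply_temperature: bool
    apply_top_k: bool

    @property
    def needs_copy(self) -> bool:
        # A copy is only required if some op will write into the logits.
        return self.apply_temperature or self.apply_top_k


def build_plan(temperatures: Sequence[float], top_ks: Sequence[int]) -> LogitsPlan:
    """Evaluate each batch-level condition exactly once, in the caller."""
    return LogitsPlan(
        apply_temperature=any(t != 1.0 for t in temperatures),
        apply_top_k=any(k > 0 for k in top_ks),
    )


def apply_temperature_(logits: Logits, temperatures: Sequence[float]) -> None:
    # In-place; trusts the plan, so no "is any temperature != 1.0?" re-check here.
    for row, t in zip(logits, temperatures):
        for j in range(len(row)):
            row[j] /= t


def apply_top_k_(logits: Logits, top_ks: Sequence[int]) -> None:
    # In-place; trusts the plan. Rows with k == 0 are skipped (a per-request check).
    for row, k in zip(logits, top_ks):
        if k > 0:
            kth = sorted(row, reverse=True)[min(k, len(row)) - 1]
            for j in range(len(row)):
                if row[j] < kth:
                    row[j] = float("-inf")


def process_logits(
    logits: Logits, temperatures: Sequence[float], top_ks: Sequence[int]
) -> Logits:
    plan = build_plan(temperatures, top_ks)  # built once per batch
    if not plan.needs_copy:
        return logits  # no op will modify the scores, so skip the copy entirely
    out = [row[:] for row in logits]  # copy once, then edit in place
    if plan.apply_temperature:
        apply_temperature_(out, temperatures)
    if plan.apply_top_k:
        apply_top_k_(out, top_ks)
    return out
```

The frozen dataclass makes the plan immutable once it is built, so every op in the step sees the same decision for the whole batch.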
Hi @yewentao256, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Purpose
Part of #35335
We don't need to do an expensive copy of the logits every time we call the sampler; the copy is only needed for certain user configurations that require logits processing. This PR skips the copy when no processing is needed.
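The sketch below gives a rough sense of why the copy counts as expensive by estimating the bytes one full logits copy moves. The batch size (256 requests), vocabulary size (128,256), and float32 dtype are illustrative assumptions, not figures from this PR. The estimate also ignores caching and kernel-launch overhead.

```python
def logits_copy_traffic_bytes(num_reqs: int, vocab_size: int, dtype_bytes: int = 4) -> int:
    """Approximate memory traffic of one full logits copy.

    A copy reads every element once and writes it once, so the traffic is
    2x the tensor size. dtype_bytes=4 assumes float32 logits.
    """
    tensor_bytes = num_reqs * vocab_size * dtype_bytes
    return 2 * tensor_bytes


# Illustrative numbers only (assumed, not measured in this PR).
traffic = logits_copy_traffic_bytes(num_reqs=256, vocab_size=128_256)
print(f"{traffic / 1e6:.1f} MB moved per sampler call")  # prints "262.7 MB moved per sampler call"
```

This traffic is paid on every sampler call, even when no request asks for logits processing. Skipping it in that common case is where the saving comes from.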
Test
Acc
Perf