Sampler cudagraph #1253

Merged
merrymercy merged 16 commits into main from sampler-cudagraph
Aug 29, 2024

Collaborator

@hnyls2002 hnyls2002 commented Aug 28, 2024

Motivation

This PR fixes the bench_latency correctness problem.

After the sampler was merged into the CUDA graph, next_token_ids became a tensor backed by a fixed memory address. Its contents are overwritten in place, so a reference kept from an earlier step can change without warning.

Modifications

Call tolist() on next_token_ids so the values are copied out of the fixed buffer before it is overwritten.
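The aliasing problem described above can be illustrated with a small sketch. It is not the code from this PR: it uses the standard-library `array` type as a stand-in for a CUDA tensor, and the names `static_next_token_ids` and `replay_graph` are hypothetical. It only shows why keeping a reference to a buffer that is rewritten in place loses earlier results, while a `tolist()` snapshot keeps them.

```python
from array import array

# Hypothetical stand-in for the fixed-address output buffer that a captured
# CUDA graph writes into. A real implementation would use a GPU tensor.
static_next_token_ids = array("q", [0, 0, 0])


def replay_graph(step):
    """Simulate one graph replay: results are written in place into the same buffer."""
    for i in range(len(static_next_token_ids)):
        static_next_token_ids[i] = step * 10 + i
    return static_next_token_ids  # the same object (same storage) on every call


aliased_history = []   # keeps references to the shared buffer
snapshot_history = []  # keeps independent copies of the values

for step in range(1, 3):
    out = replay_graph(step)
    aliased_history.append(out)            # still points at the shared buffer
    snapshot_history.append(out.tolist())  # copies the values into a new Python list

# Every aliased entry now shows the latest step's values; the snapshots differ per step.
print([a.tolist() for a in aliased_history])
print(snapshot_history)
```

In this sketch both entries of `aliased_history` are the same object, so the first step's values are gone after the second replay, whereas `snapshot_history` retains one list per step.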

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@hnyls2002 hnyls2002 requested review from merrymercy and zhyncs August 28, 2024 21:36
@merrymercy merrymercy merged commit 381dd57 into main Aug 29, 2024
@merrymercy merrymercy deleted the sampler-cudagraph branch August 29, 2024 01:58
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
AniZpZ pushed a commit to AniZpZ/sglang that referenced this pull request Apr 14, 2025