Fix bugs in sampler with CUDA graph / torch.compile #1306

Merged
hnyls2002 merged 4 commits into main from fix-sampler
Sep 2, 2024

Conversation

@hnyls2002
Collaborator

hnyls2002 commented Sep 2, 2024

Motivation

Fix #1301 and fix min p sampling when CUDA graph is enabled.

Modifications

  • Do not use CUDA graph when min p sampling is required.
  • Remove the torch sampler and use flashinfer's sampler instead, wrapping it in a torch custom operator so that it is compatible with torch.compile.
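
For readers unfamiliar with min p sampling, the first modification can be pictured with a minimal pure-Python sketch. It is an illustration only, not the code in this PR: `min_p_filter` and `can_use_cuda_graph` are hypothetical names, and the gate simply assumes that a request needs min p sampling whenever its `min_p` value is greater than zero.

```python
from typing import List, Sequence


def min_p_filter(probs: Sequence[float], min_p: float) -> List[float]:
    """Drop tokens whose probability is below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # always > 0 for min_p <= 1: the most likely token is kept
    return [p / total for p in kept]


def can_use_cuda_graph(min_ps: Sequence[float]) -> bool:
    """Hypothetical gate: skip CUDA graph if any request in the batch needs min p."""
    return all(m <= 0.0 for m in min_ps)


if __name__ == "__main__":
    # The two least likely tokens fall below 0.5 * 0.5 = 0.25 and are removed.
    print(min_p_filter([0.5, 0.25, 0.125, 0.125], min_p=0.5))
    # One request in the batch asks for min p, so the batch would run without CUDA graph.
    print(can_use_cuda_graph([0.0, 0.1, 0.0]))
```

In this sketch a single request with a non-zero `min_p` switches the whole batch away from CUDA graph, which matches the conservative behavior described in the first bullet.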

Performance

  • Before: 155.51813300924485 tokens/s
  • This PR: 156.89880326398756 tokens/s

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

hnyls2002 enabled auto-merge (squash) September 2, 2024 23:07
hnyls2002 merged commit a5a134f into main Sep 2, 2024
hnyls2002 deleted the fix-sampler branch September 2, 2024 23:18
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025

Development

Successfully merging this pull request may close these issues.

[Bug] A100 PCIE torch compile error

2 participants