Optimize conflicts between CUDA graph and vocab mask tensors#1392

Merged
merrymercy merged 11 commits into main from vocab-mask-cuda-graph
Sep 14, 2024
@hnyls2002
Collaborator

@hnyls2002 hnyls2002 commented Sep 11, 2024

Motivation

This PR moves the sampler out of the CUDA graph and removes the synchronization from CUDA graph replay.

  • Vocab mask and penalty calculations are done between the model forward pass and sampling.
  • Sampling becomes part of model_runner.
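
The ordering described above can be sketched in plain Python, without any framework. This is only an illustration and not the code from this PR: `forward`, `apply_penalties`, `apply_vocab_mask`, `sample`, and `forward_and_sample` are hypothetical names, the logits are hard-coded, and lists stand in for tensors. The only point is that the forward pass (the part that would be captured in a CUDA graph) ends at the logits, and masking, penalties, and sampling run afterwards.

```python
import math
import random


def forward(token_ids):
    """Hypothetical stand-in for the model forward pass, i.e. the part that
    would be captured and replayed as a CUDA graph. Returns fixed logits
    over a 4-token vocabulary."""
    return [2.0, 1.0, 0.5, -1.0]


def apply_penalties(logits, penalties):
    """Subtract a per-token penalty from each logit."""
    return [x - p for x, p in zip(logits, penalties)]


def apply_vocab_mask(logits, allowed):
    """Set the logits of disallowed tokens to -inf so they get zero weight."""
    return [x if ok else -math.inf for x, ok in zip(logits, allowed)]


def sample(logits, rng):
    """Draw one token id from softmax(logits).
    Assumes at least one logit is finite."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]  # exp(-inf) == 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def forward_and_sample(token_ids, allowed, penalties, rng):
    """Forward first, then penalties and vocab mask, then sampling.
    Everything after forward() runs outside the (hypothetical) graph."""
    logits = forward(token_ids)
    logits = apply_penalties(logits, penalties)
    logits = apply_vocab_mask(logits, allowed)
    return sample(logits, rng)
```

For example, with `allowed = [False, False, True, False]` only token 2 keeps a non-zero weight, so `forward_and_sample` can only return 2. Keeping the mask and penalties outside the captured region means they can change per request without touching what the graph replays.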

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@hnyls2002 hnyls2002 force-pushed the vocab-mask-cuda-graph branch from 0d2f2c5 to 1ef88ff Compare September 11, 2024 21:12
@hnyls2002 hnyls2002 marked this pull request as draft September 12, 2024 19:58
@hnyls2002 hnyls2002 force-pushed the vocab-mask-cuda-graph branch from 09c900d to 92b9594 Compare September 13, 2024 00:15
@hnyls2002 hnyls2002 force-pushed the vocab-mask-cuda-graph branch 3 times, most recently from 6e20c93 to 4f0d0cb Compare September 13, 2024 22:32
@hnyls2002 hnyls2002 force-pushed the vocab-mask-cuda-graph branch from 4f0d0cb to e893a95 Compare September 13, 2024 22:36
@hnyls2002 hnyls2002 force-pushed the vocab-mask-cuda-graph branch 2 times, most recently from f46a050 to 3273d3f Compare September 14, 2024 00:22
@hnyls2002 hnyls2002 marked this pull request as ready for review September 14, 2024 00:24