Transpose mla weight offline #1261

Merged
zhyncs merged 2 commits into sgl-project:main from ispobock:mla_trans_weight
Aug 30, 2024
@ispobock
Collaborator

Motivation

Transposing the MLA weights offline, rather than at runtime, slightly improves performance and reduces runtime CUDA memory usage.

Modifications

Preprocess (transpose) the MLA weights once, right after weight loading.
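
The general pattern can be illustrated with a minimal sketch. This is not the code from this PR: `ToyProjection`, `postprocess_weights`, and the toy shapes are hypothetical, and plain Python lists stand in for GPU tensors so the example has no dependencies. It also assumes the original layout can be dropped once the transposed copy exists, which the description above does not state.

```python
# Hypothetical sketch: names and shapes are illustrative, not taken from the PR.

def transpose(matrix):
    """Return the transpose of a row-major list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


class ToyProjection:
    """Toy layer whose forward pass needs the transposed weight."""

    def __init__(self):
        self.weight = None    # layout as stored in the checkpoint
        self.weight_t = None  # transposed layout, built once after loading

    def load_weights(self, weight):
        self.weight = weight
        self.postprocess_weights()

    def postprocess_weights(self):
        # One-time preprocessing: transpose now, then drop the original
        # (an assumption of this sketch) so only one copy stays resident.
        self.weight_t = transpose(self.weight)
        self.weight = None

    def forward(self, x):
        # No per-call transpose: the preprocessed weight is used directly.
        return matmul(x, self.weight_t)


layer = ToyProjection()
layer.load_weights([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
out = layer.forward([[1, 0, 1]])            # (1, 3) @ (3, 2)
print(out)
```

Doing the transpose inside `load_weights` moves the cost out of every forward call. With real tensors, it would also avoid allocating a transposed temporary at runtime.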

@zhyncs zhyncs merged commit f414352 into sgl-project:main Aug 30, 2024
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
Co-authored-by: Yineng Zhang <me@zhyncs.com>