
[Feat] Add window attention for gemma-2#1056

Merged
Ying1123 merged 4 commits into main from ying-window
Aug 14, 2024

Ying1123 (Contributor) commented Aug 12, 2024

DO NOT turn on auto-merge.
I'll merge it manually.

Comment thread test/srt/models/test_generation_models.py Outdated
Comment thread python/sglang/srt/server_args.py Outdated
Ying1123 changed the title from "feat: Add window attention for gemma-2" to "[Feat] Add window attention for gemma-2" on Aug 12, 2024
zhyncs (Collaborator) commented Aug 12, 2024

@Ying1123 We can temporarily change DEFAULT_MODEL_NAME_FOR_TEST to gemma to trigger the CI test, and then change it back to Llama after verification.

Ying1123 force-pushed the ying-window branch 3 times, most recently from 7ac08ad to 0ad6781, on August 12, 2024 12:38
zhyncs added the enhancement label Aug 12, 2024

NihalPotdar left a comment


I tried out the engine with the gemma-2-2b, 9b, and 27b models, and it looks good. Left some quick comments. Overall lgtm!

Comment thread python/sglang/srt/layers/radix_attention.py Outdated
Comment thread python/sglang/srt/model_executor/model_runner.py Outdated
Ying1123 force-pushed the ying-window branch 7 times, most recently from 06d3450 to f0f9941, on August 13, 2024 22:04
Ying1123 force-pushed the ying-window branch 2 times, most recently from 224b293 to 749a8ff, on August 13, 2024 22:41
Comment thread python/sglang/srt/server_args.py Outdated
Ying1123 merged commit 0909bb0 into main Aug 14, 2024
Ying1123 deleted the ying-window branch August 14, 2024 00:01
Comment thread python/sglang/srt/models/gemma2.py
self.disable_radix_cache = True
self.disable_regex_jump_forward = True
self.disable_flashinfer = False
self.disable_cuda_graph = True
A contributor commented:

cuda graph should be turned on

Comment thread python/sglang/test/long_prompt
Comment thread python/sglang/srt/model_executor/forward_batch_info.py
Ying1123 (Contributor, Author) commented:

Additional comments addressed in #1090


Labels

enhancement New feature or request


4 participants