[Performance][DSR1]: Fused RoPE+KVCache+q_concat for MLA #40392
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Code Review
This pull request introduces the MLARoPEKVCacheCatFusionPass to optimize MLA RoPE KV cache updates by fusing concatenation and caching operations. The implementation includes updates to the CUDA kernels for flexible data type support, new pattern matchers for DeepSeek scaling and standard RoPE, and integration into the compilation pipeline. Feedback identifies a typo in a test import path and a configuration error where the fusion was enabled for the O0 optimization level, which should remain unoptimized.
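To illustrate the two review points above (replacing a matched op sequence with a single fused op, and leaving O0 unoptimized), here is a toy sketch. It is not vLLM's implementation: vLLM's pattern-matcher passes operate on `torch.fx` graphs, while this sketch treats the graph as a flat list of op names. The op names, `PATTERN`, `FUSED_OP`, and `apply_fusion` are all hypothetical, and the sketch assumes integer optimization levels where 0 means "no optimization".

```python
from typing import List, Tuple

# Hypothetical op names; the real pass matches torch.fx graph patterns.
PATTERN: Tuple[str, ...] = ("rope", "q_concat", "concat_and_cache_mla")
FUSED_OP = "fused_rope_kv_cache_q_concat"


def apply_fusion(ops: List[str], opt_level: int) -> List[str]:
    """Replace each contiguous occurrence of PATTERN with FUSED_OP.

    Assumption: opt_level 0 must stay unoptimized, so the pass is a no-op there.
    """
    if opt_level == 0:
        return list(ops)
    out: List[str] = []
    n = len(PATTERN)
    i = 0
    while i < len(ops):
        if tuple(ops[i:i + n]) == PATTERN:
            out.append(FUSED_OP)  # one fused op stands in for the whole match
            i += n
        else:
            out.append(ops[i])  # unmatched ops pass through unchanged
            i += 1
    return out
```

For example, `apply_fusion(["q_proj", "rope", "q_concat", "concat_and_cache_mla", "attn"], 2)` collapses the middle three ops into `FUSED_OP`, while the same call with `opt_level=0` returns the list unchanged.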
class MLARoPEKVCacheCatFusionPass(VllmFusionPatternMatcherPass):
    def __init__(self, config: VllmConfig) -> None:
        super().__init__(config, "mla_rope_kv_cache_fusion_pass")
nit: can you make this name consistent with other passes?
Do you mean camel case or something else? I was trying to keep it consistent with MLAAttentionQuantFusionPass and RopeKVCacheFusionPass.
Yes, that is what I mean. Would it make sense to make MLAAttentionQuantFusionPass camel case as well, so that all pass names are consistent?
It's already camel case: MLAAttnQuantFusionPass.
@Rohan138 can you try to address this issue?
…t#40392) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>
…t#40392) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Purpose
Relands an updated version of #35245, #35879, and #38646 to fuse the MLA RoPE and KV-cache ops, with some pattern-matching fixes and minimization on top of #35879.
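The following pure-Python sketch shows the idea behind the fusion for a single head and a single token. The unfused path rotates `q_pe` and `k_pe`, writes `[kv_c | k_pe]` into a cache slot, and concatenates `[q_nope | q_pe]` as separate steps. The fused path does all of this in one pass: it computes each cos/sin pair once, shares it between `q_pe` and `k_pe`, and writes results directly into their final buffers. All names here are illustrative and hypothetical. For simplicity the sketch assumes NeoX-style (rotate-halves) RoPE and a dict-backed cache; the actual CUDA kernel, rotation layout, and cache format may differ.

```python
import math
from typing import Dict, List

Vec = List[float]


def rope_neox(x: Vec, pos: int, base: float = 10000.0) -> Vec:
    """NeoX-style (rotate-halves) RoPE on one head vector (assumed layout)."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out


def unfused(q_nope: Vec, q_pe: Vec, kv_c: Vec, k_pe: Vec,
            pos: int, slot: int, kv_cache: Dict[int, Vec]) -> Vec:
    # Separate steps; on a GPU each would typically be its own kernel launch.
    q_pe_rot = rope_neox(q_pe, pos)
    k_pe_rot = rope_neox(k_pe, pos)
    kv_cache[slot] = kv_c + k_pe_rot  # cache write of [kv_c | k_pe]
    return q_nope + q_pe_rot          # q_concat: [q_nope | q_pe]


def fused(q_nope: Vec, q_pe: Vec, kv_c: Vec, k_pe: Vec, pos: int, slot: int,
          kv_cache: Dict[int, Vec], base: float = 10000.0) -> Vec:
    d_n, d_r, d_c = len(q_nope), len(q_pe), len(kv_c)
    q = [0.0] * (d_n + d_r)
    row = [0.0] * (d_c + d_r)
    q[:d_n] = q_nope    # non-rotary parts copied straight into their final place
    row[:d_c] = kv_c
    half = d_r // 2
    for i in range(half):
        theta = pos * base ** (-2 * i / d_r)
        c, s = math.cos(theta), math.sin(theta)  # computed once, shared by q and k
        for src, dst, off in ((q_pe, q, d_n), (k_pe, row, d_c)):
            x1, x2 = src[i], src[i + half]
            dst[off + i] = x1 * c - x2 * s
            dst[off + i + half] = x2 * c + x1 * s
    kv_cache[slot] = row
    return q
```

Both functions should produce the same `q` and the same cache row. The fused version avoids materializing the intermediate rotated tensors, which is where the savings would come from in a real kernel.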
Test Plan
Test Result
Main:
Fused: