[ROCm][WIP]: Fused aiter rope kvcache mla#35245
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a fusion for RoPE, KV cache update, and concatenation operations for MLA on ROCm with AITER, which is a significant performance optimization. The changes include adding a new custom op for rotary embedding, refactoring the MLA attention layer to separate KV cache updates, and updating tests and configurations. The implementation looks mostly correct, but I've found a potential logic bug in the configuration and a minor inconsistency in function signatures that should be addressed.
| rocm_aiter_ops.is_rmsnorm_enabled() | ||
| and not rocm_aiter_ops.is_triton_gemm_enabled() |
There was a problem hiding this comment.
There seems to be a logic inversion here. The docstring on line 143 states that this fusion should be enabled when using 'AITER Triton GEMMs', but the code on line 149 checks for not rocm_aiter_ops.is_triton_gemm_enabled(). This appears to be a bug that would incorrectly disable the fusion when Triton GEMMs are in use. Should this be rocm_aiter_ops.is_triton_gemm_enabled() to match the docstring and the previous logic?
| rocm_aiter_ops.is_rmsnorm_enabled() | |
| and not rocm_aiter_ops.is_triton_gemm_enabled() | |
| rocm_aiter_ops.is_rmsnorm_enabled() | |
| and rocm_aiter_ops.is_triton_gemm_enabled() |
|
This pull request has merge conflicts that must be resolved before it can be |
|
This pull request has merge conflicts that must be resolved before it can be |
8b5ed5d to
4d6956f
Compare
4d6956f to
f7cad67
Compare
Fuse RoPE+cache+cat ops for MLA e.g. DeepSeek, Kimi on ROCm + AITER similar to #33443.
Requires:
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.