[Instruction][tcgen05] Add copy and commit instruction #50
Signed-off-by: Yaoyao Ding <dingyaoyao.cs@gmail.com>
Pull Request Overview
This PR adds the tcgen05 copy instruction to the Tilus framework, enabling copy operations between shared memory and tensor memory (TMEM) on NVIDIA GPUs. The implementation includes comprehensive layout handling, instruction emitters, and test coverage for various swizzle modes.
- Adds tcgen05 copy instruction with comprehensive TMEM support
- Implements shared memory layout canonicalization for tcgen05 operations
- Reorganizes layout operations into a structured module hierarchy
Reviewed Changes
Copilot reviewed 53 out of 58 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/ir/layout/test_tcgen05_smem.py | Test cases for tcgen05 shared memory layouts with various swizzle modes |
| tests/instructions/test_tcgen05_copy.py | Integration tests for tcgen05 copy instruction functionality |
| python/tilus/ir/layout/cuda/tcgen05_smem.py | Core implementation of tcgen05 shared memory layout canonicalization |
| python/tilus/backends/emitters/cuda/tcgen05/copy.py | Code emitter for tcgen05 copy instructions |
| python/tilus/ir/instructions/cuda/tmem.py | Instruction definitions for tcgen05 copy and commit operations |
| python/tilus/extensions/hidet/ir/primitives/cuda/tcgen05.py | Low-level tcgen05 primitive functions and encoding |
Pull Request Overview
Copilot reviewed 53 out of 58 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
python/tilus/ir/layout/cuda/tcgen05_smem.py:1
- The calculation of `rhs_indices` is incorrect. It should use `axes[i] % rhs.shape[i]`, not `axes[i] // rhs.shape[i]`. The current implementation duplicates the `lhs_indices` calculation.
from __future__ import annotations
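The index split that this suppressed comment describes can be sketched as follows. This is only an illustration, not the code in `tcgen05_smem.py`: the helper name `split_indices` and the assumption that `lhs` is the outer factor and `rhs` the inner factor of a composed layout are hypothetical, and the comment itself was flagged as low confidence.

```python
from __future__ import annotations

from typing import Sequence


def split_indices(
    axes: Sequence[int], rhs_shape: Sequence[int]
) -> tuple[list[int], list[int]]:
    """Split each composed index into an outer (lhs) and inner (rhs) part.

    Hypothetical helper: assumes axes[i] == lhs_index * rhs_shape[i] + rhs_index.
    """
    # Outer index: which rhs-sized block the coordinate falls into.
    lhs_indices = [a // s for a, s in zip(axes, rhs_shape)]
    # Inner index: the position inside that block (modulo, not floor division).
    rhs_indices = [a % s for a, s in zip(axes, rhs_shape)]
    return lhs_indices, rhs_indices


lhs_idx, rhs_idx = split_indices([5, 7], [4, 2])
print(lhs_idx, rhs_idx)  # [1, 3] [1, 1]
```

Under that assumption, using `//` for both lists would yield the same values twice and lose the within-block offset, which is the duplication the comment points at.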
cdd17f5 to 829a7a0
This PR adds the following tcgen05 instructions: