
feat: support kFactor 8 used in mma tensor op tile iterator#1512

Merged
hwu36 merged 1 commit into NVIDIA:main from gavinchen430:feat_kfactor_8
Oct 29, 2024

Conversation

@gavinchen430
Contributor

Support Layout::kFactor == 8. The pattern for loading data from shared memory is:

0   16  32  48  64  80  96  112
T0  T1  T2  T3  T4  T5  T6  T7
T9  T8  T11 T10 T13 T12 T15 T14
T18 T19 T16 T17 T22 T23 T20 T21
T27 T26 T25 T24 T31 T30 T29 T28

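For reference, the mapping in the table follows a simple XOR swizzle: thread t sits in row t / 8 and reads column (t % 8) ^ (t / 8), at offset 16 * column (in the same units as the table). Below is a minimal standalone sketch, not the CUTLASS iterator code itself, that reproduces the table; it only illustrates the lane-to-offset arithmetic.

```cpp
// Standalone illustration of the lane-to-offset pattern shown in the table above.
// This is NOT the CUTLASS tile-iterator implementation; it only reproduces the
// mapping for the kFactor == 8 case described in this PR.
#include <cstdio>

int main() {
  for (int lane = 0; lane < 32; ++lane) {
    int row = lane / 8;          // which group of 8 lanes
    int col = (lane % 8) ^ row;  // XOR swizzle (avoids shared-memory bank conflicts)
    int offset = col * 16;       // offsets 0, 16, ..., 112 as in the table
    std::printf("T%-2d -> offset %3d\n", lane, offset);
  }
  return 0;
}
```
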
@gavinchen430
Contributor Author

@hwu36 Could you help review my pull request, please?

@github-actions

github-actions bot commented Jun 6, 2024

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

github-actions bot commented Sep 4, 2024

This PR has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates.

hwu36 merged commit 19f5159 into NVIDIA:main Oct 29, 2024
sijialouintel added a commit to sijialouintel/cutlass that referenced this pull request Feb 12, 2025
* Handle MNK Sm90{Row, Col}Reduction problem shapes (NVIDIA#1803)

* add is_last_tile

* Improve sm90 mixed dtype kernel (NVIDIA#1883)

* Add GMMA shape m64n40k16 (NVIDIA#1864)

* Add all supported GMMA shapes (NVIDIA#1890)

* add maximum support (NVIDIA#1833)

* fix typo (NVIDIA#1853)

* fix by adding public (NVIDIA#1753)

* added mapping for bf16 to torch::kBFloat16 (NVIDIA#1843)

Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>

* Fix README (NVIDIA#1658)

* Fix README

* Improve README

---------

Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>

* Adjusting code indentation (NVIDIA#1639)

* Include of regular_tile_iterator.h fixed for NVRTC (NVIDIA#1765)

* Include of regular_tile_iterator.h fixed for NVRTC

* More include fixed for NVRTC

* Update gemm_f16n_f16t_f32t_tensor_op_f32_sm80.cu with include "cutlass/gemm/device/gemm_universal.h" (NVIDIA#1569)

fix compile with `cmake .. -DCUTLASS_ENABLE_TESTS=ON -DCUTLASS_TEST_LEVEL=2`

* remove redundant hardcoded packing configs in mixed dtype gemm (NVIDIA#1894)

Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>

* fix wrong A/BLayout in MMA_Traits for binary mma and append other MMA_Traits support (NVIDIA#1856)

* fix wrong A/BLayout in MMA_Traits<SM80_16x8x256_S32U1U1S32_TN_XORPOPC> and append support for m8n8k128, m16n8k128 mma.and.popc in MMA_Traits instantiation

* add "print" template for subbyte_reference<T>

* Add a print for the uint{x}b_t type. (NVIDIA#1871)

* Refactor some GroupedGEMM logic (NVIDIA#1899)

* feat: support kFactor 8 used in mma tensor op tile iterator (NVIDIA#1512)

* Update publications (NVIDIA#1912)

* remove restriction of stride == kernel in nhwc_pooling (NVIDIA#1896)

* fix undefined in device code error (NVIDIA#1880)

* Fix the racing condition of mixed-input gemm when writing the registers (NVIDIA#1931)

* move two warpgroup_wait

* merge main

---------

Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>

* Fix `cutlass` python library with cuda `12.6.2.post1` (NVIDIA#1942)

* Fix `cutlass` python library with cuda `12.6.2.post1`

Previously we had this error:
```
  File "/storage/home/cutlass/python/cutlass/backend/operation.py", line 39, in <listcomp>
    _version_splits = [int(x) for x in __version__.split("rc")[0].split(".")]
                       ^^^^^^
ValueError: invalid literal for int() with base 10: 'post1'
```

* Update sm90_utils.py

* Update generator.py

* Update python/cutlass_library/generator.py

Co-authored-by: Jack Kosaian <jackkosaian@gmail.com>

* Update python/cutlass_library/sm90_utils.py

Co-authored-by: Jack Kosaian <jackkosaian@gmail.com>

---------

Co-authored-by: Jack Kosaian <jackkosaian@gmail.com>

* add {uint4, uint2, int2} => {fp16, bf16} conversion (NVIDIA#1966)

* Improve mixed dtype GEMM (NVIDIA#1972)

* update

* fix a typo

* fix a typo that fails the compiling when ElementScale is not the same as MmaType (NVIDIA#1977)

* Fix CuTe README Typo (NVIDIA#1951)

* Fix Typo (NVIDIA#1962)

* 3.6.0 update (NVIDIA#2005)

* 3.6.0 update

* doc and swap stuff

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>

* Update CHANGELOG.md

* Update 0x_gemm_tutorial.md (NVIDIA#1982)

Shouldn't this be BLK_M, BLK_**K**, k

* fix bug: arch/mma_sm60.h Mma<2,2,1> calculate wrong (NVIDIA#1989)

* fix mem fence (NVIDIA#2030)

Co-authored-by: yuzhai <yuzhai@nvidia.com>

* Add half->int8 saturate conversion to promise valid range (NVIDIA#1983)

* Add half->int8 saturate conversion to promise valid range

* add gpu only macro

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>

* Add vector-types back to platform.h (NVIDIA#2026)

* Fix typo in library_defaults.py (NVIDIA#2024)

* Fix Typos (NVIDIA#2021)

* Fix Typo

* Fix Typo

* Add Line Break (NVIDIA#2020)

* Blockwise Scaling for FP8 (NVIDIA#1932)

* F8 Blockwise Scaling

* two more NumProducerThreadEvents

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>

* fix assertion in integer_subbytes.h (NVIDIA#1961)

* CUTLASS 3.7 (NVIDIA#2045)

* CUTLASS 3.7

* clean up changelog

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>

* update 3.7 docs (NVIDIA#2051)

* update docs

* update docs

* update docs

* update docs

* update docs

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>

* CUTLASS 3.8 Release (NVIDIA#2059)

* CUTLASS 3.8 Release

* update

* Update README.md

* Revert "Update README.md"

This reverts commit b353e36.

* update

* update

---------

Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>

* fix cuda 12.6 issues (NVIDIA#2066)

* fix a readme broken link (NVIDIA#2069)

* Update README.md

* Groupwise scaling along M for FP8 gemm (NVIDIA#2037)

* FP8 groupwise scaling along M

* small updates

---------

Co-authored-by: zl <zl@deepseek.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>

* bugfix generic-k code in top-k with softmax (NVIDIA#1993)

* bugfix generic-k code in top-k with softmax

* Update include/cutlass/epilogue/fusion/sm90_visitor_topk_softmax.hpp

Co-authored-by: Ali Hassani <68103095+alihassanijr@users.noreply.github.com>

* Update examples/61_hopper_gemm_with_topk_and_softmax/61_hopper_gemm_with_topk_and_softmax.cu

Co-authored-by: Ali Hassani <68103095+alihassanijr@users.noreply.github.com>

---------

Co-authored-by: Ali Hassani <68103095+alihassanijr@users.noreply.github.com>

* [EVT] Add support for Row/Col broadcast PtrArray (NVIDIA#2033)

* Add group support to EVT row/col broadcast.

* small modifications

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>

* v3.8.0 update (NVIDIA#2082)

* 3.8 update

* fix Markus' name

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>

* [WA] Fix compiling errors

---------

Co-authored-by: Saagar Jha <saagar@saagarjha.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
Co-authored-by: Sergey Klevtsov <141879860+sklevtsov-nvidia@users.noreply.github.com>
Co-authored-by: Tri Dao <tridao@users.noreply.github.com>
Co-authored-by: Xinyu Yang <ltyxy@buaa.edu.cn>
Co-authored-by: sijialou <sijia.lou@intel.com>
Co-authored-by: Bogumil Sapinski Mobica <48835513+Bogumil-Sapinski-Mobica@users.noreply.github.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
Co-authored-by: Lei Mao <dukeleimao@gmail.com>
Co-authored-by: 103yiran <1039105206@qq.com>
Co-authored-by: MaxAkaAltmer <MaxAkaAltmer@yandex.ru>
Co-authored-by: 侯奇 <houqi1993@gmail.com>
Co-authored-by: Lain <28486541+IwakuraRein@users.noreply.github.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Caleb_Du <59528230+CalebDu@users.noreply.github.com>
Co-authored-by: LiYu Lu <luliyucoordinate@outlook.com>
Co-authored-by: azhurkevich <101208641+azhurkevich@users.noreply.github.com>
Co-authored-by: chenwei <15601910741@163.com>
Co-authored-by: Wenlei Bao <142055114+wenlei-bao@users.noreply.github.com>
Co-authored-by: LiuQiang <thorneliu@gmail.com>
Co-authored-by: dan_the_3rd <43445237+danthe3rd@users.noreply.github.com>
Co-authored-by: Jack Kosaian <jackkosaian@gmail.com>
Co-authored-by: Yujia Zhai <yzhai015@ucr.edu>
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Andrew O'Neill <foolusion@gmail.com>
Co-authored-by: Dongxu.Wang <wangdongxuking61@gmail.com>
Co-authored-by: ZZK <359521840@qq.com>
Co-authored-by: Driss Guessous <32754868+drisspg@users.noreply.github.com>
Co-authored-by: ZincCat <52513999+zinccat@users.noreply.github.com>
Co-authored-by: Manish Gupta <mgupta.iitr@gmail.com>
Co-authored-by: bobliao <codechaser@163.com>
Co-authored-by: mihir-awatramani <162148077+mihir-awatramani@users.noreply.github.com>
Co-authored-by: Liang <44948473+soundOfDestiny@users.noreply.github.com>
Co-authored-by: zl <zl@deepseek.com>
Co-authored-by: Tadej Ciglarič <tadej.c@gmail.com>
Co-authored-by: Ali Hassani <68103095+alihassanijr@users.noreply.github.com>
Co-authored-by: Josh Fromm <jwfromm@meta.com>
hgl71964 pushed a commit to hgl71964/cutlass that referenced this pull request Feb 21, 2025
andralex pushed a commit to andralex/cutlass that referenced this pull request Jun 14, 2025
Albresky pushed a commit to Albresky/cutlass that referenced this pull request Oct 11, 2025