Pytorch Binding for GroupedTensor APIs by vthumbe1503 · Pull Request #13 · ksivaman/TransformerEngine-1

vthumbe1503 · 2026-01-23T17:54:00Z

Description

Please include a brief summary of the changes, relevant motivation and context.

Fixes # (issue)

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Change A
Change B

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes


zhongbozhu · 2026-02-02T18:23:18Z

 /*! \enum Float8BlockScaleTensorFormat
 *  \brief Data format for an FP8 block-scaled tensor
 */
 enum class Float8BlockScaleTensorFormat {


We shouldn't need this enum class at all for grouped tensors.

zhongbozhu · 2026-02-02T18:27:10Z

  return output_py;
 }

+py::object quantize_grouped(const py::handle &input, py::handle& output) {


Ideally, this function should look like this:

py::object contiguous_grouped_quantize(const at::Tensor &tensor_input, const at::Tensor &m_splits_tensor, py::handle single_quantizer)
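
One possible reading of the suggested signature, as a minimal plain-Python sketch: a single contiguous input is divided along its first dimension according to `m_splits`, and the same quantizer is applied to every group. This is not the Transformer Engine implementation. The list-of-rows representation, the returned list of per-group results, and `toy_int8_quantizer` are all hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Quantized = Tuple[List[List[int]], float]  # (quantized rows, scale)


def toy_int8_quantizer(rows: Sequence[Row]) -> Quantized:
    """Hypothetical stand-in for a quantizer: symmetric amax scaling to int8."""
    amax = max((abs(x) for row in rows for x in row), default=0.0)
    scale = amax / 127.0 if amax > 0.0 else 1.0
    q_rows = [[int(round(x / scale)) for x in row] for row in rows]
    return q_rows, scale


def contiguous_grouped_quantize(
    tensor_input: Sequence[Row],
    m_splits: Sequence[int],
    single_quantizer: Callable[[Sequence[Row]], Quantized],
) -> List[Quantized]:
    """Apply one quantizer to each row group of a contiguous input.

    Assumption: m_splits holds the number of rows in each group, and the
    groups are laid out back to back in tensor_input.
    """
    if sum(m_splits) != len(tensor_input):
        raise ValueError("m_splits must sum to the number of input rows")
    outputs = []
    start = 0
    for m in m_splits:
        # Slice the group out of the contiguous input; no per-group
        # tensors are passed in by the caller.
        outputs.append(single_quantizer(tensor_input[start:start + m]))
        start += m
    return outputs
```

The point of the shape of this interface is that the caller passes one input, one splits vector, and one quantizer, rather than a pre-built output object or a list of per-group quantizers.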

zhongbozhu · 2026-02-02T18:29:19Z

                    columnwise_scale_inv=columnwise_scale_inv,
                    fp8_dtype=self.quantizers[i].dtype,
                    quantizer=self.quantizers[i],
+                    with_gemm_swizzled_scales=self.quantizers[i].optimize_for_gemm,


For grouped tensors, it's okay to make this always on, because there is no TP (tensor parallelism) for GroupedLinear.


* Python GroupedTensor and contiguous weights for GroupedLinear
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Graph safe C API for grouped RHT, needs testing
  Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
  Co-authored-by: Zhongbo Zhu <zhongboz@nvidia.com>
* C++ utils, untested
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Pytorch Binding for GroupedTensor APIs (#13)
* changes for pytoch extension; but everything seems to be broken probably unrelated to my changes
  Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
* fix the issues
  Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
* comment nvte API since Oleg's PR is not merged
  Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
* test for all cases:
  Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
* tensor attributes should be set later
  Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
---------
  Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fix make grouped tensor api
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fixes to tests
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* PyTorch-Python GroupedTensor
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix test
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* All tests pass
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update transformer_engine/pytorch/tensor/storage/grouped_tensor.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Remove mxfp8 gq test
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* C++ PyTorch GroupedTensor changes WIP
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Compiles
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fix runtime failure for test
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix IMA in mxfp8 GQ
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Add CG test for grouped_quantize
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix recipe tests and FP8 weights
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix recipe tests and FP8 weights
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix device test
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Disable grouped weights for unsupported recipes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Integrate NVFP4 Graph Safe Group Quantize (#14)
* nvfp4 grouped quantize
  Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* fix for paged stashing
  Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* pass all edge cases
  Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* clean up
  Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* fix for other recipes
  Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
---------
  Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* improve mxfp8 unit test
  Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* pre-swizzle nvfp4 mxfp8 for MoE
  Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* avoid having nvte_get_grouped_tensor_param_v2
  Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* more tests
  Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* fix group quantize mxfp8 kernel
  Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* Relaxed restriction for the last dim to be a multiple of 128
  Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
Signed-off-by: vthumbe1503 <vthumbe@nvidia.com>
Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Co-authored-by: Zhongbo Zhu <zhongboz@nvidia.com>
Co-authored-by: vthumbe1503 <vthumbe@nvidia.com>
Co-authored-by: Oleg Goncharov <ogoncharov@nvidia.com>

vthumbe1503 added 2 commits January 23, 2026 17:51

changes for pytoch extension; but everything seems to be broken proba…

1809820

…bly unrelated to my changes Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

fix the issues

b8cd1b3

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

vthumbe1503 changed the title from "Vthumbe grouped tensor" to "Pytorch Binding for GroupedTensor APIs" on Jan 23, 2026

vthumbe1503 added 2 commits January 23, 2026 19:00

comment nvte API since Oleg's PR is not merged

b7a355d

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

test for all cases:

de31ee2

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

zhongbozhu reviewed Feb 2, 2026

tensor attributes should be set later

4d25743

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

ksivaman merged commit 3ba639e into ksivaman:grouped_tensor_python Feb 4, 2026
1 check failed

ksivaman merged 5 commits into ksivaman:grouped_tensor_python from vthumbe1503:vthumbe_grouped_tensor
