Pytorch Binding for GroupedTensor APIs #13

Merged
ksivaman merged 5 commits into ksivaman:grouped_tensor_python from vthumbe1503:vthumbe_grouped_tensor
Feb 4, 2026
@vthumbe1503

Collaborator

Description

Please include a brief summary of the changes, relevant motivation and context.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

…bly unrelated to my changes

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
@vthumbe1503 vthumbe1503 changed the title Vthumbe grouped tensor Pytorch Binding for GroupedTensor APIs Jan 23, 2026
Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
/*! \enum Float8BlockScaleTensorFormat
* \brief Data format for an FP8 block-scaled tensor
*/
enum class Float8BlockScaleTensorFormat {

Collaborator

we shouldn't need this class at all for grouped tensor

return output_py;
}

py::object quantize_grouped(const py::handle &input, py::handle& output) {

Collaborator

ideally, this function should look like this:

py::object contiguous_grouped_quantize(const at::Tensor &tensor_input,
                                       const at::Tensor &m_splits_tensor,
                                       py::handle single_quantizer)
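
To make the suggested signature concrete, here is a rough pure-Python sketch of the idea as I read it: one contiguous input, a list of per-group row counts (`m_splits`), and a single quantizer applied to every group. Everything in it is hypothetical and for illustration only: `split_offsets`, `contiguous_grouped_quantize_sketch`, and the toy `absmax_int8_quantizer` are not Transformer Engine APIs, and plain Python lists stand in for `at::Tensor`.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple

Rows = List[List[float]]


def split_offsets(m_splits: Sequence[int]) -> List[Tuple[int, int]]:
    """Return the (start, end) row range of each group in a contiguous buffer."""
    ends = list(accumulate(m_splits))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def contiguous_grouped_quantize_sketch(
    rows: Rows,
    m_splits: Sequence[int],
    single_quantizer: Callable[[Rows], object],
) -> List[object]:
    """Apply one shared quantizer to each group of a contiguous input.

    Hypothetical stand-in for the proposed C++ function; it only shows how
    m_splits partitions the input, not how the real kernel works.
    """
    if sum(m_splits) != len(rows):
        raise ValueError("m_splits must sum to the number of input rows")
    return [single_quantizer(rows[s:e]) for s, e in split_offsets(m_splits)]


def absmax_int8_quantizer(group: Rows) -> Tuple[List[List[int]], float]:
    """Toy per-group absmax quantizer (assumption, not an FP8 recipe)."""
    amax = max((abs(x) for row in group for x in row), default=0.0)
    scale = amax / 127.0 if amax > 0 else 1.0
    quantized = [[round(x / scale) for x in row] for row in group]
    return quantized, scale


rows = [[1.0, -2.0], [0.5, 0.0], [127.0, 0.0]]
groups = contiguous_grouped_quantize_sketch(rows, [2, 1], absmax_int8_quantizer)
```

The point of this shape is that the caller passes one buffer and one quantizer instead of a list of tensors and a list of quantizers; zero-sized groups fall out naturally as empty row ranges.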

columnwise_scale_inv=columnwise_scale_inv,
fp8_dtype=self.quantizers[i].dtype,
quantizer=self.quantizers[i],
with_gemm_swizzled_scales=self.quantizers[i].optimize_for_gemm,

Collaborator

for grouped tensor, it's okay to make it always on because there is no TP for group linear

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
@ksivaman ksivaman merged commit 3ba639e into ksivaman:grouped_tensor_python Feb 4, 2026
1 check failed
ksivaman added a commit that referenced this pull request Feb 27, 2026
* Python GroupedTensor and contiguous weights for GroupedLinear

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Graph safe C API for grouped RHT, needs testing

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Zhongbo Zhu <zhongboz@nvidia.com>

* C++ utils, untested

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Pytorch Binding for GroupedTensor APIs (#13)

* changes for pytoch extension; but everything seems to be broken probably unrelated to my changes

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

* fix the issues

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

* comment nvte API since Oleg's PR is not merged

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

* test for all cases:

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

* tensor attributes should be set later

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

---------

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix make grouped tensor api

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fixes to tests

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* PyTorch-Python GroupedTensor

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix test

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* All tests pass

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update transformer_engine/pytorch/tensor/storage/grouped_tensor.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Remove mxfp8 gq test

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* C++ PyTorch GroupedTensor changes WIP

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Compiles

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix runtime failure for test

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix IMA in mxfp8 GQ

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Add CG test for grouped_quantize

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix recipe tests and FP8 weights

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix recipe tests and FP8 weights

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix device test

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Disable grouped weights for unsupported recipes

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Integrate NVFP4 Graph Safe Group Quantize  (#14)

* nvfp4 grouped quantize

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* fix for paged stashing

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* pass all edge cases

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* clean up

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* fix for other recipes

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

---------

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* improve mxfp8 unit test

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* pre-swizzle nvfp4 mxfp8 for MoE

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* avoid having nvte_get_grouped_tensor_param_v2

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* more tests

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* fix group quantize mxfp8 kernel

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* Relaxed restriction for the last dim to be a multiple of 128

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
Signed-off-by: vthumbe1503 <vthumbe@nvidia.com>
Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Co-authored-by: Zhongbo Zhu <zhongboz@nvidia.com>
Co-authored-by: vthumbe1503 <vthumbe@nvidia.com>
Co-authored-by: Oleg Goncharov <ogoncharov@nvidia.com>
3 participants