support pinning for mx and nvfp4 tensors#4192

Merged
vkuzo merged 4 commits into main from
gh/vkuzo/238/head
Apr 2, 2026

Conversation

@vkuzo
Contributor

@vkuzo vkuzo commented Mar 27, 2026

Summary:

Adds memory pinning support for MXTensor and NVFP4Tensor; this is
important for supporting offloading in diffusers.

Fixes the torchao side of #4026

Still requires huggingface/diffusers#13276 to
land for things to work e2e

cc @sayakpaul

Test Plan:

e2e: makes https://gist.github.com/vkuzo/9ca863e559eb8af18f6e6afb079bf74f
work with mxfp8 and nvfp4; requires huggingface/diffusers#13276 (not yet
landed)

local: the new tests added in this PR pass
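As context for what "pinning support" means here, below is a minimal, hypothetical sketch of the pattern this PR follows (per the review, the same pattern as Float8Tensor, Int8Tensor, and NF4Tensor): a wrapper tensor subclass handles `aten.is_pinned` and `aten._pin_memory` by forwarding to its inner tensors. `MyQuantTensor`, `qdata`, and `scale` are illustrative names, not torchao API.

```python
import torch

class MyQuantTensor(torch.Tensor):
    """Illustrative wrapper subclass holding quantized data plus a scale."""

    @staticmethod
    def __new__(cls, qdata, scale):
        self = torch.Tensor._make_wrapper_subclass(
            cls, qdata.shape, dtype=torch.float32, device=qdata.device
        )
        self.qdata = qdata
        self.scale = scale
        return self

    @classmethod
    def __torch_dispatch__(cls, func, types, args, kwargs=None):
        kwargs = kwargs or {}
        if func is torch.ops.aten.is_pinned.default:
            t = args[0]
            # pinned only if every inner tensor is pinned
            return t.qdata.is_pinned() and t.scale.is_pinned()
        if func is torch.ops.aten._pin_memory.default:
            t = args[0]
            # pin_memory returns a new tensor; the optional device kwarg
            # is ignored, matching the existing subclasses in the repo
            return MyQuantTensor(t.qdata.pin_memory(), t.scale.pin_memory())
        raise NotImplementedError(f"{func} not handled in this sketch")

x = MyQuantTensor(torch.zeros(4, dtype=torch.uint8), torch.ones(1))
print(x.is_pinned())  # False: nothing has been pinned yet
```

Actually calling `x.pin_memory()` requires a CUDA context, since pinned (page-locked) host memory only exists for fast host-to-device transfers.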

[ghstack-poisoned]
@vkuzo
Contributor Author

vkuzo commented Mar 27, 2026

Stack from ghstack (oldest at bottom):

@pytorch-bot

pytorch-bot Bot commented Mar 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4192

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6e387dc with merge base c49a60e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo added a commit that referenced this pull request Mar 27, 2026
Summary:

Adds memory pinning support for `MXTensor` and `NVFP4Tensor`; this is
important for supporting offloading in `diffusers`.

Fixes the torchao side of #4026

Still requires huggingface/diffusers#13276 to
land for things to work e2e

Test Plan:

e2e: makes
https://gist.github.com/vkuzo/9ca863e559eb8af18f6e6afb079bf74f work with
mxfp8 and nvfp4,
requires huggingface/diffusers#13276 (not yet
landed)

local: the new tests added in this PR pass
ghstack-source-id: ff0b1a2
ghstack-comment-id: 4142562671
Pull-Request: #4192
meta-cla Bot added the CLA Signed label (managed by the Facebook bot; authors must sign the CLA before a PR can be reviewed) Mar 27, 2026
vkuzo added the module: inference (quantize_ api inference flow) label Mar 27, 2026
@vkuzo vkuzo requested review from andrewor14 and jerryzh168 March 27, 2026 13:28
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Mar 31, 2026
ghstack-source-id: 5ad4868
ghstack-comment-id: 4142562671
Pull-Request: #4192
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Mar 31, 2026
ghstack-source-id: 7165b48
ghstack-comment-id: 4142562671
Pull-Request: #4192
@andrewor14
Contributor

@claude review this please

@claude

claude Bot commented Mar 31, 2026

Claude finished @andrewor14's task in 2m 4s (View job)


Code Review: Support pinning for mx and nvfp4 tensors

  • Read changed files and PR diff
  • Review implementation in mx_tensor.py
  • Review implementation in nvfp4_tensor.py
  • Review tests
  • Post review feedback

Overall this is a clean, straightforward PR. The pattern follows existing conventions in the codebase (e.g., Float8Tensor, Int8Tensor, NF4Tensor all implement pinning the same way). Constructor argument ordering is correct for both tensor types. A few observations:

Implementation (mx_tensor.py:640-657)

The is_pinned and _pin_memory implementations for MXTensor are correct and minimal. The constructor call in mx_pin_memory correctly passes all 8 positional arguments in the right order matching MXTensor.__new__ at line 459.

Implementation (nvfp4_tensor.py:329-357)

The NVFP4Tensor implementation correctly handles the optional per_tensor_scale and act_per_tensor_scale fields in both is_pinned (checking them if non-None) and _pin_memory (conditionally pinning them). Constructor argument order matches NVFP4Tensor.__new__ at line 79.

Minor observations (non-blocking):

  1. aten._pin_memory.default has a device kwarg — The ATen schema for _pin_memory is _pin_memory(Tensor self, Device? device=None) -> Tensor. The device arg is used for pinning to a specific CUDA device. This PR ignores it (doesn't forward kwargs to the inner .pin_memory() calls), but this is consistent with how every other tensor subclass in the repo handles it (Float8Tensor, Int8Tensor, NF4Tensor all ignore it too). So this is fine as-is — just noting for awareness.

  2. MX test only covers float8_e4m3fn and float8_e5m2 — MXTensor also supports torch.float4_e2m1fn_x2. If FP4 MX tensors are used in diffusers offloading, it might be worth adding a parametrized case for that dtype too. But this is a nice-to-have, not a blocker.

Tests

Both test functions follow a solid pattern:

  • Create on CUDA → move to CPU → verify not pinned → pin → verify pinned → verify inner tensors pinned → verify dequantization equivalence

The NVFP4 test additionally parametrizes over use_per_tensor_scale, which covers the optional field path. The assert not x_cpu.is_pinned() after pinning confirms that pin_memory() returns a new tensor rather than mutating in-place — good.
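The roundtrip the review describes could be sketched roughly as follows. This is a hedged approximation, not the PR's actual test code: `MXTensor.to_mx` and `dequantize()` are assumed names based on the review text, so check the real test files for the exact API.

```python
# Hypothetical sketch of the pin-memory roundtrip test pattern the
# review describes: create on CUDA -> move to CPU -> verify not pinned
# -> pin -> verify pinned -> verify dequantization equivalence.
import unittest
import torch

class TestPinningRoundtrip(unittest.TestCase):
    @unittest.skipUnless(torch.cuda.is_available(), "pinning needs CUDA")
    def test_mx_pin_memory(self):
        # import path per the reviewed file mx_tensor.py
        from torchao.prototype.mx_formats.mx_tensor import MXTensor

        x_hp = torch.randn(128, 128, device="cuda", dtype=torch.bfloat16)
        x = MXTensor.to_mx(x_hp, torch.float8_e4m3fn, block_size=32)

        x_cpu = x.to("cpu")
        self.assertFalse(x_cpu.is_pinned())

        x_pinned = x_cpu.pin_memory()
        self.assertTrue(x_pinned.is_pinned())

        # pin_memory() returns a new tensor rather than mutating in place
        self.assertFalse(x_cpu.is_pinned())

        # numerics are unchanged by pinning (method name assumed)
        torch.testing.assert_close(
            x_pinned.dequantize(), x_cpu.dequantize()
        )
```

The final `assertFalse(x_cpu.is_pinned())` is the check the review calls out: it proves pinning is out-of-place.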

Verdict: LGTM. Clean implementation, correct constructor calls, good test coverage, and consistent with existing patterns in the codebase.


Comment thread on test/prototype/mx_formats/test_nvfp4_tensor.py (outdated)
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Apr 2, 2026
ghstack-source-id: dbbdf11
ghstack-comment-id: 4142562671
Pull-Request: #4192
@vkuzo vkuzo merged commit a7bb275 into main Apr 2, 2026
57 checks passed


4 participants