support pinning for mx and nvfp4 tensors#4192

Merged
vkuzo merged 4 commits into main from
gh/vkuzo/238/head
Apr 2, 2026

Conversation

@vkuzo
Contributor

@vkuzo vkuzo commented Mar 27, 2026

Summary:

Adds memory pinning support for MXTensor and NVFP4Tensor; this is
important for supporting offloading in diffusers.

Fixes the torchao side of #4026

Still requires huggingface/diffusers#13276 to
land for things to work e2e

cc @sayakpaul

Test Plan:

e2e: makes https://gist.github.com/vkuzo/9ca863e559eb8af18f6e6afb079bf74f
work with mxfp8 and nvfp4; requires huggingface/diffusers#13276 (not yet
landed)

local: the new tests added in this PR pass
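As context for what "pinning support" means here, below is a minimal, hypothetical sketch of the pattern this PR follows (per the review, the same pattern as Float8Tensor, Int8Tensor, and NF4Tensor): a wrapper tensor subclass handles `aten.is_pinned` and `aten._pin_memory` by forwarding to its inner tensors. `MyQuantTensor`, `qdata`, and `scale` are illustrative names, not torchao API.

```python
import torch

class MyQuantTensor(torch.Tensor):
    """Illustrative wrapper subclass holding quantized data plus a scale."""

    @staticmethod
    def __new__(cls, qdata, scale):
        self = torch.Tensor._make_wrapper_subclass(
            cls, qdata.shape, dtype=torch.float32, device=qdata.device
        )
        self.qdata = qdata
        self.scale = scale
        return self

    @classmethod
    def __torch_dispatch__(cls, func, types, args, kwargs=None):
        kwargs = kwargs or {}
        if func is torch.ops.aten.is_pinned.default:
            t = args[0]
            # pinned only if every inner tensor is pinned
            return t.qdata.is_pinned() and t.scale.is_pinned()
        if func is torch.ops.aten._pin_memory.default:
            t = args[0]
            # pin_memory returns a new tensor; the optional device kwarg
            # is ignored, matching the existing subclasses in the repo
            return MyQuantTensor(t.qdata.pin_memory(), t.scale.pin_memory())
        raise NotImplementedError(f"{func} not handled in this sketch")

x = MyQuantTensor(torch.zeros(4, dtype=torch.uint8), torch.ones(1))
print(x.is_pinned())  # False: nothing has been pinned yet
```

Actually calling `x.pin_memory()` requires a CUDA context, since pinned (page-locked) host memory only exists for fast host-to-device transfers.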

[ghstack-poisoned]
@vkuzo
Contributor Author

vkuzo commented Mar 27, 2026

Stack from ghstack (oldest at bottom):

@pytorch-bot

pytorch-bot Bot commented Mar 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4192

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6e387dc with merge base c49a60e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo added a commit that referenced this pull request Mar 27, 2026
Summary:

Adds memory pinning support for `MXTensor` and `NVFP4Tensor`; this is
important for supporting offloading in `diffusers`.

Fixes the torchao side of #4026

Still requires huggingface/diffusers#13276 to
land for things to work e2e

Test Plan:

e2e: makes
https://gist.github.com/vkuzo/9ca863e559eb8af18f6e6afb079bf74f work with
mxfp8 and nvfp4,
requires huggingface/diffusers#13276 (not yet
landed)

local: the new tests added in this PR pass
ghstack-source-id: ff0b1a2
ghstack-comment-id: 4142562671
Pull-Request: #4192
meta-cla Bot added the CLA Signed label (managed by the Facebook bot; authors must sign the CLA before a PR can be reviewed) Mar 27, 2026
vkuzo added the module: inference (quantize_ api inference flow) label Mar 27, 2026
@vkuzo vkuzo requested review from andrewor14 and jerryzh168 March 27, 2026 13:28
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Mar 31, 2026
ghstack-source-id: 5ad4868
ghstack-comment-id: 4142562671
Pull-Request: #4192
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Mar 31, 2026
ghstack-source-id: 7165b48
ghstack-comment-id: 4142562671
Pull-Request: #4192
@andrewor14
Contributor

@claude review this please

@claude

claude Bot commented Mar 31, 2026

Claude finished @andrewor14's task in 2m 4s (View job)


Code Review: Support pinning for mx and nvfp4 tensors

  • Read changed files and PR diff
  • Review implementation in mx_tensor.py
  • Review implementation in nvfp4_tensor.py
  • Review tests
  • Post review feedback

Overall this is a clean, straightforward PR. The pattern follows existing conventions in the codebase (e.g., Float8Tensor, Int8Tensor, NF4Tensor all implement pinning the same way). Constructor argument ordering is correct for both tensor types. A few observations:

Implementation (mx_tensor.py:640-657)

The is_pinned and _pin_memory implementations for MXTensor are correct and minimal. The constructor call in mx_pin_memory correctly passes all 8 positional arguments in the right order matching MXTensor.__new__ at line 459.

Implementation (nvfp4_tensor.py:329-357)

The NVFP4Tensor implementation correctly handles the optional per_tensor_scale and act_per_tensor_scale fields in both is_pinned (checking them if non-None) and _pin_memory (conditionally pinning them). Constructor argument order matches NVFP4Tensor.__new__ at line 79.

Minor observations (non-blocking):

  1. aten._pin_memory.default has a device kwarg — The ATen schema for _pin_memory is _pin_memory(Tensor self, Device? device=None) -> Tensor. The device arg is used for pinning to a specific CUDA device. This PR ignores it (doesn't forward kwargs to the inner .pin_memory() calls), but this is consistent with how every other tensor subclass in the repo handles it (Float8Tensor, Int8Tensor, NF4Tensor all ignore it too). So this is fine as-is — just noting for awareness.

  2. MX test only covers float8_e4m3fn and float8_e5m2 — MXTensor also supports torch.float4_e2m1fn_x2. If FP4 MX tensors are used in diffusers offloading, it might be worth adding a parametrized case for that dtype too. But this is a nice-to-have, not a blocker.

Tests

Both test functions follow a solid pattern:

  • Create on CUDA → move to CPU → verify not pinned → pin → verify pinned → verify inner tensors pinned → verify dequantization equivalence

The NVFP4 test additionally parametrizes over use_per_tensor_scale, which covers the optional field path. The assert not x_cpu.is_pinned() after pinning confirms that pin_memory() returns a new tensor rather than mutating in-place — good.
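The roundtrip the review describes could be sketched roughly as follows. This is a hedged approximation, not the PR's actual test code: `MXTensor.to_mx` and `dequantize()` are assumed names based on the review text, so check the real test files for the exact API.

```python
# Hypothetical sketch of the pin-memory roundtrip test pattern the
# review describes: create on CUDA -> move to CPU -> verify not pinned
# -> pin -> verify pinned -> verify dequantization equivalence.
import unittest
import torch

class TestPinningRoundtrip(unittest.TestCase):
    @unittest.skipUnless(torch.cuda.is_available(), "pinning needs CUDA")
    def test_mx_pin_memory(self):
        # import path per the reviewed file mx_tensor.py
        from torchao.prototype.mx_formats.mx_tensor import MXTensor

        x_hp = torch.randn(128, 128, device="cuda", dtype=torch.bfloat16)
        x = MXTensor.to_mx(x_hp, torch.float8_e4m3fn, block_size=32)

        x_cpu = x.to("cpu")
        self.assertFalse(x_cpu.is_pinned())

        x_pinned = x_cpu.pin_memory()
        self.assertTrue(x_pinned.is_pinned())

        # pin_memory() returns a new tensor rather than mutating in place
        self.assertFalse(x_cpu.is_pinned())

        # numerics are unchanged by pinning (method name assumed)
        torch.testing.assert_close(
            x_pinned.dequantize(), x_cpu.dequantize()
        )
```

The final `assertFalse(x_cpu.is_pinned())` is the check the review calls out: it proves pinning is out-of-place.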

Verdict: LGTM. Clean implementation, correct constructor calls, good test coverage, and consistent with existing patterns in the codebase.


Comment thread on test/prototype/mx_formats/test_nvfp4_tensor.py (outdated)
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Apr 2, 2026
ghstack-source-id: dbbdf11
ghstack-comment-id: 4142562671
Pull-Request: #4192
@vkuzo vkuzo merged commit a7bb275 into main Apr 2, 2026
57 checks passed


4 participants