Support using Int4PreshuffledTensor after loading by jerryzh168 · Pull Request #26066 · vllm-project/vllm

jerryzh168 · 2025-10-02T01:08:52Z

Summary:
Int4PreshuffledTensor has the fastest int4 kernels in fbgemm for int4 weight-only and for fp8 activation + int4 weight, but the tensor cannot be sliced because of the preshuffling (and slicing has to preserve aliasing). We therefore load the checkpoint as Int4Tensor (plain format), which can be sliced during loading, and convert it to the preshuffled format after loading using the torchao.prototype.tensor_conversion.api.convert_to_packed_tensor_based_on_current_hardware function.
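To illustrate why the order "slice first, convert afterwards" matters, here is a minimal pure-Python sketch. The `preshuffle` function is a made-up row interleave chosen for illustration; it is an assumption and not the actual fbgemm layout or any torchao API. It only shows that taking the same index range from a shuffled layout does not give the rows a loader would expect for its shard.

```python
# Toy illustration only: this "preshuffle" is a hypothetical row interleave,
# not the real fbgemm preshuffled layout.

def preshuffle(rows):
    """Interleave the first and second half of the rows (hypothetical layout)."""
    half = len(rows) // 2
    out = []
    for a, b in zip(rows[:half], rows[half:]):
        out.extend([a, b])
    return out

# A plain-format "weight" with 8 output rows, labelled by logical row index.
plain = [f"row{i}" for i in range(8)]

# A loader sharding the weight wants the first 4 logical rows.
shard_plain = plain[0:4]

# Order used in this PR: slice the plain layout, then convert the shard.
slice_then_shuffle = preshuffle(shard_plain)

# Problematic order: convert the full weight, then slice the same index range.
shuffle_then_slice = preshuffle(plain)[0:4]

print(slice_then_shuffle)   # contains only rows 0..3, in shuffled order
print(shuffle_then_slice)   # contains rows that belong to the other shard
```

In this toy setting the first result holds exactly the shard's rows, while the second mixes in rows from outside the shard, which is the kind of mismatch that keeping the plain format during loading avoids.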

Test Plan:
pytest tests/quantization/test_torchao.py -k test_opt_125m_int4wo_model_running_preshuffled_kernel

For the test, we uploaded a plain int4 tensor checkpoint (https://huggingface.co/torchao-testing/opt-125m-Int4WeightOnlyConfig-v2-0.14.0.dev), load it in vLLM, and check that the model is converted to use Int4PreshuffledTensor before inference.

mergify · 2025-10-08T05:04:16Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jerryzh168.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

chatgpt-codex-connector · 2025-10-24T21:40:41Z

💡 Codex Review

https://github.com/vllm-project/vllm/blob/ed36abce2e9fc2610de62da6ed187f4727a10302/vllm/model_executor/layers/quantization/torchao.py#L308-L317
Preserve weight metadata when converting to packed tensor

Replacing layer.weight with a fresh Parameter after calling convert_to_packed_tensor_based_on_current_hardware drops all of the attributes that were attached during create_weights (input_dim, output_dim, weight_loader, etc.) and also flips requires_grad back to True. Those attributes are used by the loader/reload path (for example gpu_model_runner.reload_weights expects weight_loader and input_dim to exist), so after the conversion any attempt to reload weights or reshard the tensor will raise an AttributeError or operate on the wrong dimensions. The new parameter should preserve the original metadata and requires_grad=False instead of creating a bare Parameter.
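The concern raised in this review can be sketched without PyTorch. The example below uses a plain Python stand-in class (`StandInParam`, hypothetical, not `torch.nn.Parameter`) and a hypothetical helper `replace_preserving_metadata`; it is an assumption-laden illustration of copying ad-hoc attributes onto a replacement object, not the code that was merged.

```python
class StandInParam:
    """Hypothetical stand-in for a parameter object; not torch.nn.Parameter."""

    def __init__(self, data, requires_grad=True):
        # Mirrors the review's point that a freshly built parameter
        # defaults to requires_grad=True.
        self.data = data
        self.requires_grad = requires_grad


def replace_preserving_metadata(old, new_data):
    """Build a replacement that keeps every ad-hoc attribute of `old`."""
    new = StandInParam(new_data, requires_grad=False)
    for name, value in vars(old).items():
        if name not in ("data", "requires_grad"):
            setattr(new, name, value)
    return new


# Original weight with metadata attached at creation time.
old = StandInParam([1, 2, 3, 4], requires_grad=False)
old.input_dim = 1
old.output_dim = 0
old.weight_loader = lambda param, loaded_weight: None

# Stand-in for the converted (packed) payload.
converted = list(reversed(old.data))

bare = StandInParam(converted)                         # loses metadata
kept = replace_preserving_metadata(old, converted)     # keeps metadata

print(hasattr(bare, "weight_loader"), bare.requires_grad)
print(hasattr(kept, "weight_loader"), kept.requires_grad)
```

The bare replacement has no `weight_loader` or `input_dim` and has `requires_grad` set to True, whereas the copy made through the helper keeps the attributes a reload path would look up.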

houseroad · 2025-10-31T05:53:15Z

CI is broken, can we take a look?

jerryzh168 · 2025-10-31T17:26:15Z

oh OK, will fix

jerryzh168 · 2025-11-01T02:52:05Z

@houseroad all checks have passed now, please merge

Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>

jerryzh168 force-pushed the support-int4-preshuffle branch from f02db41 to 34d63df Compare October 2, 2025 01:13

jerryzh168 marked this pull request as ready for review October 3, 2025 23:39

jerryzh168 requested review from mgoin, robertgshaw2-redhat, tlrmchlsmth and yewentao256 as code owners October 3, 2025 23:39

jerryzh168 marked this pull request as draft October 3, 2025 23:39

mergify Bot added the needs-rebase label Oct 8, 2025

jerryzh168 mentioned this pull request Oct 16, 2025

NotImplementedError: Int4PreshuffledTensor dispatch: attempting to run unimplemented operator/function: pytorch/ao#3144

Open

jerryzh168 force-pushed the support-int4-preshuffle branch from 34d63df to a6302bd Compare October 24, 2025 21:24

mergify Bot removed the needs-rebase label Oct 24, 2025

jerryzh168 force-pushed the support-int4-preshuffle branch from a6302bd to ed36abc Compare October 24, 2025 21:37

jerryzh168 marked this pull request as ready for review October 24, 2025 21:37

jerryzh168 requested a review from pavanimajety as a code owner October 24, 2025 21:37

jerryzh168 force-pushed the support-int4-preshuffle branch from ed36abc to bab418c Compare October 24, 2025 21:38

jerryzh168 force-pushed the support-int4-preshuffle branch 4 times, most recently from 27db8c4 to cfc8ea0 Compare October 27, 2025 17:49

houseroad approved these changes Oct 31, 2025

houseroad added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 31, 2025

jerryzh168 force-pushed the support-int4-preshuffle branch from 7eb3fa3 to 2749270 Compare October 31, 2025 20:41

jerryzh168 force-pushed the support-int4-preshuffle branch from 2749270 to 43ffb17 Compare November 1, 2025 00:11

mgoin approved these changes Nov 4, 2025

mgoin merged commit 03c4c4a into vllm-project:main Nov 4, 2025
52 checks passed

ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025

Support using Int4PreshuffledTensor after loading (vllm-project#26066)

db38b73

Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Support using Int4PreshuffledTensor after loading (vllm-project#26066)

8c0cc0d

Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>

jerryzh168 mentioned this pull request Dec 18, 2025

vLLM fails to load TorchAO Int4Opaque weights due to unsupported aten.slice on Int4OpaqueTensor pytorch/ao#3499

Open

mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026

Support using Int4PreshuffledTensor after loading (vllm-project#26066)

9149938

Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

Support using Int4PreshuffledTensor after loading (vllm-project#26066)

93d3f85

Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

Support using Int4PreshuffledTensor after loading (vllm-project#26066)

662582d

Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>

0826joyce pushed a commit to 0826joyce/vllm-serving-optimization that referenced this pull request May 19, 2026

Support using Int4PreshuffledTensor after loading (vllm-project#26066)

4bc062f

Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
