better error message in load_state_dict when there are inconsistent tensor sizes #2151

Merged

soumith merged 1 commit into pytorch:master from greaber:master on Jul 19, 2017
Conversation

soumith approved these changes on Jul 19, 2017

soumith (Collaborator): thanks a lot @greaber. Better messages for all :)
xwang233 pushed a commit to xwang233/pytorch that referenced this pull request on Nov 9, 2022
jagadish-amd pushed a commit to jagadish-amd/pytorch that referenced this pull request on May 15, 2025

http://rocm-ci.amd.com/blue/organizations/jenkins/rocm-pytorch-manylinux-wheel-builder/detail/rocm-pytorch-manylinux-wheel-builder/2009/pipeline/131/

`/pytorch/.github/scripts/amd/package_triton_wheel.sh: line 54: syntax error in conditional expression`

Validation: http://rocm-ci.amd.com/job/mainline-pytorch2.6-manylinux-wheels/79/
mergennachin added a commit that referenced this pull request on Mar 4, 2026

…iton kernels

User-defined Triton kernels (via @triton.jit or @triton_op) that take bool tensor arguments produce incorrect results when compiled through AOTI. The root cause is that Triton's mangle_type maps torch.bool tensors to *i1/*u1 (1-bit pointers), but PyTorch stores bool tensors as uint8 (one byte per element). The compiled cubin kernel generates bit-packed loads for *i1/*u1 pointers, reading garbled data from the byte-addressed memory.

Inductor-generated kernels already work around this (Triton issue #2151) by adding .to(tl.int1) after loads and converting to int8 for stores. But user-defined kernels don't get these workarounds, since their code is user-written.

Fix: override *i1/*u1 -> *u8 in the mangle_type signature for user-defined kernels. This makes the compiled kernel use byte-addressed loads matching PyTorch's bool memory layout.
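To make the described workaround concrete, here is a minimal sketch of the load/store pattern the message attributes to Inductor-generated kernels, applied by hand to a user-defined kernel. The kernel, its name, and the uint8-view call site are all hypothetical; the sketch assumes a CUDA device and uses only Triton's public tl.load / tl.store / .to APIs.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def logical_not_kernel(in_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Hypothetical user-defined kernel illustrating the workaround.
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    # Byte-addressed load, then reinterpret as a 1-bit boolean -- the
    # ".to(tl.int1) after load" pattern described above.
    x = tl.load(in_ptr + offs, mask=mask).to(tl.int1)
    y = x == 0  # logical NOT
    # Convert back to int8 before storing so the write matches PyTorch's
    # one-byte-per-element bool layout.
    tl.store(out_ptr + offs, y.to(tl.int8), mask=mask)

x = torch.tensor([True, False] * 8, device="cuda")
out = torch.empty_like(x)
# Passing uint8 views makes mangle_type emit *u8 instead of *i1, avoiding
# the bit-packed loads described above (a manual stand-in for the fix).
logical_not_kernel[(1,)](x.view(torch.uint8), out.view(torch.uint8),
                         x.numel(), BLOCK=16)
```

Note that this sketch applies the byte-view fix at the call site; per the message above, the actual fix rewrites the mangled signature inside AOTI so user kernels need no such change.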
I have often gotten the message "inconsistent tensor sizes" from load_state_dict, usually just because I am trying to load a checkpoint saved with a different version of the code or a different configuration. This patch makes it easier to locate the problem.
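For illustration, here is a minimal sketch of the failure mode this patch targets. The model shapes are made up, and the improved message shown in the comments is approximate (its exact wording has varied across PyTorch versions):

```python
import torch.nn as nn

# Hypothetical scenario: the checkpoint was saved from a model built with a
# different output size, so one parameter's shape no longer matches.
saved_model = nn.Linear(10, 20)    # configuration at checkpoint time
current_model = nn.Linear(10, 30)  # current configuration

checkpoint = saved_model.state_dict()

try:
    current_model.load_state_dict(checkpoint)
except RuntimeError as err:
    # Before this patch: a bare "inconsistent tensor sizes", with no hint
    # of which parameter is at fault. With it, the message names the
    # parameter and both shapes, along the lines of:
    #   While copying the parameter named weight, whose dimensions in the
    #   model are torch.Size([30, 10]) and whose dimensions in the
    #   checkpoint are torch.Size([20, 10]), ...
    print(err)
```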