better error message in load_state_dict when there are inconsistent tensor sizes#2151

Merged
soumith merged 1 commit into pytorch:master from greaber:master
Jul 19, 2017

Conversation

@greaber
Contributor

@greaber greaber commented Jul 19, 2017

I have often gotten the error "inconsistent tensor sizes" from load_state_dict, usually because I am trying to load a checkpoint created with a different version of the code or a different configuration. This patch makes it easier to locate the problem by naming the offending parameter.
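A minimal sketch of the kind of check this patch adds (names and wording here are illustrative, not the exact PyTorch code): instead of letting the copy fail with a bare "inconsistent tensor sizes", compare shapes up front and report which parameter mismatched.

```python
# Hypothetical sketch of a shape check inside a load_state_dict-style loader.
# own_state and state_dict are simplified here to dicts mapping parameter
# names to shape tuples, so the example runs without PyTorch installed.

def load_state_dict(own_state, state_dict):
    """Raise an informative error when a checkpoint shape mismatches the model."""
    for name, ckpt_shape in state_dict.items():
        if name not in own_state:
            raise KeyError(f"unexpected key '{name}' in state_dict")
        if own_state[name] != ckpt_shape:
            # The improved message names the parameter and both shapes,
            # so the user can tell which config/code-version drifted.
            raise RuntimeError(
                f"While copying the parameter named '{name}': "
                f"dimensions in the model are {own_state[name]}, but "
                f"dimensions in the checkpoint are {ckpt_shape}."
            )
```

For example, loading a checkpoint whose `fc.weight` is `(12, 5)` into a model expecting `(10, 5)` now fails with a message containing `fc.weight` and both shapes, rather than a context-free size error.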

@soumith soumith merged commit 95ccbf8 into pytorch:master Jul 19, 2017
@soumith
Collaborator

soumith commented Jul 19, 2017

thanks a lot @greaber. Better messages for all :)

xwang233 pushed a commit to xwang233/pytorch that referenced this pull request Nov 9, 2022
mergennachin added a commit that referenced this pull request Mar 4, 2026
…iton kernels

User-defined Triton kernels (via @triton.jit or @triton_op) that take
bool tensor arguments produce incorrect results when compiled through
AOTI. The root cause is that Triton's mangle_type maps torch.bool
tensors to *i1/*u1 (1-bit pointer), but PyTorch stores bool tensors as
uint8 (1 byte per element). The compiled cubin kernel generates
bit-packed loads for *i1/*u1 pointers, reading garbled data from the
byte-addressed memory.

Inductor-generated kernels already work around this (Triton issue #2151)
by adding .to(tl.int1) after loads and converting to int8 for stores.
But user-defined kernels don't get these workarounds since their code is
user-written.

Fix: override *i1/*u1 -> *u8 in the mangle_type signature for
user-defined kernels. This makes the compiled kernel use byte-addressed
loads matching PyTorch's bool memory layout.
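The override described in this commit message can be sketched as a simple rewrite over the kernel's type signature (the function name and dict-based signature shape below are illustrative assumptions, not PyTorch's actual internal API):

```python
# Hypothetical sketch: rewrite Triton's 1-bit bool pointer types (*i1/*u1)
# to byte-addressed *u8 in a user-defined kernel's signature, so the
# compiled kernel emits one-byte loads matching PyTorch's bool storage
# (uint8, one byte per element) instead of bit-packed loads.

def remap_bool_pointers(signature):
    """signature: dict mapping argument index -> Triton type string."""
    remapped = {}
    for idx, ty in signature.items():
        if ty in ("*i1", "*u1"):
            remapped[idx] = "*u8"  # force byte-addressed loads
        else:
            remapped[idx] = ty     # leave all other types untouched
    return remapped
```

Inductor-generated kernels sidestep the issue differently (casting with `.to(tl.int1)` after loads and back to int8 for stores), but since user-written kernel bodies cannot be rewritten, patching the signature is the less invasive fix.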
mergennachin added two further commits referencing this pull request on Mar 4, 2026, with the same commit message as above.
3 participants