Bug Report
Source: tiny-model-ground-truth parity checker (0/59 passing)
Severity: Critical — blocks ALL inference for LLaMA-style and Qwen architectures
Related: Follow-up to #231 and #232 (embedding fix applied, but lm_head has identical bug)
Description
The GH-231/232 fix correctly added embeddings to should_skip_quant() in add_f32_tensor_to_writer(), but lm_head.weight was not included. The lm_head exhibits the exact same corruption patterns the embeddings had before the fix:
- Int8: Shape mismatch (7M vs 28M elements) — quantized bytes stored as f32 without accounting for packing ratio
- Int4: All-zero data (100% density failure) — data written at wrong offset or not written
Affected Models
| Model | Quant | Tensor | Expected Elements | Got Elements | Error |
| --- | --- | --- | --- | --- | --- |
| SmolLM-135M | Int8 | lm_head.weight | 28,311,552 | 7,077,889 | F-LAYOUT-CONTRACT-001 shape mismatch |
| SmolLM-135M | Int4 | lm_head.weight | 28,311,552 | 28,311,552 | F-DATA-QUALITY-001 100% zeros |
| Qwen2-0.5B | Int8 | lm_head.weight | 136,134,656 | 34,033,665 | F-LAYOUT-CONTRACT-001 shape mismatch |
| Qwen2-0.5B | Int4 | lm_head.weight | 136,134,656 | 136,134,656 | F-DATA-QUALITY-001 100% zeros |
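The Int8 rows can be checked against the 4:1 hypothesis with a little arithmetic. The sketch below assumes the loader reinterprets one-byte Int8 values as four-byte f32 values; the helper name `f32_view_of_int8` is made up for illustration and is not part of apr.

```rust
/// Number of f32 elements a reader would see if `n` one-byte Int8 values
/// were reinterpreted as 4-byte f32 values (integer division).
/// Hypothetical helper for illustration only.
fn f32_view_of_int8(n: u64) -> u64 {
    n / 4
}

fn main() {
    // (model, expected elements, elements actually loaded), from the table above.
    let cases = [
        ("SmolLM-135M", 28_311_552u64, 7_077_889u64),
        ("Qwen2-0.5B", 136_134_656, 34_033_665),
    ];
    for (model, expected, got) in cases {
        let quarter = f32_view_of_int8(expected);
        // Surplus is how far the loaded count sits above an exact quarter.
        println!(
            "{model}: expected/4 = {quarter}, got = {got}, surplus = {}",
            got - quarter
        );
    }
}
```

For both models the loaded count is exactly one element more than a quarter of the expected count (7,077,888 + 1 and 34,033,664 + 1). The report does not explain the extra element; it may be a small trailing value such as a scale, but that is a guess.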
Error Output (SmolLM Int8)
[APR-LOAD] Embedding tensor 'model.embed_tokens.weight': dims=[49152, 576]
[APR-LOAD] Token 0 embedding sample: [-0.3789, -0.2188, 0.0276, -0.2617, -0.2314] ← FIXED, good data
[APR-LOAD] Embedding loaded: 28311552 elements ← FIXED, correct count
[APR-LOAD] LM head tensor 'lm_head.weight': dims=[49152, 576], dtype=9
[APR-LOAD] LM head loaded: 7077889 elements ← BROKEN, same 4:1 ratio bug
error: F-LAYOUT-CONTRACT-001 Tensor 'lm_head_weight': Shape mismatch: got 7077889, expected 28311552
Fix
Add lm_head to should_skip_quant() in src/format/converter/write.rs. The lm_head is a tied weight that mirrors the embedding, so it should be exempt from quantization in the same way the embeddings now are.
Patterns to match: lm_head, lm_head.weight, output.weight
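A minimal sketch of what the extended check might look like. The signature of `should_skip_quant`, the `embed_tokens` test, and the exact matching rules are assumptions made for illustration; the real function in src/format/converter/write.rs may differ.

```rust
/// Hypothetical sketch: returns true for tensors that should stay unquantized.
/// The signature and the embedding check are assumptions, not the actual
/// code in src/format/converter/write.rs.
fn should_skip_quant(name: &str) -> bool {
    // Assumed to resemble the embedding check added by the GH-231/232 fix.
    let is_embedding = name.contains("embed_tokens");
    // Proposed addition: lm_head, lm_head.weight, output.weight.
    let is_lm_head = name == "lm_head"
        || name.ends_with("lm_head.weight")
        || name == "output.weight";
    is_embedding || is_lm_head
}

fn main() {
    let names = [
        "model.embed_tokens.weight",
        "lm_head.weight",
        "output.weight",
        "model.layers.0.mlp.down_proj.weight",
    ];
    for name in names {
        println!("{name}: skip_quant = {}", should_skip_quant(name));
    }
}
```

In this sketch `output.weight` is compared exactly rather than by substring, so that a per-layer tensor whose name merely ends in `output.weight` would still be quantized. Whether the converter needs that distinction depends on its tensor naming, which the report does not specify.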
Reproduction
cd tiny-model-ground-truth
make clean && make convert
apr run models/smollm-135m-int8.apr -p "Hello" -n 32 --json
# → F-LAYOUT-CONTRACT-001 on lm_head_weight
Environment
apr v0.2.16 (f39b7df) with the #231/232 embedding fix applied (#231: "Int8 quantization corrupts embedding tensors (NaN/Inf + shape mismatch)")