lm_head.weight not excluded from quantization (same bug as embeddings) #234

@noahgift

Bug Report

Source: tiny-model-ground-truth parity checker (0/59 passing)
Severity: Critical — blocks ALL inference for LLaMA-style and Qwen architectures
Related: Follow-up to #231 and #232 (embedding fix applied, but lm_head has identical bug)

Description

The GH-231/232 fix correctly added embeddings to should_skip_quant() in add_f32_tensor_to_writer(), but lm_head.weight was not included. The lm_head exhibits the exact same corruption patterns the embeddings had before the fix:

  • Int8: Shape mismatch (7M vs 28M elements) — quantized bytes stored as f32 without accounting for packing ratio
  • Int4: All-zero data (100% density failure) — data written at wrong offset or not written

Affected Models

| Model | Quant | Tensor | Expected Elements | Got Elements | Error |
|---|---|---|---|---|---|
| SmolLM-135M | Int8 | lm_head.weight | 28,311,552 | 7,077,889 | F-LAYOUT-CONTRACT-001 shape mismatch |
| SmolLM-135M | Int4 | lm_head.weight | 28,311,552 | 28,311,552 | F-DATA-QUALITY-001 100% zeros |
| Qwen2-0.5B | Int8 | lm_head.weight | 136,134,656 | 34,033,665 | F-LAYOUT-CONTRACT-001 shape mismatch |
| Qwen2-0.5B | Int4 | lm_head.weight | 136,134,656 | 136,134,656 | F-DATA-QUALITY-001 100% zeros |

Error Output (SmolLM Int8)

[APR-LOAD] Embedding tensor 'model.embed_tokens.weight': dims=[49152, 576]
[APR-LOAD] Token 0 embedding sample: [-0.3789, -0.2188, 0.0276, -0.2617, -0.2314]  ← FIXED, good data
[APR-LOAD] Embedding loaded: 28311552 elements  ← FIXED, correct count

[APR-LOAD] LM head tensor 'lm_head.weight': dims=[49152, 576], dtype=9
[APR-LOAD] LM head loaded: 7077889 elements  ← BROKEN, same 4:1 ratio bug
error: F-LAYOUT-CONTRACT-001 Tensor 'lm_head_weight': Shape mismatch: got 7077889, expected 28311552
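
The 4:1 ratio in the Int8 failure can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the Int8 payload holds one byte per element and that the loader reinterprets those bytes as 4-byte f32 values. Both helper names are hypothetical and do not come from the codebase.

```rust
/// Number of elements implied by a tensor's dims, e.g. [49152, 576].
fn expected_elements(dims: &[u64]) -> u64 {
    dims.iter().product()
}

/// Element count a loader would report if it read `num_bytes` of
/// quantized data as f32 (4 bytes per element).
/// Assumption: this is the failure mode; the issue only reports the ratio.
fn elements_if_bytes_read_as_f32(num_bytes: u64) -> u64 {
    num_bytes / std::mem::size_of::<f32>() as u64
}
```

For SmolLM-135M, `expected_elements(&[49152, 576])` gives 28,311,552, and reading that many bytes as f32 gives 7,077,888. The loader reports 7,077,889, one element more, and the same off-by-one appears for Qwen2-0.5B (34,033,664 computed, 34,033,665 reported). The sketch does not account for that extra element.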

Fix

Add lm_head to should_skip_quant() in src/format/converter/write.rs. The lm_head is a tied weight that mirrors the embedding, so it should never be quantized.

Pattern to match: lm_head, lm_head.weight, output.weight
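
A minimal sketch of what the extended check might look like. The real should_skip_quant() in src/format/converter/write.rs is not shown in this issue, so the signature and the embedding pattern (`embed_tokens`, taken from the log above) are assumptions.

```rust
/// Hypothetical sketch: returns true for tensors that must stay
/// unquantized. The actual function may take different arguments.
fn should_skip_quant(tensor_name: &str) -> bool {
    // Assumed embedding check, based on the tensor name in the load log.
    let is_embedding = tensor_name.contains("embed_tokens");

    // Patterns proposed in this issue: lm_head, lm_head.weight, output.weight.
    // `output.weight` is compared exactly so that names which merely end in
    // it (for example an attention output projection) are still quantized.
    let is_lm_head = tensor_name.contains("lm_head") || tensor_name == "output.weight";

    is_embedding || is_lm_head
}
```

One design point worth checking in the real fix: a substring or suffix match on `output.weight` would also catch names such as `blk.0.attn_output.weight`, so an exact comparison is used for that pattern here.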

Reproduction

cd tiny-model-ground-truth
make clean && make convert
apr run models/smollm-135m-int8.apr -p "Hello" -n 32 --json
# → F-LAYOUT-CONTRACT-001 on lm_head_weight

Environment
