Bug Report
Source: tiny-model-ground-truth parity checker (0/59 passing)
Severity: Critical — quantization is fundamentally broken for all tensors
Related: GH-231, GH-232, GH-234 fixes unmasked this — same corruption pattern propagates through every layer
Description
After applying GH-231/232 (embedding skip-quant) and GH-234 (lm_head skip-quant), the exact same corruption now appears in layers.0.qkv_weight — the first attention QKV weight. This proves the quantization bug is not specific to embeddings or lm_head, but affects ALL tensors:
- Int8: The loaded element count is about 4x too small, consistent with 1-byte quantized values being counted in 4-byte f32 units (no packing ratio applied)
- Int4: The element count is correct, but the data is 100% zeros (written at the wrong offset, or not written at all)
The should_skip_quant() approach of exempting individual tensors is whack-a-mole. The quantization pipeline itself is broken.
Error Output (SmolLM-135M Int8)
[APR-LOAD] Embedding loaded: 28311552 elements ← FIXED (skipped quant)
[APR-LOAD] LM head loaded: 28311552 elements ← FIXED (skipped quant)
error: [F-LAYOUT-CONTRACT-001] Tensor 'layers.0.qkv_weight': Shape mismatch:
got 138243 elements, expected 552960 (960x576)
138,243 ≈ 552,960 / 4 (= 138,240), the same 4:1 ratio as the original embedding bug.
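The 4:1 ratio can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not code from the apr converter: the `Dtype` enum and both function names are hypothetical, and the byte counts ignore any per-block scales or metadata that the real format may store.

```rust
/// Illustrative storage types (hypothetical; not the apr API).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Dtype {
    F32,
    Int8,
    Int4,
}

/// Payload size in bytes for `elements` values stored as `dtype`.
/// Assumption: Int4 packs two values per byte, rounding up for odd counts,
/// and no scale or metadata bytes are included.
pub fn payload_bytes(dtype: Dtype, elements: usize) -> usize {
    match dtype {
        Dtype::F32 => elements * 4,
        Dtype::Int8 => elements,
        Dtype::Int4 => (elements + 1) / 2,
    }
}

/// Element count a reader would report if it interpreted `bytes`
/// of payload as 4-byte f32 values.
pub fn elements_if_read_as_f32(bytes: usize) -> usize {
    bytes / 4
}
```

With the numbers from the log, `elements_if_read_as_f32(payload_bytes(Dtype::Int8, 960 * 576))` gives 138,240. The remaining 3-element gap to the reported 138,243 is not explained by this sketch.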
Error Output (SmolLM-135M Int4)
error: [F-DATA-QUALITY-001] Tensor 'layers.0.qkv_weight': DENSITY FAILURE:
100.0% zeros (max 80%)
Same all-zeros pattern as the original Int4 embedding bug.
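For reference, a density check of the kind reported by F-DATA-QUALITY-001 might look like the sketch below. This is an assumption about how such a check could work, not the actual apr implementation: the function names are hypothetical, and only the 80% threshold and the message wording come from the log above.

```rust
/// Fraction of exactly-zero values in a tensor; 0.0 for an empty slice.
pub fn zero_fraction(data: &[f32]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let zeros = data.iter().filter(|v| **v == 0.0).count();
    zeros as f64 / data.len() as f64
}

/// Returns an error string when more than `max_zero_fraction` of the
/// values are zero (hypothetical stand-in for the real check).
pub fn check_density(name: &str, data: &[f32], max_zero_fraction: f64) -> Result<(), String> {
    let frac = zero_fraction(data);
    if frac > max_zero_fraction {
        Err(format!(
            "Tensor '{}': DENSITY FAILURE: {:.1}% zeros (max {:.0}%)",
            name,
            frac * 100.0,
            max_zero_fraction * 100.0
        ))
    } else {
        Ok(())
    }
}
```

Running a check like this on each tensor immediately after `make convert`, rather than at load time, would report every zeroed tensor in one pass instead of stopping at the first failing layer.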
Affected: ALL 3 Models, ALL Quantized Tensors
| Model | Int8 Error | Int4 Error |
|---|---|---|
| SmolLM-135M | qkv_weight: 138K vs 553K expected | qkv_weight: 100% zeros |
| Qwen2-0.5B | qkv_weight: shape mismatch | qkv_weight: 100% zeros |
| GPT-2 124M | qkv_weight: shape mismatch | qkv_weight: 100% zeros |
Root Cause
The quantization pipeline in converter/write.rs and converter/mod.rs has a fundamental data serialization bug:
Int8: When writing quantized int8 data, the writer stores one byte per value but records the tensor as if the payload were f32. Since each f32 is 4 bytes and each int8 is 1 byte, the loader sees about 1/4 of the expected element count (552,960 / 4 = 138,240, against the 138,243 reported).
Int4: The writer computes the correct element count (accounting for int4 packing), but writes the data at the wrong file offset — leaving the tensor region as zeros.
The should_skip_quant() approach only works as a workaround for embeddings/lm_head. The fix must be in the quantization serialization logic itself.
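One way to rule out both failure modes at write time is to derive each tensor's recorded offset and length from what was actually written, then read the region back. The sketch below assumes a simple append-only layout. `TensorEntry`, `write_tensor`, and `verify_tensor` are hypothetical names for illustration and do not reflect the real code in converter/write.rs.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Hypothetical index entry: where a tensor's payload lives in the file.
#[derive(Debug, PartialEq)]
pub struct TensorEntry {
    pub name: String,
    pub offset: u64,
    pub byte_len: u64,
}

/// Writes `payload` at the writer's current position and records the
/// offset and length from the stream itself, not from a separately
/// computed (and possibly wrong) value.
pub fn write_tensor<W: Write + Seek>(
    w: &mut W,
    name: &str,
    payload: &[u8],
) -> io::Result<TensorEntry> {
    let offset = w.stream_position()?;
    w.write_all(payload)?;
    let end = w.stream_position()?;
    Ok(TensorEntry {
        name: name.to_string(),
        offset,
        byte_len: end - offset,
    })
}

/// Reads the recorded region back and compares it with the bytes that
/// were meant to be written.
pub fn verify_tensor<R: Read + Seek>(
    r: &mut R,
    entry: &TensorEntry,
    expected: &[u8],
) -> io::Result<bool> {
    r.seek(SeekFrom::Start(entry.offset))?;
    let mut buf = vec![0u8; entry.byte_len as usize];
    r.read_exact(&mut buf)?;
    Ok(buf == expected)
}
```

A round-trip check of this shape would catch an all-zero Int4 region (payload mismatch) and an Int8 length recorded in the wrong units (byte_len differing from the packed payload size) before the file ever reaches the loader.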
Reproduction
cd tiny-model-ground-truth
make clean && make convert
apr run models/smollm-135m-int8.apr -p "Hello" -n 32 --json
# Embedding and lm_head load fine, crashes on layers.0.qkv_weight
Environment
apr v0.2.16 with fixes #231 (Int8 quantization corrupts embedding tensors: NaN/Inf + shape mismatch), #232, #233, #234, #235, #236