Int8/Int4 quantization corrupts ALL tensors, not just embeddings/lm_head #237

## Bug Report

**Source**: `tiny-model-ground-truth` parity checker (0/59 passing)
**Severity**: Critical — quantization is fundamentally broken for all tensors
**Related**: GH-231, GH-232, GH-234. Those fixes unmasked this bug: the same corruption pattern propagates through every layer

## Description

After applying GH-231/232 (embedding skip-quant) and GH-234 (lm_head skip-quant), the exact same corruption now appears in `layers.0.qkv_weight` — the first attention QKV weight. This proves the quantization bug is **not specific to embeddings or lm_head**, but affects ALL tensors:

- **Int8**: Element count is about one quarter of the expected value (quantized bytes stored as f32 without applying the packing ratio)
- **Int4**: Element count is correct but data is 100% zeros (wrong offset or not written)

The `should_skip_quant()` approach of exempting individual tensors is whack-a-mole. The quantization pipeline itself is broken.

## Error Output (SmolLM-135M Int8)

```
[APR-LOAD] Embedding loaded: 28311552 elements  ← FIXED (skipped quant)
[APR-LOAD] LM head loaded: 28311552 elements    ← FIXED (skipped quant)

error: [F-LAYOUT-CONTRACT-001] Tensor 'layers.0.qkv_weight': Shape mismatch:
  got 138243 elements, expected 552960 (960x576)
```

552,960 / 4 = 138,240, so the reported 138,243 is within 3 elements of an exact 4:1 shortfall. This is the same 4:1 ratio as the original embedding bug.
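
The arithmetic can be checked with a minimal sketch. It assumes the Int8 payload holds one byte per weight and that the loader derives the element count by dividing the payload length by the size of an `f32`. Both are inferences from the log, not confirmed behavior of `apr`, and the function name is hypothetical.

```rust
use std::mem::size_of;

/// Number of elements a loader would report if it read `payload_bytes`
/// bytes as f32 values (assumed loader behavior, not confirmed).
fn elements_if_read_as_f32(payload_bytes: usize) -> usize {
    payload_bytes / size_of::<f32>()
}

fn main() {
    let expected: usize = 960 * 576; // shape from the error message
    let int8_payload_bytes = expected; // assumption: one byte per weight
    let seen = elements_if_read_as_f32(int8_payload_bytes);
    println!("expected={expected} seen_if_f32={seen} reported=138243");
    // Difference between the logged count and the exact quarter.
    println!("surplus: {} elements", 138_243 - seen);
}
```

Under these assumptions the loader would see 138,240 elements. The logged count is 3 elements (12 bytes) higher, a surplus this report does not account for.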

## Error Output (SmolLM-135M Int4)

```
error: [F-DATA-QUALITY-001] Tensor 'layers.0.qkv_weight': DENSITY FAILURE:
  100.0% zeros (max 80%)
```

Same all-zeros pattern as the original Int4 embedding bug.
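
For reference, a density gate of the kind that produces this error could look like the sketch below. The function names are hypothetical and the real `F-DATA-QUALITY-001` check may differ. Only the 80% limit comes from the log above.

```rust
/// Fraction of exactly-zero values in a tensor (0.0 for an empty slice).
fn zero_fraction(values: &[f32]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    let zeros = values.iter().filter(|v| **v == 0.0).count();
    zeros as f64 / values.len() as f64
}

/// Hypothetical density gate: passes when the share of zeros does not
/// exceed `max_zero_fraction` (0.80 in the log above).
fn passes_density(values: &[f32], max_zero_fraction: f64) -> bool {
    zero_fraction(values) <= max_zero_fraction
}

fn main() {
    // A tensor region that was never written reads back as all zeros.
    let unwritten = vec![0.0_f32; 552_960];
    println!("zeros: {:.1}%", 100.0 * zero_fraction(&unwritten));
    println!("passes: {}", passes_density(&unwritten, 0.80));
}
```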

## Affected: ALL 3 Models, ALL Quantized Tensors

| Model | Int8 Error | Int4 Error |
|-------|-----------|-----------|
| SmolLM-135M | qkv_weight: 138K vs 553K expected | qkv_weight: 100% zeros |
| Qwen2-0.5B | qkv_weight: shape mismatch | qkv_weight: 100% zeros |
| GPT-2 124M | qkv_weight: shape mismatch | qkv_weight: 100% zeros |

## Root Cause

The quantization pipeline in `converter/write.rs` and `converter/mod.rs` has a fundamental data serialization bug:

**Int8**: When writing quantized int8 data, the writer stores raw int8 bytes but records the tensor as if the payload were f32 elements. Since each f32 takes 4 bytes and each int8 takes 1 byte, the payload holds only 1/4 of the elements the shape implies when it is read back as f32.

**Int4**: The writer computes the correct element count (accounting for int4 packing), but the data is either written at the wrong file offset or not written at all, leaving the tensor region as zeros.

The `should_skip_quant()` approach only works as a workaround for embeddings/lm_head. The fix must be in the quantization serialization logic itself.
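
One way to catch both failure modes at write time is to check that the recorded dtype, the shape, and the number of bytes actually written agree. The sketch below is illustrative only: `DType`, `payload_bytes`, and `check_entry` are hypothetical names, not the API in `converter/write.rs`, and the Int4 size assumes two values per byte with no per-block scale data.

```rust
/// Storage types discussed in this report (hypothetical enum, not apr's).
#[derive(Clone, Copy, Debug, PartialEq)]
enum DType {
    F32,
    Int8,
    Int4,
}

/// Payload size in bytes for `n` logical elements.
/// Assumes Int4 packs two values per byte with no per-block scales.
fn payload_bytes(dtype: DType, n: usize) -> usize {
    match dtype {
        DType::F32 => n * 4,
        DType::Int8 => n,
        DType::Int4 => (n + 1) / 2, // round up for odd counts
    }
}

/// Logical element count implied by the shape.
fn element_count(shape: &[usize]) -> usize {
    shape.iter().product()
}

/// Fails when the bytes written disagree with the recorded dtype and shape.
fn check_entry(dtype: DType, shape: &[usize], written: usize) -> Result<(), String> {
    let want = payload_bytes(dtype, element_count(shape));
    if written == want {
        Ok(())
    } else {
        Err(format!("{dtype:?} {shape:?}: wrote {written} bytes, expected {want}"))
    }
}

fn main() {
    let shape: [usize; 2] = [960, 576];
    // Int8 payload recorded as Int8: consistent.
    println!("{:?}", check_entry(DType::Int8, &shape, 552_960));
    // Same payload with the entry still recorded as F32: rejected.
    println!("{:?}", check_entry(DType::F32, &shape, 552_960));
    // Int4 payload that was never written: rejected.
    println!("{:?}", check_entry(DType::Int4, &shape, 0));
}
```

A check like this belongs in the writer rather than the loader, so a bad file is rejected during `make convert` instead of at inference time.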

## Reproduction

```bash
cd tiny-model-ground-truth
make clean && make convert
apr run models/smollm-135m-int8.apr -p "Hello" -n 32 --json
# Embedding and lm_head load fine, crashes on layers.0.qkv_weight
```

## Environment

- `apr` v0.2.16 with GH-231/232/233/234/235/236 fixes
- Platform: Linux x86_64
