Int8 quantization corrupts embedding tensors (NaN/Inf + shape mismatch) #231

## Bug Report

**Source**: `tiny-model-ground-truth` parity checker (0/59 passing)
**Severity**: Critical — blocks ALL Int8 inference for LLaMA-style and Qwen architectures

## Description

`apr import --quantize int8` produces APR files where the embedding tensor (`model.embed_tokens.weight`) contains NaN/Inf values and has the wrong element count. Inference fails with `F-LAYOUT-CONTRACT-001`.

## Affected Models

| Model | Architecture | Vocab | Hidden | Expected Elements | Got Elements |
|-------|-------------|-------|--------|-------------------|-------------|
| SmolLM-135M | LLaMA | 49152 | 576 | 28,311,552 | 7,077,889 |
| Qwen2-0.5B | Qwen/GQA | 151936 | 896 | 136,134,656 | 34,033,665 |
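
The counts in the table can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported above; `check` is a hypothetical helper, and the observation that each observed count exceeds a quarter of the expected count by exactly one is derived here rather than reported by the loader.

```python
# (model, vocab, hidden, observed element count), copied from the table above.
rows = [
    ("SmolLM-135M", 49152, 576, 7_077_889),
    ("Qwen2-0.5B", 151936, 896, 34_033_665),
]


def check(vocab, hidden, got):
    """Return (expected element count, got minus a quarter of expected)."""
    expected = vocab * hidden
    return expected, got - expected // 4


for name, vocab, hidden, got in rows:
    expected, excess = check(vocab, hidden, got)
    print(f"{name}: expected={expected:,} got={got:,} got - expected/4 = {excess}")
```

Both rows give an excess of 1, so the mismatch follows the same pattern for both architectures.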

## Error Output

```
[APR-LOAD] Embedding tensor 'model.embed_tokens.weight': dims=[49152, 576], expected [vocab=49152, hidden=576]
[APR-LOAD] Embedding dims=[49152, 576], using raw data (no transpose needed)
[APR-LOAD] ERROR: Token 0 embedding contains NaN/Inf - data corruption!
[APR-LOAD] Token 0 embedding sample: [0.0314, -10544936662718054851787026824429568.0000, -10707835954547578756780727681941504.0000, 0.0000, 0.0000]
[APR-LOAD] Embedding loaded: 7077889 elements (vocab=49152 x hidden=576)

error: Inference failed: Format error: [F-LAYOUT-CONTRACT-001] Tensor 'token_embedding': Shape mismatch: got 7077889 elements, expected 28311552 (49152x576)
```

Realizar panics at `realizar/src/apr_transformer/mod.rs:2079` (the end index 576 matches the hidden size of SmolLM-135M):
```
range end index 576 out of range for slice of length 240
```

## Root Cause Hypothesis

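The Int8 path in `apr import` appears to mishandle the embedding tensor. In both models the observed element count is exactly one quarter of the expected count plus one (7,077,889 = 28,311,552 / 4 + 1 for SmolLM-135M; 34,033,665 = 136,134,656 / 4 + 1 for Qwen2-0.5B). This suggests that the quantized payload, stored at one byte per weight, is being counted in 4-byte f32 units without accounting for the 4:1 packing ratio. The NaN/Inf check failure and the values on the order of 1e34 in the token 0 sample are consistent with raw int8 bytes being reinterpreted as IEEE 754 floats instead of being dequantized. The origin of the one extra element has not been identified.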
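
The reinterpretation effect can be reproduced in isolation. The following is a minimal sketch, not the APR writer or loader: the byte layout (one f32 scale followed by one int8 per weight), the scale value, the made-up weights, and the helper `reinterpret_as_f32` are all assumptions for illustration. Under that assumed layout, N weights read as f32 yield N / 4 + 1 elements, which would match the counts in the Affected Models table, but this issue does not confirm that APR stores the tensor this way.

```python
import math
import struct


def reinterpret_as_f32(raw: bytes) -> list:
    """Read a byte buffer as little-endian f32 values, ignoring trailing bytes."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw[: count * 4]))


# Assumed layout (hypothetical): a 4-byte f32 scale, then one int8 per weight.
scale = 0.0314
quantized = [12, -7, -64, 127, 3, 55, -90, -14]  # 8 made-up int8 weights
payload = struct.pack("<f", scale) + struct.pack(f"<{len(quantized)}b", *quantized)

as_f32 = reinterpret_as_f32(payload)
print(f"{len(quantized)} int8 weights read back as {len(as_f32)} f32 elements")
for value in as_f32:
    label = "finite" if math.isfinite(value) else "NaN/Inf"
    print(f"  {value!r} ({label})")
```

With these bytes, the first element is the scale, the second decodes to NaN, and the third is a finite value with a very large magnitude, the same mix of symptoms seen in the token 0 sample.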

## Reproduction

```bash
cd tiny-model-ground-truth
apr pull hf://HuggingFaceTB/SmolLM-135M
apr import hf://HuggingFaceTB/SmolLM-135M --quantize int8 -o models/smollm-135m-int8.apr
apr run models/smollm-135m-int8.apr -p "Hello" -n 32 --json
# → F-LAYOUT-CONTRACT-001 shape mismatch
```

## Environment

- `apr` v0.2.16 (f39b7dfd)
- Oracle: transformers 5.1.0, torch 2.10.0, float32, CPU, greedy
- Platform: Linux x86_64

## Contract Reference

- `contracts/tensor-layout-v1.yaml` rule F-LAYOUT-CONTRACT-001
- `contracts/tensor-layout-v1.yaml` rule F-DATA-QUALITY-001
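
As a sketch of what these two rules guard against, the check below validates a dequantized embedding before it is used. It assumes that F-LAYOUT-CONTRACT-001 covers the element count (as the error message above indicates) and that F-DATA-QUALITY-001 covers non-finite values (an assumption, since the contract file is not quoted in this issue). `validate_embedding` is a hypothetical helper, not part of `apr` or `realizar`.

```python
import math


def validate_embedding(values, vocab, hidden):
    """Return a list of violation messages; an empty list means the tensor passed."""
    problems = []
    expected = vocab * hidden
    # Layout check: the flat buffer must hold exactly vocab * hidden elements.
    if len(values) != expected:
        problems.append(
            f"shape mismatch: got {len(values)} elements, "
            f"expected {expected} ({vocab}x{hidden})"
        )
    # Data-quality check: no NaN or Inf anywhere in the buffer.
    bad = sum(1 for v in values if not math.isfinite(v))
    if bad:
        problems.append(f"{bad} non-finite values (NaN/Inf)")
    return problems


print(validate_embedding([0.0] * 6, vocab=2, hidden=3))
print(validate_embedding([0.0314, float("nan")], vocab=2, hidden=3))
```

Running a check of this kind at the end of `apr import` would surface the corruption when the file is written rather than at inference time.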
