Bug Report
Source: tiny-model-ground-truth parity checker (0/59 passing)
Severity: Critical — blocks ALL Int8 inference for LLaMA-style and Qwen architectures
Description
apr import --quantize int8 produces APR files where the embedding tensor (model.embed_tokens.weight) contains NaN/Inf values and has the wrong element count. Inference fails with F-LAYOUT-CONTRACT-001.
Affected Models
| Model | Architecture | Vocab | Hidden | Expected Elements | Got Elements |
| --- | --- | --- | --- | --- | --- |
| SmolLM-135M | LLaMA | 49152 | 576 | 28,311,552 | 7,077,889 |
| Qwen2-0.5B | Qwen/GQA | 151936 | 896 | 136,134,656 | 34,033,665 |
Error Output
[APR-LOAD] Embedding tensor 'model.embed_tokens.weight': dims=[49152, 576], expected [vocab=49152, hidden=576]
[APR-LOAD] Embedding dims=[49152, 576], using raw data (no transpose needed)
[APR-LOAD] ERROR: Token 0 embedding contains NaN/Inf - data corruption!
[APR-LOAD] Token 0 embedding sample: [0.0314, -10544936662718054851787026824429568.0000, -10707835954547578756780727681941504.0000, 0.0000, 0.0000]
[APR-LOAD] Embedding loaded: 7077889 elements (vocab=49152 x hidden=576)
error: Inference failed: Format error: [F-LAYOUT-CONTRACT-001] Tensor 'token_embedding': Shape mismatch: got 7077889 elements, expected 28311552 (49152x576)
Realizar panics at realizar/src/apr_transformer/mod.rs:2079:
range end index 576 out of range for slice of length 240
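The two conditions reported in the log (non-finite values and an element-count mismatch) can be checked independently of the loader. The following is a minimal sketch only; `check_embedding` is a hypothetical helper written for illustration and is not part of apr or realizar.

```python
import math


def check_embedding(values, vocab, hidden):
    """Return a list of problems found in a flat embedding buffer.

    Hypothetical helper: mirrors the two failures seen in the log,
    not the actual contract implementation.
    """
    problems = []
    expected = vocab * hidden
    # Layout check: a [vocab, hidden] embedding must hold vocab * hidden values.
    if len(values) != expected:
        problems.append(
            f"shape mismatch: got {len(values)} elements, "
            f"expected {expected} ({vocab}x{hidden})"
        )
    # Data-quality check: every value must be a finite float.
    non_finite = sum(1 for v in values if not math.isfinite(v))
    if non_finite:
        problems.append(f"{non_finite} non-finite values (NaN/Inf)")
    return problems


if __name__ == "__main__":
    # A tiny 2x3 embedding that is too short and contains a NaN.
    for problem in check_embedding([0.0, float("nan")], vocab=2, hidden=3):
        print(problem)
```

Running a check like this on the tensor right after import, before inference, would separate a bad file from a loader bug.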
Root Cause Hypothesis
Int8 quantization does not handle the embedding tensor correctly during apr import. The stored element count (7,077,889) is about one quarter of the expected count (28,311,552); for both affected models it is exactly expected/4 + 1. This suggests the quantized buffer, which holds one byte per value, is being counted or read as 4-byte f32 elements without accounting for the 4:1 size ratio. The NaN/Inf values point the same way: int8 bytes reinterpreted as IEEE 754 floats yield arbitrary bit patterns, including non-finite ones.
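The arithmetic behind this hypothesis can be sketched in a few lines. The snippet assumes, purely for illustration, that the quantized payload is one int8 byte per value plus a 4-byte header (for example a single f32 scale). That header is a guess that happens to reproduce the observed counts; it is not confirmed by the apr source. The function name is hypothetical.

```python
import math
import struct


def reinterpreted_f32_count(num_int8_values, header_bytes=4):
    """f32 elements seen if an int8 payload plus a header is read as f32.

    header_bytes=4 is an assumption (e.g. one f32 scale), not a known
    detail of the APR format.
    """
    return (num_int8_values + header_bytes) // 4


# (vocab, hidden, observed element count) from the Affected Models table.
cases = {
    "SmolLM-135M": (49152, 576, 7_077_889),
    "Qwen2-0.5B": (151936, 896, 34_033_665),
}
for name, (vocab, hidden, got) in cases.items():
    expected = vocab * hidden
    predicted = reinterpreted_f32_count(expected)
    print(f"{name}: expected={expected} predicted={predicted} match={predicted == got}")

# Reading int8 bytes as little-endian f32 can produce non-finite values.
int8_values = [0, 0, -128, 127, 0, 0, -1, 127]
raw = struct.pack("<8b", *int8_values)   # 8 signed bytes
as_f32 = struct.unpack("<2f", raw)       # the same 8 bytes viewed as 2 floats
print(as_f32, math.isinf(as_f32[0]), math.isnan(as_f32[1]))
```

Under this assumption the predicted count matches the observed count for both models. The second half shows how ordinary int8 values become Inf and NaN once their bytes are viewed as f32.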
Reproduction
cd tiny-model-ground-truth
apr pull hf://HuggingFaceTB/SmolLM-135M
apr import hf://HuggingFaceTB/SmolLM-135M --quantize int8 -o models/smollm-135m-int8.apr
apr run models/smollm-135m-int8.apr -p "Hello" -n 32 --json
# → F-LAYOUT-CONTRACT-001 shape mismatch
Environment
- apr v0.2.16 (f39b7df)
- Oracle: transformers 5.1.0, torch 2.10.0, float32, CPU, greedy
- Platform: Linux x86_64
Contract Reference
contracts/tensor-layout-v1.yaml rule F-LAYOUT-CONTRACT-001
contracts/tensor-layout-v1.yaml rule F-DATA-QUALITY-001