Int8 quantization corrupts embedding tensors (NaN/Inf + shape mismatch) #231

@noahgift

Bug Report

Source: tiny-model-ground-truth parity checker (0/59 passing)
Severity: Critical — blocks ALL Int8 inference for LLaMA-style and Qwen architectures

Description

apr import --quantize int8 produces APR files where the embedding tensor (model.embed_tokens.weight) contains NaN/Inf values and has the wrong element count. Inference fails with F-LAYOUT-CONTRACT-001.

Affected Models

| Model | Architecture | Vocab | Hidden | Expected Elements | Got Elements |
|---|---|---|---|---|---|
| SmolLM-135M | LLaMA | 49152 | 576 | 28,311,552 | 7,077,889 |
| Qwen2-0.5B | Qwen/GQA | 151936 | 896 | 136,134,656 | 34,033,665 |
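
As a quick sanity check on the numbers in the table, the following Python sketch recomputes the expected element count (vocab x hidden) and compares the reported count against one quarter of it. The `CASES` list and the `analyze` helper are illustrative names, not part of apr or realizar; the figures are copied from the table.

```python
# (model, vocab, hidden, element count reported by the loader), copied from the table.
CASES = [
    ("SmolLM-135M", 49152, 576, 7_077_889),
    ("Qwen2-0.5B", 151936, 896, 34_033_665),
]


def analyze(vocab, hidden, got):
    """Return (expected, got - expected // 4, expected % 4) for one model.

    Hypothetical helper for illustration only.
    """
    expected = vocab * hidden
    quarter, remainder = divmod(expected, 4)
    return expected, got - quarter, remainder


for name, vocab, hidden, got in CASES:
    expected, surplus, remainder = analyze(vocab, hidden, got)
    print(f"{name}: expected={expected:,} got={got:,} "
          f"got - expected/4 = {surplus} (expected % 4 = {remainder})")
```

For both models the reported count comes out as exactly expected/4 plus one element, which narrows the discrepancy to a consistent pattern rather than a model-specific one.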

Error Output

[APR-LOAD] Embedding tensor 'model.embed_tokens.weight': dims=[49152, 576], expected [vocab=49152, hidden=576]
[APR-LOAD] Embedding dims=[49152, 576], using raw data (no transpose needed)
[APR-LOAD] ERROR: Token 0 embedding contains NaN/Inf - data corruption!
[APR-LOAD] Token 0 embedding sample: [0.0314, -10544936662718054851787026824429568.0000, -10707835954547578756780727681941504.0000, 0.0000, 0.0000]
[APR-LOAD] Embedding loaded: 7077889 elements (vocab=49152 x hidden=576)

error: Inference failed: Format error: [F-LAYOUT-CONTRACT-001] Tensor 'token_embedding': Shape mismatch: got 7077889 elements, expected 28311552 (49152x576)

Realizar panics at realizar/src/apr_transformer/mod.rs:2079:

range end index 576 out of range for slice of length 240

Root Cause Hypothesis

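Int8 quantization does not handle the embedding tensor correctly during apr import. In both models the reported element count is exactly one quarter of the expected count plus one (28,311,552 / 4 = 7,077,888 vs. 7,077,889 reported; 136,134,656 / 4 = 34,033,664 vs. 34,033,665 reported). This is consistent with a payload of one byte per int8 element being sized and read as if it were 4-byte f32 data, so the loader sees roughly a quarter of the elements. The one extra element may correspond to 4 bytes of quantization metadata such as a single f32 scale, but that is unconfirmed. The NaN/Inf values point the same way: raw int8 bytes reinterpreted as IEEE 754 floats yield arbitrary bit patterns, including NaN, Inf, and extreme magnitudes like those in the Token 0 sample.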
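
The hypothesized failure mode can be illustrated with a minimal Python sketch that takes a buffer of one-byte values and reads it as little-endian f32. This is not apr's or realizar's actual code path; `reinterpret_as_f32` and the synthetic `row` are assumptions made up for the illustration, and the byte pattern is arbitrary rather than real quantized weights.

```python
import math
import struct


def reinterpret_as_f32(payload: bytes):
    """Read raw bytes as little-endian f32 values (hypothetical helper).

    Every 4 input bytes become one float, so the element count shrinks 4:1.
    """
    usable = len(payload) - len(payload) % 4
    count = usable // 4
    return list(struct.unpack(f"<{count}f", payload[:usable]))


hidden = 576
# Synthetic stand-in for one int8 embedding row: one byte per element.
row = bytes((i * 37 + 11) % 256 for i in range(hidden))

floats = reinterpret_as_f32(row)
finite = [abs(x) for x in floats if math.isfinite(x)]

print(len(row), "int8 values read as", len(floats), "f32 values")
print("largest finite magnitude:", max(finite, default=0.0))
print("non-finite values:", len(floats) - len(finite))

# Specific byte patterns decode to NaN or Inf under IEEE 754.
print(reinterpret_as_f32(b"\xff\xff\xff\xff"))  # all-ones bits is a NaN encoding
print(reinterpret_as_f32(b"\x00\x00\x80\x7f"))  # 0x7F800000 is +Inf
```

A 576-byte row decodes to 144 floats, the same 4:1 shrinkage seen in the element counts, and any group of four bytes whose exponent bits are all ones decodes to NaN or Inf.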

Reproduction

cd tiny-model-ground-truth
apr pull hf://HuggingFaceTB/SmolLM-135M
apr import hf://HuggingFaceTB/SmolLM-135M --quantize int8 -o models/smollm-135m-int8.apr
apr run models/smollm-135m-int8.apr -p "Hello" -n 32 --json
# → F-LAYOUT-CONTRACT-001 shape mismatch

Environment

  • apr v0.2.16 (f39b7df)
  • Oracle: transformers 5.1.0, torch 2.10.0, float32, CPU, greedy
  • Platform: Linux x86_64

Contract Reference

  • contracts/tensor-layout-v1.yaml rule F-LAYOUT-CONTRACT-001
  • contracts/tensor-layout-v1.yaml rule F-DATA-QUALITY-001
