P0 CRITICAL: Format conversion introduces NaN/Inf corruption in tensor weights #177

@noahgift

P0 CRITICAL: Format Conversion Introduces NaN/Inf Corruption

Status: OPEN
Severity: P0 (CRITICAL - Data Corruption)
Component: apr-rosetta / realizear
Discovered By: apr-model-qa-playbook (Popperian Falsification)
Date: 2026-01-30
Blocking: Model qualification certification


Executive Summary

Format conversion via apr rosetta convert introduces catastrophic numerical corruption: NaN values, Inf values, and exploding tensor weights (per-tensor means on the order of 10^35 in the validation log, against an expected range of [-0.1, 0.1]). This affects ALL conversion paths and renders converted models unusable. It is a data integrity violation that blocks model certification.


Reproduction

Environment

Host: noah-Lambda-Vector
OS: Linux
apr version: (current main)
Model: Qwen/Qwen2.5-Coder-1.5B-Instruct
Source: /home/noah/.cache/pacha/models/c8490f8cd005ac4e.gguf (1.04 GB GGUF Q4_K_M)

Minimal Reproduction Commands

# 1. Direct conversion GGUF → APR (FAILS)
apr rosetta convert model.gguf model.apr
apr run model.apr "What is 2+2?" -n 32
# Result: Output differs from source by 84.6% (diff: 8.46e-1, ε: 1.00e-6)

# 2. Direct conversion APR → GGUF (FAILS)
apr rosetta convert model.apr model_back.gguf
apr run model_back.gguf "What is 2+2?" -n 32
# Result: Output differs from source by 63.4%

# 3. Round-trip GGUF → APR → SafeTensors → GGUF (CATASTROPHIC FAILURE)
apr rosetta convert model.gguf model.apr
apr rosetta convert model.apr model.safetensors
apr rosetta convert model.safetensors model_final.gguf
# Result: Validation fails with 75 errors - NaN/Inf values in ALL layers

Expected Behavior

  • Converted model produces identical output to source (within ε = 1e-6)
  • Round-trip conversion preserves bitwise identity for quantized tensors
  • No NaN, Inf, or numerical overflow introduced
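The first expectation can be sketched as an element-wise tolerance check. This is a minimal illustration, assuming the compared outputs are available as `f32` slices (for example logits); how the playbook actually computes its `diff` value is not stated in this issue, and the names `max_abs_diff` and `outputs_match` are hypothetical.

```rust
/// Largest absolute element-wise difference between two output vectors.
/// Returns `None` if the lengths differ or any difference is NaN, so a
/// corrupted output can never be reported as "within tolerance".
fn max_abs_diff(source: &[f32], converted: &[f32]) -> Option<f32> {
    if source.len() != converted.len() {
        return None;
    }
    let mut worst = 0.0_f32;
    for (a, b) in source.iter().zip(converted) {
        let d = (a - b).abs();
        if d.is_nan() {
            return None;
        }
        worst = worst.max(d);
    }
    Some(worst)
}

/// True only when every element agrees within `epsilon` (1e-6 in this issue).
fn outputs_match(source: &[f32], converted: &[f32], epsilon: f32) -> bool {
    matches!(max_abs_diff(source, converted), Some(d) if d <= epsilon)
}
```

Treating NaN as an automatic mismatch matters here: a plain `d <= epsilon` comparison is false for NaN, but a `d > epsilon` failure check would also be false and would let corruption through.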

Actual Behavior

  • Conversion introduces 84.6% output difference (GGUF → APR)
  • Round-trip introduces massive NaN corruption in tensor weights
  • Tensor means explode to roughly 10^35 (expected range: [-0.1, 0.1])

Detailed Evidence

Test 1: F-CONV-G-A (GGUF → APR)

{
  "gate_id": "F-CONV-G-A",
  "outcome": "Falsified",
  "reason": "Conversion Gguf → Apr produced different output (diff: 8.46e-1, ε: 1.00e-6)",
  "output_hash": "951de74e85c8f75d",
  "timestamp": "2026-01-30T13:03:35.131895019Z"
}

Test 2: F-CONV-A-G (APR → GGUF)

{
  "gate_id": "F-CONV-A-G",
  "outcome": "Falsified",
  "reason": "Conversion Apr → Gguf produced different output (diff: 6.34e-1, ε: 1.00e-6)",
  "output_hash": "95f05bc3d19b1b6b",
  "timestamp": "2026-01-30T13:03:47.643921413Z"
}

Test 3: F-CONV-G-S (GGUF → SafeTensors)

{
  "gate_id": "F-CONV-G-S",
  "outcome": "Falsified",
  "reason": "Conversion infrastructure error: No tokenizer found for converted.safetensors",
  "error": "[PMAT-172] ERROR: No tokenizer found... config.json not found (required for SafeTensors inference)",
  "timestamp": "2026-01-30T13:03:59.608739501Z"
}

Test 4: F-CONV-RT-001 (Round-Trip) - CATASTROPHIC

Round-trip failed: Validation failed (75 errors)

SAMPLE OF 75 TENSOR CORRUPTION ERRORS:

Layer blk.0.attn_k.weight:
  - mean = 342663942034581145229192610889859072.0000 (expected: [-0.1, 0.1])
  - contains 322 NaN values

Layer blk.0.attn_output.weight:
  - mean = 600104090742549741985281036230066176.0000
  - contains 1880 NaN values

Layer blk.0.attn_q.weight:
  - mean = 358825298765053356133211673599148032.0000
  - contains 2194 NaN values

Layer blk.0.ffn_gate.weight:
  - mean = 248127115888904366664980114610061312.0000
  - contains 12864 NaN values

Layer blk.1.ffn_down.weight:
  - mean = 165087563774647571714992342227222528.0000
  - contains 7387 NaN values
  - contains 1 Inf values  <-- INFINITY INTRODUCED

[... 65 more tensor corruption errors across ALL 28 layers ...]
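For reference, the per-tensor figures in the log above (mean, NaN count, Inf count) could be gathered with something like the following. This is a sketch only: the real validator's code is not shown in this issue. Because the log reports a finite mean alongside NaN counts, the sketch assumes the mean is taken over finite values only. `TensorStats`, `tensor_stats`, and `in_expected_range` are hypothetical names.

```rust
/// Summary statistics in the style of the validation log.
#[derive(Debug, PartialEq)]
struct TensorStats {
    /// Mean over finite values only (assumption, see lead-in).
    mean: f64,
    nan_count: usize,
    inf_count: usize,
}

/// Single pass over the tensor values, accumulating in f64 so that
/// summing many very large f32 values does not itself overflow.
fn tensor_stats(values: &[f32]) -> TensorStats {
    let mut sum = 0.0_f64;
    let mut finite = 0_usize;
    let mut nan_count = 0;
    let mut inf_count = 0;
    for &v in values {
        if v.is_nan() {
            nan_count += 1;
        } else if v.is_infinite() {
            inf_count += 1;
        } else {
            sum += f64::from(v);
            finite += 1;
        }
    }
    let mean = if finite > 0 { sum / finite as f64 } else { 0.0 };
    TensorStats { mean, nan_count, inf_count }
}

/// Healthy tensor: no NaN, no Inf, mean inside the expected [-0.1, 0.1].
fn in_expected_range(stats: &TensorStats) -> bool {
    stats.nan_count == 0 && stats.inf_count == 0 && stats.mean.abs() <= 0.1
}
```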

Five Whys Root Cause Analysis

| Why # | Question | Answer |
|-------|----------|--------|
| Why 1 | Why do converted models produce different output? | Because tensor weights are corrupted during conversion |
| Why 2 | Why are tensor weights corrupted? | Because NaN and Inf values are introduced in dequantization/requantization |
| Why 3 | Why are NaN/Inf values introduced? | Likely integer overflow or division by zero in quantization scaling factors |
| Why 4 | Why does scaling overflow? | Q4_K_M uses block-wise scaling; conversion may not preserve scale bounds |
| Why 5 | Why aren't scale bounds preserved? | ROOT CAUSE: Quantization metadata (scales, mins, block structure) not correctly transferred between formats |
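To make Why 3 concrete, here is a small sketch of how an unguarded scale division produces NaN. It uses a simplified symmetric 8-bit scaling rather than the actual Q4_K_M code path, and both function names are hypothetical. Whether the converter contains a division like this has not been confirmed.

```rust
/// Naive scaling: divide every value by the block scale with no guard.
/// For an all-zero block the scale is 0.0, and 0.0 / 0.0 is NaN.
fn naive_normalize(block: &[f32]) -> Vec<f32> {
    let amax = block.iter().fold(0.0_f32, |m, v| m.max(v.abs()));
    let scale = amax / 127.0;
    block.iter().map(|v| v / scale).collect()
}

/// Guarded variant: fall back to an inverse scale of 0.0 when the block
/// has no non-zero magnitude, so an all-zero block stays all zero.
fn guarded_normalize(block: &[f32]) -> Vec<f32> {
    let amax = block.iter().fold(0.0_f32, |m, v| m.max(v.abs()));
    let scale = amax / 127.0;
    let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
    block.iter().map(|v| v * inv).collect()
}
```

An all-zero block is not exotic in real weights (padding rows, pruned channels), so a single missing guard is enough to seed NaN values that then spread through later arithmetic.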

Hypothesis

The GGUF Q4_K_M format stores quantization parameters (scales, minimums) in a specific block structure. When converting to APR format, these parameters are either:

  1. Lost entirely (causing dequantization to fail)
  2. Misinterpreted (causing incorrect scaling)
  3. Truncated (causing overflow on large values)
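As an illustration of case 2, the sketch below shows what happens when packed bytes are read as if they were plain little-endian F32 data. In ggml's Q4_K layout a 256-weight super-block packs two f16 values, 12 bytes of 6-bit scales and mins, and 128 bytes of 4-bit quants, so none of those bytes are meant to be read as `f32`. Arbitrary bit patterns read that way give huge magnitudes with scattered NaN and the occasional Inf, which resembles the log above. This is not a confirmed diagnosis, and `reinterpret_as_f32` is a hypothetical helper.

```rust
/// Reinterpret a raw byte buffer as little-endian f32 values, the way a
/// reader would if it mistook quantized block data for an F32 tensor.
/// Trailing bytes that do not fill a whole f32 are ignored.
fn reinterpret_as_f32(raw: &[u8]) -> Vec<f32> {
    raw.chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

For example, the bytes `FF FF FF 7F` decode to NaN, `00 00 80 7F` to +Inf, and `12 34 56 7B` to a finite value above 10^35.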

Impact Assessment

Severity: P0 CRITICAL

| Impact | Description |
|--------|-------------|
| Data Integrity | Converted models produce corrupted output |
| Silent Corruption | Users may not realize output is wrong without validation |
| Certification Blocked | Models cannot pass MQS qualification (89.3% → should be 100%) |
| Trust Violation | "Zero defect" philosophy violated - passing defects downstream |

Affected Gates (All P0)

  • F-CONV-001: GGUF → APR ❌
  • F-CONV-002: APR → GGUF ❌
  • F-CONV-003: GGUF → SafeTensors ❌
  • F-CONV-004: SafeTensors → GGUF ❌
  • F-CONV-005: APR → SafeTensors ❌
  • F-CONV-006: SafeTensors → APR ❌
  • F-CONV-RT-001: Round-trip ❌
  • F-CONV-BE-001: Backend equivalence ❌

MQS Impact

Current Score:  41.1/100 (Grade F) - NOT QUALIFIED
Expected Score: 85+/100 (Grade B) - QUALIFIED
Lost Points:    ~44 points from conversion failures

Suggested Fix

Immediate (P0 Hotfix)

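  1. Add tensor validation after every conversion step:

    // Reject any tensor that picked up NaN, Inf, or an implausible mean
    // during conversion. `Tensor` and `ConversionError` are the converter's
    // own types and are not defined in this snippet.
    fn validate_tensor_post_conversion(tensor: &Tensor) -> Result<(), ConversionError> {
        if tensor.has_nan() {
            return Err(ConversionError::NaNIntroduced);
        }
        if tensor.has_inf() {
            return Err(ConversionError::InfIntroduced);
        }
        // Healthy weights are expected in [-0.1, 0.1]; 100.0 is a coarse
        // guard that only trips on outright numerical explosion.
        if tensor.mean().abs() > 100.0 {
            return Err(ConversionError::NumericalExplosion);
        }
        Ok(())
    }

  2. Fail fast on corruption - do not write corrupted files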
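Point 2 could look roughly like the following: validate first, write to a temporary sibling file, and only rename into place on success. This is a sketch under assumptions. It treats the tensor as a flat `f32` slice written as raw little-endian bytes, which is not the APR, GGUF, or SafeTensors container format, and `write_validated` is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write `values` to `dest` only if every value is finite.
/// Data goes to a temporary sibling file first and is renamed into place,
/// so a failed validation or an interrupted write never leaves a
/// corrupted file at `dest`.
fn write_validated(dest: &Path, values: &[f32]) -> io::Result<()> {
    if let Some(i) = values.iter().position(|v| !v.is_finite()) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("non-finite value at index {i}; refusing to write {}", dest.display()),
        ));
    }
    let bytes: Vec<u8> = values.iter().flat_map(|v| v.to_le_bytes()).collect();
    let tmp = dest.with_extension("tmp");
    fs::write(&tmp, &bytes)?;
    fs::rename(&tmp, dest)
}
```

The validation runs before any bytes reach disk, so the error path has nothing to clean up.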

Short-term

  1. Audit quantization metadata transfer in realizear/src/convert/
  2. Add property-based tests for round-trip conversion
  3. Test all quantization types (Q4_K_M, Q5_K_M, Q8_0, F16, F32)
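A property-based round-trip test (item 2) might take the shape below. To stay free of external crates, the sketch uses a small hand-rolled generator instead of a property-testing library. The block format is a simplified Q8_0-style layout with an `f32` scale (real Q8_0 stores the scale as f16). All names are hypothetical, and the real tests would exercise the converter's own quantization code.

```rust
/// Tiny deterministic generator (an LCG) so the sketch needs no crate.
struct Lcg(u64);

impl Lcg {
    /// Next pseudo-random weight, uniform in [-0.1, 0.1).
    fn next_weight(&mut self) -> f32 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        let unit = (self.0 >> 40) as f32 / (1u64 << 24) as f32; // in [0, 1)
        (unit - 0.5) * 0.2
    }
}

/// Simplified Q8_0-style block: one scale plus 32 signed 8-bit quants.
struct Block {
    scale: f32,
    quants: [i8; 32],
}

fn quantize(x: &[f32; 32]) -> Block {
    let amax = x.iter().fold(0.0_f32, |m, v| m.max(v.abs()));
    let scale = amax / 127.0;
    // Guard the all-zero block so no division by zero occurs.
    let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
    let mut quants = [0_i8; 32];
    for (q, v) in quants.iter_mut().zip(x) {
        *q = (v * inv).round().clamp(-127.0, 127.0) as i8;
    }
    Block { scale, quants }
}

fn dequantize(b: &Block) -> [f32; 32] {
    let mut out = [0.0_f32; 32];
    for (o, q) in out.iter_mut().zip(&b.quants) {
        *o = f32::from(*q) * b.scale;
    }
    out
}

/// Property: for `cases` random blocks, every round-tripped value is finite
/// and moves by at most about half a quantization step.
fn round_trip_property_holds(seed: u64, cases: usize) -> bool {
    let mut rng = Lcg(seed);
    for _ in 0..cases {
        let mut x = [0.0_f32; 32];
        for v in x.iter_mut() {
            *v = rng.next_weight();
        }
        let b = quantize(&x);
        let y = dequantize(&b);
        // Half a step, plus a small margin for floating-point rounding.
        let tol = b.scale * 0.5005 + f32::EPSILON;
        for (a, r) in x.iter().zip(&y) {
            if !r.is_finite() || (a - r).abs() > tol {
                return false;
            }
        }
    }
    true
}
```

The useful part is the shape of the property rather than the specific format: the output must be finite, and the error must be bounded by a quantity derived from the stored scale. Corrupted or mistransferred scales should break both.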

Long-term

  1. Implement checksum verification pre/post conversion
  2. Add --verify flag that runs inference comparison automatically
  3. Consider using GGUF as canonical intermediate format
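For item 1, one possible shape is a per-tensor checksum computed before and after conversion. The sketch below uses 64-bit FNV-1a only because it fits in a few lines of std-only Rust. It is not cryptographic, and a production check would more likely use SHA-256 or a similar digest. Hashing decoded `f32` values suits lossless paths such as F32 to F32 across containers; for quantized tensors that must stay bitwise identical, the raw block bytes would be hashed instead. Both function names are hypothetical.

```rust
/// 64-bit FNV-1a over a byte slice (offset basis and prime are the
/// standard FNV-1a 64-bit constants).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Checksum of a tensor's logical f32 values in little-endian byte order,
/// independent of which container format the tensor was read from.
fn tensor_checksum(values: &[f32]) -> u64 {
    let bytes: Vec<u8> = values.iter().flat_map(|v| v.to_le_bytes()).collect();
    fnv1a64(&bytes)
}
```

Comparing `tensor_checksum` for each tensor name before and after a conversion step would flag any changed bit, including a flipped sign.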

Verification

Once fixed, verify with:

# Run apr-qa-playbook verification
cd ../apr-model-qa-playbook
cargo run -p apr-qa-cli -- run playbooks/verify/TICKET-177.yaml --subprocess

# Expected output:
# F-CONV-001 through F-CONV-006: CORROBORATED
# F-CONV-RT-001: CORROBORATED
# MQS Score: 85+/100

References

  • Falsification Gate: F-CONV-001 through F-CONV-RT-001
  • Playbook: playbooks/models/qwen2.5-coder-1.5b-ci.playbook.yaml
  • Evidence File: output/qwen-full/evidence.json
  • Specification: docs/specifications/apr-playbook-spec.md Section 4 (Format Conversion Testing)
  • Related: #172, "[P0] Format Conversion Testing: Lossy Conversions and NaN Corruption Detected" (P0 Format Conversion NaN protection) - partially fixed, but a regression was detected

Appendix: Full Tensor Corruption Log

75-error validation log (truncated below):
blk.0.attn_k.weight: mean=342663942034581145229192610889859072.0000, 322 NaN
blk.0.attn_output.weight: mean=600104090742549741985281036230066176.0000, 1880 NaN
blk.0.attn_q.weight: mean=358825298765053356133211673599148032.0000, 2194 NaN
blk.0.attn_v.weight: mean=104987109912991073240180736066060288.0000, 173 NaN
blk.0.ffn_down.weight: mean=182500428367235730661723423562006528.0000, 7219 NaN
blk.0.ffn_gate.weight: mean=248127115888904366664980114610061312.0000, 12864 NaN
blk.0.ffn_up.weight: mean=227714970169135588388223435105370112.0000, 13612 NaN
blk.1.attn_k.weight: mean=817014914460021388316716866687991808.0000, 291 NaN
blk.1.attn_output.weight: mean=232121838638585256506020333482934272.0000, 2414 NaN
blk.1.attn_q.weight: mean=658610800790589613328103111479263232.0000, 1917 NaN
blk.1.attn_v.weight: mean=93900614097166921650294139165605888.0000, 224 NaN
blk.1.ffn_down.weight: mean=165087563774647571714992342227222528.0000, 7387 NaN, 1 Inf
blk.1.ffn_gate.weight: mean=350491169507934119065240395147378688.0000, 11249 NaN
blk.1.ffn_up.weight: mean=138885721043945784138165446621790208.0000, 14399 NaN
blk.10.attn_k.weight: mean=546944489382597570786647242189045760.0000, 332 NaN
blk.10.attn_output.weight: mean=428728821534445123910189859016278016.0000, 2136 NaN
blk.10.attn_q.weight: mean=384735443208418250985103538765430784.0000, 2094 NaN
blk.10.attn_v.weight: mean=56025798981149455007586636760350720.0000, 235 NaN
blk.10.ffn_down.weight: mean=173893259055051710120320394271391744.0000, 7033 NaN
blk.10.ffn_gate.weight: mean=291361052663150758782063519324962816.0000, 12879 NaN
blk.10.ffn_up.weight: mean=159597864201074824329220080294428672.0000, 14240 NaN
blk.11.attn_k.weight: mean=512004711257481969379219173002969088.0000, 321 NaN
blk.11.attn_output.weight: mean=347848197234620775027457362508120064.0000, 2195 NaN
blk.11.attn_q.weight: mean=375089850177200396336946327303749632.0000, 2151 NaN
blk.11.attn_v.weight: mean=94796645001121994176308324471930880.0000, 377 NaN
blk.11.ffn_down.weight: mean=208739567755441108175472479982059520.0000, 13508 NaN
blk.11.ffn_gate.weight: mean=308981455427445033161120889037651968.0000, 12461 NaN
blk.11.ffn_up.weight: mean=174119356423826791973727970319663104.0000, 13798 NaN
blk.12.attn_k.weight: mean=493066407430884793442546394834403328.0000, 293 NaN
blk.12.attn_output.weight: mean=346576466384023061012574591789301760.0000, 1977 NaN
blk.12.attn_q.weight: mean=325108011187532853454851987566755840.0000, 2189 NaN
blk.12.attn_v.weight: mean=272807144074032164845310218017439744.0000, 437 NaN
blk.12.ffn_down.weight: mean=252127306200168315937909351890550784.0000, 12964 NaN
blk.12.ffn_gate.weight: mean=304284512847990014737309627871920128.0000, 12356 NaN
blk.12.ffn_up.weight: mean=164185155003610100909801876632895488.0000, 13808 NaN
blk.13.attn_k.weight: mean=379946378082999771702755384371445760.0000, 304 NaN
blk.13.attn_output.weight: mean=170910596034958457735105041945067520.0000, 2280 NaN
[... additional layers truncated for brevity ...]

Filed by: apr-model-qa-playbook automated falsification system
Ticket Template Version: 1.1.0
