P0 CRITICAL: Format conversion introduces NaN/Inf corruption in tensor weights

# P0 CRITICAL: Format Conversion Introduces NaN/Inf Corruption

**Status:** OPEN
**Severity:** P0 (CRITICAL - Data Corruption)
**Component:** apr-rosetta / realizear
**Discovered By:** apr-model-qa-playbook (Popperian Falsification)
**Date:** 2026-01-30
**Blocking:** Model qualification certification

---

## Executive Summary

Format conversion via `apr rosetta convert` introduces **catastrophic numerical corruption**: NaN values, Inf values, and exploding tensor weights (per-tensor means on the order of 10^35, against an expected range of [-0.1, 0.1]). Every conversion gate fails, and converted models are unusable. This is a **data integrity violation** that blocks model certification.

---

## Reproduction

### Environment
```
Host: noah-Lambda-Vector
OS: Linux
apr version: (current main)
Model: Qwen/Qwen2.5-Coder-1.5B-Instruct
Source: /home/noah/.cache/pacha/models/c8490f8cd005ac4e.gguf (1.04 GB GGUF Q4_K_M)
```

### Minimal Reproduction Commands

```bash
# 1. Direct conversion GGUF → APR (FAILS)
apr rosetta convert model.gguf model.apr
apr run model.apr "What is 2+2?" -n 32
# Result: Output differs from source by 84.6% (diff: 8.46e-1, ε: 1.00e-6)

# 2. Direct conversion APR → GGUF (FAILS)
apr rosetta convert model.apr model_back.gguf
apr run model_back.gguf "What is 2+2?" -n 32
# Result: Output differs from source by 63.4% (diff: 6.34e-1, ε: 1.00e-6)

# 3. Round-trip GGUF → APR → SafeTensors → GGUF (CATASTROPHIC FAILURE)
apr rosetta convert model.gguf model.apr
apr rosetta convert model.apr model.safetensors
apr rosetta convert model.safetensors model_final.gguf
# Result: Validation fails with 75 errors - NaN/Inf values in ALL layers
```

### Expected Behavior
- Converted model produces **identical output** to source (within ε = 1e-6)
- Round-trip conversion preserves **bitwise identity** for quantized tensors
- No NaN, Inf, or numerical overflow introduced

### Actual Behavior
- Conversion introduces **84.6% output difference** (GGUF → APR)
- Round-trip introduces **massive NaN corruption** in tensor weights
- Tensor means explode to roughly **10^35** (e.g. 3.4e35 for `blk.0.attn_k.weight`; expected range: [-0.1, 0.1])

---

## Detailed Evidence

### Test 1: F-CONV-G-A (GGUF → APR)
```json
{
  "gate_id": "F-CONV-G-A",
  "outcome": "Falsified",
  "reason": "Conversion Gguf → Apr produced different output (diff: 8.46e-1, ε: 1.00e-6)",
  "output_hash": "951de74e85c8f75d",
  "timestamp": "2026-01-30T13:03:35.131895019Z"
}
```

### Test 2: F-CONV-A-G (APR → GGUF)
```json
{
  "gate_id": "F-CONV-A-G",
  "outcome": "Falsified",
  "reason": "Conversion Apr → Gguf produced different output (diff: 6.34e-1, ε: 1.00e-6)",
  "output_hash": "95f05bc3d19b1b6b",
  "timestamp": "2026-01-30T13:03:47.643921413Z"
}
```

### Test 3: F-CONV-G-S (GGUF → SafeTensors)
```json
{
  "gate_id": "F-CONV-G-S",
  "outcome": "Falsified",
  "reason": "Conversion infrastructure error: No tokenizer found for converted.safetensors",
  "error": "[PMAT-172] ERROR: No tokenizer found... config.json not found (required for SafeTensors inference)",
  "timestamp": "2026-01-30T13:03:59.608739501Z"
}
```

### Test 4: F-CONV-RT-001 (Round-Trip) - CATASTROPHIC
```
Round-trip failed: Validation failed (75 errors)

SAMPLE OF 75 TENSOR CORRUPTION ERRORS:

Layer blk.0.attn_k.weight:
  - mean = 342663942034581145229192610889859072.0000 (expected: [-0.1, 0.1])
  - contains 322 NaN values

Layer blk.0.attn_output.weight:
  - mean = 600104090742549741985281036230066176.0000
  - contains 1880 NaN values

Layer blk.0.attn_q.weight:
  - mean = 358825298765053356133211673599148032.0000
  - contains 2194 NaN values

Layer blk.0.ffn_gate.weight:
  - mean = 248127115888904366664980114610061312.0000
  - contains 12864 NaN values

Layer blk.1.ffn_down.weight:
  - mean = 165087563774647571714992342227222528.0000
  - contains 7387 NaN values
  - contains 1 Inf values  <-- INFINITY INTRODUCED

[... 65 more tensor corruption errors across ALL 28 layers ...]
```
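
The validation log reports a finite mean next to non-zero NaN counts. A plain arithmetic mean over values that include a NaN is itself NaN, so the validator presumably averages finite values only (an assumption; the validator's source is not shown in this ticket). A minimal sketch of such a single-pass summary, using the hypothetical names `TensorStats` and `tensor_stats`:

```rust
/// Summary statistics for one tensor's f32 values.
/// Hypothetical type for illustration; not taken from the apr codebase.
#[derive(Debug, PartialEq)]
pub struct TensorStats {
    pub nan_count: usize,
    pub inf_count: usize,
    /// Mean over finite values only; `None` when no finite value exists.
    pub finite_mean: Option<f64>,
}

/// Counts NaN and Inf values and averages the remaining finite values.
pub fn tensor_stats(values: &[f32]) -> TensorStats {
    let mut nan_count = 0usize;
    let mut inf_count = 0usize;
    let mut finite_count = 0usize;
    // Accumulate in f64 so sums of very large f32 values do not overflow.
    let mut sum = 0.0f64;
    for &v in values {
        if v.is_nan() {
            nan_count += 1;
        } else if v.is_infinite() {
            inf_count += 1;
        } else {
            sum += f64::from(v);
            finite_count += 1;
        }
    }
    let finite_mean = if finite_count > 0 {
        Some(sum / finite_count as f64)
    } else {
        None
    };
    TensorStats { nan_count, inf_count, finite_mean }
}
```

Reporting the three numbers separately, as the log above does, keeps a single NaN from hiding the magnitude of the remaining weights.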

---

## Five Whys Root Cause Analysis

| Why # | Question | Answer |
|-------|----------|--------|
| **Why 1** | Why do converted models produce different output? | Because tensor weights are corrupted during conversion |
| **Why 2** | Why are tensor weights corrupted? | Because NaN and Inf values are introduced in dequantization/requantization |
| **Why 3** | Why are NaN/Inf values introduced? | Likely integer overflow or division by zero in quantization scaling factors |
| **Why 4** | Why does scaling overflow? | Q4_K_M uses block-wise scaling; conversion may not preserve scale bounds |
| **Why 5** | Why aren't scale bounds preserved? | **ROOT CAUSE:** Quantization metadata (scales, mins, block structure) not correctly transferred between formats |

### Hypothesis
The GGUF Q4_K_M format stores quantization parameters (scales, minimums) in a specific block structure. During conversion to another format, these parameters may be:
1. Lost entirely (causing dequantization to fail)
2. Misinterpreted (causing incorrect scaling)
3. Truncated (causing overflow on large values)
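
To make these failure modes concrete, here is a deliberately simplified sketch of block-wise 4-bit dequantization. It is not the real Q4_K layout (which groups 256 weights into a super-block with f16 scale factors and packed 6-bit sub-block scales and minimums); `Block4` and `dequantize_block` are hypothetical names used only for illustration. The point is that every weight in a block is multiplied by the stored scale, so one wrong scale corrupts the whole block:

```rust
/// Simplified 4-bit block: 32 weights share one scale and one minimum.
/// Illustrative only; this is NOT the on-disk Q4_K structure.
pub struct Block4 {
    pub scale: f32,
    pub min: f32,
    /// Two 4-bit quants per byte (low nibble first, an assumed ordering).
    pub quants: [u8; 16],
}

/// Reconstructs weights as `scale * q - min` for each 4-bit quant `q`.
pub fn dequantize_block(block: &Block4) -> [f32; 32] {
    let mut out = [0.0f32; 32];
    for (i, &byte) in block.quants.iter().enumerate() {
        let lo = f32::from(byte & 0x0F);
        let hi = f32::from(byte >> 4);
        // Low nibbles fill the first half of the block, high nibbles the second.
        out[i] = block.scale * lo - block.min;
        out[i + 16] = block.scale * hi - block.min;
    }
    out
}
```

With a sane scale the outputs stay small. A scale whose bit pattern is `0x7A00_0000` (about 1.66e35 as an f32) pushes every non-zero quant in the block to that magnitude, and a NaN scale turns every weight in the block into NaN.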

---

## Impact Assessment

### Severity: P0 CRITICAL

| Impact | Description |
|--------|-------------|
| **Data Integrity** | Converted models produce **corrupted output** |
| **Silent Corruption** | Users may not realize output is wrong without validation |
| **Certification Blocked** | Models cannot pass MQS qualification (89.3% → should be 100%) |
| **Trust Violation** | "Zero defect" philosophy violated - passing defects downstream |

### Affected Gates (All P0)
- F-CONV-001: GGUF → APR ❌
- F-CONV-002: APR → GGUF ❌
- F-CONV-003: GGUF → SafeTensors ❌
- F-CONV-004: SafeTensors → GGUF ❌
- F-CONV-005: APR → SafeTensors ❌
- F-CONV-006: SafeTensors → APR ❌
- F-CONV-RT-001: Round-trip ❌
- F-CONV-BE-001: Backend equivalence ❌

### MQS Impact
```
Current Score:  41.1/100 (Grade F) - NOT QUALIFIED
Expected Score: 85+/100 (Grade B) - QUALIFIED
Lost Points:    ~44 points from conversion failures
```

---

## Suggested Fix

### Immediate (P0 Hotfix)
1. Add tensor validation **after every conversion step**:
   ```rust
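   /// Rejects tensors that a conversion step has corrupted.
   /// `Tensor` and `ConversionError` are assumed to be defined by the converter.
   fn validate_tensor_post_conversion(tensor: &Tensor) -> Result<(), ConversionError> {
       // Check NaN and Inf first: they would also poison the mean below.
       if tensor.has_nan() {
           return Err(ConversionError::NaNIntroduced);
       }
       if tensor.has_inf() {
           return Err(ConversionError::InfIntroduced);
       }
       // Healthy means sit in [-0.1, 0.1]; 100.0 is a loose explosion threshold.
       if tensor.mean().abs() > 100.0 {
           return Err(ConversionError::NumericalExplosion);
       }
       Ok(())
   }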
   ```

2. Fail fast on corruption - do not write corrupted files
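
One way to guarantee that a corrupted file never appears at the destination is to validate first, write to a temporary path, and rename only on success. The sketch below assumes a flat little-endian f32 payload and uses the hypothetical names `check_finite` and `write_validated`; the real converter writes structured formats, so treat this as an outline of the control flow rather than its implementation:

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Returns an error naming the first NaN or infinite value, if any.
fn check_finite(values: &[f32]) -> io::Result<()> {
    match values.iter().position(|v| !v.is_finite()) {
        Some(i) => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("non-finite value at index {i}"),
        )),
        None => Ok(()),
    }
}

/// Validates `values`, writes them to a sibling `.tmp` file, then renames it
/// onto `dest`. If validation fails, nothing is written at all.
pub fn write_validated(dest: &Path, values: &[f32]) -> io::Result<()> {
    check_finite(values)?;
    let tmp = dest.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        for v in values {
            file.write_all(&v.to_le_bytes())?;
        }
        // Flush to disk before the rename makes the file visible at `dest`.
        file.sync_all()?;
    }
    fs::rename(&tmp, dest)
}
```

Because validation happens before any file is created, a failed conversion leaves an existing `dest` untouched.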

### Short-term
1. Audit quantization metadata transfer in `realizear/src/convert/`
2. Add property-based tests for round-trip conversion
3. Test all quantization types (Q4_K_M, Q5_K_M, Q8_0, F16, F32)
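
A round-trip property can be stated without any external crate. The sketch below uses a Q8_0-style quantizer (32 values per block, scale = max |x| / 127) with two stated simplifications: the scale is kept as f32 rather than f16, and inputs come from a small hand-rolled generator instead of a property-testing library. All function names are hypothetical. It also shows the zero-scale guard whose absence would produce the kind of division by zero suspected in Why 3:

```rust
pub const BLOCK: usize = 32;

/// Quantizes one block to i8 with a shared scale.
pub fn quantize(values: &[f32; BLOCK]) -> (f32, [i8; BLOCK]) {
    let amax = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = amax / 127.0;
    // Guard the all-zero block: otherwise 1.0 / 0.0 = Inf and 0.0 * Inf = NaN.
    let inv = if scale != 0.0 { 1.0 / scale } else { 0.0 };
    let mut q = [0i8; BLOCK];
    for (dst, &v) in q.iter_mut().zip(values) {
        *dst = (v * inv).round() as i8;
    }
    (scale, q)
}

/// Reconstructs f32 values from the quants and their shared scale.
pub fn dequantize(scale: f32, q: &[i8; BLOCK]) -> [f32; BLOCK] {
    let mut out = [0.0f32; BLOCK];
    for (dst, &v) in out.iter_mut().zip(q) {
        *dst = scale * f32::from(v);
    }
    out
}

/// Deterministic LCG producing weights in [-0.1, 0.1).
pub fn pseudo_random_block(seed: &mut u64) -> [f32; BLOCK] {
    let mut out = [0.0f32; BLOCK];
    for v in out.iter_mut() {
        *seed = seed
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Top 24 bits mapped to [0, 1), then shifted and scaled.
        let unit = (*seed >> 40) as f32 / (1u64 << 24) as f32;
        *v = (unit - 0.5) * 0.2;
    }
    out
}

/// Property: the round trip stays finite, stays within half a quantization
/// step of the input, and requantizing yields the same quants.
pub fn check_round_trip(block: &[f32; BLOCK]) -> bool {
    let (scale, q) = quantize(block);
    let restored = dequantize(scale, &q);
    let (_, q2) = quantize(&restored);
    let finite = restored.iter().all(|v| v.is_finite());
    let close = block
        .iter()
        .zip(&restored)
        .all(|(a, b)| (a - b).abs() <= scale * 0.5 + f32::EPSILON);
    finite && close && q == q2
}
```

The third condition compares only the quants, not the scale: recomputing the scale from dequantized values can differ in the last bit, so a converter that needs bitwise identity should carry the original quantized bytes across formats instead of requantizing.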

### Long-term
1. Implement checksum verification pre/post conversion
2. Add `--verify` flag that runs inference comparison automatically
3. Consider using GGUF as canonical intermediate format
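
For the checksum item, a per-tensor hash of the raw quantized payload taken before and after a lossless conversion step would catch any byte-level change. The sketch below uses FNV-1a (64-bit) only because it is short and dependency-free; the choice of hash and the names `fnv1a64` and `payload_unchanged` are assumptions, and a cryptographic hash may be preferable in practice:

```rust
/// FNV-1a 64-bit hash over raw bytes.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Compares payload length and checksum from before and after a conversion step.
pub fn payload_unchanged(before: &[u8], after: &[u8]) -> bool {
    before.len() == after.len() && fnv1a64(before) == fnv1a64(after)
}
```

This check only applies where the conversion is expected to copy tensor bytes verbatim; a step that dequantizes or requantizes needs a numerical comparison instead.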

---

## Verification

Once fixed, verify with:

```bash
# Run apr-qa-playbook verification
cd ../apr-model-qa-playbook
cargo run -p apr-qa-cli -- run playbooks/verify/TICKET-177.yaml --subprocess

# Expected output:
# F-CONV-001 through F-CONV-006: CORROBORATED
# F-CONV-RT-001: CORROBORATED
# MQS Score: 85+/100
```

---

## References

- Falsification Gate: F-CONV-001 through F-CONV-RT-001
- Playbook: `playbooks/models/qwen2.5-coder-1.5b-ci.playbook.yaml`
- Evidence File: `output/qwen-full/evidence.json`
- Specification: `docs/specifications/apr-playbook-spec.md` Section 4 (Format Conversion Testing)
- Related: #172 (P0 Format Conversion NaN protection) - partially fixed but regression detected

---

## Appendix: Full Tensor Corruption Log

<details>
<summary>Click to expand full 75-error validation log</summary>

```
blk.0.attn_k.weight: mean=342663942034581145229192610889859072.0000, 322 NaN
blk.0.attn_output.weight: mean=600104090742549741985281036230066176.0000, 1880 NaN
blk.0.attn_q.weight: mean=358825298765053356133211673599148032.0000, 2194 NaN
blk.0.attn_v.weight: mean=104987109912991073240180736066060288.0000, 173 NaN
blk.0.ffn_down.weight: mean=182500428367235730661723423562006528.0000, 7219 NaN
blk.0.ffn_gate.weight: mean=248127115888904366664980114610061312.0000, 12864 NaN
blk.0.ffn_up.weight: mean=227714970169135588388223435105370112.0000, 13612 NaN
blk.1.attn_k.weight: mean=817014914460021388316716866687991808.0000, 291 NaN
blk.1.attn_output.weight: mean=232121838638585256506020333482934272.0000, 2414 NaN
blk.1.attn_q.weight: mean=658610800790589613328103111479263232.0000, 1917 NaN
blk.1.attn_v.weight: mean=93900614097166921650294139165605888.0000, 224 NaN
blk.1.ffn_down.weight: mean=165087563774647571714992342227222528.0000, 7387 NaN, 1 Inf
blk.1.ffn_gate.weight: mean=350491169507934119065240395147378688.0000, 11249 NaN
blk.1.ffn_up.weight: mean=138885721043945784138165446621790208.0000, 14399 NaN
blk.10.attn_k.weight: mean=546944489382597570786647242189045760.0000, 332 NaN
blk.10.attn_output.weight: mean=428728821534445123910189859016278016.0000, 2136 NaN
blk.10.attn_q.weight: mean=384735443208418250985103538765430784.0000, 2094 NaN
blk.10.attn_v.weight: mean=56025798981149455007586636760350720.0000, 235 NaN
blk.10.ffn_down.weight: mean=173893259055051710120320394271391744.0000, 7033 NaN
blk.10.ffn_gate.weight: mean=291361052663150758782063519324962816.0000, 12879 NaN
blk.10.ffn_up.weight: mean=159597864201074824329220080294428672.0000, 14240 NaN
blk.11.attn_k.weight: mean=512004711257481969379219173002969088.0000, 321 NaN
blk.11.attn_output.weight: mean=347848197234620775027457362508120064.0000, 2195 NaN
blk.11.attn_q.weight: mean=375089850177200396336946327303749632.0000, 2151 NaN
blk.11.attn_v.weight: mean=94796645001121994176308324471930880.0000, 377 NaN
blk.11.ffn_down.weight: mean=208739567755441108175472479982059520.0000, 13508 NaN
blk.11.ffn_gate.weight: mean=308981455427445033161120889037651968.0000, 12461 NaN
blk.11.ffn_up.weight: mean=174119356423826791973727970319663104.0000, 13798 NaN
blk.12.attn_k.weight: mean=493066407430884793442546394834403328.0000, 293 NaN
blk.12.attn_output.weight: mean=346576466384023061012574591789301760.0000, 1977 NaN
blk.12.attn_q.weight: mean=325108011187532853454851987566755840.0000, 2189 NaN
blk.12.attn_v.weight: mean=272807144074032164845310218017439744.0000, 437 NaN
blk.12.ffn_down.weight: mean=252127306200168315937909351890550784.0000, 12964 NaN
blk.12.ffn_gate.weight: mean=304284512847990014737309627871920128.0000, 12356 NaN
blk.12.ffn_up.weight: mean=164185155003610100909801876632895488.0000, 13808 NaN
blk.13.attn_k.weight: mean=379946378082999771702755384371445760.0000, 304 NaN
blk.13.attn_output.weight: mean=170910596034958457735105041945067520.0000, 2280 NaN
[... additional layers truncated for brevity ...]
```

</details>

---

**Filed by:** apr-model-qa-playbook automated falsification system
**Ticket Template Version:** 1.1.0

**Issue:** #177
