[P0] Format Conversion Testing: Lossy Conversions and NaN Corruption Detected #172

@noahgift

Summary

P0 format conversion testing in apr-model-qa-playbook has detected critical issues with apr rosetta convert. Conversions between GGUF, APR, and SafeTensors formats are not lossless, and round-trip conversion introduces NaN/Inf values.

Test Configuration

  • Model: Qwen/Qwen2.5-Coder-1.5B-Instruct (GGUF Q4_K_M)
  • Tool: apr-model-qa-playbook v0.1.0
  • Command: apr rosetta convert
  • Epsilon: 1e-6

Failures Detected

Conversion                      Status         Evidence
GGUF → APR                      ❌ FALSIFIED   diff=6.77e-1 (expected < 1e-6)
APR → GGUF                      ❌ FALSIFIED   diff=4.16e-1
GGUF → SafeTensors              ❌ FALSIFIED   Missing tokenizer.json/config.json
SafeTensors → GGUF              ❌ FALSIFIED   diff=4.16e-1
SafeTensors → APR               ❌ FALSIFIED   diff=6.77e-1
Round-trip (GGUF→APR→ST→GGUF)   ❌ FALSIFIED   NaN/Inf values in tensors

Round-Trip Corruption Evidence

Validation failed (75 errors):
- blk.0.attn_k.weight: contains 322 NaN values
- blk.0.attn_output.weight: contains 1880 NaN values
- blk.0.attn_q.weight: contains 2194 NaN values
- blk.0.ffn_down.weight: contains 7219 NaN values
- blk.0.ffn_gate.weight: contains 12864 NaN values
- blk.1.ffn_down.weight: contains 1 Inf values
... (75 total validation errors)
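
A check of this kind can be sketched in a few lines. The following is a minimal illustration, not the playbook's actual validator: the function names and the dictionary-of-lists tensor representation are assumptions, and only the message format mirrors the log above.

```python
import math


def count_non_finite(values):
    """Return (nan_count, inf_count) for a flat sequence of floats."""
    nan_count = sum(1 for v in values if math.isnan(v))
    inf_count = sum(1 for v in values if math.isinf(v))
    return nan_count, inf_count


def validate_tensors(tensors):
    """Collect one error line per tensor and per kind of non-finite value.

    `tensors` is a hypothetical mapping of tensor name -> flat list of floats.
    """
    errors = []
    for name, values in tensors.items():
        nan_count, inf_count = count_non_finite(values)
        if nan_count:
            errors.append(f"{name}: contains {nan_count} NaN values")
        if inf_count:
            errors.append(f"{name}: contains {inf_count} Inf values")
    return errors


if __name__ == "__main__":
    # Toy data only; real tensors would be read from the converted file.
    sample = {
        "blk.0.attn_k.weight": [0.1, float("nan"), float("nan")],
        "blk.1.ffn_down.weight": [float("inf"), 0.2],
    }
    for line in validate_tensors(sample):
        print(f"- {line}")
```

Treating a non-empty error list as a hard failure would make a conversion stop before it writes corrupted weights.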

Five Whys Analysis

  1. Why did the round-trip fail? NaN values appeared in the converted tensors
  2. Why NaN values? Precision loss accumulates across dequantization → requantization
  3. Why precision loss? Q4_K_M → F32 → Q4_K_M conversion is not bit-exact
  4. Why does this change outputs? Inference on corrupted weights produces different results
  5. Why is this P0? Any NaN in the weights corrupts ALL inference - silent data corruption
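
To make steps 2 and 3 concrete, here is a toy sketch of block quantization. It is deliberately NOT the Q4_K_M algorithm: it assumes a simple symmetric absmax scheme with integer codes in [-7, 7], and the function names are hypothetical. It shows that quantizing and dequantizing changes the values, and it marks one place (a zero scale) where a naive implementation could divide by zero, which in IEEE-754 arithmetic yields NaN or Inf. This is an illustration of the mechanism, not a diagnosis of the bug.

```python
import math


def quantize_block(values, levels=7):
    """Symmetric absmax quantization of one block to integer codes."""
    absmax = max(abs(v) for v in values)
    # Guard the all-zero block: without it, scale would be 0 and v / scale
    # would be a division by zero.
    scale = absmax / levels if absmax > 0.0 else 1.0
    codes = [max(-levels, min(levels, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Map integer codes back to floats."""
    return [scale * c for c in codes]


block = [0.12, -0.5, 0.33, 0.07, -0.21, 0.49, 0.0, -0.02]
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)

# The reconstruction differs from the input, but in this toy scheme the
# error of each element is at most half a quantization step.
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(f"scale={scale:.6f} max_err={max_err:.6f} bound={scale / 2:.6f}")
```

In this simplified scheme the error is bounded and every restored value is finite, so a NaN after a round trip points at something beyond ordinary rounding, such as an unguarded scale or a misread block layout.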

Expected Behavior

Format conversions should be lossless within epsilon tolerance:

  • After apr rosetta convert model.gguf model.apr && apr rosetta convert model.apr model2.gguf, the tensors in model2.gguf should match those in model.gguf within epsilon
  • diff <(apr rosetta inspect model.gguf) <(apr rosetta inspect model2.gguf) should show identical tensor statistics
  • Round-trip conversion should never introduce NaN or Inf values
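
The tolerance rule can be sketched as follows. This assumes the reported diff is a maximum absolute element-wise difference, which the report does not state, and the function names are hypothetical. Non-finite values are handled explicitly, because every comparison against NaN evaluates to false and a naive `max` can silently skip it.

```python
import math

EPSILON = 1e-6  # tolerance from the test configuration


def max_abs_diff(a, b):
    """Largest element-wise |a[i] - b[i]|; infinity if any value is non-finite."""
    if len(a) != len(b):
        raise ValueError("tensors differ in length")
    worst = 0.0
    for x, y in zip(a, b):
        if not (math.isfinite(x) and math.isfinite(y)):
            # NaN or Inf on either side can never be within tolerance.
            return math.inf
        worst = max(worst, abs(x - y))
    return worst


def within_tolerance(a, b, eps=EPSILON):
    """True only if every element pair differs by less than eps."""
    return max_abs_diff(a, b) < eps


if __name__ == "__main__":
    original = [0.25, -1.0, 3.5]
    converted = [0.25, -1.0, 3.5 + 5e-7]
    print(within_tolerance(original, converted))
```

Returning infinity for non-finite input makes the NaN/Inf case fail the same check as a large diff, so one comparison covers both expectations.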

Reproduction Steps

# Install qa tool
cargo install --git https://github.com/paiml/apr-model-qa-playbook apr-qa-cli

# Run conversion tests
apr-qa run playbooks/models/qwen2.5-coder-1.5b-ci.playbook.yaml \
  --subprocess \
  --model-path /path/to/qwen2.5-coder-1.5b-instruct-q4_k_m.gguf \
  --no-gpu

# Or test manually:
apr rosetta convert model.gguf model.apr
apr rosetta convert model.apr model2.gguf  # write to a new file so the original model.gguf is not overwritten
apr rosetta verify model.gguf model.apr  # Should pass but doesn't

Suggested Fix

  1. Add bit-exact round-trip tests to apr rosetta CI
  2. Implement apr rosetta verify --strict with epsilon tolerance checking
  3. Add NaN/Inf detection as hard failure in conversion pipeline
  4. Consider storing original quantization parameters for lossless round-trip

Related

  • Spec: apr-model-qa-playbook/docs/specifications/apr-playbook-spec.md Section 4
  • Evidence: apr-model-qa-playbook/output/qwen-ci-conversion/evidence.json

Filed by apr-model-qa-playbook P0 conversion testing

Labels: bug