Summary
P0 format conversion testing in `apr-model-qa-playbook` has detected critical issues with `apr rosetta convert`. Conversions between the GGUF, APR, and SafeTensors formats are not lossless, and round-trip conversion introduces NaN/Inf values into the tensors.
Test Configuration
- Model: Qwen/Qwen2.5-Coder-1.5B-Instruct (GGUF Q4_K_M)
- Tool: apr-model-qa-playbook v0.1.0
- Command: `apr rosetta convert`
- Epsilon: 1e-6
Failures Detected
| Conversion | Status | Evidence |
|---|---|---|
| GGUF → APR | ❌ FALSIFIED | diff=6.77e-1 (expected < 1e-6) |
| APR → GGUF | ❌ FALSIFIED | diff=4.16e-1 |
| GGUF → SafeTensors | ❌ FALSIFIED | Missing `tokenizer.json`/`config.json` |
| SafeTensors → GGUF | ❌ FALSIFIED | diff=4.16e-1 |
| SafeTensors → APR | ❌ FALSIFIED | diff=6.77e-1 |
| Round-trip (GGUF→APR→ST→GGUF) | ❌ FALSIFIED | NaN/Inf values in tensors |
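The report does not define how `diff` is computed. Assuming it is the maximum absolute element-wise difference between corresponding tensors, a check against the 1e-6 epsilon could look like the Rust sketch below. `max_abs_diff` and `within_epsilon` are hypothetical helpers, not `apr` functions. The sketch reports any NaN/Inf difference as infinite, because `f32::max` ignores NaN and a naive fold would let a corrupted tensor pass.

```rust
/// Maximum absolute element-wise difference between two tensors.
/// Assumption: this is what the report's `diff` means.
/// Returns `None` when the element counts differ. Any non-finite
/// difference (NaN or Inf) is reported as `f32::INFINITY`, because
/// `f32::max` ignores NaN and would otherwise hide corrupted values.
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let mut worst = 0.0_f32;
    for (x, y) in a.iter().zip(b.iter()) {
        let d = (x - y).abs();
        if !d.is_finite() {
            return Some(f32::INFINITY);
        }
        worst = worst.max(d);
    }
    Some(worst)
}

/// A tensor pair passes only if the lengths match and every element
/// differs by strictly less than `eps`.
fn within_epsilon(a: &[f32], b: &[f32], eps: f32) -> bool {
    matches!(max_abs_diff(a, b), Some(d) if d < eps)
}

fn main() {
    let eps = 1e-6_f32;
    // Toy data for illustration only.
    let original = [0.5_f32, -1.25, 2.0];
    let drifted = [0.5_f32, -1.25, 2.5];
    let corrupted = [0.5_f32, f32::NAN, 2.0];

    println!("identical passes: {}", within_epsilon(&original, &original, eps));
    println!("drifted diff:     {:?}", max_abs_diff(&original, &drifted));
    println!("corrupted diff:   {:?}", max_abs_diff(&original, &corrupted));
}
```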
Round-Trip Corruption Evidence
```
Validation failed (75 errors):
- blk.0.attn_k.weight: contains 322 NaN values
- blk.0.attn_output.weight: contains 1880 NaN values
- blk.0.attn_q.weight: contains 2194 NaN values
- blk.0.ffn_down.weight: contains 7219 NaN values
- blk.0.ffn_gate.weight: contains 12864 NaN values
- blk.1.ffn_down.weight: contains 1 Inf values
... (75 total validation errors)
```
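A per-tensor scan that yields messages in the same shape as the log above can be sketched as follows. `TensorReport`, `scan`, and `validate` are hypothetical names for illustration, not the playbook's actual validator, and the tensors in `main` are toy data.

```rust
/// Counts of non-finite values found in one tensor.
#[derive(Debug, PartialEq)]
struct TensorReport {
    nan: usize,
    inf: usize,
}

/// Scan a tensor's f32 data for NaN and +/-Inf values.
fn scan(data: &[f32]) -> TensorReport {
    let nan = data.iter().filter(|v| v.is_nan()).count();
    let inf = data.iter().filter(|v| v.is_infinite()).count();
    TensorReport { nan, inf }
}

/// Emit one error line per non-finite category per tensor,
/// mirroring the wording of the validation log.
fn validate(tensors: &[(&str, Vec<f32>)]) -> Vec<String> {
    let mut errors = Vec::new();
    for (name, data) in tensors {
        let report = scan(data);
        if report.nan > 0 {
            errors.push(format!("{name}: contains {} NaN values", report.nan));
        }
        if report.inf > 0 {
            errors.push(format!("{name}: contains {} Inf values", report.inf));
        }
    }
    errors
}

fn main() {
    // Toy tensors, not taken from the real model.
    let tensors = vec![
        ("toy.a.weight", vec![0.1_f32, f32::NAN, f32::NAN]),
        ("toy.b.weight", vec![0.2_f32, f32::INFINITY]),
        ("toy.c.weight", vec![0.3_f32, -0.4]),
    ];
    let errors = validate(&tensors);
    println!("Validation failed ({} errors):", errors.len());
    for e in &errors {
        println!("- {e}");
    }
}
```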
Five Whys Analysis
- Why did the round-trip fail? NaN values appeared in the converted tensors.
- Why NaN values? Precision loss from dequantization → requantization accumulates.
- Why precision loss? The Q4_K_M → F32 → Q4_K_M conversion is not bit-exact.
- Why different outputs? Inference on corrupted weights produces different results.
- Why is this P0? Any NaN in the weights corrupts ALL inference: silent data corruption.
Expected Behavior
Format conversions should be lossless within the epsilon tolerance:
```bash
apr rosetta convert model.gguf model.apr && apr rosetta convert model.apr model2.gguf
diff <(apr rosetta inspect model.gguf) <(apr rosetta inspect model2.gguf)
```
- The `diff` of the two `apr rosetta inspect` outputs should show identical tensor statistics
- Round-trip conversion should never introduce NaN or Inf values
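The expected behavior can be phrased as a property over any encoder/decoder pair. In the sketch below, `round_trip_check` is a hypothetical helper (not part of `apr`), and two toy codecs stand in for real format writers/readers: raw little-endian `f32` bytes (lossless) and bfloat16-style truncation (deliberately lossy), so the check can be seen both passing and failing.

```rust
/// Round-trip property: decoding the encoded tensor must reproduce every
/// element within `eps` and must never produce NaN or Inf.
/// `encode`/`decode` are stand-ins for format writers/readers.
fn round_trip_check<E, D>(data: &[f32], encode: E, decode: D, eps: f32) -> Result<(), String>
where
    E: Fn(&[f32]) -> Vec<u8>,
    D: Fn(&[u8]) -> Vec<f32>,
{
    let encoded = encode(data);
    let restored = decode(encoded.as_slice());
    if restored.len() != data.len() {
        return Err(format!("length changed: {} -> {}", data.len(), restored.len()));
    }
    for (i, (orig, back)) in data.iter().zip(restored.iter()).enumerate() {
        if !back.is_finite() {
            return Err(format!("element {i}: non-finite value {back} after round trip"));
        }
        let diff = (orig - back).abs();
        // Written as !(diff < eps) so that a NaN difference also fails.
        if !(diff < eps) {
            return Err(format!("element {i}: diff {diff} exceeds epsilon {eps}"));
        }
    }
    Ok(())
}

/// Toy lossless codec: raw little-endian f32 bytes.
fn encode_le(data: &[f32]) -> Vec<u8> {
    data.iter().flat_map(|v| v.to_le_bytes()).collect()
}

fn decode_le(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Toy lossy codec: keep only the top 16 bits (bfloat16-style truncation).
fn encode_bf16_trunc(data: &[f32]) -> Vec<u8> {
    data.iter()
        .flat_map(|v| ((v.to_bits() >> 16) as u16).to_le_bytes())
        .collect()
}

fn decode_bf16_trunc(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|c| f32::from_bits((u16::from_le_bytes([c[0], c[1]]) as u32) << 16))
        .collect()
}

fn main() {
    let weights = [0.1_f32, -0.75, 3.0];
    println!("lossless: {:?}", round_trip_check(&weights, encode_le, decode_le, 1e-6));
    println!("lossy:    {:?}", round_trip_check(&weights, encode_bf16_trunc, decode_bf16_trunc, 1e-6));
}
```

A check like this fails on the first offending element, which keeps the error message short and points directly at where a converter first diverged.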
Reproduction Steps
```bash
# Install qa tool
cargo install --git https://github.com/paiml/apr-model-qa-playbook apr-qa-cli
# Run conversion tests
apr-qa run playbooks/models/qwen2.5-coder-1.5b-ci.playbook.yaml \
  --subprocess \
  --model-path /path/to/qwen2.5-coder-1.5b-instruct-q4_k_m.gguf \
  --no-gpu
# Or test manually:
apr rosetta convert model.gguf model.apr
apr rosetta convert model.apr model.gguf
apr rosetta verify model.gguf model.apr  # Should pass but doesn't
```
Suggested Fix
- Add bit-exact round-trip tests to `apr rosetta` CI
- Implement `apr rosetta verify --strict` with epsilon tolerance checking
- Add NaN/Inf detection as a hard failure in the conversion pipeline
- Consider storing the original quantization parameters to allow lossless round-trips
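The NaN/Inf hard-failure suggestion could take the form of a guard run on every tensor before anything is written, so a conversion aborts instead of emitting a corrupted file. This is a minimal sketch: `NonFiniteError`, `ensure_finite`, and `write_all` are hypothetical names, not existing `apr` APIs, and `write_all` does no real serialization.

```rust
use std::fmt;

/// Hypothetical error returned when a tensor contains NaN or Inf.
#[derive(Debug, PartialEq)]
struct NonFiniteError {
    tensor: String,
    first_index: usize,
    nan_count: usize,
    inf_count: usize,
}

impl fmt::Display for NonFiniteError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{}: {} NaN and {} Inf values (first at index {})",
            self.tensor, self.nan_count, self.inf_count, self.first_index
        )
    }
}

impl std::error::Error for NonFiniteError {}

/// Hard failure: reject the tensor if any element is NaN or +/-Inf.
fn ensure_finite(name: &str, data: &[f32]) -> Result<(), NonFiniteError> {
    match data.iter().position(|v| !v.is_finite()) {
        None => Ok(()),
        Some(first_index) => Err(NonFiniteError {
            tensor: name.to_string(),
            first_index,
            nan_count: data.iter().filter(|v| v.is_nan()).count(),
            inf_count: data.iter().filter(|v| v.is_infinite()).count(),
        }),
    }
}

/// Validate every tensor before writing; abort on the first bad one.
/// A real writer would serialize here; this sketch only returns how
/// many tensors passed.
fn write_all(tensors: &[(&str, Vec<f32>)]) -> Result<usize, NonFiniteError> {
    for (name, data) in tensors {
        ensure_finite(name, data)?;
    }
    Ok(tensors.len())
}

fn main() {
    // Toy tensors for illustration.
    let good = vec![("toy.a", vec![0.5_f32, -1.0])];
    let bad = vec![("toy.b", vec![0.5_f32, f32::NAN, f32::INFINITY])];
    println!("{:?}", write_all(&good));
    match write_all(&bad) {
        Ok(n) => println!("wrote {n} tensors"),
        Err(e) => println!("conversion aborted: {e}"),
    }
}
```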
Related
- Spec: `apr-model-qa-playbook/docs/specifications/apr-playbook-spec.md` Section 4
- Evidence: `apr-model-qa-playbook/output/qwen-ci-conversion/evidence.json`
Filed by apr-model-qa-playbook P0 conversion testing