REGRESSION: Format Conversion Still Failing After #177 Fix
Status: REGRESSION from closed #177
Severity: P0 (CRITICAL - Data Corruption)
Component: apr-rosetta / realizear
Discovered By: apr-model-qa-playbook requalification (2026-01-30)
Blocking: Model qualification certification
Executive Summary
Issue #177 was closed, but requalification testing on 2026-01-30 shows format conversion still fails with large output differences. The Jidoka detection is working (diffs are flagged), but the root cause fix is incomplete.
Super-block (256 elements):
- Scale (fp16)
- Min (fp16)
- 8× sub-blocks of 32 elements each
  - Sub-scale and sub-min (6-bit each, packed into 12 bytes per super-block)
  - 4-bit quantized weights
If the super-block scales are truncated or misaligned during conversion, all weights in that block will be off by a multiplicative factor, leading to the large cumulative diffs we observe.
Suggested Additional Fixes
1. Preserve Full Quantization Metadata
    use half::f16; // assumption: f16 comes from the `half` crate; the project's own type may differ

    struct Q4KMSuperBlock {
        d: f16,           // Super-block scale - MUST preserve full precision
        dmin: f16,        // Super-block min - MUST preserve full precision
        scales: [u8; 12], // Sub-block scales - MUST preserve bit-exact
        qs: [u8; 128],    // Quantized values
    }

    // During conversion, ensure:
    // 1. d and dmin are NOT converted to f32 and then back to f16
    // 2. scales array is copied bit-exact, not recomputed
    // 3. Block alignment matches source format
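One way to make the bit-exact requirement hard to violate is to carry each super-block as raw bytes and never materialize `d` or `dmin` as floats during conversion. The sketch below is only an illustration under stated assumptions: `RawQ4KBlock`, `BLOCK_BYTES`, the little-endian field order, and the 144-byte layout are hypothetical names and choices derived from the field sizes in the struct, not the converter's actual code.

```rust
/// Size of one super-block in bytes: d (2) + dmin (2) + scales (12) + qs (128).
const BLOCK_BYTES: usize = 2 + 2 + 12 + 128;

/// Hypothetical raw view of one Q4_K_M super-block. `d` and `dmin` are kept as
/// raw f16 bit patterns (u16), so no float conversion can alter them.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RawQ4KBlock {
    d_bits: u16,
    dmin_bits: u16,
    scales: [u8; 12],
    qs: [u8; 128],
}

impl RawQ4KBlock {
    /// Parse a block from its serialized bytes (little-endian is an assumption).
    fn from_bytes(b: &[u8; BLOCK_BYTES]) -> Self {
        let mut scales = [0u8; 12];
        scales.copy_from_slice(&b[4..16]);
        let mut qs = [0u8; 128];
        qs.copy_from_slice(&b[16..144]);
        Self {
            d_bits: u16::from_le_bytes([b[0], b[1]]),
            dmin_bits: u16::from_le_bytes([b[2], b[3]]),
            scales,
            qs,
        }
    }

    /// Serialize the block back to bytes without recomputing any field.
    fn to_bytes(&self) -> [u8; BLOCK_BYTES] {
        let mut out = [0u8; BLOCK_BYTES];
        out[0..2].copy_from_slice(&self.d_bits.to_le_bytes());
        out[2..4].copy_from_slice(&self.dmin_bits.to_le_bytes());
        out[4..16].copy_from_slice(&self.scales);
        out[16..144].copy_from_slice(&self.qs);
        out
    }
}
```

A conversion test can then assert that `RawQ4KBlock::from_bytes(&src).to_bytes() == src` for every block in the source file, which would catch truncated scales or shifted block boundaries before any inference run.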
Regression Evidence
Test Environment
Test Results
Detailed Failures
Raw Evidence from evidence.json
    {
      "gate_id": "F-CONV-G-A",
      "outcome": "Falsified",
      "reason": "Conversion Gguf → Apr produced different output (diff: 6.77e-1, ε: 1.00e-6)",
      "output": "6de63189564fc936",
      "timestamp": "2026-01-30T14:07:23.xxx"
    }
    {
      "gate_id": "F-CONV-A-G",
      "outcome": "Falsified",
      "reason": "Conversion Apr → Gguf produced different output (diff: 4.16e-1, ε: 1.00e-6)",
      "output": "0356a3e657672e25",
      "timestamp": "2026-01-30T14:07:35.xxx"
    }
Comparison: Before vs After #177 Fix
Conclusion: #177 fix improved detection and reduced diff magnitude, but the remaining diffs (4.16e-1 and 6.77e-1) are still roughly 416,000× to 677,000× the tolerance of 1.00e-6.
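The multipliers in the conclusion follow from dividing each reported diff by ε. A minimal sketch of that arithmetic, using only the two gate results quoted in evidence.json (the names `GATES`, `tolerance_ratio`, and `ratios` are illustrative, not part of the playbook):

```rust
/// Falsified gates from the evidence: (gate_id, diff, epsilon).
const GATES: [(&str, f64, f64); 2] = [
    ("F-CONV-G-A", 6.77e-1, 1.00e-6),
    ("F-CONV-A-G", 4.16e-1, 1.00e-6),
];

/// How many times larger the observed diff is than the allowed tolerance.
fn tolerance_ratio(diff: f64, epsilon: f64) -> f64 {
    diff / epsilon
}

/// Ratio for every gate, in the order listed above.
fn ratios() -> Vec<(&'static str, f64)> {
    GATES
        .iter()
        .map(|&(id, diff, eps)| (id, tolerance_ratio(diff, eps)))
        .collect()
}
```

For these inputs the ratios come out at about 677,000 for Gguf → Apr and about 416,000 for Apr → Gguf.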
Root Cause Hypothesis
The #177 fix addressed:
But did NOT address:
Technical Detail
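Q4_K_M uses a two-level quantization structure: each 256-element super-block stores an fp16 scale and an fp16 min, and is split into sub-blocks that each carry 6-bit scale metadata over 4-bit quantized weights.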
If the super-block scales are truncated or misaligned during conversion, all weights in that block will be off by a multiplicative factor, leading to the large cumulative diffs we observe.
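The multiplicative effect can be illustrated with the dequantization rule commonly used for GGML-style Q4_K blocks, w = d · sc · q − dmin · m, where `sc` and `m` are the 6-bit sub-block scale and min and `q` is the 4-bit weight. Whether apr-rosetta dequantizes exactly this way is an assumption, and the function names below are hypothetical.

```rust
/// Dequantize one weight: w = d * sc * q - dmin * m.
fn dequant(d: f32, dmin: f32, sc: u8, m: u8, q: u8) -> f32 {
    d * f32::from(sc) * f32::from(q) - dmin * f32::from(m)
}

/// Dequantize a run of 4-bit values that share one (sc, m) pair.
fn dequant_sub_block(d: f32, dmin: f32, sc: u8, m: u8, qs: &[u8]) -> Vec<f32> {
    qs.iter().map(|&q| dequant(d, dmin, sc, m, q)).collect()
}
```

With `dmin` set to zero, doubling `d` doubles every weight in the sub-block: `dequant_sub_block(0.5, 0.0, 3, 0, &[1, 2, 15])` gives `[1.5, 3.0, 22.5]`, while the same call with `d = 1.0` gives `[3.0, 6.0, 45.0]`. A single corrupted super-block scale therefore shifts all 256 weights it covers.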
Suggested Additional Fixes
1. Preserve Full Quantization Metadata
2. Add Tensor-Level Validation
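A minimal sketch of what tensor-level validation could look like, assuming both formats can be dequantized to `f32` slices keyed by tensor name. The helper names and the idea of comparing dequantized values (rather than raw bytes) are assumptions made for illustration, not the project's actual API.

```rust
/// Largest absolute element-wise difference, or None if the lengths differ.
/// A NaN anywhere in the comparison is reported as NaN so it cannot pass silently.
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let mut worst = 0.0_f32;
    for (x, y) in a.iter().zip(b) {
        let d = (x - y).abs();
        if d.is_nan() {
            return Some(f32::NAN);
        }
        if d > worst {
            worst = d;
        }
    }
    Some(worst)
}

/// Names of tensors whose diff exceeds epsilon, is NaN, or whose lengths mismatch.
fn failing_tensors<'a>(pairs: &[(&'a str, &[f32], &[f32])], epsilon: f32) -> Vec<&'a str> {
    let mut failed = Vec::new();
    for &(name, a, b) in pairs {
        match max_abs_diff(a, b) {
            Some(d) if d <= epsilon => {}
            _ => failed.push(name),
        }
    }
    failed
}
```

Reporting the failing tensor names, rather than a single end-to-end output diff, would narrow a falsified gate down to the specific tensors whose metadata was damaged.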
3. Test Each Quantization Type Separately
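One possible shape for per-type testing is a table-driven check that runs the same round trip once per quantization type and records each result separately. In this sketch the `QuantType` variants are an illustrative set (only Q4_K_M is named in this issue), and `round_trip_diff` stands in for whatever conversion-and-compare routine the project provides.

```rust
/// Illustrative set of quantization types; which ones the converter actually
/// supports is an assumption here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuantType {
    Q4KM,
    Q6K,
    Q8,
}

/// Run `round_trip_diff` once per type and record whether the resulting diff
/// stays within epsilon, so one failing type cannot hide behind the others.
fn check_each<F>(types: &[QuantType], epsilon: f64, round_trip_diff: F) -> Vec<(QuantType, bool)>
where
    F: Fn(QuantType) -> f64,
{
    types
        .iter()
        .map(|&t| (t, round_trip_diff(t) <= epsilon))
        .collect()
}
```

Keeping the results per type makes it clear whether a regression is specific to the two-level formats or affects every conversion path.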
MQS Impact
Verification Criteria
Issue is resolved when:
References
- ../apr-model-qa-playbook/output/qwen-requalify/evidence.json
- ../apr-model-qa-playbook/output/qwen-requalify/mqs.json
- ../apr-model-qa-playbook/playbooks/verify/TICKET-177.yaml
Filed by: apr-model-qa-playbook requalification (automated)
Related: #177 (regression), #172 (original P0)