Summary
Converting GGUF to APR format does not embed the tokenizer, causing inference to fail or produce incorrect output.
Expected Behavior
APR files converted from GGUF should include an embedded tokenizer, allowing self-contained inference.
Actual Behavior
- Conversion completes successfully without error
- APR file is created but missing tokenizer
- Inference produces error:
[PMAT-172] ERROR: APR file missing embedded tokenizer.
- Even without error, inference produces completely different output than source GGUF
Reproduction Steps
MODEL=qwen2.5-coder-1.5b-instruct-q4_k_m.gguf
# Convert
apr rosetta convert $MODEL test.apr
# Run inference on GGUF - correct output
apr run $MODEL -p "What is 2+2? Answer with just the number:" --max-tokens 8 --no-gpu
# Output: 4
# Run inference on APR - wrong output
apr run test.apr -p "What is 2+2? Answer with just the number:" --max-tokens 8 --no-gpu
# Error: [PMAT-172] ERROR: APR file missing embedded tokenizer.
# Output: 1. What is the difference between a
Impact
- P0 BLOCKER: Format conversion testing cannot pass
- All 6 conversion gates failing (F-CONV-G-A, F-CONV-A-G, F-CONV-G-S, F-CONV-S-G, F-CONV-A-S, F-CONV-S-A)
- Round-trip verification failing (F-CONV-RT-001)
- Model qualification blocked
Five Whys Root Cause Analysis
1. Why does APR inference produce wrong output?
   - Tokenizer is missing from the APR file
2. Why is the tokenizer missing from the APR file?
   - GGUF → APR conversion doesn't extract/embed the tokenizer
3. Why doesn't conversion extract the tokenizer?
   - GGUF stores tokenizer data in metadata fields, but conversion only copies tensor data
4. Why does conversion only copy tensor data?
   - Original design focused on weight format conversion, not full model packaging
5. Why wasn't tokenizer embedding required originally?
   - Early APR usage may have relied on external tokenizer.json files
Suggested Fix
In src/format/converter.rs, the GGUF → APR conversion should:
1. Extract tokenizer vocabulary from GGUF metadata:
   - tokenizer.ggml.tokens - token strings
   - tokenizer.ggml.token_type - token types
   - tokenizer.ggml.scores - token scores
   - tokenizer.ggml.merges - BPE merge rules (stored separately from scores)
   - tokenizer.ggml.bos_token_id / tokenizer.ggml.eos_token_id etc.
2. Embed tokenizer into APR format:
   - Either as embedded JSON blob
   - Or as native APR tokenizer section
3. Validate tokenizer presence in output APR file
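The extract and validate steps above could look roughly like the sketch below. It is not the actual code in src/format/converter.rs: `MetaValue`, `TokenizerData`, `extract_tokenizer`, and `validate_tokenizer` are hypothetical names invented for illustration, and the metadata map is a stand-in for however the converter really exposes parsed GGUF key/value pairs. Only the `tokenizer.ggml.*` key names come from the GGUF format.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed GGUF metadata value (not the real converter type).
#[derive(Debug, Clone, PartialEq)]
pub enum MetaValue {
    U32(u32),
    StringArray(Vec<String>),
    I32Array(Vec<i32>),
    F32Array(Vec<f32>),
}

/// Hypothetical container for the tokenizer data that would be embedded in the APR file.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenizerData {
    pub tokens: Vec<String>,
    pub token_types: Vec<i32>,
    pub scores: Vec<f32>,
    pub merges: Vec<String>,
    pub bos_token_id: Option<u32>,
    pub eos_token_id: Option<u32>,
}

/// Pull the tokenizer fields out of GGUF metadata.
/// The token list is required; every other field is treated as optional.
pub fn extract_tokenizer(meta: &HashMap<String, MetaValue>) -> Result<TokenizerData, String> {
    let tokens = match meta.get("tokenizer.ggml.tokens") {
        Some(MetaValue::StringArray(t)) if !t.is_empty() => t.clone(),
        Some(_) => return Err("tokenizer.ggml.tokens is empty or has an unexpected type".to_string()),
        None => return Err("tokenizer.ggml.tokens is missing".to_string()),
    };
    let token_types = match meta.get("tokenizer.ggml.token_type") {
        Some(MetaValue::I32Array(v)) => v.clone(),
        _ => Vec::new(),
    };
    let scores = match meta.get("tokenizer.ggml.scores") {
        Some(MetaValue::F32Array(v)) => v.clone(),
        _ => Vec::new(),
    };
    let merges = match meta.get("tokenizer.ggml.merges") {
        Some(MetaValue::StringArray(v)) => v.clone(),
        _ => Vec::new(),
    };
    // Helper that reads an optional u32 token id from the metadata map.
    let id = |key: &str| match meta.get(key) {
        Some(MetaValue::U32(v)) => Some(*v),
        _ => None,
    };
    Ok(TokenizerData {
        tokens,
        token_types,
        scores,
        merges,
        bos_token_id: id("tokenizer.ggml.bos_token_id"),
        eos_token_id: id("tokenizer.ggml.eos_token_id"),
    })
}

/// Sanity checks to run before the APR file is written, so that conversion
/// fails loudly instead of producing a file without a usable tokenizer.
pub fn validate_tokenizer(tok: &TokenizerData) -> Result<(), String> {
    if tok.tokens.is_empty() {
        return Err("tokenizer has no tokens".to_string());
    }
    if !tok.token_types.is_empty() && tok.token_types.len() != tok.tokens.len() {
        return Err("token_type length does not match vocabulary size".to_string());
    }
    for &(name, id) in [("bos", tok.bos_token_id), ("eos", tok.eos_token_id)].iter() {
        if let Some(id) = id {
            if id as usize >= tok.tokens.len() {
                return Err(format!("{}_token_id {} is outside the vocabulary", name, id));
            }
        }
    }
    Ok(())
}
```

Under these assumptions, the converter would call `extract_tokenizer` on the GGUF metadata, run `validate_tokenizer`, and return an error from `apr rosetta convert` if either step fails, rather than completing successfully and deferring the failure to inference time.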
Verification Test
# After fix, this should produce identical output:
apr rosetta convert model.gguf model.apr
apr run model.gguf -p "2+2=" --max-tokens 8 > gguf_out.txt
apr run model.apr -p "2+2=" --max-tokens 8 > apr_out.txt
diff gguf_out.txt apr_out.txt # Should be empty
Related Issues
Environment
- apr-cli version: 0.2.12
- OS: Linux 6.8.0-90-generic
- Model: Qwen2.5-Coder-1.5B-Instruct Q4_K_M