SafeTensors Inference Fails After GGUF Conversion
Severity: P1 (Blocks conversion testing)
Component: apr-rosetta / realizar / inference
Discovered By: apr-model-qa-playbook conversion tests
Date: 2026-01-30
Summary
When converting GGUF → SafeTensors, the converted file cannot be used for inference because:
- tokenizer.json is not copied/generated alongside the converted model
- config.json is not copied/generated alongside the converted model

This blocks all SafeTensors conversion tests (F-CONV-003, F-CONV-005).
Reproduction
# Start with working GGUF model
MODEL="/home/noah/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-1.5B-Instruct-GGUF/snapshots/f86cb2c1fa58255f8052cc32aeede1b7482d4361/qwen2.5-coder-1.5b-instruct-q4_k_m.gguf"
# Verify source works
apr run "$MODEL" "What is 2+2?" -n 32
# ✅ Works fine
# Convert to SafeTensors
apr rosetta convert "$MODEL" converted.safetensors
# Converts successfully
# Try to run inference on converted model
apr run converted.safetensors "What is 2+2?" -n 32
Expected
2+2 = 4
Generated 32 tokens in ...
Actual
[PMAT-172] ERROR: No tokenizer found for converted.safetensors.
Expected sibling file: tokenizer.json
For SafeTensors models, tokenizer.json must be in same directory.
error: Inference failed: Operation 'safetensors_convert' not supported: config.json not found (required for SafeTensors inference)
Root Cause Analysis
Why SafeTensors Needs These Files
Unlike GGUF (which embeds all metadata including tokenizer), SafeTensors is a pure tensor storage format. It requires companion files:
| File | Purpose | Required For |
|------|---------|--------------|
| tokenizer.json | BPE/tokenizer vocabulary and rules | Encoding input text |
| config.json | Model architecture config (layers, dims, etc.) | Building model graph |
| model.safetensors | Tensor weights only | Inference |
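To make the "weights only" point concrete, here is a minimal sketch of the SafeTensors container layout: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then the raw tensor data. The function name `safetensors_header` is illustrative and not part of apr. The format defines no slot for a tokenizer vocabulary or an architecture config, which is why those have to arrive as sibling files.

```rust
/// Returns the JSON header of a SafeTensors buffer, or None if it is malformed.
/// Layout: 8-byte little-endian u64 header length N, then N bytes of JSON,
/// then the raw tensor data.
fn safetensors_header(bytes: &[u8]) -> Option<&str> {
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(bytes.get(..8)?);
    let header_len = u64::from_le_bytes(len_bytes) as usize;
    // checked_add guards against a corrupt length overflowing the offset.
    let end = 8usize.checked_add(header_len)?;
    std::str::from_utf8(bytes.get(8..end)?).ok()
}

fn main() {
    // Hand-built example file: one F32 tensor of shape [1] (4 bytes of data).
    let header = r#"{"w":{"dtype":"F32","shape":[1],"data_offsets":[0,4]}}"#;
    let mut file = (header.len() as u64).to_le_bytes().to_vec();
    file.extend_from_slice(header.as_bytes());
    file.extend_from_slice(&1.0f32.to_le_bytes());

    // Prints the JSON header: tensor names, dtypes, shapes, offsets only.
    println!("{}", safetensors_header(&file).unwrap());
}
```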
What Happens During Conversion
GGUF (self-contained)          SafeTensors (file trio)
┌──────────────────────┐       ┌─────────────────────┐
│ Header               │       │ model.safetensors   │
│ Tokenizer (embedded) │  ──►  │ (weights only)      │
│ Config (embedded)    │       ├─────────────────────┤
│ Tensor weights       │       │ tokenizer.json      │ ← NOT CREATED
└──────────────────────┘       │ config.json         │ ← NOT CREATED
                               └─────────────────────┘
Suggested Fix
Option A: Extract and Write Companion Files
// In apr-rosetta convert.
// GgufFile, write_safetensors, extract_tokenizer, and extract_config are
// proposed helpers; Result is assumed to be the crate's error alias.
use std::path::Path;

fn convert_gguf_to_safetensors(gguf_path: &Path, output_path: &Path) -> Result<()> {
    let gguf = GgufFile::load(gguf_path)?;

    // 1. Write tensor weights
    write_safetensors(output_path, &gguf.tensors)?;

    // 2. Extract the tokenizer and write it next to the output file
    let tokenizer = gguf.extract_tokenizer()?;
    let tokenizer_path = output_path.with_file_name("tokenizer.json");
    std::fs::write(&tokenizer_path, serde_json::to_string_pretty(&tokenizer)?)?;

    // 3. Extract the config and write it next to the output file
    let config = gguf.extract_config()?;
    let config_path = output_path.with_file_name("config.json");
    std::fs::write(&config_path, serde_json::to_string_pretty(&config)?)?;

    Ok(())
}
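What the config-extraction step in Option A might involve can be sketched as a key translation. GGUF stores architecture parameters under keys prefixed with the architecture name (for example `qwen2.block_count`), while a HuggingFace-style config.json uses names such as `num_hidden_layers`. The mapping table and the function `config_json_from_gguf` below are assumptions for illustration covering llama/qwen2-style models; they are not taken from apr-rosetta, and a real config.json needs more fields than shown.

```rust
use std::collections::BTreeMap;

/// (GGUF metadata key suffix, HuggingFace config.json field) pairs.
/// Assumed mapping for llama/qwen2-style architectures; not taken from apr.
const KEY_MAP: &[(&str, &str)] = &[
    ("block_count", "num_hidden_layers"),
    ("embedding_length", "hidden_size"),
    ("feed_forward_length", "intermediate_size"),
    ("attention.head_count", "num_attention_heads"),
    ("attention.head_count_kv", "num_key_value_heads"),
    ("context_length", "max_position_embeddings"),
];

/// Builds a minimal config.json body from integer GGUF metadata.
/// Returns None if any mapped key is absent, so a partial config is never produced.
fn config_json_from_gguf(arch: &str, meta: &BTreeMap<String, u64>) -> Option<String> {
    let mut fields = Vec::new();
    for (gguf_suffix, hf_field) in KEY_MAP {
        let value = meta.get(&format!("{}.{}", arch, gguf_suffix))?;
        fields.push(format!("  \"{}\": {}", hf_field, value));
    }
    Some(format!("{{\n{}\n}}", fields.join(",\n")))
}

fn main() {
    // Made-up values for illustration; real ones come from the GGUF header.
    let mut meta = BTreeMap::new();
    for (key, value) in &[
        ("block_count", 2u64),
        ("embedding_length", 64),
        ("feed_forward_length", 256),
        ("attention.head_count", 4),
        ("attention.head_count_kv", 2),
        ("context_length", 128),
    ] {
        meta.insert(format!("qwen2.{}", key), *value);
    }

    match config_json_from_gguf("qwen2", &meta) {
        Some(json) => println!("{}", json),
        None => eprintln!("GGUF metadata is missing a required key"),
    }
}
```

Returning None on a missing key is a deliberate choice in this sketch: a config.json with silently omitted fields would move the failure from conversion time to inference time.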
Option B: Copy From HuggingFace Cache
If the model has a known HuggingFace repo, copy tokenizer/config from the cached repo:
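use std::path::PathBuf;

// Look up tokenizer.json and config.json in the local HuggingFace cache.
// `model_id` is "org/name", e.g. "Qwen/Qwen2.5-Coder-1.5B-Instruct".
fn find_companion_files(model_id: &str) -> Option<(PathBuf, PathBuf)> {
    let (org, name) = model_id.split_once('/')?;
    // Default cache location; overrides such as HF_HOME are not handled here.
    let hf_cache = dirs::home_dir()?.join(".cache/huggingface/hub");
    let repo_dir = hf_cache.join(format!("models--{}--{}", org, name));
    // Cached files live under snapshots/<revision>/, not in the repo root.
    std::fs::read_dir(repo_dir.join("snapshots")).ok()?.flatten().find_map(|snap| {
        let tokenizer = snap.path().join("tokenizer.json");
        let config = snap.path().join("config.json");
        (tokenizer.exists() && config.exists()).then_some((tokenizer, config))
    })
}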
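Option B needs a model ID, but the reproduction starts from a path to a GGUF file inside the cache. A rough sketch of recovering a candidate ID from that path follows. The heuristic of stripping a trailing "-GGUF" from the repo name is an assumption based on the paths in this report (the GGUF repo may not ship tokenizer.json or config.json itself), and `base_repo_from_cache_path` is a hypothetical name.

```rust
use std::path::Path;

/// Guesses the non-GGUF repo ID from a path inside the HuggingFace cache.
/// Heuristic (an assumption, not an apr rule): take the `models--<org>--<name>`
/// directory component and strip a trailing "-GGUF" from the name.
fn base_repo_from_cache_path(path: &Path) -> Option<String> {
    let dir = path
        .components()
        .filter_map(|c| c.as_os_str().to_str())
        .find(|c| c.starts_with("models--"))?;
    let (org, name) = dir.strip_prefix("models--")?.split_once("--")?;
    let name = name.strip_suffix("-GGUF").unwrap_or(name);
    Some(format!("{}/{}", org, name))
}

fn main() {
    let gguf = Path::new(
        "/home/noah/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-1.5B-Instruct-GGUF/snapshots/f86cb2c1fa58255f8052cc32aeede1b7482d4361/qwen2.5-coder-1.5b-instruct-q4_k_m.gguf",
    );
    // prints Some("Qwen/Qwen2.5-Coder-1.5B-Instruct")
    println!("{:?}", base_repo_from_cache_path(gguf));
}
```

The result could then be passed to a lookup such as `find_companion_files`. Models loaded from outside the cache yield None, so this can only be a best-effort fallback.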
Option C: Error with Actionable Message
At minimum, provide a clear error with instructions:
Error: SafeTensors inference requires companion files.
Missing:
- tokenizer.json (tokenizer vocabulary)
- config.json (model architecture)
To fix, copy these files from the HuggingFace model snapshot into the directory containing the converted model:
cp ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-1.5B-Instruct/snapshots/<revision>/tokenizer.json ./
cp ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-1.5B-Instruct/snapshots/<revision>/config.json ./
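A pre-flight check along these lines could produce the message above before inference starts. This is a sketch only: `companion_error` and the `COMPANIONS` table are hypothetical names, and where such a check would hook into apr is not something this report establishes.

```rust
use std::path::Path;

/// Companion files that SafeTensors inference expects next to the weights.
const COMPANIONS: &[(&str, &str)] = &[
    ("tokenizer.json", "tokenizer vocabulary"),
    ("config.json", "model architecture"),
];

/// Returns an actionable error message if any companion file is missing
/// from the model's directory, or None if all are present.
fn companion_error(model_path: &Path) -> Option<String> {
    // A bare file name has an empty parent; treat that as the current directory.
    let dir = model_path
        .parent()
        .filter(|p| !p.as_os_str().is_empty())
        .unwrap_or(Path::new("."));
    let missing: Vec<String> = COMPANIONS
        .iter()
        .filter(|(file, _)| !dir.join(file).exists())
        .map(|(file, what)| format!("  - {} ({})", file, what))
        .collect();
    if missing.is_empty() {
        return None;
    }
    Some(format!(
        "Error: SafeTensors inference requires companion files.\nMissing:\n{}\nExpected in: {}",
        missing.join("\n"),
        dir.display()
    ))
}

fn main() {
    // Reports whichever companions are absent next to the given model path.
    if let Some(message) = companion_error(Path::new("converted.safetensors")) {
        eprintln!("{}", message);
    }
}
```

Reporting every missing file in one message avoids the two-step failure seen in the Actual output, where the tokenizer and config errors surface separately.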
Evidence from Test Run
{
"gate_id": "F-CONV-G-S",
"outcome": "Falsified",
"reason": "Conversion infrastructure error: Execution error: Inference failed: \n[PMAT-172] ERROR: No tokenizer found for .../qwen2.5-coder-1.5b-instruct-q4_k_m.converted.safetensors.\n Expected sibling file: .../tokenizer.json\n For SafeTensors models, tokenizer.json must be in same directory.\n\nerror: Inference failed: Inference failed: Operation 'safetensors_convert' not supported: config.json not found (required for SafeTensors inference)\n",
"output": "N/A"
}
Impact
Blocked Tests
- F-CONV-003: GGUF → SafeTensors
- F-CONV-005: APR → SafeTensors
- Any round-trip involving SafeTensors
MQS Impact
- 2 conversion gates blocked (10 points)
- Round-trip tests incomplete
Verification
Once fixed:
# Convert GGUF to SafeTensors
apr rosetta convert model.gguf model.safetensors
# Verify companion files exist
ls -la model.safetensors tokenizer.json config.json
# All three files should exist
# Verify inference works
apr run model.safetensors "What is 2+2?" -n 32
# Should produce valid output
# Run conversion test suite
cd ../apr-model-qa-playbook
cargo run --bin apr-qa -- run playbooks/models/qwen2.5-coder-1.5b-ci.playbook.yaml \
--subprocess --model-path <model.gguf> --no-gpu
# F-CONV-003 and F-CONV-005 should PASS
References
Filed by: apr-model-qa-playbook conversion test infrastructure
Evidence: ../apr-model-qa-playbook/output/qwen-requalify/evidence.json