SafeTensors inference fails: tokenizer.json and config.json not found after GGUF conversion #182

# SafeTensors Inference Fails After GGUF Conversion

**Severity:** P1 (Blocks conversion testing)
**Component:** apr-rosetta / realizar / inference
**Discovered By:** apr-model-qa-playbook conversion tests
**Date:** 2026-01-30

---

## Summary

When converting GGUF → SafeTensors, the converted file cannot be used for inference because:
1. `tokenizer.json` is not copied/generated alongside the converted model
2. `config.json` is not copied/generated alongside the converted model

This blocks all SafeTensors conversion tests (F-CONV-003, F-CONV-005).

---

## Reproduction

```bash
# Start with working GGUF model
MODEL="/home/noah/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-1.5B-Instruct-GGUF/snapshots/f86cb2c1fa58255f8052cc32aeede1b7482d4361/qwen2.5-coder-1.5b-instruct-q4_k_m.gguf"

# Verify source works
apr run "$MODEL" "What is 2+2?" -n 32
# ✅ Works fine

# Convert to SafeTensors
apr rosetta convert "$MODEL" converted.safetensors
# Converts successfully

# Try to run inference on converted model
apr run converted.safetensors "What is 2+2?" -n 32
```

### Expected
```
2+2 = 4
Generated 32 tokens in ...
```

### Actual
```
[PMAT-172] ERROR: No tokenizer found for converted.safetensors.
           Expected sibling file: tokenizer.json
           For SafeTensors models, tokenizer.json must be in same directory.

error: Inference failed: Operation 'safetensors_convert' not supported: config.json not found (required for SafeTensors inference)
```

---

## Root Cause Analysis

### Why SafeTensors Needs These Files

Unlike GGUF, which embeds all metadata including the tokenizer, SafeTensors stores only tensors. Inference therefore needs companion files in the same directory:

| File | Purpose | Required For |
|------|---------|--------------|
| `tokenizer.json` | BPE/tokenizer vocabulary and rules | Encoding input text |
| `config.json` | Model architecture config (layers, dims, etc.) | Building model graph |
| `model.safetensors` | Tensor weights only | Inference |
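
The sibling-file requirement in the table can be checked before inference is attempted. The following is a minimal sketch, assuming the companion files must sit in the same directory as the `.safetensors` file (as the PMAT-172 message states). `missing_companion_files` is a hypothetical helper for illustration, not an existing apr function.

```rust
use std::path::{Path, PathBuf};

/// Companion files that must sit next to a SafeTensors model file.
const COMPANION_FILES: [&str; 2] = ["tokenizer.json", "config.json"];

/// Returns the sibling paths that are missing for `model_path`.
/// Hypothetical helper for illustration; not part of apr.
fn missing_companion_files(model_path: &Path) -> Vec<PathBuf> {
    COMPANION_FILES
        .iter()
        // `with_file_name` swaps only the final path component,
        // so the result is a sibling of the model file.
        .map(|name| model_path.with_file_name(name))
        .filter(|path| !path.exists())
        .collect()
}
```

An empty result means both files are present. The same check could run at the end of conversion, so a missing file is reported there rather than at inference time.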

### What Happens During Conversion

```
GGUF (self-contained)          SafeTensors (file trio)
┌──────────────────────┐       ┌─────────────────────┐
│ Header               │       │ model.safetensors   │
│ Tokenizer (embedded) │  ──►  │ (weights only)      │
│ Config (embedded)    │       ├─────────────────────┤
│ Tensor weights       │       │ tokenizer.json      │ ← NOT CREATED
└──────────────────────┘       │ config.json         │ ← NOT CREATED
                               └─────────────────────┘
```

---

## Suggested Fix

### Option A: Extract and Write Companion Files

```rust
use std::path::Path;

// In apr-rosetta convert. GgufFile, write_safetensors, extract_tokenizer,
// and extract_config are illustrative names, not confirmed APIs.
fn convert_gguf_to_safetensors(gguf_path: &Path, output_path: &Path) -> Result<()> {
    let gguf = GgufFile::load(gguf_path)?;

    // 1. Write tensor weights
    write_safetensors(output_path, &gguf.tensors)?;

    // 2. Extract the tokenizer and write it next to the output file
    let tokenizer = gguf.extract_tokenizer()?;
    let tokenizer_path = output_path.with_file_name("tokenizer.json");
    std::fs::write(&tokenizer_path, serde_json::to_string_pretty(&tokenizer)?)?;

    // 3. Extract the config and write it next to the output file
    let config = gguf.extract_config()?;
    let config_path = output_path.with_file_name("config.json");
    std::fs::write(&config_path, serde_json::to_string_pretty(&config)?)?;

    Ok(())
}
```
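
`extract_config` is left undefined above. One way it could work is to translate GGUF metadata keys into the field names that Llama/Qwen2-style `config.json` files use. The sketch below assumes the metadata has already been parsed into a flat map of integer values and covers only a handful of integer fields. The key suffixes come from the GGUF specification. The function name and the `BTreeMap` input are illustrative assumptions, and a real config needs further fields (for example `rms_norm_eps`, `rope_theta`, `vocab_size`).

```rust
use std::collections::BTreeMap;

/// (GGUF key suffix, config.json field) pairs for integer-valued settings.
static KEY_MAP: [(&str, &str); 6] = [
    ("block_count", "num_hidden_layers"),
    ("embedding_length", "hidden_size"),
    ("feed_forward_length", "intermediate_size"),
    ("attention.head_count", "num_attention_heads"),
    ("attention.head_count_kv", "num_key_value_heads"),
    ("context_length", "max_position_embeddings"),
];

/// Builds a partial config.json field map from GGUF metadata.
/// `arch` is the value of `general.architecture` (e.g. "qwen2");
/// GGUF namespaces the keys above with it, as in "qwen2.block_count".
/// Hypothetical helper for illustration; not part of apr.
fn config_fields(arch: &str, metadata: &BTreeMap<String, u64>) -> BTreeMap<&'static str, u64> {
    KEY_MAP
        .iter()
        .filter_map(|(suffix, field)| {
            // Keys absent from the GGUF file are skipped rather than guessed.
            metadata
                .get(&format!("{arch}.{suffix}"))
                .map(|value| (*field, *value))
        })
        .collect()
}
```

Skipping absent keys keeps the output honest: a partially filled config fails loudly at load time instead of running with invented dimensions.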

### Option B: Copy From HuggingFace Cache

If the model has a known HuggingFace repo, copy tokenizer/config from the cached repo:

```rust
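use std::path::PathBuf;

fn find_companion_files(model_id: &str) -> Option<(PathBuf, PathBuf)> {
    // model_id has the form "org/name", e.g. "Qwen/Qwen2.5-Coder-1.5B-Instruct"
    let (org, name) = model_id.split_once('/')?;
    // On Linux, dirs::cache_dir() typically resolves to ~/.cache
    let hf_cache = dirs::cache_dir()?.join("huggingface/hub");
    let snapshots = hf_cache.join(format!("models--{org}--{name}")).join("snapshots");

    // Cached files live under snapshots/<revision>/; use the first revision with both files
    std::fs::read_dir(snapshots).ok()?.flatten().find_map(|entry| {
        let tokenizer = entry.path().join("tokenizer.json");
        let config = entry.path().join("config.json");
        (tokenizer.exists() && config.exists()).then_some((tokenizer, config))
    })
}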
```
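
Option B needs a model ID, but `apr rosetta convert` is given only a file path. When the source GGUF sits inside the Hugging Face cache, as in the reproduction above, the repo ID can be recovered from the `models--{org}--{name}` directory component. This is a minimal sketch, and `repo_id_from_cache_path` is a hypothetical helper. Whether the recovered repo actually contains `tokenizer.json` and `config.json` is not guaranteed, since a GGUF repo may ship only `.gguf` files, in which case the base model's repo would be needed instead.

```rust
use std::path::{Component, Path};

/// Recovers "org/name" from a path inside the Hugging Face hub cache,
/// e.g. ".../hub/models--Org--Some-Model/snapshots/<rev>/file.gguf".
/// Hypothetical helper for illustration; not part of apr.
fn repo_id_from_cache_path(path: &Path) -> Option<String> {
    path.components().find_map(|component| {
        // Only ordinary path segments can be the repo directory.
        let Component::Normal(segment) = component else {
            return None;
        };
        let repo = segment.to_str()?.strip_prefix("models--")?;
        // The cache encodes "org/name" as "org--name".
        let (org, name) = repo.split_once("--")?;
        Some(format!("{org}/{name}"))
    })
}
```

Paths outside the cache return `None`, so the converter can fall back to Option A or Option C.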

### Option C: Error with Actionable Message

At minimum, provide a clear error with instructions:

```
Error: SafeTensors inference requires companion files.

Missing:
  - tokenizer.json (tokenizer vocabulary)
  - config.json (model architecture)

To fix, copy these files from the HuggingFace cache into the directory that holds the model:
  cp ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-1.5B-Instruct/snapshots/<revision>/tokenizer.json ./
  cp ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-1.5B-Instruct/snapshots/<revision>/config.json ./
```
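
A message like this could come from a dedicated error type, so the list of missing files always reflects what was actually checked. The following is a minimal sketch, assuming a hypothetical `MissingCompanions` type that is not part of apr. The wording follows the proposed message, with a shortened fix hint.

```rust
use std::fmt;
use std::path::PathBuf;

/// Hypothetical error listing the companion files a SafeTensors model lacks.
#[derive(Debug)]
struct MissingCompanions {
    /// Directory that was searched (the model file's parent).
    dir: PathBuf,
    /// File names that were not found, e.g. "tokenizer.json".
    missing: Vec<String>,
}

impl fmt::Display for MissingCompanions {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "Error: SafeTensors inference requires companion files.")?;
        writeln!(f)?;
        writeln!(f, "Missing:")?;
        for name in &self.missing {
            writeln!(f, "  - {name}")?;
        }
        writeln!(f)?;
        // Tell the user where the files are expected to go.
        write!(f, "To fix, place these files in {}", self.dir.display())
    }
}

impl std::error::Error for MissingCompanions {}
```

Implementing `std::error::Error` lets the type pass through `?` into whatever error wrapper the CLI already uses.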

---

## Evidence from Test Run

```json
{
  "gate_id": "F-CONV-G-S",
  "outcome": "Falsified",
  "reason": "Conversion infrastructure error: Execution error: Inference failed: \n[PMAT-172] ERROR: No tokenizer found for .../qwen2.5-coder-1.5b-instruct-q4_k_m.converted.safetensors.\n           Expected sibling file: .../tokenizer.json\n           For SafeTensors models, tokenizer.json must be in same directory.\n\nerror: Inference failed: Inference failed: Operation 'safetensors_convert' not supported: config.json not found (required for SafeTensors inference)\n",
  "output": "N/A"
}
```

---

## Impact

### Blocked Tests
- F-CONV-003: GGUF → SafeTensors
- F-CONV-005: APR → SafeTensors
- Any round-trip involving SafeTensors

### MQS Impact
- 2 conversion gates blocked (10 points)
- Round-trip tests incomplete

---

## Verification

Once fixed:

```bash
# Convert GGUF to SafeTensors
apr rosetta convert model.gguf model.safetensors

# Verify companion files exist
ls -la model.safetensors tokenizer.json config.json
# All three files should exist

# Verify inference works
apr run model.safetensors "What is 2+2?" -n 32
# Should produce valid output

# Run conversion test suite
cd ../apr-model-qa-playbook
cargo run --bin apr-qa -- run playbooks/models/qwen2.5-coder-1.5b-ci.playbook.yaml \
  --subprocess --model-path <model.gguf> --no-gpu

# F-CONV-003 and F-CONV-005 should PASS
```

---

## References

- Test evidence: `../apr-model-qa-playbook/output/qwen-requalify/evidence.json`
- Related: #181 (conversion regression), #177 (original conversion issue)
- Spec: Section 4.4 (Conversion Falsification Gates)

---

**Filed by:** apr-model-qa-playbook conversion test infrastructure
