SafeTensors inference fails: tokenizer.json and config.json not found after GGUF conversion #182

@noahgift

SafeTensors Inference Fails After GGUF Conversion

Severity: P1 (Blocks conversion testing)
Component: apr-rosetta / realizar / inference
Discovered By: apr-model-qa-playbook conversion tests
Date: 2026-01-30


Summary

When converting GGUF → SafeTensors, the converted file cannot be used for inference because:

  1. tokenizer.json is not copied/generated alongside the converted model
  2. config.json is not copied/generated alongside the converted model

This blocks all SafeTensors conversion tests (F-CONV-003, F-CONV-005).


Reproduction

# Start with working GGUF model
MODEL="/home/noah/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-1.5B-Instruct-GGUF/snapshots/f86cb2c1fa58255f8052cc32aeede1b7482d4361/qwen2.5-coder-1.5b-instruct-q4_k_m.gguf"

# Verify source works
apr run "$MODEL" "What is 2+2?" -n 32
# ✅ Works fine

# Convert to SafeTensors
apr rosetta convert "$MODEL" converted.safetensors
# Converts successfully

# Try to run inference on converted model
apr run converted.safetensors "What is 2+2?" -n 32

Expected

2+2 = 4
Generated 32 tokens in ...

Actual

[PMAT-172] ERROR: No tokenizer found for converted.safetensors.
           Expected sibling file: tokenizer.json
           For SafeTensors models, tokenizer.json must be in same directory.

error: Inference failed: Operation 'safetensors_convert' not supported: config.json not found (required for SafeTensors inference)

Root Cause Analysis

Why SafeTensors Needs These Files

Unlike GGUF, which embeds all metadata including the tokenizer, SafeTensors is a pure tensor storage format. Inference therefore requires companion files in the same directory:

File              | Purpose                                        | Required For
tokenizer.json    | BPE/tokenizer vocabulary and rules             | Encoding input text
config.json       | Model architecture config (layers, dims, etc.) | Building model graph
model.safetensors | Tensor weights only                            | Inference
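
The sibling-file requirement above can be checked before inference starts. The following is a minimal sketch using only the standard library; `missing_companions` and `COMPANION_FILES` are hypothetical names for illustration and are not taken from the apr codebase.

```rust
use std::path::{Path, PathBuf};

/// Companion files expected next to a SafeTensors weights file.
const COMPANION_FILES: [&str; 2] = ["tokenizer.json", "config.json"];

/// Returns the sibling paths that do not exist for `model_path`.
/// An empty result means both companion files are present.
fn missing_companions(model_path: &Path) -> Vec<PathBuf> {
    COMPANION_FILES
        .iter()
        .map(|name| model_path.with_file_name(name))
        .filter(|p| !p.exists())
        .collect()
}
```

Calling `missing_companions(Path::new("converted.safetensors"))` right after conversion would report both files as missing in the failure case described here.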

What Happens During Conversion

GGUF (self-contained)          SafeTensors (file trio)
┌──────────────────────┐       ┌─────────────────────┐
│ Header               │       │ model.safetensors   │
│ Tokenizer (embedded) │  ──►  │ (weights only)      │
│ Config (embedded)    │       ├─────────────────────┤
│ Tensor weights       │       │ tokenizer.json      │ ← NOT CREATED
└──────────────────────┘       │ config.json         │ ← NOT CREATED
                               └─────────────────────┘

Suggested Fix

Option A: Extract and Write Companion Files

// In apr-rosetta convert:
fn convert_gguf_to_safetensors(gguf_path: &Path, output_path: &Path) -> Result<()> {
    let gguf = GgufFile::load(gguf_path)?;
    
    // 1. Write tensor weights
    write_safetensors(output_path, &gguf.tensors)?;
    
    // 2. Extract and write tokenizer
    let tokenizer = gguf.extract_tokenizer()?;
    let tokenizer_path = output_path.with_file_name("tokenizer.json");
    std::fs::write(&tokenizer_path, serde_json::to_string_pretty(&tokenizer)?)?;
    
    // 3. Extract and write config
    let config = gguf.extract_config()?;
    let config_path = output_path.with_file_name("config.json");
    std::fs::write(&config_path, serde_json::to_string_pretty(&config)?)?;
    
    Ok(())
}
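
As a rough illustration of what the config extraction step in Option A might involve, the sketch below maps integer GGUF metadata keys to HuggingFace-style config.json field names. The key suffixes follow the GGUF naming convention (`{arch}.block_count` and so on); `config_fields`, `to_json`, and the flat `BTreeMap` input are assumptions made for this example, not the converter's actual types. A real config.json also needs fields this sketch omits, such as `architectures`, `model_type`, and floating-point values like `rope_theta`.

```rust
use std::collections::BTreeMap;

/// GGUF metadata key suffix -> config.json field name (integer fields only).
const KEY_MAP: [(&str, &str); 6] = [
    ("block_count", "num_hidden_layers"),
    ("embedding_length", "hidden_size"),
    ("feed_forward_length", "intermediate_size"),
    ("attention.head_count", "num_attention_heads"),
    ("attention.head_count_kv", "num_key_value_heads"),
    ("context_length", "max_position_embeddings"),
];

/// Builds config.json fields from integer GGUF metadata for architecture `arch`.
/// Keys absent from `metadata` are skipped rather than guessed.
fn config_fields(arch: &str, metadata: &BTreeMap<String, u64>) -> BTreeMap<&'static str, u64> {
    KEY_MAP
        .iter()
        .filter_map(|(suffix, field)| {
            metadata
                .get(&format!("{}.{}", arch, suffix))
                .map(|v| (*field, *v))
        })
        .collect()
}

/// Renders the fields as a JSON object. Keys and values here never need
/// escaping because they are fixed identifiers and integers.
fn to_json(fields: &BTreeMap<&'static str, u64>) -> String {
    let body: Vec<String> = fields
        .iter()
        .map(|(k, v)| format!("  \"{}\": {}", k, v))
        .collect();
    format!("{{\n{}\n}}", body.join(",\n"))
}
```

Skipping absent keys keeps the sketch from inventing architecture values; a production converter would more likely fail loudly when a required field is missing.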

Option B: Copy From HuggingFace Cache

If the model has a known HuggingFace repo, copy tokenizer/config from the cached repo:

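use std::path::PathBuf;

// `model_id` is a HuggingFace repo ID such as "Qwen/Qwen2.5-Coder-1.5B-Instruct".
fn find_companion_files(model_id: &str) -> Option<(PathBuf, PathBuf)> {
    let (org, name) = model_id.split_once('/')?;
    let hf_cache = dirs::cache_dir()?.join("huggingface/hub");
    let repo_dir = hf_cache.join(format!("models--{}--{}", org, name));

    // The HF cache stores files under snapshots/<revision>/, not at the repo root.
    for entry in std::fs::read_dir(repo_dir.join("snapshots")).ok()?.flatten() {
        let tokenizer = entry.path().join("tokenizer.json");
        let config = entry.path().join("config.json");
        if tokenizer.exists() && config.exists() {
            return Some((tokenizer, config));
        }
    }
    None
}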
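
Option B needs a repo ID to start from. One possible way to obtain it, sketched here under stated assumptions, is to parse the `models--{org}--{name}` component out of the GGUF file's cache path. Both function names are hypothetical, and dropping a trailing `-GGUF` to find the base repo is a naming heuristic that will not hold for every publisher.

```rust
use std::path::Path;

/// Extracts `(org, name)` from a path containing a `models--{org}--{name}`
/// component. Returns None for paths outside the HF cache layout and for
/// repos without an org prefix.
fn repo_from_cache_path(path: &Path) -> Option<(String, String)> {
    path.components().find_map(|c| {
        let part = c.as_os_str().to_str()?;
        let rest = part.strip_prefix("models--")?;
        let (org, name) = rest.split_once("--")?;
        Some((org.to_string(), name.to_string()))
    })
}

/// Guesses the base (non-GGUF) repo name by dropping a trailing "-GGUF".
/// Heuristic only: the guessed repo may not exist or may not be cached.
fn base_repo_name(name: &str) -> &str {
    name.strip_suffix("-GGUF").unwrap_or(name)
}
```

For the path in the reproduction above, this would yield `Qwen` and `Qwen2.5-Coder-1.5B-Instruct-GGUF`, with `Qwen2.5-Coder-1.5B-Instruct` as the guessed base repo.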

Option C: Error with Actionable Message

At minimum, provide a clear error with instructions:

Error: SafeTensors inference requires companion files.

Missing:
  - tokenizer.json (tokenizer vocabulary)
  - config.json (model architecture)

To fix, copy these files from the HuggingFace model directory:
  cp ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-1.5B-Instruct/snapshots/<revision>/tokenizer.json ./
  cp ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-1.5B-Instruct/snapshots/<revision>/config.json ./
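
A message of this shape could be assembled from the list of missing files. This is a minimal sketch: `companion_error` is a hypothetical helper, and the source directory is passed in by the caller because the issue does not specify how it would be located.

```rust
/// Formats an actionable error for missing companion files.
/// `missing` holds (file name, description) pairs; `source_dir` is the
/// directory the user should copy the files from.
fn companion_error(missing: &[(&str, &str)], source_dir: &str) -> String {
    let mut msg =
        String::from("Error: SafeTensors inference requires companion files.\n\nMissing:\n");
    for (file, what) in missing {
        msg.push_str(&format!("  - {} ({})\n", file, what));
    }
    msg.push_str("\nTo fix, copy these files from the HuggingFace model directory:\n");
    for (file, _) in missing {
        msg.push_str(&format!("  cp {}/{} ./\n", source_dir, file));
    }
    msg
}
```

Listing only the files that are actually missing keeps the message accurate when one of the two companions is already in place.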

Evidence from Test Run

{
  "gate_id": "F-CONV-G-S",
  "outcome": "Falsified",
  "reason": "Conversion infrastructure error: Execution error: Inference failed: \n[PMAT-172] ERROR: No tokenizer found for .../qwen2.5-coder-1.5b-instruct-q4_k_m.converted.safetensors.\n           Expected sibling file: .../tokenizer.json\n           For SafeTensors models, tokenizer.json must be in same directory.\n\nerror: Inference failed: Inference failed: Operation 'safetensors_convert' not supported: config.json not found (required for SafeTensors inference)\n",
  "output": "N/A"
}

Impact

Blocked Tests

  • F-CONV-003: GGUF → SafeTensors
  • F-CONV-005: APR → SafeTensors
  • Any round-trip involving SafeTensors

MQS Impact

  • 2 conversion gates blocked (10 points)
  • Round-trip tests incomplete

Verification

Once fixed:

# Convert GGUF to SafeTensors
apr rosetta convert model.gguf model.safetensors

# Verify companion files exist
ls -la model.safetensors tokenizer.json config.json
# All three files should exist

# Verify inference works
apr run model.safetensors "What is 2+2?" -n 32
# Should produce valid output

# Run conversion test suite
cd ../apr-model-qa-playbook
cargo run --bin apr-qa -- run playbooks/models/qwen2.5-coder-1.5b-ci.playbook.yaml \
  --subprocess --model-path <model.gguf> --no-gpu

# F-CONV-003 and F-CONV-005 should PASS

Filed by: apr-model-qa-playbook conversion test infrastructure
