PMAT-232: GGUF weights-only files fail conversion to APR #216

@noahgift

Summary

GGUF files without an embedded tokenizer vocabulary cannot be converted to APR format. This blocks 10 conversion test scenarios per model.

Evidence (Source of Truth)

From fresh runs on 2026-02-06 with apr-cli 0.2.12:

Error Message

[PMAT-232] ERROR: GGUF file 'output/workspace/Qwen/Qwen2.5-Coder-0.5B-Instruct/gguf/model.gguf' 
has no embedded tokenizer vocabulary. This is a 'weights-only' GGUF that cannot produce a 
working APR file. Solutions: 
  (1) Use a GGUF with embedded tokenizer, or 
  (2) Provide --tokenizer /path/to/tokenizer.json, or 
  (3) Use SafeTensors format with sibling tokenizer.json, or 
  (4) Import from HuggingFace source: apr import hf://ORG/REPO -o model.apr

Affected Tests (10 failures per model)

F-CONV-G-A: Conversion GGUF → APR fails
F-CONV-G-S: Conversion GGUF → SafeTensors fails
F-CONV-RT-001: Round-trip fails (involves GGUF)
F-CONV-IDEM-001: Idempotency test fails (involves GGUF)
F-CONV-COM-001: Commutativity test fails (involves GGUF)

Context

The workspace preparation phase creates GGUF files via:

apr rosetta convert safetensors_model.safetensors model.gguf

This produces a "weights-only" GGUF without a tokenizer. When we later try:

apr rosetta convert model.gguf model.apr

The conversion fails because the APR format requires a tokenizer vocabulary.

Expected Behavior

Either:

  1. apr rosetta convert should preserve or embed the tokenizer when creating a GGUF from SafeTensors
  2. Or it should auto-detect a sibling tokenizer.json and use it when converting GGUF → APR
  3. Or workspace preparation should pass --tokenizer to the conversions
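
Options 2 and 3 could be sketched roughly as follows on the workspace-preparation side. This is only an illustration, not apr's actual implementation: the helper names `find_sibling_tokenizer` and `build_convert_command` are hypothetical, and it assumes `apr rosetta convert` accepts the `--tokenizer` flag that the error message suggests, which has not been verified here.

```python
from pathlib import Path
from typing import Iterable, List, Optional

TOKENIZER_NAME = "tokenizer.json"


def find_sibling_tokenizer(
    model_path: Path, extra_dirs: Iterable[Path] = ()
) -> Optional[Path]:
    """Return the first tokenizer.json found next to the model, else None.

    The model's own directory is checked first, then any extra directories
    the caller supplies (for example the SafeTensors source directory).
    """
    for directory in [model_path.parent, *extra_dirs]:
        candidate = Path(directory) / TOKENIZER_NAME
        if candidate.is_file():
            return candidate
    return None


def build_convert_command(
    src: Path, dst: Path, extra_dirs: Iterable[Path] = ()
) -> List[str]:
    """Build the conversion command, adding --tokenizer when one is found."""
    cmd = ["apr", "rosetta", "convert", str(src), str(dst)]
    tokenizer = find_sibling_tokenizer(src, extra_dirs)
    if tokenizer is not None:
        # ASSUMPTION: `apr rosetta convert` accepts --tokenizer, as hinted by
        # solution (2) in the PMAT-232 error message.
        cmd += ["--tokenizer", str(tokenizer)]
    return cmd
```

The command is returned as an argument list so it can be handed to `subprocess.run` without shell quoting. If no tokenizer is found, the command is left unchanged and the existing PMAT-232 error would still surface.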

Impact

  • 10/47 scenarios fail for the 0.5B model
  • 10/44 scenarios fail for the 1.5B model
  • Blocks ~20% of the conversion test matrix
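
As a quick check of the "~20%" figure, the per-model failure rates can be computed from the counts above. The function name `failure_rate` is just for illustration.

```python
def failure_rate(failed: int, total: int) -> float:
    """Fraction of scenarios that failed."""
    return failed / total


# Counts taken from the Impact list above.
scenarios = {"0.5B": (10, 47), "1.5B": (10, 44)}

for model, (failed, total) in scenarios.items():
    print(f"{model}: {failed}/{total} = {failure_rate(failed, total):.1%}")
```

This gives roughly 21.3% for the 0.5B model and 22.7% for the 1.5B model, so both are slightly above the 20% quoted.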

Links

  • Evidence: apr-model-qa-playbook/output/mvp-0.5b/evidence.json
  • Evidence: apr-model-qa-playbook/output/mvp-1.5b/evidence.json
