Skip to content

[BUG-APR-002] APR format missing tokenizer - shows placeholder text #156

@noahgift

Description

@noahgift

Summary

APR format inference shows "[N tokens generated, tokenizer not found]" instead of decoded text.

Evidence

apr run model.apr --prompt "What is 2+2?"
Output: "[10 tokens generated, tokenizer not found]"

Root Cause

APR format doesn't bundle the tokenizer. The AprV2Model::load_tokenizer() returns None.

Acceptance Criteria

  • APR files include embedded tokenizer OR load from sibling tokenizer.json
  • apr run model.apr shows decoded text output
  • QA matrix CPU×APR passes (15/15 points)
  • QA matrix GPU×APR passes (15/15 points)

Proposed Fix

  1. Option A: Embed tokenizer.json in APR format (increases file size)
  2. Option B: Load tokenizer from sibling file (requires tokenizer.json next to .apr)
  3. Option C: Download tokenizer from HuggingFace if model has hf_repo metadata

Files to Modify

  • realizar/src/apr/mod.rs - AprV2Model::load_tokenizer
  • realizar/src/infer/mod.rs - run_apr_inference
  • aprender/src/format/apr.rs - APR format spec

Labels

bug, P1, tokenizer, apr-format

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions