Mixed-quant GGUF investigation from capstone workflow: Q5 dequant bug and APR prompt templating bug #1623

@MyatKaung

Context

I am currently learning production ML with Hugging Face through Coursera and working on a capstone project. While following the project instructions, I ran into broken APR/SafeTensors behavior in paiml/aprender, which led me to investigate the conversion and inference path.

So far, I have found two separate issues.

Findings

  1. There is a real GGUF Q5 dequantization bug in:
    crates/aprender-core/src/format/gguf/dequant.rs

Affected functions:

  • dequantize_q5_0
  • dequantize_q5_1

Problem:

  • values were emitted in the wrong order
  • Q5 high bits were indexed incorrectly

Fix:

  • match GGML / llama.cpp layout exactly
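For reference, the GGML Q5_0 layout can be sketched as below. This is a minimal illustration, not the patch itself: `dequantize_q5_0_block` is a hypothetical helper, and the scale `d` is assumed to be already converted from f16 to f32. Each block holds 32 values. Byte `j` of `qs` carries the low 4 bits of value `j` (low nibble) and of value `j + 16` (high nibble), and bit `i` of `qh` is the fifth bit of value `i`. Q5_1 uses the same bit layout but, as far as I understand the reference code, adds a per-block minimum instead of subtracting 16.

```rust
/// Number of values in one Q5_0 block (QK5_0 in GGML).
const QK5_0: usize = 32;

/// Dequantize a single Q5_0 block following the GGML reference layout.
///
/// Hypothetical helper for illustration: `d` is the block scale already
/// converted to f32, `qh` holds the 32 high bits (one per value), and
/// `qs` packs two 4-bit low parts per byte.
fn dequantize_q5_0_block(d: f32, qh: u32, qs: &[u8; QK5_0 / 2]) -> [f32; QK5_0] {
    let mut out = [0.0f32; QK5_0];
    for j in 0..QK5_0 / 2 {
        // The fifth bit of value j is bit j of qh;
        // the fifth bit of value j + 16 is bit j + 16 of qh.
        let hi0 = (((qh >> j) & 1) as u8) << 4;
        let hi1 = (((qh >> (j + 16)) & 1) as u8) << 4;

        // Combine low nibble and high bit, then recenter from [0, 31] to [-16, 15].
        let x0 = i32::from((qs[j] & 0x0F) | hi0) - 16;
        let x1 = i32::from((qs[j] >> 4) | hi1) - 16;

        // Low nibbles fill the first half of the block, high nibbles the second.
        out[j] = x0 as f32 * d;
        out[j + QK5_0 / 2] = x1 as f32 * d;
    }
    out
}
```

The two failure modes listed above map onto the last four lines of the loop: writing `x0` and `x1` to adjacent slots (`2 * j`, `2 * j + 1`) gives the wrong order, and reading the high bit for `x1` from any position other than `j + 16` gives wrong magnitudes.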
  2. There is also an APR prompt-preparation bug in:
    crates/aprender-serve/src/infer/mod.rs

Affected function:

  • prepare_tokens_apr()

Problem:

  • base APR models were being auto-wrapped in chat template form based on architecture family (qwen2, llama, etc.) or presence of ChatML special tokens in vocab
  • this is too broad for base completion models like qwen2.5-coder-0.5b

Fix:

  • apply APR chat wrapping only when the filename carries an explicit hint such as instruct or -chat
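The narrowed rule can be sketched as a small predicate. This is an illustration under assumptions, not the code in prepare_tokens_apr(): the function name `filename_hints_chat_model` is hypothetical, and case-insensitive matching is my own choice here.

```rust
/// Hypothetical helper: decide whether to wrap the prompt in a chat template
/// using only explicit hints in the model filename, not the architecture
/// family or the presence of ChatML tokens in the vocabulary.
fn filename_hints_chat_model(filename: &str) -> bool {
    // Assumption: hints are matched case-insensitively.
    let name = filename.to_ascii_lowercase();
    name.contains("instruct") || name.contains("-chat")
}
```

With this rule, a base model file such as `qwen2.5-coder-0.5b.apr` (the `.apr` extension is assumed for the example) receives the raw prompt, while a name containing `instruct` or `-chat` opts into wrapping.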

Important conclusion

After the Q5 fix, I verified that the fallback conversion path itself was numerically clean:

  • GGUF vs APR tensors matched
  • GGUF vs SafeTensors tensors matched

So the remaining apparent “garbage inference” was not conversion corruption. It came from APR prompt templating.
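A tensor comparison of this kind can be as simple as a maximum absolute difference over the dequantized values. The sketch below is a generic illustration, not the exact check I ran; `max_abs_diff` is a hypothetical helper and assumes both tensors are already available as flat `f32` slices.

```rust
/// Hypothetical helper: largest element-wise absolute difference between two
/// flattened tensors, or None when their lengths differ.
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    // f32::max returns the non-NaN operand, so NaN differences are skipped;
    // check for NaNs separately if that matters.
    Some(
        a.iter()
            .zip(b)
            .map(|(x, y)| (x - y).abs())
            .fold(0.0f32, f32::max),
    )
}
```

A result of `Some(0.0)` for every tensor pair means the converted weights are bit-for-bit equivalent after dequantization, which rules out conversion as the source of bad output.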

Validation

  • cargo build -p aprender-core passed after the Q5 fix
  • direct APR-loaded inference was coherent
  • apr run ... '2+2=' --temperature 0 --no-gpu stopped producing garbage-looking output after the APR prompt-prep fix

Contribution status

I already split the work into two clean branches on my fork:

  • fix-q5-dequant
  • fix-apr-prompt-template

Because PRs from non-authorized contributors are auto-closed here, I’m opening this issue first to ask for the preferred contribution path.

If useful, I can provide:

  • the exact patch
  • branch links from my fork
  • a more detailed repro and investigation summary
