Context
I am currently learning production ML with Hugging Face through Coursera and working on a capstone project. While following the project instructions, I ran into broken APR/SafeTensors behavior in paiml/aprender, which led me to investigate the conversion and inference path.
So far, I have found two separate issues.
Findings
There is a real GGUF Q5 dequantization bug in crates/aprender-core/src/format/gguf/dequant.rs.
Affected functions:
- dequantize_q5_0
- dequantize_q5_1
Problem:
- values were emitted in the wrong order
- Q5 high bits were indexed incorrectly
Fix:
- match the GGML / llama.cpp layout exactly
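For reference, the GGML / llama.cpp layout can be illustrated with a minimal Rust sketch. It assumes ggml's block_q5_0 and block_q5_1 definitions (little-endian, 32 values per block). It is not the aprender implementation, and q5_0_to_f32, q5_1_to_f32, unpack_q5 and f16_to_f32 are hypothetical names. It covers both problems listed above:
- Output order: the low nibbles of qs fill elements 0 to 15 and the high nibbles fill elements 16 to 31. The two halves are not interleaved.
- High-bit indexing: element j takes bit j of qh, and element j + 16 takes bit j + 16.

```rust
// Sketch only: GGML-style Q5_0 / Q5_1 dequantization, not the aprender code.
// Assumed little-endian block layouts (ggml's block_q5_0 / block_q5_1):
//   Q5_0: d: f16 | qh: [u8; 4] | qs: [u8; 16]          -> 22 bytes / 32 values
//   Q5_1: d: f16 | m: f16 | qh: [u8; 4] | qs: [u8; 16] -> 24 bytes / 32 values

/// Number of values per Q5 block.
const QK5: usize = 32;

/// Convert IEEE 754 half-precision bits to f32.
fn f16_to_f32(bits: u16) -> f32 {
    let sign: f32 = if bits & 0x8000 != 0 { -1.0 } else { 1.0 };
    let exp = ((bits >> 10) & 0x1f) as i32;
    let mant = (bits & 0x3ff) as f32;
    let mag = match exp {
        0 => mant * 2f32.powi(-24), // zero or subnormal
        31 => {
            if mant == 0.0 {
                f32::INFINITY
            } else {
                f32::NAN
            }
        }
        _ => (1.0 + mant / 1024.0) * 2f32.powi(exp - 15),
    };
    sign * mag
}

/// Rebuild the 32 unsigned 5-bit quants of one block in GGML order.
/// Element j (j < 16): low nibble of qs[j], fifth bit = bit j of qh.
/// Element j + 16:     high nibble of qs[j], fifth bit = bit j + 16 of qh.
fn unpack_q5(qh: &[u8], qs: &[u8]) -> [u8; QK5] {
    let qh = u32::from_le_bytes([qh[0], qh[1], qh[2], qh[3]]);
    let mut q = [0u8; QK5];
    for j in 0..QK5 / 2 {
        let hi_lo = (((qh >> j) & 1) << 4) as u8;
        let hi_hi = (((qh >> (j + 16)) & 1) << 4) as u8;
        q[j] = (qs[j] & 0x0f) | hi_lo;
        q[j + QK5 / 2] = (qs[j] >> 4) | hi_hi;
    }
    q
}

/// Dequantize Q5_0 data: x = (q - 16) * d.
fn q5_0_to_f32(data: &[u8]) -> Vec<f32> {
    const BLOCK: usize = 2 + 4 + 16;
    assert!(data.len() % BLOCK == 0, "Q5_0 data must be whole 22-byte blocks");
    let mut out = Vec::with_capacity(data.len() / BLOCK * QK5);
    for b in data.chunks_exact(BLOCK) {
        let d = f16_to_f32(u16::from_le_bytes([b[0], b[1]]));
        for q in unpack_q5(&b[2..6], &b[6..22]) {
            out.push((q as i32 - 16) as f32 * d);
        }
    }
    out
}

/// Dequantize Q5_1 data: x = q * d + m.
fn q5_1_to_f32(data: &[u8]) -> Vec<f32> {
    const BLOCK: usize = 2 + 2 + 4 + 16;
    assert!(data.len() % BLOCK == 0, "Q5_1 data must be whole 24-byte blocks");
    let mut out = Vec::with_capacity(data.len() / BLOCK * QK5);
    for b in data.chunks_exact(BLOCK) {
        let d = f16_to_f32(u16::from_le_bytes([b[0], b[1]]));
        let m = f16_to_f32(u16::from_le_bytes([b[2], b[3]]));
        for q in unpack_q5(&b[4..8], &b[8..24]) {
            out.push(q as f32 * d + m);
        }
    }
    out
}
```

Sharing unpack_q5 keeps the bit layout in one place. The two formats then differ only in their header (d alone versus d and m) and in the affine step ((q - 16) * d versus q * d + m).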
There is also an APR prompt-preparation bug in crates/aprender-serve/src/infer/mod.rs.
Affected function:
- prepare_tokens_apr()
Problem:
- base APR models were being auto-wrapped in a chat template based on their architecture family (qwen2, llama, etc.) or on the presence of ChatML special tokens in the vocab
- this is too broad for base completion models such as qwen2.5-coder-0.5b
Fix:
- only apply APR chat wrapping when the filename contains an explicit hint such as instruct or -chat
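The filename-only rule can be sketched in a minimal way under two assumptions. First, the decision is made from the final component of the model path. Second, the match is case-insensitive. The real prepare_tokens_apr() may receive different inputs, and should_wrap_in_chat_template is a hypothetical name.

```rust
use std::path::Path;

/// Hypothetical helper (not aprender's API): decide whether a prompt should be
/// wrapped in a chat template using only explicit hints in the model's file name.
/// Architecture family and vocabulary contents are intentionally not consulted.
fn should_wrap_in_chat_template(model_path: &str) -> bool {
    // Look at the final path component only, so a directory called
    // "instruct/" does not turn a base model into a chat model.
    let file_name = Path::new(model_path)
        .file_name()
        .and_then(|name| name.to_str())
        .unwrap_or("")
        .to_ascii_lowercase();
    file_name.contains("instruct") || file_name.contains("-chat")
}
```

The sketch deliberately ignores architecture and vocabulary, because base and instruct variants of the same model family can share both. Qwen2.5 base and instruct models, for example, use the same qwen2 architecture and tokenizer.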
Important conclusion
After the Q5 fix, I verified that the fallback conversion path itself was numerically clean:
- GGUF vs APR tensors matched
- GGUF vs SafeTensors tensors matched
So the remaining “garbage inference” was not caused by conversion corruption; it came from APR prompt templating.
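A tensor-match check like the one above can be done as an element-wise comparison. The following is one possible way to do it, not necessarily how the check above was run. It assumes both tensors are already loaded as flat f32 slices. max_abs_diff and tensors_match are hypothetical helpers, not aprender APIs. The tolerance is a free parameter, and a value of 0.0 requires exact equality.

```rust
/// Hypothetical helper (not an aprender API): largest absolute element-wise
/// difference between two flat f32 tensors, or None if their lengths differ.
/// NaN differences propagate, so a NaN in either tensor makes the result NaN.
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let worst = a
        .iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0f32, |acc, d| if d.is_nan() || d > acc { d } else { acc });
    Some(worst)
}

/// True when lengths agree and every element differs by at most `tol`.
/// A NaN result fails the `<=` comparison, so it never counts as a match.
fn tensors_match(a: &[f32], b: &[f32], tol: f32) -> bool {
    matches!(max_abs_diff(a, b), Some(d) if d <= tol)
}
```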
Validation
- cargo build -p aprender-core passed after the Q5 fix
- direct APR-loaded inference was coherent
- apr run ... '2+2=' --temperature 0 --no-gpu stopped producing garbage-looking output after the APR prompt-prep fix
Contribution status
I have already split the work into two clean branches on my fork:
- fix-q5-dequant
- fix-apr-prompt-template
Because PRs from non-authorized contributors are auto-closed here, I’m opening this issue first to ask for the preferred contribution path.
If useful, I can provide:
- the exact patch
- branch links from my fork
- a more detailed repro and investigation summary