Mixed-quant GGUF investigation from capstone workflow: Q5 dequant bug and APR prompt templating bug #1623

## Context

I am currently learning production ML with Hugging Face through Coursera and working on a capstone project. While following the project instructions, I ran into broken APR/SafeTensors behavior in `paiml/aprender`, which led me to investigate the conversion and inference path.

So far, I have found two separate issues.

## Findings

1. There is a real GGUF Q5 dequantization bug in:
`crates/aprender-core/src/format/gguf/dequant.rs`

Affected functions:
- `dequantize_q5_0`
- `dequantize_q5_1`

Problem:
- values were emitted in the wrong order
- Q5 high bits were indexed incorrectly

Fix:
- match GGML / llama.cpp layout exactly
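For reference, here is a minimal sketch of the Q5_0 block layout as I understand it from the GGML reference implementation. It is not the code in `dequant.rs`: the function name and signature are hypothetical, and I assume the block scale has already been converted from f16 to f32 by the caller. Each 32-value block stores 16 bytes of low nibbles plus a 32-bit field of high bits; byte `j` yields element `j` (low nibble) and element `j + 16` (high nibble), and their fifth bits come from bits `j` and `j + 16` of the high-bit field.

```rust
/// Number of values in one Q5_0 block (QK5_0 in GGML).
const QK: usize = 32;

/// Dequantize one Q5_0 block following the GGML reference layout.
///
/// Hypothetical helper for illustration, not the aprender signature.
/// `d`  : block scale, assumed already decoded from f16 to f32.
/// `qh` : the 32 high (fifth) bits, stored little-endian.
/// `qs` : 16 bytes, each holding two 4-bit low parts.
pub fn dequantize_q5_0_block(d: f32, qh: [u8; 4], qs: &[u8; QK / 2]) -> [f32; QK] {
    let qh = u32::from_le_bytes(qh);
    let mut out = [0.0f32; QK];
    for j in 0..QK / 2 {
        // Bit j belongs to element j; bit j + 16 belongs to element j + 16.
        let hi0 = ((qh >> j) & 1) as u8;
        let hi1 = ((qh >> (j + 16)) & 1) as u8;
        // Combine 4 low bits with the high bit, then recenter from [0, 31] to [-16, 15].
        let x0 = ((qs[j] & 0x0F) | (hi0 << 4)) as i32 - 16;
        let x1 = ((qs[j] >> 4) | (hi1 << 4)) as i32 - 16;
        // Low nibbles fill the first half of the block, high nibbles the second half.
        out[j] = x0 as f32 * d;
        out[j + QK / 2] = x1 as f32 * d;
    }
    out
}
```

Q5_1 uses the same bit layout, but to my understanding it stores an additional per-block minimum `m` and computes `x * d + m` without the `- 16` offset.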

2. There is also an APR prompt-preparation bug in:
`crates/aprender-serve/src/infer/mod.rs`

Affected function:
- `prepare_tokens_apr()`

Problem:
- base APR models were automatically wrapped in a chat template whenever the architecture family matched (`qwen2`, `llama`, etc.) or the vocab contained ChatML special tokens
- this heuristic is too broad: it also catches base completion models such as `qwen2.5-coder-0.5b`

Fix:
- only apply APR chat wrapping when the filename contains an explicit hint such as `instruct` or `-chat`
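The filename rule can be sketched roughly as follows. This is an illustration of the idea, not the actual change in `prepare_tokens_apr()`: the function name is hypothetical, and the case-insensitive match is my assumption.

```rust
use std::path::Path;

/// Return true only when the model filename carries an explicit chat hint.
///
/// Hypothetical helper for illustration. Matching is case-insensitive
/// (an assumption) and looks only at the final path component, so a
/// directory called `instruct/` does not trigger chat wrapping.
pub fn filename_suggests_chat_model(path: &Path) -> bool {
    let name = path
        .file_name()
        .map(|n| n.to_string_lossy().to_lowercase())
        .unwrap_or_default();
    ["instruct", "-chat"].iter().any(|hint| name.contains(hint))
}
```

With this rule, a base model file such as `qwen2.5-coder-0.5b.apr` is passed through as a plain completion prompt, while a file named like `Qwen2.5-Coder-0.5B-Instruct.apr` would still get the chat template (both filenames are examples, not files from the repo).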

## Important conclusion

After the Q5 fix, I verified that the fallback conversion path itself was numerically clean:
- GGUF vs APR tensors matched
- GGUF vs SafeTensors tensors matched

So the remaining apparent “garbage inference” was not conversion corruption. It came from APR prompt templating.
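A tensor comparison of this kind can be as simple as a maximum absolute difference over the dequantized values. The helper below is a hypothetical sketch of that check, not the exact tooling I used; it assumes both tensors are already available as flat `f32` slices.

```rust
/// Largest element-wise absolute difference between two tensors.
///
/// Hypothetical helper for illustration. Returns `None` when the
/// lengths differ, since that already indicates a conversion mismatch.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(
        a.iter()
            .zip(b)
            .map(|(x, y)| (x - y).abs())
            .fold(0.0f32, f32::max),
    )
}
```

A result of `Some(0.0)`, or a value within the expected rounding tolerance, for every tensor pair indicates the conversion preserved the weights.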

## Validation

- `cargo build -p aprender-core` passed after the Q5 fix
- direct APR-loaded inference was coherent
- `apr run ... '2+2=' --temperature 0 --no-gpu` stopped producing garbage-looking output after the APR prompt-prep fix

## Contribution status

I already split the work into two clean branches on my fork:

- `fix-q5-dequant`
- `fix-apr-prompt-template`

Because PRs from non-authorized contributors are auto-closed here, I’m opening this issue first to ask for the preferred contribution path.

If useful, I can provide:
- the exact patch
- branch links from my fork
- a more detailed repro and investigation summary


