I suspect this has to do with the changes in #170 and I believe it might have to do with EOS markers, but I can be wrong:
apr chat qwen2.5-1.5b-instruct-q4_k_m.apr
=== Model Chat (APR Format) ===
Using APR v2 format with mmap (Native Library Mandate)
Model: qwen2.5-1.5b-instruct-q4_k_m.apr
Chat Template: ChatML
Temperature: 0.7
Top-P: 0.9
Max Tokens: 512
Commands:
/quit Exit the chat
/clear Clear conversation history
/system Set system prompt
/help Show help
════════════════════════════════════════════════════════════
Loading model...
Loaded APR format in 0.15s (1113.2 MB)
Loaded tokenizer: tokenizer.json (151936 tokens)
Detected Raw chat template
You: hey
[AprV2ModelCuda] Built indexed weights for 28 layers
[AprV2ModelCuda] Pre-cached 5596 MB of weights on GPU (28 layers, 197 quantized, 112 F32 tensors)
[AprV2ModelCuda] Cached embedding table: 890 MB
[APR CUDA: NVIDIA GeForce RTX 4090 (24077 MB VRAM)]
Assistant: ,VILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLE:VILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLE:VILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLE:VILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLE:VILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLE:VILLEVILLEVILLEVILLEVILLEVILLE:VILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLE:VILLE:VILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLE:VILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLE:VILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLE:VILLE:VILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLE:VILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLEVILLE
You:
I suspect this has to do with the changes in #170 and I believe it might have to do with EOS markers, but I can be wrong: