`apr chat` no longer works with qwen2.5-1.5b-instruct #170

After importing the model and running `apr chat`, the model appears to load correctly onto the GPU but never returns an answer. GPU usage cycles between 0% and 70% a few times over about 2 minutes, and then the response comes back empty.

If I try again, the same cycle repeats. The whole interaction below took about 6 minutes:

```
apr chat qwen2.5-1.5b-instruct-q4_k_m.apr

=== Model Chat (APR Format) ===

Using APR v2 format with mmap (Native Library Mandate)

  Model: qwen2.5-1.5b-instruct-q4_k_m.apr
  Chat Template: ChatML
  Temperature: 0.7
  Top-P: 0.9
  Max Tokens: 512

Commands:
  /quit     Exit the chat
  /clear    Clear conversation history
  /system   Set system prompt
  /help     Show help

════════════════════════════════════════════════════════════

Loading model...
Loaded APR format in 0.15s (1113.2 MB)
Loaded tokenizer: tokenizer.json (151936 tokens)
Detected Raw chat template
You: hey
[AprV2ModelCuda] Pre-cached 5596 MB of weights on GPU (28 layers, 0 quantized, 308 F32 tensors)
[AprV2ModelCuda] Cached embedding table: 125 MB
[APR CUDA: NVIDIA GeForce RTX 4090 (24077 MB VRAM)]
Assistant:

You: hey
[AprV2ModelCuda] Pre-cached 5596 MB of weights on GPU (28 layers, 0 quantized, 308 F32 tensors)
[AprV2ModelCuda] Cached embedding table: 125 MB
[APR CUDA: NVIDIA GeForce RTX 4090 (24077 MB VRAM)]
Assistant:

You:
```
