apr chat no longer works with qwen2.5-1.5b-instruct #170

@alfredodeza

Description

After importing the model and running apr chat, the model appears to load correctly on the GPU but never returns an answer. GPU usage cycles a few times between 0% and 70% for about 2 minutes, and then the response comes back empty.

If I try again, the same cycle repeats. This whole interaction took about 6 minutes:

apr chat qwen2.5-1.5b-instruct-q4_k_m.apr

=== Model Chat (APR Format) ===

Using APR v2 format with mmap (Native Library Mandate)

  Model: qwen2.5-1.5b-instruct-q4_k_m.apr
  Chat Template: ChatML
  Temperature: 0.7
  Top-P: 0.9
  Max Tokens: 512

Commands:
  /quit     Exit the chat
  /clear    Clear conversation history
  /system   Set system prompt
  /help     Show help

════════════════════════════════════════════════════════════

Loading model...
Loaded APR format in 0.15s (1113.2 MB)
Loaded tokenizer: tokenizer.json (151936 tokens)
Detected Raw chat template
You: hey
[AprV2ModelCuda] Pre-cached 5596 MB of weights on GPU (28 layers, 0 quantized, 308 F32 tensors)
[AprV2ModelCuda] Cached embedding table: 125 MB
[APR CUDA: NVIDIA GeForce RTX 4090 (24077 MB VRAM)]
Assistant:

You: hey
[AprV2ModelCuda] Pre-cached 5596 MB of weights on GPU (28 layers, 0 quantized, 308 F32 tensors)
[AprV2ModelCuda] Cached embedding table: 125 MB
[APR CUDA: NVIDIA GeForce RTX 4090 (24077 MB VRAM)]
Assistant:

You:
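
One detail in the output above may be worth checking: the file is a q4_k_m model reported as 1113.2 MB on load, yet the CUDA path reports 0 quantized tensors, 308 F32 tensors, and 5596 MB cached on the GPU. The Python sketch below only extracts those numbers from the two log lines and compares them. The helper name `parse_sizes` is hypothetical and not part of apr, and whether this size difference is related to the empty response is an assumption, not something the log confirms.

```python
import re

# Two lines copied verbatim from the session output above.
LOG = """\
Loaded APR format in 0.15s (1113.2 MB)
[AprV2ModelCuda] Pre-cached 5596 MB of weights on GPU (28 layers, 0 quantized, 308 F32 tensors)
"""


def parse_sizes(log: str) -> dict:
    """Extract the reported load size and GPU cache figures from the log text.

    Hypothetical helper for illustration only; it is not part of apr.
    """
    loaded = re.search(r"Loaded APR format in [\d.]+s \(([\d.]+) MB\)", log)
    cached = re.search(
        r"Pre-cached (\d+) MB of weights on GPU "
        r"\((\d+) layers, (\d+) quantized, (\d+) F32 tensors\)",
        log,
    )
    if loaded is None or cached is None:
        raise ValueError("expected log lines not found")

    loaded_mb = float(loaded.group(1))
    gpu_mb = int(cached.group(1))
    return {
        "loaded_mb": loaded_mb,
        "gpu_mb": gpu_mb,
        "layers": int(cached.group(2)),
        "quantized_tensors": int(cached.group(3)),
        "f32_tensors": int(cached.group(4)),
        # How much larger the GPU cache is than the size reported at load time.
        "expansion_ratio": gpu_mb / loaded_mb,
    }


if __name__ == "__main__":
    info = parse_sizes(LOG)
    print(f"reported at load: {info['loaded_mb']} MB")
    print(f"cached on GPU:    {info['gpu_mb']} MB")
    print(f"quantized tensors on GPU: {info['quantized_tensors']}")
    print(f"expansion ratio: {info['expansion_ratio']:.2f}x")
```

A ratio of roughly 5x with zero quantized tensors would be consistent with the weights being expanded to F32 before upload, but that reading is a guess based on these two lines only.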
