forked from ggml-org/llama.cpp
Closed
Description
Describe the Issue
I have an AMD and an NVIDIA card, so I often use Vulkan to load a larger model. Unfortunately, it stopped working properly for me:
- Llama 3.1 models (model split across both cards, or on the AMD card only) output nonsense immediately or after the first sentence. A single 1080 Ti works, but much more slowly than before; the Radeon sometimes works and sometimes doesn't (no errors in the console).
- Mistral Small 24B: split across the two cards, it loops in its first response, getting stuck on a word that it repeats endlessly.
- Generally, splitting a model across two cards often breaks generation, either immediately or after some time. The AMD card seems more affected than the GTX. (The AMD card sometimes doesn't work properly on Vulkan even on its own; I don't know why. Clearing the cache and reinstalling the drivers didn't change anything.)
- The problem does not occur in the previous version.
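For context on the failing multi-GPU setup: the exact command wasn't shared, so the model path and split ratios below are illustrative placeholders, assuming a recent llama.cpp build with the Vulkan backend enabled:

```shell
# Hypothetical reproduction command; model path and ratios are placeholders.
# -ngl 99 offloads all layers to GPU; -sm layer splits the model's layers
# across the detected Vulkan devices; -ts sets the per-device split ratio
# (here roughly 60% on device 0, 40% on device 1).
./llama-cli -m models/Mistral-Small-24B-Q4_K_M.gguf \
    -ngl 99 \
    -sm layer \
    -ts 60,40 \
    -p "Hello"
```

Running the same model with `-ts 100,0` (or `-ts 0,100`) confines it to a single device, which can help separate the per-card failures described above from the split-related ones.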
Additional Information:
- Windows 10 (64-bit, updated)
- Latest NVIDIA and AMD drivers
- GPU: Radeon RX 6900 XT
- GPU: GTX 1080 Ti