Closed
Labels
Nvidia GPU (Issues specific to Nvidia GPUs), bug (Something isn't working)
Description
It is already possible on master to quantize the K cache, e.g. via -ctk q8_0. However, this is currently broken on master for batch size 1. Disabling CUDA graphs by setting the environment variable GGML_CUDA_DISABLE_GRAPHS=1 works around the issue.
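A minimal sketch of the workaround, for anyone who wants to reproduce it. Only -ctk q8_0 and GGML_CUDA_DISABLE_GRAPHS=1 come from this report; the binary path, model path, prompt, and token count are placeholders (assumptions) and need to be adjusted to the local build.

```shell
#!/bin/sh
# Disable CUDA graphs for this shell session (workaround from the report).
export GGML_CUDA_DISABLE_GRAPHS=1

# Hypothetical paths: adjust to your local build and model file.
LLAMA_BIN="${LLAMA_BIN:-./llama-cli}"
MODEL="${MODEL:-model.gguf}"

# Single-sequence generation (batch size 1) with a q8_0-quantized K cache.
if [ -x "$LLAMA_BIN" ]; then
    "$LLAMA_BIN" -m "$MODEL" -ctk q8_0 -p "Hello" -n 32
else
    echo "binary not found: $LLAMA_BIN"
fi
```

Running the same command without the environment variable should show the broken behavior for comparison.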
cc: @agray3