Fixed OpenLLaMA 3b CUDA mul_mat_vec_q #2144
Conversation
The generation looks fine, but …
force-pushed from f437f6a to e6b7a4f
Thank you for pointing this out, I should have checked it. The value for …
force-pushed from e6b7a4f to 52f90f2
I added another change: the padding is now memset to 0. Though unlikely, it is possible for the unset memory to encode a NaN, which could make the sum over the entire row NaN.
So, if I understand correctly, the code depends on the value of …
I see your point. How about just adding another define that controls the size to which the vector and the last row are extended? I would prefer not to increase …
Sure, that sounds even better.
force-pushed from 52f90f2 to 518c822
force-pushed from 518c822 to a7ce53f
Fixes #2136. The issue was that the weight tensors had row sizes that are not multiples of 128. I fixed it by padding the quantized vector and the last row of the weight tensors to a multiple of 128. This is preferable to adding bounds checks to the CUDA kernels since it has better performance.