CUDA: avoid mul + bias fusion when buffers are split #16935
am17an merged 1 commit into ggml-org:master
Conversation
IMbackK
left a comment
Yes, unsurprisingly, since this just disables fusion in this case, it fixes the issue.
It is unlikely that fusion would help in this case anyway.
We should probably have some multi-GPU unit tests to catch this sort of thing.
@am17an should we merge this?
I'm wondering whether we should just disable fusion outright if we detect any buffer is split, or
At least for
If there are issues with any (see llama.cpp/ggml/src/ggml-cuda/mmvq.cu, lines 656 to 665 in 070ff4d)
I think we already check this with
To be clear, there have been no crashes reported with
What I mean is that the padding is being cleared for
More generally,
Works fine here (present bug aside). I think the perception that it doesn't work comes from ROCr having had multiple bugs relating to handling various P2P scenarios.
I am merging this as it solves quite a bunch of
Fix #16799. When fusing just a mul-mat + bias, we didn't check whether the buffer is split; we already perform this check when fusing gate + up. Tested on 3x 4090 with gpt-oss-120b.