Conversation
Fuse adds that have the same shape, which are common in MoE models. It will currently fuse up to 6 adds, because we assume no more than 8 descriptors per dispatch. But this could be changed.
0cc4m left a comment
Looks good on AMD and Nvidia, but I can't get it to run on Intel.
terminate called after throwing an instance of 'vk::DeviceLostError'
what(): vk::Device::waitForFences: ErrorDeviceLost
I'll investigate further later.
Strange. Any validation failures? Does the backend test fail, or just in real models?
Yeah, the test fails too on Intel:
Edit: No validation failures. Probably a driver bug.
Shall I just disable the optimization for Intel?
Yeah, I don't see why it's failing.
Hi @0cc4m. I wanted to test the crash you were seeing on the Intel GPU, but so far I haven't been able to reproduce it. How exactly were you testing this? The test I ran was the following:
diff --git a/ggml/src/ggml-vulkan/ggml-vulkan.cpp b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
index 7ef93806..24ede177 100644
--- a/ggml/src/ggml-vulkan/ggml-vulkan.cpp
+++ b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
@@ -3575,7 +3575,7 @@ static vk_device ggml_vk_get_device(size_t idx) {
device->multi_add = vk12_props.shaderRoundingModeRTEFloat16 &&
device->properties.limits.maxPushConstantsSize >= sizeof(vk_op_multi_add_push_constants) &&
vk12_features.runtimeDescriptorArray &&
- device->vendor_id != VK_VENDOR_ID_INTEL &&
+ // device->vendor_id != VK_VENDOR_ID_INTEL &&
getenv("GGML_VK_DISABLE_MULTI_ADD") == nullptr;
if (device->subgroup_size_control) {

Execution log as follows:
Hi @rillomas, I ran this on Linux; from past reports I have already gathered that the Linux ANV driver is more unstable than the proprietary Windows driver. I can reproduce the crash with your diff like this:

Crash log:

It works if I disable multi_add using …

Also, test-backend-ops fails in the test that was added in this PR:

Can you run this with …?

Environment:
CPU: AMD EPYC 7302

Let me know if you need more info.
@0cc4m Log output:
Is the proper way to report this directly in the Mesa issue tracker, or do you have a more direct connection to the driver team?
I'm a Windows guy, so I don't have connections with the Linux driver team. I can check, but it's probably better to report to Mesa first.
If this is working reliably on the Windows driver, we could change the check to use VkDriverId rather than the vendor ID.
That would be great, @jeffbolznv!

Test results: I tested on 5 Intel platforms (Alder Lake, Meteor Lake, Lunar Lake, Alchemist, Battlemage) and they all worked fine with multi-add enabled. Performance-wise I saw a slight improvement on some platforms (though I'm seeing a different issue on Battlemage where pp512 is slow with gpt-oss). At least we should be OK enabling multi-add for Windows.

llama-bench results:
I was able to connect with the Linux driver team and filed an issue in Mesa. They are currently taking a look.
Cool, thank you. I'll monitor the issue and help if needed. |
@jeffbolznv Can you look into this? https://gitlab.freedesktop.org/mesa/mesa/-/issues/13742#note_3170199
I agree this is a bug; I'll fix it soon.
* vulkan: fuse adds

  Fuse adds that have the same shape, which are common in MoE models. It will currently fuse up to 6 adds, because we assume no more than 8 descriptors per dispatch. But this could be changed.

* check runtimeDescriptorArray feature

* disable multi_add for Intel due to likely driver bug