llama : build windows releases with dl backends #13220
Conversation
Can you provide more details on the clang Vulkan issue and how to reproduce it (or maybe file an issue)? Did you end up just using msvc for Vulkan instead?
Yes, it is still building the Vulkan release with msvc, same as before, but (at least) it also has the multiple CPU variants, which should give it better compatibility with different CPUs. When I tried to build Vulkan with clang it failed with an error; here is the full log: https://github.com/slaren/llama.cpp/actions/runs/14762355462/job/41445824619
OK, I've heard about this before. I think clang puts the exe in a different folder, so we probably need some small change to the cmake file. I'll try to reproduce this soon.
I couldn't reproduce it locally. I suspect it has something to do with a message that appears while configuring cmake: it seems that cmake thinks it is cross-compiling and uses a different compiler to build the shader-gen? Not sure what's going on there.
I tried building locally, and while it eventually failed on a curl issue, it did get past the vulkan-shaders-gen part of the build. Looking at the log again, I noticed a mismatch of Debug vs Release; maybe this issue is specific to the ninja multi-config generator?
One possible issue with this change that I didn't realize at first is that the examples that are not compatible with `GGML_BACKEND_DL` will not work correctly with these releases. The most impactful of these are likely to be the llava examples and the rpc server. cc @ngxson @rgerganov Fixing this wouldn't be complicated; essentially, the affected tools need to load the backends through the registry, as in the sketch below.
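A minimal sketch of that kind of fix, assuming it boils down to loading the backend DLLs through the registry at startup (`ggml_backend_load_all` and the device enumeration calls are the real `ggml-backend.h` API; the standalone `main` is only illustrative):

```c
#include "ggml-backend.h"
#include <stdio.h>

int main(void) {
    // With GGML_BACKEND_DL, backends are built as separate DLLs that must be
    // loaded at runtime; this scans the known locations (e.g. the directory
    // of the executable) and registers every backend it finds.
    ggml_backend_load_all();

    // Once loaded, the registry exposes the devices of all the backends, so
    // the rest of the program can select one as usual.
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s\n", i, ggml_backend_dev_name(dev));
    }
    return 0;
}
```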
I am traveling and won't be able to address this in the next few days, sorry. You can exclude rpc-server as a stop-gap solution.
* origin/master: (27 commits)
  llama : fix build_ffn without gate (ggml-org#13336)
  CUDA: fix bad asserts for partial offload (ggml-org#13337)
  convert : qwen2/3moe : set yarn metadata if present (ggml-org#13331)
  CUDA: fix --split-mode row for MMQ (ggml-org#13323)
  gguf-py : avoid requiring pyside6 for other scripts (ggml-org#13036)
  CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)
  sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (ggml-org#13264)
  server : Webui - change setText command from parent window to also send the message. (ggml-org#13309)
  mtmd : rename llava directory to mtmd (ggml-org#13311)
  clip : fix confused naming ffn_up and ffn_down (ggml-org#13290)
  convert : bailingmoe : set yarn metadata if present (ggml-org#13312)
  SYCL: Disable mul_mat kernels for noncontiguous tensor b (ggml-org#13308)
  mtmd : add C public API (ggml-org#13184)
  rpc : use backend registry, support dl backends (ggml-org#13304)
  ggml : activate s390x simd for Q3_K (ggml-org#13301)
  llava/mtmd : fixes to fully support dl backends (ggml-org#13303)
  llama : build windows releases with dl backends (ggml-org#13220)
  CUDA: fix race condition in MMQ stream-k fixup (ggml-org#13299)
  CUDA: fix race condition in MMQ ids_dst (ggml-org#13294)
  vulkan: Additional type support for unary, binary, and copy (ggml-org#13266)
  ...
Changes:
- Use `GGML_BACKEND_DL` and `GGML_CPU_ALL_VARIANTS` to build the windows releases to enable dynamic loading of backends
- `evict-old-files`
- Fix `test-quantize-stats.cpp` with `GGML_BACKEND_DL`
- Remove `-march=native` from the llvm cmake toolchain file
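For context on the first item: with `GGML_BACKEND_DL`, each backend ships as its own DLL in the release package and is picked up at runtime instead of being linked in. A rough illustration of loading one backend explicitly (the DLL name below is hypothetical; `ggml_backend_load` and `ggml_backend_reg_name` are the actual registry calls in `ggml-backend.h`):

```c
#include "ggml-backend.h"
#include <stdio.h>

int main(void) {
    // Load a single backend DLL by path instead of scanning with
    // ggml_backend_load_all(); "ggml-vulkan.dll" is an illustrative name.
    ggml_backend_reg_t reg = ggml_backend_load("ggml-vulkan.dll");
    if (reg == NULL) {
        fprintf(stderr, "failed to load backend\n");
        return 1;
    }
    printf("loaded backend: %s\n", ggml_backend_reg_name(reg));
    return 0;
}
```

As far as I understand, this mechanism is also what makes `GGML_CPU_ALL_VARIANTS` useful: the loader can pick, among the bundled CPU backend DLLs, the variant best supported by the machine it runs on.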
Notes:
- Test run: https://github.com/slaren/llama.cpp/actions/runs/14762791544/job/41447243958
- Test release: https://github.com/slaren/llama.cpp/releases/tag/b5235