ci : refactor + optimize#3847
Conversation
Extract self-hosted runner jobs from build.yml into a dedicated build-self-hosted.yml following the llama.cpp pattern: - gpu-cuda (NVIDIA Linux) - gpu-vulkan-nvidia-cm (NVIDIA Linux) - gpu-vulkan-nvidia-cm2 (NVIDIA Linux + COOPMAT2) - gpu-metal (macOS ARM64) - gpu-vulkan (macOS ARM64) GitHub-hosted CPU jobs remain in build.yml. Assisted-by: llama.cpp:local pi
Extract release-related jobs from build.yml into a dedicated release.yml following the llama.cpp pattern: - determine-tag - windows (Win32/x64, SDL2) - windows-blas (Win32/x64, OpenBLAS) - windows-cublas (x64, CUDA 11.8/12.4) - ios-xcode-build - bindings-java (depends on windows) - release (artifact aggregation + GitHub release) CoreML job stays in build.yml with its own local tag calculation. Assisted-by: llama.cpp:local pi
Assisted-by: llama.cpp:local pi
This commit adds the --fail option to the model download scripts so that if the model download returns a server error this is picked up. This is then detected in run.sh and a error message is displayed and the script stops and returns an error. The motivation for this is that currently it is possible for the model download to fail but this script proceeds and instead of a model file the contents will be an html page probably with the error. This will then cause the model to not be able to load due to a missing magic number. I'm not sure we can do much about the downloading failing, perhaps a retry but at least this will give a clearer error message. Refs: https://github.com/danbev/whisper.cpp/actions/runs/26866349389/job/79230794512
This commit adds curl retry options to the model download script. The motivation is that currently when CI jobs run huggingface rate limit the requests and return: ```console curl: (22) The requested URL returned error: 429 ``` This is an attempt to work around this and if it does not work then we can an authorization token.
This job has been commented out as it has been flaky in the past. I'll monitor this and if it continues to be unreliable we can disable it in the github actions GUI instead of commenting it out like we did before.
The ccache will only be saved on pushed to master.
The motivation for this is that the save parameter does not seem to work with the current version.
This commit remove build-linux.yml as the same jobs are also run by build-gcc.yml, with the exception that build-gcc.yml also run ctest). So keeping build-gcc.yml and removing the redundant build-linux.yml.
This is currently causing the following failure: ```console sccache C:\PROGRA~1\NVIDIA~1\CUDA\v\bin\nvcc.exe -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_CRT_SECURE_NO_WARNINGS -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -DCMAKE_INTDIR=\"Release\" -ID:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\.. -ID:\a\whisper.cpp\whisper.cpp\ggml\src\..\include -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include" -Xcompiler="-MD -O2 -Ob2" -DNDEBUG -std=c++17 -arch=native -use_fast_math -extended-lambda -Xcompiler /Zc:preprocessor -MD -MT ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj -MF ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj.d -x cu -c D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\allreduce.cu -o ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj -Xcompiler=-Fdggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\,-FS sccache: encountered fatal error sccache: error: Could not parse shell line sccache: caused by: Could not parse shell line ``` Refs: https://github.com/danbev/whisper.cpp/actions/runs/26883673904/job/79290017353
This commit removes the tag form the linux release artifacts to be consistent with the existing artifacts. If we want to include the tag then we can do that in a follow-up PR.
|
As mentioned in 70a62ea I think we should add a HF_TOKEN to avoid the issue with rate limits for model downloads. This can happen to any of the jobs that download models and just having the retry does not seem to be enough. |
This is to avoid the HR rate limiting when downloading model.
|
I've skipped the freebsd job as this as not enabled previously (it was commented out). Instead of commenting it out I think we can just disable that job once this is merged. |
|
There are some naming inconsistencies at the moment regarding the CI jobs but I'll open a follow-up PR to fix that. |
|
@ggerganov I don't seem to be able to add you as a reviewer as you opened this PR. |
|
Some follow-up tasks:
|
|
I've disabled the freebsd job: https://github.com/ggml-org/whisper.cpp/actions/workflows/build-freebsd.yml |
|
Also, we have to do something about the Docker images. It does not make sense to build them on every commit. It's probably best to do it like in |
|
Pinning to commit hashes of 3rd-party repos should be OK. |
WIP
Started some refactoring of the CI workflows. The goal is to restructure similar to how the workflows are organized now in llama.cpp.
build.ymlinto separate workflow. Using fork.ccacheusage. Using fork.