ci : refactor + optimize by ggerganov · Pull Request #3847 · ggml-org/whisper.cpp

ggerganov · 2026-05-31T14:23:33Z

WIP

Started some refactoring of the CI workflows. The goal is to restructure similar to how the workflows are organized now in llama.cpp.

- Split build.yml into separate workflows. Using fork.
- Optimize ccache usage. Using fork.
- Remove obsolete/redundant jobs. Using fork.
- Add linux build artifacts to release. Example release
- Add HF_TOKEN as a secret and use it in the model download script to avoid huggingface rate limiting. Done in fork and in this repo.

Extract self-hosted runner jobs from build.yml into a dedicated build-self-hosted.yml following the llama.cpp pattern:
- gpu-cuda (NVIDIA Linux)
- gpu-vulkan-nvidia-cm (NVIDIA Linux)
- gpu-vulkan-nvidia-cm2 (NVIDIA Linux + COOPMAT2)
- gpu-metal (macOS ARM64)
- gpu-vulkan (macOS ARM64)

GitHub-hosted CPU jobs remain in build.yml.

Assisted-by: llama.cpp:local pi

Extract release-related jobs from build.yml into a dedicated release.yml following the llama.cpp pattern:
- determine-tag
- windows (Win32/x64, SDL2)
- windows-blas (Win32/x64, OpenBLAS)
- windows-cublas (x64, CUDA 11.8/12.4)
- ios-xcode-build
- bindings-java (depends on windows)
- release (artifact aggregation + GitHub release)

CoreML job stays in build.yml with its own local tag calculation.

Assisted-by: llama.cpp:local pi

Assisted-by: llama.cpp:local pi

This commit adds the --fail option to the model download scripts so that a server error returned by the model download is picked up. run.sh then detects the failure, displays an error message, and stops with an error.

The motivation is that the model download can currently fail while the script proceeds anyway; instead of a model file, the downloaded file then contains an HTML page, probably with the error. The model then fails to load because of a missing magic number. I'm not sure we can do much about the download failing, perhaps a retry, but at least this will give a clearer error message.

Refs: https://github.com/danbev/whisper.cpp/actions/runs/26866349389/job/79230794512

This commit adds curl retry options to the model download script. The motivation is that when CI jobs run, huggingface currently rate limits the requests and returns:
```console
curl: (22) The requested URL returned error: 429
```
This is an attempt to work around this, and if it does not work then we can add an authorization token.
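The two download changes described in these commits (failing on HTTP errors and retrying) could be combined in a helper roughly like the one below. This is a minimal sketch, not the actual script from this PR: the function name `download_model`, the `CURL` override variable, and the retry count and delay are assumptions made for illustration. `--fail`, `--location`, `--retry`, `--retry-delay`, and `--output` are standard curl options.

```shell
#!/bin/sh
# Hypothetical helper; not the repository's actual download script.
# CURL can be overridden (for example in tests); it defaults to the curl binary.
CURL="${CURL:-curl}"

# download_model URL OUTPUT_FILE
# Returns non-zero and prints an error when the download fails, so the caller
# never mistakes an HTML error page for a model file.
download_model() {
    url="$1"
    out="$2"
    # --fail: exit non-zero on HTTP error responses instead of saving the error page
    # --retry / --retry-delay: retry transient failures (values are assumptions)
    if ! "$CURL" --fail --location --retry 5 --retry-delay 10 \
            --output "$out" "$url"; then
        echo "error: failed to download $url" >&2
        rm -f "$out"   # remove any partial or empty output file
        return 1
    fi
}
```

A caller such as run.sh could then stop early with something like `download_model "$url" "$file" || exit 1`.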

The freebsd job has been commented out as it has been flaky in the past. I'll monitor this, and if it continues to be unreliable we can disable it in the github actions GUI instead of commenting it out like we did before.

The ccache will only be saved on pushes to master.
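The "save only on pushes to master" rule can be illustrated with a small shell check. This is only a sketch of the logic: in the workflow itself the condition is most likely written directly as a YAML expression, and the function name `should_save_ccache` is hypothetical. `GITHUB_EVENT_NAME` and `GITHUB_REF` are default environment variables that GitHub Actions sets for every run.

```shell
#!/bin/sh
# Hypothetical helper illustrating the condition; not taken from the workflows.
# Prints "true" only for push events on the master branch, "false" otherwise.
should_save_ccache() {
    if [ "${GITHUB_EVENT_NAME:-}" = "push" ] && \
       [ "${GITHUB_REF:-}" = "refs/heads/master" ]; then
        echo "true"
    else
        echo "false"
    fi
}
```

Pull requests would then still be able to restore the cache but would not write new entries, which keeps the total cache size down.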

The motivation for bumping ccache-action to v1.2.21 is that the save parameter does not seem to work with the current version.

This commit removes build-linux.yml, as the same jobs are also run by build-gcc.yml (with the exception that build-gcc.yml also runs ctest). So we keep build-gcc.yml and remove the redundant build-linux.yml.

This is currently causing the following failure:
```console
sccache C:\PROGRA~1\NVIDIA~1\CUDA\v\bin\nvcc.exe -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_CRT_SECURE_NO_WARNINGS -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -DCMAKE_INTDIR=\"Release\" -ID:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\.. -ID:\a\whisper.cpp\whisper.cpp\ggml\src\..\include -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include" -Xcompiler="-MD -O2 -Ob2" -DNDEBUG -std=c++17 -arch=native -use_fast_math -extended-lambda -Xcompiler /Zc:preprocessor -MD -MT ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj -MF ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj.d -x cu -c D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\allreduce.cu -o ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj -Xcompiler=-Fdggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\,-FS
sccache: encountered fatal error
sccache: error: Could not parse shell line
sccache: caused by: Could not parse shell line
```
Refs: https://github.com/danbev/whisper.cpp/actions/runs/26883673904/job/79290017353

This commit removes the tag from the linux release artifacts to be consistent with the existing artifacts. If we want to include the tag then we can do that in a follow-up PR.

danbev · 2026-06-03T14:02:50Z

As mentioned in 70a62ea I think we should add a HF_TOKEN to avoid the issue with rate limits for model downloads. This can happen to any of the jobs that download models and just having the retry does not seem to be enough.

This is to avoid the HF rate limiting when downloading models.
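One way the token could be passed to the download is as a bearer token in the `Authorization` header, which is how Hugging Face access tokens are normally sent. The sketch below is an assumption about the approach, not the script from this PR: the function name `fetch_with_token` and the `CURL` override are hypothetical, and the header is only added when `HF_TOKEN` is set so that runs without the secret (for example from forks) still work unauthenticated.

```shell
#!/bin/sh
# Hypothetical helper; not the repository's actual download script.
# fetch_with_token URL OUTPUT_FILE
fetch_with_token() {
    url="$1"
    out="$2"
    # Start with the common options, then append the header only if a token exists.
    set -- --fail --location --output "$out"
    if [ -n "${HF_TOKEN:-}" ]; then
        set -- "$@" --header "Authorization: Bearer ${HF_TOKEN}"
    fi
    # CURL can be overridden (for example in tests); it defaults to curl.
    "${CURL:-curl}" "$@" "$url"
}
```

In a workflow, `HF_TOKEN` would be exported to the step's environment from the repository secret before the script runs.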

danbev · 2026-06-04T04:11:39Z

I've skipped the freebsd job as it was not enabled previously (it was commented out). Instead of commenting it out, I think we can just disable that job once this is merged.

danbev · 2026-06-04T04:37:49Z

There are some naming inconsistencies at the moment regarding the CI jobs but I'll open a follow-up PR to fix that.

danbev · 2026-06-04T05:49:49Z

@ggerganov I don't seem to be able to add you as a reviewer as you opened this PR.
Can you take a look when you get a chance and see if this is what you had in mind?

ggerganov · 2026-06-04T06:35:51Z

Some follow-up tasks:

- For all github actions, make sure we are using ggml-org forks. This is for security reasons, to avoid 3rd-parties being hijacked.
- Decide which jobs to remove. Mainly keep track of the largest caches and see what makes sense to keep. For example, the ~1.7GB msys caches seem excessive and would most likely have to be removed. If, after removing the cache, the jobs become very slow (more than 20 mins), then we will disable these jobs, etc.
- Identify slow linux jobs that don't currently utilize ccache and update them to start using it.

danbev · 2026-06-04T06:38:36Z

I've disabled the freebsd job: https://github.com/ggml-org/whisper.cpp/actions/workflows/build-freebsd.yml

ggerganov · 2026-06-04T06:39:07Z

Also, we have to do something about the Docker images. It does not make sense to build them on every commit. It's probably best to do it like in llama.cpp - once per day: https://github.com/ggml-org/llama.cpp/actions/runs/26933499193/workflow

danbev · 2026-06-08T05:52:32Z

- I've pinned actions to specific commit sha's, to avoid versions or tags, in PRs that touched github actions. The remaining ones are covered by ci : pin github actions to commit sha's #3865.
- The "For example, the ~1.7GB msys caches seem excessive and would most likely have to be removed." item has been resolved.
- Docker images are now scheduled.
- Most jobs have been updated to now use ccache. For example ci : add ccache to build-sycl [no ci] #3859. I'll take another look at whether there are still jobs that are very slow.
- ~~Go through the github actions we use and update them to use ggml-org forks (creating new forks for the ones that we are using).~~ Pinning will be enough for these third-party github actions.
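A quick way to look for actions that are still referenced by tag or branch rather than by commit sha is to scan the workflow files for `uses:` lines. The helper below is a rough sketch under stated assumptions, not part of this PR: the function name `find_unpinned_actions` is hypothetical, it relies on the `-H` option available in GNU and BSD grep, and it does not handle quoted `uses:` values.

```shell
#!/bin/sh
# Hypothetical helper; not part of the repository.
# find_unpinned_actions FILE...
# Prints every `uses:` line whose ref is not a full 40-character commit sha.
# Local actions (./path) and docker:// references are skipped.
# Exit status follows grep: 0 if unpinned lines were found, 1 if none were.
find_unpinned_actions() {
    grep -HnE '^[[:space:]-]*uses:[[:space:]]' "$@" \
        | grep -vE 'uses:[[:space:]]*(\./|docker://)' \
        | grep -vE '@[0-9a-f]{40}([[:space:]]|$)'
}
```

It could be run as `find_unpinned_actions .github/workflows/*.yml`; any output lines point at references that still use a version or tag.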

ggerganov · 2026-06-08T06:30:41Z

> Go through the github actions we use and update them to use ggml-org forks

Pinning to commit hashes of 3rd-party repos should be OK.

ggerganov added 6 commits May 31, 2026 16:23

ci : add ccache clear action

dcae803

ci : remove bindings-java job from release.yml

884aca3

Assisted-by: llama.cpp:local pi

cont : add manual trigger for build.yml

e54a215

cont : remove obsolete ifs

3868724

ggerganov mentioned this pull request Jun 2, 2026

Please provide compiled binary for Linux and all, just like you did in llama.cpp #3852

Open

danbev added 12 commits June 2, 2026 16:03

Merge remote-tracking branch 'upstream/master' into ci-optimize [no ci]

7773476

ci : extract sanitizer job to bild-sanitize.yml

baf4456

ci : extract linux jobs into build-linux.yml

db91b2f

ci : extract macos jobs to build-macos.yml

23e0354

ci : extract gcc jobs to build-gcc.yml

e5eae54

ci : extract clang jobs to build-clang.yml

f01be36

ci : extract sycl jobs to build-sycl.yml

90138a2

ci : extract windows jobs to build-windows.yml

c227f7e

ci : extract emscripten job to build-wasm.yml

08114fd

ci : extract android jobs into build-android.yml

9911f7c

ci : extract quantize job to quantize.yml

6b756d4

ci : extract coreml job into coreml.yml

f9a9d39

danbev self-assigned this Jun 3, 2026

danbev added 10 commits June 3, 2026 07:37

ci : extract vad job to vad.yml

0b127ca

ci : extract cpu jobs to build-cpu.yml

b3cfe41

ci : make naming of yml files consistent

00674a7

ci : enable command traces to see download command in use

2d7731d

ci : extract freebsd job to build-freebsd.yml

21ea4ef

ci : add ccache to jobs (non-docker builds)

f0499bc

ci : bump ccache-action version to v1.2.21

cee242e

ci : add ccache to docker jobs in build-linux.yml

20e9471

danbev added 9 commits June 3, 2026 11:47

ci : add debug statements to linux docker build

07fa853

ci : set CCACHE_DIR for build-linux.yml

64ab584

ci : add ccache to the remaining docker jobs

0adf226

ci : remove build-linux.yml

d7cfb69

ci : add linux build artifacts to release

060ecbb

ci : make static linux artifacts

d36af67

ci : make linux release artifact names consistent

3fca682

ci : fix linux zip files to have a directory

8a4dec5

ci : add HF_TOKEN secret for HF download authorization

bd69518

danbev marked this pull request as ready for review June 4, 2026 04:11

ggerganov merged commit 02d5316 into master Jun 4, 2026
45 of 46 checks passed

ggerganov deleted the gg/ci-optimize branch June 4, 2026 06:36

ggerganov merged 38 commits into master from gg/ci-optimize
ggerganov commented May 31, 2026 • edited by danbev