ci : refactor + optimize #3847

Merged
ggerganov merged 38 commits into
master from
gg/ci-optimize
Jun 4, 2026

Conversation

@ggerganov

@ggerganov ggerganov commented May 31, 2026

Member

WIP

Started some refactoring of the CI workflows. The goal is to restructure them similarly to how the workflows are now organized in llama.cpp.

  • Split build.yml into separate workflows. Using fork.
  • Optimize ccache usage. Using fork.
  • Remove obsolete/redundant jobs. Using fork.
  • Add linux build artifacts to release. Example release
  • Add HF_TOKEN as secret and use in model download script to avoid huggingface rate limiting. Done in fork and in this repo.

ggerganov added 6 commits May 31, 2026 16:23
Extract self-hosted runner jobs from build.yml into a dedicated
build-self-hosted.yml following the llama.cpp pattern:
  - gpu-cuda (NVIDIA Linux)
  - gpu-vulkan-nvidia-cm (NVIDIA Linux)
  - gpu-vulkan-nvidia-cm2 (NVIDIA Linux + COOPMAT2)
  - gpu-metal (macOS ARM64)
  - gpu-vulkan (macOS ARM64)

GitHub-hosted CPU jobs remain in build.yml.

Assisted-by: llama.cpp:local pi
Extract release-related jobs from build.yml into a dedicated
release.yml following the llama.cpp pattern:
  - determine-tag
  - windows (Win32/x64, SDL2)
  - windows-blas (Win32/x64, OpenBLAS)
  - windows-cublas (x64, CUDA 11.8/12.4)
  - ios-xcode-build
  - bindings-java (depends on windows)
  - release (artifact aggregation + GitHub release)

CoreML job stays in build.yml with its own local tag calculation.

Assisted-by: llama.cpp:local pi
Assisted-by: llama.cpp:local pi
@danbev danbev self-assigned this Jun 3, 2026
danbev added 10 commits June 3, 2026 07:37
This commit adds the --fail option to the model download scripts so that
if the model download returns a server error this is picked up. This is
then detected in run.sh, which displays an error message, stops, and
returns an error.

The motivation for this is that currently it is possible for the model
download to fail while the script proceeds anyway, and instead of a
model the downloaded file will contain an HTML page, probably describing
the error. The model then fails to load due to a missing magic number.
I'm not sure we can do much about the download failing, perhaps a retry,
but at least this will give a clearer error message.

Refs: https://github.com/danbev/whisper.cpp/actions/runs/26866349389/job/79230794512
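For readers who want to see the idea in isolation, here is a minimal sketch of a download step that uses --fail. The function name download_model and its arguments are hypothetical and are not taken from the repository's scripts; only the curl options are real.

```shell
#!/bin/sh
# Hypothetical helper: download_model is an illustrative name, not the
# function used by the actual model download scripts.
download_model() {
    url="$1"
    dest="$2"
    # --fail makes curl exit non-zero (code 22) on HTTP errors >= 400
    # instead of saving the server's error page as if it were the model.
    if ! curl --fail --location --silent --show-error --output "$dest" "$url"; then
        echo "error: failed to download model from $url" >&2
        rm -f "$dest"   # do not leave a partial or bogus file behind
        return 1
    fi
}
```

A caller such as run.sh could then stop early with something like `download_model "$url" "$file" || exit 1`, rather than passing an HTML error page to the model loader.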
This commit adds curl retry options to the model download script.

The motivation is that currently, when CI jobs run, huggingface rate
limits the requests and returns:
```console
curl: (22) The requested URL returned error: 429
```
This is an attempt to work around this, and if it does not work then we
can add an authorization token.
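As a rough illustration of what curl retry options look like, the sketch below wraps a download in a hypothetical download_with_retry function. The retry count and delays are assumptions chosen for the example; the values used in the actual commit are not shown here.

```shell
#!/bin/sh
# Hypothetical helper: the name and the retry values are illustrative.
download_with_retry() {
    url="$1"
    dest="$2"
    # --retry N          retry up to N times on transient errors
    #                    (recent curl versions treat HTTP 429 as transient)
    # --retry-delay S    wait S seconds between attempts
    # --retry-max-time T stop retrying once T seconds have elapsed in total
    curl --fail --location --silent --show-error \
         --retry 5 --retry-delay 10 --retry-max-time 120 \
         --output "$dest" "$url"
}
```

Retries only help when the rate limit clears within the retry window, which is why an authorization token is mentioned as the fallback.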
This job has been commented out as it has been flaky in the past. I'll
monitor this and if it continues to be unreliable we can disable it in
the github actions GUI instead of commenting it out like we did before.
The ccache will only be saved on pushes to master.
The motivation for this is that the save parameter does not seem to work
with the current version.
danbev added 9 commits June 3, 2026 11:47
This commit removes build-linux.yml as the same jobs are also run by
build-gcc.yml (with the exception that build-gcc.yml also runs ctest).
So we keep build-gcc.yml and remove the redundant build-linux.yml.
This is currently causing the following failure:
```console
sccache C:\PROGRA~1\NVIDIA~1\CUDA\v\bin\nvcc.exe -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_CRT_SECURE_NO_WARNINGS -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -DCMAKE_INTDIR=\"Release\" -ID:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\.. -ID:\a\whisper.cpp\whisper.cpp\ggml\src\..\include -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include" -Xcompiler="-MD -O2 -Ob2" -DNDEBUG -std=c++17 -arch=native -use_fast_math -extended-lambda -Xcompiler /Zc:preprocessor -MD -MT ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj -MF ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj.d -x cu -c D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\allreduce.cu -o ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj -Xcompiler=-Fdggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\,-FS
sccache: encountered fatal error
sccache: error: Could not parse shell line
sccache: caused by: Could not parse shell line
```

Refs: https://github.com/danbev/whisper.cpp/actions/runs/26883673904/job/79290017353
This commit removes the tag from the linux release artifacts to be
consistent with the existing artifacts.

If we want to include the tag then we can do that in a follow-up PR.
@danbev

danbev commented Jun 3, 2026

Member

As mentioned in 70a62ea I think we should add a HF_TOKEN to avoid the issue with rate limits for model downloads. This can happen to any of the jobs that download models and just having the retry does not seem to be enough.

This is to avoid the HF rate limiting when downloading models.
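A minimal sketch of how a download script might pass the token, assuming HF_TOKEN is exposed to the job as an environment variable. The function name download_with_token is hypothetical; the Authorization: Bearer header is the usual way to authenticate Hugging Face requests.

```shell
#!/bin/sh
# Hypothetical helper: download_with_token is an illustrative name.
download_with_token() {
    url="$1"
    dest="$2"
    # Start with the common options, then append the header only when a
    # token is present, so unauthenticated local runs keep working.
    set -- --fail --location --silent --show-error --output "$dest"
    if [ -n "${HF_TOKEN:-}" ]; then
        set -- "$@" --header "Authorization: Bearer ${HF_TOKEN}"
    fi
    curl "$@" "$url"
}
```

Building the argument list with `set --` keeps the header a single argument even though it contains spaces, and avoids emitting an empty header when the secret is not available (for example on forks).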
@danbev

danbev commented Jun 4, 2026

Member

I've skipped the freebsd job as this was not enabled previously (it was commented out). Instead of commenting it out I think we can just disable that job once this is merged.

@danbev danbev marked this pull request as ready for review June 4, 2026 04:11
@danbev

danbev commented Jun 4, 2026

Member

There are some naming inconsistencies at the moment regarding the CI jobs but I'll open a follow-up PR to fix that.

@danbev

danbev commented Jun 4, 2026

Member

@ggerganov I don't seem to be able to add you as a reviewer as you opened this PR.
Can you take a look when you get a chance and see if this is what you had in mind?

@ggerganov

Member Author

Some follow-up tasks:

  • For all github actions, make sure we are using ggml-org forks. This is for security reasons, to avoid 3rd-party actions being hijacked.
  • Decide which jobs to remove. Mainly keep track of the largest caches and see what makes sense to keep. For example, the ~1.7GB msys caches seem excessive and would most likely have to be removed. If, after removing the cache, the jobs become very slow (more than 20 mins), then we will disable these jobs.
  • Identify slow linux jobs that don't currently utilize ccache and update them to start using it.

@ggerganov ggerganov merged commit 02d5316 into master Jun 4, 2026
45 of 46 checks passed
@ggerganov ggerganov deleted the gg/ci-optimize branch June 4, 2026 06:36
@danbev

danbev commented Jun 4, 2026

Member

I've disabled the freebsd job: https://github.com/ggml-org/whisper.cpp/actions/workflows/build-freebsd.yml

@ggerganov

Member Author

Also, we have to do something about the Docker images. It does not make sense to build them on every commit. It's probably best to do it like in llama.cpp - once per day: https://github.com/ggml-org/llama.cpp/actions/runs/26933499193/workflow

@danbev

danbev commented Jun 8, 2026

Member
  • In the PRs that touched github actions, I've pinned the actions to specific commit sha's instead of versions or tags. The remaining ones are covered by ci : pin github actions to commit sha's #3865.
  • The "For example, the ~1.7GB msys caches seem excessive and would most likely have to be removed." item has been resolved.
  • Docker images are now scheduled.
  • Most jobs have been updated to now use ccache. For example ci : add ccache to build-sycl [no ci] #3859. I'll take another look at whether there are still jobs that are very slow.
  • Go through the github actions we use and update them to use ggml-org forks (creating new forks for the ones that we are using). Pinning will be enough for these third-party github actions.
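
To check the pinning mentioned above, something like the following sketch could list `uses:` entries that still reference a tag or branch. The function name unpinned_actions is hypothetical, and this is a simple line-based check under the assumption that each `uses:` entry sits on its own line; it is not a YAML parser and is not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: prints "uses:" lines that are not pinned to a full
# 40-character commit SHA. Returns 0 if any such line was found.
unpinned_actions() {
    # 1. select "uses:" entries
    # 2. skip local actions such as "uses: ./.github/actions/foo"
    # 3. drop entries whose ref is a full 40-hex-digit SHA
    grep -E '^[[:space:]-]*uses:[[:space:]]' "$@" \
        | grep -v -E 'uses:[[:space:]]*\./' \
        | grep -v -E '@[0-9a-f]{40}([[:space:]]|$)'
}
```

For example, `unpinned_actions .github/workflows/*.yml` would print a line like `uses: actions/checkout@v4` but stay silent for an entry pinned as `owner/repo@<40-hex-sha> # v4`.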

@ggerganov

Member Author

Go through the github actions we use and update them to use ggml-org forks

Pinning to commit hashes of 3rd-party repos should be OK.


2 participants